More analysis and more data are needed to predict how different dies will interact in the same package.
Managing thermal and mechanical stress in multi-die assemblies will require a detailed knowledge of how and where a device will be used, how it will be packaged, and where stresses could cause problems at any point during its expected lifetime.
This includes everything from workload-dependent thermal gradients to mechanical and electrical stress, which may become more pronounced over time with aging effects such as electromigration and dielectric breakdown. Higher transistor utilization in AI applications is making heat particularly difficult to dissipate. Current state-of-the-art GPUs run at about 500 watts, and power densities could climb to 1,000 watts/cm². This, in turn, can cause mechanical deformation, including warping, cracking, and delamination, due to thermal mismatch between materials.
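As a point of reference, a standard first-order textbook estimate (not a method attributed to any vendor quoted here) for the in-plane stress in a thin layer bonded to a much thicker substrate at a stress-free temperature $T_0$ is:

$$\sigma = \frac{E_f}{1-\nu_f}\,(\alpha_s - \alpha_f)\,(T - T_0)$$

Here $E_f$ and $\nu_f$ are the layer's Young's modulus and Poisson's ratio, and $\alpha_s$ and $\alpha_f$ are the substrate and layer coefficients of thermal expansion (CTEs); positive $\sigma$ is tensile. For copper (about 17 ppm/K) on silicon (about 2.6 ppm/K), cooling 200 K from the bonding temperature gives a mismatch strain of roughly $(17 - 2.6)\times10^{-6}\,\mathrm{K}^{-1} \times 200\,\mathrm{K} \approx 2.9\times10^{-3}$. Real stacks add geometry, plasticity, and many interfaces, which is why they call for detailed simulation.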
In the past, thermal modeling and management often were separate tasks, tied to circuit design and computing architecture, but in multi-die assemblies they need to be addressed together.
“The stress problem that most everybody has been focusing on is the reliability issue,” said John Ferguson, senior director of product management for Calibre 3D IC solutions at Siemens EDA. “This mostly comes from manufacturing, when chips are put together, heated and then cooled, and all these materials expand and contract at different rates. It causes delamination, which can pull them apart. The connections then can be lost, which is a real issue.”
Electrical stress adds another challenge. “We’ve known since 90nm that even small stresses — like how close your gate is to the edge of the diffusion, or to the edge of the well — will impact the electrical behavior, which ultimately can impact timing potentially,” Ferguson said. “Now you’ve got all of these extra stresses that are from different materials, and you’re drilling holes through chips. You’ve got bumps and all kinds of features that expand and contract at different rates, even if they don’t change in use, although they probably do, to some extent. The question is whether that’s in the noise or not. We don’t know yet, but we do know some stresses morph the crystal lattice in the silicon. Some electrons are closer, some are further apart. That messes everything up. A lot of the effort going forward will be how to learn from this and create new design rules. You may not always have to do extensive analysis of it, but right now we need to understand what we have.”

Fig. 1: 2.5D and 3D-IC. Source: Siemens EDA
Another challenge is that stresses in a multi-die system are interdependent. Mechanical stresses can impact thermal stresses, and vice versa. “When the semiconductor is being manufactured, during the assembly phase, you’re going to build your substrate, then you’re going to put another die on top, solder it at a high temperature, and then cool it down,” said Lang Lin, principal product manager at Ansys, now part of Synopsys. “That process is repeated for each layer. During this process, the system goes through this thermal cycling event, which stretches the materials, building stress into the whole system. There has to be a limit there. If you keep stretching it, you’ll break it, so one question we need to answer is whether the 3D-IC is tolerating the stress during the manufacturing thermal cycling event.”
Accounting for this requires more intensive modeling. “The whole manufacturing process can be simulated with a model, and the temperature can also be a parameter in the system,” Lin said. “This means we can build a stress model, which is a dynamic model, such that from step 1 to step 100, you will see the elapse of the stress behavior of the system.”
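To make the idea of a step-by-step stress model concrete, here is a minimal Python sketch. It is not the Ansys or Synopsys model. It assumes purely elastic behavior and treats each bonded layer as a thin film on a thick silicon substrate, using the textbook biaxial mismatch stress σ = E/(1 − ν) · (α_substrate − α_layer) · (T − T_bond). Each layer is stress-free at the temperature where it was bonded, so stress builds up as the assembly is cooled and reheated. The material values, step sequence, and 300 MPa limit are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Material:
    name: str
    youngs_pa: float  # Young's modulus, Pa
    poisson: float    # Poisson's ratio
    cte: float        # coefficient of thermal expansion, 1/K


def mismatch_stress(layer: Material, substrate_cte: float, temp_c: float, bond_temp_c: float) -> float:
    """First-order elastic biaxial stress (Pa) in a thin layer bonded stress-free at bond_temp_c.
    Positive values are tensile."""
    biaxial_modulus = layer.youngs_pa / (1.0 - layer.poisson)
    return biaxial_modulus * (substrate_cte - layer.cte) * (temp_c - bond_temp_c)


def simulate(steps, substrate_cte, limit_pa):
    """Walk a process sequence step by step.

    steps: list of (temperature_C, Material or None); a Material bonds a new layer at that temperature.
    Returns (step number, temperature, {layer name: stress}, limit exceeded?) for every step.
    """
    bonded = []   # (material, bonding temperature) for each layer attached so far
    history = []
    for step_no, (temp_c, new_layer) in enumerate(steps, start=1):
        if new_layer is not None:
            bonded.append((new_layer, temp_c))  # new interface is stress-free at its bonding temperature
        stresses = {m.name: mismatch_stress(m, substrate_cte, temp_c, t_bond) for m, t_bond in bonded}
        worst = max((abs(s) for s in stresses.values()), default=0.0)
        history.append((step_no, temp_c, stresses, worst > limit_pa))
    return history


# Hypothetical, rounded values for illustration only.
SI_CTE = 2.6e-6
copper = Material("cu_layer", 120e9, 0.34, 17e-6)
underfill = Material("underfill", 8e9, 0.30, 30e-6)
process = [(250, copper), (25, None), (150, underfill), (25, None)]

for step_no, temp_c, stresses, over in simulate(process, SI_CTE, limit_pa=300e6):
    summary = {name: round(s / 1e6) for name, s in stresses.items()}
    print(step_no, temp_c, summary, "over limit" if over else "ok")
```

With these numbers the copper layer reaches roughly 590 MPa after cooling from 250 °C, so steps 2 and 4 are flagged. A purely elastic formula overstates stress in a metal that would actually yield, which is one reason real analyses go well beyond this first-order estimate.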
Foundry data also needs to be factored in. EDA tool providers work closely with the foundries to provide that data to design teams in the form of PDKs, but data about stress is a recent addition.
“Stress is a big topic now for the foundries,” Lin noted. “Every foundry recognizes the importance of considering stress and warpage analysis in the manufacturing process stage. They provide us guidance, as well as material properties, for example. They show how to model each of the components, such as the temperature cycling sequence. The more data we get from the foundry, the more we can make the simulation correlate to the digital twin of the manufacturing machine.”
Some of this modeling is quite detailed. For example, Ansys recently released a feature that models anisotropic thermal conduction, in which heat conducts preferentially horizontally in one layer and vertically in the next. Heat therefore flows in different directions in each layer, unlike an isotropic material, where conductivity is the same in every direction.
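The difference between anisotropic and isotropic conduction can be illustrated with the standard rule-of-mixtures approximation for a laminated stack. Heat flowing along the layers sees them in parallel (a thickness-weighted arithmetic mean of the in-plane conductivities), while heat flowing through them sees them in series (a thickness-weighted harmonic mean of the through-plane conductivities). This is a textbook approximation, not the Ansys feature itself, and the layer values below are hypothetical.

```python
def effective_conductivity(layers):
    """Effective conductivities of a laminated stack.

    layers: list of (thickness_m, k_in_plane, k_through_plane), conductivities in W/(m*K).
    Returns (k_in_plane_eff, k_through_plane_eff).
    """
    total = sum(t for t, _, _ in layers)
    # Along the layers, the heat paths are in parallel: thickness-weighted arithmetic mean.
    k_in = sum(t * kx for t, kx, _ in layers) / total
    # Across the layers, the heat paths are in series: thickness-weighted harmonic mean.
    k_through = total / sum(t / kz for t, _, kz in layers)
    return k_in, k_through


# Hypothetical layers: one conducts mainly horizontally, the next mainly vertically.
stack = [
    (10e-6, 100.0, 1.0),
    (10e-6, 1.0, 100.0),
]
k_in, k_through = effective_conductivity(stack)
print(f"in-plane: {k_in:.1f} W/(m*K), through-plane: {k_through:.2f} W/(m*K)")
```

Each layer conducts well in one direction, yet the through-plane value of the stack is limited by the layer that conducts poorly vertically, so a direction-blind average would badly misjudge where the heat goes.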
Fig. 2: Global simulation (top), and thermal simulation for die (bottom left) and interposer (bottom right). Source: Ansys/Synopsys
There are many other sources of stress. “One is the manufacturing and assembly process, in which the thermal cycles happen,” said Amlendu Shekhar Choubey, senior director of product management at Synopsys. “There are different materials involved in different ways, and some of these impacts can’t be recovered from as they do not go back to their original state, so it all needs to be taken into account in the design process. Also, the structural integrity of the whole stack must be considered. You have hybrid or solder bonds. Do they break because of the expansion coefficients of different materials? Do they get misaligned? When you are doing your design, how do you take all this into account and provide enough margin that, after all this process, the device has structural integrity?”
After the assembly process, the material and device behavior must be modeled to determine how the devices will behave under stress. “Within the variation, there is a distribution of all the device properties, let’s say, in standalone silicon, but when it goes through all these cycles, how does that envelope change, and how much margin should you give in your design to take that into account?” Choubey said. “In other words, the manufacturing and assembly process will impact the structural integrity and the device behavior.”
State-of-the-art packaging techniques change how heat flows out of the IC, often making it harder to remove. Design teams need to pay extra attention to these additional effects and model them. “Ignoring these effects or improperly modeling them can lead to reliability and performance issues,” said Matt Ozalas, master application development engineer and scientist at Keysight Technologies.
In prior generations of hardware, on-chip heat sources dissipated heat through a well-designed backside thermal ground path, usually thermal vias and epoxy. The result was a predictably low thermal resistance that could be modeled simply as a passive “lumped element” resistor/capacitor network.
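For reference, the kind of lumped-element model described above can be as small as a single thermal resistance and capacitance. A minimal sketch, assuming one thermal node and a constant power step; the component values are hypothetical:

```python
import math


def junction_temp_c(t_s, power_w, r_th_k_per_w, c_th_j_per_k, t_ambient_c=25.0):
    """Temperature of a single-node lumped RC thermal model, t_s seconds after a power step."""
    tau_s = r_th_k_per_w * c_th_j_per_k  # thermal time constant
    return t_ambient_c + power_w * r_th_k_per_w * (1.0 - math.exp(-t_s / tau_s))


# Hypothetical values: 2 W into 10 K/W and 0.05 J/K (time constant 0.5 s).
for t in (0.1, 0.5, 2.0, 10.0):
    print(f"t = {t:4.1f} s  T = {junction_temp_c(t, 2.0, 10.0, 0.05):.1f} C")
```

The steady-state value is simply ambient plus P × R (45 °C here), which is exactly the kind of estimate that fits in a spreadsheet.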
“Designers could use straightforward rules of thumb and basic math — for example, in spreadsheets — to get a decent estimate of how their ICs would perform thermally,” Ozalas explained. “3D stacking changes this in two ways. First, the path to dissipate heat becomes much more complex. If an IC is flipped onto a package, the heat flows through bump interconnects, which were not necessarily designed for heat transfer. This means they likely have higher thermal R, and potentially more variation in thermal R than prior approaches, especially when cascaded through multiple vertical levels. Also, the old tried and true ‘simple’ approaches no longer work to model the thermal stress on the chip.”
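The effect of cascading bump interconnects can be sketched with a small Monte Carlo experiment. The per-level resistance (0.5 K/W), its 20% variation, and the 10 W of heat are hypothetical, and the levels are assumed to be in series with independent variation. Stacking more levels raises both the mean temperature rise and its absolute spread.

```python
import random
import statistics


def stack_delta_t_samples(power_w, level_r_nominal, rel_sigma, n_samples=10_000, seed=0):
    """Monte Carlo samples of temperature rise (K) through cascaded vertical levels.

    level_r_nominal: nominal thermal resistance (K/W) of each level's bump/bond layer.
    rel_sigma: assumed relative standard deviation of each level's resistance.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # Levels are in series, so their resistances add; each one varies independently.
        r_total = sum(max(0.0, rng.gauss(r, rel_sigma * r)) for r in level_r_nominal)
        samples.append(power_w * r_total)
    return samples


for levels in ([0.5], [0.5, 0.5, 0.5]):
    s = stack_delta_t_samples(10.0, levels, rel_sigma=0.2)
    print(f"{len(levels)} level(s): mean {statistics.mean(s):.1f} K, std dev {statistics.stdev(s):.2f} K")
```

For independent levels the spread grows roughly with the square root of the number of levels, so a fixed margin that worked for one level can be too small for a multi-level stack.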
To further complicate things, there can be new heat sources at different levels in a multi-die stack. As a result, the heat can cascade, and heat sources can interact across technologies. “For example, a heat source on IC 1 impacts the electrical performance of a transistor on IC 2,” Ozalas said. “The impact of these changes is that you now need to do an electrothermal simulation where both electrical circuit performance and thermal performance are considered simultaneously, because this paradigm can no longer be modeled as a simple passive RC network. This is more complex, and it requires additional data from foundries in the form of thermal stack-ups and material characteristics for multiple IC and package technologies, not just the particular IC technology that you happen to be using for your design.”
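A minimal sketch of why the problem becomes coupled, assuming a two-die stack described by a hypothetical thermal coupling matrix (K/W) and a common exponential approximation for temperature-dependent leakage power. Power sets temperature, temperature changes leakage, and the loop is iterated until it settles. All coefficients are illustrative, not foundry data.

```python
import math


def electrothermal_solve(r_matrix, p_dynamic, p_leak_ref, leak_coeff,
                         t_ambient=25.0, t_ref=25.0, tol=1e-6, max_iter=200):
    """Fixed-point iteration between a power model and a linear thermal network.

    r_matrix[i][j]: temperature rise of die i per watt dissipated in die j (K/W).
    Leakage is modeled as p_leak_ref * exp(leak_coeff * (T - t_ref)), an assumed approximation.
    Returns (temperatures, powers) once temperatures change by less than tol.
    """
    n = len(p_dynamic)
    temps = [t_ambient] * n
    for _ in range(max_iter):
        # Electrical side: each die's leakage depends on its own temperature.
        powers = [p_dynamic[i] + p_leak_ref[i] * math.exp(leak_coeff * (temps[i] - t_ref))
                  for i in range(n)]
        # Thermal side: every die's power heats every die through the coupling matrix.
        new_temps = [t_ambient + sum(r_matrix[i][j] * powers[j] for j in range(n))
                     for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_temps, temps)) < tol:
            return new_temps, powers
        temps = new_temps
    raise RuntimeError("no convergence; the stack may be in thermal runaway")


R = [[0.5, 0.2],
     [0.2, 0.8]]  # hypothetical coupling matrix, K/W
for p0 in (50.0, 80.0):
    temps, _ = electrothermal_solve(R, [p0, 10.0], [5.0, 2.0], leak_coeff=0.02)
    print(f"die 0 dynamic power {p0:.0f} W -> die temps {temps[0]:.1f} C, {temps[1]:.1f} C")
```

Raising only die 0's power also raises die 1's temperature, and with it die 1's leakage, which is the cross-die interaction a purely passive RC model cannot capture.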
Thermal stress issues
Thermal stress is a system problem. It starts with one die, spreads to other dies, to the package, to the PCB, and to the system enclosure. And in real 3D-ICs, the challenges are even more difficult to solve.
“Everyone knows that in the past, for thermal analysis, it was more focused on the package, PCB, and system side,” said Albert Zeng, senior software engineering director at Cadence, during a panel at this year’s Design Automation Conference. “But now, because of 3D-IC, the power on a single die becomes so big that all the chip design companies must think about thermal issues. For example, at the earliest design stage when they’re designing the floor plan or designing the stack for the 3D-IC, they have to run early thermal analysis to find a better system architecture at the very beginning that’s thermal friendly.”
Thermal control involves every level, from the chip to the data center. “People are starting to run more and more thermal analyses to test their thermal management systems to see how they respond,” Zeng said. “As a result, we see the need for thermal analysis more on the chip side, along with transient power analysis. Another trend is that thermal is not just a single effect on the chip bonds. Especially for 3D-IC, you also have thermal-induced stress, so that’s also needed there since there could be a thermal-induced impact on the timing, on the power. So on the chip side, thermal becomes the center of the multiphysics analysis that all different tools — such as power analysis, timing analysis, and even stress analysis — have to interact with the thermal effects and feedback.”
In any multi-die assembly, stacking two or more active dies results in thermal stress. Heat dissipated from a lower die faces a higher resistance if it is routed through layers of silicon rather than a thin layer of packaging.
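A simple one-dimensional estimate shows how the extra layers add up. Each layer contributes a conduction resistance R = t / (k · A), and layers crossed in series add. The footprint, thicknesses, and bond and interface-material conductivities below are hypothetical; bulk silicon is roughly 150 W/(m·K) at room temperature.

```python
def layer_resistance(thickness_m, k_w_per_mk, area_m2):
    """1D conduction resistance R = t / (k * A), in K/W."""
    return thickness_m / (k_w_per_mk * area_m2)


def path_resistance(layers, area_m2):
    """Layers crossed in series: their resistances add."""
    return sum(layer_resistance(t, k, area_m2) for t, k in layers)


AREA = 1e-4                # 1 cm^2 footprint (assumed)
TIM = (50e-6, 5.0)         # hypothetical thermal interface material
BOND = (20e-6, 1.0)        # hypothetical effective bond/underfill layer between dies
UPPER_SI = (50e-6, 150.0)  # thinned upper die

r_upper = path_resistance([TIM], AREA)
r_lower = path_resistance([BOND, UPPER_SI, TIM], AREA)
print(f"upper die to lid: {r_upper:.3f} K/W, lower die to lid: {r_lower:.3f} K/W")
```

With these particular numbers, most of the added resistance comes from the bond layer rather than the thin silicon, so the penalty for the lower die depends heavily on which materials sit in its path.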
“System design and physical partitioning choices would be much more efficient and effective with a thermal modeling tool that the architect could use independently of a packaging expert,” said Rick Bye, director of product management and marketing at Arteris. “It would allow them to make floorplan tradeoffs with locations of hot IP, including their vertical position (i.e., which die), and where on each die to place the hot IP. Mechanical stresses are much more problematic with a 3D die stack-up compared to a monolithic design, with varying overhangs and underhangs between the stacked die, and varying locations of potentially thousands of through-silicon vias (TSVs) connecting the die. Further, the thickness of the die will vary significantly, with the upper die needing to be much thinner than a typical monolithic die to accommodate the TSVs. Similar to thermal modeling, any experienced packaging engineer should be able to effectively model such a system, but what is really needed is a tool to easily enable the device architect to make tradeoffs with physical partitioning that will affect relative die size, die overhangs or underhangs, and TSV locations, to come up with the most mechanically optimized implementation that minimizes stress.”
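A toy version of the architect-level trade-off described above: assign hot IP blocks to dies so that the peak die temperature is minimized. This sketch assumes each die's temperature depends only on the total power assigned to it, plus a single hypothetical coupling resistance between dies; it ignores where on each die a block sits, lateral spreading, and area limits. It is not Arteris's or any vendor's tool.

```python
from itertools import product


def peak_temp(assignment, block_powers, r_self, r_couple, t_ambient=25.0):
    """Peak die temperature (C) for one block-to-die assignment.

    r_self[d]: thermal resistance (K/W) from die d to ambient.
    r_couple: extra rise (K/W) on a die per watt dissipated in any other die.
    """
    n_dies = len(r_self)
    die_power = [0.0] * n_dies
    for power, die in zip(block_powers, assignment):
        die_power[die] += power
    temps = []
    for d in range(n_dies):
        rise = r_self[d] * die_power[d]
        rise += sum(r_couple * die_power[o] for o in range(n_dies) if o != d)
        temps.append(t_ambient + rise)
    return max(temps)


def best_assignment(block_powers, r_self, r_couple):
    """Exhaustive search over every block-to-die mapping (fine for a handful of blocks)."""
    candidates = product(range(len(r_self)), repeat=len(block_powers))
    return min(candidates, key=lambda a: peak_temp(a, block_powers, r_self, r_couple))


# Hypothetical: die 0 sits next to the heat sink, die 1 is buried below it.
blocks = [30.0, 20.0, 5.0, 5.0]  # watts per IP block
r_self = [0.4, 1.0]
best = best_assignment(blocks, r_self, r_couple=0.2)
print(best, round(peak_temp(best, blocks, r_self, 0.2), 1))
```

With these numbers the search puts the two large blocks on the die nearest the heat sink and the two small ones on the buried die, for a peak of about 47 °C. Putting everything on the top die, or balancing power evenly, both come out hotter.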
Data movement across the die-to-die boundaries of a 3D-IC presents many challenges. The designer must avoid the introduction of bottlenecks that constrain performance without burdening the die floorplan with too many area-consuming TSVs. “This requires an interconnect fabric or network on chip (NoC) IP design tool that is multi-die aware, to enable the architect to make high-level tradeoffs between different physical and logical partitions, and different die-to-die connectivity options,” Bye said.
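A back-of-the-envelope sketch of that trade-off: how many TSVs a die-to-die link needs for a target bandwidth, and how much die area they occupy. The per-TSV data rate, overhead fraction, and pitch are hypothetical inputs, not figures from Arteris or any foundry.

```python
import math


def d2d_tsv_count(bandwidth_gbps, per_tsv_gbps, overhead_fraction):
    """TSVs needed for a die-to-die link.

    overhead_fraction: assumed extra share for clocks, control, redundancy, and spares.
    """
    data_tsvs = math.ceil(bandwidth_gbps / per_tsv_gbps)
    return math.ceil(data_tsvs * (1.0 + overhead_fraction))


def tsv_array_area_mm2(n_tsvs, pitch_um):
    """Area of a square-grid TSV array; keep-out zones are assumed to be folded into the pitch."""
    return n_tsvs * (pitch_um / 1000.0) ** 2


for per_tsv in (4.0, 16.0):  # Gb/s per TSV (hypothetical)
    n = d2d_tsv_count(8000.0, per_tsv, overhead_fraction=0.25)
    print(f"{per_tsv:>4} Gb/s per TSV: {n} TSVs, {tsv_array_area_mm2(n, 40.0):.1f} mm^2")
```

In this simple model, quadrupling per-TSV throughput cuts both the count and the area by four; the architect's job is to weigh that against what faster signaling costs elsewhere in the design.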
Device properties impacted by stress
Once a device is in the field and operating, different die will heat at different rates. And because of the materials involved, as well as different workloads, their response to that heat will be different.
“Even after the manufacturing process, you have the same risks,” Synopsys’ Choubey said. “This means the design team needs to make sure this thermo-mechanical stress does not cause any structural problems, and that the contacts stay intact. They also need to understand that different die will see different temperatures and different stresses, which will have an impact on the properties of the devices that are in those die. Modeling the impact of temperature is not that difficult. We have been doing that for a very long time. Even in monolithic design, an accurate simulation helps a lot, because then you know the exact temperatures you need to simulate for, and that keeps the number of corners manageable.”
All forms of stress have an impact on device properties. “Let’s say you have a couple of dies in the stack, and one heats more than the other,” Choubey said. “If their thermal expansion coefficients are different, those dies will see different stress. That stress, on top of the temperature, can impact the behavior of the devices in those die. How do we model that? That’s another aspect of stress that is evolving. I have not seen many customers grappling with that, and how to model that. We are still in the process of understanding the impact of temperature and modeling the temperature impact, but stress is another factor that impacts the device properties, and that needs to be modeled and simulated for.”
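One first-order way to see how temperature and CTE differences combine is to compare the free thermal expansion of two bonded dies, each heated from a common stress-free state. A minimal sketch, assuming linear expansion and ignoring how the bond redistributes the strain. Silicon's CTE is roughly 2.6 ppm/K and GaAs's roughly 5.7 ppm/K; the temperature rises are hypothetical. Turning this strain into shifts in device behavior requires stress-aware device models, which this sketch does not attempt.

```python
def free_strain_mismatch(cte_a, rise_a_k, cte_b, rise_b_k):
    """Difference in free thermal strain between die A and die B (dimensionless).

    cte_*: coefficients of thermal expansion (1/K); rise_*: temperature rise above the stress-free state (K).
    """
    return cte_a * rise_a_k - cte_b * rise_b_k


SI_CTE, GAAS_CTE = 2.6e-6, 5.7e-6  # approximate room-temperature CTEs, 1/K
cases = {
    "same material, different temperatures": (SI_CTE, 60.0, SI_CTE, 40.0),
    "different materials, same temperature": (SI_CTE, 50.0, GAAS_CTE, 50.0),
    "different materials and temperatures": (SI_CTE, 60.0, GAAS_CTE, 40.0),
}
for label, args in cases.items():
    print(f"{label}: {free_strain_mismatch(*args) * 1e6:.0f} ppm")
```

Both mechanisms produce a mismatch on their own, and in a real stack they act together, so the strain a die sees depends on its neighbors and its workload as well as its own material.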
Mitigating stress in 3D-IC design
Die stacking is evolving rapidly, and it is a hotbed of activity for EDA tool vendors. Their tools will be essential for stacking dies, particularly in full 3D-IC designs.
“Engineers are ingenious and have innovated new and interesting ways to get heat out of heterogeneous packages,” Keysight’s Ozalas noted. “The ability to run electrothermal simulations and perform advanced thermal modeling enables engineers to design more efficient physical components to pull out heat from 3D-IC packages, so via and interconnect structures are constantly improving and evolving. There are also some more exotic approaches. One recent example is microfluidic cooling, where a substance like deionized water is pumped through tiny nozzles and pipes inside via structures to actively pull heat out of the IC at its generation source. This adds even more considerations to the modeling efforts because now you have to account for fluid flow, etc. But such approaches have the potential to be more efficient than passive techniques for removing heat without the huge area required for passive structures like heat sinks, because you can remove heat at its source so it does not spread and impact other parts of the design.”
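The basic energy balance behind liquid cooling sizes the required flow: the heat carried away equals mass flow × specific heat × coolant temperature rise. A minimal sketch, assuming single-phase water with approximate room-temperature properties and ignoring pressure drop, channel geometry, and how heat actually reaches the fluid, all of which a real microfluidic design must model:

```python
def coolant_flow_ml_per_min(heat_w, coolant_rise_k, density_kg_m3=997.0, cp_j_per_kg_k=4180.0):
    """Volumetric coolant flow (mL/min) needed to remove heat_w with a coolant rise of coolant_rise_k.

    Energy balance: heat = mass_flow * cp * rise. Defaults approximate liquid water near room temperature.
    """
    mass_flow_kg_s = heat_w / (cp_j_per_kg_k * coolant_rise_k)
    volume_flow_m3_s = mass_flow_kg_s / density_kg_m3
    return volume_flow_m3_s * 1e6 * 60.0  # m^3/s -> mL/s -> mL/min


for rise in (10.0, 20.0):
    print(f"500 W with a {rise:.0f} K coolant rise: {coolant_flow_ml_per_min(500.0, rise):.0f} mL/min")
```

For a 500 W device, the energy balance alone calls for roughly 720 mL/min at a 10 K rise; pushing that flow through tiny channels is where the fluid-flow modeling the quote mentions comes in.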
William Wang, CEO of ChipAgents, agreed that 3D-IC introduces stress factors beyond those in conventional 2D designs, including thermo-mechanical strain from stacked dies, inter-die latency and coupling challenges due to TSVs and micro-bumps, and additional timing/power closure complexity across layers. “These stresses complicate design verification at the RTL level, since early models rarely capture downstream reliability issues like warpage or TSV cracking. From a tool perspective, EDA vendors are adding thermal- and stress-aware placement, extraction, and multi-die signoff flows, but they rely heavily on accurate foundry data such as material properties, TSV parasitics, and stress-aware compact models. For front-end teams, this means abstract die-to-die timing/power models must be integrated into RTL simulation and verification.”
AI is playing a much bigger role here, as well. New tools by ChipAgents and others can accelerate analyses by auto-generating stress-aware testbenches, correlating RTL intent with physical effects, and suggesting partitioning strategies that minimize cross-die stress.
Conclusion
Multi-die assemblies are inevitable, given reticle limits and the incessant demands for more performance to process more data faster. Various types of stress, individually and together, are likewise inevitable.
“There are thermal packaging engineers who have dealt with it for years, but they tend not to be in the core of the chip design world, which now must worry about thermal throughout the design, and early on in the floor-planning stages,” noted Marc Swinnen, director of product marketing at Ansys, part of Synopsys. “How are you going to distribute these chips? How are you going to place them? You need to have some reasonably good thermal data early on, or you may end up in a pathological situation that you can’t fix at the end, and you have to go back to the start and redo your whole partitioning because you messed up the thermal. So it’s not a last-minute packaging step of, ‘Is everything okay?’ It’s part of the design flow. And it’s a physics they’re not familiar with. There’s more data from the foundry required, so that definitely is a learning curve. On the other hand, these tools are mature, and they work pretty well.”
Things will get even trickier with chiplets, Siemens EDA’s Ferguson added. “If you have two of the same chiplet and you put them into a 3D package, they might not behave the same because they have different stresses, they have different temperatures, or they have other impacts. Just because it worked by itself doesn’t mean it fits in any scenario. You must be aware and be guard-banding to make sure they’re where you need them to be.”