Heterogeneous integration is pushing chip and package designers to consider multi-physics effects as early as the initial architectural planning stage; new tools may be needed.
Thermal and mechanical stresses are creating significant challenges in heterogeneous chiplet assemblies, increasing the time and cost required to work through all the possible physical effects, dependencies, and interactions, and driving demand for new tools.
Unlike in the past, when various components were crammed into a planar SoC on a relatively thick substrate, the new substrates are being thinned out to reduce the distance that signals must travel. That, in turn, has reduced the ability of silicon substrates to dissipate heat, causing warpage and uneven heating and cooling due to lattice mismatch, which stresses the interconnects and makes it difficult to maintain contact across many thousands of micro-bumps. The result is reduced performance and yield, and much more complex system design.
“Stress, as it is coming up now in the context of multi-die design, is relatively new,” said Sutirtha Kabir, R&D director at Synopsys. “But mechanical stress is not new. In the simplest case of packaging and PCBs, they bond with BGA balls. In the interposer and package, the I/O is C4 bumps. Then, during the manufacturing process, solder reflow, and all the mechanical things that happen, the system is subject to stress. You expect every layer, the package, and the PCB to be planar. But during the manufacturing process that doesn’t hold, so you will get stress effects. These are mostly assembly stress effects/packaging-related stress effects. As you are building things up, for instance, you have the PCB, then the package, and that stress is going to come in simply because of the manufacturing process. That has always been there. Then, whichever company manufactures the package/PCB, whether an OSAT or a foundry, will tell the design team about this as a constraint.”
Work with foundries has shown a growing awareness of the need to model the bending and the warping of these substrates.
“It’s inevitable. If you take a piece of metal, make one end warm while the other one stays cooler, it’s going to differentially expand, and it’s going to bend,” noted Marc Swinnen, director of product marketing at Ansys. “It’s nature’s reaction to differences in temperature or difference in expansion. But then the question is, does that introduce stresses, since stresses have electrical consequences, as well? When chips are made, the transistors are built with built-in strain. A decade or so ago, strain was added as a part of the building of the transistor, because this on-chip strain changes the behavior of the transistor in a beneficial way, so it’s seen as good. But now, you’re applying strain to your chip from external sources, i.e., thermoelectric strain. Doesn’t that change the resistance on the wire? Doesn’t that change the behavior of the transistors? Yes, it does, but that loop isn’t closed. We can see that thermal creates stresses, the stresses cause bending, and that has an impact on transistor behavior. This is why the foundries now want to close the loop to say, ‘Since you can calculate the stress, here are the formulas for telling us how the resistivity and other parameters change on these chips. You can re-simulate and see if the chip still works if you push on it.’”
Fig. 1: Top left: Ansys schematic, showing linked subsystems that couple thermal and structural models. Top right: package warpage at 20°C, dead-bug view (looking at the bottom of the substrate). Bottom right: package warpage with “power-on” thermal gradient applied, dead-bug view (same color scale as 20°C warpage). Bottom left: temperature profile boundary condition as imported from Icepak. Source: Ansys
Others agree. “When the ICs are heated during operation, the materials expand at different rates, leading to strain that can cause warping, cracks, or delamination,” said Melika Roshandell, product management group director at Cadence. “This can lead to electrical performance, interconnect reliability, and thermal issues.”
The challenges grow as more heterogeneous devices are packaged together, said Andy Heinig, head of Efficient Electronics in Fraunhofer IIS’ Engineering of Adaptive Systems Division. “This is primarily due to the much smaller structures, which also incorporate a larger variety of materials with significantly different coefficients of thermal expansion (CTEs).”
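As a rough illustration of why CTE differences matter, the first-order mismatch strain between two bonded materials is the CTE difference multiplied by the temperature swing. The sketch below is a back-of-the-envelope estimate only. The CTE values are typical handbook figures chosen for illustration (they are assumptions, not numbers from the experts quoted here), and the function name is hypothetical.

```python
# Approximate coefficients of thermal expansion in ppm/K.
# ASSUMPTION: typical handbook values, used for illustration only.
CTE_PPM_PER_K = {
    "silicon": 2.6,
    "copper": 17.0,
    "organic_substrate": 15.0,  # varies widely by material and direction
}


def mismatch_strain(material_a, material_b, delta_t_kelvin):
    """First-order (unconstrained) mismatch strain between two bonded materials.

    Returns a dimensionless strain equal to |alpha_a - alpha_b| * delta_T.
    Geometry, stiffness, and plasticity are ignored.
    """
    delta_alpha = (CTE_PPM_PER_K[material_a] - CTE_PPM_PER_K[material_b]) * 1e-6
    return abs(delta_alpha) * delta_t_kelvin


if __name__ == "__main__":
    # Silicon against copper over a 100 K temperature swing.
    strain = mismatch_strain("silicon", "copper", 100.0)
    print(f"mismatch strain: {strain:.2e}")
```

A real analysis would replace this with a finite element model that accounts for layer thicknesses, elastic moduli, and the assembly's temperature history.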
For these advanced 2.5D and 3D packages, chip-package interaction (CPI) and the resulting stress-strain fields must be carefully considered. Today, engineers typically use finite element analysis (FEA) solvers at an early stage of the package design to help with this.
“Regions of high stress may experience reliability issues and reduced lifetime as the likelihood of interconnect failures or even die cracking/delamination increases in these areas,” explained Andras Vass-Varnai, 3D-IC solution engineer at Siemens EDA. “Besides mechanical issues, strain also impacts the electrical behavior of the transistors, as it impacts the mobility of charge carriers in the device, affecting macro parameters such as the threshold voltage (Vth). For analog designs, this may have an unforeseen impact on the behavior of the circuit if not considered early on. Devices may experience thermo-mechanical stress in operation due to the rapidly changing thermal profiles and the CTE mismatches, leading to increased stress concentrations among different structural layers in the assembly.”
Built-in stress also may develop during the reflow process as the device cools down at the end of the profile, Vass-Varnai said. “This initial stress may have significant impact on the reliability of the package in the long run.”
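The electrical side of this can be approximated, to first order, with the classic piezoresistive relation, in which the fractional change in resistance is a weighted sum of the longitudinal and transverse stress. The sketch below is a minimal illustration under stated assumptions. The coefficients are commonly cited textbook values for lightly doped bulk silicon along the <110> direction, not foundry data, and the function name is hypothetical. Actual transistor behavior depends on doping, geometry, and process, and needs calibrated models.

```python
# Piezoresistive coefficients for lightly doped bulk silicon, <110> direction,
# in 1/Pa. ASSUMPTION: commonly cited textbook values, illustration only.
PI_COEFFS = {
    "p": {"longitudinal": 71.8e-11, "transverse": -66.3e-11},
    "n": {"longitudinal": -31.2e-11, "transverse": -17.6e-11},
}


def fractional_resistance_change(carrier, sigma_long_pa, sigma_trans_pa=0.0):
    """Return dR/R for a resistor under in-plane stress.

    carrier        -- "p" or "n"
    sigma_long_pa  -- stress along the current direction, in Pa (tensile > 0)
    sigma_trans_pa -- stress perpendicular to the current, in Pa (tensile > 0)
    """
    coeffs = PI_COEFFS[carrier]
    return (coeffs["longitudinal"] * sigma_long_pa
            + coeffs["transverse"] * sigma_trans_pa)


if __name__ == "__main__":
    # 100 MPa of tensile stress along the current direction.
    for carrier in ("p", "n"):
        change = fractional_resistance_change(carrier, 100e6)
        print(f"{carrier}-type: dR/R = {change:+.2%}")
```

Even this simplified model shows shifts of several percent at stress levels of around 100 MPa, which is why precision analog blocks are a particular concern.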
Cadence’s Roshandell explained that the best way to get ahead of these issues is with simulation software to model the distribution of stress within advanced packages, particularly those with stacked dies. “This helps predict failure modes before manufacturing and avoid costly product failures. Other practices that can help include using materials with compatible CTEs, appropriate bonding techniques, and performing multi-physics analysis to incorporate thermal and stress effects to integrate an efficient heat dissipation solution.”
Multi-physics modeling is becoming essential, given that thermal, mechanical, and electrical effects are increasingly interrelated and interdependent in these multi-chiplet assemblies. Engineers must consider the interactions between these different physical domains, creating a growing need for tools that can simulate these multi-physics effects together.
“When you sell a GPU, a soft IP, before anyone signs a deal they want to know the PPA: the process node, their chosen target synthesis frequency, and all that stuff,” said Dan Wilkinson, technology fellow at Imagination Technologies. “And they want to know it very early on, before you might reasonably be expected to have such data. If you haven’t laid down their process, how would you know? And it comes back to this concern about thermals and power integrity. It’s a big performance issue, because if you’re designing a mobile phone, if it’s getting too hot or you’re using too much energy, you’re going to get throttled. The performance will look bad. So the characterization is power. Area is much easier to quantify, and predicting the throughput is much easier. And if you’ve got a cycle-accurate simulator, you can quantify workload throughput. But power is very difficult, and you need to know the power and your thermal envelope.”
Further, to better predict and manage the stress, it is essential to systematically develop and measure a far greater number of test systems. Fraunhofer’s Heinig noted that these tests will form the basis for creating a comprehensive database of materials and simulation parameters. “With sufficient data, much more accurate simulations can be performed before manufacturing, allowing for better stress prediction and optimization of designs,” he said. “Stress will have a particularly significant impact in analog blocks, where precision is critical, but it will also affect other areas, such as when transistors are placed close to through-silicon vias (TSVs).”
Architecture considerations are key
Chiplets almost always add stress to a design. “The wire density, whether it’s actual electrical wires or equivalent bandwidth, means you’re going to have to pay something to maintain that across the chiplet boundary,” said Elad Alon, CEO of Blue Cheetah. “That’s just fundamental physics. Within a given chiplet, you get the 40nm, 100nm, 120nm fine-pitch density of wires that you get across any 2D boundary. But once you’re in this 2.5D or 3D integrated device, you’re now in the microns to tens of microns, maybe even 100-plus microns. If you can run those wires faster and make that overhead as low as you can, at the end of the day, there is a physical reason why it’s not the same thing. So the main thing people have to be cognizant of is, if you’re going to introduce this data interface, it behooves you to do it in a space architecturally where whatever overhead you end up picking up, whether the overhead is higher power or higher area or higher latency, or some mix between those three, you know it’s an appropriate place.”
These considerations need to happen from the outside in, with the outside being the chiplet boundary — even though during the architecture phase the chiplet boundary may not be decided on completely. “That’s the baseline starting point that is worthwhile for people to think about,” Alon said. “Another consideration is that these things really do tend to be much more intimately linked with the details of the SoC itself. This isn’t like the PCIe world where we just say, ‘I’ve got a PCIe interface. It’s fully interoperable. Everything is there, and it works.’ There is a lot of overhead that one pays to do that type of interface and that type of design. What you’re typically doing in a chiplet is saying, ‘This would have been a single SoC if I could have built it that way, but for one reason or another, that doesn’t make sense. That’s not optimal in one way or another.’ And so, when you’re starting to break things up in that way, you’re starting from a point where your overheads were very, very low, and your latencies were very low. You had many wires, and very high bandwidth, and the point is that the more one understands what you’re doing with that bus, what the traffic patterns are, what you can tolerate in terms of latency versus throughput and latency determinism, the more you can engineer the data interface to align as best as possible with that specific application.”
The emphasis as always is on early planning and analysis, but that now includes many more elements than in the past. “The first thing you need to understand is how to partition the system,” said Letizia Giuliano, vice president of IP product marketing and management at Alphawave Semi. “You need to understand how much bandwidth is needed to move data from one die to another one — how to pack all these wires into the same package. This gets into package design and electrical testing of the silicon interposer.”
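A first-pass partitioning estimate can be as simple as dividing the required die-to-die bandwidth by the per-wire data rate, then checking how much die edge the resulting bumps consume. The sketch below illustrates that arithmetic. All numbers in the example (bandwidth, per-wire rate, bump pitch, row count) are hypothetical, and the model counts signal bumps only, ignoring power, ground, clock, and redundancy.

```python
import math


def wires_needed(bandwidth_gbps, per_wire_gbps):
    """Number of signal wires required to carry the target bandwidth."""
    return math.ceil(bandwidth_gbps / per_wire_gbps)


def shoreline_mm(num_wires, bump_pitch_um, bump_rows):
    """Die-edge length (mm) consumed by num_wires bumps placed in bump_rows rows.

    ASSUMPTION: signal bumps only, on a uniform pitch.
    """
    bumps_per_row = math.ceil(num_wires / bump_rows)
    return bumps_per_row * bump_pitch_um / 1000.0


if __name__ == "__main__":
    # Hypothetical link: 2,048 Gb/s total, 16 Gb/s per wire,
    # 45 um bump pitch, 4 rows of bumps along the die edge.
    wires = wires_needed(2048, 16)
    edge = shoreline_mm(wires, 45, 4)
    print(f"{wires} wires, {edge:.2f} mm of die edge")
```

Running this kind of estimate for each candidate partition shows quickly whether a boundary is plausible before any detailed package design begins.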
Stacking more chiplets vertically in a 3D-IC only magnifies those challenges and the possible interactions that can impact stresses inside a package. “The number of chiplets can increase by a lot, and that becomes much more complicated,” Giuliano said.
This is why power delivery networks need to be designed with the whole 3D-IC system in mind. Early prototyping and analysis can help avoid issues later in the design process. System-level considerations are key to all of this, because 3D-ICs behave more like systems than traditional single-chip designs, and stress and strain effects can propagate across different dies and components. This means designers need to consider the entire package, including interposers and PCBs, which was not required for traditional planar devices.
“It is important to consider the behavior of the packages during assembly to a PCB,” said Siemens’ Vass-Varnai. “Advanced packages with high compute power capabilities are usually large area designs with thousands of BGA balls connecting the components to a PCB. As the device cools back, some of the solder balls at the corners/edges may solidify earlier than others, ‘anchoring’ down the component. But due to the warpage of the board and package, the still-molten solder balls may tear off from the PCB, leading to open connections. This phenomenon is called ‘hot tear,’ and makes the final assembly of advanced packages especially challenging.”
Thermal-mechanical strain is frequently discussed in this area in the context of differential expansion, but there are other sources of strain, as well. “There’s strain while you’re assembling the chip,” said Ansys’ Swinnen. “There’s a lot of pressure on these chips when they are squished together because they have micro-bumps. That’s a serious amount of pressure, and your chip could be subjected to acceleration or even mechanical stress, such as someone stepping on the device. How does all that strain feed back? It’s a new multi-physics loop that is being closed as we speak.”
However, the IC designer is unlikely to measure that stress. “It’s a system sort of phenomenon,” Synopsys’ Kabir noted. “How that gets reflected into the design is as a constraint, and you need to mitigate that stress effect. Some of that could be, in the simplest sense, redundancy. Or you must assume maybe 1% of your bumps or bonds will fracture because of warpage and the stress on them, and you must take that into account. Another thing that happens that causes the stress and delineation of connections is something called thermal shrinkage. Because of CTE difference, you’re bringing two different materials at the interface, and when temperature is applied on them, they will grow and shrink at different rates, because their coefficients of thermal expansion are different. That’s potentially going to move your bump locations, and what you thought would be a valid connection is not a valid connection anymore. These things happen. They are physical phenomena, and designers know of ways to mitigate them, which is not necessarily to go measure stress and apply that.”
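The bump movement Kabir describes can be estimated with a simple free-expansion model. The relative shift between two mating pads grows with the CTE difference, the temperature change, and the pad's distance from the center of the joint (the neutral point). The sketch below is an illustration under that assumption only. The CTE values and dimensions are hypothetical, and the model ignores warpage, underfill, and mechanical constraint.

```python
def bump_shift_um(distance_from_center_mm, cte_a_ppm, cte_b_ppm, delta_t_k):
    """Relative lateral shift, in micrometers, between two mating pads.

    ASSUMPTION: both materials expand freely from a shared neutral point,
    so shift = |alpha_a - alpha_b| * delta_T * distance.
    """
    delta_alpha = abs(cte_a_ppm - cte_b_ppm) * 1e-6   # per kelvin
    shift_mm = delta_alpha * delta_t_k * distance_from_center_mm
    return shift_mm * 1000.0                          # mm -> um


if __name__ == "__main__":
    # Hypothetical case: silicon die (2.6 ppm/K) on an organic substrate
    # (15 ppm/K), 200 K temperature change, bump 10 mm from die center.
    shift = bump_shift_um(10.0, 2.6, 15.0, 200.0)
    print(f"relative bump shift: {shift:.1f} um")
```

Comparing that shift with the pad size gives a quick sense of whether corner bumps are at risk, which is one reason large dies and fine pitches make the problem harder.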
Are new tools and methodologies needed?
Whether traditional EDA tools are sufficient for 3D-IC stress and strain analysis remains to be seen, but there is agreement that tools are needed to handle multi-scale simulations, from transistor to system level. That requires the integration of different types of analysis tools (thermal, mechanical, electrical). It is unclear if this means additional simulation runs will be required, or if tools will need to be swapped or updated. Another looming question is how all of this impacts the engineering team.
“It’s still early days, but usually these things start off with a threshold, as in, you simulate it with a certain maximum amount of strain, then at the back end you make sure the strain doesn’t exceed what you simulated,” Swinnen explained. “The problem with the threshold approach is you must assume the worst case across the board, across every single device, which is highly unrealistic, and it’s very pessimistic. It’s not going to be worst case everywhere. And so usually the methodology shifts over time to, ‘We want a better resolution. We can’t assume worst case. Can you tell us exactly where the strain is so we can then re-simulate now with the real strains back-annotated onto the circuit, and not have to assume worst case all the time but see where it impacts the device?’ Then you get more accurate.”
This is true of a lot of analyses in use today. “Voltage drop was done the same way, and still with that transition, people assumed voltage drop says, ‘Let’s assume that I can tolerate 10 millivolts of voltage drop on my power supply. I simulate everything with a 10 millivolt voltage drop, and then, when I do my analysis, as long as nowhere is over 10 millivolts, I’m fine, because I’ve simulated with worse than that.’ That’s a simple methodology still used a lot,” Swinnen said. “The problem is, it’s expensive. You’ve just sacrificed 10 millivolts of performance and speed right off the top with no evidence that it’s necessary. It’s ‘just-in-case.’ It’s margining. It’s guard-banding. People now say, ‘I want to assume a much lower voltage, say just 5 millivolts. Let me simulate and find the voltage drop in every individual device, and then back-annotate those that have a big voltage drop.’ And then maybe it’s okay. Maybe it’s 12 millivolts, but that’s not a particularly timing-sensitive circuit and, yeah, it slows down, but it’s still fine. You can get much better resolution, much better performance by doing better accuracy. That same methodology will probably happen with strain. First, you have just a general threshold, and then later you can refine that with actual device-by-device or region-by-region analysis that allows you to refine and re-simulate.”
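The difference between the two methodologies can be shown with a toy example. The sketch below contrasts a flat guard-band check, where every device is compared against one worst-case limit, with a back-annotated check, where each device is compared against its own tolerance. The device names, voltage drops, and tolerances are hypothetical, and the same structure would apply if the quantity were strain rather than voltage drop.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    drop_mv: float       # simulated voltage drop at this device
    tolerance_mv: float  # drop this device can absorb and still work (hypothetical)


def global_threshold_violations(devices, threshold_mv):
    """Flat guard-band: flag any device whose drop exceeds one global limit."""
    return [d.name for d in devices if d.drop_mv > threshold_mv]


def back_annotated_violations(devices):
    """Per-device check: flag a device only if it exceeds its own tolerance."""
    return [d.name for d in devices if d.drop_mv > d.tolerance_mv]


# Hypothetical design data, for illustration only.
DEVICES = [
    Device("clk_buf", drop_mv=4.0, tolerance_mv=6.0),
    Device("dbg_logic", drop_mv=12.0, tolerance_mv=20.0),  # not timing-sensitive
    Device("serdes_rx", drop_mv=7.0, tolerance_mv=5.0),    # very sensitive
]

if __name__ == "__main__":
    print("global 10 mV check flags:", global_threshold_violations(DEVICES, 10.0))
    print("back-annotated check flags:", back_annotated_violations(DEVICES))
```

In this example the global check flags a debug block that can tolerate its 12 mV drop, while missing a sensitive receiver that fails at 7 mV. The per-device check catches the real problem and avoids the false alarm, at the cost of needing per-device data.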
Synopsys’ Kabir expects solutions to include a combination of pre-silicon testing coupled with thermal monitors, for example. One of the big challenges there is figuring out where to put the test and thermal monitors. “It also has to bring data back, because maybe the first and second generations, we would see that what we are getting in the lab is actually not what we got with the analysis,” he said. “This means the analysis has to be calibrated and has to continue for some generation of product. The challenge is that every product area is different. Not everything is subject to thermal and thermal-mechanical stress the same way. Mobile might be different than HPC and AI and automotive. The question is, are we even getting there in real time? Otherwise, what happens is you over-design, and you’ll get a lot of margin. That’s how you mitigate. These are huge systems. Can you really do a full system, multi-die, thermal-mechanical stress analysis in a predictable way? Then, you need to do it early enough. In terms of early exploration, how does this all play out? I honestly don’t know, but we’re going to learn.”
Material properties and manufacturing processes also play a crucial role in strain and stress, so working with OSATs and foundries to obtain accurate material data is needed for precise stress and strain modeling. Manufacturing processes can introduce additional stresses that need to be accounted for. And as new packaging technologies emerge, their impact on stress and strain needs to be understood.
Practically speaking, to determine the impact of strain and stress, design teams may observe the extent of die warpage for their designs, striving to keep it below certain limits. “X-ray or other optical techniques can be used to do so, but the impact on reliability is a lot harder to measure, and most designs would go through extensive reliability testing before they are released to customers,” Siemens’ Vass-Varnai said. “These standard tests, such as thermal cycling or thermal shock, are designed to mimic potential environmental exposure and trigger failure modes. Unfortunately, they are costly and take a long time. Therefore, the more one can do virtually, using a digital twin approach, the better. Also, understanding long-term interconnect behavior under different environmental conditions is extensively researched by academia and the industry.”
In short, work closely with foundries and OSATs, follow their guidelines, and leverage their expertise in managing 3D-IC thermal and mechanical issues. By considering these factors early and throughout the design process, engineers can better avoid problematic stress and strain issues in advanced packages and multi-die designs.
Conclusion
To avoid problems with strain and stress in 3D-IC and multi-die designs, industry experts highlight key areas:
Because stress and strain have become critical considerations in advanced packaging design, methodologies are needed for modeling and analysis to ensure reliable and high-performance integrated systems.
“Heterogeneous integration is pushing chip and package designers to consider multi-physics effects, such as stress, strain, thermal performance, and SI/PI, as early as the initial architectural planning stage,” Vass-Varnai noted. “As the design matures, additional verification can ensure that the final product will meet operational and reliability requirements. EDA vendors play a key role as they create and provide connected architect, design and multi-physics simulation and verification tools/workflows to allow package architects to make the right decisions regarding processes, floor plans, and material choices early on.”
—Ed Sperling contributed to this report.