Engineers are finding ways to dissipate heat effectively from complex modules.
Placing multiple chips into a package side-by-side can alleviate thermal issues, but as companies dive further into die stacking and denser packaging to boost performance and reduce power, they are wrestling with a whole new set of heat-related issues.
The shift to advanced packaging enables chipmakers to meet demands for increasing bandwidth, clock speeds, and power density for high performance compute, AI, and other uses. This change alleviates heat issues by spreading out the chips, but it complicates thermal analysis because hot spots on one chip affect the heat profile of neighboring chips. Interconnection speeds between chips are also slower in modules than they are in SoCs.
“Before the world went to multi-core and so on, you were dealing with one chip that maxed out at around 150 watts per square centimeter on a chip, which was a single-point heat source,” said John Parry, industry lead for electronics and semiconductor at Siemens Digital Industries Software. “You could dissipate that heat in all three directions, so you could go to some pretty high power densities. But when you have one chip, put another chip next to it, and then another chip next to that, they mutually heat each other. That means you can’t tolerate anything like the same power level per chip, which makes the thermal challenge harder.”
This is one of the main reasons 3D-IC stacking has been slow to reach the market. While the concept makes sense from a power efficiency and integration standpoint — and works well in 3D NAND and HBM — when logic is included it’s another story. Logic chips generate heat, and the denser the logic and the higher the utilization rate for processing elements, the greater the heat. This makes logic stacking rare, which explains the popularity of 2.5D flip-chip BGA and fan-out designs (see figure 1).
Fig.1: To meet power density, bandwidth, and thermal dissipation requirements, a high-density VIPack platform includes RDL- and TSV-based interconnection in six architectures. Source: ASE
Choosing the right package
With the plethora of options available to designers, choosing the best package and integrating the chips inside it are critical to performance. The components (silicon, TSVs, copper pillars, and so on) all have different thermal coefficients of expansion (TCEs), which affects assembly yield and long-term reliability.
“Generally speaking, if you’re going to be leaving something off for a long time, it can be to your advantage to actually turn it off,” said Steven Woo, Rambus fellow and distinguished inventor. “But if you’re basically going to be turning it off and turning it on at a much higher frequency — for example, every 100th of a second — you could potentially run into a thermal cycling issue. PCBs, solder balls, and silicon are all going to expand and contract at different rates. So it’s not unusual to see thermal cycling failures at the corners of a package, where solder balls can crack. So people might put extra grounds or extra power there, so that if you lose that connection, it’s not going to sink the chip.”
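The expansion mismatch described above can be estimated with a first-order calculation. The sketch below is illustrative only: the expansion coefficients are typical handbook values for silicon and copper rather than figures from this article, the 100°C swing and 25 mm span are assumed, and the function names are hypothetical. It computes the unconstrained differential strain (the difference in coefficients multiplied by the temperature swing) and the resulting difference in free expansion over a given length.

```python
# Typical handbook expansion coefficients in ppm/°C (assumed, not from the article).
CTE_PPM_PER_C = {
    "silicon": 2.6,
    "copper": 17.0,
}


def mismatch_strain(cte_a_ppm, cte_b_ppm, delta_t_c):
    """Unconstrained differential strain between two joined materials."""
    return abs(cte_a_ppm - cte_b_ppm) * 1e-6 * delta_t_c


def differential_expansion_um(cte_a_ppm, cte_b_ppm, delta_t_c, length_mm):
    """Difference in free expansion over length_mm, in micrometers."""
    strain = mismatch_strain(cte_a_ppm, cte_b_ppm, delta_t_c)
    return strain * length_mm * 1000.0  # mm -> um


# Assumed scenario: a 100°C temperature swing across a 25 mm span.
strain = mismatch_strain(CTE_PPM_PER_C["silicon"], CTE_PPM_PER_C["copper"], 100.0)
growth_um = differential_expansion_um(
    CTE_PPM_PER_C["silicon"], CTE_PPM_PER_C["copper"], 100.0, 25.0
)
print(f"mismatch strain: {strain:.2e}")
print(f"differential expansion over 25 mm: {growth_um:.1f} um")
```

Because the differential displacement grows with the length over which it accumulates, joints farthest from the package center see the largest movement in this simple model, which is consistent with the corner failures Woo describes. Real assemblies are constrained by underfill and warpage, so this is a rough bound rather than a stress prediction.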
Popular flip-chip BGA packages with CPUs and HBM are currently around 2,500 mm². “We’re seeing one big die potentially becoming four or five smaller die,” said Mike McIntyre, director of software product management at Onto Innovation. “So in general, things have to grow because you have to have all that I/O so these chips can talk to one another. So you can distribute the heat. And depending on the application, that may help you slightly. But some of that gets compensated for by the fact that you’ve got I/O to drive between the die now, whereas you used to have an internal bus in the silicon doing that communication.”
Ultimately, it becomes a system challenge, with a series of complex tradeoffs that only can be dealt with at the system level. “We can realize a lot of new things with advanced packaging, but designs are much more complex now,” said Andy Heinig, group leader for advanced systems integration at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “We have many more interactions when you make everything so close together. You have to check your flow. You have to check the power distribution. And it makes it very difficult to design such systems.”
In fact, some devices are so complex that it’s difficult to change out components easily in order to customize these devices for domain-specific applications. This is why many advanced package products are for very high-volume or price-resilient components, such as a server chip.
Progress in chiplet module simulation and test
Nonetheless, engineers are finding new ways to perform thermal analyses for package reliability before package modules are built. For instance, Siemens provided an example of a two-ASIC-based module incorporating a fan-out redistribution layer (RDL) mounted atop a multilayer organic substrate in a BGA package. It used two models, one for the RDL-based WLP and the second for the multilayer organic substrate BGA. These package models are specified parametrically, including the substrate layer stack-up and BGA, before any EDA information is brought in, which enables early material evaluation and die placement choices. Next, the EDA data is imported, and for each model, a material map enables a detailed thermal description of the copper distribution in all layers. The final thermal dissipation simulation (see figure 2) considers all materials, including the metal lid, TIM, and underfill materials.
Fig. 2: Thermal modeling of two ASICs with RDL fan-out WLP and a separate thermal model for the organic BGA shows top and cross-sectional views of heat dissipated through substrates and interconnects and up toward the metal lid. Source: Siemens
Eric Ouyang, director of technical marketing at JCET, together with engineers at JCET and Meta, compared the thermal performance of four implementations of one ASIC and two SRAMs: a monolithic die, a multichip module (MCM), a 2.5D interposer, and 3D stacked dies.[1] The apples-to-apples comparison kept the server ambient, heat sink with vacuum chamber, and TIMs the same. Thermally, the 2.5D and MCM performed better than the 3D or monolithic chip. Ouyang and JCET colleagues devised a resistor matrix and power envelope plots (see figure 3), which can be used during early module design to determine if input power levels for different chips and set junction temperatures can be combined reliably prior to time-consuming thermal simulations. As shown, a safe region highlights power ranges on each chip that satisfy the reliability criteria.
Ouyang explained that during design, circuit architects may have an idea of the power levels of the various chips to be placed in the module but may not know if those power levels are within the reliability range. The plots identify safe power regions for up to three chips in a chiplet module. The team has developed an automatic power calculator for more chips.
Fig. 3: In a 2.5D interposer layout, the red region represents safe power levels for one ASIC and two SRAM dies that keep Tj-Ta < 95°C. Source: JCET
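One way to picture a resistor matrix and its safe power region is a linear superposition model, in which each chip's temperature rise is the sum of its own self-heating and the heating contributed by its neighbors. The sketch below is not JCET's method or data. The matrix values, power grids, and function names are hypothetical, and only the Tj-Ta < 95°C criterion comes from the figure caption.

```python
# Hypothetical thermal resistance matrix in °C/W for [ASIC, SRAM1, SRAM2].
# R[i][j] is the temperature rise of chip i per watt dissipated in chip j.
# Diagonal terms model self-heating; off-diagonal terms model mutual heating.
R = [
    [0.30, 0.10, 0.10],
    [0.10, 0.80, 0.05],
    [0.10, 0.05, 0.80],
]

LIMIT_C = 95.0  # Tj - Ta limit taken from the Fig. 3 caption


def temperature_rises(r_matrix, powers_w):
    """Junction-to-ambient rise of each chip by linear superposition."""
    return [sum(r * p for r, p in zip(row, powers_w)) for row in r_matrix]


def is_safe(r_matrix, powers_w, limit_c=LIMIT_C):
    """True if every chip stays below the allowed temperature rise."""
    return all(rise < limit_c for rise in temperature_rises(r_matrix, powers_w))


def safe_region(r_matrix, asic_powers_w, sram_powers_w):
    """Grid points (ASIC power, per-SRAM power) that satisfy the limit.

    Assumes, for simplicity, that both SRAMs run at the same power.
    """
    return [
        (pa, ps)
        for pa in asic_powers_w
        for ps in sram_powers_w
        if is_safe(r_matrix, [pa, ps, ps])
    ]


region = safe_region(R, [100, 200, 300, 400], [10, 20, 30])
for asic_w, sram_w in region:
    print(f"safe: ASIC {asic_w} W, each SRAM {sram_w} W")
```

Sweeping a finer grid and plotting the points that pass would trace out a power envelope of the kind shown in figure 3. A check like this only screens candidate combinations; it does not replace the detailed thermal simulation that follows.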
Quantifying thermal resistances
How heat moves through a silicon chip, board, glue, TIM, or package lid is well understood. Standard methods exist to track temperature and resistance values at each interface, which are a function of temperature differences and power.
“The thermal path is quantified by three key values: thermal resistance from device junction to ambient, thermal resistance from junction to case [at top of package], and thermal resistance from junction to board,” said JCET’s Ouyang. He notes that at a minimum, JCET’s customers require θja, θjc, and θjb, which they then use in system designs. They may request that a given thermal resistance not exceed a particular value and that the package design deliver that performance. (See JEDEC’s JESD51-12, Guidelines for Reporting and Using Package Thermal Information, for details.)
Fig. 4: Thermal resistance from chip to package to board quantifies a package’s ability to dissipate heat. Source: JCET
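Each of the three values has the same form: a temperature difference divided by the power dissipated. The sketch below illustrates that relationship with made-up measurements; the temperatures, power levels, and function names are hypothetical. It also treats each resistance as a constant, which is a simplification because measured values depend on the test environment.

```python
def thermal_resistance(t_junction_c, t_reference_c, power_w):
    """Thermal resistance in °C/W from junction to a reference point.

    The reference is ambient air, the case top, or the board,
    giving junction-to-ambient, junction-to-case, or junction-to-board.
    """
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return (t_junction_c - t_reference_c) / power_w


def junction_temperature(t_reference_c, theta_c_per_w, power_w):
    """Estimated junction temperature for a given resistance and power."""
    return t_reference_c + theta_c_per_w * power_w


# Hypothetical measurement: 10 W dissipated, junction at 85°C.
theta_ja = thermal_resistance(85.0, 25.0, 10.0)  # ambient air at 25°C
theta_jc = thermal_resistance(85.0, 80.0, 10.0)  # case top at 80°C
theta_jb = thermal_resistance(85.0, 70.0, 10.0)  # board at 70°C
print(f"junction-to-ambient: {theta_ja:.2f} °C/W")
print(f"junction-to-case:    {theta_jc:.2f} °C/W")
print(f"junction-to-board:   {theta_jb:.2f} °C/W")

# Reusing the junction-to-ambient value for another assumed operating point.
print(f"estimated Tj at 12 W, 35°C ambient: "
      f"{junction_temperature(35.0, theta_ja, 12.0):.1f} °C")
```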
Detailed thermal simulation is the least expensive way of exploring material and configuration options. “The simulation of the operating chips typically identifies one or more hot spots, so we can add copper into the substrate below the spot to help dissipate the heat or change the lid material and add a heat sink, for instance. With multiple die packages, we can change the configuration or consider new approaches to prevent thermal cross talk. There are several ways of optimizing for high reliability and thermal performance,” said Ouyang. Typically, packages are designed to stay within specified maximum values. Ouyang notes that system integrators might specify that the thermal resistances θja, θjc, and θjb not exceed certain values. Commonly, silicon junction temperature is kept below 125°C.
Following simulation, packaging houses perform design-of-experiments (DOEs) to arrive at the final package configuration. But because the DOE step, which uses a specifically designed test vehicle, is more time-consuming and expensive, simulation is used first.
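A resistance ceiling of this kind follows directly from the junction temperature limit. As a rough sketch, assuming a single lumped junction-to-ambient resistance and hypothetical ambient and power values, the largest acceptable resistance is the available temperature headroom divided by the power. Only the 125°C limit comes from the text; the function names are illustrative.

```python
TJ_MAX_C = 125.0  # common junction temperature ceiling cited in the text


def max_junction_to_ambient_resistance(t_ambient_c, power_w, tj_max_c=TJ_MAX_C):
    """Largest junction-to-ambient resistance (°C/W) that keeps Tj at or below tj_max_c."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    headroom_c = tj_max_c - t_ambient_c
    if headroom_c <= 0:
        raise ValueError("ambient is already at or above the junction limit")
    return headroom_c / power_w


def meets_limit(theta_c_per_w, t_ambient_c, power_w, tj_max_c=TJ_MAX_C):
    """True if the estimated junction temperature stays below the limit."""
    return t_ambient_c + theta_c_per_w * power_w < tj_max_c


# Assumed operating point: 100 W module in a 45°C server ambient.
ceiling = max_junction_to_ambient_resistance(45.0, 100.0)
print(f"resistance must not exceed {ceiling:.2f} °C/W")
print("0.70 °C/W design passes:", meets_limit(0.70, 45.0, 100.0))
print("0.90 °C/W design passes:", meets_limit(0.90, 45.0, 100.0))
```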
Selecting TIMs
In packages, more than 90% of the heat dissipates out the top of the chip through the package to a heat sink, typically anodized aluminum-based with vertical fins. Thermal interface materials (TIMs) with high thermal conductivity are placed between the chip and package to help transfer heat. Next-generation TIMs for CPUs include metal sheet alloys (such as indium and tin) and silver sintered tin, which have thermal conductivities of 60 W/m-K and 50 W/m-K, respectively.
As companies make the transition from large SoCs to chiplet modules, a greater variety of TIMs with different properties and thicknesses is needed.
For high-density systems, the thermal resistance of the TIM between the chip and package is having a greater impact on overall thermal resistance of packaged modules, according to YoungDo Kweon, senior director of R&D at Amkor, in a recent talk.[2] “The power trend is increasing dramatically, especially for logic, so we are concerned with keeping low junction temperature to ensure reliable semiconductor operation,” Kweon said. He added that while TIM vendors supply thermal resistance values for their materials, thermal resistance from chip to package (θjc), in practice, is influenced by the assembly process itself, including the bond quality between the chip and TIM and the contact area. He noted that testing in a controlled environment with actual assembly tools and bonding materials is essential to understanding the actual thermal performance and selecting the best TIM for customer qualification.
Voids are a particular problem. “The way materials behave in packages is quite a challenge. You’ve got the material property of the adhesive or glue, and the way the material actually wets the surface can affect the overall thermal resistance that material presents, the contact resistances,” said Siemens’ Parry. “And it’s very dependent on how the material flows into incredibly small imperfections on the surface. If imperfections are not filled by the glue, it represents an extra resistance to the heat flow.”
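The effect of bond line thickness, conductivity, and contact area can be approximated with the one-dimensional conduction formula, resistance = thickness / (conductivity × area). The sketch below is a simplified illustration, not a vendor or Amkor model. The 60 and 50 W/m-K conductivities are the values quoted above, while the 100 µm bond line, 25 mm × 25 mm die, 200 W load, and the idea of representing voids as a reduced contact fraction are assumptions. It also ignores the interfacial contact resistances Parry mentions.

```python
def tim_resistance(thickness_m, conductivity_w_per_mk, area_m2, contact_fraction=1.0):
    """Bulk conduction resistance of a TIM layer in °C/W.

    contact_fraction crudely models voids as lost conduction area:
    1.0 means full coverage, 0.8 means 20% of the area carries no heat.
    """
    if not 0.0 < contact_fraction <= 1.0:
        raise ValueError("contact_fraction must be in (0, 1]")
    effective_area_m2 = area_m2 * contact_fraction
    return thickness_m / (conductivity_w_per_mk * effective_area_m2)


THICKNESS_M = 100e-6        # assumed 100 um bond line
DIE_AREA_M2 = 0.025 * 0.025  # assumed 25 mm x 25 mm die
POWER_W = 200.0             # assumed heat load through the TIM

for name, k in [("metal sheet alloy", 60.0), ("silver sintered tin", 50.0)]:
    full = tim_resistance(THICKNESS_M, k, DIE_AREA_M2)
    voided = tim_resistance(THICKNESS_M, k, DIE_AREA_M2, contact_fraction=0.8)
    print(f"{name}: {full * 1000:.2f} m°C/W full contact, "
          f"{voided * 1000:.2f} m°C/W with 20% voids, "
          f"extra rise at {POWER_W:.0f} W: {(voided - full) * POWER_W:.2f} °C")
```

Even in this idealized form, the resistance scales inversely with the area that actually conducts, which is why the assembly process and void control matter as much as the datasheet conductivity.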
Dealing with heat differently
Chipmakers are widening their scope for how to solve thermal constraints. “If you drop the size of the die, it might be a quarter of the area, but the package might be the same. So there might be some signal integrity differences because of the bond wires from the outside package going into the die,” said Randy White, memory solutions program manager at Keysight Technologies. “The wires are longer, there’s more inductance, so there’s that electrical part. If you quarter the area of the die, it’s going faster. How do you dissipate that much energy in a small enough space? That’s another critical parameter that has to be studied.”
This has led to significant investments in bonding research at the leading edge, and the focus — at least for now — seems to be on hybrid bonding. “If I have these two chips, and there’s little bumps between them, there are air gap spaces between these chips,” said Rambus’ Woo. “That is not the best thermally conductive way to move heat up and down the stack. You might fill the air gaps with something, but even then that’s not as good as a direct silicon contact. So the hybrid direct bonding is one thing people are doing.”
But hybrid bonding is expensive and is likely to remain confined to high-performance processor-type applications, with TSMC being one of the few companies offering the technology at present. Still, the promise is enormous for combining photonics on CMOS chips or GaN on silicon.
Conclusion
The initial idea behind advanced packaging was that it would work like LEGO sets — chiplets developed at different process nodes could be assembled together and thermal issues would be lessened. But there are tradeoffs. Distances that signals need to travel matter, both from a performance and a power standpoint, and circuits that are always on, or the need to keep portions dark, affect thermal properties. Just breaking a die into multiple parts for better yield and flexibility isn’t as simple as it might appear. Every interconnect in the package must be optimized and hot spots are no longer confined to a single chip.
Early modeling tools that can be used to rule in or out different combinations of chiplets are providing a big boost to designers of complex modules. Thermal simulation and the introduction of new TIMs will remain essential in this age of increasing power density.
—Ed Sperling contributed to this report
References
1. E. Ouyang, J. Gu, Y. Jeong, M. Liu, “Thermal Design of a Chiplet Module using Monolithic Die and 2.5D/3D Packages,” Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems (ITHERM) 2022.
2. Y.D. Kweon, “High Performance TIM for Lidded FCBGA Products,” Semiconductor360 Live Europe + Israel, 2021, https://www.youtube.com/watch?v=StakqaRul7k