Beating The Heat In 3D Packages

Thermal management is the biggest performance and reliability bottleneck in multi-die assemblies.

Key Takeaways:

  • Thermal management is a central design constraint, requiring early, thorough planning.
  • Accurate thermal simulation requires AI-driven adaptive meshing and
    real-world validation.
  • Innovative system-level technology co-optimization (STCO) strategies can drastically reduce GPU peak temperature.

As HPC and AI accelerators push power levels to 1kW and beyond, the heat generated by rapidly switching transistors is becoming increasingly difficult to dissipate.

Engineers are turning to finite element modeling with adaptive meshing to accurately simulate thermal conduction profiles. And new methods, such as an active measurement wafer with heaters and temperature sensors, can bridge simulation and experimentation, and ultimately improve the design and lifecycle of multi-die packages.

To better “beat the heat,” various organizations are devising ways of validating simulation results with real-world experimental data. For example:

  • AMD developed a package-level, software-programmable thermal evaluation vehicle that evaluates thermal patterns, thermal interface materials (TIMs), and cooling needs, alongside silicon development;
  • Fraunhofer IIS/EAS developed an active thermal test wafer that directly measures thermal profiles, shows how heat propagates, couples between chiplets, and dissipates upon cooling;
  • Amkor showed how a product prototype with heater die and sensors can verify simulation accuracy; and
  • Imec optimized an HBM-on-GPU architecture, reducing the peak GPU temperature from >140°C to <71°C through STCO.

From afterthought to top priority
During the days of monolithic chip packaging, engineers used equations to approximate the device’s junction temperature based on resistance values for the chip and various parts of the package.

“Thermal modeling was always important for high-power devices, to make sure that during operation the maximum junction temperature at the transistor level, Tj, did not exceed the maximum recommended temperature during operation (for example, 105°C for CMOS or approximately 85°C for DRAM),” said Mike Kelly, vice president of Chiplets/FCBGA Integration at Amkor Technology.

For many years, engineers could rely on shorthand methods of calculating junction temperature. “When there was one silicon die in a single IC package (e.g., flip-chip BGA), there definitely were shorthand methods for estimating the maximum power that an IC could generate, and the maximum junction temperature would still be within the silicon die manufacturer’s specifications,” Kelly said. “The thermal problem was set up as a series of thermal resistances (for the substrate, the lid, the die, the underfill, etc.), and the equations governing conduction and convection could be solved together. If you knew the total power generated by the single die, all of the thermal resistances and the surrounding air temperature, you could determine the die junction temperature.”
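The shorthand Kelly describes can be sketched as a one-dimensional series network, where the junction temperature is the ambient temperature plus total power times the sum of the thermal resistances. The sketch below is a simplification under stated assumptions: it treats the heat path as a single series chain, and the resistance values and layer names are illustrative placeholders, not data from any real package.

```python
def junction_temperature(power_w, ambient_c, resistances_c_per_w):
    """Tj = Ta + P * sum(R_i) for a single series heat path."""
    return ambient_c + power_w * sum(resistances_c_per_w.values())


def max_power(tj_max_c, ambient_c, resistances_c_per_w):
    """Largest power that keeps the junction at or below tj_max_c."""
    return (tj_max_c - ambient_c) / sum(resistances_c_per_w.values())


# ASSUMED, illustrative thermal resistances in degrees C per watt.
stack = {"die": 0.05, "tim": 0.10, "lid": 0.05, "heatsink_to_air": 0.30}

tj = junction_temperature(power_w=100.0, ambient_c=25.0, resistances_c_per_w=stack)
p_max = max_power(tj_max_c=105.0, ambient_c=25.0, resistances_c_per_w=stack)
print(f"Tj at 100 W: {tj:.1f} C, max power for a 105 C limit: {p_max:.0f} W")
```

A single sum like this has no way to express heat arriving from a neighboring die, which is one reason the shorthand stops working for multi-die packages.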

Now, with multi-die packaging, the approach to thermal management is entirely different. Using what is now referred to as system-level technology co-optimization (STCO) methods, thermal modeling begins early in the design process to optimally place the chiplets, minimize thermal coupling between chiplets, prevent thermal runaway (chip heats up, more power is required to drive devices, they heat up further, etc.), and reduce the need for expensive cooling solutions.

“For these high-end systems, like the GPUs and the microprocessors, we’re on the very edge of what we can do,” said Marc Swinnen, director of product marketing for the Semiconductor Division at Synopsys. “Thermal simulation has moved way upfront into the prototyping stage, which is the opposite end of the design flow from where it traditionally was performed. Now, thermal is a central part of multi-die design.”

Localizing hot spots with AI
To make 3D thermal simulations a part of the EDA flow, engineers employ a finite element method, which segments the chip and package regions into different sizes of polygons and then models the heat flow as a function of the material it is flowing through, such as silicon, dielectrics, copper, or underfill materials.

“So you break up the shape into lots of little polygons, and then you solve the thermal equations for each tiny area, and that gives you a picture of what’s happening. The hard part is the meshing,” said Swinnen. “Once you have the mesh, applying the thermal equations is straightforward, but building the mesh is difficult because you need to balance accuracy and speed. A finer mesh is more accurate but time-consuming because there are more polygons. A large mesh is faster but less accurate because it can’t capture the fine detail. So the obvious answer is an adaptable variable mesh. Of course, this presupposes that you know where all the hot spots are, but that’s where AI comes in. AI gives you a good idea as to where the hot spots will be, and that allows you to build the mesh efficiently, which saves you a lot of time in reaching the thermal solution.”
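The idea of a variable mesh can be illustrated with a small quadtree sketch: cells that overlap a predicted hot spot are split repeatedly, while cells elsewhere stay coarse. This is a toy under stated assumptions and not how any commercial mesher works. The hot-spot list is a hand-written stand-in for whatever a predictor would supply, and the function names and dimensions are hypothetical.

```python
def overlaps(x, y, size, hotspot):
    """True if the square cell at (x, y) touches the circular hot spot."""
    hx, hy, radius = hotspot
    nearest_x = min(max(hx, x), x + size)
    nearest_y = min(max(hy, y), y + size)
    return (nearest_x - hx) ** 2 + (nearest_y - hy) ** 2 <= radius ** 2


def adaptive_mesh(x, y, size, hotspots, min_size):
    """Return (x, y, size) cells, refined only where hot spots are predicted."""
    if size <= min_size or not any(overlaps(x, y, size, h) for h in hotspots):
        return [(x, y, size)]
    half = size / 2
    cells = []
    for dx in (0.0, half):
        for dy in (0.0, half):
            cells.extend(adaptive_mesh(x + dx, y + dy, half, hotspots, min_size))
    return cells


die_mm = 8.0                           # ASSUMED square die edge length
predicted_hotspots = [(1.0, 1.0, 0.4)]  # ASSUMED (x, y, radius) in mm
mesh = adaptive_mesh(0.0, 0.0, die_mm, predicted_hotspots, min_size=1.0)
uniform_cells = int(die_mm / 1.0) ** 2
print(f"adaptive: {len(mesh)} cells, uniform 1 mm grid: {uniform_cells} cells")
```

In this example the adaptive mesh covers the same area with far fewer cells than a uniform fine grid, but only because the hot-spot location was known in advance, which is the dependency Swinnen points out.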

There are several ways that elevated temperature negatively affects the performance and long-term reliability of devices. For one, hotter chips use more power than cooler chips. In a worst-case scenario, thermal runaway ensues as increased power consumption leads to a hotter chip, which requires more power, which heats it up further, and so on. Also, heat from one chip tends to spread to its neighboring chips, turning what was once a chip-level problem into a system-level problem. And when chip temperatures get too high, solder bumps can reach their melting point.
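The runaway feedback loop can be sketched as a fixed-point iteration: temperature sets power, power sets temperature, and the loop either settles or diverges. The model below is an assumption made for illustration only. It supposes a temperature-dependent power term that doubles every 25°C and a single lumped thermal resistance, and all numbers are placeholders.

```python
def steady_state_temperature(p_dynamic_w, p_leak_ref_w, r_theta_c_per_w,
                             ambient_c=25.0, t_ref_c=25.0, doubling_c=25.0,
                             t_limit_c=200.0, max_iter=1000, tol=1e-6):
    """Iterate T -> Ta + R * P(T). Return the settled temperature, or None on runaway."""
    t = ambient_c
    for _ in range(max_iter):
        # ASSUMED model: temperature-dependent power doubles every `doubling_c` degrees.
        p_leak = p_leak_ref_w * 2.0 ** ((t - t_ref_c) / doubling_c)
        t_next = ambient_c + r_theta_c_per_w * (p_dynamic_w + p_leak)
        if t_next > t_limit_c:
            return None  # temperature keeps climbing: treat as thermal runaway
        if abs(t_next - t) < tol:
            return t_next
        t = t_next
    return None


well_cooled = steady_state_temperature(100.0, 10.0, r_theta_c_per_w=0.2)
poorly_cooled = steady_state_temperature(100.0, 10.0, r_theta_c_per_w=0.6)
print("well cooled:", well_cooled)
print("poorly cooled:", poorly_cooled)
```

With the lower thermal resistance the loop settles at a stable temperature. With the higher one, each pass adds more power than the cooling path can remove and no stable point exists.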

The heat output from a chip is dynamic, changing as the workload changes. “Early thermal modeling at the prototyping stage is vital to avoid strategic errors in placing chiplets in the advanced package that will necessitate a restart of the design or add unexpected cooling costs to the product,” said Swinnen. “The key here is not just average temperature, but also peak temperature.”


Fig. 1: Thermal analysis uses adaptive meshing to provide greater detail in hotter zones. Source: Synopsys

Others agree on the importance of choosing the right mesh size. “As mesh size becomes smaller, the hot spot becomes clearer and more locally concentrated. While a smaller mesh better captures local peak temperatures, it may predict an overly pessimistic thermal risk when the obtained TG (thermal gradient) value is used to represent the temperature increase of the entire IP block,” explained Jae-Gyung Ahn of AMD. [1] “In terms of computational effort, a 200µm mesh size requires approximately 1,200% more solver time than a 1,000µm mesh size, while a 500µm mesh size increases solver time by only 40% relative to the 1,000µm mesh size. Therefore, the optimum mesh size must be carefully determined by evaluating the locations of hot spots and actual worst-case locations for EM risk.” In this particular example, AMD recommended applying 20µm or 100µm mesh size at the local peak power density locations to balance computational efficiency and accuracy.
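To put the quoted percentages on a common scale, the short sketch below converts "percent more solver time" into a multiplier relative to the 1,000µm mesh and compares it with the growth in cell count. The cell-count estimate assumes a uniform two-dimensional in-plane grid, which is an assumption of this sketch and not something stated in the AMD paper.

```python
def relative_solver_time(percent_more):
    """Convert 'N% more time' into a multiplier of the baseline time."""
    return 1.0 + percent_more / 100.0


def cell_count_ratio(coarse_um, fine_um):
    """ASSUMED uniform 2D grid: cell count scales with (coarse / fine) squared."""
    return (coarse_um / fine_um) ** 2


# Solver-time figures quoted above, relative to the 1,000 um mesh.
reported = {
    1000: 1.0,
    500: relative_solver_time(40),
    200: relative_solver_time(1200),
}

for mesh_um, time_x in sorted(reported.items(), reverse=True):
    cells_x = cell_count_ratio(1000, mesh_um)
    print(f"{mesh_um:>5} um mesh: {cells_x:5.1f}x cells, {time_x:5.1f}x solver time")
```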

Swinnen pointed to another challenge for thermal simulation. “An under-appreciated factor in thermal modeling is generating the appropriate activity. Indeed, all thermal output from a chip is caused by the chip actively processing data. More activity, more power used. The problem lies in the slow time constants governing thermal conduction as compared to the very fast time constants governing electrical switching. That means you need very long sequences of activity to model thermal effects. For example, if you want to capture 1 second of activity in a 1GHz microprocessor, you need 1 billion activity vectors – far more vectors than are used in functional simulation or timing analysis. So, where can one get long streams of realistic chip activity to feed into thermal simulators? The answer is from hardware emulators that can simulate the system at the RTL level for these sorts of time-spans. And then the activity needs to be ‘profiled’ (pre-processed, consolidated, or condensed) into a simple pattern that is meaningful for a thermal solver engine.”
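Both halves of that argument can be sketched in a few lines: the count of per-cycle vectors, and one very simple form of profiling, which averages per-cycle power over fixed windows. Windowed averaging is only one possible condensation, chosen here for illustration. The function names and the sample trace are hypothetical.

```python
def vectors_needed(duration_s, clock_hz):
    """One activity vector per clock cycle over the simulated duration."""
    return round(duration_s * clock_hz)


def profile_activity(per_cycle_power_w, window):
    """Condense a per-cycle power trace into one average value per window."""
    profiled = []
    for start in range(0, len(per_cycle_power_w), window):
        chunk = per_cycle_power_w[start:start + window]
        profiled.append(sum(chunk) / len(chunk))
    return profiled


print(vectors_needed(duration_s=1.0, clock_hz=1e9))

# ASSUMED toy trace: six cycles of power in watts, condensed two cycles at a time.
trace = [1.0, 3.0, 5.0, 7.0, 2.0, 2.0]
print(profile_activity(trace, window=2))
```

The window length is the design choice. It needs to be short compared with the thermal time constants so that peaks are not averaged away, but long enough to shrink the trace to a size a thermal solver can handle.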

Though the industry has long used passive test structures to confirm modeling results, such approaches do a poor job of accommodating heterogeneous thermal loads and migrating hot spots such as those found in AI accelerators.

In response to the strong need for dynamic emulation of realistic workloads, Fraunhofer IIS’ Engineering of Adaptive Systems Division developed a thermal test platform that combines fine-grained programmable heating elements with high-resolution sensing structures. Andy Heinig, head of the Department for Efficient Electronics at Fraunhofer IIS/EAS, described how this real-world thermal testing wafer provides insight into heat propagation through different materials, as heat couples from chip to chip, helping to determine the efficacy of cooling strategies. [2]

Fraunhofer’s programmable digital heating blocks and thermal sensors enable both static and dynamic emulation of realistic chiplet workloads. This approach builds on the concept of basic blocks of digital cells that act as reconfigurable heat sources. By selecting the number, size, and spatial arrangement of blocks, various power densities can be achieved to activate one fine-grained hot spot or multiple regions across the wafer for system-level thermal measurements. Time-dependent switching patterns make it possible to mimic the behavior of shifting hot spots. The temperature sensors have high enough resolution to capture thermal gradients.
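The concept of time-dependent switching patterns can be sketched as a sequence of power maps over a grid of heater blocks, with the active block changing at each step. This is a conceptual illustration only. It does not represent Fraunhofer's actual control interface, and the grid size, block power, and function names are all hypothetical.

```python
def power_map(rows, cols, active_blocks, block_power_w):
    """Build a rows x cols grid with `block_power_w` at each active (row, col)."""
    grid = [[0.0] * cols for _ in range(rows)]
    for row, col in active_blocks:
        grid[row][col] = block_power_w
    return grid


def migrating_hot_spot(rows, cols, path, block_power_w):
    """Yield one power map per time step, with a single hot block following `path`."""
    for row, col in path:
        yield power_map(rows, cols, [(row, col)], block_power_w)


# ASSUMED 2 x 3 grid of heater blocks, 0.5 W per active block.
frames = list(migrating_hot_spot(2, 3, path=[(0, 0), (0, 1), (1, 2)], block_power_w=0.5))
for step, frame in enumerate(frames):
    print(step, frame)
```

Passing several coordinates to `power_map` at once would represent the multi-region case described above, where several areas of the wafer are heated together.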

“The proposed architecture allows fine-grained programmability, modular scalability, and dynamic reconfiguration. This means the wafer is not bound to a single experiment but can be reused across a wide range of studies,” explained Heinig. “Researchers can tailor the heating patterns to specific workloads, vary the intensity and spatial extent of thermal stress, and collect detailed datasets for model calibration and design evaluation. This platform is designed to expand the design space for future packaging technologies by reproducing the combined effects of power delivery networks, thermal interface materials and heterogeneous chip placement, allowing researchers to identify reliability limits and safe operating conditions.”

Another key contributor to the thermal problem is power delivery. The high current densities associated with AI accelerators and high-performance GPUs increase the resistive losses in the power delivery network, causing Joule heating. The industry is tackling this challenge in a variety of ways, one of which involves moving the PDN to the wafer backside. Power-aware and placement-aware floorplanning can help reduce peak power density.
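The reason current matters so much follows from the Joule heating relation, P = I²R: resistive loss grows with the square of the current. A minimal sketch, assuming an illustrative 1V supply and an illustrative 0.1 milliohm effective network resistance (neither figure comes from the article):

```python
def supply_current_a(power_w, voltage_v):
    """Current drawn for a given load power at a given supply voltage."""
    return power_w / voltage_v


def joule_loss_w(current_a, resistance_ohm):
    """Resistive (Joule) heating: P = I^2 * R."""
    return current_a ** 2 * resistance_ohm


# ASSUMED values for illustration: 1 kW load, 1 V supply, 0.1 milliohm network.
current = supply_current_a(power_w=1000.0, voltage_v=1.0)
loss = joule_loss_w(current, resistance_ohm=1e-4)
loss_at_double_current = joule_loss_w(2 * current, resistance_ohm=1e-4)
print(f"{current:.0f} A -> {loss:.0f} W lost; doubled current -> {loss_at_double_current:.0f} W")
```

Doubling the current quadruples the loss. That is why a lower-resistance delivery path, or a floorplan that spreads current demand, has an outsized effect on heating.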

New processes, such as backside power distribution networks and hybrid bonding, solve pressing interconnect problems, but they exacerbate thermal issues. “When two or more silicon dies are stacked, whether by hybrid bonding or conventional copper-pillar bumps, the combined heat energy produced by facing die must be carefully comprehended early in the design cycle,” noted Kelly. “This requires the ability to estimate junction temperatures of all dies in the stack as functional blocks are floor-planned. In that manner, appropriate silicon die clocking strategies can be implemented, the die stack’s performance optimized, and junction temperature constraints addressed at the same time. Ideally, the EDA tools can handle this kind of thermal optimization, using in-house thermal simulation capability. Alternatively, other thermal simulation programs can be used during the floor-planning iterations.”

Importantly, even the best simulations require real-world experimental validation.

Package-level thermal vehicle proves useful throughout product lifecycle
Accurate modeling of thermal behavior in multichip systems is becoming important to all stages of a product’s lifecycle. “It is necessary to evaluate/manage the thermal aspects of a chip throughout the development cycle, starting from initial planning stage to beyond customer board deployment,” explained Suresh Parameswaran, principal member of AMD’s technical staff, in a recent article. [3] The AMD team developed thermal evaluation vehicles for 2D and 3D packages with three goals: evaluating the architecture and floorplan while the chips are still under development, cross-correlating 3D simulator results with chip-level measurements, and providing early evaluation of packaging choices such as thermal interface materials (TIMs) and cooling methods. The group emphasized the thermal vehicle’s simple implementation, software programmability, and rapid feedback of on-die temperature measurements.


Fig. 2: Example of an early-stage thermal prototyping of a multi-die assembly with thermal modeling of interposer, memory stacks, and central system-on-chip. Source: Synopsys

The modern approach uses a combination of FE modeling and a product prototype. “These days the most common approaches are to simulate the package using finite element software, or to build product replicas using ‘heater dies,’ which are specially designed silicon dies with on-board heaters. The power that would typify an actual product is input into the die or several dies,” Amkor’s Kelly said. “In this configuration, the temperature sensors are usually built into the heater die, so power is inputted, and the junction temperature is measured, usually at different locations on the heater die.”

Because the building of product replicas is very expensive, oftentimes only one package configuration is built to validate the FEA simulations and double-check the model’s accuracy. Then, many other product configurations can be estimated exclusively using the FEA software.
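Checking a model against a single built replica comes down to comparing simulated and measured temperatures at the same sensor locations. The sketch below computes two common summary figures, the worst-case error and the root-mean-square error. The temperature values are invented for illustration, and what counts as an acceptable error is a project decision that the article does not specify.

```python
import math


def validation_errors(simulated_c, measured_c):
    """Return (max absolute error, RMS error) between paired temperature readings."""
    if len(simulated_c) != len(measured_c) or not simulated_c:
        raise ValueError("need the same non-zero number of simulated and measured points")
    diffs = [s - m for s, m in zip(simulated_c, measured_c)]
    max_abs = max(abs(d) for d in diffs)
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return max_abs, rms


# ASSUMED example data: three sensor locations on a heater die, in degrees C.
simulated = [80.0, 90.0, 100.0]
measured = [82.0, 89.0, 104.0]
max_abs, rms = validation_errors(simulated, measured)
print(f"max error {max_abs:.1f} C, RMS error {rms:.2f} C")
```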

Engineers perform finite element meshing in conjunction with the cooling mechanism, considering the heat generated by the chips, the spreading of that heat to neighboring chips and the interposer or bridge die, and heat removal by heat sinks, fans, spot-based liquid cooling, or immersion liquid cooling, which is still being developed.

When mechanical comes in
In multi-die stacking scenarios, more often than not, mechanical stress and mismatch in the coefficients of thermal expansion (CTE) of the materials are so great that mechanical changes must be modeled in addition to thermal changes. “Modeling thermal effects is mathematically very similar to modeling mechanical effects, and we combine both capabilities in our 3D-IC analysis products,” said Swinnen. “Given the right material properties from the foundry (as an encrypted tech file, to protect their process trade secrets), it is possible to calculate the Von Mises forces at every point. This allows the calculation of stresses and displacement (warpage) at every point. Stress and warpage are two sides of a coin. Stiffer materials will warp less, but at the cost of higher internal stresses. More flexible materials will experience lower stress but greater displacement.”
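Two standard textbook relations sit behind this discussion and can be sketched briefly: the von Mises equivalent stress computed from the six components of a stress tensor, and the unconstrained strain mismatch between two materials, which is the CTE difference multiplied by the temperature change. The CTE figures used below are approximate textbook values for silicon and copper. They are assumptions of this sketch, not foundry data.

```python
import math


def von_mises(sx, sy, sz, txy=0.0, tyz=0.0, tzx=0.0):
    """Von Mises equivalent stress from normal (s) and shear (t) stress components."""
    normal = (sx - sy) ** 2 + (sy - sz) ** 2 + (sz - sx) ** 2
    shear = txy ** 2 + tyz ** 2 + tzx ** 2
    return math.sqrt(0.5 * normal + 3.0 * shear)


def mismatch_strain(cte_a_ppm, cte_b_ppm, delta_t_c):
    """Unconstrained thermal strain mismatch between two bonded materials."""
    return abs(cte_a_ppm - cte_b_ppm) * 1e-6 * delta_t_c


print(von_mises(100.0, 0.0, 0.0))   # uniaxial load, in MPa
print(von_mises(50.0, 50.0, 50.0))  # equal pressure on all sides

# ASSUMED approximate CTEs in ppm per degree C: silicon ~2.6, copper ~17.
print(mismatch_strain(2.6, 17.0, delta_t_c=100.0))
```

Equal stress in all directions gives a von Mises value of zero, because the measure only captures the shape-changing part of the stress state.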


Fig. 3: Example of an interposer warpage analysis. Source: Synopsys

Building four HBMs on top of a GPU
Illustrating the importance of cross-layer thermal co-design, a research group at imec led by Yukai Chen explored how a combination of STCO and technology-level mitigation strategies could bring a 3D configuration of four 12-high HBM stacks atop a GPU from an initial junction temperature of >140°C to a level comparable with a 2.5D implementation, or 70.8°C. The GPU dissipates 414 watts, while each HBM dissipates 40 watts under AI workloads.

The HBMs are stacked along the GPU’s short edges and interfaced via microbumps to connect to signal layers. (The GPU uses a backside power delivery network.) First, the researchers removed the base logic dies from the four HBM stacks, because they are functionally redundant. This change improves thermal coupling between the GPU and HBMs. DRAMs are hybrid bonded together. The lid side is cooled by microchannel or jet impingement cooling, with a heat transfer coefficient of 30kW/m²K. The laminate underside is air-cooled (200W/m²K). Thermal behavior was characterized using Synopsys Icepak.
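A back-of-the-envelope view of these boundary conditions uses Newton's law of cooling, where the temperature rise across a cooled surface is the power divided by the product of the heat transfer coefficient and the area. The total power below follows from the figures above. The cooled area is an assumption of this sketch, since the article does not give one, and applying the same area to both surfaces is a deliberate simplification that isolates the effect of the coefficient.

```python
def total_power_w(gpu_w, hbm_w, hbm_count):
    """Combined dissipation of the GPU and its HBM stacks."""
    return gpu_w + hbm_w * hbm_count


def convective_rise_c(power_w, h_w_per_m2k, area_m2):
    """Temperature rise across a convective boundary: dT = P / (h * A)."""
    return power_w / (h_w_per_m2k * area_m2)


power = total_power_w(gpu_w=414.0, hbm_w=40.0, hbm_count=4)

area_m2 = 8e-4  # ASSUMED cooled area (8 cm^2); not given in the article
lid_rise = convective_rise_c(power, h_w_per_m2k=30_000.0, area_m2=area_m2)
air_rise = convective_rise_c(power, h_w_per_m2k=200.0, area_m2=area_m2)
print(f"total power {power:.0f} W")
print(f"rise if all heat left via the liquid-cooled lid: {lid_rise:.1f} C")
print(f"rise if all heat left via the air-cooled side:   {air_rise:.0f} C")
```

Under these assumptions the air-cooled side would need a temperature rise 150 times larger to carry the same heat. This suggests that nearly all of the heat has to leave through the liquid-cooled lid.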

Next, pairs of adjacent DRAM stacks were merged by replacing the molding compound used between them in 2.5D configurations with thermal silicon, which improved both vertical and lateral heat spreading. The top DRAM die was also thinned to shorten vertical heat paths. In addition, thermal silicon was selectively placed on GPU hot spots identified using thermal maps. The STCO strategies included double-sided cooling and reducing the GPU’s core frequency by half, which significantly lowered peak GPU temperature from 120°C to 99°C. Subsequent frequency scaling and additional application of thermal silicon brought the peak GPU temperature to 71°C, comparable to the 2.5D configuration (69°C).

“The results not only provide insights into managing severe thermal constraints, but also demonstrate how STCO can substantially enhance the thermal feasibility and performance of future 3D GPU architectures,” said the imec team.

Conclusion
As the industry increases its adoption of advanced processes such as hybrid bonding, backside power delivery networks, and multi-die packaging, it will rely more heavily on thermal modeling programs with adaptive meshing to balance computation time with model accuracy. Both thermal and mechanical behavior, which are co-dependent, can be modeled simultaneously.

Experimental replicas or prototypes with heating and sensing elements provide valuable confirmation of model results. STCO approaches, such as using thermal silicon blocks to help dissipate heat, can significantly reduce the GPU peak temperature in an HBM/GPU 3D stack running realistic AI workloads.

References

  1. J.-G. Ahn et al., “Estimation of Product Lifetime with Highly Varying Die Temperature,” 2025 IEEE International Reliability Physics Symposium (IRPS), Monterey, CA, USA, 2025, pp. 1-6, doi: 10.1109/IRPS48204.2025.10983831.
  2. A. Heinig, “Active Measurement Wafer for Thermal Characterization in Chiplet-Based Systems,” 2025 IEEE 27th Electronics Packaging Technology Conference (EPTC), Singapore, Singapore, 2025, pp. 1-4, doi: 10.1109/EPTC67330.2025.11392679.
  3. S. Parameswaran et al., “Novel Programmable Package-level 3D Thermal Evaluation System,” 2024 23rd IEEE Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems (ITherm), Aurora, CO, USA, 2024, pp. 1-6, doi: 10.1109/ITherm55375.2024.10709406.

