Coming In Hot: Requirements For Successful Thermal Management In 3D-IC

Why thermal management has risen to the top of 3D-IC concerns and what can be done about it.


As the speed, density, and capabilities of electronics have all increased, power has become a first-order driver in almost all electronic systems. For instance, it’s well recognized that heat is often the number one limiting factor in 3D-IC design: high-speed chips stacked close together in a small housing heat up fast. One of the most common designer responses to overheating is to lower the clock speed, which degrades system performance. However, that only treats the symptom, not the cause. Is there a way to better anticipate and manage heat by design as the industry adopts 2.5D/3D-IC design?

Thermal analysis has typically been done as an afterthought – often only at the board level. That approach is no longer tenable for successful multi-die design where thermal resilience must be designed in, not fixed after the fact. It is useful to briefly consider what has changed in today’s technology that is causing so much anxiety about thermal management.

Challenges for 3D-IC thermal management

Transistor density: Despite rumors of its demise, Moore’s Law is still squeezing ever more transistors into the same physical area, which naturally increases the power density of chips at more advanced nodes.

2.5D/3D integration density: 3D-IC integration of multiple chiplets close together on a substrate, or stacked on top of each other, adds yet more circuit density on top of Moore’s Law and exacerbates the cooling problem.

High performance: 3D-IC architectures are primarily being used for high-performance applications that are intrinsically high-power. Consider that if a multi-die system is not high-performance, then it’s more cost-effective to implement it on a traditional printed circuit board (PCB). So the 3D systems that are inherently difficult to cool are also, most often, the ones pushing the limits of performance.

Heterogeneous integration and thermal gradients: Monolithic chips are physically small and homogeneous, so their temperature gradients are typically small and relatively easy to model. Heterogeneous 3D-IC architectures, on the other hand, include a variety of materials and components distributed over relatively large distances. In other words, they’re much less uniform than monolithic chips. This has three consequences for thermal reliability:

  • Significant thermal gradients can persist across the order-of-magnitude larger distances on a 3D-IC module, compared to a monolithic chip
  • Heterogeneous materials used to assemble a multi-die system can have differential thermal expansion across the range of operating temperatures. This leads to mechanical stress and warpage
  • There is a fundamental mismatch between the time constants governing electrical activity and those governing thermal conduction, and this mismatch becomes more pronounced as the system gets larger. This means that 3D-IC system activity must be profiled over much longer time frames than for monolithic chips in order to capture the variability of thermal inputs.
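To make the time-constant mismatch concrete, here is a minimal sketch assuming a single lumped thermal RC node (a deliberate simplification; a real 3D-IC has many coupled thermal nodes). After a step in power, the temperature rise approaches its steady-state value as 1 - exp(-t/τ). The two τ values are illustrative assumptions, not measured data.

```python
import math

def fraction_of_steady_state(window_s: float, tau_s: float) -> float:
    """Fraction of the final temperature rise reached after a power step,
    for a single first-order thermal RC node with time constant tau_s."""
    return 1.0 - math.exp(-window_s / tau_s)

# Illustrative (assumed) thermal time constants.
TAU_DIE_S = 0.01     # hypothetical small monolithic die: 10 ms
TAU_MODULE_S = 5.0   # hypothetical larger multi-die module: 5 s

# Activity windows from milliseconds up to tens of seconds.
for window in (1e-3, 0.1, 10.0, 30.0):
    die = fraction_of_steady_state(window, TAU_DIE_S)
    module = fraction_of_steady_state(window, TAU_MODULE_S)
    print(f"window {window:>7.3f} s: die {die:6.1%}, module {module:6.1%}")
```

With these assumed numbers, a 1 ms window moves the module only about 0.02% of the way to its final temperature, which is why the profiling window has to grow with the thermal time constant.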

Fig. 1: Thermal and mechanical analysis results from Ansys showing temperature gradients on a 2.5D-IC and resulting mechanical warpage at 20 degrees and 245 degrees Celsius.

Higher temperatures: One might think that PCBs have always had to deal with heterogeneity and large physical distances, so what makes 3D-IC different? The answer is higher power density and higher temperatures. A modern graphics processor (GPU) or machine learning (ML) chiplet can easily consume several hundred watts. Putting two such chiplets in the same 3D system leads to much higher temperatures and more severe thermal effects than you would find on a regular PCB.
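A back-of-the-envelope sketch of why stacking hot dies compounds the problem, assuming a one-dimensional series thermal-resistance path in which all heat leaves through a heat sink on top. Heat from the bottom die must also cross the bond layer to reach the top die. The resistances, powers, and ambient temperature are hypothetical placeholders, not figures from the article.

```python
def stacked_die_temperatures(t_ambient_c, p_top_w, p_bottom_w, r_sink_k_per_w, r_bond_k_per_w):
    """Return (T_top, T_bottom) in C for a two-die stack cooled only from the top.

    Assumed 1-D model: all heat exits through r_sink to ambient; heat from the
    bottom die additionally crosses the bond layer r_bond. Die-internal
    resistance is neglected.
    """
    t_top = t_ambient_c + (p_top_w + p_bottom_w) * r_sink_k_per_w
    t_bottom = t_top + p_bottom_w * r_bond_k_per_w
    return t_top, t_bottom

# Hypothetical values: 40 C ambient, 0.1 K/W sink path, 0.05 K/W bond layer.
single = stacked_die_temperatures(40.0, 300.0, 0.0, 0.1, 0.05)
stacked = stacked_die_temperatures(40.0, 300.0, 300.0, 0.1, 0.05)
print("one 300 W die:  top %.1f C" % single[0])
print("two 300 W dies: top %.1f C, bottom %.1f C" % stacked)
```

Under these assumptions the second die raises the top die from 70 C to 100 C, and the buried die runs hotter still at 115 C because its heat has one more resistance to cross.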

Manufacturing tolerances: Interconnections between chiplets rely on microbumps with a solder thickness of less than 10 microns. The solder volume of a microbump is approximately two orders of magnitude smaller than a traditional flip-chip joint. This means that even slight bending or warping of the interposer substrate poses significant reliability risks. In addition, these microbumps are being called on to collectively carry hundreds of watts of power – any local overheating can lead to thermal failure of these tiny structures.
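To see why carrying hundreds of watts through microbumps is demanding, here is a sketch of the average current per power bump, assuming the load current divides evenly among all power bumps. The supply voltage, power, and bump count are hypothetical.

```python
def current_per_bump_a(power_w: float, supply_v: float, n_power_bumps: int) -> float:
    """Average current per power bump, assuming perfectly even current sharing.

    Real designs share current unevenly, so some bumps carry more than this.
    """
    if supply_v <= 0 or n_power_bumps <= 0:
        raise ValueError("supply voltage and bump count must be positive")
    total_current_a = power_w / supply_v
    return total_current_a / n_power_bumps

# Hypothetical: 400 W delivered at 0.8 V through 5,000 power bumps.
i_bump = current_per_bump_a(400.0, 0.8, 5000)
print(f"total current {400.0 / 0.8:.0f} A, about {i_bump * 1000:.0f} mA per bump")
```

The low core voltage is what turns a few hundred watts into hundreds of amps; any bump that ends up above the average share is a candidate for local overheating.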

Now that we have identified some of the reasons why thermal management has risen to the top of every 3D-IC designer’s list of concerns, let us see what technology and methodology solutions are available to successfully manage thermal design.

Capabilities for successful 3D-IC thermal management

Ansys has seen a surge in the number of its customers engaging in 2.5D or 3D-IC design projects. Based on our experience, here are some of the capabilities designers have deployed for effective thermal management in production 3D-IC projects:

Early thermal floor planning / prototyping: Traditionally, thermal analysis is left for package designers to do at the very end of the design process. But placing a pair of hot chiplets too close together can doom a 3D-IC design right from the start. Because thermal behavior is a first-order, mission-critical parameter, thermal analysis and system feasibility must be “shifted left” and prototyped very early, at the floor planning stage. The two key challenges for prototyping are (a) quickly building early physical models that capture physical dimensions and material properties and (b) combining these with early RTL-level power estimates (see activity vectors below).
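A toy version of early thermal floor planning, assuming each chiplet acts as a heat source whose temperature contribution decays exponentially with distance along a line and that contributions add linearly (superposition). The base temperature, rise-per-watt coefficient, decay length, and chiplet powers are all hypothetical; a real prototype would use a physical model with actual dimensions and material properties.

```python
import math

def peak_temperature_c(chiplets, t_base_c=45.0, k_per_w=0.15, decay_mm=5.0, step_mm=0.5):
    """Peak temperature along a line through chiplets given as (x_mm, power_w) pairs.

    Assumed toy model: each chiplet raises the temperature at position x by
    k_per_w * power_w * exp(-|x - x_mm| / decay_mm), and contributions add linearly.
    """
    xs = [x for x, _ in chiplets]
    lo, hi = min(xs) - 3 * decay_mm, max(xs) + 3 * decay_mm
    n_points = int(round((hi - lo) / step_mm)) + 1
    peak = t_base_c
    for i in range(n_points):
        x = lo + i * step_mm
        rise = sum(k_per_w * p * math.exp(-abs(x - xc) / decay_mm) for xc, p in chiplets)
        peak = max(peak, t_base_c + rise)
    return peak

# Two hypothetical 200 W chiplets at different center-to-center spacings.
for spacing_mm in (2.0, 10.0, 30.0):
    layout = [(0.0, 200.0), (spacing_mm, 200.0)]
    print(f"spacing {spacing_mm:4.1f} mm -> peak {peak_temperature_c(layout):.1f} C")
```

Even this crude model shows the floor-planning trade-off: with the chiplets 2 mm apart, each sits in the other's heat and the peak rises by roughly 20 C compared with a well-separated placement.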

AI/ML system optimization: The complexity of 3D-IC designs can be staggering, with many more degrees of freedom than in a monolithic chip and more intense interactions between components than on a PCB. Optimizing such a system for thermal performance or any other primary goal is a daunting and tedious task if done manually. Increasingly, artificial intelligence and machine learning (AI/ML) algorithms are being used to quickly and efficiently scan the solution space for an optimal system configuration. Ansys optiSlang is an example of such a tool that can guide designers through an otherwise impossible tangle of competing and conflicting design choices.
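A tiny stand-in for automated design-space exploration. It does not use optiSlang or any ML model; it runs a plain random search over a single design variable (chiplet spacing) against a hypothetical cost that trades a thermal penalty, which falls with spacing, against an interconnect-length penalty, which rises with it. Both cost terms and their weights are assumptions chosen only to illustrate competing goals.

```python
import math
import random

def cost(spacing_mm: float) -> float:
    """Hypothetical combined cost: mutual heating between two chiplets decays
    with spacing, while die-to-die wire length (latency, power) grows with it."""
    thermal_penalty = 30.0 * math.exp(-spacing_mm / 5.0)  # assumed mutual heating term
    wire_penalty = 0.8 * spacing_mm                        # assumed cost per mm of link
    return thermal_penalty + wire_penalty

def random_search(n_trials: int, lo: float, hi: float, seed: int = 0):
    """Return (best_spacing, best_cost) from n_trials uniform samples in [lo, hi]."""
    rng = random.Random(seed)
    best_x, best_c = lo, cost(lo)
    for _ in range(n_trials):
        x = rng.uniform(lo, hi)
        c = cost(x)
        if c < best_c:
            best_x, best_c = x, c
    return best_x, best_c

best_spacing, best_cost = random_search(2000, 0.5, 40.0)
print(f"best spacing ~{best_spacing:.2f} mm, cost {best_cost:.2f}")
```

Because this toy cost has a closed-form minimum at 5·ln(7.5) ≈ 10.1 mm, it is easy to check that the search lands nearby. Real 3D-IC design spaces have many coupled variables and no closed form, which is where smarter optimizers earn their keep.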

High-capacity analysis tools: 3D-IC projects are much larger than a single chip and are growing faster than simulation algorithms can improve. The answers for increased capacity are hierarchy (see below) and distributed computing (the cloud). Tools for thermal and power analysis must be cloud-optimized, high-capacity solutions with efficient distributed computing capabilities. Current techniques often ignore sections of a full design to reduce the computational workload, which becomes difficult to justify as elements get closer together and more interdependent.

Reduced-order models (ROMs): Another essential tool for managing the size and complexity of 3D-IC projects is a hierarchical capability with reduced-order models. Ansys, for example, generates chip thermal models (CTM), chip power models (CPM), chip signal models (CSM), chip electrostatic models (CESM) and more to slash the amount of data needed at the system level. These make it possible to do full-system simulations with multiple complex chips while maintaining accurate results.
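One simple way to illustrate the data reduction behind a reduced-order model is to coarsen a fine-grained power map into tiles, summing the power in each tile so that the total is preserved. This is only an analogy; it does not reflect the actual contents or format of Ansys CTM or CPM files, and the map below is hypothetical.

```python
def coarsen_power_map(fine, tile):
    """Sum a 2-D fine-grained power map (list of rows, watts per cell) into
    tile x tile blocks. Total power is preserved; spatial detail is reduced."""
    rows, cols = len(fine), len(fine[0])
    if rows % tile or cols % tile:
        raise ValueError("map dimensions must be multiples of the tile size")
    return [
        [
            sum(fine[r][c] for r in range(tr, tr + tile) for c in range(tc, tc + tile))
            for tc in range(0, cols, tile)
        ]
        for tr in range(0, rows, tile)
    ]

# Hypothetical 8x8 power map (W per cell) with a 2x2 hotspot in one corner.
fine = [[0.5] * 8 for _ in range(8)]
for r in (0, 1):
    for c in (0, 1):
        fine[r][c] = 5.0

coarse = coarsen_power_map(fine, 4)
print(coarse)  # [[26.0, 8.0], [8.0, 8.0]]: 4 values instead of 64
```

The coarse map still shows which quadrant holds the hotspot while carrying a sixteenth of the data. A real thermal ROM must also preserve how heat spreads, not just where power is dissipated, which is the harder part.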

Fig. 2: Ansys Icepak uses Computational Fluid Dynamics simulation to model fans blowing air over a pair of heat sinks. This system level thermal analysis integrates with chip thermal models (CTM) to converge at a temperature solution.

Multiphysics solvers: It is clear from the challenges discussed above that 3D-IC design pulls together physical effects that used to be considered only by the packaging team or only by the board-level team. 3D-IC thermal analysis is intertwined with many of these and requires concurrent multiphysics analysis. For example, Ansys CPM predicts how the temperature of a chip affects the power it produces, as well as the other way around. Another example is that foundry electromigration limits and sheet resistance are temperature dependent, while Joule heating, derived from Ansys’ signoff voltage drop analysis, alters the local thermal picture.
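The two-way power/temperature coupling can be sketched as a fixed-point iteration, assuming leakage power grows exponentially with temperature and junction temperature follows a single lumped thermal resistance. This is not how Ansys CPM computes it; the leakage model, its coefficients, and the thermal resistance are illustrative assumptions.

```python
import math

def electrothermal_fixed_point(p_dynamic_w, p_leak_ref_w, t_ref_c, leak_coeff_per_k,
                               t_ambient_c, r_theta_k_per_w, tol_c=1e-6, max_iter=100):
    """Iterate T -> P(T) -> T until the junction temperature converges.

    Assumed models:
      P(T) = P_dyn + P_leak_ref * exp(leak_coeff * (T - T_ref))
      T    = T_ambient + R_theta * P(T)
    Returns (temperature_c, power_w, iterations); raises RuntimeError on runaway.
    """
    t = t_ambient_c
    for i in range(1, max_iter + 1):
        try:
            leak = p_leak_ref_w * math.exp(leak_coeff_per_k * (t - t_ref_c))
        except OverflowError:
            raise RuntimeError("thermal runaway: leakage diverged") from None
        p = p_dynamic_w + leak
        t_new = t_ambient_c + r_theta_k_per_w * p
        if abs(t_new - t) < tol_c:
            return t_new, p, i
        t = t_new
    raise RuntimeError("no convergence: possible thermal runaway for these parameters")

# Hypothetical chiplet: 150 W dynamic, 30 W leakage at 25 C, leakage doubling every 35 K,
# 35 C ambient, 0.15 K/W junction-to-ambient resistance.
t, p, n = electrothermal_fixed_point(150.0, 30.0, 25.0, math.log(2) / 35.0, 35.0, 0.15)
print(f"converged: {t:.1f} C, {p:.1f} W after {n} iterations")
```

Ignoring the feedback (evaluating leakage at ambient only) would underestimate both power and temperature here; with a much larger thermal resistance the same loop fails to converge, which is the thermal-runaway case a coupled analysis is meant to catch.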

Computational fluid dynamics (CFD): Establishing the environmental thermal “boundary conditions” requires the modeling of heat sinks, convection, and forced air flow from cooling fans. This vital step relies on fluid modeling at the system level that interacts with the chip thermal model (CTM) to converge on a junction temperature for the 3D-IC assembly. As a leader in system level engineering simulation, Ansys has decades of experience in chip power analysis, system level thermal analysis, and CFD simulation that can all be applied as a multiphysics solution.
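The effect of the cooling boundary condition can be sketched with a series thermal-resistance stack in which only the heat-sink-to-air term depends on airflow, through R = 1/(h·A). This replaces a full CFD solution with a single assumed heat-transfer coefficient h; the h values, fin area, and package resistances are hypothetical placeholders.

```python
def junction_temperature_c(power_w, t_air_c, r_jc, r_cs, h_w_per_m2k, fin_area_m2):
    """Junction temperature through junction-to-case, case-to-sink, and
    sink-to-air resistances in series (all in K/W). Sink-to-air uses R = 1 / (h * A)."""
    r_sa = 1.0 / (h_w_per_m2k * fin_area_m2)
    return t_air_c + power_w * (r_jc + r_cs + r_sa)

# Hypothetical 400 W module, 35 C inlet air, 0.5 m^2 of effective fin area.
for label, h in (("weak airflow", 10.0), ("strong fan airflow", 50.0)):
    tj = junction_temperature_c(400.0, 35.0, 0.05, 0.03, h, 0.5)
    print(f"{label:>18}: Tj = {tj:.1f} C")
```

With these numbers the junction moves from 147 C to 83 C purely because of the air-side boundary condition. In a real flow, CFD supplies that boundary (or the full temperature field) and the chip thermal model supplies the power map, and the two are iterated until they agree.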

Long activity vectors: Activity is the primary driver of power, and for thermal analysis purposes, hardware emulators are the most realistic source of long-duration activity information. Thermal flows require millions of activity vectors to reliably model usage modes (e.g., audio processing only, video and audio, boot-up mode, etc.) for long enough to capture the slower thermal effects. Thermal analysis cannot assume static or average temperatures. If the activity of two chips in a stack is synchronized so that both get hot at the same place at the same time, averages don’t paint a reliable picture of how the device will perform. It’s important to link emulators to thermal management tools in a way that gives early power estimates, runs quickly, and provides actionable feedback. Ansys has developed direct, streaming interfaces with hardware emulators from multiple vendors at the RTL level. These allow very long activity vectors to be generated and profiled in Ansys analysis tools without the need for huge intermediate data files.
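The point about synchronized activity can be sketched with two chips whose power alternates between busy and idle phases, both heating a shared hotspot through a first-order thermal RC node (an assumed lumped model, integrated with explicit Euler steps). Both scenarios have identical average power; only the phase alignment differs. All parameter values are hypothetical.

```python
def peak_temperature(phase_offset_steps, n_steps=20000, dt=1e-3, tau=0.5, r_th=0.2,
                     period_steps=4000, p_busy=200.0, p_idle=20.0, t_amb=40.0):
    """Peak temperature at a shared hotspot heated by two chips with square-wave power.

    Assumed model: dT/dt = (T_amb + r_th * P_total - T) / tau, Euler-integrated.
    Each chip is busy for the first half of every period; chip B is shifted
    by phase_offset_steps relative to chip A.
    """
    def power(step):
        return p_busy if (step % period_steps) < period_steps // 2 else p_idle

    t = t_amb
    peak = t
    for k in range(n_steps):
        p_total = power(k) + power(k + phase_offset_steps)
        t += dt * (t_amb + r_th * p_total - t) / tau
        peak = max(peak, t)
    return peak

synced = peak_temperature(0)         # both chips busy at the same time
staggered = peak_temperature(2000)   # chip B busy while chip A idles
print(f"synchronized peak {synced:.1f} C, staggered peak {staggered:.1f} C")
```

Averaging the power gives the same 84 C estimate for both cases in this model, yet the synchronized case peaks far higher; that gap is what long, phase-accurate activity vectors are meant to expose.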

Fig. 3: Detailed thermal analysis by Ansys RedHawk-SC Electrothermal of a chiplet on interposer, with accurate temperature and current results down to each individual connection bump.

Mechanical stress/warpage: 3D-IC assemblies are far more mechanically fragile than a monolithic chip. Differential thermal expansion due to steep thermal gradients that vary in time causes the supporting interposer to warp, twist, and generate mechanical stresses across the connecting microbumps. This is a major reliability risk that can easily lead to field failures after a small number of thermal cycles. Once again, Ansys’ array of system simulation tools features mechanical solvers integrated with 3D-IC design tools to give immediate graphical feedback on mechanical stresses and deformations due to temperature.
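A first-order estimate of the differential expansion that loads the joints, assuming free (unconstrained) thermal expansion of two bonded materials and a joint located a given distance from the neutral point: Δx = Δα·ΔT·L. Real assemblies are constrained and respond by warping and straining instead, which is why mechanical solvers are needed. The CTE values are approximate figures (roughly 2.6 ppm/K for silicon and on the order of 15 ppm/K for an organic laminate) and the distances are hypothetical.

```python
def mismatch_displacement_um(cte_a_ppm_per_k, cte_b_ppm_per_k, delta_t_k, dnp_mm):
    """Free-expansion mismatch displacement in micrometres at distance dnp_mm
    from the neutral point: |alpha_a - alpha_b| * delta_T * L."""
    delta_alpha = abs(cte_a_ppm_per_k - cte_b_ppm_per_k) * 1e-6  # 1/K
    return delta_alpha * delta_t_k * dnp_mm * 1e3                # mm -> um

# Assumed CTEs: organic laminate ~15 ppm/K vs. silicon ~2.6 ppm/K.
# Temperature swing from 20 C to 245 C, the range shown in Fig. 1.
for dnp in (1.0, 10.0, 25.0):
    d = mismatch_displacement_um(15.0, 2.6, 245.0 - 20.0, dnp)
    print(f"{dnp:5.1f} mm from center: ~{d:.1f} um of mismatch")
```

At 10 mm from the center this gives about 28 µm of unconstrained mismatch, which is large next to microbump solder less than 10 microns thick; in a bonded assembly that mismatch shows up as warpage and joint stress.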

This list, while not complete, covers many of the capabilities that should be on your checklist when considering a move towards a 2.5D/3D design flow. Many more considerations could be added, like checking whether your foundry supports and endorses the thermal analysis tools you plan to use (see Ansys Collaborates with TSMC to Deliver Thermal Analysis Solution for 3D IC Designs). What should be clear above all is the truly multiphysics nature of the challenge that combines – and exceeds – the traditional challenges of chip, package, and system design. With one foot squarely in the world of semiconductor power analysis and the other leading the way in system-level analysis, Ansys multiphysics solutions can help 3D-IC design teams aggregate their expertise to meet multi-die thermal challenges head on. For more information about Ansys RedHawk-SC Electrothermal, the multiphysics solution for analyzing multi-die chip packages and interconnects, tune in to this webinar, Electrothermal Signoff for 2.5D and 3D-IC Systems [SemiWiki] or visit the Ansys website.


