Early Chip-Package-System Thermal Analysis

Why it’s so important to model self-heat effects and junction temperature variation.

Next-generation automotive, HPC and networking applications are pushing the requirements of thermal integrity and reliability, as they need to operate in extreme conditions for extended periods of time.

FinFET designs have high dynamic power density, and power directly impacts the thermal signature of the chip. Thermal degradation typically occurs over an extended period of chip operation. However, for certain applications such as cloud computing, which demand almost continuous operation of the chip, this impact can be accelerated and show up over shorter durations. Hence, analyzing realistic application vectors and profiling the thermal signature of the chip across various workloads to ensure high coverage becomes a key requirement.

Localized self-heating of finFET devices could cause up to 30°C of temperature variation on the chip, which can in turn cause interconnect temperature variation of 10°C to 15°C, directly impacting the mean time to failure (MTTF) of the device. Moreover, on-chip temperature variations adversely impact the performance of the chip. Early analysis and detection of thermal hotspots is critical to ensure a successful and safe product while meeting stringent time-to-market schedule requirements. By performing early chip, package and system thermal analysis, designers can systematically predict the power profile of the chip for various scenarios and its impact on power-thermal reliability.
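To get a feel for why a 10°C to 15°C interconnect temperature shift matters for MTTF, one common way to reason about it is Black's equation, a widely used empirical model for electromigration lifetime: MTTF = A · J^(-n) · exp(Ea / (k · T)). At a fixed current density J, the ratio of lifetimes at two temperatures depends only on the activation energy Ea. The sketch below is illustrative only. The function name, the 0.9 eV activation energy and the 105°C baseline are assumptions chosen for the example, not values from any specific process or tool.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def em_mttf_ratio(t_ref_c, t_hot_c, activation_energy_ev=0.9):
    """Return MTTF(t_hot) / MTTF(t_ref) from Black's equation.

    Assumes the same current density at both temperatures, so the
    prefactor A and the J^(-n) term cancel out of the ratio.
    activation_energy_ev is an assumed, illustrative value.
    """
    t_ref_k = t_ref_c + 273.15  # convert Celsius to Kelvin
    t_hot_k = t_hot_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_hot_k - 1.0 / t_ref_k
    )
    return math.exp(exponent)


if __name__ == "__main__":
    baseline_c = 105.0  # assumed baseline interconnect temperature
    for rise_c in (10.0, 15.0):
        ratio = em_mttf_ratio(baseline_c, baseline_c + rise_c)
        print(f"+{rise_c:.0f} C rise: lifetime scales by {ratio:.2f}x")
```

With these assumed values, a 10°C rise cuts the predicted electromigration lifetime roughly in half, which is why local temperature variation cannot be treated as a second-order effect.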

Profile power for real applications to identify thermal-critical windows early

Conventional design methodologies tend to lack coverage for real application scenarios such as streaming high-definition multimedia or OS boot-up. These scenarios span tens to hundreds of milliseconds and are impractical to process with traditional power analysis. Critical power conditions caused by real application activity are thus typically uncovered very late in the design flow, or only once the chip is in the field, putting the design and schedule at risk. Recent breakthroughs in fast RTL-level power profiling run several orders of magnitude faster than traditional interval-based power analysis methodologies. By profiling power at the RTL stage for real application-level stimuli, designers can identify power- and thermal-critical windows early in the design flow. RTL power profiling can process very long activity files (hundreds of gigabytes in size, representing hundreds of milliseconds of activity) in a few hours, as opposed to days or even weeks with standard approaches, which makes analysis of such large activity sets practical.

This data then can be used for key design decisions, such as the physical implementation of the SoC, software stack optimization, and cooling requirements for the entire system for various workloads. Early thermal profiling also can maximize coverage for system-level design by enabling the simulation of various chip thermal models that capture different operating scenarios in a variety of sequences, thus reducing the need for design margins and costly design iterations.
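As a rough illustration of what identifying thermal-critical windows in a power profile can look like, the sketch below scans a sampled power trace with a moving average and reports the sample ranges where sustained power meets or exceeds a threshold. It is a minimal example under stated assumptions, not the method used by any particular RTL power profiling tool: the function name, the trace format (a plain list of per-interval power values) and the threshold are all hypothetical.

```python
def critical_windows(power_trace, window_len, threshold):
    """Return merged (start, end) sample ranges, end exclusive, whose
    moving-average power over window_len samples is >= threshold.

    power_trace is assumed to be a list of per-interval power values
    in consistent units (for example, mW per profiling interval).
    """
    if window_len <= 0 or window_len > len(power_trace):
        return []

    ranges = []
    running = sum(power_trace[:window_len])  # sum of the first window
    for start in range(len(power_trace) - window_len + 1):
        if start > 0:
            # Slide the window: add the new sample, drop the oldest one.
            running += power_trace[start + window_len - 1] - power_trace[start - 1]
        if running / window_len >= threshold:
            end = start + window_len
            if ranges and start <= ranges[-1][1]:
                # Overlaps or touches the previous range, so extend it.
                ranges[-1] = (ranges[-1][0], end)
            else:
                ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # Hypothetical trace: a burst of high activity in the middle.
    trace = [1, 1, 1, 5, 5, 5, 1, 1, 1, 1]
    print(critical_windows(trace, window_len=3, threshold=4))
```

Averaging over a window rather than flagging single-sample peaks reflects the fact that temperature responds to sustained power, not to instantaneous spikes. The window length would in practice be chosen relative to the thermal time constant of the design.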

Evolving semiconductor reliability requirements

Semiconductor reliability requirements are rapidly evolving. New applications such as ADAS/self-driving cars and drones are pushing the limits of system reliability. Increased functionality and power density in next-generation finFET designs are leading to self-heating of the devices and Joule heating of the wires. This results in a wide variation of on-chip temperature.

With narrower wire widths, electromigration (EM) limits also are shrinking. Hence, increased temperature leads to more EM violations on-chip, which are increasingly hard to fix. Without actual thermal analysis, it is hard to predict whether your chip is being underdesigned or overdesigned. Overdesign leads to longer convergence cycles, while underdesign leads to field failures. In addition, advanced 2.5D/3D and wafer-level packaging technologies are blurring the boundaries between the die and the package, while creating more thermal hot spots that will impact both chip- and system-level EM and ESD. That also can increase the chance of thermal-induced stress, which can lead to warping and contact separation, causing long-term reliability issues that will ultimately render the product useless.

Hence, an accurate signoff methodology requires a comprehensive thermal analysis solution, one that can model self-heat effects and overall junction temperature variation of the die to perform thermal-aware reliability and thermal-induced stress analyses.

Attend the ANSYS webinar on Emerging Thermal Performance and Reliability Requirements for ADAS and Autonomous Systems to learn how multiphysics simulations can address key reliability requirements for automotive electronics with thermal, thermal-aware electromigration (EM) and thermal-induced stress analyses across the spectrum of chip, package and system.


