System-Aware Full-Chip Power Integrity And Reliability

Why this is a requirement for today’s SoCs.

At the core of every electronics system is a chip that has to meet multiple conflicting requirements such as increased functionality, best power efficiency, highest reliability, lowest design cost and short design schedule. Meeting these requirements poses a major challenge, especially for systems on chip (SoCs) that are designed using advanced processes.

Ensuring that the SoCs meet power and reliability requirements and that they function properly within the electronics system requires a simulation-based solution supported by a suite of accurate multi-domain models.

The challenges
Advanced technologies such as finFETs enable higher levels of integration, allowing designers to pack more functionality into a single SoC, including functions that once resided on separate components such as GPS, graphics, computing, phone and RF. These devices operate at lower supply voltages, which reduces the available noise margin. At the same time, they drive more current and are packed more closely together, resulting in increased current density and consequently more noise and reliability issues. FinFET devices also run hotter than planar transistors because of poorer heat dissipation. That, along with higher device density, can cause thermal reliability issues, which in turn impact electromigration (EM). Electrostatic discharge (ESD), electromagnetic interference (EMI) and electromagnetic compatibility (EMC) are additional reliability concerns that need to be addressed as SoCs and systems increase in complexity.
And these challenges continue to grow as designs migrate to stacked dies.

Even after SoC designers have thoroughly verified that their chip meets its power and reliability requirements, the system can still fail if the design did not consider the context of the full system. The design changes that follow lead to slipped schedules and increased cost. SoCs have traditionally been designed using a “siloed” approach, in which independent design teams are each responsible for a portion of the design and work from assumed boundary conditions. Often this results in either over-design, where too much unnecessary margin is built in, or under-design, where an unplanned-for condition arises from the combined impact of the different parts. Over-designing can result in larger than required macros/IPs, extra metal layers in the SoC, or additional capacitors and traces in the package/board, all of which add to the overall cost of the final system. Under-designing can cause failed functionality, missed performance targets or schedule delays.

To ensure power integrity and reliability from SoC to system, analysis and verification must be a part of the entire chip-package-system (CPS) design process, from library/IP to chip to package, board and system. It requires a common design environment that allows the sharing of data and models, and enables concurrent system-wide optimization.

Coping with power budgeting, power/signal integrity, reliability and stress
Key technical challenges in the design of an electronics system include power budgeting, power/signal integrity, and component and system reliability (electrical, thermal, mechanical stress, and system-level regulatory requirements for EMI and EMC).

Modern mobile handsets and tablet PCs are good examples of such systems. They use SoCs that incorporate a variety of functions such as radio (Bluetooth, WiFi, FM), video, camera, audio, LCD, and voice. The SoC needs to adequately support these functions while maintaining a long battery life and preventing thermal effects such as overheating from degrading performance or device life. The designers need to meet very stringent requirements, including a tight power budget (Figure 1), limited power noise margins, and power integrity and reliability with respect to thermal, EM and ESD influences. In designs with multiple on-chip radios, EMI and EMC are especially critical, both at the chip and system levels.

Figure 1. Power/Thermal/Battery Life

As SoC power supply voltage becomes smaller, the noise level (L di/dt) goes up (Figure 2). This trend is expected to get worse as the industry moves to finFET technologies, which enable the use of very low supply voltages, thus reducing the amount of available power noise margin. Complex low-power designs using 100+ unique domains, on-chip regulators, power/clock gating, etc., make dynamic voltage-drop-induced failures a serious concern for designers.

Figure 2. Supply Voltage/Power Noise Trends
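The shrinking margin can be illustrated with a back-of-the-envelope calculation of inductive noise, V = L di/dt, expressed as a fraction of the supply. This is only a sketch: the inductance, current step, rise time and supply voltages below are hypothetical values chosen for illustration, not figures from this article.

```python
def ldi_dt_noise(inductance_h: float, delta_i_a: float, delta_t_s: float) -> float:
    """Inductive supply noise V = L * di/dt, in volts."""
    return inductance_h * delta_i_a / delta_t_s


def noise_fraction(vdd_v: float, noise_v: float) -> float:
    """Noise expressed as a fraction of the nominal supply voltage."""
    return noise_v / vdd_v


# Hypothetical values: 10 pH effective loop inductance, 2 A step in 100 ps.
NOISE_V = ldi_dt_noise(10e-12, 2.0, 100e-12)

if __name__ == "__main__":
    # The same absolute noise consumes a larger share of a lower supply.
    for vdd in (1.2, 0.9, 0.7):
        share = noise_fraction(vdd, NOISE_V)
        print(f"VDD = {vdd:.1f} V, noise = {NOISE_V * 1e3:.0f} mV, share = {share:.0%}")
```

With the current step held constant, the same absolute noise takes up a growing share of the supply as the voltage is lowered, which is the trend Figure 2 describes.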

Power noise can have detrimental effects on chip-to-chip communication. Signal noise coupling and power noise variation can affect the signal quality on parallel buses. Accurately predicting the effects of signal and power integrity (SI and PI) on clock jitter requires a holistic simulation of a chip’s I/O DDR interface. This simulation must include the entire I/O bank, devices and their associated on-chip power grid, package and board parasitics (self and mutual) on both signal and power/ground traces, and the termination load.

Traditional SoC design flows did not put much emphasis on thermal, EM and ESD analyses. That is no longer the case for today’s SoCs built on advanced finFET processes. Elevated temperatures have a significant impact on silicon function and EM. Self-heating in finFET-based designs further exacerbates thermal and reliability issues. Package and board contributions to thermal challenges also need to be considered, as power is dissipated from board to package and die, and cooling elements are found at the system level. Similarly, while most chip or package design teams have little reason to be concerned about radiated noise, this is not the case with SoCs that go into mission-critical applications used in automotive, aerospace and defense markets. Accurate EMI verification ensures interference-free operation and compliance with the regulatory requirements of these markets.
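To give a feel for how strongly temperature affects EM, the sketch below uses Black's equation, a widely used empirical model in which median time to failure scales as J^-n exp(Ea / kT). The article does not name a specific EM model, so treating Black's equation as the model is an assumption here, as are the activation energy and the two temperatures.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def mttf_ratio(temp_cool_c: float, temp_hot_c: float, activation_energy_ev: float) -> float:
    """Ratio MTTF(cool) / MTTF(hot) from Black's equation at equal current density.

    With J held constant, the prefactor and the J**-n term cancel, leaving
    exp(Ea / k * (1 / T_cool - 1 / T_hot)) with temperatures in kelvin.
    """
    t_cool_k = temp_cool_c + 273.15
    t_hot_k = temp_hot_c + 273.15
    exponent = activation_energy_ev / BOLTZMANN_EV_PER_K * (1.0 / t_cool_k - 1.0 / t_hot_k)
    return math.exp(exponent)


# Hypothetical inputs: Ea = 0.9 eV, wire at 105 C versus 125 C.
RATIO = mttf_ratio(105.0, 125.0, 0.9)

if __name__ == "__main__":
    print(f"Estimated lifetime at 105 C is {RATIO:.1f}x the lifetime at 125 C")
```

Under these assumed values, a 20 °C rise shortens the estimated EM lifetime by roughly a factor of four, which is why on-chip temperature needs to feed into EM analysis.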

Desired simulation-based approach
In today’s highly competitive market, a proper simulation-based design flow allows both top-down and bottom-up analysis frameworks to meet the requirements of system power efficiency, integrity and reliability, while meeting shorter design schedules at a lower cost.

The top-down process starts with power budgeting at the system level for each of its subcomponents. For example, defining the thermal and battery life requirements for the components in a smartphone allows each product team to work within a target power budget. In an SoC, this can be applied to budgeting the power for each functional block as early as the microarchitecture or RTL design stages. RTL can be quickly analyzed to determine power consumption in a block/IP for various operating modes, and identify opportunities to eliminate wasted power during idle modes. This approach allows for tougher power specs to be assigned and met, and also eliminates late surprises during implementation.
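A minimal sketch of this top-down budgeting step follows. The block names, weights, and per-mode power estimates are hypothetical; in a real flow the estimates would come from an RTL power analysis tool rather than being typed in by hand.

```python
def allocate_budget(total_mw: float, weights: dict) -> dict:
    """Split a system-level power budget across blocks in proportion to weights."""
    total_weight = sum(weights.values())
    return {block: total_mw * w / total_weight for block, w in weights.items()}


def find_violations(budget_mw: dict, estimates_mw: dict) -> dict:
    """Return {block: (worst_mode, overshoot_mw)} for blocks exceeding their budget."""
    violations = {}
    for block, modes in estimates_mw.items():
        worst_mode = max(modes, key=modes.get)  # mode with the highest estimate
        overshoot = modes[worst_mode] - budget_mw[block]
        if overshoot > 0:
            violations[block] = (worst_mode, overshoot)
    return violations


# Hypothetical 1000 mW SoC budget and early per-mode estimates (mW).
BUDGET = allocate_budget(1000.0, {"cpu": 5, "gpu": 3, "modem": 2})
ESTIMATES = {
    "cpu": {"active": 480.0, "idle": 60.0},
    "gpu": {"active": 340.0, "idle": 20.0},
    "modem": {"active": 190.0, "idle": 15.0},
}
VIOLATIONS = find_violations(BUDGET, ESTIMATES)

if __name__ == "__main__":
    for block, (mode, over) in VIOLATIONS.items():
        print(f"{block}: {mode} mode exceeds budget by {over:.0f} mW")
```

Running a check like this at the RTL stage flags the over-budget block while the architecture can still be changed, instead of during implementation.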

In the bottom-up process, each sub-section of a design is individually verified for power/signal integrity and reliability. This can be done in a virtual prototyping environment. Each component is modeled using the appropriate level of detail and simulated with the other components that make up a subsystem. For example, an IC with dozens of IPs and 100+ unique power domains can be verified using the following approach:

1. Each IP is verified for integrity and reliability.
2. A model of each verified IP is created and simulated within the context of the SoC.
3. At the SoC level, all of the IPs and the 100+ voltage domains are simulated using appropriate package and board parasitic models.
4. At the board-level, the simulation includes the current draw and parasitics from all ICs on the board.
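The value of re-verifying each IP in context can be shown with a deliberately simplified static check. This is a sketch under strong assumptions: each IP is reduced to a hypothetical three-number model, and the package and board are lumped into one shared series resistance. Production flows use extracted power grids and dynamic simulation instead.

```python
from dataclasses import dataclass


@dataclass
class IPModel:
    name: str
    peak_current_a: float  # abstracted from IP-level analysis
    local_drop_v: float    # worst voltage drop on the IP's own grid
    min_voltage_v: float   # lowest supply at which the IP still meets spec


def verify(ips, vdd_v: float, shared_resistance_ohm: float) -> dict:
    """Return {ip_name: passes} with all listed IPs drawing current together."""
    total_current = sum(ip.peak_current_a for ip in ips)
    shared_drop = total_current * shared_resistance_ohm  # package + board IR drop
    return {
        ip.name: vdd_v - shared_drop - ip.local_drop_v >= ip.min_voltage_v
        for ip in ips
    }


# Hypothetical IP models on a 0.8 V rail behind 10 mOhm of shared resistance.
CPU = IPModel("cpu", peak_current_a=3.0, local_drop_v=0.03, min_voltage_v=0.73)
GPU = IPModel("gpu", peak_current_a=2.0, local_drop_v=0.02, min_voltage_v=0.70)

ALONE = {**verify([CPU], 0.8, 0.01), **verify([GPU], 0.8, 0.01)}
TOGETHER = verify([CPU, GPU], 0.8, 0.01)

if __name__ == "__main__":
    print("verified in isolation:", ALONE)
    print("verified in context:  ", TOGETHER)
```

In this example both IPs pass when checked alone, but the CPU fails once the GPU's current shares the same package and board path. That is the kind of under-design a siloed flow can miss.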

Accurate models play a crucial role in a simulation-based methodology. Some examples of such models include the RTL power model (RPM), the custom macro model (CMM), the chip power model (CPM), the chip thermal model (CTM), and the chip-package co-analysis model (CPA).

ANSYS PowerArtist, the RTL design-for-power platform, identifies clock periods and activity modes that are likely to cause increased voltage drops or reliability issues, and captures them in the RPM. RedHawk can use this model directly for a range of simulations, from early power grid prototyping to cycle-accurate static and dynamic voltage drop analyses.

Totem, the transistor-level power, noise and reliability simulation platform for analog, mixed-signal and custom digital designs, can generate a CMM, which is a compact, optimized power model that contains both electrical and physical data for the custom macro. In addition, detailed transistor (“mmx view”) and cell-level (“cell view”) models can be generated. These models can then be used within ANSYS RedHawk, the sign-off platform for SoC power, noise, and reliability, for accurate mixed-signal simulation.

Chip Power Model (CPM) is a SPICE-accurate model (Figure 3) of the full-chip power delivery network. It contains spatial and temporal switching current profiles, as well as the parasitics of non-linear on-chip devices including decaps, loading capacitance and power/ground coupling capacitance. CPM represents the power delivery network of the entire die, with ports at the die-level c4 bumps and/or pads. It accurately models the electrical response of the chip for a wide range of frequencies from DC to multi-GHz, thereby enabling analysis, diagnostics and power integrity verification at the system level.

Figure 3. Chip Power Model (CPM)
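To show why a frequency-dependent chip model matters at the system level, the sketch below computes the impedance of a heavily simplified lumped network: a package branch (series R and L) in parallel with an on-die decoupling branch (series R and C). This is a textbook stand-in, not the CPM format or its contents, and all component values are hypothetical.

```python
import math


def pdn_impedance(freq_hz, l_pkg_h, r_pkg_ohm, c_die_f, r_die_ohm):
    """Magnitude of the impedance seen by on-die switching current, in ohms."""
    omega = 2.0 * math.pi * freq_hz
    z_pkg = complex(r_pkg_ohm, omega * l_pkg_h)           # package R + jwL
    z_die = complex(r_die_ohm, -1.0 / (omega * c_die_f))  # die R + 1/(jwC)
    return abs(z_pkg * z_die / (z_pkg + z_die))           # parallel combination


def resonance_hz(l_pkg_h, c_die_f):
    """Chip-package anti-resonance frequency of the lumped LC pair."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_pkg_h * c_die_f))


# Hypothetical values: 50 pH package inductance, 100 nF die capacitance,
# and 1 mOhm of series resistance in each branch.
PARAMS = dict(l_pkg_h=50e-12, r_pkg_ohm=1e-3, c_die_f=100e-9, r_die_ohm=1e-3)
F_RES = resonance_hz(PARAMS["l_pkg_h"], PARAMS["c_die_f"])

if __name__ == "__main__":
    for freq in (1e6, F_RES, 1e9):
        z_mohm = pdn_impedance(freq, **PARAMS) * 1e3
        print(f"{freq / 1e6:8.1f} MHz: |Z| = {z_mohm:7.2f} mOhm")
```

Even this toy network shows an impedance peak at the chip-package resonance that is far higher than the impedance at low or high frequency. System-level power integrity analysis therefore needs a chip model that is valid across the frequency range rather than a single DC load value.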

The Chip Thermal Model (CTM) is generated using location- and activity-specific temperature-dependent power data, together with layer-by-layer metal density information from the chip. ANSYS Sentinel-TI uses the CTM along with a detailed package-on-board thermal model and thermal boundary conditions to quickly and accurately predict the temperature-dependent on-chip power and produce a converged temperature map using an iterative power-temperature simulation flow. The detailed package-on-board model can be created within Sentinel-TI, while the thermal boundary conditions are generated from ANSYS Icepak, a system-level computational fluid dynamics solution.
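The iterative power-temperature loop can be sketched for a single lumped node. The exponential leakage model, the single thermal resistance, and every numeric value below are assumptions made for illustration. A real flow solves for a full temperature map instead of a single number.

```python
import math


def leakage_w(temp_c, leak_ref_w, temp_ref_c=25.0, coeff_per_c=0.02):
    """Assumed leakage model: exponential growth with temperature."""
    return leak_ref_w * math.exp(coeff_per_c * (temp_c - temp_ref_c))


def converge_temperature(p_dyn_w, leak_ref_w, t_amb_c, theta_c_per_w,
                         tol_c=0.01, max_iter=100):
    """Fixed-point iteration between power(T) and T(power).

    Returns (temperature_c, power_w, iterations). Raises RuntimeError if the
    loop does not settle, which in this model indicates thermal runaway.
    """
    temp = t_amb_c
    for iteration in range(1, max_iter + 1):
        power = p_dyn_w + leakage_w(temp, leak_ref_w)  # power at current temperature
        new_temp = t_amb_c + theta_c_per_w * power     # temperature at that power
        if abs(new_temp - temp) < tol_c:
            return new_temp, power, iteration
        temp = new_temp
    raise RuntimeError("power-temperature loop did not converge")


# Hypothetical values: 2 W dynamic power, 0.3 W leakage at 25 C,
# 45 C ambient, 10 C/W junction-to-ambient thermal resistance.
TEMP_C, POWER_W, ITERATIONS = converge_temperature(2.0, 0.3, 45.0, 10.0)

if __name__ == "__main__":
    print(f"Converged to {TEMP_C:.1f} C at {POWER_W:.2f} W in {ITERATIONS} iterations")
```

A single-pass estimate that evaluates leakage only at ambient temperature would understate both the temperature and the power. Iterating until the two agree is what makes the result self-consistent.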

A chip’s impact on system-level power, noise and reliability is an important factor in the performance and cost of the end product. It is equally critical to understand the impact of package parasitics on the chip’s performance in order to design a robust, sign-off quality SoC. Chip-Package Co-analysis (CPA) uses accurate 3D FEM modeling to extract high-resolution (per-bump) physical RLCK parasitics of the package. In addition, RedHawk-CPA provides a fully integrated chip-package co-analysis environment that enables designers to perform package-aware SoC static and dynamic power analyses, as well as chip-aware package static IR drop and AC hot spot analysis.

Simulation-based flow
Today’s SoC designs used in applications such as automotive, smartphones, tablets and health care need to be signed off for power, in addition to function and timing. Without a robust power delivery network (PDN), both functionality and performance will be impacted. Poor PDN design can also lead to reliability issues such as EM and ESD. Thermal effects only exacerbate the impact of these issues on the design.

Figure 4. Required Simulations

Early in the chip design process, PowerArtist is used for RTL power analysis, optimization, and model generation. Totem and RedHawk are used at the IP/analog/mixed-signal and SoC levels of a design, respectively, to analyze power, noise and reliability (Figure 5).

Figure 5. RTL to GDS Power Noise Closure

Chip-level reliability factors such as thermal, EM and ESD are addressed using the foundry-certified solutions Sentinel/RedHawk/Totem for thermal-aware power and signal EM, and ANSYS PathFinder for ESD.

At the package, board and system levels, ANSYS SIwave is used for power/signal integrity analysis, HFSS provides accurate 3-D electromagnetic analysis including ESD and EMI, and Icepak is used for system-level electronics cooling analysis.

Summary
A simulation-driven product development process is required to meet the challenges of rising power consumption, tighter design schedules, and shrinking design margins. A methodology in which both top-down and bottom-up considerations are incorporated is necessary. To make this methodology successful, a strong collaborative framework needs to be adopted by all the involved parties. In this way, both data and specifications can be protected and shared in an effective manner, while enabling the co-design and co-analyses needed to solve these growing challenges.