Transient Power Problems Rising

At 10/7nm, power management becomes much more difficult; old tricks don’t work.

Transient power is becoming much more problematic at 10/7nm, adding yet another level of complexity for design teams already wrestling with leakage current and a variety of power management techniques to control dynamic power.

At each new node there is less headroom for engineering teams to address these problems, and more likelihood that what they do in one area will affect another. Threshold voltages already are pushing the limit, design margins are shrinking, and a push toward greater heterogeneity means more devices with their own power requirements are being crammed into these designs. Transient power is one more factor to deal with, and it adds to the collective problems of power and signal integrity.

Transient power shows up in two basic areas:

  • Single-cycle transient power, which is dynamic power usage within a clock cycle, such as current demand peaks at clock edges. Single-cycle transient power causes high-frequency noise and cell-level dynamic voltage drop (DVD).
  • Multi-cycle transient power, which occurs across multiple cycles due to cycle-to-cycle changes in activity across application use cases. This is a lower-frequency noise problem than single-cycle transient power.

Single-cycle transient power is exacerbated by synchronous clocks. It is best dealt with by reducing peak power and DVD through clock scheduling, as well as through physical-level optimization of the chip-level power delivery network, said Tobias Bjerregaard, CEO of Teklatech. In contrast, multi-cycle transient power is a lower-frequency noise problem that is further worsened by clock and power gating.
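
The effect of clock scheduling on single-cycle peak current can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions: each block is assumed to draw a triangular current pulse after its clock edge, and the function names, pulse shape, and numbers are all hypothetical rather than taken from any tool mentioned here.

```python
# Toy model (assumption): every block draws one triangular current pulse
# right after its own clock edge. All names and values are hypothetical.

def pulse(t_ps, start_ps, width_ps, peak_a):
    """Triangular current pulse in amps, nonzero only inside [start, start + width]."""
    x = (t_ps - start_ps) / width_ps
    if x < 0.0 or x > 1.0:
        return 0.0
    return peak_a * (1.0 - abs(2.0 * x - 1.0))


def peak_total_current(offsets_ps, width_ps=100.0, peak_a=1.0,
                       step_ps=1.0, period_ps=1000.0):
    """Peak of the summed block currents over one clock period."""
    n_steps = int(period_ps / step_ps)
    best = 0.0
    for k in range(n_steps + 1):
        t = k * step_ps
        total = sum(pulse(t, off, width_ps, peak_a) for off in offsets_ps)
        best = max(best, total)
    return best


# Four blocks clocked on the same edge versus four blocks with staggered edges.
aligned = peak_total_current([0.0, 0.0, 0.0, 0.0])
skewed = peak_total_current([0.0, 100.0, 200.0, 300.0])

if __name__ == "__main__":
    print(f"aligned peak: {aligned:.2f} A, staggered peak: {skewed:.2f} A")
```

In this model the charge drawn per cycle is identical in both cases. Only the peak demand changes, which is the quantity that drives high-frequency noise and DVD.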

“When large parts of the clock network or power network are switched on, the inductive package leads need time to respond,” Bjerregaard explained. “This causes low frequency ringing and multi-cycle voltage droop effects. Having a strong (low-resistance) on-chip power network can actually make this effect worse, as the package L to chip C can cause resonance. Solutions are found at the chip-package-system level, but also in terms of power gating strategies.”
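
A rough way to see the resonance Bjerregaard describes is to treat the package inductance, on-chip capacitance, and grid resistance as a single series RLC loop. That lumped model is an assumption made here for illustration, as are the component values below; a real power delivery network has many more elements.

```python
import math

# Lumped series-RLC view of package L and chip C (an illustrative assumption).

def resonant_frequency_hz(l_henry, c_farad):
    """Resonant frequency f0 = 1 / (2 * pi * sqrt(L * C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


def series_q(r_ohm, l_henry, c_farad):
    """Quality factor of a series RLC loop: Q = sqrt(L / C) / R."""
    return math.sqrt(l_henry / c_farad) / r_ohm


# Hypothetical values: 100 pH package inductance, 100 nF on-chip capacitance.
L_PKG = 100e-12
C_DIE = 100e-9

f0 = resonant_frequency_hz(L_PKG, C_DIE)
q_weak_grid = series_q(0.010, L_PKG, C_DIE)    # 10 milliohm loop resistance
q_strong_grid = series_q(0.005, L_PKG, C_DIE)  # 5 milliohm loop resistance

if __name__ == "__main__":
    print(f"f0 = {f0 / 1e6:.1f} MHz")
    print(f"Q at 10 mOhm = {q_weak_grid:.2f}, Q at 5 mOhm = {q_strong_grid:.2f}")
```

In this simplified model, halving the resistance doubles Q, which matches the point that a stronger, lower-resistance grid damps the resonance less.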

Until new nodes were introduced, this was primarily a second-order effect. But at 10/7nm, engineers are finding out how these two domains influence each other.

“In its generic form, transient power concerns system impedance and noise response,” Bjerregaard said. “When optimizing to reduce single-cycle noise at the chip level, it also impacts the low-frequency component of the current demand beneficially. But the more precise nature of the power shaping impacts the frequency spectrum. Another factor, more indirect but equally important, is that using dynamic power shaping optimization of the design, a smaller power grid and less on-chip [capacitance] is possible at the same DVD target. This not only makes better use of on-chip resources, leading to better area utilization (which is especially important at 10 and 7nm processes), but also has a beneficial impact on the system-level RLC resonance: higher R, lower C equals less ringing.”

Localized switching also affects low-frequency power integrity. “Where you have instantaneous current demand in one area, the transistors immediately see a very high voltage drop,” said Arvind Shanmugvel, director of application engineering at Ansys. “Then, when there is global switching, it causes a huge di/dt (a change in current over time). That, combined with the package inductance, essentially causes L (di/dt) voltage drop. These are the two phenomena that directly affect the power integrity of a chip. An additional aspect is very slow-moving transients with high magnitude that can cause a high thermal transient in the system, possibly impacting the overall thermal integrity and breaking the thermal design power (TDP) of the system, among other things.”
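
The L (di/dt) term is simple to estimate by hand. The following sketch only evaluates V = L × (Δi/Δt) for a linear current ramp; the function name and the example numbers are hypothetical and chosen just to show the order of magnitude.

```python
def inductive_drop_v(l_henry, delta_i_amp, delta_t_s):
    """Voltage across an inductance for a linear current ramp: V = L * di/dt."""
    if delta_t_s <= 0:
        raise ValueError("delta_t_s must be positive")
    return l_henry * delta_i_amp / delta_t_s


# Hypothetical case: 20 pH effective package inductance, 5 A step in 2 ns.
drop_v = inductive_drop_v(20e-12, 5.0, 2e-9)

if __name__ == "__main__":
    print(f"L*di/dt drop: {drop_v * 1e3:.1f} mV")
```

With these assumed values the drop is about 50 mV, and it scales linearly with both the inductance and the speed of the current change.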

IP to the rescue?
Because SoC designs are dominated by reusable IP blocks, IP companies look at ways to reduce the leakage power in the IP blocks that are licensed.

One of the power-saving methodologies widely employed is power gating a certain portion of the design, which is generally used to put it into sleep mode. “Enough information about the state of the design needs to be preserved when the circuit wakes up,” said Ravi Thummarukudy, CEO of Mobiveil. “This is accomplished by identifying retention flops in the IP and at the SoC level. It’s equally important to make sure that introducing such flops in the chip will not increase the area or create functional issues. If IP blocks are power-gate ready, the life of SoC designers is much easier.”

In complex SoCs with many hierarchical blocks, it is common to see blocks that are not gated properly, causing a power bug, said Andy Ladd, president and CEO of Baum. But verifying whether global and local gating are correctly applied is a difficult job. Getting it right requires designers to analyze the transient power or power waveforms.

Further, he said that IR drop analysis of an SoC is critical to avoid excessive IR-drop noise on chip power grids. But this is a time-consuming process, so only a few cycles of stimulus are typically applied to model current sources. The problem is that a comprehensive analysis of transient power waveforms under a number of scenarios is necessary to get the correct stimulus that corresponds to the interval of peak power consumption.
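
Picking the stimulus window for IR drop analysis amounts to finding the interval of peak power in a longer waveform. A minimal sketch of that search is shown below, assuming the waveform is already available as a list of per-cycle power samples. The function name and sample data are hypothetical, and this is not how any particular tool does it.

```python
def peak_power_window(power_mw, window):
    """Return (start_index, average_power) for the window with the highest average.

    Uses a running sum, so the cost is linear in the number of samples.
    """
    if window <= 0 or window > len(power_mw):
        raise ValueError("window must be between 1 and len(power_mw)")
    current = sum(power_mw[:window])
    best_sum, best_start = current, 0
    for i in range(window, len(power_mw)):
        # Slide the window one sample to the right.
        current += power_mw[i] - power_mw[i - window]
        if current > best_sum:
            best_sum, best_start = current, i - window + 1
    return best_start, best_sum / window


# Hypothetical per-cycle power trace in milliwatts.
trace_mw = [10, 12, 11, 40, 55, 50, 13, 9]
start, avg_mw = peak_power_window(trace_mw, window=3)

if __name__ == "__main__":
    print(f"peak window starts at cycle {start}, average {avg_mw:.1f} mW")
```

The cycles in the returned window are the ones worth feeding to the slower, more detailed analysis.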

Concerns about security add yet another dimension to this problem. Side-channel attacks, in which secret information is leaked through a chip’s power consumption behavior, present a danger to chip designers. Preventing this kind of breach requires an understanding of both security and transient power, skill sets that normally don’t go together.

“Designers need to analyze the transient power waveform for a variety of usage scenarios to make sure that their chip will not be susceptible to side-channel attacks. In all of these cases, fast and accurate transient power analysis is key to optimizing and validating the design,” Ladd said.

ESD, a special case
One special case of transient power is electrostatic discharge (ESD), and it is a particularly troublesome issue in high-performance logic.

ESD has a long history in semiconductors, dating all the way back to studies done by Bell Labs in the early 1960s, when researchers were concerned about the impact of lightning strikes on chips. But for the most part, these issues have been contained through extra circuitry, or margin. At advanced nodes, those margins are shrinking, so containing ESD becomes more difficult.

“In these systems we want all of the signal transfers done correctly, all the zeros and ones going from one point to another without interruption in order for the digital signal to go through so the phone or TV or computer will work just fine,” said Zhen Mu, a senior principal product engineer in the Custom IC & PCB Group at Cadence. “For ESD, it’s like a sudden, varied, very high voltage applied to a system. For example, if you plug a USB into a computer or suddenly contact something on the motherboard on the computer or the power port is connected to a cell phone, all of that could generate a sudden high-voltage event. This is because the human body has a lot of charge, and if you suddenly find a way to discharge, when the discharge loop is formed in this case, then a current is created. The high voltage itself doesn’t really do much, but when the loop of current is created, there will be transient power because the power is a multiplication of the voltage and current. That is where the damage will occur.”
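
To put rough numbers on the voltage-times-current point, the sketch below uses the standard human body model (HBM) test network, a 100 pF capacitor discharged through a 1.5 kΩ resistor. Treating the discharge as an ideal RC decay into a short is an assumption made here for illustration, and the 2 kV precharge is just an example value.

```python
import math

# Human body model test network: 100 pF discharged through 1.5 kOhm.
HBM_C_F = 100e-12
HBM_R_OHM = 1500.0


def hbm_current_a(v0, t_s):
    """Discharge current of an ideal RC decay: i(t) = (V0 / R) * exp(-t / RC)."""
    return (v0 / HBM_R_OHM) * math.exp(-t_s / (HBM_R_OHM * HBM_C_F))


def hbm_power_w(v0, t_s):
    """Instantaneous power as capacitor voltage times loop current."""
    v_cap = v0 * math.exp(-t_s / (HBM_R_OHM * HBM_C_F))
    return v_cap * hbm_current_a(v0, t_s)


def stored_energy_j(v0):
    """Energy initially stored on the capacitor: E = C * V0^2 / 2."""
    return 0.5 * HBM_C_F * v0 * v0


if __name__ == "__main__":
    v0 = 2000.0  # example precharge voltage (assumption)
    print(f"peak current: {hbm_current_a(v0, 0.0):.2f} A")
    print(f"peak power:   {hbm_power_w(v0, 0.0):.0f} W")
    print(f"energy:       {stored_energy_j(v0) * 1e3:.2f} mJ")
```

The stored energy is small, but it is released within a few hundred nanoseconds, so the instantaneous power is very high.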

When an ESD event occurs, it can cause all sorts of problems for circuits or signals. Moreover, those issues get worse as the voltage and current rise, potentially burning the IC or other components on a board. This can happen when something gets too hot, and it can cause signal loss even if it doesn’t completely destroy the system. Usually, when the phenomenon subsides, the system comes back up, Mu said.

ESD is a special case that is dealt with on the EDA side as well as in lab testing.


Fig. 1: ESD susceptibility map on PCB trace with ESD protection device. Source: ANSYS.

Early prevention
At advanced nodes, transient power is an issue that needs to be dealt with upfront in the design cycle.

“We cannot avoid transient power requirements,” Shanmugvel said. “The architecture demands that you have these kinds of transients. But how do we build around it? And how do we avoid the issues that come along with power transients? The first step is to analyze and design for the transients. We need to be able to analyze all types of transients, and we need to be able to make designs robust in a way that can handle all of these transients. This is a three-pronged approach. One is on the chip side, the second is on the package side, and the third is on the system side. Chip-level power integrity is one aspect where thousands of different vectors must be profiled in order to understand which ones are causing high localized di/dt. The chip power grid is designed based on that. Similarly on the package side, there must be a good model of the chip, and then it must be designed for the L (di/dt) effects that affect the package-level voltage drop. The third is the system side. How do you design for power transient thermal issues/power transient voltage regulator module issues?”

Even with all of this planning, errors creep in. Fixing them is difficult, and sometimes impossible.

“Once you have designed a chip and you find that there is some kind of huge transient that is causing a chip to fail, it isn’t possible to use that chip in the same way,” Shanmugvel said. “The first approach is to slow down the frequency of the chip, which sometimes can cause the di/dt to go away, but it’s not guaranteed. Another approach is to change the board level decaps or the package-level decaps with a much smaller change footprint. This could also make these errors go away. But again, if you don’t analyze and build for it, and you have a surprise after silicon comes in, it’s definitely a very messy situation.”

Add to this the question of how to simulate all of this. The hundreds of value change dumps (VCDs) or fast signal databases (FSDBs) that capture activity vectors for the chip must be simulated, and today’s tools are really not meant for handling this. However, industry insiders say next-generation platforms will allow VCDs to be profiled. Identifying the vectors that really affect the system will be mandatory for the upcoming technology nodes.
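
Conceptually, profiling vectors means scoring each one by how violent its current transient is and keeping the worst offenders for detailed analysis. The sketch below shows that ranking step only. It assumes each vector has already been reduced to a list of evenly spaced current samples, it does not parse real VCD or FSDB files, and the names and data are hypothetical.

```python
def max_di_dt(current_a, dt_s):
    """Largest absolute current slope (A/s) between consecutive samples."""
    return max(abs(b - a) / dt_s for a, b in zip(current_a, current_a[1:]))


def rank_vectors(traces, dt_s, top_n=2):
    """Return the names of the top_n vectors, ordered by highest peak di/dt."""
    scored = [(max_di_dt(samples, dt_s), name) for name, samples in traces.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]


# Hypothetical current samples (amps) per vector, taken 1 ns apart.
traces = {
    "idle": [0.10, 0.10, 0.12, 0.10],
    "boot": [0.20, 0.80, 1.50, 1.60],
    "wake": [0.10, 0.10, 2.00, 2.10],
}
worst = rank_vectors(traces, dt_s=1e-9)

if __name__ == "__main__":
    print("vectors to analyze in detail:", worst)
```

Here the sudden wake-up vector ranks first even though the boot vector draws a comparable peak current, because the ranking is by slope rather than by magnitude.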

Overdesign no longer an option
Complicated as this analysis is, failing to predict these scenarios inevitably leads to overdesign, where extra margin is built in to cover the unknowns, leaving money on the table.

“At 7nm, you really cannot afford to overdesign, especially on the power integrity front,” Shanmugvel said. “Your margins are so thin, with your supply voltages on the order of 450 mV, you really cannot afford to wing it. You can’t say at the last minute, ‘I overdesigned my power grid so I’ll be fine.’ If you are doing that, you’re shooting yourself in the foot. Analyzing the design systematically from an early stage all the way to sign-off is definitely going to help design teams see the benefits, and not let them overdesign.”