Waking And Sleeping Create Current Transients

How to keep power down and performance up while lowering peak currents.


Silicon power-saving techniques are helping to reduce the power required by data centers and other high-intensity computing environments, but they’ve also added a significant challenge for design teams.

As islands on high-powered chips go to sleep and wake up, the current requirements change quickly. This happens in a few microseconds, at most. The rapid change of loading creates a challenge for regulators.

“Power continues to climb exponentially, at least peak power,” said Sameer Gupta, senior manager for the infrastructure power business at Renesas. And peak power is associated with power transients.

While new approaches to regulation are helping to manage the high dynamic range, careful power design helps to manage the transients themselves. Chip designers have to balance the need for keeping power down and keeping speed up, all while lowering peak currents.

Data-center energy remains a challenge
Data-center energy remains a challenge
Concern has long been raised about the amount of energy that data centers use. Their combined energy consumption is on the order of 1% of worldwide energy usage (plus or minus, depending on the report). While substantial, that's not quite as bad as originally forecast, partly because today's chips use energy more frugally than early projections assumed.

One of the important factors in those energy savings is the ability to power blocks up and down within server chips as demand changes. By powering down unused circuits, significant energy can be saved. Powering those blocks back up happens relatively quickly — far more quickly than would be possible if an entire unused server was simply shut down instead.

While this capability results in a net energy savings, it creates a different problem. When a block goes to sleep, it causes a significant change in the power load. That makes the voltage spike up briefly until the regulator can adjust to the new power level. Likewise, when a block wakes up again, it suddenly adds to the existing power load, and the voltage can droop briefly due to significant current inrush — tens of amps.
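
To get a feel for the magnitudes involved, here is a minimal back-of-the-envelope sketch in Python. The step size, step time, and parasitic inductance below are illustrative assumptions, not figures from the article; the point is simply that the droop scales with L·di/dt.

```python
# Rough estimate of the supply droop caused by a sudden load step
# (all values are illustrative assumptions, not measured data).

di = 50.0              # current step, in amps ("tens of amps")
dt = 100e-9            # time over which the step occurs, in seconds
l_parasitic = 100e-12  # package/board parasitic inductance, in henries

# v = L * di/dt: the faster the current step, the larger the droop.
droop = l_parasitic * (di / dt)
print(f"Droop across the parasitic inductance: {droop * 1e3:.0f} mV")
```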

Fig. 1: The effect of changing current (green) on supply voltage (yellow). A current step of 90 amps causes 10-µs glitches on the supply. Source: Renesas

The challenge for regulators is to be able to react quickly to load changes that can take the current from a very low level, if a system is mostly asleep, to high levels for intensive tasks.

Regulators adapt
DC-DC regulators are regularly employed to step incoming voltages — often around 12V — down to the VDD that individual chips require, which these days is less than 2V. These are switching regulators, meaning that a digital switch connects and disconnects the input voltage from the internals of the regulator.

This switch provides a periodic pulse of current that gets smoothed out within the regulator. The challenge is that, with a simple regulator, you get one big current surge per cycle. The RMS current is then relatively high, which can cause thermal issues and force a more robust design, especially for high-current chips like CPUs and GPUs.

“Multiphase” power was developed to address this. It involves several regulator phases operating in parallel, their outputs tied together after the output inductors, so that all of them contribute current, ideally equally. What changes is the timing of the pulses in each phase. Rather than one big pulse, you get a sequence of smaller pulses that are out of phase with each other: a pulse in the first phase is followed by a pulse in the second, and so on through all of the phases.

Fig. 2: A four-phase regulator. Each phase ideally contributes equally but alternates pulse timing with the other phases. Source: Based on an image from Renesas.

By doing this, the single large current pulse of a single-phase regulator is divided among the phases, effectively spreading the load. Ideally, the result at the output is the same, but more stable.
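
A minimal numerical sketch can make the interleaving effect concrete. The pulse shape, duty cycle, and load current below are assumptions chosen for illustration, not values from any particular regulator:

```python
import numpy as np

# Toy model of multiphase interleaving: each phase draws one
# rectangular input-current pulse per switching period, and the
# phases are offset from one another by T/N.

T = 1.0        # normalized switching period
duty = 0.15    # fraction of the period each phase conducts
i_load = 100.0 # total load current, in amps (assumed)
t = np.linspace(0.0, T, 10_000, endpoint=False)

def input_current(n_phases):
    """Summed input current for n interleaved phases."""
    per_phase = i_load / n_phases  # each phase carries an equal share
    total = np.zeros_like(t)
    for k in range(n_phases):
        # This phase's pulse, shifted by k * T / n_phases.
        phase_t = (t - k * T / n_phases) % T
        total += np.where(phase_t < duty * T, per_phase / duty, 0.0)
    return total

for n in (1, 4):
    i = input_current(n)
    rms = np.sqrt(np.mean(i**2))
    print(f"{n} phase(s): peak = {i.max():.0f} A, RMS = {rms:.0f} A")
```

With these assumptions, four non-overlapping phases cut the peak current by 4x and the RMS current by roughly 2x.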

Fig. 3: A conceptual illustration of the impact of multi-phase regulators. Voltage is maintained with less RMS current. Source: Bryon Moyer/Semiconductor Engineering

The control portion of this power loop traditionally has been done using analog circuitry. Current-mode regulation was too slow. Voltage mode was faster, but lacked the necessary current information, making it harder to stabilize.

Renesas said that moving the control loop into the digital domain provides a different, faster approach: measure the voltage and, knowing the inductance, quickly calculate the current rather than slowly measuring it, which provides tighter control. Calibration comes from confirming the current on the downslope following a pulse.
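
A rough sketch of the idea, under textbook buck-converter assumptions (the voltages, inductance, and on-time below are hypothetical): during the on-time, the inductor current ramps at (Vin - Vout)/L, so a controller that knows L can compute the current from sampled voltages rather than measuring it.

```python
# Sketch of digital current estimation in a buck phase
# (hypothetical values; real controllers also calibrate L).

v_in = 12.0    # input voltage, in volts
v_out = 1.8    # regulated output voltage, in volts
l_out = 150e-9 # output inductance, in henries (assumed known)
t_on = 300e-9  # switch on-time for this pulse, in seconds

# During the on-time, inductor current ramps at (Vin - Vout) / L.
di_dt_on = (v_in - v_out) / l_out
delta_i = di_dt_on * t_on
print(f"Calculated current ramp this pulse: {delta_i:.1f} A")

# Calibration check: with the switch off, the current falls at
# roughly -Vout / L. Comparing this against the observed downslope
# lets the controller correct its inductance value.
di_dt_off = -v_out / l_out
print(f"Expected downslope: {di_dt_off / 1e6:.1f} A/us")
```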

“The advantage with going digital is you have full control,” said Gupta. A digital approach also gives access to better operating data. “Server guys want not only to pull telemetry information at the rack level, but also the ability to support full system monitoring and remote management.”

Addressing the source: power islands
All of this has become necessary as large systems-on-chip (SoCs) have become increasingly sophisticated in how they draw power. Blocks can be put to sleep when not needed, awakening when a task is ready. While this might seem simple if applied to a few large blocks on a chip, that’s no longer the situation, regardless of whether those chips are destined for the data center, smartphones, or elsewhere.

“Some chips have hundreds of power domains, and they want fine-grained control on which one to shut down and which one to wake up,” said Godwin Maben, applications engineering scientist, digital design group at Synopsys. If too many of them power up and down at the same time, transient currents can be enormous. The problem gets worse with advanced processes. “As we are moving toward lower nodes, the total capacitance increases, which means that the average current needed to charge and discharge [nets] increases,” he added.

That creates two dimensions to the power-ramping challenge. One is the rate of powering up and down for an individual block. The other is the powering up and down of multiple blocks. Blocks often are designed by separate teams, so the blocks themselves may not be “aware” of which other blocks are powering up and down at the same time.

Blocks that aren’t changing power states can be affected by other blocks that are if power wavers in the process. “Timing is not an issue for the block that is turning on,” noted Rajat Chaudhry, product management director for the Voltus IC power integrity solution at Cadence. “But you want to make sure you don’t cause too much noise on the one that’s already functioning.”

If the blocks are in the same power island, they’ll change power levels together — but they may not know that. If they are on different power islands, it’s not guaranteed they’ll ramp together, even though it’s possible. This situation makes power architecture and design a critical early aspect of design.

At the chip level, architects must understand the use cases that will drive power switching. Within a power island, staged power-up can be a way of moderating the current spike. This is done by delaying the power-up of different blocks. Even delays of a few nanoseconds can play an important role in calming the current. But with different islands, there is no advance knowledge of which will change power when, so explicit staging isn't really a solution.

Instead, power arbiters can be used to handle cases where too many blocks change power at the same time. Rather than having a built-in delay between specific blocks, the arbiter can take requests in order and insert delays between whichever blocks happen to request a power change at the same time. Feedback allows blocks to signal back to the arbiter when they've stabilized at a new power level, giving the arbiter a better picture of what's going on.

“There is a something called the ‘power-good’ signal,” explained Maben. “The power management unit sends a sleep command to the CPU, and the CPU returns back a power-good signal saying, ‘Yes, I’m completely discharged, I’m asleep.’ The power management receives that acknowledgement and says, ‘Okay, this guy is dead. Now, I’m going to send the request to the next step.’”

The specifics of the arbiter — that is, the set of rules for a given chip — will be completely dependent on the application. They become part of the power design.
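
As a toy illustration of that request/acknowledge flow (the class and method names below are hypothetical, not any vendor's PMU interface), an arbiter can serialize requests and release the next one only after the previous block reports power-good:

```python
from collections import deque

class PowerArbiter:
    """Toy arbiter: serializes power-state requests and waits for
    each block's 'power-good' acknowledgement before releasing the
    next request. A sketch only, not a real PMU interface."""

    def __init__(self):
        self.queue = deque()
        self.in_flight = None  # block currently changing state

    def request(self, block, state):
        self.queue.append((block, state))
        self._dispatch()

    def power_good(self, block):
        # The block signals it has stabilized at its new level.
        if self.in_flight == block:
            self.in_flight = None
            self._dispatch()

    def _dispatch(self):
        if self.in_flight is None and self.queue:
            block, state = self.queue.popleft()
            self.in_flight = block
            print(f"granting {block} -> {state}")

# Two blocks request changes at once; the second is held until
# the first acknowledges.
arb = PowerArbiter()
arb.request("gpu_cluster", "sleep")
arb.request("npu", "wake")     # queued behind gpu_cluster
arb.power_good("gpu_cluster")  # now npu is granted
```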

From the chip level to the block level
Even with chip-level management, an individual block, which may contain millions of transistors, still can cause more of a transient than is desired when changing power levels. So in addition to managing the timing between different blocks, it’s common to manage the timing within a block, as well. This doesn’t require arbitration, because it’s known that the entire block will power up and down together. It’s again a matter of staging the power-ups.

This is done using power switches — many of them, in fact — so that each switch can be controlled individually and timed. “This switch will not have the current-driving capacity to supply power to all of the cells,” noted Maben. “You need hundreds of switches to power a million cell instances.”

The switches can be sequenced in different ways, each having an impact on overall power-up or power-down time. Daisy-chaining creates a sequence of switches that engage one after the other, but this is often too slow. Other layout approaches, like fishbone and high-fanout, allow finer control. Individual switches may be delayed with respect to each other, but only by a few nanoseconds, yielding a faster overall power event. Sometimes simply placing switch cells farther apart, so that the only added delay comes from a longer wire, is sufficient.
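
The trade-off is easy to see with a back-of-the-envelope comparison. The switch counts and delays below are assumptions for illustration:

```python
# Back-of-the-envelope wake-up time comparison (assumed values).

n_switches = 400  # power switches in the block
t_switch = 2e-9   # per-switch turn-on delay, in seconds

# Daisy chain: every switch waits for the previous one.
t_daisy = n_switches * t_switch

# Staged fan-out: switches wake in groups, with a small stagger
# between groups; switches within a group turn on in parallel.
n_groups = 10
t_stagger = 5e-9
t_fanout = (n_groups - 1) * t_stagger + t_switch

print(f"daisy chain:    {t_daisy * 1e9:.0f} ns")
print(f"staged fan-out: {t_fanout * 1e9:.0f} ns")
```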

Fig. 4: Different ways of staging the turn-on and -off of power switches inside a logic block. Source: Synopsys

Another tool is the programmable delay widget, which is programmed with a delay, relative to a wake-up signal, that determines when to turn on.

Fig. 5: Delay widgets can be inserted between blocks, with their turn-on times programmed. The delay widget on the left will turn on 20 ns after it receives its turn-on signal from Partition 1; the one on the right will turn on 10 ns after it receives its signal from Partition 2. Source: Based on an image from Synopsys.

These different approaches yield different power ramp rates and current surges. Ultimately, a balance must be struck between the size of the current surge and the wake-up delay.

Fig. 6: Voltage ramp rates and current transients, with faster approaches generating more rush current. Source: Synopsys

Wake-up timing varies according to where you're looking. For the circuits controlled by a single power switch within a block, it can be a few clock cycles. With a 2 GHz clock, each cycle is 0.5 ns, so wake-up takes just a few nanoseconds.

A complete block will, of course, take longer due to the insertion of delays between switches. Likewise, at the chip level, you now have the staging of blocks. For a large chip, this can bring the power-up cycle into the range of hundreds of nanoseconds. Inductances introduce further delay as you move out to the package and beyond. A giant surge, as seen at the regulator, can thus produce a supply glitch lasting on the order of 10 µs, including recovery.
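
Putting those levels together with illustrative numbers (all values below are assumptions consistent with the ranges mentioned above):

```python
# Illustrative wake-up timing at different levels of hierarchy.

clk = 2e9             # 2 GHz clock
t_cycle = 1 / clk     # 0.5 ns per cycle
t_local = 4 * t_cycle # circuits under one switch: a few cycles

n_switch_stages, t_stage = 50, 3e-9
t_block = t_local + n_switch_stages * t_stage  # one full block

n_blocks, t_block_stagger = 8, 40e-9
t_chip = t_block + (n_blocks - 1) * t_block_stagger  # staged blocks

print(f"local: {t_local * 1e9:.1f} ns")
print(f"block: {t_block * 1e9:.0f} ns")
print(f"chip:  {t_chip * 1e9:.0f} ns")  # hundreds of nanoseconds
```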

The peak currents seen on-chip can be huge. The burden on the regulator is reduced by using multiple levels of decoupling capacitors, or “decaps.” These are very familiar on the power pins of chips — and have been for decades. But they’re now also sprinkled into the blocks on the chip to help with transients there.

Decaps often are described as filtering noise, which they do. But another way of looking at them is they act as a store of energy for quick release. “Decap cells act as the local charging station when you need it,” said Maben. When the current load changes rapidly, the decaps contribute some of that current very quickly while upstream sources — the regulator itself and other, higher-level decaps — provide the rest of the current. They’re about the size of a flip-flop, so they don’t take much space.
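
That "charging station" view lends itself to a quick calculation. With assumed values for the decap size, the local current demand, and the response window, the sag on the decap follows ΔV = I·Δt/C:

```python
# Voltage sag on a local decap while it covers a fast transient
# (assumed values for illustration).

c_decap = 50e-9    # total nearby on-chip decap, in farads
i_step = 1.0       # local current demand covered by the decap, in amps
t_window = 0.5e-9  # time until upstream sources respond, in seconds

# Q = I * t is the charge drawn; dV = Q / C is the resulting sag.
dv = i_step * t_window / c_decap
print(f"Decap sags by {dv * 1e3:.0f} mV while covering the step")
```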

Using decaps too liberally on-chip, however, can result in an increase in leakage current. Off-chip discrete capacitors are specifically designed for a particular level of quality and leakage. On-chip caps, however, must conform to the CMOS process, which isn’t optimized for such capacitors. So they’re relatively leaky.

Impact on the chip design flow
There are three general phases for designing the power architecture in a new chip. There’s the initial design, done early in the planning of the chip. Then comes verification, once the bulk of the design is in place, which hopefully proves the initial design was sufficient. Local issues, however, may create the need for a third phase — resulting in engineering change orders (ECOs).

Early in the design, there are no vectors, so there is no specific signal data to use for testing the power design. Instead, there are budgets for average power for different blocks, and that’s what the power designers will work with, focusing on static power. “We design the power grid based on the static IR drop, which is based on average power,” said Maben.

Much of this is well understood ahead of time. “Designers have a pretty decent idea of how much power they are going to have at a block level, because that’s part of every chip design,” said Cadence’s Chaudhry. “They get reasonably accurate, within 20% or so.”

This is a time of experimentation, exploring different architectures, staging, and power arbitration options. Creating the physical power grid happens here, and the experiments are likely to include determining which metal layers will be used for different parts of the power grid and how wide the “straps” will be.

Conspicuously absent from this part of the design is the ability to design for transients, especially on aggressive silicon nodes. “On the newer nodes, they may not know the local spikes in the current,” said Chaudhry. Experience and knowledge of the blocks will help to provide an initial design that will hopefully be robust enough.

“More people are trying to solve the power transient problem by doing static IR efficiently rather than waiting for dynamic IR, because dynamic is too late,” added Maben. “The best way to fix it is to get your simulation vectors as early as possible, and make sure your static IR drop is based on realistic power, not on vectorless power.”

Because chip, package, and board are now so tightly intertwined, it’s no longer sufficient to do power analysis on the chip alone — especially for transients, where inductance is critical. “As you’re seeing more noise, you may have to do a full analysis with the package model,” said Chaudhry. “You have a very detailed model of the chip. And then you have a pretty detailed extracted model of the package. And then you can add a simpler model of the board, and you can combine it all in that chip level simulation.”

Part of the challenge is the fact that noise budgeting between chip, package, and board no longer works. “As you are lowering your power supply on the chip, your noise margins are shrinking,” noted Chaudhry. “Previously, companies would say, ‘Okay, you got 2% noise margin on the board, you got another 2% on the package.’ But now, as the supply levels are coming down, you don’t have the luxury of that additive 2%. So they’re saying, ‘Hey, maybe if I get my board in there, I can do a combined 4% or 3%.’”
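
The arithmetic behind that shift is straightforward. With assumed supply levels, a fixed 2% budget shrinks in absolute millivolts as VDD drops, which is what erodes the luxury of separate additive margins:

```python
# Absolute noise margin implied by a fixed 2% budget as VDD drops
# (supply values assumed).

for vdd in (1.8, 1.0, 0.75):
    margin_mv = 0.02 * vdd * 1e3
    print(f"VDD = {vdd:.2f} V -> 2% margin = {margin_mv:.0f} mV")
```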

Evaluating dynamic power
Logic verification is the first time that dynamic power can be evaluated. “You have all the vectors, and you use simulation results to do IR-drop analysis,” said Maben. “There will be many windows in the design cycle where, for a short period of time, there will be a huge surge of current because blocks are turning on and off without the knowledge of others.”

Emulation is particularly useful in this case because it can exercise a large sequence of vectors that would be too slow to run in simulation. While the emulator is running, power analysis tools identify any current peaks that might require extra attention. Today, emulators stream data directly to those analysis tools.

Further efficiency has been gained by streamlining some of the streaming formats. “One does not need to dump all signals in the design,” said Preeti Gupta, director of RTL product management at Ansys. “What if they were dumping only the critical signals, like the flop outputs? Why couldn’t a power tool fill out events for the rest of the nets?” This has slimmed down the data stream.

If all of this proceeds successfully, there’s a good chance that once layout is complete the resulting power infrastructure will hold up well under the load. It’s likely, however, that isolated nodes will see some voltage droop during final power analysis. The challenge is to keep that number down to a few tens of nodes rather than hundreds or thousands.

Each of those final nodes then can be tweaked. “When you get to the local level, you may have very high currents or too many cells routing nearby, causing a localized peak,” said Chaudhry. “They can fix that by moving the cells apart or making the grid stronger in that particular local area.”

Spacing the cells is sometimes referred to as “cell padding.” “If I separate the cells and put a pad cell in between, I have distributed or staggered the power requirement so that the drop in voltage is minimized,” said Maben. “This is often done for clock cells, because they are switching at a very high rate.”

The power grid can be “augmented” by adding power straps in parallel with existing straps to lower resistance. “They make the power grid denser, which means they have wider straps,” said Maben. “And they incrementally insert vias or via ladders along the grid to drop from higher metal to lower metal.”

A similar effect can be gained when metal fill is added. Chips cannot have empty spaces, so metal has long been used to fill such areas. “If you connect this fill metal to power and ground, it boosts your power and ground network,” said Maben.

It also may be possible to reduce the drive of some cells to lower the current inrush. “Downsizing the cell impacts setup and hold, which means your timing gets impacted,” said Maben. “But if your dynamic IR drop happens to be on a path that is not timing-critical, who cares?”

Yet another option is to divide up loads. “Another thing they do is to split output capacitance,” he said. “Let’s say I have a flip-flop driving ten loads. I copy this flop, and each one drives five cells. I’m splitting the output.” The key is that the two split versions won’t switch at exactly the same time. There will be some tiny delay between them.

Finally, ownership of power management can be an organizational issue. “One of the challenges in execution is that the emulator teams are the verification teams within design companies,” cautioned Ansys’ Gupta. “And the power methodology team is typically either a separate team, or they’re part of the front-end design team. Just bringing these two teams together to talk about it and figure out who’s the right owner of the emulator power flow is a challenge.”

Conclusion
Managing data-center power requires compromise and convergence between regulator design and chip design. Ideally, the regulator could withstand any power transient, but chip designers play their part by managing their transients more carefully. They do this by slowing power ramps down, all while keeping timing and performance in mind so the chip can do its job at the intended speed.

These combined efforts have kept data-center power below the levels originally anticipated. But it's still high, and continued efforts will be needed to mitigate the growth of data-center energy usage.


