Managing Peak Power

Slimmer margins and more data create big challenges for 5G, mobile devices, infrastructure, and data centers.


Peak power is becoming a serious design constraint across chips and entire electronic systems as more functionality is added into end devices and the compute and switching infrastructure needed to support them.

The issues are a direct result of growing complexity in designs, fixed or shrinking power budgets, and the need to process more data more quickly. In mobile devices, the addition of more features requires that more of the circuitry be turned off or put into sleep mode. In the data center, the sheer volume of data that must be processed quickly is exploding. And in communications infrastructure, the push toward 5G incorporates both of these issues and others.

Peak power is the maximum power a device generates at any point in time. It can occur when a device switches from the “off” or “sleep” state to the “on” state. It also can occur in a power domain used to control blocks with memories, such as when one or more applications are performing intensive computing. And the smaller the process node, the greater the problem becomes, because the dielectrics and wires are thinner while the margins for noise are lower.
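As a rough illustration of the definition above (not any vendor's methodology), peak power can be thought of as the maximum of a per-cycle power trace, either for a single cycle or averaged over a sliding window of cycles. The trace values below are invented for the sketch:

```python
# Minimal sketch: locating peak power in a per-cycle power trace.
# The trace numbers are hypothetical, not from any real design.

def peak_power(trace, window=1):
    """Return (peak average power over `window` cycles, starting cycle)."""
    best, best_start = float("-inf"), 0
    for i in range(len(trace) - window + 1):
        avg = sum(trace[i:i + window]) / window
        if avg > best:
            best, best_start = avg, i
    return best, best_start

# Hypothetical per-cycle power (watts): a wakeup spike at cycle 3.
trace = [0.2, 0.2, 0.3, 2.5, 1.8, 0.9, 0.5, 0.4]

print(peak_power(trace, window=1))  # single-cycle peak: (2.5, 3)
print(peak_power(trace, window=4))  # sustained peak over 4 cycles
```

The window parameter matters because a one-cycle spike stresses the power grid, while a sustained multi-cycle peak is what drives thermal limits.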

“When you turn them on from sleep to function mode, that will increase the in-rush current, and the current will spike because it turns everything on,” said Jerry Zhao, product management director for Cadence's Digital & Signoff Group. “That’s where specific technologies manage and measure and analyze for every single device or transistor how that specific transistor contributes to this in-rush current, at what time, and at what amplitude that current is going to be. When you accumulate all the switches together, because one block may have 50,000 such switches placed in different locations on the silicon, that’s where tools need to understand and see if it is too big. When the current is too big, the effect is that it draws a lot of current on the grid, so the IR drop is going to be huge, and the neighboring blocks that were already functioning may fail because the voltage dropped too much.”
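The accounting Zhao describes can be sketched in a few lines: each power switch contributes a small current pulse when it turns on, the per-step contributions are summed, and the peak total is checked against an IR-drop budget. Every number here (per-switch pulse shape, grid resistance, droop budget) is invented for illustration:

```python
# Hedged sketch of in-rush current accumulation and the resulting IR drop.
# All constants are assumptions, not real silicon figures.

R_GRID = 0.05                # ohms, assumed effective supply-grid resistance
VDD = 0.8                    # volts, assumed supply
DROOP_BUDGET = 0.05 * VDD    # allow 5% droop

def inrush_profile(switch_on_steps, pulse=(20e-6, 8e-6, 2e-6)):
    """Sum per-switch current pulses (amps) into a per-step total."""
    total = [0.0] * (max(switch_on_steps) + len(pulse))
    for t_on in switch_on_steps:
        for dt, amps in enumerate(pulse):
            total[t_on + dt] += amps
    return total

# One block with 50,000 switches, all enabled at the same step.
profile = inrush_profile([0] * 50_000)
peak_drop = max(profile) * R_GRID   # worst IR drop, V = I * R
print(f"peak IR drop: {peak_drop:.3f} V, "
      f"within budget: {peak_drop <= DROOP_BUDGET}")
```

With these made-up numbers the simultaneous turn-on blows the droop budget, which is exactly the failure mode Zhao warns about for neighboring blocks.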

Preparing for 5G
5G is shaking up the communications landscape with the promise of high-definition streaming video and less wait time for Internet access and downloads, but it also is raising questions about how to cope with much tighter power budgets. Peak power is a key part of this discussion.

“People talk about 5G as it relates to the application of the chip,” said Christen Decoin, product management director for the Digital & Signoff Group at Cadence. “There are different power demands when it’s a device that’s constantly used versus something like a cellphone. You leave it in your pocket and it’s working in a standby mode. Then there will be a surge of power when it is used. However, a 5G chip is going to be working 24/7. In this kind of utilization, the peak power might burn out the chip, because when all of the simulations are done the engineering team looks at the everyday usage. But if for any reason there is a peak in the power equation, you might kill the chip.”

In battery-powered applications, every kind of power matters right now, from peak power to average power and from static power to transient power. Peak power, specifically, occurs when a device is in a very active functional mode, consuming a lot of power in a short period of time and drawing a lot of current from the battery.

A slow wakeup cycle will smooth out the in-rush current and reduce peak power. But that has to be balanced against the user experience, because if it takes a long time to swap from voice mode to video mode, users will choose a different device.

“With the transition to 3G to 4G to 5G, and wireless, battery-operated devices, the concept of peak power has become even more important,” said Shekhar Kapoor, director of product marketing for Synopsys’ Design Group. “Most of the customer growth that we see is coming in these applications. Power is at the forefront. Ten or twenty years ago, when you’d go buy a PC, you’d look at just the frequency. Now you’re looking at the power numbers. It’s the same in communications. 3G used to be voice and data, 4G was broadband, and 5G is like trillions and trillions of devices—and all kinds of devices. Power has become more important, driving the need for more and more power management techniques, and we have to get into the flows very early on to manage the power because it is no longer acceptable to wait to see what the peak power is going to be. Engineering teams must determine what can be learned from the early vector data on a design, and use that toward the system to keep the designers on track, and to see that they are not diverging too much away from the goal.”

Mobile infrastructure issues
Alongside 5G are demands to move more computing into the cloud, ubiquitous connectivity, and a much more flexible and robust infrastructure. To make these things a reality, however, infrastructure improvements associated with a shift from 4G to 5G will likely include a hike in cell edge data rates from 10 Mbits/sec to more than 1 Gbit/sec, a trebling of spectral efficiency, a 50% gain in energy efficiency, and an increase in supported mobility from 350 km/hour to 500 km/hour.

“Consider an example where mobile phone users in a fast moving train, using Gbps data rates, encounter a change in base station,” said Preeti Gupta, director of RTL product management at ANSYS. “This will lead to large step functions in power as the new base station establishes the connections and starts to serve this large set of users while the previous base station releases the connections. The 5G systems must naturally be able to serve such needs with the sub-1ms low latency.”

This creates challenges for chip design teams. “For a semiconductor design, peak power per cycle—or across a set of contiguous cycles or across much longer cycles—are all important for thermal and power integrity of the system,” Gupta said. “Peak power impacts the power grid resources, including decoupling capacitance, and di/dt events impact the voltage supply levels on the chip and the package (L.di/dt). Detecting peak power and step function scenarios early is crucial for robustness, cost and schedule considerations.”
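The L·di/dt effect Gupta mentions can be put on the back of an envelope: a fast current step through the package inductance produces a supply droop proportional to the rate of change. The inductance value and current waveform below are assumptions for illustration only:

```python
# Back-of-the-envelope sketch of package droop, V = L * di/dt.
# L_PKG and the current samples are invented, not real package data.

L_PKG = 0.5e-9   # henries, assumed effective package inductance
DT = 1e-9        # seconds per sample (1 ns)

def worst_didt(current):
    """Largest current change per sample interval, in A/s."""
    return max(abs(b - a) for a, b in zip(current, current[1:])) / DT

# Hypothetical current waveform (amps): a block wakes up around 3 ns.
i = [1.0, 1.0, 1.1, 1.6, 1.5, 1.5]

droop = L_PKG * worst_didt(i)
print(f"worst-case supply droop ≈ {droop:.2f} V")
```

Even a half-amp step in one nanosecond produces a sizable droop with these numbers, which is why detecting such step-function scenarios early matters for grid and decap sizing.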

Gupta pointed to several significant changes this has caused in design methodology:

  1. Designers are now running application scenarios on emulators early so they have a fair chance of isolating realistic peak power cycles. Emulators run fast and also enable visibility into software and hardware interactions.
  2. New power profiling engines have emerged that make it practical to generate per-cycle power profiles for millions to billions of cycles of activity, with performance several orders of magnitude faster than traditional approaches. This is done early, at RTL, when meaningful high-impact design changes can still be made.
  3. The front-end and back-end design teams are forging tighter flows. Power grid analysis is leveraging RTL capacity and performance to improve sign-off coverage through smart identification of power-critical cycles. Big-data-based tools give this a further boost, with performance and capacity for the longest vectors and the largest designs, while facilitating analytics across logical and physical domains. In one instance, a designer switched on different portions of logic sequentially in time rather than turning it all on at once, accepting higher average power for a short duration in exchange for a significant reduction in di/dt. This may be a better alternative than over-designing the power delivery network to accommodate the excessive di/dt event.
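The trade-off in that last point is easy to see numerically: staggering block turn-on spreads the current ramp over several steps, cutting the worst di/dt even though total power arrives sooner on average. The block currents below are invented round numbers:

```python
# Sketch of staggered vs. simultaneous turn-on and its effect on di/dt.
# Four hypothetical blocks drawing 1 A each; DT is the step interval.

DT = 1e-9  # seconds per step

def didt_peak(current):
    """Worst current slew over one step, in A/s."""
    return max(abs(b - a) for a, b in zip(current, current[1:])) / DT

# Total current (amps) per step. Simultaneous: all four blocks at step 2.
simultaneous = [0, 0, 4, 4, 4, 4, 4, 4]
# Staggered: one block per step, steps 2 through 5.
staggered = [0, 0, 1, 2, 3, 4, 4, 4]

print(didt_peak(simultaneous))  # ≈4e9 A/s
print(didt_peak(staggered))     # ≈1e9 A/s, a 4x smaller worst-case slew
```

The staggered schedule carries partial current for a few extra steps (higher average power briefly) but quarters the worst di/dt, relaxing the demands on the power delivery network.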

Fig. 1: Heat map of what can go wrong. Source: ANSYS

The problems are equally challenging on the mobile communications infrastructure side. Stephen Kovacic, director of technology at Skyworks Solutions, pointed to a paper showing that the peak-to-average power ratio (PAPR) of waveforms is set to increase with 5G. It already has risen from 2:1 for 3G (WCDMA) to 5:1 for 3.5G (HSUPA) and 7:1 for 4G (LTE/OFDM).

PAPR has a direct impact on cost per bit, and peak power management is part of that cost and a key concern for 5G long range mobility and high data rates. The power amplifier, for example, consumes the most power in the base station. A signal with higher PAPR degrades the efficiency of the power amplifier and increases the power drawn.
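The reason PAPR climbs with multi-carrier waveforms can be demonstrated with a toy calculation: summing many subcarriers (OFDM-like) occasionally aligns their peaks, producing a waveform far peakier than a single carrier. The subcarrier count and sample count below are arbitrary choices for the sketch, not 5G parameters:

```python
# Toy PAPR comparison: single carrier vs. a 16-subcarrier sum.
import math

def papr_db(samples):
    """Peak-to-average power ratio of a sampled waveform, in dB."""
    power = [s * s for s in samples]
    return 10 * math.log10(max(power) / (sum(power) / len(power)))

N = 1024
t = [n / N for n in range(N)]

single = [math.cos(2 * math.pi * 8 * x) for x in t]
ofdm_like = [sum(math.cos(2 * math.pi * k * x) for k in range(1, 17)) / 16
             for x in t]

print(f"single carrier PAPR: {papr_db(single):.1f} dB")    # ≈3 dB
print(f"16-subcarrier PAPR:  {papr_db(ofdm_like):.1f} dB")  # ≈15 dB
```

A power amplifier has to be backed off far enough to pass those rare peaks cleanly, so higher PAPR directly means lower amplifier efficiency and more power drawn, as described above.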

Impact in the data center
The same kinds of issues being dealt with at advanced nodes are showing up inside racks of servers in the data center, too. There is far more data to process than ever before, but thermal limits on the server racks cap how fast servers can run and how quickly they can turn on.

“This is all directly related to power dissipation and density,” said Frank Ferro, senior director of product management at Rambus. “If you look at SerDes, the first question customers ask is, ‘What is the PPA?’ And they get there with memory, as well, as you go up in speed. These companies may budget a certain percentage for memory, SerDes, the processor and the PHY. If you look at some of the networking chips, they give off a lot of heat. If the peak power is not within expectations, they knock you out of the running early.”

Ferro noted that scaling SerDes from 28nm processes to 14nm didn’t result in much improvement in performance. He said that is changing, in large part because of a mix of digital circuitry with analog.

“You always need to look at the architecture for the digital side, because it’s easier to port, but it’s also a challenge to get the performance up,” he said. “But there’s also a challenge on the performance side. If you look at what’s been going on in Ethernet, it took seven years to move from 1 gigabit per second to 10 gigabits per second. Now, 28G is the workhorse and we’re seeing companies asking for 56 and 112. The rate of development is phenomenal.”

One of the big issues inside data centers is the standard server rack chassis, which is limited to 250 to 300 watts. “If you’re dealing with gigabit embedded SRAM, that’s more than 400mm²,” said Lisa Minwell, eSilicon's director of IP marketing. “It puts a lot of pressure on the memory subsystem. If you’re running that at 1.8 GHz worst case, that can boost the temperature.”

The solution, she said, is to increase the number of memory operations per cycle using a multi-core configuration. “That requires a new algorithm and a different way of measuring it because you’re dealing with operations per cycle per square millimeter.”
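The metric Minwell describes trades raw frequency for throughput density. As a rough sketch (with entirely hypothetical configurations and figures), a multi-bank memory can win on operations per cycle per square millimeter even while costing more area:

```python
# Rough sketch of comparing memory subsystems by ops/cycle/mm².
# Both configurations and all figures below are hypothetical.

def ops_per_cycle_per_mm2(ops_per_cycle, area_mm2):
    return ops_per_cycle / area_mm2

# One fast single-port macro vs. a four-bank design that services
# more accesses per cycle in 2.5x the area.
single_port = ops_per_cycle_per_mm2(ops_per_cycle=1, area_mm2=0.50)
four_bank = ops_per_cycle_per_mm2(ops_per_cycle=4, area_mm2=1.25)

print(single_port)  # 2.0 ops/cycle/mm²
print(four_bank)    # 3.2 ops/cycle/mm²
```

Measured this way, the multi-bank configuration delivers more throughput per unit area without pushing the clock, and with it the temperature, higher.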

Changes at the design level
These challenges have not gone unnoticed by tools vendors. This is a big opportunity, and there is no shortage of work underway to solve peak power and other related power problems.

No matter whether the topic is 3G, 4G or future 5G base stations, power consumption always has been a problem. Max Odendahl, CEO of Silexica, pointed out that some engineering teams have tackled it with Microsoft Excel, but this approach is quickly running out of steam.

“They have 50,000 little data points and then try to make it work based on the timing requirements of the mobile communications standard,” Odendahl said. “Those have a lot of processor cores, so we’re talking about 50,000 different pieces to 500 heterogeneous cores. All they can do is some kind of load balancing, hoping that it will work. There was no time to think about how to get this power optimized. This was out of their league because there are so many different timing constraints, both in throughput and latency, and you need to fulfill the standard and so much other stuff. So while peak power has always been a concern, now with [automated] tooling this is something you can do for the first time.”

To a large extent, this is embodied in the whole “shift left” idea. “The main challenge is that you are running these emulators, which are where you design all of the software architecture and power budgets, but they are done at a level of abstraction which is very different from what you will see when you do the physical design,” said Haroon Chaudhri, director of R&D for power at Synopsys. “There is a level of detail that isn’t available at the beginning, which comes into the picture later on. We try to keep it all synchronized in some way, so that for what you thought was a high-power area you can ask: do you see the corresponding behavior when you get into physical design? That’s where the big surprise comes. Unfortunately, generating stimulus for a 300 million-instance design is not practical. You have voluminous amounts of data from the emulator, but to use it in a clever way to keep the connection between what you thought you designed and what you have implemented is the biggest challenge. It is where design teams struggle, because as they go deeper into physical design they start seeing differences in power profiles compared to what they saw at the beginning, when they just had RTL and emulation. That’s a huge amount of cost they face to keep this power profile in line with what they want. They spend a lot of time and money on tools and systems.”

To improve this process, there are machine-learning algorithms being developed across the industry, Chaudhri said. “Typically what happens in this process is that a previous design’s power profile is the driver for the newer design. Since they have successfully done a design that maintained the profile, in the new design they are aware of the optimization moves they made that would preserve power, or in some way make it worse, and they want to go down step by step and make sure they synchronize those at every stage to get a more predictable power profile. Where machine learning comes in is with a previous design, which may have been smaller with less functionality. Because of the physical behavior of silicon, they can capture certain information and determine, had they made ‘this’ move during physical design versus another, what the impact was. You’re not working in a vacuum. You have some information to move optimization in a certain way so that you get the same profile you started with.”

While these capabilities have been included in tools for a few years, the entire effort is getting more formalized within many of the leading EDA vendors today.

And even though the power challenges are mounting, with continued focus along with new technologies, designers have more ammunition than ever to attack the power challenge.

“The area we care about the most is profiling the vectors to find where the design consumes the most power,” said Cadence’s Zhao. “That’s where you want to analyze hot spots, and it could be the IR drop is too strong so the chip failed, or the timing and EM. All of it is related.”

This is especially true when a design is migrated from 10nm to 7nm. Here, design teams are facing a lot of new challenges in how to look at power, he said. “This is not just on the power sign-off side. It has to be considered in the architecture, at the RTL, even at the behavioral level. If your architecture is wrong and your peak power is not manageable at the end of the design cycle, that’s why there is a need for a full power flow. It’s not just a point tool at the end, where if you were to find a big leakage, what are you going to do about it? There is no room to fix it anymore, because the function is already set, the architecture is set, and the place and route is crowded already.”

Related Stories
The Time Dimension Of Power
Power is a complex multi-dimensional, multi-disciplinary problem. Does your flow address all of the issues?
Transient Power Problems Rising
At 10/7nm, power management becomes much more difficult; old tricks don’t work.
Toward Real-World Power Analysis
Emulation adds new capabilities that were not possible with simulation.
Tech Talk: 7nm Power
Dealing with thermal effects, electromigration and other issues at the most advanced nodes.