Power Management Becomes Top Issue Everywhere

Concerns about power are impacting everything, and AI is complicating it.


Power management is becoming a bigger challenge across a wide variety of applications, from consumer products such as televisions and set-top boxes to large data centers, where the cost of cooling server racks to offset the impact of thermal dissipation can be enormous.

Several years ago, low-power design was largely relegated to mobile devices that were dependent on a battery. Since then, it has been creeping into a variety of applications, regardless of whether a device runs on a battery or a plug, and that has only become more pronounced as AI and nearly ubiquitous sensors push the need for more processing everywhere.

In many of these applications, power and cost are closely linked. In consumer devices, for example, lower power allows for smaller form factors, because there is less heat to dissipate and power supplies can be smaller. That, in turn, means lower cost. And in data centers, cooling can account for 30% to 40% of total energy consumption, so any reduction in power improves the overall efficiency of the data center.
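That leverage can be shown with a back-of-envelope calculation. The sketch below uses assumed numbers (not figures from the article) and the simplifying assumption that cooling energy scales with the IT heat load it removes, so every watt saved at the chip saves more than a watt at the facility:

```python
# Illustrative facility-energy model (assumed numbers, not from the article).
# If cooling_fraction = cooling / (it + cooling), then
# cooling = it * f / (1 - f), assuming cooling tracks the IT heat load.

def total_energy(it_kw, cooling_fraction=0.35):
    """Total facility energy (kW) when cooling is `cooling_fraction` of it."""
    cooling_kw = it_kw * cooling_fraction / (1 - cooling_fraction)
    return it_kw + cooling_kw

before = total_energy(1000.0)  # 1 MW of IT load
after = total_energy(900.0)    # the same facility after a 10% chip-power cut
saved = before - after         # ~154 kW saved for a 100 kW chip-level cut
```

With cooling at 35% of total energy, a 100kW reduction in chip power removes roughly 154kW of facility load, which is the amplification the article describes.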

This is reflected in the IP used in these systems, as well. “Semiconductor IP designed for low power is adopting many of the design techniques used for mobile devices,” said Frank Ferro, senior director, product marketing for IP cores at Rambus. “These include power islands so that only the necessary circuits are operating for a given function. Clock gating and power gating are used throughout the design to, again, limit the power consumed by inactive circuits. And other techniques like dynamic frequency scaling allow the device to ‘throttle’ performance based on workloads to save power. Lower frequency usually translates to lower power consumption.”
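The reason frequency throttling saves power, and why pairing it with voltage scaling saves even more, falls out of the classic first-order CMOS dynamic power model. A minimal sketch, with illustrative parameter values that are not from Rambus:

```python
def dynamic_power(alpha, c_eff, vdd, freq):
    """First-order CMOS dynamic power: P = alpha * C * Vdd^2 * f.
    alpha: average switching activity per cycle
    c_eff: effective switched capacitance (farads)
    vdd:   supply voltage (volts)
    freq:  clock frequency (Hz)
    """
    return alpha * c_eff * vdd ** 2 * freq

# Illustrative numbers only.
p_full = dynamic_power(0.2, 1e-9, 0.9, 2.0e9)    # full speed
p_half_f = dynamic_power(0.2, 1e-9, 0.9, 1.0e9)  # frequency scaling alone: halves power
p_dvfs = dynamic_power(0.2, 1e-9, 0.7, 1.0e9)    # lower Vdd too: quadratic extra savings
```

Because voltage enters the model quadratically, dropping Vdd along with frequency (true DVFS) saves considerably more than frequency throttling alone.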

In addition to low-power device architectures, choosing the right process node and design tools is important. “There are process node variants that have devices optimized for the lowest power consumption with low leakage,” Ferro said. “The combination of a low-power process node with a device architecture that takes advantage of low-power design techniques will ensure system designers achieve the lowest power consumption.”

What’s different
Power has always been a factor in design, but it often has taken a back seat to performance and area/cost. In fact, the main focus on power outside of mobile phones for the better part of a decade involved primarily leakage current, which prompted chipmakers to adopt finFETs at 16/14nm, and which is one of the main drivers behind gate-all-around FETs at 3nm.

“From 90nm all the way until 14nm it was all about leakage, and there was so much effort put on addressing that,” said Godwin Maben, Synopsys scientist. “In the industry at that time, we didn’t really focus on dynamic power that much because we were totally focused on leakage. At 14nm the leakage was under control, whereas the dynamic power became a significant part of total power because of the increase in the gate capacitance. That’s when dynamic power really became dominant. The biggest piece, which is hard to address, is how to measure dynamic power.”

To measure dynamic power, good vectors are needed. “To get a good set of vectors, traditionally what used to happen is that the design architect would say, ‘These are the five or six scenarios in which the SoC will operate in the system. Pick those five or six, run power analysis based on this, or take an average of this,’” Maben said. “Those were the ways that were used to measure power. But today, that’s not good enough to address power requirements. Here, emulation-based power analysis has become critical because now when you have your SoC on an emulator, you have the entire system. You could even have a complete software stack. You can run an application that would put your SoC in the right mode, because that’s how your chip is going to operate in certain applications. The vectors coming out of this particular application would be used to measure power.”

Using this approach can be problematic, however. Running one second of an application at a clock frequency above 1GHz generates billions of cycles of data. “There are billions of cycles, which means a terabyte of data, and that’s not useful at all because processing that amount of data for any tool is difficult,” Maben said. “As a result, a lot of effort is being put toward getting the critical region in this entire time window, and this is good for power.”
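The window-finding step can be sketched with a toy heuristic: given a per-cycle toggle trace from emulation, slide a fixed window across it and keep the span with the highest average activity as the candidate power-critical region. The function name and data below are invented for illustration, and real EDA flows are far more sophisticated:

```python
def peak_activity_window(toggles_per_cycle, window):
    """Return (start_cycle, avg_toggles) for the `window`-cycle span with
    the highest average toggle count -- a rough proxy for peak power."""
    best_start = 0
    best_sum = sum(toggles_per_cycle[:window])
    cur = best_sum
    # Slide the window one cycle at a time, updating the running sum.
    for i in range(1, len(toggles_per_cycle) - window + 1):
        cur += toggles_per_cycle[i + window - 1] - toggles_per_cycle[i - 1]
        if cur > best_sum:
            best_start, best_sum = i, cur
    return best_start, best_sum / window

# Toy per-cycle toggle counts; the burst around cycles 3-5 is the region
# an engineer would want to carry forward into detailed power analysis.
trace = [3, 4, 2, 9, 12, 11, 3, 2, 1]
start, avg = peak_activity_window(trace, 3)
```

Extracting a short window like this, instead of feeding a terabyte of cycles into the power tool, is the point of the critical-region identification the article describes.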

Industry work here is primarily aimed at identifying the right vectors to measure so that the same vectors could also be used for optimization. EDA vendors and chipmakers are trying to determine how best to nail this down because dynamic power is the bottleneck for many of the new designs.

Currently, the best approach is to use emulation to perform the power profiling of a design, and because the emulator has the complete SoC at the RTL level, it should be easier to identify windows of interest based on the power profile. This approach does require more mature RTL to get started, which is why engineering groups are coming back around to RTL exploration techniques.

“During the early stage of RTL, there are tools that can give insight into the power distribution, and give guidance back to the designer as far as how to improve the RTL structure or the microarchitecture to reduce power,” Maben said. “Pre-130nm, RTL estimation/RTL exploration was a big deal, and once we moved over to focus on leakage, it took a back seat. Now we are back to that. RTL exploration is not just for power. It’s for timing and floor planning, and for early decision making. For example, during RTL exploration, can I look at physical aspects to create a better RTL?”

With all of this effort being put toward RTL exploration, there soon may be new tools to help designers explore the RTL not just from a power perspective, but also for timing, floor planning, and a number of other tasks.

Another piece of dynamic power in new chips today, including all of the AI-dominated chips and SoCs, is glitch. Maben calls this the biggest new issue that has come into the picture. “Because AI chips are purely arithmetic and data-path components, glitch power makes up a significant portion of total dynamic power. This is an area designers, as well as EDA vendors, are trying to address, but it is quite complicated. Measuring glitch is not easy.”

There are two types of glitch: functional and timing. “To measure glitch, the delays are important, because the arrival time at the input of every gate is the key component in the measurement,” he explained. “If the delta between the arrival times of a gate’s inputs is greater than the propagation delay of the gate, there is a glitch. At the RTL stage there is no delay associated with it; there is no physical information, so how do we quantify this? This is where RTL exploration with physical information becomes critical, where, if we point to the source of the glitch, users can make certain changes from an architecture perspective to minimize glitch.”
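The arrival-time rule can be sketched as a toy inertial-delay check. The function and the picosecond values below are illustrative assumptions, not an EDA algorithm: a transient pulse created by input skew is filtered if it is narrower than the gate's propagation delay, and propagates as a glitch if it is wider:

```python
def glitch_propagates(arrival_a_ps, arrival_b_ps, gate_delay_ps):
    """Simplified inertial-delay model: two inputs of a gate switching at
    different times create a transient pulse as wide as the skew between
    them.  A pulse narrower than the gate's propagation delay is filtered;
    a wider one propagates as a glitch."""
    pulse_width = abs(arrival_a_ps - arrival_b_ps)
    return pulse_width > gate_delay_ps

# Toy numbers in picoseconds.  Note these delays only exist after physical
# information is available -- which is exactly the RTL measurement problem.
wide_skew = glitch_propagates(120, 180, 40)    # 60ps skew > 40ps delay
narrow_skew = glitch_propagates(120, 150, 40)  # 30ps skew < 40ps delay
```

This also shows why the problem is hard at RTL: without physical information there are no delay numbers to plug in, so the check cannot even be evaluated.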

In AI or math-based designs, glitch makes up as much as 30% of the total power, and while there are no tools today that automatically optimize for glitch, some commercial tools can at least help measure it.

That’s one piece of the problem. While power continues to be a problem at every node, managing power consumption and power noise is especially troublesome at smaller nodes, where tolerances are much tighter and margin is increasingly limited.

“Higher device capacitance, interconnect resistance, and current densities at 7nm finFET nodes underscore the importance of dynamic power and thermal management,” said Preeti Gupta, head of PowerArtist product management at ANSYS. “By adopting a predictable and reliable RTL methodology, you can identify and fix areas of potential power issues earlier in the process and make better design decisions.”

Gupta stressed that RTL power analysis enables high-impact power-related decisions early in the design flow by providing a more intuitive environment for identifying, debugging and fixing potential power issues. Compared to the several hours it takes to synthesize the design and to compute gate-level power, RTL power analysis can be completed within minutes. “It is also much easier to simulate design activity at RTL for high coverage,” she said. “All these benefits allow you to explore multiple architectures for best design decisions across various modes of operation. Driven by rigorous tracking of RTL power across different bandwidth scenarios, AMD stated publicly how they reduced power by 70% in a high-performance computing design application.”

AI changes the rules of the game
While power is important in an increasing number of applications, it is the primary limiting factor in AI/ML chips.

“People are trying to say that timing is always the number one factor,” said Maben. “But now, given that power is a limiting factor, designers are asking if they can trade off timing for power, because even though they meet the frequency goal, with this amount of power they will not be able to use the chip. ‘Can I say that instead of 5GHz, if I want to reduce power by 10%, what is the frequency I will be able to get?’ This is because an increase in power leads to thermal runaway, and then temperature increases will demand that the performance is slowed. If you look at any tools, it’s always been timing, timing and timing. And there is no good way to say, ‘Ignore timing and reduce power,’ because that’s never been the case. However, with AI designs, power is really the limiting factor, so everybody is saying given this power, what is the maximum frequency I can reach? Previously, it used to be, ‘This is my design and frequency. What’s the power it consumes?’ Now, it’s the other way around: ‘This is the power I’m looking for. What’s the max frequency I can reach?’ This is key because some of the chips I’m looking at are a few hundred watts.”
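The inversion Maben describes, fixing the power budget and solving for the maximum frequency, falls directly out of a first-order power model. A sketch with made-up budget and device numbers (none of these values come from the article):

```python
def max_frequency(power_budget_w, p_leak_w, alpha, c_eff, vdd):
    """Invert P_total = P_leak + alpha * C * Vdd^2 * f to answer
    'given this power budget, what is the max frequency I can reach?'"""
    p_dynamic_budget = power_budget_w - p_leak_w
    if p_dynamic_budget <= 0:
        return 0.0  # leakage alone already exceeds the budget
    return p_dynamic_budget / (alpha * c_eff * vdd ** 2)

# Illustrative numbers: a 300 W budget with 60 W of leakage at Vdd = 0.8 V.
f_max = max_frequency(300.0, 60.0, 0.15, 5e-7, 0.8)  # ~5 GHz
```

The same function also captures the old question in reverse: feeding in a known frequency and reading off the implied power is just the model run forward, while the AI-era question runs it backward from the budget.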

The advent of AI is creating design opportunities ranging from better user experiences with consumer products to automated quality control on factory floors. The list of AI-driven use cases is growing exponentially. The performance capabilities driving these devices are underpinned by innovative signal processing and machine learning (ML) techniques.

Dipti Vachani, senior vice president and general manager for Automotive & IoT at Arm, said that for IoT and embedded applications that are limited by power, cost and size, the real emphasis should be on computational efficiency rather than computational performance. This means the chip architecture needs to closely map to the overall requirements of the target application, minimizing silicon area and device cost. This poses a challenge for silicon designers aiming to differentiate their microcontroller with the greatest level of intelligence possible, she said.

To this point, Arm announced significant additions to its AI platform, including new machine learning IP and the Ethos-U55, a neural processing unit that acts as a co-processor to its new Cortex-M55 processor. The company claims the combination can boost ML performance by as much as 480X.

Further, as the IoT and edge intersect with AI advancements and the rollout of 5G, more on-device intelligence will drive the creation of smaller, smarter, more capable, but highly cost-sensitive devices. Delivering this intelligence on microcontrollers designed securely from the ground up will reduce silicon and development costs and speed up time to market for DSP and ML capabilities on-device, Vachani said.

If you build it, will they come?
Synopsys’ Maben said the real keys to this problem are capacitance and designers’ willingness to adopt new techniques.

“We try to look at capacitance pretty early during the synthesis stage and try to do some logical restructuring based on the physical information,” he said. “For example, if I look at the toggle rate and the capacitance, and if the nets are long or short, maybe we could promote the nets to a higher layer or we could double space. We could do double width. There are tons of new things, but at the end of the day, it’s about efficiently managing the capacitance from a power perspective.”
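A toy version of that triage: score each net by the first-order switching-power product of its toggle rate and capacitance, and surface the worst offenders as candidates for promotion to a higher metal layer or wider spacing. The net names and values below are invented for illustration:

```python
def switching_power_ranking(nets, vdd=0.8, freq=1.0e9):
    """Rank nets by dynamic-power contribution, toggle_rate * C * Vdd^2 * f,
    so high-activity, high-capacitance nets surface first as candidates for
    layer promotion, double spacing, or double width."""
    ranked = sorted(
        nets,
        key=lambda n: n["toggle_rate"] * n["cap_f"] * vdd ** 2 * freq,
        reverse=True,
    )
    return [n["name"] for n in ranked]

# Invented example nets: a busy bus, a fast-toggling enable, a quiet reset.
nets = [
    {"name": "bus_a", "toggle_rate": 0.40, "cap_f": 2.0e-13},
    {"name": "clk_en", "toggle_rate": 0.90, "cap_f": 5.0e-14},
    {"name": "rst_n", "toggle_rate": 0.01, "cap_f": 3.0e-13},
]
worst_first = switching_power_ranking(nets)
```

Note that the high-capacitance reset net ranks last because it almost never toggles; it is the toggle-capacitance product, not either factor alone, that identifies where the capacitance management effort pays off.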

The big question is when, or even whether, design engineers will adopt new techniques to solve these problems. “Do I need to functionally verify? Will that increase my complete cycle for tape-out? For example, for people doing power gating, the biggest impact is to functionally validate at the RTL level. My simulation or functional verification cycle will be much longer, and it is a deviation from what we normally do. So if we give techniques that don’t impact functionality, purely on the implementation cycle, then people are willing to adopt them, as long as it is automatic and verification such as logical equivalence checking is possible to make sure that whatever is functionally verified is actually equivalent to the gates. If a new technique demands that you step out of the normal cycle and verify separately, then people are reluctant to adopt it.”

Finally, Maben pointed toward another possible solution for leakage power. Even though it’s under control at 14nm and below, it still represents a big chunk of the total power.

“Engineering teams are now saying, ‘I’m doing power gating, I’m doing multi-voltage. Is there anything else I can do to reduce leakage?’ This is where we start looking at the N-well of the design. Typically, most of the time it is on. Even though you say something is power gated, the N-well was kept on. Now, people are trying to see if they can turn off the N-well. This has led to a new domain, which is designing with split and merged N-wells. There, you need a way to power down a block of logic completely, with its N-well also powered down. If I have always-on logic, whose N-well is on, sitting next to powered-down logic with a different N-well, potential issues like N-well spacing come into the picture. To deal with this, it’s not uncommon to see new cells, called insulated buffers and insulated inverters, being designed in. Designers have started designing these insulated cells, called split N-well cells, where each individual cell has two N-wells. So if I’m taking a signal from an always-on N-well through a shutdown N-well, I need these special cells, which isolate the two N-wells. We are seeing many engineering teams already using these in order to squeeze out the last bit of leakage,” he said.




Kevin Cameron says:

The reason it is more of a problem now is that with the end of scaling we need to do 3D ICs, and old approaches to performance (Intel) are too power hungry for that.
