Focus Shifts To Wasted Power

Low power is no longer enough. Is all of the power consumed usefully? Low energy is the new goal.

Mobile phones made the industry aware of power, but now the focus is shifting to the total energy needed to perform a task. Activity that is unnecessary to perform the intended task is wasted power, and reducing it requires some new methodologies and structural changes within development teams.

There is a broadening awareness about power. “The companies doing SoCs for mobile lead the charge with power awareness,” says Rob Knoth, product management director at Cadence. “When you look around today, be it someone designing an IoT application, or a consumer mobile device, all the way to something that plugs into the wall, or a server, power is becoming a differentiator and a requirement.”

Both extremes are becoming equally concerning. At the server level, complexity and thermal are the drivers. “AI is making the sizes of the SoCs grow at a rapid pace,” says Jay Roy, group director for SoC Power Continuum at Synopsys. “For companies like Amazon or Facebook, which are trying to adopt AI, cooling requirements are becoming dominant in terms of pricing. They are forcing the people who are working on the design to become much more power aware.”

At the other extreme, some devices need to conserve every Joule of energy. “A battery-free Bluetooth Low Energy device is well-suited for low data rate applications,” says Srinivas Pattamata, vice president of business development for Atmosic Technologies. “In many IoT applications Bluetooth connectivity is the dominant power-consuming part of the system. However, tens of milliwatts of power consumption is too high for sensing and data applications that run on small batteries for years.”

There is widespread agreement on that point. “The concept of wasted power is important terminology,” says Preeti Gupta, director for RTL product management at ANSYS. “AMD has defined it by saying that power must do work. Idle power is equal to wasted power.”

How is this focus different? “Power is instantaneous, energy is real work done over time,” explains Cadence’s Knoth. “Energy is a better quantity to analyze and optimize if you are concerned with things like battery life, or system architecture, to more efficiently accomplish a task. Energy requires a more sophisticated use of functional stimulus, power analysis and optimization techniques.”
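The distinction Knoth draws can be sketched numerically. The snippet below is a hypothetical illustration, with made-up power figures, of why energy rather than instantaneous power is what determines battery life: a "race to idle" strategy may burn more watts momentarily yet still lose, or win, on total joules.

```python
# Hypothetical illustration: energy, not power, is what drains a battery.
# A core can finish a task fast at high power, or slowly at low power.
# All numbers are invented for illustration.

def energy_joules(power_watts: float, seconds: float) -> float:
    """Energy is power integrated over time (constant power here)."""
    return power_watts * seconds

# "Race to idle": run fast at 2 W for 1 s, then sleep at 50 mW for 9 s.
fast = energy_joules(2.0, 1.0) + energy_joules(0.05, 9.0)   # 2.45 J over 10 s

# Run slowly at reduced voltage/frequency for the whole 10 s window.
slow = energy_joules(0.2, 10.0)                              # 2.00 J over 10 s

print(fast, slow)  # the lower-energy option wins for battery life
```

Both strategies stay under very different power ceilings, but only comparing their energy over the full task window reveals which one accomplishes the task more efficiently.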

Traditional techniques
Energy cannot be optimized at any single point in the flow. It requires continuous attention. “The notion of power is different for different phases of the design process,” points out Synopsys’ Roy. “It is not limited to hardware. It also comes up in the software team, and in integrating the entire SoC. At the end of the day they all care about the same thing, which is that I need to build something where the power requirements are minimized as much as possible.”

Some aspects of power optimization are well-established. “Power is composed of three main elements: leakage, short circuit power, and switching power,” says ANSYS’ Gupta. “For each of these, the design community has found ways to optimize them. There are other aspects like glitch power, which become significant in certain classes of designs — particularly designs that contain a lot of XOR functions. So DSP and arithmetic types of applications can have glitch problems. Capacitance, voltage, and frequency are the things under the designer’s control. We see people play around with the supply voltage, be it scaling it down or using multiple voltage domains or using power gating. We hear about dynamic frequency scaling, and clock gating is about shutting off redundant activity. There are a lot of techniques that are prevalent. And they are being applied early on through algorithmic considerations up until the last stages, where you are doing multi-Vt optimizations or pin swapping, or path balancing to reduce power.”
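The knobs Gupta lists fall out of the standard switching-power relationship, P_dyn ≈ α·C·V²·f. The sketch below, with illustrative values not tied to any real process node, shows why voltage scaling pays off quadratically while clock gating attacks the activity factor α directly.

```python
# Sketch of the switching-power relationship the designer controls:
#   P_dyn ≈ alpha * C * V^2 * f
# (activity factor, switched capacitance, supply voltage, clock frequency).
# Values are illustrative, not from any real process node.

def switching_power(alpha: float, c_farads: float, v_volts: float, f_hz: float) -> float:
    return alpha * c_farads * v_volts ** 2 * f_hz

base   = switching_power(0.2,  1e-9, 1.0, 1e9)  # 0.200 W baseline
scaled = switching_power(0.2,  1e-9, 0.8, 1e9)  # voltage scaling: quadratic win
gated  = switching_power(0.02, 1e-9, 1.0, 1e9)  # clock gating cuts alpha 10x

print(base, scaled, gated)
```

Leakage and short-circuit power need their own techniques (multi-Vt cells, power gating), but this single formula explains why supply-voltage and activity reduction dominate the dynamic-power conversation.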

Fig. 1: Power analysis and optimization at various levels of abstraction. Source: Cadence

Each stage of the flow can impact power. “At the layout level, you are talking about 5% to 10%,” says Knoth. “When you back up a little farther and talk about synthesis, where you are going from RTL to gate, you can get up to 20% potential. But if you go back one more step to the architectural level, where you are actually deciding on data path and multiple cores running parallel processing algorithms, or slowing down the clock frequency to reduce the overall power to accomplish a task, there you are talking about 80% power optimization potential.”

Taking it a step further
Thirty years ago, functional verification was performed by looking at waveforms. “How effective would your verification methodology be today if you looked cycle by cycle, eyeballing waveforms?” asks Knoth. “You would get nowhere really fast. Similarly with power, teams have to apply more quantitative approaches, have multiple checks throughout the flow, and use techniques like regression testing.”

We are only in the early stages of defining many of those flows today. “For any notion to take hold, the background or the foundation of existing tools and methodologies needs to be present,” says Roy. That comes down to technologies like emulation, virtual prototyping, and suitable vector sets.

Why is emulation important? “Emulation is enabling people to suck in the entire SoC, and is scalable, so they are able to run real scenarios,” continues Roy. “That was not possible 10 years ago. First it was used for functional verification, but now they also want to use it to see what is happening with power. Software is important and it can have a large impact on how much power is consumed. The interaction between hardware and software is slowly coming into focus.”

Qazi Ahmed, product marketing manager for Mentor, a Siemens Business, provides an example of those interactions. “Automotive, ML and AI applications tend to have large numbers of memory accesses. These can be profiled, and you can then save power by mapping the most frequently accessed addresses to another, smaller memory, by splitting the memory, or by utilizing registers instead of memory. Techniques such as these can save significant power, sometimes more than 30% of dynamic memory power.”
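The profiling step Ahmed describes can be sketched in a few lines. This is a toy model, with an invented access trace and small-memory size, showing how a trace identifies the hot addresses worth mapping to a smaller, cheaper-to-access memory.

```python
# Toy profiling sketch of the memory-splitting idea: count accesses per
# address from a trace, then earmark the hottest addresses for a small
# memory. The trace and the slot count are invented for illustration.

from collections import Counter

trace = [0x10, 0x10, 0x44, 0x10, 0x44, 0x80, 0x10]  # made-up access trace
counts = Counter(trace)

SMALL_MEM_SLOTS = 2
hot = {addr for addr, _ in counts.most_common(SMALL_MEM_SLOTS)}

hot_hits = sum(counts[a] for a in hot)
coverage = hot_hits / len(trace)  # fraction of accesses served by the small memory
print(sorted(hot), round(coverage, 2))
```

In a real flow the trace would come from emulation or a virtual prototype running actual software, which is exactly why those platforms matter for this kind of optimization.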

Sometimes, this analysis can be conducted using a virtual prototype, which can provide much more accurate figures than the techniques used today. “Many people are still using spreadsheets to analyze power and they are taking wild guesses,” says Knoth. “This is where the biggest gains are going to come from, closely followed by optimization made through high-level synthesis (HLS).”

The impact of this kind of approach is getting more recognition. “They may have SystemC models or ISS models for key components, so we are starting to have this discussion,” says Roy. “At the architectural level you have a lot more room to be able to play with that kind of thing. For that kind of analysis to be more productive, the underlying engine that estimates the power needs to be reasonably well correlated.”

Tying power analysis into high-level synthesis is another area where significant improvements have been made, and that will become even more important as more things are added to the IoT and as the edge begins to take shape. These kinds of issues already are starting to spill over from traditional HLS types of applications into power, where HLS can provide a high-level view of the tradeoffs.

“You want to increase throughput and reduce area, and at the end of the day you want to reduce power, which is tied to throughput and area,” says Max Odendahl, CEO of Silexica. “We’ve been having a number of discussions about centralized computing. A significant amount of the battery power is going into centralized computing. So you need a centralized architecture and you need centralized computing for performance reasons, but then you’re wasting all of your battery power.”

The big differentiator
There is one foundational piece that makes the difference between power and energy – vectors. “To make your flow more power-aware, you need functional stimulus,” says Knoth. “But people will just grab some stimulus without fully understanding the quality of that stimulus and what that stimulus represents. An implementation engineer traditionally has done static analysis, and that is fine for optimizing power. If you move from static analysis to actually using functional vectors, then you move from power to energy.”

This requires a change in team structure. “Activity has a first-order impact on power consumption,” says Gupta. “However, the verification team was responsible for generating activity scenarios, and the power methodology team was responsible for running the different power analysis tools and recommending to the designers how they should change their design to incorporate power reduction techniques. There was a disconnect. Some companies are reducing that gap by creating power methodology groups that own both the vector generation as well as the power analysis and reduction. They are taking care of that throughout the entire flow.”

The necessary activity can be large. “Activity is controlled by software scenarios, and the execution platforms that existed in the past were not able to scale, so they couldn’t see what the power profile was,” says Synopsys’ Roy. “That is changing. Power tools used to target block-level designs, with 10,000 to 20,000 cycles. But now we have to be able to handle complete SoC designs and much larger activity, on the order of billions of cycles. The next generation of platforms for real SoC-level power needs to scale and have the capacity to handle large activity and larger designs.”

How does activity impact design tools? “An example is mapping,” says Knoth. “During synthesis you have an elaboration step, a mapping step, and then you start dealing with different stages of optimization involving the gates and wires. At the mapping stage, you are going from a very high-level elaborated version of the RTL to choosing things like multipliers and adder architectures. Having functional stimulus drives that optimization. Making the right choices in synthesis, in an automated fashion, really helps reduce the overall energy footprint of the product. We have gone from something that was, ‘Let’s meet frequency or area and then assess the power,’ to one where power is a driving factor in the overall cost function so that you know you are making the right choices early on in the flow.”

This also creates a potential problem. “You may know the average or typical activity profile that would go through a block, and those patterns provide information to the synthesis and place and route tools,” says Roy. “They can then do a better job of reducing the power of the implementation. But if you are providing activity guidance based on experience from a previous design, the system-level tasks that I will be supporting can change the activity profile significantly. If I do not account for that, I am optimizing for the wrong conditions.”

Knoth agrees. “There are big pitfalls because people may be optimizing for a case that is not representative of the end-use cases. So they may under-optimize, and you may miss a peak event. Or they will use a worst case that an end user would never see, and you could over-optimize the design, impacting the schedule. You need to make sure that instead of just augmenting your methodology with functional vectors, you have the implementation and verification people and the architects sit down and talk to each other and understand the use cases, stimulus, how representative it is, etc.”

But you cannot just create a single vector set. “Scenarios change in terms of the goals,” says Gupta. “You may have different modes of operation. Consider OS boot or running certain applications. Sometimes you may be interested in the average power consumption because that shows you the thermal impact. As you go down to consider things like voltage drop, you are not interested in cycle-based power but transient power. What is the peak current value?”

Sometimes those concepts can be at a higher level than actual vectors. “We take each sub-block and consider it for the complete range of device operations, including sleep, wakeup, transmitting and receiving,” says Atmosic’s Pattamata. “That means that every microampere of power consumption can be properly considered.”

Strategies emerge
Companies are still putting their initial methodologies in place. Gupta describes a strategy employed by the Qualcomm GPU group for finding wasted power. “In order to execute a certain function, you consume a certain amount of energy. If I wanted to complete that function in time t, versus time 2t, the energy should be the same. Power consumption will vary, because power is energy divided by time. If you have more time to complete a function, the power you consume will be lower, but the energy needed to complete the function should remain the same. However, if the energy changes with the amount of time taken to execute that function, that may point to a power inefficiency.”
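That energy-consistency check is easy to express numerically. The snippet below is an illustrative sketch, with invented power measurements, of the comparison Gupta describes: the same function run over window t and window 2t should consume the same energy, and any surplus flags activity that does no useful work.

```python
# Sketch of the energy-consistency check: run the same function over
# window t and window 2t; the energy (average power x time) should
# match. A mismatch hints at wasted power, e.g. clocks toggling while
# the block idles. Power numbers are invented for illustration.

def energy(avg_power_w: float, seconds: float) -> float:
    return avg_power_w * seconds

e_fast = energy(1.0, 1.0)   # task finished in t = 1 s at 1.0 W average
e_slow = energy(0.6, 2.0)   # same task over 2t = 2 s at 0.6 W average

# Ideal: e_slow == e_fast. The surplus joules point at activity that
# does no useful work during the extra idle time.
waste = e_slow - e_fast
print(waste)
```

In a real methodology both measurements would come from vector-driven power analysis of the same workload under different frequency settings.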

Fig. 2: Energy methodology deployed at Qualcomm. Source: ANSYS

Another strategy is to utilize more sequential analysis on the design. “Companies can perform a sequential analysis to check if the output of a bunch of flops is either unobservable or stable under a given condition,” explains Mentor’s Ahmed. “This enables tools to derive stronger enable expressions for those flops, which can reduce dynamic power of CPU/GPU designs by 5% to 10%, and for network and communication designs the amount of power savings can go beyond 50% in some cases. Similar fine-grained gating conditions exist for memory, called memory gating, which can also save more than 25% of memory dynamic power. Data operator inputs can also be gated, known as operand isolation, and save 5% to 10% of dynamic power. Unwanted toggles on the inputs of a memory can be gated when the memory is disabled to save power, albeit small.”
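The operand-isolation technique Ahmed mentions can be modeled with simple toggle counting. This is a toy stand-in for real power analysis, with invented input values and select signals, showing why holding a data operator's inputs stable on cycles where the result is unused removes useless switching.

```python
# Toy model of operand isolation: gate a data operator's inputs when its
# result is not consumed, so input changes don't cause useless switching.
# Toggle counting here is a crude stand-in for real power analysis.

def toggles(signal):
    """Count value changes between consecutive cycles."""
    return sum(a != b for a, b in zip(signal, signal[1:]))

inputs = [3, 7, 7, 2, 9, 9, 4]   # operand values arriving each cycle
select = [1, 0, 0, 0, 1, 0, 1]   # 1 = result actually consumed this cycle

# Without isolation, the operand changes whenever the input changes.
ungated = toggles(inputs)

# With isolation, the operand only updates on cycles where it is used.
held, gated_stream = inputs[0], []
for val, used in zip(inputs, select):
    if used:
        held = val
    gated_stream.append(held)
gated = toggles(gated_stream)

print(ungated, gated)  # fewer operand toggles => less wasted dynamic power
```

The same hold-when-unused pattern underlies the memory gating and stronger flop enables described above, just applied to different parts of the design.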

This kind of analysis also exists at higher levels. “Consider data flowing across block boundaries,” says Roy. “If I have a producer and a couple of consumers, you may notice that the producer is very active. But if the blocks that are supposed to consume the data are idle, you may as well stop transmitting. The cross IP interactions with respect to activity have a lot more room to be explored.”
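Roy's producer-consumer example can be sketched the same way. The model below is hypothetical, with invented readiness patterns, showing how gating the producer on consumer readiness eliminates transfers whose data nobody will use.

```python
# Sketch of the cross-block idea: a producer that keeps pushing data
# while its consumers are idle burns transfer power for nothing.
# Gating production on consumer readiness avoids that waste.
# Readiness patterns are invented for illustration.

def transfers(producer_ready, consumers_ready, gate_on_consumer):
    """Count cycles on which the producer actually sends data."""
    sent = 0
    for p, c in zip(producer_ready, consumers_ready):
        if p and (c or not gate_on_consumer):
            sent += 1
    return sent

producer  = [1, 1, 1, 1, 1, 1]
consumers = [1, 0, 0, 1, 0, 1]   # 1 = some consumer will use the data

wasteful = transfers(producer, consumers, gate_on_consumer=False)  # every cycle
gated    = transfers(producer, consumers, gate_on_consumer=True)   # only useful ones

print(wasteful, gated)
```

Spotting this pattern requires activity visibility across IP boundaries, which is why Roy frames it as system-level rather than block-level analysis.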

Change takes time. “The lack of understanding about how important scenarios are for power is the biggest mistake that teams are making,” warns Gupta. “This is not just for measuring power, but for creating a power efficient design. If the design will be idling 90% of the time, you need to design for that kind of operation. That is not well understood.”

Power and energy are concerns that cut across all aspects of system development. Development teams have to come together to solve these problems because there is no one person, or part of the organization, that can do it on their own. Design and verification teams both play an important role. Each phase of design has a set of responsibilities for the appropriate analysis and optimization, but vector generation traditionally has been the role of the verification team. This now drives the optimization strategy by providing the vector sets required at each stage of the flow.
