Orders of magnitude improvements in performance and efficiency are possible, but getting there isn’t easy.
Energy optimization is beginning to shift left as design teams examine new ways to boost the performance of devices without impacting battery life or ratcheting up electricity costs.
Unlike power optimization, where a skilled engineering team may reduce power by 1% to 5%, energy efficiency may be able to cut effective power in half. But those gains require a significant rethinking of the entire architecture of a system, including how and where processing occurs, what functions are prioritized, how long they need to run, and how all of this is controlled and optimized within a system.
“For engineering teams working on IP design, this is very important, but it’s most impactful for companies that are more vertically oriented, and which own the whole stack — from the firmware that will be written on the device, to what the device is actually going to be,” said Rob Knoth, product management director in the Digital & Signoff Group at Cadence. “There you’re able to achieve more when you’re optimizing that whole system. The bigger your solution space, the bigger the set of variables you have to work with, and the bigger impact you can have. It’s not going to happen for everyone, but for the people who do operate at that level, you’ll see them be able to make some pretty transformational gains.”
This is essential as more devices run on batteries, and as the amount of data that needs to be processed by these devices increases. So what previously was a “nice-to-have” increasingly is becoming a competitive edge. This is as true for electric vehicles, where there is a battle over which company can deliver the longest range per charge, as it is for computers and smartphones, where battery life is a selling point.
“The lifecycle energy requirements of anything, such as a server or a mobile phone, have to be low,” noted Qazi Faheem Ahmed, principal product manager at Siemens EDA. “People typically tend to focus on power at the IP level, and they might try to reduce and optimize power by a certain percentage. Let’s say they save 20% dynamic power. How much does that actually contribute to energy efficiency at the system level? Sometimes it might not do much. You might see overall energy efficiency gains of even less than 1%. And when energy equals power × time, time becomes an important factor because it tells us something about the way the block functions.”
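To put rough numbers on that point, the sketch below (Python, with entirely hypothetical block powers and activity fractions) treats energy as power integrated over time and shows how a 20% dynamic-power cut in one briefly active IP block can amount to well under 1% of system energy.

```python
# Minimal sketch with hypothetical numbers: energy = power * time.
# A 20% dynamic-power cut in one IP block contributes little system
# energy if the block is active for only a small slice of the run.

RUN_TIME_S = 10.0                      # total scenario duration

# (average power in watts, fraction of the run the block is active)
blocks = {
    "cpu_cluster":  (2.0, 0.60),
    "dram_ctrl":    (1.5, 0.80),
    "optimized_ip": (0.8, 0.05),       # the block whose power we cut
    "rest_of_soc":  (1.0, 1.00),
}

def system_energy(block_powers):
    return sum(p * frac * RUN_TIME_S for p, frac in block_powers.values())

baseline = system_energy(blocks)

# Apply a 20% dynamic-power reduction to just the optimized IP.
improved = dict(blocks)
p, frac = improved["optimized_ip"]
improved["optimized_ip"] = (p * 0.80, frac)

saving = 1.0 - system_energy(improved) / baseline
print(f"Baseline energy: {baseline:.1f} J")
print(f"System-level energy saving: {saving:.2%}")   # well under 1%
```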
The best way to deal with this is at the system level. “At an SoC level, some of the blocks might not have high toggle activity, but they might be active for quite some time,” said Ahmed. “Other blocks may have bursts of information coming in, may have high toggle activity, and then just remain silent. If we look at the amount of work done, in that case, a block that does not toggle so much but dominates the functionality most of the time might actually end up consuming more energy. And that might be a good place to start for optimization.”
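A simple duty-cycle comparison, again with hypothetical numbers, illustrates why the quieter, long-running block can be the better optimization target:

```python
# Minimal sketch (hypothetical numbers): which block dominates energy?
# Block A toggles little but is active almost the whole frame.
# Block B toggles hard in short bursts, then goes silent.

FRAME_MS = 100.0

blocks = {
    #          (avg power while active in mW, active fraction of frame)
    "block_a": (15.0, 0.90),   # low toggle rate, long duty cycle
    "block_b": (90.0, 0.05),   # high toggle rate, short bursts
}

for name, (power_mw, duty) in blocks.items():
    energy_uj = power_mw * duty * FRAME_MS      # mW * ms = microjoules
    print(f"{name}: {energy_uj:.0f} uJ over the frame")

# block_a: 15 * 0.9 * 100 = 1350 uJ; block_b: 90 * 0.05 * 100 = 450 uJ.
# The "quiet" block is the better optimization target here.
```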
These kinds of tradeoffs are becoming increasingly pervasive as designs become more customized and heterogeneous. Lowering the power is fairly straightforward, though. Reducing energy is more complicated, and not everything warrants the effort.
“Generally, we make sure we are low power,” said Philippe Luc, director of verification at Codasip. “To combine this with energy efficiency, we make sure that we use that power efficiently so that we can make the task as small as possible and switch off the CPU when the task is complete. Specifically, I make sure that I am low power, but I would say I don’t care about the energy. I just make sure I don’t waste the power to do the task. Saving energy comes down to switching off as quickly as possible. We want CPUs that are fast and efficient in power, and that will bring energy low. We talk about power, but in the end what’s most important is how long my phone will last before charging.”
That isn’t always the case, however. “From an architecture perspective, let’s say that I have to complete a task X,” said Godwin Maben, a Synopsys scientist. “Either I can complete the task very fast and sit idle, or I can take my time to finish the task, as long as I know I have to finish within a time period. Let’s say I have 50 clock cycles to finish this task, but if I really crank it hard, I can finish it in five clock cycles and wait longer. Power-wise, finishing really fast and waiting idle is not good, whereas if I take more time to finish but I reduce the frequency, my energy is better. It is a question of what my focus is.”
Fig. 1: Designing chips for greater efficiency. Source: Synopsys
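Maben’s tradeoff can be sketched with a toy model. The Python below assumes dynamic energy per task scales with CV² and that the core keeps leaking for the whole deadline window whether it is busy or idle; with an aggressive power-gated idle state the comparison can flip, so all parameters are purely illustrative.

```python
# Minimal sketch (hypothetical parameters): "race to idle" vs. stretching
# the task to a lower voltage/frequency point within a fixed deadline.
# Dynamic energy per task ~ C * V^2 * cycles; leakage accrues while powered.

C_EFF_NF    = 1.0       # effective switched capacitance, nF (hypothetical)
CYCLES      = 50_000    # work to be done
DEADLINE_US = 50.0      # the task must finish within this window
P_LEAK_MW   = 20.0      # leakage while the core stays powered (hypothetical)

def energy_uj(v_volts, f_mhz):
    """Energy to finish CYCLES at (V, f), then sit idle until the deadline."""
    t_active_us = CYCLES / f_mhz          # cycles / MHz = microseconds
    assert t_active_us <= DEADLINE_US, "operating point misses the deadline"
    e_dyn_uj  = C_EFF_NF * v_volts**2 * CYCLES * 1e-3   # nJ per cycle -> uJ
    e_leak_uj = P_LEAK_MW * DEADLINE_US * 1e-3          # leaks for whole window
    return e_dyn_uj + e_leak_uj

# Fast-and-idle vs. slow-and-low-voltage (voltage scales with frequency here).
print("race to idle :", round(energy_uj(1.0, 5000), 1), "uJ")   # 5 GHz @ 1.0 V
print("stretched    :", round(energy_uj(0.7, 1000), 1), "uJ")   # 1 GHz @ 0.7 V
```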
Analyzing for energy, making tradeoffs
One of the key challenges in future designs will be prioritization at an architectural level. For optimal energy efficiency, architectural analysis must be performed at the outset of a design.
“When doing architectural analysis for a design with multiple cores, such as an SoC for mobile, there may be five or six processors,” Maben said. “You could run all six processors at 1GHz, or call each processor as needed and run it at a lower frequency, using more processors at low frequency rather than one processor at a high frequency.”
Software scheduling is part of this decision process. “The first thing we need to understand is the average power versus the peak power that is needed for a certain interval of time. You might need a peak burst, then have an average power over a period of time. The system should be designed in such a way that it can deliver power for that short peak burst. Even if a job could finish in five clock cycles versus 80, in the 80-clock-cycle scenario where we slow down there will still be sharp bursts in between, because there will be requests to the processor to do something in that period. So it is not that this processor is dedicated to finishing one job at all times. It means there is a lot of scheduling that happens.”
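The distinction between average and peak power is easy to compute from a power trace. The sketch below uses a hypothetical per-millisecond trace and a short sliding window:

```python
# Minimal sketch: distinguishing average power from peak power over a
# short interval, given a hypothetical per-millisecond power trace.

trace_mw = [120, 130, 900, 950, 880, 140, 125, 135, 870, 910, 150, 130]

def average_power(trace):
    return sum(trace) / len(trace)

def peak_windowed_power(trace, window=2):
    """Worst-case average over any `window` consecutive samples."""
    return max(sum(trace[i:i + window]) / window
               for i in range(len(trace) - window + 1))

print(f"average power  : {average_power(trace_mw):.0f} mW")
print(f"peak 2 ms power: {peak_windowed_power(trace_mw):.0f} mW")
# The power delivery network and the scheduler must both handle the peak
# bursts, even though the average is far lower.
```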
In the past, this was often done sequentially. Increasingly, it needs to be considered concurrently. “The industry already speaks of shifting left for hardware/software co-development, but there also is a trend to shift-up for power, energy and thermal optimization,” said Frank Schirrmeister, senior group director for solution marketing at Cadence. “Today, it is fairly common to link activity data from simulation and emulation to the power analysis based on actual technology characteristics, and we refer to this as dynamic power analysis. In an era of extensible processors and high-level synthesis, this process now shifts up. Users can take the higher-level description and generate RTL variations as input into the power analysis and optimization flow. For processor variations and different partitioning of software and hardware, the impact on power, energy and thermal aspects becomes assessable at even earlier levels.”
This is especially important because much of the power management is controlled by software. “It will be even more important to verify the power-firmware functionality before switching on a newly assembled component that uses a new configuration of chiplets,” Schirrmeister said. “We will likely see more demand for virtual prototyping and multi-chip emulation for such use models.”
Another consideration when making architecture decisions involving multiple processors or compute engines is whether static or dynamic power dominates. “This is very critical to making this decision,” Maben said. “For example, if the static power dominates, the optimal solution is to run fewer nodes at higher frequencies. But if dynamic power dominates, it’s preferable to run more nodes at lower frequencies. So the choice comes down to fewer nodes at higher frequency versus more nodes at lower frequencies. The moment this decision is made is when the scheduling comes into the picture.”
This requires such techniques as dynamic voltage and frequency scaling (DVFS), dynamic voltage scaling (DVS), or adaptive voltage scaling (AVS). “At the end of the day, higher frequency is higher energy, no matter what,” he said. “When we use DVFS, for example, this has to be linked to the software scheduler to decide whether to turn on more nodes. If it is dynamic and more nodes need to be run, it will call into play the power management unit. The scheduler is going to say, ‘I am going to run processors 1, 3, and 5, so lower the frequency from 1GHz to 500MHz, because I’ve scheduled three cores.’ In the case where the scheduler knows the static power dominates, it is going to say, ‘I’m going to run only one core, but the frequency is 3GHz.’”
Fig. 2: Different techniques generate different results. Source: Synopsys
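One way to picture the scheduler’s choice is a toy energy model that compares “few fast cores” against “many slow cores” under different leakage assumptions. All parameters below are hypothetical:

```python
# Minimal sketch (hypothetical model): a scheduler choosing between
# "few cores, high frequency" and "many cores, low frequency" depending
# on whether leakage (static) or switching (dynamic) power dominates.

def task_energy_mj(cores, f_mhz, v, work_mcycles=1000.0,
                   c_eff_nf=0.3, leak_mw_per_core=50.0):
    """Energy to finish a fixed, perfectly parallel workload."""
    t_ms = work_mcycles / (cores * f_mhz) * 1000.0     # runtime in ms
    e_dyn_mj = c_eff_nf * v**2 * work_mcycles          # ~C*V^2 per cycle, summed
    e_leak_mj = leak_mw_per_core * cores * t_ms * 1e-3 # leakage while running
    return e_dyn_mj + e_leak_mj

# Candidate operating points (frequency/voltage pairs are hypothetical).
one_fast   = task_energy_mj(cores=1, f_mhz=3000, v=1.0)
three_slow = task_energy_mj(cores=3, f_mhz=500,  v=0.7)

print(f"1 core  @ 3 GHz, 1.0 V  : {one_fast:.1f} mJ")
print(f"3 cores @ 500 MHz, 0.7 V: {three_slow:.1f} mJ")
# With high leakage per core, the single fast core can win; with dynamic
# power dominant (as with these defaults), the wider, lower-voltage
# configuration wins.
```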
Just changing the frequency doesn’t make any difference, Maben said. “You have to change the voltage, because the higher the voltage, the higher the frequency will be. From a power perspective, switching power is CV²f, so there is a direct impact. The moment we lower the frequency it means we can reduce the voltage. When I increase the frequency, I need to increase the voltage. These go together.”
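The arithmetic behind that statement is straightforward. Because switching power is C·V²·f and the runtime for a fixed cycle count is N/f, switching energy per task is roughly C·V²·N, independent of frequency; only lowering the voltage reduces it. A minimal check, with hypothetical values:

```python
# Minimal sketch: switching power P = C * V^2 * f, so switching energy for a
# fixed number of cycles is ~ C * V^2 * N and does not depend on frequency.
# Only lowering the voltage along with the frequency reduces that energy.

def switching_power_w(c_farads, v_volts, f_hz):
    return c_farads * v_volts**2 * f_hz

def task_energy_j(c_farads, v_volts, f_hz, cycles):
    runtime_s = cycles / f_hz
    return switching_power_w(c_farads, v_volts, f_hz) * runtime_s

C, CYCLES = 1e-9, 1_000_000   # hypothetical effective capacitance and workload

print(task_energy_j(C, 1.0, 1e9, CYCLES))   # 1 GHz   @ 1.0 V -> 1.0e-3 J
print(task_energy_j(C, 1.0, 5e8, CYCLES))   # 500 MHz @ 1.0 V -> still 1.0e-3 J
print(task_energy_j(C, 0.7, 5e8, CYCLES))   # 500 MHz @ 0.7 V -> ~0.49e-3 J
```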
Energy optimization options
To achieve energy optimization, different architectural decisions need to be made at the system level and the IC level. “Typically what happens at the IC level is there may be one, two, or even four different kinds of scenarios, and they might run them,” said Siemens’ Ahmed. “They might compute average power, and they’ll see an idle power case and a peak power case. Then with the idle power they will try to determine whether it is mostly leakage power and there’s not really work being done, for example.”
Peak power, also referred to as maximum average power, is where the most work is happening, and it consumes a lot of power. In between, where there are different kinds of normal functional modes, there may be power being spent. “Is this scaling linearly, depending upon how much work I am doing versus how much power I am consuming? That often turns out to be untrue, because more power is consumed while not much work is being done,” Ahmed said. “That happens potentially due to wasted toggles in the design. To eliminate that, there are many strategies for optimizing power for registers and for memories. You can have some data that you write into a memory quite often, but it’s just a portion of the configuration. Here, maybe you can just write that to a flop and keep the memory working only when it’s needed, when you have large sets of data coming in. Or you might want to do fine-grain clock gating or some micro-architectural changes to reduce the number of toggles to achieve some power reduction.”
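A back-of-the-envelope model with hypothetical per-access energies shows why moving a frequently updated configuration word out of memory, and clock gating the flops that hold it, can pay off:

```python
# Minimal sketch (hypothetical per-access energies): keeping a frequently
# updated configuration word in flops instead of writing it to an SRAM,
# and clock gating the registers between updates.

E_SRAM_WRITE_PJ = 50.0    # energy per SRAM write (hypothetical)
E_FLOP_WRITE_PJ = 2.0     # energy per flop/register update (hypothetical)
E_CLK_TOGGLE_PJ = 0.5     # clock tree + flop clock-pin energy per idle cycle

UPDATES     = 10_000      # config updates during the scenario
IDLE_CYCLES = 1_000_000   # cycles in which the config does not change

sram_uj      = UPDATES * E_SRAM_WRITE_PJ * 1e-6
flops_uj     = UPDATES * E_FLOP_WRITE_PJ * 1e-6
no_gating_uj = flops_uj + IDLE_CYCLES * E_CLK_TOGGLE_PJ * 1e-6
gated_uj     = flops_uj   # clock gated: idle cycles cost ~nothing in this model

print(f"config in SRAM            : {sram_uj:.2f} uJ")
print(f"config in flops, no gating: {no_gating_uj:.2f} uJ")
print(f"config in flops, gated    : {gated_uj:.2f} uJ")
```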
These optimizations do not always translate into energy savings at the system level, however.
“It is entirely possible that some of the blocks you were focusing on are not the ones that matter at the system level in a real use-case scenario,” said Ahmed. “Let’s say you have an emulator that runs out a really long trace of a mobile phone, like when someone is playing a game. In that case, some of the blocks might be active only some of the time, but a lot of blocks might be active most of the time. Considering that kind of information, you want to architect the power in a way that you can optimize the workload to maximize the energy efficiency. In order to do that, there are standard techniques at the system level for hardware, including assigning voltage islands, so some of the blocks that are not performance-oriented might actually work at a lower voltage, and that saves a lot of power. Also, the SoC can be divided into multiple power domains, and some of the blocks can be power gated to save the leakage power, as well. On interconnects and memories you can gate unnecessary toggles, and you can put the memories into light sleep. There are many methods to deal with that at the hardware level.”
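The hardware techniques Ahmed lists can be approximated with a small model. The sketch below (hypothetical voltages, powers, and activity) compares a non-performance-critical block at nominal supply, on a low-voltage island, and on a low-voltage island with power gating when idle:

```python
# Minimal sketch (hypothetical numbers): energy impact of putting a
# non-performance-critical block on a lower-voltage island and power
# gating it when idle, over a 1-second use-case trace.

V_NOM, V_ISLAND = 0.9, 0.7      # nominal vs. low-voltage island supply
ACTIVE_FRAC     = 0.30          # block is active 30% of the trace
P_DYN_NOM_MW    = 40.0          # dynamic power at V_NOM while active
P_LEAK_NOM_MW   = 8.0           # leakage at V_NOM (scaled ~linearly with V here)
TRACE_S         = 1.0

def energy_mj(v, power_gated_when_idle):
    dyn = P_DYN_NOM_MW * (v / V_NOM) ** 2 * ACTIVE_FRAC * TRACE_S
    leak_frac = ACTIVE_FRAC if power_gated_when_idle else 1.0
    leak = P_LEAK_NOM_MW * (v / V_NOM) * leak_frac * TRACE_S
    return dyn + leak

print("nominal V, never gated:", round(energy_mj(V_NOM, False), 2), "mJ")
print("island V,  never gated:", round(energy_mj(V_ISLAND, False), 2), "mJ")
print("island V,  power gated:", round(energy_mj(V_ISLAND, True), 2), "mJ")
```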
Further, when architects work toward energy management, they will weigh the kind of functionality that goes into software versus what goes into hardware.
“At the software level, one of the things that can happen is that the resources may sometimes be completely used, and sometimes the resources might be partially used,” he said. “There could be blocks that the software may not be using optimally. You can manage the energy better at the OS level by selectively turning off components when they’re not used, or when they’re partially used. You can also eliminate resource wastage, for example, by developing resource management policies at the OS level that optimize the usage of the hardware resources that are available. In fact, people now have energy management programs built into servers so they can optimize the workloads to change the amount of work required, depending upon what kind of fidelity customers want in their application, for example. These are techniques that you can use at the software level.”
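At the OS level, such a policy can be as simple as mapping component utilization to actions. The following sketch uses hypothetical component names and thresholds:

```python
# Minimal sketch of the OS-level idea: a power-management policy that turns
# components off when unused and consolidates work so partially used
# resources can be released. Names and thresholds are hypothetical.

component_utilization = {          # fraction of capacity in use
    "big_core_0": 0.85,
    "big_core_1": 0.10,
    "gpu":        0.00,
    "npu":        0.05,
}

OFF_THRESHOLD     = 0.02   # below this, just power the component off
MIGRATE_THRESHOLD = 0.20   # below this, migrate its work, then turn it off

def plan(utilization):
    actions = {}
    for name, util in utilization.items():
        if util < OFF_THRESHOLD:
            actions[name] = "power off"
        elif util < MIGRATE_THRESHOLD:
            actions[name] = "migrate work, then power off"
        else:
            actions[name] = "keep on"
    return actions

for component, action in plan(component_utilization).items():
    print(f"{component:10s} -> {action}")
```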
Modifying voltage, frequency
Making adjustments within the architecture to optimize energy consumption is complicated, and the problem gets worse as systems become more complex.
“You cannot just drop the frequency,” said Maben. “From a functional perspective, software is going to program and say it is going to a mode where the frequency needs to be reduced. But there is a delay in terms of when the software says to reduce frequency. First, frequency gets reduced, followed by voltage reduction. When both are in sync, a signal goes back saying, ‘I am at this level.’ The challenge is, if this takes too long, there is no point in doing it.”
Wakeup and scheduling time are key pieces in this equation. “How long does it take to move from one voltage and frequency point to another, and come back to normal operation? This has to be done upfront in architectural-level analysis, where you build the entire SoC prototype using SystemC models, and you do transaction-level interaction,” Maben said. “At that point, it’s all transaction-level, and you will figure out for the system what the locking time will be. How long will it take? Does it make sense for me to go to 0.4V or 0.6V? What are these voltages, and when do I go?”
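That “no point in doing it” threshold is a break-even calculation. With hypothetical operating points and transition costs, it might look like this:

```python
# Minimal sketch (hypothetical numbers): is a DVFS transition worth it?
# Dropping V/f only pays off if the time spent at the lower point is long
# enough to recover the energy and dead time of the transition itself.

P_HIGH_MW   = 800.0    # power at the high V/f operating point
P_LOW_MW    = 300.0    # power at the low V/f operating point
T_SWITCH_US = 120.0    # regulator + PLL locking time (each direction)
E_SWITCH_UJ = 60.0     # energy overhead of one down/up transition pair

def worth_switching(low_period_us):
    """Energy saved at the low point for `low_period_us`, minus overhead."""
    saved_uj = (P_HIGH_MW - P_LOW_MW) * low_period_us * 1e-3   # mW*us -> uJ
    return saved_uj - E_SWITCH_UJ

break_even_us = E_SWITCH_UJ / ((P_HIGH_MW - P_LOW_MW) * 1e-3)
print(f"break-even residency: {break_even_us:.0f} us "
      f"(plus 2 x {T_SWITCH_US:.0f} us of locking time)")
print("switch for 100 us at low point:", worth_switching(100.0) > 0)
print("switch for 500 us at low point:", worth_switching(500.0) > 0)
```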
Major phone makers today build in 10 to 12 sleep states, ranging from slight power down to full hibernation. Choosing which one of these states to apply and when depends upon different usage scenarios.
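Selecting among those sleep states is another break-even problem: the deeper the state, the lower the residual power, but the higher the entry/exit cost and wake latency. A sketch with hypothetical states:

```python
# Minimal sketch (hypothetical states): pick the deepest sleep state whose
# entry/exit cost is recovered within the expected idle interval and whose
# wake latency fits the responsiveness budget.

#   (name, residual power mW, entry+exit energy uJ, wake latency us)
SLEEP_STATES = [
    ("clock gated", 40.0,    1.0,     1.0),
    ("light sleep", 12.0,   20.0,    50.0),
    ("deep sleep",   2.0,  300.0,   500.0),
    ("hibernate",    0.1, 5000.0, 20000.0),
]
P_ACTIVE_IDLE_MW = 80.0   # power if the device simply stays awake and idles

def best_state(idle_us, latency_budget_us):
    best = ("stay awake", P_ACTIVE_IDLE_MW * idle_us * 1e-3)
    for name, p_mw, e_uj, wake_us in SLEEP_STATES:
        if wake_us > latency_budget_us:
            continue                              # would miss the wakeup deadline
        energy_uj = p_mw * idle_us * 1e-3 + e_uj  # residual power + transition
        if energy_uj < best[1]:
            best = (name, energy_uj)
    return best

for idle_us in (50, 2_000, 200_000):
    name, energy = best_state(idle_us, latency_budget_us=1_000)
    print(f"idle {idle_us:>7} us -> {name:12s} ({energy:.1f} uJ)")
```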
“What typically people want now, if they’re taking energy as a primary metric, is to look at how they can achieve, let’s say, a target of 10% energy efficiency on the SoC,” Ahmed said. “Some of it could happen on the software side, and some of it could happen on the hardware side. You need to have all the kinds of workloads that you expect the device to actually operate in. The better you can make that happen, the more you’re likely to understand which blocks in hardware are good for optimization, where you need to focus, and how much power you need to reduce to achieve that kind of energy efficiency. It’s almost like the way people look at power reports today. In the near future, people will be looking at power/energy reports. That will become really important.”
This will require some changes on the tools side, as well.
“Right now, people look at power at a summary level for the design or hierarchical level. There is leakage power, switching power, internal power, total power. There could be the same measurements for energy, such as switching energy, total energy, or total memory energy. These are the reports people want to look at. An energy number will be just as important as a power number,” he said.
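Such an energy report is essentially the familiar power report integrated over time. A sketch, using a hypothetical per-interval power trace broken into the same components:

```python
# Minimal sketch: turning a power report into an energy report by integrating
# each component of a hypothetical per-interval power trace.

INTERVAL_MS = 1.0
COMPONENTS = ("leakage", "switching", "internal")

# One (leakage, switching, internal) power sample in mW per interval.
power_trace_mw = [
    (5.0, 120.0,  60.0),
    (5.0, 400.0, 180.0),
    (5.0,  20.0,  15.0),
    (5.0, 380.0, 170.0),
]

def energy_report_uj(trace, interval_ms):
    totals = {name: 0.0 for name in COMPONENTS}
    for sample in trace:
        for name, p_mw in zip(COMPONENTS, sample):
            totals[name] += p_mw * interval_ms      # mW * ms = microjoules
    totals["total"] = sum(totals[name] for name in COMPONENTS)
    return totals

for component, uj in energy_report_uj(power_trace_mw, INTERVAL_MS).items():
    print(f"{component:10s}: {uj:8.1f} uJ")
```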
The problem comes when one decision is singled out, because energy only makes sense if it is considered at the SoC level, or when there are too many workloads to evaluate exhaustively, for instance.
“In this case, you might want to do some analysis, which means you might want to capture the behavior of your SoC, maybe even at the IP level, and see how energy changes when you make some micro-architectural choices, or when you choose different algorithms. These are the kinds of analysis people want to do when the goal is energy efficiency,” Ahmed said. “Since current implementation tools are basically made for performance, whether you do synthesis or place-and-route, performance is the primary criterion. If people want to have energy-optimized designs, these downstream tools have to start taking power, or maybe energy, as the primary criterion. So any optimization or any downstream technique to choose a cell, for example, has to be driven from the point of view of energy, not just performance, depending upon how they set it up.”
Your phrase “From a power perspective, switching power is CV²f” holds the key to a disruptive change for both power and energy. To break out of our local-minimum (simulated annealing analogy) mental box, we need to cease encoding logic states in absolute voltage levels. Device voltage thresholds force V too high. Delta voltage levels, frequency, phase… all are ways to encode logic states which might lead to far tinier (or perhaps irrelevant) average V² values to represent and manipulate logic. In order to move to a better power regime we must overcome many decades of complacency using absolute voltage levels to represent logic states. It will be painful. No pain, no gain. As encouragement, compare the QRP contests ham radio operators have, which communicate globally with 5 watts or less, to our processors, which squander power on the order of 100 times that to communicate clocking across 1 centimeter. I know I am ignoring the bandwidth difference. It still says we are wasting way too much power on chip. On-chip clock distribution power use, being predictable, really carries no information at all, despite being the energy prodigal son of chip design work. Changes to move to logic state representation using other than absolute voltage levels will most likely also inherently solve the squandering of chip power on clock distribution. Any voltage and frequency island strategy is an example of a solution having no global perspective.
________________
thewildotter