Clock gating and power gating were a good start, but there is much more that can and should be done to minimize power.
Concerns about the power consumed by semiconductors have been on the rise for the past couple of decades, but what can we expect to see coming in terms of analysis and automation from EDA companies, and is the industry ready to make the investment?
Ever since Dennard scaling stopped providing automatic power gains by going to a smaller geometry, circa 2006, semiconductors have been increasingly limited by power. The initial focus was on leakage, and that was easily solved by powering down parts of the circuit that were not being used. But with the migration to finFETs, leakage became a much smaller fraction of total power consumption. As a result, gains had to come from dynamic power optimization.
The optimization of power takes time and resources. “Everybody says low power is important, but when push comes to shove, there’s a deadline to get the chip out the door,” says Marc Swinnen, director of product marketing at Ansys. “It has to meet DRC, it has to meet timing, but it doesn’t have to meet a certain power limit. Many design teams see power optimization as one more burden placed on them, one more thing they have to chase down when they are already overloaded. If someone says the power consumed by the chip is pretty high, the response may well be, ‘We did the best we could.’”
The situation is improving. “Design companies did not even have a power methodology in place 10 years ago,” says Mohammed Fahad, principal technical marketing engineer for Siemens EDA. “Power being considered a critical issue is a recent phenomenon. They have started optimizing for sequential clock gating. They started with targeting the flops consuming or wasting power first, then further enhancing their methodology to find the gating potential elsewhere. The maximum potential for saving power is at the early stage of the design cycle, namely RTL design. The more you shift to the right of the design cycle, the less impact you have for saving power.”
Certain industries are more advanced than others. “Chipmakers are much more worried about heat and power issues, primarily being driven by the amount of compute that AI needs today,” says Jay Roy, group director for SoC Power Continuum at Synopsys. “All AI designs have a lot of compute with some memory spread in between for the transfer of the data. And that is making the power problem very significant, and almost uncontrollable.”
Low power does not come from a tool. “People aren’t as naive anymore, but in the past they were looking for a silver bullet to reduce power,” says Ansys’ Swinnen. “If you do low-power RTL design, that may save 10% to 20%, and then you do low-power clocking and that saves you another 3%. Sometimes with these small gains they aren’t interested. Then you do low-power replacement and that saves another 5%. They are looking for the silver bullet, which doesn’t exist. Power is really a systemic mentality you have to have throughout the flow, and every step impacts power to some degree. It’s adding up 3%, 5%, 4% here and there, and at the end you have a low power chip.”
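To see how those single-digit gains accumulate, here is a minimal Python sketch. It assumes, as a simplification Swinnen does not state, that each step saves a percentage of whatever power remains after the previous step, so the savings compound rather than simply add. The 15% figure is just the midpoint of his 10% to 20% range, and the function name is illustrative.

```python
def combined_saving(step_savings):
    """Return the total fractional saving after applying each step in turn.

    Assumption: each step removes a fraction of the power that is left
    after the previous steps, so the remaining fractions multiply.
    """
    remaining = 1.0
    for saving in step_savings:
        remaining *= 1.0 - saving
    return 1.0 - remaining


# Illustrative figures taken from the quote: low-power RTL design
# (midpoint of 10% to 20%), low-power clocking, low-power replacement.
steps = [0.15, 0.03, 0.05]
print(f"Total saving: {combined_saving(steps):.1%}")
```

Under these assumptions the three steps together save a little under 22%, slightly less than the 23% a straight sum would suggest.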
Many of those gains are from the back end of the development cycle. “EDA tools and most of their design flows have been adept at handling the intricacies of new process technologies like finFET or FD-SOI,” says Anshuman Singh, power and performance architect for Ambiq. “However, the same tools haven’t done so well with architectural tradeoffs and system-level analyses.”
Doing system-level optimization is easier said than done. “The efforts in terms of methodology, compute resources and engineering talent to deploy system-level techniques are definitely non-negligible,” says Guillaume Boillet, director of product management for Arteris IP. “Only the most advanced and power-savvy design teams invest in those.”
Dynamic power
The focus today is on the reduction of dynamic power, which is directly related to the amount of activity flowing through the system. Many people view this in a fairly rudimentary way, such that if there is a toggle on an input and the output changes, there is going to be some dynamic power dissipated by the device.
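A common first-order model makes that relationship concrete: switching power is roughly the activity factor times the switched capacitance times the supply voltage squared times the clock frequency. The sketch below uses that textbook approximation, which is not taken from the interviews. The numeric values are made up for illustration and do not describe any real design.

```python
def dynamic_power(activity, capacitance_f, voltage_v, frequency_hz):
    """First-order switching power in watts: P = activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical block: 1 nF of switched capacitance at 0.8 V and 1 GHz.
baseline = dynamic_power(activity=0.2, capacitance_f=1e-9,
                         voltage_v=0.8, frequency_hz=1e9)

# Same block with half the activity, for example after removing
# redundant toggles.
reduced = dynamic_power(activity=0.1, capacitance_f=1e-9,
                        voltage_v=0.8, frequency_hz=1e9)

print(f"Baseline: {baseline:.3f} W, reduced activity: {reduced:.3f} W")
```

In this model power is linear in activity, which is why halving the toggles halves the dynamic power when everything else is held constant.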
There are various levels of sophistication in minimizing that. “The EDA industry provides power optimization that is fairly well grounded, well understood, but very limited,” says Synopsys’ Roy. “That means that given a design, and some profile of activity that flows through the design, the implementation tools are fairly mature in terms of creating the best design structures given the activity that flows through it. Many techniques can be applied during synthesis or place and route. For high frequency nets, you can promote them up in the metal layer chain so the caps are reduced and the power is reduced. While about 15% to 30% of the design is timing critical, the rest is not, so you can recoup some power from those. Those things are fairly well understood and are in the tools.”
Perhaps the biggest gains come from clock gating. “Clock gating saves power by pruning the clock tree, such that the flops are not activated when there is no change in the inputs,” says Siemens’ Fahad. “That has the maximum impact for reducing the dynamic, or the switching power, of the overall design.”
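The idea Fahad describes can be illustrated with a toy cycle-level model of a single flop. This is only a sketch: it assumes the gate passes a clock pulse exactly when the incoming value differs from the stored one, whereas real clock gating is applied to groups of flops in the clock tree by synthesis and power optimization tools. The function name and data are hypothetical.

```python
def count_clock_pulses(inputs, initial=0):
    """Count clock pulses reaching one flop, without and with gating.

    Assumption: the gated clock pulses only in cycles where the new
    input differs from the value the flop already holds.
    """
    stored = initial
    ungated = 0
    gated = 0
    for value in inputs:
        ungated += 1          # a free-running clock pulses every cycle
        if value != stored:   # enable condition for the gated clock
            gated += 1
            stored = value
    return ungated, gated


data = [0, 0, 1, 1, 1, 1, 0, 0]
ungated, gated = count_clock_pulses(data)
print(f"Ungated pulses: {ungated}, gated pulses: {gated}")
```

For this input the flop sees 8 clock pulses without gating and 2 with it, because the data changes only twice.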
But much more is possible. “The areas where automation applies represent only a subset of what can be done to reduce overall power consumption,” says Arteris’ Boillet. “In particular, the automatic dynamic power reduction techniques are very local in nature and are highly dependent on the quality of the activity vectors, which are themselves very difficult to generate.”
Getting to the next level is not so easy. “What is needed is to raise the abstraction of the activity itself,” says Roy. “We can’t just look at toggles, we have to start looking at a unit of activity that happens if I run an instruction in a CPU, or even at a high level such as a packet of data passing through a system. That opens up a lot of room for power reduction, both at the RTL designer level, as well as at the SoC or on the architectural level.”
System-level optimization
The more you shift left, the greater the possibility for gains. “The elephant in the room is the activity flowing through the design,” says Roy. “This is one area where the industry has not spent much time. There is a big opportunity to reduce the power if you can reduce redundant activities flowing through the system. For example, design techniques, such as multi-stage pipelines, are tuned in terms of timing, but there is room to start looking at the activity flowing through them, which can affect power.”
In a lot of cases, it becomes necessary to include the software in that analysis. “It is at the architecture and system level that the bulk of the gains can be achieved, and this is where the efforts are now spent,” says Boillet. “Being able to tightly integrate macro-level hardware power reduction mechanisms with the software is key. As an example, by providing a standard interface to the software through which an intelligent interconnect can provide traffic information, the OS can leverage that to activate macro-level techniques. Those have much higher impact on overall power consumption than local techniques.”
Today, software workloads feed forward to hardware. “Software applications can be run on an emulator, and we can get the power or the toggle profile of the SoC,” says Fahad. “We use that toggle profile to drive the RTL power optimization platform. It is important that we are creating the right vectors from the software application, and this can happen when we run the software application with real-world scenarios. We can use the vectors generated by the emulators, and they can even use those vectors to estimate the power or optimize it.”
Software development is detached from a lot of what happens in hardware. “How do you reduce traffic associated with the movement of data and increase the cache hit rate?” asks Roy. “If you have a cache miss, you’re going to use a lot more energy to get back the previous data. The underlying tools could provide that analysis and become a lot smarter if they also have the power models of the underlying semiconductors that they are trying to simulate or emulate, to be able to give that feedback.”
That feedback loop will become increasingly important. “If software engineers had a better appreciation of power consumption within the die, and within sub-blocks, they could make better decisions when developing the software,” says Stephen Crosher, director of SLM hardware strategy at Synopsys. “How do you, through the lifetime of the device or lifetime of a product, upgrade the software and understand how that’s going to impact the complete range of devices that are out there in the field? The more information that is fed back to software developers from the field, the better they can assess the power impacts, and we’re going to see that evolving quite a bit. Trying to find more granular and distributed ways of assessing power throughout the architecture of the chip is a very interesting space.”
To achieve those goals, the toolchain has to adapt. “When people write software, there is a compiler in between,” says Roy. “Compilers have been tuned to optimize for speed of running the software. If I take this C construct and I create one sequence of assembly instructions versus a second set of instructions, the total runtime for the first set is going to be 10 cycles, and for the second one it might be 8 cycles. What the compiler does not understand today is for that sequence of instructions, what is the power or energy profile? That is information the compiler does not have because the hardware models that are available to the compiler only have cycle information, or maybe timing information. They have no power information.”
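Roy's point can be sketched with a toy cost table in Python. Everything here is hypothetical: the per-instruction cycle counts and energies are invented, and no real compiler exposes an interface like this. The sketch only shows that once a hardware model carries energy alongside cycles, the fastest sequence and the lowest-energy sequence need not be the same.

```python
# Hypothetical per-instruction costs: (cycles, energy in picojoules).
# These numbers are invented purely for illustration.
COSTS = {
    "add": (1, 1),
    "shift": (1, 1),
    "mul": (3, 4),
    "load": (4, 12),
}


def profile(sequence):
    """Return (total cycles, total energy in pJ) for an instruction list."""
    cycles = sum(COSTS[op][0] for op in sequence)
    energy = sum(COSTS[op][1] for op in sequence)
    return cycles, energy


# Two made-up instruction sequences assumed to compute the same result.
candidates = {
    "A": ["mul", "mul", "shift", "shift", "add", "add"],  # 10 cycles
    "B": ["load", "mul", "add"],                          # 8 cycles
}

fastest = min(candidates, key=lambda name: profile(candidates[name])[0])
lowest_energy = min(candidates, key=lambda name: profile(candidates[name])[1])
print(f"Fastest: {fastest}, lowest energy: {lowest_energy}")
```

With these invented costs, sequence B wins on cycles (8 versus 10) while sequence A wins on energy (12 pJ versus 17 pJ), which is exactly the tradeoff a cycle-only model cannot see.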
This lack of models impacts all levels of design. “Determining the firmware and software settings for the optimal power and performance is still mostly a matter of iterative testing and characterization,” says Ambiq’s Singh. “With IPs being increasingly built for reuse, such settings continue to increase. Consequently, manufacturers are left to their own devices to determine the best implementation for features and functions. For example, whether a task like voice recognition is best performed locally on the wireless earbud through dedicated neural network engines on the edge or sent to the phone, manufacturers must decide in the absence of appropriate power and energy consumption data. Hence, system, network, and architectural-level power and energy analysis remain the next frontier for semiconductor design.”
Lower voltage
While there is a lot to be gained through high-level and system techniques, there is still a lot that can be saved at the back end, particularly when it comes to voltage reduction. “Partially due to geometry scaling (to keep the electric fields as constant as possible), but also because of power, there’s been a strong drive to lower the voltage on chips,” says Ansys’ Swinnen. “Voltages have been driven down to less than 0.7 or 0.8 volts. This ultra-low-voltage regime brings issues in design, especially with power integrity. When chips ran on 1.5 volts, distribution of power on the chip could afford a little bit of voltage drop, but when you’re down at near threshold voltage supply, you can’t afford this. Designers are under extreme pressure to keep the voltage drop to a minimum, and minimize the power distribution network space.”
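A short calculation shows both sides of that tradeoff. The sketch assumes the usual first-order rule that switching power scales with the square of the supply voltage, and it uses a hypothetical 50 mV drop across the power distribution network. Neither figure comes from Swinnen beyond the 1.5 V and sub-0.8 V supply levels he mentions.

```python
def dynamic_power_ratio(old_v, new_v):
    """Ratio of new to old switching power, assuming P scales with V^2."""
    return (new_v / old_v) ** 2


def relative_drop(supply_v, drop_v):
    """Voltage drop expressed as a fraction of the nominal supply."""
    return drop_v / supply_v


DROP_V = 0.05  # hypothetical 50 mV drop across the power grid

print(f"Power at 0.75 V vs 1.5 V: {dynamic_power_ratio(1.5, 0.75):.0%}")
print(f"50 mV drop at 1.5 V:  {relative_drop(1.5, DROP_V):.1%}")
print(f"50 mV drop at 0.75 V: {relative_drop(0.75, DROP_V):.1%}")
```

Halving the supply cuts switching power to a quarter in this model, but the same 50 mV drop grows from about 3.3% of the supply to about 6.7%, which is why the margin for voltage drop shrinks so quickly.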
Variation can be the nemesis of these designs. “We need to talk about in-field minimization of supply voltage,” says Synopsys’ Crosher. “Looking at Vmin on a per-device basis you can have better informed optimization schemes, such as DVFS and AVS, because you found the floor level for that particular chip. During test, people want to understand that critical level, that critical path failure level, and then drop the supplies down as far as you can. This involves doing critical path margin analysis. You then compare that Vmin, or lowest supply level in test, to in-field. The trick is that those critical paths can change. It can change from device to device, and it can change depending on temperature. If you can get away with a low supply, while still achieving the same data throughput and the same performance criteria for the chip, that’s incredibly valuable.”
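As a rough illustration of finding a per-device floor, the sketch below binary-searches a grid of supply levels for the lowest one at which a device still passes. It rests on several assumptions: `passes_at` is a hypothetical stand-in for a real test or on-chip monitor, pass/fail is assumed to be monotonic in voltage, and the search covers one operating condition only. As Crosher notes, the critical paths, and therefore Vmin, can shift with temperature and from device to device.

```python
def find_vmin_mv(passes_at, low_mv, high_mv, step_mv=5):
    """Return the lowest grid voltage (in mV) at which passes_at() is True.

    Assumes the device passes at every voltage above its true minimum.
    Returns None if it fails even at the highest grid point.
    """
    top_index = (high_mv - low_mv) // step_mv
    if not passes_at(low_mv + top_index * step_mv):
        return None
    lo, hi = 0, top_index  # invariant: grid point hi is known to pass
    while lo < hi:
        mid = (lo + hi) // 2
        if passes_at(low_mv + mid * step_mv):
            hi = mid       # mid passes, so the floor is at mid or below
        else:
            lo = mid + 1   # mid fails, so the floor is above mid
    return low_mv + lo * step_mv


# Hypothetical device whose critical path fails below 612 mV.
def device(mv):
    return mv >= 612


print(find_vmin_mv(device, low_mv=500, high_mv=900))
```

For this made-up device the search settles on 615 mV, the first 5 mV grid point at or above the true threshold. A production flow would add guard band on top of that value and repeat the measurement across temperature and over the device's lifetime.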
That analysis can be extremely complicated. “The problem is not just keeping the voltage constant, which has its own requirements, but even with a little bit of voltage drop, it slows down that cell,” says Swinnen. “Even if the voltage drop across each cell is acceptable, multiple of these cells in a single path, each seeing some voltage drop, means that path is going to slow down, and that’s going to impact your timing. But voltage drop is dependent on both the power it draws when it switches, and also when its neighbors switch. From seven nanometers down, neighbor switching has the bigger impact on the voltage drop. So, suddenly your timing is dependent on the switching of the neighbors around you, and that impacts your timing and how do you calculate that? What is the worst possible switching scenario that will give the biggest voltage impact on the specific path?”
Some of that is learned over time. “There’s a lot of information coming out of the chip, at various stages of its life, through design, production, and in-field that is also helping some of the tools improve power optimization in the design phase,” says Crosher. “Without feedback, we’re doing this overly pessimistic, over-margined design approach. But we can take a less fear-dominated approach, thinking about worst cases, and actually take a reality-dominated approach by seeing how the silicon is being manufactured. That information can be passed back into the design process to make a less over-margined, overly pessimistic approach. Reality analysis then is fed back into the design phase.”
There is no push-button solution to power optimization. You get to an improved product through a thousand cuts. The number of cuts available is increasing, but in many cases the EDA industry only can supply the analysis tools. Engineers still have to make the optimizations, and that means that the cost of those power reductions always will be weighed against the economic impact they have.
Unless there becomes an Energy Star level that has to be met, or the design performance is limited by power, power will remain a secondary design consideration.
Power optimization could be “push-button” with AI if there were fast analog simulators that the AI could use to evaluate its experiments. However, the SystemVerilog language everyone uses for digital design has no analog capability, and there’s no (IEEE) plan to add it.
You can’t validate DVFS with the current tools, let alone optimize anything.
Hi Brian,
Interesting article, however I cannot agree with the last chapter: “Unless there becomes an Energy Star level that has to be met, or the design performance is limited by power, power will remain a secondary design consideration.”
There are many areas where power consumption is the primary concern. These are mostly ICs for battery-powered devices. There are plenty of them, and they are going more and more into the consumer mass market. I can see it especially in smartphones and laptops, where people are concerned about how long a device can work on battery (an important benchmark aspect), and additionally about how much power a device dissipates (some devices get very bad reviews when they get too hot in users’ hands).