Most chipmakers do RTL power optimization today, but where are they in terms of the possible power savings and what comes next?
Today it is difficult to find a design that does not consider some kind of power optimization. Mobile needs it to preserve battery life, data centers need it to reduce operating cost, and many are finding they need it to meet tougher regulatory requirements. In a survey conducted two years ago, every segment of the industry was taking a serious look at reducing its power profile.
“Clock gating is the baseline entry into power optimization,” says Mark Baker, director of product marketing at Atrenta. Koorosh Nazifi, engineering group director for low power and mixed-signal initiatives at Cadence, adds: “There has been a lot done to optimize systems with multiple Vt, multi-bit registers, clock gating and many other techniques.” William Ruby, senior director of technical sales at Ansys/Apache, estimates that “through the widespread use of synthesis-based clock gating, I can safely say that we have reduced dynamic power 30% to 40%. This does not include process scaling effects. Back-end optimization techniques are responsible for another 10% to 20% savings, primarily in leakage, using techniques such as multiple Vt.”
Historically, most of the work has gone into techniques whose intent can be extracted automatically from the RTL description, such as clock gating. Now, under pressure from industries such as mobile, more people are looking at techniques such as dynamic voltage and frequency scaling and power gating to reduce leakage power, and they want to take the existing optimizations further.
“Certain things can be done automatically – this is a true optimization,” says Ruby, “and is done in the context of multiple constraints. At higher levels, RTL and above, you don’t have a good sense of physical constraints, and so this is not optimization, it is power reduction.”
Baker outlines the process for optimization. He says it requires three things: estimation, optimization and verification.
Estimation is based on propagating activity through the design, and there are two main approaches. The first is to drive the design under test (DUT) with simulation activity. When this is done, tool vendors claim their estimates correlate with actual silicon to within 15% to 20%. But it is possible to start the process earlier, before the testbenches have been constructed. This is vectorless analysis, in which toggle rates on the inputs are propagated through the design. Atrenta supports both approaches, and Baker says “as more information becomes available, it becomes possible to provide better estimates. So, as traces become available they can be used to drive additional analysis.”
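To make the two estimation approaches concrete, here is a minimal Python sketch of both: counting toggles in a simulation trace, and vectorless propagation of input probabilities through a tiny combinational netlist. The netlist format and function names are hypothetical, and the assumption that each gate's inputs are statistically independent (which ignores reconvergent fan-out and glitches) is a textbook simplification, not how Atrenta's or any other vendor's tool works.

```python
"""Toy activity estimation: trace-based toggle counting versus vectorless
propagation. Netlist format and function names are hypothetical."""
from typing import Dict, List, Tuple

# Netlist: output net -> (gate type, input nets). Entries must be listed in
# topological order (dicts keep insertion order in Python 3.7+).
Netlist = Dict[str, Tuple[str, List[str]]]


def toggle_rate_from_trace(trace: List[int]) -> float:
    """Simulation-based: fraction of cycle boundaries where the value changes."""
    if len(trace) < 2:
        return 0.0
    toggles = sum(1 for a, b in zip(trace, trace[1:]) if a != b)
    return toggles / (len(trace) - 1)


def propagate_probabilities(inputs: Dict[str, float], netlist: Netlist) -> Dict[str, float]:
    """Vectorless: propagate P(net == 1) from the inputs, assuming the inputs of
    every gate are independent (reconvergent fan-out is ignored)."""
    p = dict(inputs)
    for out, (kind, ins) in netlist.items():
        probs = [p[i] for i in ins]
        if kind == "AND":
            val = 1.0
            for prob in probs:
                val *= prob  # all inputs must be 1
        elif kind == "OR":
            all_low = 1.0
            for prob in probs:
                all_low *= 1.0 - prob
            val = 1.0 - all_low  # at least one input is 1
        elif kind == "NOT":
            val = 1.0 - probs[0]
        else:
            raise ValueError(f"unsupported gate type: {kind}")
        p[out] = val
    return p


def toggle_rate_from_probability(p_one: float) -> float:
    """If values in successive cycles are independent, a net toggles when it
    goes 0->1 or 1->0, which happens with probability 2 * p * (1 - p)."""
    return 2.0 * p_one * (1.0 - p_one)


if __name__ == "__main__":
    netlist = {"n1": ("AND", ["a", "b"]), "y": ("NOT", ["n1"])}
    probs = propagate_probabilities({"a": 0.5, "b": 0.5}, netlist)
    print(probs["y"], toggle_rate_from_probability(probs["y"]))  # 0.75 0.375
    print(toggle_rate_from_trace([0, 1, 1, 0, 1]))  # 0.75
```

As traces become available, measured rates from `toggle_rate_from_trace` would replace the assumed input probabilities, which mirrors the refinement Baker describes.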
When it comes to optimization Guillaume Boillet, technical marketing manager at Atrenta, says that “while every design is different, we have gone through enough designs that we get a sense of the gains that have been achieved. When automatic clock gating is applied using an RTL synthesis tool, you reduce power in the range of 15%. When a tool that can enhance the quality of the enables is used, you can expect an additional 10% to 15%.”
Adds Ruby: “If you have a clock line toggling with no data activity, this is a power consumption bug. If we find a register with no clock gating, how do we compel a synthesis tool to add gating? We can add an enable signal, but this will cost us additional logic, so we need to be able to compute the power cost of the additional logic to know if there really is a power saving.”
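Ruby's point, that an added enable only pays off if it saves more than it costs, can be framed with the standard switching-power estimate P = αCV²f. The Python sketch below compares the clock power saved at the register clock pins against the cost of an integrated clock-gating (ICG) cell, which is still clocked every cycle, plus the new enable logic. All capacitances, activities and field names are hypothetical, and a real tool would also account for data-path effects, leakage and the clock tree.

```python
from dataclasses import dataclass


@dataclass
class GatingCandidate:
    """Hypothetical characterization data for one register bank."""
    clock_pin_cap_f: float        # total clock-pin capacitance of the bank (farads)
    enable_active_frac: float     # fraction of cycles in which the bank loads new data
    icg_cap_f: float              # capacitance the ICG cell switches every cycle
    enable_logic_cap_f: float     # capacitance of the added enable logic
    enable_logic_activity: float  # 0->1 transitions per cycle on that logic


def dynamic_power(activity: float, cap_f: float, vdd: float, freq_hz: float) -> float:
    """Switching power P = alpha * C * Vdd^2 * f (watts)."""
    return activity * cap_f * vdd ** 2 * freq_hz


def net_gating_saving(c: GatingCandidate, vdd: float, freq_hz: float) -> float:
    """Clock-pin power saved minus power added; positive means gating pays off."""
    # Ungated, the clock pins switch every cycle (alpha = 1). Gated, they only
    # switch in cycles where the enable is active.
    saved = dynamic_power(1.0 - c.enable_active_frac, c.clock_pin_cap_f, vdd, freq_hz)
    # The ICG cell still sees the free-running clock every cycle.
    icg_cost = dynamic_power(1.0, c.icg_cap_f, vdd, freq_hz)
    logic_cost = dynamic_power(c.enable_logic_activity, c.enable_logic_cap_f, vdd, freq_hz)
    return saved - icg_cost - logic_cost


if __name__ == "__main__":
    wide_idle = GatingCandidate(32e-15, 0.1, 3e-15, 5e-15, 0.2)   # 32-bit bank, rarely loaded
    narrow_busy = GatingCandidate(1e-15, 0.9, 3e-15, 5e-15, 0.2)  # 1-bit flop, loaded 90% of cycles
    print(net_gating_saving(wide_idle, vdd=1.0, freq_hz=1e9))    # about +24.8e-6 W
    print(net_gating_saving(narrow_busy, vdd=1.0, freq_hz=1e9))  # negative: gating costs power
```

The second case shows why a blanket "gate every register" rule can backfire: a narrow, busy register saves too little clock power to cover the ICG cell and enable logic.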
Verification normally uses equivalence checking to ensure that the changes do not affect functionality. Baker notes that “if we fail to validate the changes, we drop them as part of the auto-fixing.”
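The drop-on-failure policy Baker describes can be illustrated with a cycle-level toy model. The Python sketch below compares an original register (always clocked, with a recirculating enable mux) against two candidate gated-clock versions by exhaustively simulating every enable/data sequence up to a fixed depth, and keeps only the transforms that match. This is a bounded, simulation-based stand-in, not the formal equivalence checking production tools use, and the deliberately broken transform is hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Step = Tuple[int, int]  # (enable, data) at one clock edge
Model = Callable[[Sequence[Step]], List[int]]


def mux_feedback_register(seq: Sequence[Step]) -> List[int]:
    """Original RTL: clocked every cycle, D = data if enable else Q."""
    q, out = 0, []
    for en, d in seq:
        q = d if en else q
        out.append(q)
    return out


def gated_clock_register(seq: Sequence[Step]) -> List[int]:
    """Correct transform: the clock edge only reaches the flop when enable is 1."""
    q, out = 0, []
    for en, d in seq:
        if en:  # gated clock pulses, flop captures data
            q = d
        out.append(q)
    return out


def misaligned_gated_register(seq: Sequence[Step]) -> List[int]:
    """Deliberately broken transform: gates with the previous cycle's enable."""
    q, prev_en, out = 0, 0, []
    for en, d in seq:
        if prev_en:
            q = d
        prev_en = en
        out.append(q)
    return out


def bounded_equivalent(ref: Model, impl: Model, depth: int) -> bool:
    """Compare outputs for every (enable, data) sequence of length `depth`.
    The models are causal, so every shorter sequence is covered as a prefix."""
    steps = list(product((0, 1), repeat=2))
    return all(ref(seq) == impl(seq) for seq in product(steps, repeat=depth))


def accept_transforms(ref: Model, candidates: List[Model], depth: int = 6) -> List[str]:
    """Keep the transforms that pass the check; drop the rest."""
    return [c.__name__ for c in candidates if bounded_equivalent(ref, c, depth)]


if __name__ == "__main__":
    kept = accept_transforms(mux_feedback_register,
                             [gated_clock_register, misaligned_gated_register])
    print(kept)  # ['gated_clock_register']
```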
Ruby provided another statistic that shows how successful these techniques have been. “Dynamic clock gating efficiency is the percent of time that gated clocks are shut off. Static clock gating efficiency is the percentage of registers that are clock gated. Of these, dynamic is a much better metric and we are seeing mobile designs reaching about 80% to 85%, which means that 85% of the time the clocks on the chip are shut off.”
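Ruby's two metrics can be written down directly. In this Python sketch, static efficiency is the share of registers behind a clock gate, and dynamic efficiency is the share of cycles in which the gated clocks are off, taken from per-gate enable traces. Optionally weighting each gate by the number of registers it drives is an assumption; the quote does not say how tools aggregate across gates, and the register and gate names are hypothetical.

```python
from typing import Dict, List, Optional


def static_gating_efficiency(is_gated: Dict[str, bool]) -> float:
    """Percentage of registers that are clock gated."""
    return 100.0 * sum(is_gated.values()) / len(is_gated)


def dynamic_gating_efficiency(enable_traces: Dict[str, List[int]],
                              regs_per_gate: Optional[Dict[str, int]] = None) -> float:
    """Percentage of time the gated clocks are shut off (enable == 0),
    optionally weighting each clock gate by how many registers it drives."""
    off = total = 0.0
    for gate, trace in enable_traces.items():
        weight = regs_per_gate.get(gate, 1) if regs_per_gate else 1
        off += weight * trace.count(0)
        total += weight * len(trace)
    return 100.0 * off / total


if __name__ == "__main__":
    print(static_gating_efficiency({"r0": True, "r1": True, "r2": False, "r3": True}))  # 75.0
    traces = {"icg_a": [0, 0, 0, 0], "icg_b": [1, 1, 0, 0]}
    print(dynamic_gating_efficiency(traces))                            # 75.0
    print(dynamic_gating_efficiency(traces, {"icg_a": 3, "icg_b": 1}))  # 87.5
```

The example also shows why the two metrics can diverge: a design can gate most of its registers (high static efficiency) while those gates are enabled most of the time (low dynamic efficiency), which is why Ruby prefers the dynamic figure.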
But more can be done. “The trend these days is more towards physical-aware clock gating,” says Cadence’s Nazifi. “When registers are placed in different areas of a chip and when the clocks are connected, the gating element could adversely affect clock skew and the balancing of your clock network, which can constitute between 30% and 40% of your SoC power consumption. Being able to more intelligently apply clock gating based on physical placement of the registers becomes more critical.”
Not everyone is looking at the physical level to make improvements. “This is where the least impact can be made,” claims Shawn McCloud, vice president of marketing at Calypto. “You are looking in the 10% to 15% range. The real opportunity to impact power is above RTL, where you are looking at the architecture of the implementation.”
There are many ways to look at power consumption. “Today, most of the power is being taken by the memories,” points out Jay Roy, group director for RTL Power in Cadence’s FED Group, “which account for around 35% to 40% and growing. Memories these days are being made with multiple modes such as sleep mode, half sleep, etc., and using those modes effectively can reduce the power consumption.”
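One way to reason about using memory low-power modes effectively is a break-even comparison: a deeper mode draws less power while parked but costs more energy and time to enter and leave, so it only pays off for long enough idle periods. The Python sketch below picks the lowest-energy mode for a given idle interval. The mode names echo the sleep and half-sleep modes Roy mentions, but every number and field is hypothetical and not taken from any real memory compiler.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemMode:
    name: str
    power_mw: float       # power while parked in this mode (hypothetical)
    transition_nj: float  # energy to enter the mode and wake back up (hypothetical)
    wake_cycles: int      # cycles needed to get back to an accessible state


def best_mode(idle_cycles: int, freq_mhz: float, modes: List[MemMode]) -> MemMode:
    """Lowest-energy mode whose wake-up latency fits inside the idle window."""
    idle_us = idle_cycles / freq_mhz  # cycles / (cycles per microsecond)
    feasible = [m for m in modes if m.wake_cycles <= idle_cycles]
    # mW * us = nJ, so both terms below are in nanojoules.
    return min(feasible, key=lambda m: m.power_mw * idle_us + m.transition_nj)


if __name__ == "__main__":
    modes = [
        MemMode("standby", 2.0, 0.0, 0),
        MemMode("half_sleep", 0.8, 5.0, 2),
        MemMode("sleep", 0.1, 40.0, 50),
    ]
    for idle in (10, 10_000, 1_000_000):
        print(idle, best_mode(idle, freq_mhz=1000.0, modes=modes).name)
    # 10 standby / 10000 half_sleep / 1000000 sleep
```

The "standby" entry has zero wake-up latency so that at least one mode is always feasible; short gaps stay there, medium gaps favor half sleep, and only long gaps amortize the cost of full sleep.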
Today, a lot of people are working on the specification and implementation of multiple power domains, primarily to reduce leakage power. “Specification of power intent is manual,” says Nazifi. “It is automatically implemented at the RT level today. It could be pushed up to higher levels, but there are no tools or automation available.”
“You have a design at the RT level and you have a desire to partition the design into multiple power domains, using either UPF or CPF,” explains Oz Levia, vice president of marketing and business development at Jasper Design Automation. “The [tool can] infer the changes that need to be made in the design. Then a host of checks are automatically generated. Some of them are very simple, such as checking for isolation cell insertion, or that retention buffers are in the right place. Others are more functional in nature, such as where there is a reset signal, a specific clocking mechanism or a required power-up/power-down sequence.”
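Two of the simple structural checks Levia lists, isolation and retention, can be sketched over a hypothetical, already-extracted description of the design. This example does not attempt to parse UPF or CPF. The rules are deliberately conservative simplifications: any signal leaving a switchable domain without an isolation cell is flagged (a real flow would also consider whether the destination is always powered down along with the source), and any register that must keep its state through power-down but is not a retention register is flagged. Domain, signal and register names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Crossing:
    signal: str
    src_domain: str
    dst_domain: str
    isolated: bool  # passes through an isolation cell at the boundary


def isolation_violations(switchable: Dict[str, bool], crossings: List[Crossing]) -> List[str]:
    """Signals driven from a power-gated domain into another domain without isolation."""
    return [c.signal for c in crossings
            if switchable[c.src_domain]
            and c.src_domain != c.dst_domain
            and not c.isolated]


def retention_violations(reg_domain: Dict[str, str], must_retain: Set[str],
                         retention_regs: Set[str], switchable: Dict[str, bool]) -> List[str]:
    """Registers that must survive power-down, live in a switchable domain,
    and are not implemented as retention registers."""
    return sorted(r for r in must_retain
                  if switchable[reg_domain[r]] and r not in retention_regs)


if __name__ == "__main__":
    switchable = {"PD_AON": False, "PD_CPU": True, "PD_GPU": True}
    crossings = [
        Crossing("cpu_req", "PD_CPU", "PD_AON", isolated=True),
        Crossing("gpu_irq", "PD_GPU", "PD_AON", isolated=False),  # missing isolation
        Crossing("aon_cfg", "PD_AON", "PD_CPU", isolated=False),  # always-on source: not flagged
    ]
    print(isolation_violations(switchable, crossings))  # ['gpu_irq']
    regs = {"cpu_pc": "PD_CPU", "cpu_tmp": "PD_CPU", "aon_ctrl": "PD_AON"}
    print(retention_violations(regs, {"cpu_pc", "aon_ctrl"}, set(), switchable))  # ['cpu_pc']
```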
Adds Tom De Schutter, senior staff product marketing manager for virtual prototyping at Synopsys: “You may think [that a printer company would] have less of a problem than a mobile company, but new regulations, especially in Europe, where sleep mode is very important, mean that power consumption in this mode must be minimized. It was a challenge to get the entire device into a state where it consumes very little power and yet can come out of it.”
Virtual prototypes are one of the ways in which the hardware and software can be executed together to solve this kind of problem.
Some people are looking to higher levels of abstraction to get additional power savings. “Sequential optimization techniques are the next level of savings,” says Baker. “Stability and don’t-care conditions can create additional opportunities where enables can be added to block propagation.”
Sequential clock gating performs analysis over multiple clock cycles, looking for other ways in which the clocks can be gated. “Engineers have been analyzing designs, but this is a painful and time-consuming analysis to do by hand,” says McCloud. “It’s looking at the input vector set to see if it is representative of my behaviors, considering the extra logic that I would need to insert for gating the fan-in or fan-out, which itself consumes more power, in order to estimate the overall power savings. Sequential analysis does this automatically and finds the places to insert or strengthen the enables. You can do it vectorless, but most people depend on representative vectors in order to make a good tradeoff.”
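The stability and don't-care conditions Baker and McCloud describe can be measured on a representative trace. The Python sketch below classifies each load of a single register: a stable load writes the value the register already holds, and an unobserved load writes a new value that is replaced before any downstream reader samples it. Both are candidates for a stronger enable. The trace format and timing convention (a read in cycle t sees the value held before the update at the end of cycle t) are assumptions, and the counts are per-load opportunities that interact with each other, so any resulting change would still need sequential equivalence checking.

```python
from typing import List, Tuple


def classify_loads(load: List[int], next_val: List[int], read: List[int],
                   init: int = 0) -> Tuple[int, int, int]:
    """Return (total_loads, stable_loads, unobserved_loads) for one register.

    load[t]: 1 if the register is clocked with its enable active in cycle t
    next_val[t]: value presented at D in cycle t (captured at the end of cycle t)
    read[t]: 1 if a downstream consumer samples the register in cycle t,
             seeing the value held before that cycle's update
    """
    q = init
    stable = 0
    changing: List[int] = []  # cycles whose load actually changes Q
    for t, active in enumerate(load):
        if not active:
            continue
        if next_val[t] == q:
            stable += 1  # stability: this clock edge is wasted
        else:
            changing.append(t)
            q = next_val[t]
    unobserved = 0
    # A value written at the end of cycle t lives until the next changing load
    # at cycle nxt, so reads in cycles t+1..nxt see it. The last changing load
    # is skipped because its lifetime extends past the end of the trace.
    for t, nxt in zip(changing, changing[1:]):
        if not any(read[t + 1 : nxt + 1]):
            unobserved += 1  # don't-care: the value is never observed
    return stable + len(changing), stable, unobserved


if __name__ == "__main__":
    load = [1, 1, 0, 1, 1, 0]
    next_val = [1, 1, 0, 0, 1, 0]
    read = [0, 0, 1, 0, 0, 1]
    print(classify_loads(load, next_val, read))  # (4, 1, 1)
```

As McCloud notes, the result is only as good as the vectors: a trace that never exercises a reader can make a needed load look unobserved.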
“As soon as the formal team catches up to do the checking,” explains Cadence’s Roy, “then higher-level optimization will become widely adopted.”
Many are still looking to high-level synthesis (HLS). “HLS tools have always had the notion of a ‘what-if’ analysis,” says McCloud. “This was initially for performance and area and it is only recently that power has been considered. The concept of being able to use HLS for optimizing power is still very new and not many people are doing it today.”
So has power become a first-class optimization citizen? One would think so, but McCloud says no. “Power is still a second-class citizen. Designers first make sure the functionality is correct, then they concentrate on performance, then they make sure they are within the area budget. After they have these three things done, then they look at power. Often they don’t have a lot of time because they are close to tape-out. This is why most designers want automated tools that can quickly find some decent power savings.”