Reducing Power At RTL

Dependencies, different methodologies, and a growing number of variables make this an increasingly challenging and complex problem.


Power management and reduction at the register transfer level is becoming more problematic as more heterogeneous elements are added into advanced designs and more components are dependent on interactions with other components.

This has been a growing problem in leading-edge designs for the past couple of process nodes, but similar issues have begun creeping into less-sophisticated designs as the overall power budget is reduced. Balancing tradeoffs and understanding the context of interactions is especially difficult in devices in the nanowatt or microwatt range, where battery power is extremely limited and energy harvesting is an essential component. Here, for example, very small leakage currents multiplied by the number of devices can cause significant power loss, even in idle mode.
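A quick back-of-the-envelope calculation shows why idle leakage matters at this scale. The per-gate leakage, gate count, and supply voltage below are assumed round numbers for illustration, not figures from the article:

```python
# Hypothetical illustration: tiny per-gate leakage multiplied across a
# whole chip becomes a meaningful idle drain. All numbers are assumed.
leakage_per_gate_na = 0.05          # 50 pA of leakage per gate (assumed)
gate_count = 20_000_000             # a 20M-gate SoC (assumed)
supply_v = 0.8                      # core supply voltage (assumed)

idle_current_ma = leakage_per_gate_na * gate_count * 1e-6  # nA -> mA
idle_power_mw = idle_current_ma * supply_v

print(f"Idle leakage current: {idle_current_ma:.1f} mA")
print(f"Idle leakage power:   {idle_power_mw:.2f} mW")
```

Even at 50 pA per gate, the chip draws a full milliamp doing nothing, which is fatal for a device budgeted in microwatts.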

“Only with an accurate power estimation can an ultra-low power system be designed securely,” said Björn Zeugmann, member of the integrated sensor electronics research group at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “It’s necessary to care about the worst case, which means the highest current flowing, to construct resilient structures. But since this will not be the most common use case over time in most applications, a realistic use case for simulation is needed to get a realistic power analysis. This real case can be obtained from mission profiles of previous work, and is needed for lifetime estimates in systems with battery cells, for example. An accurate power analysis is unimportant only in system designs in which power consumption, self-heating, and other related effects play a secondary role.”
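Zeugmann’s distinction between worst-case and mission-profile analysis can be sketched numerically. The profile, currents, and battery capacity below are hypothetical placeholders, not data from the article:

```python
# Sketch: battery-life estimation from a mission profile vs. worst case.
# All currents, durations, and the battery capacity are assumed.
mission_profile = [            # (mode, current in mA, seconds per hour)
    ("active", 12.0, 6),
    ("sleep",  0.015, 3594),
]

# Worst case: the highest current flowing, used for resilient structures.
worst_case_ma = max(i for _, i, _ in mission_profile)

# Realistic case: time-weighted average current over the mission profile.
avg_ma = (sum(i * t for _, i, t in mission_profile)
          / sum(t for _, _, t in mission_profile))

battery_mah = 220              # coin cell capacity, assumed
hours_worst = battery_mah / worst_case_ma
hours_avg = battery_mah / avg_ma

print(f"Worst-case estimate:      {hours_worst:.0f} h")
print(f"Mission-profile estimate: {hours_avg:.0f} h")
```

The worst case is what the power delivery must survive, but the mission-profile average is what determines lifetime; here the two estimates differ by more than two orders of magnitude.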

Power is a complex topic, and fully understanding its implications in a design depends on a large number of variables that need to be taken into consideration.

“Low power designs impact the functionality of a chip,” said Preeti Gupta, director of RTL product management at ANSYS. “In terms of something like clock gating as a technique, there are challenges when a clock gate is inserted as it can cause glitching if not handled properly. Or consider power gating as a technique where switches are added to the design. How do those switches and isolation strategies and retention strategies work? These add to the total verification cycle, and when there are voltage islands and dynamic voltage and frequency scaling (DVFS), all of these add to the total verification requirements for a particular chip. As such, there’s no doubt that more must be done for low power designs.”
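The dynamic-power payoff of the clock gating Gupta describes can be illustrated with a toy behavioral model (plain Python, not RTL). Counting the clock edges a register actually receives is a rough proxy for clock-tree dynamic power; the 10% activity figure is assumed:

```python
# Behavioral sketch (not RTL): count the clock edges a register sees
# with and without clock gating, as a proxy for dynamic clock power.
def simulate(cycles, enables, gated):
    q, clock_edges = 0, 0
    for cyc in range(cycles):
        en = enables[cyc]
        if gated and not en:
            continue                  # gated: the clock never reaches the flop
        clock_edges += 1              # flop is clocked this cycle
        if en:
            q = cyc                   # capture new data when enabled
    return clock_edges

enables = [c % 10 == 0 for c in range(1000)]   # active 10% of the time (assumed)
print(simulate(1000, enables, gated=False))    # 1000 edges
print(simulate(1000, enables, gated=True))     # 100 edges
```

The stored value is identical in both runs; only the wasted clock toggles disappear. In real silicon the gate is inserted with a glitch-free integrated clock-gating cell, which is exactly the “handled properly” caveat in the quote above.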

Commercial tool vendors offer a variety of options specifically for reducing RTL power. No one size fits all, and no tool does everything. For example, there are tools for guided reduction for RTL power versus automatic reduction of power. But in either case, there is a recognition that the biggest power savings happen early in the design cycle.

“This has given rise to a robust set of tools and methodologies around the power efficiency of RTL,” said Rob Knoth, product management director at Cadence. “Power efficiency analysis at the C-code architecture level is even more impactful. A quantitative approach leveraging modern digital implementation and analysis tools helps to better understand the tradeoffs between frequency/bandwidth and power. This is relatively straightforward for big architecture decisions, but becomes much more difficult for attacking the ‘long tail’ of power efficiency improvements throughout a design.”

Figure 1: Power Optimization Potential throughout the design cycle (Delp, 2009) Source: Accellera/ISQED

The quality of power efficiency analysis is directly related to the quality of the functional stimulus used. But it also depends heavily on the capabilities of the verification and RTL design engineers.

“Adding power efficiency to their list of tasks must be done carefully,” Knoth said. “That leads to the question, ‘How can EDA help?’ There has been a lot of work in this area divided into two main camps — guided and automatic optimization. This optimization can be done either at the RTL level or integrated into the implementation. How do you decide?”

Guided RTL optimization is the process whereby a tool provides suggestions to the designer to improve their power efficiency. With guided RTL optimization, the human is still in charge of making the final decision. Automatic RTL optimization removes the human from the loop and the EDA tool makes the changes directly to the RTL. This is helpful where time is of the essence, but comes at a cost with regard to quality of results, readability of the RTL, and overall trust in the system.

For these reasons, Cadence worked with customers to create a third approach—using guided RTL optimization for the biggest changes, and using automatic optimization as a part of the synthesis flow to attack the long tail, Knoth said. “This ensures there is no miscorrelation between the opportunity analysis and what implementation will see in terms of power, performance and area (PPA). It is also the best use of the critical human and compute resources for a project.”

What influenced this approach comes down to verification, bandwidth/schedule, reuse (of IP and physical design flows), and integration. Power reduction is worthless if verification cannot ensure that the optimization doesn’t change the mission-critical function of the product.

Knoth noted this is true for both guided and automatic optimization, and for both functional and formal verification methods. If only guided changes are used and the number of changes is small, it can be a tractable manual problem. As the number of changes grows, it requires a robust connection between the optimization and verification technology. This is why synthesis and formal technologies are tested against one another during the development cycle. Ensuring that a formal tool can verify complex data-path and control transforms is critical to realizing the power efficiency they can deliver.

But there are risks in all of this, too. With designers squarely in the critical path to a product, a lack of strong correlation between what the RTL power efficiency analysis believes is an opportunity and what will eventually materialize on silicon can cause future problems.

Guided versus automatic?
If the RTL is IP from a third party, or is planned to be re-used across other projects, then any modifications to the RTL are not straightforward. Changing RTL can create a verification nightmare. In this case, automatic optimization during the synthesis flow—backed up by strong formal verification between the gate-level netlist and the RTL—is considered the best solution.

Teams delivering RTL IP to customers—internal or external—are incentivized to produce the highest-quality RTL possible, and power is often a key selling point. Tools that can help IP designers squeeze all of the wasted power out of their RTL are critical. The RTL’s readability is also important, which renders automatic RTL optimization a non-starter. Nor can this model rely on whatever synthesis flows the implementation teams use. Guided RTL optimization is the best fit, Knoth believes.

Shailandar Sachdeva, senior staff applications engineer at Synopsys, sees this differently. “Where automatic reduction works is in a case where the designer is not writing RTL, such as machine-generated RTL from an HLS (high-level synthesis) tool. Let’s say a design engineer wrote C code, it was run through some HLS tool, and the tool spits out RTL. There is no user intervention, and therefore no RTL designer is actually maintaining that code. This situation lends itself to automatic RTL power reduction, because the code is written by a tool. Nobody’s actually going to maintain it going through that flow.”

Further, design teams sometimes have a lot of third-party IP over which they have no control. “This IP could be off-the-shelf, purchased from a vendor, or it could be generated by HLS by a separate team,” Sachdeva said. “They may or may not have the RTL, and even if they have the RTL they have to understand it. But even then, they don’t have the original design. This lends itself to an automatic power reduction method.”

There are no shortcuts, and this is not a simple problem, Sachdeva asserted. “When the tool modifies the RTL, who will maintain that RTL? Let’s say the power designer who’s running the tool was given RTL. They modify it, and say, ‘Here is the RTL.’ But the RTL designer will say, ‘I can’t maintain this RTL because I don’t know what you have done.’ It becomes a huge issue. Ownership of that RTL and ECOs are big issues because nobody knows the RTL, so it becomes tricky. On the other hand, let’s say the designer actually wrote the RTL. They own it, and can say what modification has been done. Then, if downstream it is realized that there is an issue, that designer will have a better understanding of it and can control it more intelligently rather than brute-forcing it.”

Other issues
Integration is also a consideration. “Accuracy of the power efficiency analysis is critical to both techniques,” said Knoth. “You cannot afford miscorrelation between what the RTL power estimation tool believes the synthesis tool will do, what the place-and-route tool will build for a clock tree, or what the signoff power tool will calculate. If you have four different tools from four different EDA vendors, when the tools don’t correlate, who’s responsible? For guided RTL reduction, miscorrelation will cause critical path slips. You’re wasting time from the most schedule-critical resources to chase phantoms. For automatic reductions, you need to have the power efficiency analysis and optimization engine embedded inside of the synthesis and implementation flow. It’s important to consider not just the power reduction, but also the timing/area/congestion impact of the changes. An example is combinatorial clock gating with XOR gates. It is relatively easy to show how it could reduce clock power. It is much harder to make sure it doesn’t introduce new implementation issues with area/congestion/timing. Only a tightly integrated solution will be safe when considering these tradeoffs.”
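Knoth’s XOR example can be sketched behaviorally. The idea is data-driven gating: clock the flop only when the XOR of its input and its stored value indicates the contents would actually change. This toy Python model (not any vendor’s implementation, and the mostly-idle data stream is assumed) counts clock edges under both schemes:

```python
# Behavioral sketch of XOR-based (data-driven) clock gating:
# clock the flop only when the next data differs from the stored value.
def run(data_stream):
    q = 0
    edges_ungated = 0
    edges_gated = 0
    for d in data_stream:
        edges_ungated += 1            # free-running clock: one edge per cycle
        if d ^ q:                     # XOR of D and Q: value would change
            edges_gated += 1          # gated clock fires only on a change
            q = d
    return edges_ungated, edges_gated

stream = [0] * 90 + [1] * 10          # mostly-idle data stream (assumed)
ungated, gated = run(stream)
print(ungated, gated)                 # far fewer gated edges
```

The power win is obvious in the model; what the model cannot show is Knoth’s harder half of the problem, namely the extra XOR logic and enable routing that the implementation flow must absorb without hurting area, congestion, or timing.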

Practically, the choice often comes down to the time available, observed Mohammed Fahad, product engineer at Mentor, a Siemens Business. “Given you have enough time to closely look at all the suggestions, and very meticulously implement the changes suggested by the tool, you would go with the guided approach. But if you don’t have enough time at hand, and the time-to-market window is closing soon, then you would go with the automatic reduction. There are definitely many pros and cons to both. In the manual flow, you retain control of your RTL, but there is overhead for the designer in making extra changes to the RTL — changes that weren’t part of their mandate, but that the tool is suggesting — so they will have to do them. The rest of the downstream verification flow is additional work. Automatic optimization gives a quick reduction of power, but because it is tool-generated RTL, there are concerns about controllability of the RTL, credibility of the RTL, and then the verification.”

Additionally, when an engineering team is implementing guided reduction, they have the ability to back up to the previous step if they make one of the suggested changes but don’t like the result. But this adds its own set of challenges.

“Suppose the user is looking at the suggestion provided by the tool and implementing that change themselves,” Fahad said. “Of course, this is a machine suggestion, so there could be multiple ways of doing the same thing. Sometimes a user gets the hint that a particular flop is gatable and has redundant power consumption. The whole idea is to suggest that a particular flop is consuming more power than necessary. So sometimes they come up with their own optimized logic, which is perfect, and sometimes they just look at our suggested changes and implement those. But an important consideration is whether it is really easy to go back to the original RTL in the automatic optimization. If the machine has changed the RTL, how do I go back to my original RTL if I don’t like it, or something goes wrong in verification? How do I undo something which was done by the machine? That’s the main concern with designers.”

Fortunately, the solution is that the machine always writes the optimized RTL with an override signal, he pointed out. “All of these suggestions, all of the changes that the tool implements automatically, are implemented with an OR gate carrying an override signal that we call CG override (clock-gating override). So if you don’t like our changes, just assert that signal, and the RTL is as good as the original. It’s very easy to undo what the tool has done.”
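Fahad’s CG-override scheme can be sketched in a few lines. This is a behavioral Python model rather than the tool’s actual RTL, and the signal names are hypothetical: the tool’s gating condition is ORed with an override, so asserting the override restores the original, ungated behavior:

```python
# Sketch of the CG-override idea: the tool-derived gating condition is
# ORed with an override signal, so cg_override=1 restores original behavior.
def flop(data, gate_en, cg_override):
    q, trace = 0, []
    for d, en in zip(data, gate_en):
        if en or cg_override:         # the OR gate on the clock-gate enable
            q = d                     # flop captures only when clocked
        trace.append(q)
    return trace

data = [1, 0, 1, 1, 0]
en   = [1, 0, 0, 1, 0]                # tool-derived gating condition (assumed)

orig = flop(data, [1] * 5, cg_override=0)   # original ungated register
undo = flop(data, en, cg_override=1)        # optimized RTL, override asserted
print(orig == undo)                         # True: override undoes the change
```

With the override asserted the gated register captures every cycle, exactly like the original, so a failing optimization can be neutralized without editing the RTL back by hand.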

The bigger picture
One of the main issues with RTL power reduction comes down to automatically writing power-optimized RTL, which is a very complex problem.

“The biggest challenge that we face is the bad logic issue, whereby we implement something and it causes a chip failure,” Fahad said. “Fortunately, we’ve never had a chip failure, because such problems get caught during the verification downstream. Gating verification is able to catch whether there is a simulation mismatch between the original functionality and the new power-optimized functionality. Bad logic is the biggest problem that we face, even though we have a formal verification means of equivalence checking between the new RTL and the old RTL, and we have verification methods used by the customer. Everything is in place. But despite that, in some downstream flow, the bad logic is caught switching, which essentially is a wrong implementation done by the tool causing the functionality to change. This is the biggest challenge that we face now.”
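The equivalence checking Fahad mentions can be illustrated with a toy example. Real formal tools prove equivalence symbolically; this sketch, with made-up next-state functions, simply enumerates every input combination of a one-bit enabled register and its clock-gated variant:

```python
from itertools import product

# Toy equivalence check (exhaustive, for illustration only): compare the
# next-state functions of the original and "power-optimized" logic.
def next_q_orig(q, d, en):
    return d if en else q                   # simple enabled register

def next_q_opt(q, d, en):
    return d if (en and d != q) else q      # clock also gated when D == Q

mismatches = [(q, d, en)
              for q, d, en in product([0, 1], repeat=3)
              if next_q_orig(q, d, en) != next_q_opt(q, d, en)]

print("equivalent" if not mismatches else f"bad logic: {mismatches}")
```

If the tool's transform were wrong, the mismatch list would expose the offending input combination, which is the essence of catching “bad logic” before it reaches silicon.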

For this reason, power is the burning problem in the market. “We never had power as a problem in chips, but in the past decade or so — ever since the emergence of handheld devices, such as mobile phones and laptops, etc. — the efficiency of battery power has become the most important aspect of a device. We used to have frequency, speed, and many other factors. The race was in a different paradigm. Now, the race is about how long the battery can last,” he said.

Given all of the interactions among device functions, along with the tradeoffs concerning how to get the most optimized power solution, the buzz and activity around power reduction will continue to challenge all design teams and tool providers that participate in this space.

“Power is different,” Fahad said. “Power is very complex. Power is very unpredictable. Power is heavily input-dependent. Power is heavily flow-dependent. Power is process-dependent. Power reduction is functionality-dependent. For all of these reasons, power reduction is not a simple problem, and it’s always interesting to be part of finding solutions to a difficult problem.”
