Reducing And Optimizing Power

Second of two parts: New power reduction and optimization techniques and approaches are being thoughtfully considered and developed. Here are just a few.

While power optimization/reduction techniques such as clock gating do help engineering teams improve designs from a power perspective, more can be done.

In fact, there are tools and methodologies under development to incorporate power in a more meaningful way. Part of that involves accurately pinpointing what designers should be looking for.

“If you look at academia or research that has been done, there are thousands of possible changes that you can make to your design to optimize power, but there is no time to try out all of those possible changes,” said Abhishek Ranjan, senior director of engineering at Calypto. “Which ones are going to give a benefit on a particular design and also provide very quick feedback? The power exploration cycle is very, very long. If someone is exploring a new idea after a light bulb has come on and they want to implement it, they change the design. Then, of course, they have to make sure that the design is functionally correct, re-simulate that design to get new activity records, and run the power optimization tool to figure out whether there was any power savings or not. And this was just one out of a thousand ideas they wanted to try, but the cycle is very long. It can take days or weeks to complete the cycle.”

The problem, he said, is in the way teams are structured. Simulation is done by one team, IP is developed by a second, and synthesis is handled by a third. By the time an RTL designer gets feedback on power, it is too late. As a result, designers are often afraid to try anything because the feedback cycle is so long.

Ranjan asserted that this feedback loop must be shortened, and that there needs to be a methodology that gives the designer very quick feedback on the impact of a change without having to involve all of these different groups.

“This is the methodology of the future. What we have today is a very broken, manual way of exploring power. We have some RTL power analysis tools, which help the user look at where the power is being consumed, but they don’t give an idea of where power is being wasted. Power could be consumed for legitimate logical reasons — the design needs to work at such high speeds or with such a high combination of activity — and that doesn’t mean that you can save power. There might be nothing you can do about it. So the tools have to be smart. It’s like Google Search: The reason Google Search is far superior to other search engines is because the top 10 search results tell you what you were looking for. That’s what the tools will have to do. They will have to direct the designer to exactly the place they should be working on, and they cannot create a lot of junk, which is what is happening right now. [Existing tools] pretty much say that every flop has the potential for optimization, but the number of flops in an SoC has crossed the 2 million mark and the designer cannot look at all 2 million of them. The tool has to be really smart and recognize that a particular technique, for instance memory banking or datapath optimization, is going to work for this design. Then it has to focus on applying that technique instead of working on clock gating or memory gating, which might not realize any more power savings.”
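As a rough illustration of that “top 10 results” idea, the sketch below ranks candidate power-saving opportunities by their expected, risk-weighted benefit and reports only the most promising ones rather than flagging every flop. The instance names, savings figures and scoring heuristic are invented for the example, not taken from any particular tool.

```python
# Hypothetical sketch: rank candidate power optimizations and report only the
# top few, instead of listing every flop in a multimillion-flop SoC.

from dataclasses import dataclass

@dataclass
class Candidate:
    instance: str         # register bank, memory, or datapath block
    technique: str        # e.g. "sequential clock gating", "memory banking"
    est_saving_mw: float  # estimated dynamic power saved if applied
    confidence: float     # 0..1, how likely the saving is realizable

def rank_candidates(candidates, top_n=10):
    """Return the top_n candidates by expected (risk-weighted) saving."""
    return sorted(candidates,
                  key=lambda c: c.est_saving_mw * c.confidence,
                  reverse=True)[:top_n]

if __name__ == "__main__":
    report = rank_candidates([
        Candidate("core0/fifo_regs",  "sequential clock gating", 4.2, 0.8),
        Candidate("dsp/mac_pipeline", "datapath optimization",   6.1, 0.5),
        Candidate("l2_cache/bank3",   "memory banking",          9.0, 0.3),
    ])
    for c in report:
        print(f"{c.instance:20s} {c.technique:25s} "
              f"expected {c.est_saving_mw * c.confidence:.1f} mW")
```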

A lot of work has started here, and this year should bring some technology to market. Ranjan predicts something similar to the integration seen in Google apps: a single cockpit for performing power optimization, where the cockpit is an assembly of several products and features that designers have been using in a disjointed way. “The idea is that any exploration that someone wants to do, the tools are readily available – they don’t have to bring up five different tools from five different companies to do one simple task,” he added.

Power optimization
Historically, clock gating has been enabled automatically as part of logic synthesis, based on what can be extracted from the RTL description. The emphasis has shifted recently to making sure that clock gating can be enabled for multi-stage registers. At the same time, it needs to become physically aware, so that the physical location of the registers is also factored in, such as whether a given clock gating element can be shared across multiple registers. It also has to enable what is called sequential clock gating, which looks at the pipelines, the data arrival time and the data activity to determine whether clock gating can be shared or enabled.
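A minimal sketch of the sequential clock-gating observation, assuming a simple two-stage pipeline and invented enable traces: if the downstream stage never consumes the value a register produced, the cycle in which that register was clocked was wasted and could have been gated.

```python
# Hedged sketch: walk a cycle-by-cycle enable trace for a two-stage pipeline
# and mark the cycles in which the upstream register could have been gated
# because the downstream stage never captures the data it produced.

def sequential_gating_opportunities(stage1_en, stage2_en):
    """stage1_en[i] / stage2_en[i]: enables of stage 1 and 2 in cycle i.
    A stage-1 write in cycle i is wasted if stage 2 does not capture it in
    cycle i+1 (a one-deep pipeline is assumed for simplicity)."""
    wasted = []
    for i in range(len(stage1_en) - 1):
        if stage1_en[i] and not stage2_en[i + 1]:
            wasted.append(i)
    return wasted

stage1 = [1, 1, 0, 1, 1, 0, 1]   # cycles in which stage-1 registers clock data in
stage2 = [0, 1, 1, 0, 0, 1, 0]   # cycles in which stage-2 registers consume it
print("stage-1 clocks that could be gated:",
      sequential_gating_opportunities(stage1, stage2))
```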

“Multi-stage clock gating means moving the gating element further upstream as opposed to having it right next to the registers,” said Koorosh Nazifi, engineering group director for low power and mixed signal initiatives at Cadence. “By moving it further upstream, you potentially can achieve additional power savings because you can enable that gating element at an earlier phase in the design. Second, the number of gating elements can be reduced because if the clock network is feeding into multiple registers, you can gate it upstream as opposed to at its fanouts.”
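The arithmetic behind that sharing argument can be sketched as follows. The register list, enable signals and placement regions are hypothetical; the point is simply that one gating cell per enable-and-region group replaces one per register, provided the shared gate stays physically close to its fanout.

```python
# Hypothetical sketch: compare per-register gating cells with one shared
# gating cell per (enable, region) group placed further upstream.

from collections import defaultdict

registers = [
    # (name, enable signal, physical region)
    ("u_core/r0", "en_a", "NW"), ("u_core/r1", "en_a", "NW"),
    ("u_core/r2", "en_a", "NE"), ("u_dma/r3",  "en_b", "SE"),
    ("u_dma/r4",  "en_b", "SE"),
]

# Per-register gating: one integrated clock-gating cell per register.
per_register_gates = len(registers)

# Upstream sharing: one gating cell per (enable, region) group, so the gate
# stays physically close to the registers it drives.
groups = defaultdict(list)
for name, enable, region in registers:
    groups[(enable, region)].append(name)
shared_gates = len(groups)

print(f"gating cells, per-register: {per_register_gates}")
print(f"gating cells, shared upstream: {shared_gates}")
for (enable, region), members in groups.items():
    print(f"  {enable}@{region}: {members}")
```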

Nazifi noted that Cadence is working on automating advanced techniques and improving current techniques by making them more intelligent, factoring in things that previously were part of the optimization tradeoff, such as physical proximity. In addition, the company is exploring activity-driven optimization for dynamic power reduction. “This is a very challenging task because even for analysis there is always the question of how to know the testbench used to generate the activity is representative of the actual behavior of the circuit in the real world. It’s most likely a snippet of the given activity. Does that really exercise the circuit or put it in the worst conditions to be able to accurately determine what the power consumption is and perform optimizations that can handle all possible scenarios?”
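For reference, activity-driven estimation in its simplest form derives a switching-activity factor per net from a simulation trace and folds it into the standard dynamic-power term, alpha * C * V^2 * f. The sketch below makes that explicit; the supply voltage, clock frequency, net names and capacitances are invented, and, as Nazifi points out, the result is only as trustworthy as the testbench that produced the activity.

```python
# Hedged sketch of activity-driven dynamic power estimation from toggle counts.

VDD = 0.8          # supply voltage in volts (assumed)
F_CLK = 1.0e9      # clock frequency in Hz (assumed)

def dynamic_power(toggles, cycles, cap_farads):
    """alpha = toggles / cycles; P = alpha * C * V^2 * f."""
    alpha = toggles / cycles
    return alpha * cap_farads * VDD ** 2 * F_CLK

nets = {
    # name: (toggle count over the trace, simulated cycles, switched capacitance)
    "core/alu_out[31:0]": (450_000, 1_000_000, 40e-15),
    "core/idle_flag":     (120,     1_000_000, 2e-15),
}

for name, (toggles, cycles, cap) in nets.items():
    print(f"{name:22s} {dynamic_power(toggles, cycles, cap) * 1e6:.2f} uW")
```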

He said all the basics are there already. The goal is to move the information upstream to achieve the biggest power reduction opportunities.

“Just like at RTL, there are certain techniques that you can automatically take advantage of based on the RTL description. Or, for that matter, once you have that RTL or HDL and map it to a gate-level representation, then you can perform additional optimizations like sizing and so forth. At the system level the same thing applies. There are certain techniques that you could potentially explore — primarily pipelining and so forth — that can be explored automatically based on the SystemC constructs and description. But to really take that much further you need language support, and the same thing happened in the RTL space. Basically, we determined that HDL doesn’t give us the necessary information to bring in physically related information with regards to power and ground connectivity that typically enters into the flow at the physical implementation phase into the RTL. So we had to define a new set of constructs and semantics. We ended up with two — CPF and UPF. But those were complementary languages and semantics that had to be introduced in order to provide the necessary information with regard to the power and ground connectivity of a given circuit at the RTL. The same thing needs to happen at the system level before we can really enable these techniques to be applied at the SystemC level. So it’s not just the tool side. It’s also the language side that has to be accounted for.”
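The point about language support is about information rather than syntax. The following is not CPF or UPF; it is only a hedged sketch, with hypothetical domain and block names, of the kind of power-intent data (domains, supplies, switchability, isolation requirements) that any system-level equivalent would have to carry alongside the SystemC description.

```python
# Illustrative data model only: the sort of power-intent information that the
# RTL or SystemC source alone does not express.

from dataclasses import dataclass, field

@dataclass
class PowerDomain:
    name: str
    supply: str                 # primary supply net, e.g. "VDD_CPU"
    blocks: list = field(default_factory=list)
    switchable: bool = False    # can this domain be powered off?
    isolation_to: list = field(default_factory=list)  # domains needing isolation cells

intent = [
    PowerDomain("PD_AON", "VDD_AON", ["rtc", "wakeup_ctrl"]),
    PowerDomain("PD_CPU", "VDD_CPU", ["cpu_cluster"], switchable=True,
                isolation_to=["PD_AON"]),
]

for pd in intent:
    state = "switchable" if pd.switchable else "always-on"
    print(f"{pd.name}: supply={pd.supply}, {state}, blocks={pd.blocks}, "
          f"isolation to {pd.isolation_to or 'none'}")
```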

Standards
To enable power optimization at the system level, some have suggested that language standards are the answer. But even standards have their limits.

“Language is only as good as the information you provide,” said Lawrence Loh, vice president of engineering at Jasper Design Automation. “On the other hand, having a consistent form of communications is very important because what a system is today is similar to what a small ASIC was many years ago. Small ASICs never have to worry about transistors being correct. They know exactly the characteristics of their transistors. Beyond that, they know the characteristics of an ASIC library cell — what kind of set up or hold time they will need, how much power it’s going to take, and how much delay it’s going to have. In the early days, people did a very good job in characterizing the transistors and library cells. Now people don’t even think that far. They think about IP, not the building block inside the IP. How do we capture the same level of information so a system integrator can safely utilize that information to do a system-level analysis? They don’t want to worry about inside the IP and how it works. They don’t want to worry about inside the ASIC cell and how it works. They don’t want to worry about the transistor and how it works. They only want to worry about looking at IP from the outside — what do I expect in terms of power, in terms of functionality, in terms of how to disable and turn off the whole IP?”
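One way to picture the black-box characterization Loh describes is a per-IP table of power states, their costs and the control needed to enter them, analogous to a library cell's timing and power tables. The fields and numbers below are illustrative assumptions, not an existing standard.

```python
# Hypothetical sketch: an outside-in power model of an IP block, as a system
# integrator might want to consume it.

from dataclasses import dataclass

@dataclass
class PowerState:
    name: str
    leakage_uw: float          # static power in this state
    dynamic_uw_per_mhz: float  # dynamic power scaling with clock frequency
    entry_control: str         # how the integrator requests this state

@dataclass
class IpPowerModel:
    ip_name: str
    states: list

usb_ctrl = IpPowerModel("usb2_ctrl", [
    PowerState("active",    leakage_uw=35.0, dynamic_uw_per_mhz=12.0,
               entry_control="none (default)"),
    PowerState("clock_off", leakage_uw=35.0, dynamic_uw_per_mhz=0.0,
               entry_control="assert clk_gate_en"),
    PowerState("power_off", leakage_uw=0.5,  dynamic_uw_per_mhz=0.0,
               entry_control="assert pwr_down, isolate outputs"),
])

for s in usb_ctrl.states:
    print(f"{usb_ctrl.ip_name}.{s.name}: leak {s.leakage_uw} uW, "
          f"dyn {s.dynamic_uw_per_mhz} uW/MHz, enter via {s.entry_control}")
```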

A sufficient description is still needed to communicate that. While Loh acknowledged the current power efforts, as Nazifi did above, he still believes they have a long way to go.

So whether the jump up to the system level for power optimization and reduction comes in the form of new tools or language or delivery mechanisms, incremental improvements to existing methods are always occurring. The decision on how best to get there is debatable, but engineering teams are sure to have a number of paths to explore in the future.

To read part one, click here.


