Mining For Data

Current power analysis techniques are typically employed too late to allow for major improvements. Mining data from the physical implementation can help improve the RTL.


By Ann Steffora Mutschler

Accurate power analysis at the RTL level of abstraction is a challenging problem. Smaller geometries only make RTL power consumption harder to predict, which in turn affects other design decisions such as power-grid planning and package selection.

“It’s one of these things where the earlier you are in the design, even before RT level, macroarchitecture and microarchitecture decisions have, by far, the biggest impact on the power, performance and area results you’re going to get,” said Pete Hardee, low-power design solution marketing director at Cadence. “Making those decisions at the earliest possible stage has a much bigger impact, but our ability to measure that is arguably poorest at those high levels of abstraction and improve with the detail.”

A lot of detail has to feed into those decisions. Put simply, it comes down to two things: activity and characterization.

In terms of characterization, he explained, “I’ve got to be able to know exactly what it costs me when components switch and what the leakage component is, as well, with all of that stuff at the transistor level. I’ve got to know the dynamic and static components of my power for all of those library cell elements that I have in my design.”

Switching activity in the different system modes must also be taken into account, including how long the design stays in a particular mode, how long various blocks are inactive, and how long they are fully active. “I need to know not just one switching activity profile, but many switching activity profiles and I need to know the time base for those, as well,” Hardee said.
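The two ingredients Hardee describes, characterization and mode-dependent activity, can be combined in a simple time-weighted estimate. The Python sketch below is a hypothetical illustration, not any vendor's algorithm: it assumes one lumped switched capacitance and one leakage figure (standing in for per-cell library characterization), applies one common form of the CMOS switching-power formula, P = αCV²f, to each mode, and weights each mode by the time spent in it. The mode names and all numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Mode:
    """One system mode: how long the design stays in it and how busy the logic is."""
    name: str
    duration_s: float  # time base: seconds spent in this mode
    activity: float    # switching activity factor (alpha) in this mode
    clock_hz: float    # clock frequency in this mode (0 if the clock is gated off)


def dynamic_power_w(activity: float, cap_f: float, vdd: float, clock_hz: float) -> float:
    # One common form of CMOS switching power: alpha * C * V^2 * f
    return activity * cap_f * vdd ** 2 * clock_hz


def average_power_w(modes: list[Mode], cap_f: float, vdd: float, leakage_w: float) -> float:
    """Time-weighted average of (dynamic + leakage) power across all modes.

    Simplifying assumption: leakage is identical in every mode (no power gating).
    """
    total_time = sum(m.duration_s for m in modes)
    energy_j = sum(
        (dynamic_power_w(m.activity, cap_f, vdd, m.clock_hz) + leakage_w) * m.duration_s
        for m in modes
    )
    return energy_j / total_time


if __name__ == "__main__":
    # Hypothetical profile: 1 s fully active, 3 s idle with the clock gated off.
    profile = [
        Mode("active", duration_s=1.0, activity=0.2, clock_hz=1e9),
        Mode("idle", duration_s=3.0, activity=0.0, clock_hz=0.0),
    ]
    print(f"{average_power_w(profile, cap_f=1e-9, vdd=1.0, leakage_w=0.01):.3f} W")
```

Keeping the same characterization but changing the mode profile or its time base changes the result, which is why a single activity profile is not enough.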

Barry Pangrle, solutions architect for low-power at Mentor Graphics, pointed out that someone writing RTL generally doesn’t get much feedback about the physical implementation unless the RTL is run through some type of synthesis. “Even at that level, most of the physical information that’s going to be available has been captured in the cell libraries that they’re using for the synthesis. At that point, there’s a lot that the synthesis tools can do from a synthesis optimization standpoint based on the information they have in the cell libraries. But the only other way to get that and improve it at RTL (because improving at RTL could be specific to exactly the physical information) is what technology am I actually doing this implementation in? If I’ve got a library that’s basically all high Vt, for example, and it’s relatively low leakage compared to one that’s maybe high performance and leaks a lot, then what I do to ‘improve my RTL’ better be based on what my target technology is.”

Designers want to be able to change the RTL because they know that’s where the biggest impact can be made, but they also understand that absolute accuracy at RTL is not as good as what can be achieved at the physical level, noted Shawn McCloud, vice president of marketing at Calypto. “But one thing to keep in mind here is that the relative accuracy is actually a very good metric. If you look at the relative power of one RTL implementation to another RTL implementation to another RTL implementation, the relative accuracy usually is pretty good so you are able to make at least a tradeoff there in RTL design.”
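McCloud’s point about relative accuracy can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that RTL estimates miss the signoff numbers by a roughly uniform factor (the 1.3 used here echoes the 30% gap Kulkarni cites, but the implementation names and milliwatt values are hypothetical). Under that assumption the absolute numbers are wrong, yet the ranking of implementations and the percentage savings between them are unchanged.

```python
def relative_change(baseline_mw: float, candidate_mw: float) -> float:
    """Fractional power change of a candidate versus a baseline (negative means savings)."""
    return (candidate_mw - baseline_mw) / baseline_mw


def ranking(estimates_mw: dict[str, float]) -> list[str]:
    """Implementation names ordered from lowest to highest estimated power."""
    return sorted(estimates_mw, key=estimates_mw.get)


# Hypothetical RTL power estimates for three candidate implementations (mW).
rtl_mw = {"impl_a": 3.0, "impl_b": 2.5, "impl_c": 2.8}

# Assumption for illustration: signoff comes out uniformly 30% higher than RTL.
BIAS = 1.3
signoff_mw = {name: p * BIAS for name, p in rtl_mw.items()}

# The absolute numbers disagree, but the ordering and relative savings match.
print(ranking(rtl_mw), ranking(signoff_mw))
print(f"RTL savings, impl_b vs impl_a: {relative_change(rtl_mw['impl_a'], rtl_mw['impl_b']):.1%}")
print(f"Signoff savings, impl_b vs impl_a: {relative_change(signoff_mw['impl_a'], signoff_mw['impl_b']):.1%}")
```

Real estimation error is not perfectly uniform across implementations, so the ranking is preserved only approximately, in line with McCloud’s qualifier that relative accuracy is “usually” good.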

In addition, he said it’s possible to model power behavior based on previous versions of the design. The latest RTL analysis tools can mine data from earlier versions of a chip and build a model for later refinement at the RTL level, which McCloud said provides more accuracy at that level.
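As a much-simplified illustration of building a model from an earlier version of a chip, the sketch below fits a single scale factor that maps RTL power estimates to the signoff power obtained for the same blocks on a previous design, then applies it to a new RTL estimate. This is a hypothetical example of the general idea only, not how any of the tools mentioned here work; commercial tools build far more granular models. The function names and block values are made up.

```python
def fit_scale(rtl_mw: list[float], signoff_mw: list[float]) -> float:
    """Least-squares scale factor k minimizing sum((k * rtl - signoff) ** 2).

    This is a fit through the origin: k = sum(x * y) / sum(x * x).
    """
    if len(rtl_mw) != len(signoff_mw) or not rtl_mw:
        raise ValueError("need matching, non-empty lists of estimates")
    num = sum(x * y for x, y in zip(rtl_mw, signoff_mw))
    den = sum(x * x for x in rtl_mw)
    return num / den


def calibrate(rtl_estimate_mw: float, scale: float) -> float:
    """Apply the fitted correction factor to a new RTL estimate."""
    return rtl_estimate_mw * scale


# Hypothetical per-block data from a previous version of the chip (mW).
previous_rtl = [1.0, 2.0, 4.0]
previous_signoff = [1.3, 2.6, 5.2]

k = fit_scale(previous_rtl, previous_signoff)
print(f"scale factor: {k:.2f}")
print(f"calibrated estimate for a new 2.5 mW RTL block: {calibrate(2.5, k):.2f} mW")
```

A single global factor only corrects a systematic bias; it cannot capture block-specific effects such as clock-tree or routing differences, which is where more granular models come in.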

The best way to approach power analysis depends on whom you talk to.

Vic Kulkarni, general manager and senior vice president of the RTL Business Unit at Apache Design, said RTL signoff for power requires a more accurate approach to RTL power analysis with sub-65nm technologies. The company’s PACE (PowerArtist Calibrator and Estimator) technology calibrates design transformations through the implementation flow and produces granular power models.

The PACE project, which started almost two years ago, was prompted by a persistent power-budgeting problem that surfaced as designs moved below 40nm: what happens to the RTL power number further down the flow, Kulkarni explained. “So when they have 3 milliwatts and they reduce the power using techniques such as auto-reduction to 2.5 milliwatts on a large block, for instance, what happens when they go through synthesis and place & route and clock tree synthesis, then final P&R and gate-level signoff. They suddenly find they’re out of whack, typically 30% off. Constantly people were asking us, ‘How do I know that what I’m doing at RTL is preserved within certain bands of accuracy?’”

Kulkarni said creating a PACE model provides better capacitance estimation during gate-level power analysis than the wireload models in Liberty libraries, because over time those wireload models have been optimized for timing, not power.

Synopsys takes a different tack. “We approach it from two different ways,” said Mary Ann White, director for Galaxy implementation platform marketing. “We don’t feel the need to data mine at the back end to make changes at the front end because Design Compiler and IC Compiler share a lot of the same techniques and engines. Because of the shared engines, we’ve been building more and more physical know-how into Design Compiler so there’s no need to wait until then.”

A second approach involves more of a rough estimation for exploration at the front end. “It gives you the accuracy within 10% for what happens at the back end, particularly with timing and area—and you can actually get some of the power results up front, so you kind of know what’s happening, even in the presence of dirty data so you don’t have to have a full complete thing to do this type of exploration analysis,” she added.

And finally, from the Cadence camp comes another alternative. “To get an accurate power analysis early on, I need to know what my clock tree is, and how much of that clock tree I can typically disable in the various system modes,” Hardee said. “That’s where it really becomes complex. At the early stage of the design flow you can get a relatively good handle on system activity. The activity and the switching activity I can know that pretty well—arguably even better than later on because my tools are running at a higher level of abstraction. I can run deeper cycles. I can even run that with maybe a little bit of system so I can run under real system conditions and see what my real system activity is. But all of that other stuff—to get accuracy from the characterization point of view, I need to know my routing, clock tree, etc., and all of that comes later on.”

For most tools, that is a real problem, he continued. “If I’ve got good characterization, I can’t possibly run all of that activity. If I can run fast enough to run all of that activity, I don’t have good characterization of the design. Where we can bring those things together is in our hardware emulation technology. We can actually run a lot of activity under software; we can run with the software and run real system modes and while the design characterization-wise that is implemented on the Palladium box is not that representative of the real silicon, we embed RTL Compiler under the hood with dynamic power analysis and we take all of that characterization information out of the standard cell library and effectively synthesize that using RTL Compiler. We synthesize the design to the real target library. Now we have both components pretty accurately. This will beat the RTL power analysis tools every time.”

Whatever the approach, there is a power analysis solution for each situation, and the industry continues to invest significant R&D dollars in improving power analysis at RTL and above.
