Experts At The Table: Managing Power At Higher Levels Of Abstraction

First of three parts: Redefining the system for power; power estimation vs. real measurements and optimization; accuracy vs. relative accuracy; the impact of increasing complexity on estimates.


Low-Power Engineering sat down to discuss the advantages of dealing with power at a high level with Mike Meyer, a Cadence fellow; Grant Martin, chief scientist at Tensilica; Vic Kulkarni, senior vice president and general manager at Apache Design; Shawn McCloud, vice president of marketing at Calypto; and Brett Cline, vice president of marketing and sales at Forte Design Systems. What follows are excerpts of that conversation.

LPE: Do we need to redefine what constitutes a system when we’re talking about power?
Meyer: It definitely is part of a bigger picture. The problem is that there is very little support up at those levels. You're managing budgets, and it's a crude mechanism you need there. As we go forward, things like virtual platforms and power—or maybe more at the energy level, where you're looking at where power is spent between hardware and software—and making decisions about how things should be implemented make a lot of sense.
Martin: We’ve supported power modeling in our application-specific processor design flow for a number of years. The kind of power modeling you get can be quite accurate for the instruction set. It’s reasonable to allow you to make energy tradeoffs with the instructions you have. You can then run that set of instructions and predict what the energy consumption is going to be, and then you can do experiments around that. We feel like we’ve been a plug without a socket for years, at least for determining the right instruction set.
Kulkarni: In terms of how we see the world, it’s more than absolute accuracy and prediction. It’s really relative accuracy and relative analysis of what-if scenarios. That’s where the whole world is going in terms of macro architecture analysis. But the Holy Grail is ESL synthesis with RTL power analysis. That’s the ideal flow—you capture some of the physical effects in the RTL world but do a lot of what-if tradeoffs, hardware-software co-simulation, DVFS, looking at various power scenarios, and then validate that through a real hardware description language because that’s where all the realization will occur. But you can do all the relative accuracy of power up at the ESL level. That’s the picture we see from our end customers.

LPE: So it’s big-picture power estimation?
Kulkarni: Power estimation is still questionable at the moment. But getting traces to drive the RTL power analysis is a much better approach, which we have used in a mobile application. We took a 2.5 million-gate design and applied SystemC-level simulation, then did ESL synthesis with a partner; looking at the IP and DSP models, there were multiple cores. We took the transaction-level trace and combined it with RTL analysis to essentially emulate what would happen during a six-second call on a cell phone. We could analyze that in four hours using this flow vs. three months using pure RTL analysis and RTL power estimation. So a combination of ESL synthesis plus RTL analysis that captures a realistic stimulus and the physical effects can narrow that band of inaccuracy from RTL to gate to final signoff.
Cline: Today that's a typical flow. There isn't any real SystemC analysis—or a good one, anyway. But as far as optimization and estimation go, these are two separate worlds. The first one is that somebody has to get the power right at a macro level. You need some way to model those larger blocks and get your power budgets right at the block level, and you may have only 10 blocks in your design. From there you need to go to power estimation. You go through synthesis, go to RTL estimation, and then loop the information back into your system level. There has to be some sort of modeling at the higher level, with more parameters than just the performance numbers. There has to be some other quick estimation of area using a synthesis engine, and quick estimation of power using a combination of synthesis and techniques through application-specific processors or RTL.

LPE: How accurate does the initial high-level power analysis or estimation have to be?
Martin: Our experience with our processor estimation has been that for the RISC core you can be plus or minus 3%. For other things it can be more like plus or minus 20%. The key is to make macro tradeoffs. It's not whether this one is 5% better than that one. It's whether this one takes you into a different space than that one, and for that, 15% or 20% is adequate for coarse-grained analysis. People always want to verify at RTL—and maybe down at place and route—that the decisions they made at the high level hold up at the implementation level.
Kulkarni: What we find is a band of inaccuracy, as opposed to absolute numbers. About 30% is adequate as you go through RTL, RTL synthesis, P&R, layout, and final grid signoff. If you keep that band and narrow it down consistently, you get a true power budgeting solution at the system level. That includes hardware-software tradeoffs, and RTL to synthesis all the way to grid design and package. That way the ESL designer working on the next-generation smart phone is not completely off the mark in terms of final cost, power and SI budgets.
McCloud: From a high-level synthesis perspective, accuracy needs to be quite high. At the end of the day you're doing hardware architectural exploration across different frequencies and technologies. If you are off by 20% or 30% when you're trying to make your design selection, that's significant. It does come down to the accuracy of the up-front power estimation tool as you get closer to the hardware you're trying to create. If you're talking about something earlier, in the TLM platform space—where you're making decisions about whether to move something into software or keep it in hardware—accuracy in the range of 30% or 40% is sufficient. But if you're creating a hardware accelerator, you need to be within 10% or 20%.
Meyer: If you’re consistently overestimating by 20% and you know that’s happening, that’s much more acceptable. But if you’re plus 20% in one area and minus 20% over here, then your confidence disappears very quickly.
Martin: You have to be monotonic. If the estimator says A will be greater than B, then by the time you get detailed analysis A had better be something greater than B—even if the actual numbers aren’t the same.
McCloud: And that's the problem. If the estimations in high-level synthesis are off by 20% or 30%, you run a high risk that the relative comparison between one solution and another won't hold when you go down through RTL synthesis to the gate level. If you think solution A is 5 milliwatts and another solution is 7 milliwatts, when you go to actually implement them and run them through power estimation tools at the gate level, you might find that comparison is not correct. That's why I believe you need a relative level of accuracy.
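The monotonicity requirement Martin and McCloud describe can be sketched as a rank-preservation check: every pairwise ordering among the high-level estimates should survive gate-level analysis. The numbers below are hypothetical, not from the discussion—a minimal sketch, assuming paired estimates and measurements for the same candidate architectures:

```python
from itertools import combinations

# Hypothetical power numbers (mW) for four candidate architectures:
# high-level estimates vs. gate-level analysis of the same designs.
estimates = {"A": 5.0, "B": 7.0, "C": 4.2, "D": 6.1}
measured  = {"A": 5.9, "B": 8.3, "C": 5.1, "D": 7.4}

def is_monotonic(est, ref):
    """True if every pairwise ordering in est is preserved in ref."""
    return all(
        (est[x] < est[y]) == (ref[x] < ref[y])
        for x, y in combinations(est, 2)
    )

print(is_monotonic(estimates, measured))  # True
# The rankings agree, so relative tradeoffs made at the high level
# survive implementation—even though every absolute estimate here
# is consistently 15-20% low, the consistent bias Meyer calls acceptable.
```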

LPE: Is it harder to get an accurate assessment as we start adding in multiple power islands, voltage rails, and stacked die packages?
McCloud: It might get a little bit easier. Of course you need to be able to architect your high-level synthesis tool to be able to take into consideration that you’ve got islands, but in some respects you’re localizing the power estimation to a particular region of your design. When you’re talking about power gating of an entire hierarchical block, that’s actually a benefit when you localize it to a specific area.
Cline: The problem that high-level synthesis tools will have in the future, if there's not a closer correlation to the back end, is having a disconnect at 20nm or 14nm.
Meyer: And having some way of passing down what you think is a good implementation for this piece of it. There has to be something to express, ‘I expect this to be a high Vt and this to be a low Vt.’

LPE: Does it make it harder to pick which processors and which IP and which interconnects you’re going to use because you are running at such a high level?
McCloud: The further you get away from the silicon, the greater the impact you can have on power. At the gate level a power expert can save the design 10% to 15%. If you get up to the TLM decisions involving software and hardware, you can achieve huge power savings. Maybe you only have 30% accuracy, but the decisions you make can have a bigger impact.
Martin: That’s where configurable processors can open up a whole new area. You can get a 10-to-1 improvement in performance in terms of the instructions for deep dataplane applications. You can get a 3-to-1 improvement in energy consumption.
Kulkarni: You are doing so many tradeoffs at ESL that you are purposely making assumptions based on how long the analysis will take. If you add more accuracy at ESL, what that means is you are really adding synthesis. That will explode the runtime. The designs are getting so complex that the next tablet will have 1 billion gates. That's an unheard-of number for mobile applications. To do what-if analysis for that kind of application, it's critical that it gets done quickly. For GPS vs. the iPod, for example, it's critical to determine what kind of stimulus can be provided at ESL. That's the missing link for real power analysis. It's easy to optimize timing and area, but it gets more difficult to optimize for power. The more analysis you do, the more synthesis you put under the hood, which will expand the runtime.
McCloud: I don’t think you can do any reasonable level of power estimation without some realistic switching activity represented. Otherwise your estimates will be way off.
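McCloud's point follows from the standard dynamic-power relation, P ≈ α·C·V²·f, where the switching activity α comes straight from simulation traces—without realistic stimulus, α is a guess and the estimate drifts with it. A minimal sketch with made-up numbers for a hypothetical block:

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """Classic dynamic-power estimate: P = alpha * C * Vdd^2 * f.

    alpha  -- switching activity (toggles per clock, from simulation traces)
    c_load -- switched capacitance in farads
    vdd    -- supply voltage in volts
    freq   -- clock frequency in hertz
    """
    return alpha * c_load * vdd ** 2 * freq

# Hypothetical block: 200 pF switched capacitance, 1.0 V supply, 500 MHz.
# Only the activity factor differs between the two scenarios.
idle   = dynamic_power(alpha=0.02, c_load=200e-12, vdd=1.0, freq=500e6)
active = dynamic_power(alpha=0.25, c_load=200e-12, vdd=1.0, freq=500e6)

print(f"idle:   {idle * 1e3:.1f} mW")    # 2.0 mW
print(f"active: {active * 1e3:.1f} mW")  # 25.0 mW
```

A 12x spread in activity gives a 12x spread in dynamic power, which is why a realistic stimulus—like the trace-driven six-second-call scenario Kulkarni describes—matters more than any other single input.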
Martin: That just emphasizes the importance of use scenarios. We run into design teams sometimes that don't seem to fully understand the system constraints they're operating under. They can't identify whether they want to operate at a 200MHz operating point in a 40LP process vs. a 500MHz operating point. But if you target close to the edge of a process, the results you get in terms of energy consumption and peak power consumption can be way off. People really need to understand the usage scenarios and the constraints that system architectures put on operating points. Sometimes that's missing.