Power Estimation: Early Warning System Or False Alarm?

Experts at the table, part 1: Is power estimation good enough to make design decisions and to provide confidence in final silicon?


Semiconductor Engineering sat down with a large panel of experts to discuss the state of power estimation and to find out if the current levels of accuracy are sufficient to make informed decisions. Panelists included: Leah Schuth, director of technical marketing in the physical design group at ARM; Vic Kulkarni, senior vice president and general manager for the RTL power business at Ansys; John Redmond, associate technical director and low power team lead at Broadcom; Krishna Balachandran, product management director at Cadence; Anand Iyer, director of product marketing at Calypto; Jean-Marie Brunet, product marketing director for emulation at Mentor Graphics; Johannes Stahl, director of product marketing for prototyping at Synopsys; and Shane Stelmach, associate technical director and power and reliability solutions expert at TI. What follows are excerpts of that conversation.

SE: What is the current state of power estimation?

Schuth: It seems as if we are in a place where we can get either accurate results or results in a timely manner, but not both. The best we can hope to achieve at this point is a rough envelope of where you might expect to be in the end, with little regard to power modes within the chip, because that may be too complex an issue, at least in the early phases. We have a long way to go, but that may be okay for that stage of the design. I feel that we will always be leaving stuff on the table.

Balachandran: There is a great divide where there is system-level power, then there is power from RTL to GDS II, and then there is sign-off power. The three of them don't intersect. They don't meet, and that causes lots of stomach acid for the industry. You can have different engines that calculate power, and they don't correlate. So you start with a number at the top, which is usually done with spreadsheets, and then as you go down the flow you have decent results when you get to gate level. Those numbers are well correlated to SPICE, and you have sign-off, and usually the surprise is between gate level and sign-off. This causes many iterations, and the numbers do not converge. One part of this is power modes not being considered, but there is a whole host of things not being considered up front. Physical topology is not considered. Timing effects are not considered. You have synthesis that is timing-driven, but RTL power does not estimate the impact of timing on power. Those kinds of inaccuracies just cannot be ignored. Timing-driven design became important, and the industry took 10 to 15 years to solve it and bring it to a stage where it could be controlled. The same has to happen for power: you have to be able to measure power reliably and consistently throughout the whole flow so that, at the end of the day, what you get is what you expected. If you know what the error is going to be, you can plan for it. We are not there today.

Stahl: I would say that the surprise is not in sign-off, but in silicon. It happens when you actually run the application in silicon and realize you forgot to run one of the necessary applications during verification. This can mean that you miss the power budget in a big way, not a small way.

Stelmach: That happens in real life, many times. A lot of times it could be in places such as test. You are doing BiST on memories or doing transition fault testing, and your mission mode power is not peak power for the design. You can be accurate or fast, but you also have to know where to look. You can't turn over every rock, so you need to have an idea about what you are trying to solve by power prediction. Are you trying to produce good datasheet values that you can use to characterize your product, or are you worried about being able to ramp up production quickly and looking to find the false fails, the Vmin screening that is not consistent with your mission mode power? To do that, the continuum of the flow is needed. Too often we have a point solution for early power prediction, a point solution for IR drop sign-off, a point solution for emulation, and they don't fit together.

Kulkarni: We have been working on RTL power for almost fifteen years in terms of understanding the intent. Before we talk about a number in power estimation, you have to see what the intent is. It is about what-if analysis in terms of various scenarios, meaning application scenarios. There is no one magic number. When we talk about power estimation, it is the concept of PPP: Pessimistic, Predictable Power. There are issues about how to reduce the band of accuracy between early estimation and final sign-off. The various dots have to be connected. For example, clock power and post-layout power are things the synthesis tool is probably unable to predict; they come from clock tree synthesis, test insertion, etc. To become physical- and timing-aware there has to be a reference design at the end. Before you even start with RTL and PPP, you abstract models of capacitance: clock nets, fan-in information, technology files, and slew rates that capture the timing information from a reference design. Then you can bring that up to the RT level of abstraction. We realized this by looking at case models, and we created a reference methodology first, say for 16nm finFET. Then there is a base extraction utility that gives values for capacitances, clock capacitance, combinatorial logic, etc. This enables us to do RTL power estimation. Then, to connect the dots between the RTL world and the dynamic voltage drop, peak power events, and dI/dt events downstream at the package level, you can put that through RTL frames and in each frame look at peak and average power, and bring that down through encapsulated models. That can create an early chip power model, which is an abstracted RLC model to look at the package. So you can predict what will happen to the Z impedance curve of the package based on RTL estimation. Connecting the dots requires multiple tools, and these come from collaboration events with customers and calibration technology.
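The frame-based analysis Kulkarni describes can be illustrated with a small sketch: divide a per-cycle power trace into fixed-width frames and report average and peak power in each frame, which is how peak-power and dI/dt events can be localized early. The function name, trace values, and frame width below are invented for illustration, not taken from any tool.

```python
# Hypothetical sketch of frame-based RTL power analysis: split a per-cycle
# power trace into fixed-width frames and compute average and peak power per
# frame. Trace values and frame width are illustrative only.

def frame_power(trace_mw, frame_len):
    """Return (avg_mw, peak_mw) for each frame of the per-cycle trace."""
    frames = []
    for start in range(0, len(trace_mw), frame_len):
        frame = trace_mw[start:start + frame_len]
        frames.append((sum(frame) / len(frame), max(frame)))
    return frames

# Illustrative per-cycle power (mW) for 8 cycles, framed in groups of 4.
trace = [10.0, 12.0, 30.0, 11.0, 9.0, 9.5, 10.0, 28.0]
for i, (avg, peak) in enumerate(frame_power(trace, 4)):
    print(f"frame {i}: avg={avg:.2f} mW, peak={peak:.2f} mW")
```

A frame whose peak is far above its average is a candidate dI/dt event worth examining downstream at the package level.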

Redmond: For early power analysis, I would categorize it as cumbersome and inaccurate. We use spreadsheets for total power estimation. We will have a box at the system level for memory, processor, and other components and then drill into them. So SoC power would have a huge spreadsheet below it, broken into IPs, each with its modes. You drill into that and find another huge spreadsheet. This is a huge amount of effort, but it is very mechanical, performed by hand, and this is cumbersome. Depending on what the "ifs" are, you may need several variations of the chip. Give me power estimates for each one. That is not push button. The other problem with spreadsheets is that if someone gets more accurate numbers and punches them in, how do you make sure that everyone gets the updated information? The data management is difficult. Some of the numbers may come from DDR4, for example, and if this is the same as the last chip, then that is great, you can use the actual numbers, but if it is a new block at the architectural level, the numbers are hand-waving. When you couple these numbers with ones that are very accurate, the error bar is questionable and you don't have a handle on it.
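The hierarchical drill-down Redmond describes, with a system-level box per component and per-IP sheets with per-mode numbers underneath, amounts to a recursive rollup. A minimal sketch, with all block names and numbers invented for illustration:

```python
# Hypothetical sketch of a hierarchical spreadsheet power rollup: each node is
# either a leaf mapping mode names to power (mW) or a dict of sub-blocks.
# Rolling up a mode sums the leaf values; a leaf missing the mode counts as 0.
# Block names and numbers are invented for illustration.

def rollup(node, mode):
    """Recursively sum power (mW) for the given mode across the hierarchy."""
    if all(isinstance(v, (int, float)) for v in node.values()):
        return node.get(mode, 0.0)  # leaf: per-mode power numbers
    return sum(rollup(child, mode) for child in node.values())

soc = {
    "cpu":    {"core0": {"active": 120.0, "idle": 5.0},
               "core1": {"active": 118.0, "idle": 5.0}},
    "memory": {"ddr_phy": {"active": 80.0, "idle": 12.0}},
}

print(rollup(soc, "active"))  # 318.0
print(rollup(soc, "idle"))    # 22.0
```

Even this toy version shows the data-management problem Redmond raises: the "actual" numbers live in the leaves, and nothing forces an updated leaf value to propagate to everyone working from an older copy of the sheet.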

Brunet: We have seen several customers go through tape-out, and when the silicon comes back and they plug it into a board, they find that it consumes 3X to 4X the expected dynamic power. What is interesting is that most of the time they have an OS, and if you study how they verify the chip with respect to power, none of them are actually booting the OS at the RT level. They rely on the functional testbench and use this for the power tool. We think this is the wrong way. You have to simulate and emulate the chip in the same way it will be used, by putting the system on an FPGA prototype or emulator; a simulator is too slow to boot an OS. You have to run a couple of hundred million cycles. You need full visibility of every net, so this is a little challenging for FPGA prototypes. The emulator appears to be the right platform for that. This is about having a real, live representation of the chip and how it will be utilized. You need to replace the functional testbench with something that is plugged into the emulator, so then you get a power number. RTL power analysis has been going on for 10 to 15 years, but most of the tools are closer to silicon, at the gate level, with accuracy in the range of 5%. RTL is 10% to 15% from silicon, which is a breakthrough. This was not possible a couple of years ago. It requires a change of methodology for power. A couple of customers have been burned, and they are now looking at these new ways.

Stahl: I love spreadsheets, but only for looking at my business. We have seen the same things that you are talking about: for an architect it is very cumbersome and inaccurate. Cumbersome means it is not really connected with the design and uses a different representation. Inaccurate is not because you don't know the power of the blocks, but because you don't know the interaction between them during execution. The spreadsheet is inherently static. This makes it difficult to incorporate the dynamic behavior, and it has a very high chance of introducing mistakes. Some customers pushed the envelope, and the spreadsheet got more complicated with scripts, and it eventually became unmanageable. The outcome is that they said, "Why don't we look at this in the same way we do performance? Here we write models and execute them to get the performance of an SoC; apply that to power." Over the past two or three years we have taken our performance solution and enhanced it with a representation for power. So we have overlaid the performance and power models. This can get extremely good results, which means 15% deviation from actual power. This is enough for decision making during the early stage of the design. Later on, these can be re-run in an emulator to measure the actuals in terms of cycles and power and go to sign-off. This step from exploration at the high level to redoing it in validation is the right thing to do.

Iyer: From where we stand I see two things. One is the accuracy and the runtime. You need the physical model in order to get the necessary accuracy. So what is the minimum set that can guarantee the accuracy and at the same time the necessary speed at the RT level? More importantly, we need to focus on the application of those power numbers. How do you make use of those numbers? If you are looking at a block or a chip, at what level are you going to make decisions about power optimization? How quickly and painlessly can it be done? These are the additional questions that need to be answered as an industry.