Experts At The Table: Power Budgeting

First of three parts: Understanding power from the architectural level; where is the low-hanging fruit; models vs. real measurements.

Low-Power Engineering sat down with Barry Pangrle, solutions architect for low-power design and verification at Mentor Graphics; Cary Chin, director of technical marketing for low-power solutions at Synopsys; Vic Kulkarni, general manager of the RTL business unit at Apache Design Solutions; Matt Klein, principal engineer for power and broadcast applications at Xilinx; and Paul van Besouw, president and CEO of Oasys Design Systems. What follows are excerpts of that conversation.

LPE: The ITRS road map points to serious problems ahead with power budgets. How do we solve that?
Chin: We want to be talking about power budgets at as high a level as possible. You have much more leverage for impacting power the sooner you can attack the problem. You want to look at the problem as far up in the architectural cycle as possible. That’s critical to power budgets. Another issue involves how we close in on the specific details. On the verification and test side we talked about vectors driving the tests. That’s been the case with power at a detail level, but not at the higher level. We need to better understand what the operating modes are, how we get into these modes, standards, how we specify power intent, and the implications from the hardware through the software stack, including the firmware, middleware, operating system, all the way up to the applications. Application software itself has big implications for the power in the device. We need to be able to boil that down to the hardware and meet somewhere in the middle.
Klein: Static power has been a big driver after 90nm. Because it’s process- and temperature-dependent, it affects things where the chips are being used in a rack with limited airflow. We see thermal and power as very important, and each generation we’ve gone further than the recommendations of the ITRS. We’ve had to reduce those by process choices and other techniques. But as we begin to drive down static power we have to consider dynamic power. As we grow the cores of the FPGA bigger and bigger you can have more things that are toggling. That’s where power optimization techniques come in. We’ve invested in post-synthesis power optimization through fine-grained clock gating. And once you’ve dealt with all those things, you’re still left with I/O power. You have to look at what you can do with the architecture and how you can deal with that. It’s a multi-pronged approach between static, dynamic and I/O power.
Pangrle: A lot of what drives the power budget is the target market for the chip. If you look at high-performance microprocessors in servers, they have topped out under the 150-watt range. If you’re looking at a cell phone you may be limited to 1 watt. There’s a broad spectrum, and for each one of these there are a lot of networking applications. If it’s going on a board you may be looking at a total of 17 watts. What’s driving this is the total cost of the system, which includes the packaging and how you’re going to cool it. With high-level servers, if you start going above a certain level you have to start looking at liquid cooling rather than just using air and fans to cool them.

LPE: At one point even fans were considered exotic and too pricey, right?
Pangrle: Yes, that’s correct. It all comes down to cost, and at 90nm we’re seeing a real big impact in leakage current and static power. When digital watches first came out they were running at 9 volts and static power was practically non-existent. Threshold voltages were high and you didn’t even take those into account. When we got below 100nm we went to 1 volt. Even if you look at the 28nm process, the nominal Vdd is still in the range of 0.85 to 1 volt, so we’ve lost that scaling. If we’re following Moore’s Law and doubling what we’re putting on each chip at every new technology node, but the energy per device isn’t being halved, that creates some real issues. We need higher-level optimization and tradeoffs between hardware and software.
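A rough illustration of the arithmetic behind Pangrle's point, assuming clock frequency and switching activity stay fixed from node to node. The function name and the 0.7 energy-scaling factor are hypothetical and were not given by the panel.

```python
def relative_chip_power(nodes, device_growth=2.0, energy_scale=0.5):
    """Chip power relative to the starting node after `nodes` generations.

    device_growth: factor by which device count grows per node (2.0 = doubling).
    energy_scale:  factor by which energy per device shrinks per node
                   (0.5 = halved, the ideal case).
    """
    return (device_growth * energy_scale) ** nodes


# Ideal scaling: twice the devices at half the energy each, so power stays flat.
print(relative_chip_power(3))

# Hypothetical stalled scaling: energy per device only drops to 0.7x per node,
# so power grows by 1.4x per node (about 2.7x after three nodes).
print(relative_chip_power(3, energy_scale=0.7))
```

If energy per device falls by less than half at each node, the product of the two factors exceeds 1 and total power compounds with every generation.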
Van Besouw: Power is the limiting factor in everything from small devices to set-top boxes for meeting performance goals. Determining how much power is going to be distributed to each smaller block is a very complex process. If you have hundreds of millions of gates that means you have hundreds of blocks. But power isn’t evenly distributed across the chip. It very much depends on the functionality of each block. You don’t know until you go down to the placement how much power is really being consumed. You want to do the power optimization at as high a level as possible, at the architectural level. You want to design this at the chip level, not at the low level, but there’s also another problem. You need details. You need to know what’s being used. That will determine how you implement RTL. It depends on the power requirements, and that impacts timing and placement. It’s a very connected problem. You want to make the right decision at the RTL level, but you need accurate information for placement. Floor planning, in turn, has an impact on timing closure and the timing characteristics, which includes the use of voltage islands. It’s like putting a 3D puzzle together where the shape and size of the puzzle are constantly changing.
Kulkarni: There is a difference between the classic Moore’s Law consumption and the Moore’s Law expectation. The trend is toward ‘More than Moore,’ or MtM, which creates demand for power budgeting. The Moore’s Law that has been scaling transistors and process geometries was really driven by timing and performance. The MtM roadmap shows this problem isn’t just ICs. It’s also 3D stacked ICs. When you look at all the new tablets and smartphones we’re looking at stacked ICs. The power budgeting is exacerbated by the MtM law, which is taking over now. Moore’s Law will continue, which is ‘More of Moore.’ There will also be ‘More than Moore,’ which is MtM. We see that in the mobile markets, which are 100% focused on power and noise. OEMs are defining a power spec at RTL and then asking chip vendors to bid for it. The chip vendors need to define a band of accuracy for power all the way through post synthesis, clock re-synthesis, block placement, place and route, and then dynamic voltage drop. Power budgeting came about as an emerging challenge. You need to make sure you can deliver the 5 watts maximum that the customer is asking for. But you have to be careful because you also can create voltage drop issues downstream on a PCB once everything is signed off. How do you predict that with the right stimulus management?
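One way to picture the "band of accuracy" Kulkarni describes is a check that the worst-case estimate at each design stage still fits the customer's budget. The sketch below is only an illustration. The 5-watt limit comes from the discussion, while the function name, stage estimates and band percentages are hypothetical.

```python
def worst_case_within_budget(estimate_w, band_pct, budget_w):
    """Return (fits, worst_case_w) for an estimate with a +/- accuracy band."""
    worst_case_w = estimate_w * (1.0 + band_pct / 100.0)
    return worst_case_w <= budget_w, worst_case_w


BUDGET_W = 5.0  # maximum power the customer is asking for

# (stage, estimated watts, accuracy band in percent): all values hypothetical.
stages = [
    ("RTL", 4.2, 20.0),
    ("post-synthesis", 4.4, 10.0),
    ("place and route", 4.6, 5.0),
]

for name, estimate_w, band_pct in stages:
    fits, worst = worst_case_within_budget(estimate_w, band_pct, BUDGET_W)
    status = "ok" if fits else "at risk"
    print(f"{name}: worst case {worst:.2f} W -> {status}")
```

With these made-up numbers the early RTL estimate is flagged as at risk, even though its nominal value is under budget, because its accuracy band is still wide.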

LPE: How much can still be saved in power in 2D for a reasonable cost?
Klein: We think there’s a lot of room left. We have spent more and more time at each generation looking at ways to save power. There’s a lot of low-hanging fruit that you don’t necessarily think is there. Even in the dynamic power area there’s low-hanging fruit. We’ve implemented fine-grained clock gating, and smart software at the post-synthesis level can take advantage of that. Designers also could do it pre-synthesis, as well. There are so many things people haven’t done yet that there is a lot available. We’ve also built headroom into our 28nm processes. We have a large number of parts we can offer at a lower voltage, which gives us the ability to lower dynamic power by 20% just based on the square of the voltage. Then you can still architect hard blocks, which compared to FPGA soft logic will be much better. Because we’re coming from the FPGA space, there is significant improvement.
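Klein's "square of the voltage" remark follows from the standard relation that dynamic power is proportional to C·V²·f. A small sketch, assuming capacitance, frequency and activity are unchanged. The 1.0 V and 0.9 V supply values are assumptions for illustration and were not stated in the discussion.

```python
import math


def dynamic_power_ratio(v_new, v_old):
    """Ratio of dynamic power after a supply change, with C, f and activity fixed."""
    return (v_new / v_old) ** 2


def voltage_for_saving(saving, v_old):
    """Supply voltage needed to cut dynamic power by `saving` (0.20 = 20%)."""
    return v_old * math.sqrt(1.0 - saving)


# Assumed example: dropping from 1.0 V to 0.9 V leaves about 81% of the
# dynamic power, a saving of roughly 19%.
print(dynamic_power_ratio(0.9, 1.0))

# A full 20% saving from 1.0 V would need a supply of about 0.894 V.
print(voltage_for_saving(0.20, 1.0))
```

A supply reduction of around 10% is therefore enough to account for a saving on the order of 20%.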
Kulkarni: We’ve found there is significant room at the RT level. In the mobile area we found a sophisticated designer had done all he could for a quad-core design. In certain modes, there were three cores shut off and only one was running. But he found hot spots on those cores when they weren’t supposed to be running, so he investigated further. It turns out that the dynamic power, which is the relationship of four signals—data, clock, enable and reset—was not off completely. Data was circulating. Functional verification showed no problem. Formal verification showed no problem. But those three cores were consuming useless power. Once he found the problem, he reduced dynamic power by 22%. But there is no single button to push for that. RTL debug is becoming a way of finding those problems. That’s not really low-hanging fruit, though.
Pangrle: A lot of what you’re bringing up is that customers are new to active power management. That can get built into their flow and it’s something that could be caught during verification. You can create assertions to catch those signals. People will find incremental ways to improve things from the RTL level down, but the real progress will be in looking at it from a system perspective.
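The kind of assertion Pangrle mentions can be sketched as a check over a recorded simulation trace: flag any cycle where a data signal changes while its core is disabled. This is a simplified Python stand-in for what would normally be an assertion in the verification environment. The trace format, signal names and function name are all hypothetical.

```python
def find_idle_toggles(trace, enable="enable", signal="data"):
    """Return the cycle numbers where `signal` changed while `enable` stayed low.

    `trace` is a list of per-cycle samples, each a dict of signal values.
    """
    violations = []
    for cycle in range(1, len(trace)):
        prev, cur = trace[cycle - 1], trace[cycle]
        idle = not prev[enable] and not cur[enable]
        if idle and prev[signal] != cur[signal]:
            violations.append(cycle)
    return violations


# Hypothetical trace for one core that is supposed to be shut off in cycles 1-3.
trace = [
    {"enable": True,  "data": 0x1},
    {"enable": False, "data": 0x1},
    {"enable": False, "data": 0x3},  # data still circulating while disabled
    {"enable": False, "data": 0x3},
    {"enable": True,  "data": 0x7},
]

print(find_idle_toggles(trace))  # [2]
```

A functionally correct design can still fail a check like this, which is the situation Kulkarni described: the idle cores produced correct results but kept burning dynamic power.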