Experts At The Table: Managing Power At Higher Levels Of Abstraction

Second of three parts: The challenge of user profiles; two-way communication in a hierarchical flow; power vs. energy; the uncertainties of software.

Low-Power Engineering sat down to discuss the advantages of dealing with power at a high level with Mike Meyer, a Cadence fellow; Grant Martin, chief scientist at Tensilica; Vic Kulkarni, senior vice president and general manager at Apache Design; Shawn McCloud, vice president of marketing at Calypto; and Brett Cline, vice president of marketing and sales at Forte Design Systems. What follows are excerpts of that conversation.

LPE: Does the approach to using tools have to change for low-power?
McCloud: Way too often, when people get to the block level, they start saying, ‘I’m going to use my functional tests.’ That’s completely wrong. That’s one of the big advantages of doing things at the higher level. You’ve got things that are closer to the real application running. That’s an incredible difference.

LPE: But how do you integrate that concept with different use models? Two different users may use the device for completely different purposes and in different ways.
Cline: You can go through the user profiles pretty easily. If Apple claims its new phone will have eight hours of talk time, that probably doesn’t include me not talking on the phone at all. I’m just going to surf and get e-mails and do other things. They understand one profile, and then there’s probably a mixed profile for the average user. You can conceptualize that at the system level when you’re doing your design, figuring out that if the phone is running for eight hours straight, here’s what’s probably going to happen on your next project. But you need to get down to something measurable, which right now is the RTL level, and even that is questionable in some situations. But it’s a tough problem.
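As a rough illustration of the profile idea Cline describes, the sketch below weights per-activity power by the share of time a user spends in each activity. Every number, activity name and profile here is a hypothetical placeholder, not a measurement of any real phone.

```python
# Hypothetical per-activity average power draw in milliwatts (illustrative only).
ACTIVITY_POWER_MW = {"talk": 600.0, "browse": 900.0, "email": 400.0, "idle": 20.0}

# Hypothetical usage profiles: fraction of powered-on time spent in each activity.
PROFILES = {
    "talker": {"talk": 0.70, "browse": 0.05, "email": 0.05, "idle": 0.20},
    "surfer": {"talk": 0.00, "browse": 0.50, "email": 0.20, "idle": 0.30},
}


def average_power_mw(profile):
    """Time-weighted average power for one usage profile."""
    if abs(sum(profile.values()) - 1.0) > 1e-9:
        raise ValueError("profile fractions must sum to 1")
    return sum(ACTIVITY_POWER_MW[activity] * share for activity, share in profile.items())


def battery_life_hours(profile, battery_mwh):
    """Estimated runtime: battery energy divided by average power."""
    return battery_mwh / average_power_mw(profile)


if __name__ == "__main__":
    for name, profile in PROFILES.items():
        hours = battery_life_hours(profile, battery_mwh=5000.0)  # hypothetical battery
        print(f"{name}: {average_power_mw(profile):.0f} mW average, {hours:.1f} h")
```

A model like this only ranks profiles against each other. The per-activity figures still have to come from something measurable, which is the limitation Cline points to.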
McCloud: Today, in the cell phone, one of the more common applications of high-level synthesis has to do with the image signal-processing blocks. A large number of these ISPs, covering everything from the sensor to the image correction that’s occurring to the final JPEG decoding, are done with high-level synthesis. The reason is that having a dedicated hardware accelerator is going to produce the lowest power needed for doing image signal processing. It’s a very specific function. You take a picture, you do image correction and processing, you compress it, store it and you’re done. You shut the hardware down.

LPE: High-level synthesis isn’t normally associated with power estimation. Is this a new use?
Cline: It’s been there to some extent. Power estimation comes in a number of different forms. Let’s say you’re laying down a clock cycle and designing it in high-level synthesis, and the tool is doing the work to fill it up. It knows that a multiplier is going in. It can do quick analysis on a multiplier with a reasonable stimulus to figure out whether this multiplier has a better power profile than another one, as long as they both meet your performance goals. You can trade off the area with them, too. One of the things we can do is go through this process of scheduling the clock cycles, filling them up with all the different functional units that are going to go in there, and determine at a later stage in the synthesis process which are the lowest-power multipliers. Can you swap those in and still meet all your timing budgets? If you blow timing then your tool is useless. You have to meet timing, and afterward you minimize area and power and let the user make tradeoffs where they want to. It’s been there for a long time with various levels of maturity. What you’re going to see is that it will continue to mature, but you have limited benefits at that level. You can make 5% and 10% improvements at that level. You can’t make 50% improvements.
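The selection step Cline outlines (meet timing first, then minimize power and area) can be sketched as a simple filter-and-rank. This is a minimal illustration, not how any particular HLS tool works; the `MultiplierVariant` type, the library entries and all delay, power and area figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultiplierVariant:
    """One candidate implementation of a multiplier (hypothetical characterization)."""
    name: str
    delay_ns: float
    power_mw: float
    area_um2: float


def pick_lowest_power(variants, clock_period_ns):
    """Return the lowest-power variant that fits in the clock period, or None.

    Timing is a hard constraint; power is the primary objective and area
    breaks ties.
    """
    feasible = [v for v in variants if v.delay_ns <= clock_period_ns]
    if not feasible:
        return None
    return min(feasible, key=lambda v: (v.power_mw, v.area_um2))


# Hypothetical library of three multiplier architectures.
LIBRARY = [
    MultiplierVariant("fast", delay_ns=2.0, power_mw=5.0, area_um2=9000.0),
    MultiplierVariant("balanced", delay_ns=3.5, power_mw=3.2, area_um2=6500.0),
    MultiplierVariant("small", delay_ns=6.0, power_mw=2.1, area_um2=4000.0),
]

if __name__ == "__main__":
    for period in (1.0, 4.0, 10.0):
        choice = pick_lowest_power(LIBRARY, period)
        print(period, choice.name if choice else "no variant meets timing")
```

Because the choice is confined to swapping functional units inside an already scheduled design, the achievable savings are bounded, which is consistent with the single-digit to 10% range Cline cites.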
McCloud: It’s evolving. Power has been in HLS for a while. It’s being used to design applications for lower power. The first stages of this centered around simple exploration. You take a JPEG and your requirement is to compress that picture in 500 milliseconds. You can do that with a 25MHz clock, a 50MHz clock or a 100MHz clock. Each one of those has very different power tradeoffs. That kind of capability has existed in HLS for years.
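The arithmetic behind McCloud's example is worth spelling out: a fixed 500-millisecond deadline gives each clock frequency a different cycle budget, and therefore a different amount of hardware parallelism needed to finish on time. In the sketch below the deadline and clock rates come from the discussion, while `TOTAL_OPS` is a hypothetical workload size chosen only for illustration.

```python
import math

DEADLINE_S = 0.5                    # 500 ms requirement from the example
CLOCKS_HZ = [25e6, 50e6, 100e6]     # the three clock options mentioned
TOTAL_OPS = 100_000_000             # hypothetical operation count for one image


def cycle_budget(clock_hz, deadline_s):
    """Number of clock cycles available before the deadline."""
    return round(clock_hz * deadline_s)


def ops_per_cycle_needed(total_ops, clock_hz, deadline_s):
    """Minimum operations that must complete per cycle to meet the deadline."""
    return math.ceil(total_ops / cycle_budget(clock_hz, deadline_s))


if __name__ == "__main__":
    for clock in CLOCKS_HZ:
        print(
            f"{clock / 1e6:.0f} MHz: {cycle_budget(clock, DEADLINE_S):,} cycles, "
            f"{ops_per_cycle_needed(TOTAL_OPS, clock, DEADLINE_S)} ops/cycle"
        )
```

A slower clock needs a wider datapath (more area, more units switching per cycle), while a faster clock needs less parallelism but toggles more often. Which point is lowest in power depends on the technology and the design, which is what the exploration is meant to find out.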

LPE: But you’re coming at it from the standpoint of clock speed compared with power first.
Martin: That’s only true if you have a fixed-instruction-set processor. When you design your own instruction set you’re in a whole new ballgame, which is one reason we’ve been supporting high-level analysis and estimation in the design flow for a number of years. The power talk is interesting because of the confusion people have between power and energy. In mobile devices, assuming you have driven toward a peak-power level that is sufficient for the low-cost packaging you’re targeting in your device, it’s all about energy. Energy is the issue here, and there are many ways to fit into a particular energy budget, but so often people express themselves using metrics such as milliwatts per megahertz. What does that mean in terms of the overall energy? It depends on what a milliwatt does. If you target one type of instruction set where you can do more in one cycle than in another, that reflects in the total energy consumption.
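Martin's distinction can be made concrete with unit analysis: 1 mW/MHz equals 1 nanojoule per clock cycle, so the metric is really energy per cycle, and total energy for a task is that figure multiplied by the number of cycles the task takes. The sketch below assumes power scales linearly with frequency and ignores leakage; the two cores and their numbers are hypothetical.

```python
def task_energy_uj(mw_per_mhz, cycles):
    """Energy in microjoules to run a task of `cycles` clock cycles.

    1 mW/MHz = 1e-3 W / 1e6 Hz = 1e-9 J per cycle, i.e. 1 nJ per cycle.
    Assumes power scales linearly with frequency and ignores leakage.
    """
    energy_nj = mw_per_mhz * cycles
    return energy_nj / 1000.0  # nJ -> uJ


# Hypothetical comparison: a generic core versus a core with a tailored
# instruction set that burns more per cycle but needs far fewer cycles.
GENERIC = {"mw_per_mhz": 0.10, "cycles": 1_000_000}
TAILORED = {"mw_per_mhz": 0.15, "cycles": 400_000}

if __name__ == "__main__":
    for name, core in (("generic", GENERIC), ("tailored", TAILORED)):
        print(f"{name}: {task_energy_uj(**core):.1f} uJ for the task")
```

In this made-up case the tailored core looks worse on milliwatts per megahertz yet uses less energy for the task, because it does more work per cycle. Note that under these assumptions the clock frequency cancels out of the energy calculation entirely.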
McCloud: For me it all boils down to battery life. That’s what matters. There’s a second component that is centered around power and integrity. The reason it’s so important is that when you reach 45nm we’re starting to reach a technology inflection point where you cannot scale the supply voltage any more to help reduce the power. At about 45nm or lower, the power density goes non-linear. This is going to create huge problems around thermal and supply integrity. You’re going to start getting a Vdd dropout, you’ll have hot spots in your chip, and the chip will burn up. It’s not just about battery life. It’s how we’re going to be able to take advantage of these technologies in the future and be able to produce these chips in a way that power density doesn’t go through the ceiling.
Kulkarni: Specifically what we’re looking at is how do we get the high-level power support budgets versus the power consumption, which is an insatiable demand for all functionality and multiple modes of operation. How do we make sure that the power grid we’ve designed will work? We’ve been watching that stimulus carefully. But what happens to that stimulus out of millions of clock cycles? There are things in the context of dynamic voltage, voltage route, and the package, and the PCB, and the system. You have a band of inaccuracy first, and then you look at the energy models and what happens over time. How do you capture those when you are switching between a lot of domains and there is a lot of switching activity? How do you model that accurately? And how do you model the physical effects at a higher level of abstraction so that your inaccuracy band gets narrower and narrower? We especially see that below 28nm, where there are huge transients causing voltage droop. Either your grid will collapse if you overdesign, or if you underdesign you will have electromigration problems with power, energy and heat all coming together.

LPE: What you’re talking about here is a hierarchical flow with two-way communication, right?
Kulkarni: Yes. And the reason we have not done too much power synthesis at the ESL level in the past is that when you make a transaction-level model, how do you go inside that? Power creates a different level of challenge. The industry really needs to address how to create these high-level models that will tune into what happens down at the chip level. And then you have to connect the front end to the back end and get to the details of power in both directions.

LPE: How do we fix software to make systems more efficient?
Meyer: You’re presupposing that you have software at the time of the design. That’s one of the biggest challenges. That’s where virtual platforms become an important part, running real software on the system. That’s one of the real challenges at the system level—to have something you can run in software early enough to influence the hardware decisions that you’re making.

LPE: Do we need an understanding of the software and how it’s going to function at a very high level, though?
Meyer: For some cases, if you could characterize how much the software is using each of the blocks and be able to understand system performance without detailed modeling, that would help you understand your power budget and do a better estimate. But we really haven’t spent much time working at that level yet.
Martin: We spend a lot of our time working at that level. Letting our customers build configured and tailored processors is extremely important. Sometimes you can do an interesting job by taking referenced or standards-based software, and if you have a very good compiler for an application-specific processor it may be able to do things like automatically vectorize and infer the use of some fairly sophisticated instructions. That has a limit, though. To really figure out what an optimal algorithm implementation would do on a particular instruction set you may have to get into more manual optimization. But some of the early work can be done with an envelope if you have a good targeting compiler.
Cline: A lot of our customers have that exact issue. With cell phones they may look at their next platform and say, ‘This time it’s going to have real-time video on it. Can my ARM processor run this real-time video algorithm at the same time it’s doing a network connection to beam everything up while it’s also downloading e-mail?’ And they may not be ready to buy the next ARM processor. So if they put this into custom gates, what is the cost in terms of area and power and what is the speed? They do that initial analysis using high-level synthesis and figure out what the tradeoff is. A lot of times they can buy a bigger processor and take on more royalty cost or power issues. So in some cases they have the software already, or at least they know what it’s going to look like. In other cases when you build a bigger system you may not have that.
Kulkarni: One of our customers who was designing a digital TV asked us whether they can profile the software for power. It’s an interesting question. With digital TV your eyes are pretty much looking at an oval field picture, so can you do power reduction on the black pixels on the edges? That’s pixel-by-pixel power reduction. That’s a great challenge for all of us. It’s not just mobile applications. It’s also digital TV, streaming video, heads-up displays for military applications, and so on.
Cline: Those guys don’t care about battery life, which makes it very interesting.
Kulkarni: But they still want to reduce the power.
McCloud: It’s all about the packaging cost.