Experts At The Table: Making Software More Energy-Efficient

Last of three parts: What drains the battery; how software can impact system-level energy consumption; where to find the low-hanging fruit.


By Ed Sperling
Low-Power Engineering sat down to discuss software and power with Adam Kaiser, Nucleus RTOS architect at Mentor Graphics; Pete Hardee, marketing director at Cadence; Chris Rowen, CTO of Tensilica; Vic Kulkarni, senior vice president and general manager of Apache Design; and Bill Neifert, CTO of Carbon Design Systems. What follows are excerpts of that conversation.

LPE: How much of the battery drain on a smart phone is caused by the hardware, how much is caused by the software, and how much is caused by bad reception?
Kaiser: Software controls a lot of it. Bad hardware that does not allow you to turn something off is one cause, but that doesn’t happen as often as bad software. If the hardware has one clock that turns everything off, then you have a problem, because whenever you want to use one little block you have to turn on five. But with software you have to give engineers feedback and tell them what knobs to turn. Ideally, you even give them an algorithm for how to tweak those knobs. We tried to do this with Nucleus. The drivers automatically manage their own power for WiFi or anything else. If no one opens the driver, it won’t burn power. If you can lower power, don’t worry about the rest of the OS. Just minimize dynamically. You can set up limits for the driver. Then the application guy just needs to be able to allow the device to turn on. You need to give people simple metrics like CPU utilization. And if you give metrics on how much power your CPU is using while idle and how much it’s using when it’s busy, you can tell how much power your application is using. Then, if you lower the frequency to half and the CPU is twice as busy, it may actually be burning more power. The compiler needs to do the job.
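Kaiser’s utilization-versus-frequency point can be sketched in a few lines. This is a minimal illustration with entirely hypothetical power numbers, not a model of any real CPU: energy over a window is busy power times busy time plus idle power times idle time, and halving the frequency doubles the busy time, so whether total energy drops depends on how much busy power falls at the lower operating point.

```python
# Minimal sketch (hypothetical numbers): energy over a 1-second window for a
# workload that keeps the CPU 40% busy at full frequency. Halving the
# frequency doubles the busy time; whether total energy drops depends on how
# much busy power falls at the lower operating point.

def window_energy(p_busy_w, p_idle_w, busy_fraction, window_s=1.0):
    """Energy (joules) over one window: busy power while working, idle power otherwise."""
    busy_s = busy_fraction * window_s
    return p_busy_w * busy_s + p_idle_w * (window_s - busy_s)

# Full frequency: 1.2 W busy, 0.1 W idle, 40% utilization.
e_full = window_energy(1.2, 0.1, 0.4)

# Half frequency: busy time doubles to 80%. If busy power only drops to
# 0.7 W (frequency scaled, voltage unchanged), the slower point burns MORE energy...
e_half_freq_only = window_energy(0.7, 0.1, 0.8)

# ...but if voltage scaling also cuts busy power, to 0.4 W here, it wins.
e_half_dvfs = window_energy(0.4, 0.1, 0.8)

print(e_full, e_half_freq_only, e_half_dvfs)
```

Under these made-up numbers, frequency scaling alone loses (0.58 J versus 0.54 J) while combined voltage and frequency scaling wins (0.34 J), which is exactly why a raw utilization number without the idle and busy power metrics can mislead.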
Rowen: The compiler can do a good job of the lower level things, but the choice of algorithms and which states you’re going to transition among is way beyond what the compiler has any access to. I recently saw a study of the number of states that a cell phone goes through. Something like 38 messages had to go back and forth between the software running on the phone and what was going on in the base station that were basically a negotiation as the phone entered a cell. There are some very tough and complex tradeoffs to make about whether you want to save power at one level by doing fewer transactions or you want to be aggressive and get the negotiation done as quickly as possible because it allows you to get into the lower power state as quickly as possible. There are some non-obvious tradeoffs at work at the system level because you have to determine if the phone is in a low-power or high-power state. They’re not things that you’re going to work out between Microsoft and Nokia. It’s going to be between Nokia and AT&T.
Kaiser: Does it matter? How often do you associate with a particular cell station? It affects standby time, but standby time is already pretty long. Does it really matter if you optimize that case, or do you care about other cases? How much of your battery went into this handshake?
Rowen: With the scenarios I’ve seen it could matter a lot.
Hardee: If you change the data arrival rate to those processes that are rendering Web pages, it’s a big difference. You could be running your graphics processors continually just because you have a slow data arrival rate, as opposed to processing everything and shutting down. It would be difficult for the software guys to optimize for those cases. What they can optimize for is how predictable stuff is. Can you do predictive scheduling? That changes what the application is doing. Those decisions are set pretty low down in the software stack, but what’s available to use and how effectively it can be used is another thing the software engineer has to think about.

LPE: How much of this information is making its way between hardware and software teams?
Kulkarni: That’s where virtual platforms come in. A co-simulation platform is a better description. The marriage of the software with the hardware, and how we capture that with instrumentation, can then be driven toward a power meter, which may be RTL power, a hardware description. But it all has to convert into power analysis at the end of the day. The feedback can be given to the system designer and the software designer, but today those pieces are missing. What Carbon is doing is an important step toward that. You can do the power analysis and get that feedback. We have to look at the application over time, and the feedback has to be in real time. In one of our customer applications for digital TV, they asked us: if your eyes are looking at the oval in the middle of the screen, can you turn off the power at the edges? They’re looking at pixel-by-pixel power control. This is real-time feedback of hardware and software applications.
Kaiser: You can re-encode movies based upon brightness. If it’s pretty dark, you can show it with much lower backlight. The backlight can vary and the screen looks the same. And it can vary by region. That’s beyond the scope of hardware. It’s algorithms.
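The backlight idea Kaiser describes can be sketched as follows. This is a hypothetical illustration of the principle only (the function and numbers are invented, and real implementations work on perceptual luminance, not raw pixel values): if a frame’s brightest pixel is well below full scale, the backlight can be dimmed and the pixel values boosted to compensate, so the image looks the same on less power.

```python
# Hypothetical sketch: dim the backlight to just cover a frame's brightest
# pixel, then boost pixel values so perceived brightness is unchanged.

def adapt_backlight(frame, full_scale=255):
    """Return (backlight_level, compensated_frame) for one frame of pixel luminances."""
    peak = max(max(row) for row in frame)
    if peak == 0:
        return 0.0, frame          # all-black frame: backlight fully off
    backlight = peak / full_scale  # dim to just cover the brightest pixel
    gain = full_scale / peak       # boost pixels to compensate for the dimmer backlight
    compensated = [[min(full_scale, round(p * gain)) for p in row] for row in frame]
    return backlight, compensated

# A dark frame whose brightest pixel is 64: backlight drops to about 25%.
level, comp = adapt_backlight([[10, 64], [32, 0]])
```

Applying this per region rather than per frame gives the region-varying backlight Kaiser mentions; the tradeoff is visible halos at region boundaries, which is an algorithm problem, not a hardware one.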
Kulkarni: This customer is looking for software energy-reducing concepts. They want to know where their software is consuming more power.
Kaiser: They want the drivers. And if you’re going to be varying the CPU, then you also need to provide the compiler.
Rowen: Depending on what level in the system you’re talking about, the hardware has always provided the software. We’re doing a lot of advanced baseband design. The next thing after the industry specification that you do is make it happen in 150 milliwatts at 300 Mbits per second. That drives all the subsequent design, including the choice of algorithms, the processors, the allocation of memory and the interconnect. They’re all driven within a power budget. Everyone working at layer one knows the power. This very tight hardware-software co-design is very established there. It starts to loosen up as you go up, in part because you’re aggregating these much more complex systems together.
Neifert: That’s where it’s missing. Power is really a system-level concern. Five or six years ago I started getting inquiries from leading-edge customers. A couple years later it was leading-edge research groups. About two years ago it made it out of research, and now about 30% or 40% of our customers are doing this in some way. It’s of great importance now.
Hardee: We all tend to gravitate toward the simulation model or the virtual platform’s ability to do power estimation. That’s not actually the low-hanging fruit, though. The thing that can be done relatively simply is system integration testing of power management software. Can you switch the mains on and off? Is it idle when you think it’s idle? That’s a lot lower-hanging fruit in a SystemC TLM 2.0 modeling environment than power estimation. For power estimation, we have a ways to go even in the activity formats used. You have to use averaging formats over defined windows. These all apply at the signal level. How do we bring them up to the TLM 2.0 level to make them run faster? That can be an issue. There are circumstances where you can say you have an AXI protocol and 64 bits, and you can do the math to get from signal level to architectural level. But then you look at all the architectural differences that start to become nuances in that model, like whether you’re doing split transactions and how bus transactions are being pipelined. Is that being correctly modeled in the platform? There’s a lot of complication. Even to get relative accuracy you will need to model this.
Rowen: We’ve gone up halfway between this signal and toggle level and TLM. Processors are nicely defined. What we’ve done is to automatically derive instruction-execution-level energy models so we can, as part of the initial instruction set characterization, come up with a pretty good energy model per execution. It’s still data independent, but there’s a summary number. The simulator knows how to count things like memory references. Then the whole processor plus memory subsystem has very accurate relative and kind of accurate absolute energy at a level that runs at the full speed of a fast simulator, not at RTL speed. Therefore you can start to make that a building block within a transaction-level approach. That’s one of the pieces of raising energy in abstraction and getting past the toggle.
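The instruction-execution-level model Rowen describes can be sketched simply. All numbers and instruction classes here are hypothetical: the point is that energy per instruction class is characterized once, up front, and a fast instruction-set simulator then accumulates energy merely by counting executions and memory references, with no RTL toggle activity needed.

```python
# Sketch (hypothetical numbers) of an instruction-execution-level energy model:
# per-class energy costs come from a one-time characterization; the simulator
# just counts executions and memory-subsystem references at full speed.

# Per-execution energy in picojoules, from characterization.
ENERGY_PJ = {"alu": 6.0, "mul": 14.0, "load": 28.0, "store": 30.0, "branch": 8.0}

def trace_energy_pj(counts, mem_refs, pj_per_ref=22.0):
    """Total energy for a trace: instruction counts plus memory-subsystem references."""
    core = sum(ENERGY_PJ[op] * n for op, n in counts.items())
    return core + pj_per_ref * mem_refs

# Counts as a fast instruction-set simulator might report them.
counts = {"alu": 1000, "mul": 50, "load": 200, "store": 120, "branch": 300}
total_pj = trace_energy_pj(counts, mem_refs=320)
```

Because the model is data-independent, it gives very good relative accuracy and only approximate absolute accuracy, exactly the tradeoff Rowen notes, but it runs at simulator speed and can serve as a building block in a transaction-level platform.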
Neifert: You start doing toggles and you slow everything down. You may use the toggles as an instrument for calibration, and then you go back and put that in and say, when I do this I take this much power per cycle. Then you can start aggregating some of those numbers to at least get a relative figure.
