Understanding and implementing power state switching delays can make or break a design.
Power state switching delay is a key factor in minimizing power, and getting it right frequently means the difference between a successful design and a dead chip. But the tradeoffs are intricate and often involve judgment calls, which makes this an area where designs can go completely awry.
For years, traditional full-swing CMOS process technologies were used with CMOS-style logic circuits. Power was spent almost entirely in switching because the transistors did such a good job of turning off: if a logic gate was outputting a 1, all the pull-down transistors were switched off, and if it was outputting a 0, all the pull-up transistors were switched off.
“If you could stop the activity, you could stop the power spend,” said Drew Wingard, CTO of Sonics. “The best way of reducing power at that point was to try to stop the activity. The easiest way to stop the activity in a synchronous digital circuit is to stop the clock, so clock gating became the thing to do.”
Today, complex designs implement multiple layers of clock gating. But it can be difficult for the hardware to determine when it is the right time to shut off the clock, which is why many design teams now use software to control clock gating.
“The microprocessor would write to some register that would stop the clock over here on this part of the chip,” he said. “Then leakage happened, and as we advanced in process technologies and transistors got shorter, and all these special effects are happening to the transistors, the side effect is that the transistors don’t turn off all the way. So even when the pull-down transistors are supposed to be off, they don’t turn off all the way and we have some leakage. It’s not a lot of leakage but you take a small number and you multiply it by a billion transistors, and it gets to be a lot of power.”
Managing leakage has also pushed more power management into software, because the techniques all tend to take a little more time than just stopping the clock. “Most people weren’t making these changes very often, so whatever time overhead was associated with them didn’t seem very significant. There also was a risk issue: Once you’ve taken the power away from a circuit it can’t do anything, and that’s scary if the signal that’s supposed to go from A to B to tell it to turn back on passes through a region of logic whose supply has been powered off. The electrons aren’t going to get there, because almost invariably the place-and-route tool is going to put in a buffer or inverter to make sure the signal can make it from A to B with acceptable propagation time. The wires are really slow, and that inverter doesn’t have any power any more. There are all kinds of interesting challenges that show up in chips, and so people believed that by moving that control into software they would be relatively more immune to bugs, because they could at least patch it after the fact by rewriting the software.”
The challenge is that as more functionality is integrated onto the same number of square millimeters of silicon, which requires ever-denser process technologies, there are more transistors that can leak. So designers need even more ways to take power out. “If we can make power state switching faster, we can take advantage of ever-shorter moments of time in which the circuit is going to be idle, which means all the control must be moved back into hardware,” Wingard said. “We’ve seen examples where we can take processing pipelines that are working on the latest HD 4K video frames and take more than 30% of the energy out of them by taking advantage of the structural idle times that are present in the data stream. There is no way to do this in software.”
Not everyone agrees. Guillaume Boillet, technical marketing specialist for power solutions in the emulation division at Mentor Graphics, contends that relying on hardware rather than context switching adds complexity to the design. “It definitely makes the job of the power architects even more complex. I don’t think the EDA industry is going to come out with a full-blown solution any time soon that will make the power architect disappear. The only thing the industry will be able to do is provide better metrics in order to make better decisions,” he said.
There is agreement that the power state transition delay is critical, because it affects both performance and the energy the system consumes while everything else waits for the transition to complete. But the best way to deal with it is open to debate. “You need to look at each power state transition in terms of its impact on energy and performance in the end application to determine which ones should be improved,” said Ashley Crawford, power architect at ARM.
Breaking down the problem
Mentor’s Boillet said it is important to understand the two types of power state switching delay. The first is related to power supply recovery: the time it takes to bring the supply back to where it should be. The second is state recovery, or context switching: the time it takes for the block that was switched off to return to a state in which it can once again compute what it should be doing.
Fundamentally, power state switching delay exists because all the power switches cannot be turned on at once. That would create very high inrush currents, which would lead to voltage drop, and timing would not be met, he explained. Further, having too many power domains makes the power grid too complex, resulting in IR drop issues along with supply integrity issues.
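To make the inrush tradeoff concrete, here is a deliberately simplified model, not taken from the article: assume a domain’s power switches are split into equal groups that are enabled one after another (a daisy-chained enable), that each switch draws the same inrush current only while its own group turns on, and that each group takes a fixed stage delay. Under those assumptions peak current scales with group size while wake-up time scales with the number of groups. The function name and all numeric values are hypothetical.

```python
import math


def staggered_turn_on(num_switches: int, group_size: int,
                      amps_per_switch: float, stage_delay_ns: float):
    """Return (peak_inrush_amps, turn_on_time_ns) for a staggered enable.

    Hypothetical model: switches are enabled in equal groups, one group per
    stage; a switch draws inrush current only during its own stage.
    """
    if num_switches <= 0 or group_size <= 0:
        raise ValueError("num_switches and group_size must be positive")
    stages = math.ceil(num_switches / group_size)
    peak_current = min(group_size, num_switches) * amps_per_switch
    turn_on_time = stages * stage_delay_ns
    return peak_current, turn_on_time


# Turning everything on at once minimizes delay but maximizes inrush;
# smaller groups trade a longer wake-up for a gentler current step.
for group in (1000, 100, 10):
    peak, t = staggered_turn_on(1000, group, amps_per_switch=0.001, stage_delay_ns=2.0)
    print(f"group={group:4d}  peak={peak:.3f} A  turn-on={t:.0f} ns")
```

Real power-switch networks add effects this ignores, such as weak "trickle" switches enabled before the main ones and the grid’s own RC delay, but the direction of the tradeoff is the one Boillet describes.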
“On the state recovery side, for blocks with configuration requirements, those blocks need to be configured before doing computation,” Boillet said. “For those, there are two ways to recover the state it should be in. The first is to use retention, which can be done through implementation of retention flip-flops. This is pretty complex, particularly when you want to optimize and use only the flip-flops that are required to recover the state. The alternative is not to go all the way to completely shutting off the power, but to use a very low voltage that allows for state retention but will not allow access to the blocks. Yet another alternative to retention is to use context switching. That’s what people do when they want to have retention but prefer to have the system deal with bringing back the information, maybe from RAM, into the configuration registers of the switchable power domain.”
Other blocks do not require state recovery at all, or have very little configuration, which is easier to deal with. For those blocks, it is a little easier for a power architect to decide to make them switchable power domains, because the overhead is not that big, Boillet continued.
But why is the delay important? “Obviously, if you end up with a very nice system on paper where you switch off the switchable power domain very often, and you go on and off very often, all those things have a cost. It takes time to recover, so maybe in certain cases you even stall the rest of the system. But also, all the context switching introduces new activity and extra power. It’s all a question of tradeoffs, and those are very complex considerations, which is why most companies have power architects,” he said.
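The context-switching alternative can be sketched in a few lines. This is an illustrative model only: the `SwitchableDomain` class, the `save_context`/`restore_context` helpers, the register names and the per-write cost are all hypothetical. The idea is that configuration registers are copied into always-on RAM before power-off and written back after power-on, so restore time grows with the amount of configuration.

```python
from dataclasses import dataclass, field


@dataclass
class SwitchableDomain:
    """Hypothetical power domain whose config registers are lost when off."""
    name: str
    config_regs: dict = field(default_factory=dict)
    powered: bool = True

    def power_off(self) -> None:
        self.powered = False
        self.config_regs = {}  # no retention: register contents are lost

    def power_on(self) -> None:
        self.powered = True    # registers come back empty (reset state)


def save_context(domain: SwitchableDomain, ram: dict) -> None:
    """Copy the domain's configuration into always-on RAM before power-off."""
    ram[domain.name] = dict(domain.config_regs)


def restore_context(domain: SwitchableDomain, ram: dict, ns_per_write: float) -> float:
    """Write saved registers back after power-on; return restore time in ns."""
    saved = ram.get(domain.name, {})
    for reg, value in saved.items():
        domain.config_regs[reg] = value
    return len(saved) * ns_per_write


ram = {}
dma = SwitchableDomain("dma", {"CTRL": 0x1, "SRC": 0x8000_0000, "DST": 0x8001_0000})
save_context(dma, ram)
dma.power_off()
dma.power_on()
restore_ns = restore_context(dma, ram, ns_per_write=10.0)
print(dma.config_regs["SRC"] == 0x8000_0000, restore_ns)  # True 30.0
```

Blocks with little or no configuration make this restore step nearly free, which is why they are easier candidates for switchable domains.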
Luke Lang, director of low-power product engineering at Cadence, agreed these are complex decisions. “If the time horizon for switching is very short, 100 nanoseconds or so, there really isn’t much power savings. For example, when turning power on and off, in the ‘off’ state all the charge has been dissipated, and when you want to turn it back on, every node that’s going to be in the 1 state needs to be charged up to the Vdd voltage. It essentially becomes a problem of charging capacitors. If you charge too fast, the power supply will start ringing, and you must wait for the ringing to die out, which slows you down overall. So you cannot charge too fast. Compare charging as fast as possible without ringing to charging more slowly. There is still a finite amount of charge you need to deliver, so it really doesn’t matter whether you do it at the fastest possible rate without ringing or at a slower one. There is no power savings.”
On the other hand, if the time horizon is longer, then switching fast does help a bit. A PC screen saver is a good example.
“Typically, we’re in front of a computer and the screen saver is set up for 5 to 15 minutes of inactivity, but you just don’t know when there will be inactivity. The screen saver is a form of power state changing,” Lang said. “What happens if you are able to wake up very quickly? Then you don’t need to wait quite as long. If your screen wakes up instantaneously, then one millisecond after a keystroke you can go into screen saving mode or low power mode. Then the next keystroke comes and you are immediately back. The user doesn’t complain. Using that as an analogy, if you apply it to a cellphone, if within a millisecond of detecting silence your phone immediately goes into low power mode and switches a bunch of stuff off, and then, whenever it detects any voice from either end, it can wake up without any lag at all, that gives a lot more opportunity to go into low power mode. So there is definitely an advantage in being able to switch faster from that point of view. You don’t have to wait so long to say, ‘I’m sure I can go into low power mode.’ You can go into low power mode very quickly because you can come out of it quickly, and the end user will not see any difference.”
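Lang’s point that charging speed does not change the energy can be checked numerically. The sketch below assumes an ideal capacitor C charged from a fixed supply Vdd through a lumped switch resistance R (a smaller R means a faster turn-on) and ignores package inductance, so it does not model ringing. The charge delivered is Q = C·Vdd, so the energy drawn from the supply is Vdd·Q = C·Vdd² whatever R is; half is stored and half is dissipated in the switch. The capacitance, voltage and resistance values are illustrative assumptions.

```python
def supply_energy_rc(c_farads: float, vdd: float, r_ohms: float,
                     steps: int = 100_000, taus: float = 10.0) -> float:
    """Energy drawn from a fixed Vdd supply while charging C through R.

    Forward-Euler integration of dv/dt = (Vdd - v) / (R*C) over `taus`
    time constants, summing Vdd * i * dt at each step.
    """
    tau = r_ohms * c_farads
    dt = taus * tau / steps
    v = 0.0
    energy = 0.0
    for _ in range(steps):
        i = (vdd - v) / r_ohms   # current through the power switch
        energy += vdd * i * dt   # energy leaving the supply this step
        v += i * dt / c_farads   # capacitor voltage rises
    return energy


C, VDD = 1e-9, 0.8               # illustrative: 1 nF switched capacitance at 0.8 V
for r in (1.0, 10.0, 100.0):     # fast versus slow turn-on
    e = supply_energy_rc(C, VDD, r)
    print(f"R={r:6.1f} ohm  tau={r * C * 1e9:6.1f} ns  "
          f"E={e * 1e9:.4f} nJ  (C*Vdd^2={C * VDD ** 2 * 1e9:.4f} nJ)")
```

Only the time constant changes with R; the energy column stays at essentially C·Vdd², which is why slowing the ramp to avoid ringing costs time but not energy.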
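Lang’s screen-saver argument can be expressed as a simple timeout policy. In this hypothetical sketch, after each event the system waits `timeout_ms` before sleeping, and the next event pays `wake_latency_ms` if the system had gone to sleep. The keystroke-gap trace and the latencies are invented for illustration, and the energy spent on the transitions themselves is ignored.

```python
def timeout_policy(idle_gaps_ms, timeout_ms, wake_latency_ms):
    """Return (ms spent in low-power mode, total wake delay seen by the user)."""
    sleep_ms = 0.0
    visible_delay_ms = 0.0
    for gap in idle_gaps_ms:
        if gap > timeout_ms:
            sleep_ms += gap - timeout_ms          # asleep for the rest of the gap
            visible_delay_ms += wake_latency_ms   # next event waits for wake-up
    return sleep_ms, visible_delay_ms


gaps = [150, 400, 90, 2_000, 45_000]  # hypothetical gaps between keystrokes

# Slow wake-up: an aggressive timeout makes the user wait on every keystroke...
print(timeout_policy(gaps, timeout_ms=1, wake_latency_ms=800))       # (47635.0, 4000.0)
# ...so the timeout gets stretched, and most idle time is wasted.
print(timeout_policy(gaps, timeout_ms=30_000, wake_latency_ms=800))  # (15000.0, 800.0)
# Fast wake-up: sleep after 1 ms with negligible visible delay.
print(timeout_policy(gaps, timeout_ms=1, wake_latency_ms=0.5))       # (47635.0, 2.5)
```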
Metrics are useful when comparing the switching rates of different power management approaches, but even those can be complex.
“Metrics are typically unique to a specific component in the context of the application,” said ARM’s Crawford. “In some cases there can be specific wake-latency goals that no technique should violate, but this is not always well defined. The conventional approach is then to rank the value of the possible techniques in terms of break-even time and opportunity time, then estimate the power savings and any performance impact based on that.”
It’s really a combination of metrics that will be important, Boillet stressed. “The first one is the power consumption, whether it is the average power you care about, the longevity of the battery, or peak power, or if you have thermal issues or IR drop issues. The other is the complexity of the design. Let’s summarize it as the time to tapeout. Power is on one end, time to tapeout is on the other end. How do we get there? It’s definitely unrealistic to expect any EDA tool to generate the power envelope for a representative usage of the system, meaning you’re going to have the power over time for all contributors for one second of simulation or emulation of your smartphone chip.”
This won’t happen, so emulation uses clever models that give a good sense of how the power envelope will vary over time based on register activity, memory accesses and other factors, he said. “This enables scenarios in which, for an emulation use case, the power states are overlaid on top to determine if there are portions of the design where a block is on even though it’s not being used. All the places where a switchable power domain may not be used as efficiently as it could be can be identified. Delay can be examined as well, because the other advantage of emulation is that it has a pretty good representation of all the delays: how much time it’s going to take, the recovery, taking into account the memory transfer from the system to the block. Design teams take all those things into account to decide if it makes sense to strengthen their switching scheme and to decide if it makes sense to extend or add new power off failures.”
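Crawford’s ranking by break-even time and opportunity time can be sketched as follows, under loudly simplified assumptions: each technique is reduced to three hypothetical numbers (the extra energy spent entering and leaving the state, the entry-plus-exit latency, and the residual power while in the state), and the opportunity time is a single known idle window. Break-even time is the idle window needed to recoup the transition energy, never shorter than the transitions themselves. The technique parameters below are invented for illustration, and performance impact is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Technique:
    """Hypothetical low-power technique described by three numbers."""
    name: str
    transition_energy_j: float  # extra energy spent entering and leaving the state
    transition_time_s: float    # entry plus exit latency
    residual_power_w: float     # power still drawn while in the state


def break_even_time_s(t: Technique, idle_power_w: float) -> float:
    """Shortest idle window for which entering the state pays off."""
    saved_w = idle_power_w - t.residual_power_w
    if saved_w <= 0:
        return float("inf")  # the state never saves anything
    # Must recoup the transition energy and also fit the transitions themselves.
    return max(t.transition_energy_j / saved_w, t.transition_time_s)


def net_saving_j(t: Technique, idle_power_w: float, opportunity_s: float) -> float:
    """Approximate energy saved over one idle window (0 if not worth entering)."""
    if opportunity_s < break_even_time_s(t, idle_power_w):
        return 0.0
    return (idle_power_w - t.residual_power_w) * opportunity_s - t.transition_energy_j


def best_technique(techniques, idle_power_w: float, opportunity_s: float) -> str:
    return max(techniques, key=lambda t: net_saving_j(t, idle_power_w, opportunity_s)).name


# Illustrative numbers only: a block that burns 5 mW when idle with its clock running.
TECHNIQUES = [
    Technique("clock gating", 0.0, 10e-9, 2.0e-3),
    Technique("retention", 10e-9, 0.5e-6, 0.4e-3),
    Technique("power gating", 50e-9, 2e-6, 0.1e-3),
]

for t in sorted(TECHNIQUES, key=lambda t: break_even_time_s(t, 5e-3)):
    print(f"{t.name:13s} break-even {break_even_time_s(t, 5e-3) * 1e6:7.3f} us")
print(best_technique(TECHNIQUES, 5e-3, 5e-6))  # short idle window
print(best_technique(TECHNIQUES, 5e-3, 1e-3))  # long idle window
```

With these made-up numbers, the deepest state wins only when the opportunity time is long; for a 5 µs window, plain clock gating saves the most, because power gating never reaches its break-even point.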
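The overlay Boillet describes, finding stretches where a block is powered but doing nothing, can be illustrated with a toy version. This is not how any emulation product works internally; it assumes two hypothetical sampled traces (power state and activity, one boolean per sample) and reports runs where the domain is on but idle for at least `min_len` samples. A natural choice for `min_len` would be a break-even time expressed in samples.

```python
def wasted_on_intervals(powered, busy, min_len=1):
    """Return (start, length) runs where a domain is powered on but idle."""
    if len(powered) != len(busy):
        raise ValueError("traces must have the same length")
    runs = []
    start = None
    for i, (p, b) in enumerate(zip(powered, busy)):
        if p and not b:
            if start is None:
                start = i              # a powered-but-idle run begins
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i - start))
            start = None               # run ended: block busy or powered off
    if start is not None and len(powered) - start >= min_len:
        runs.append((start, len(powered) - start))
    return runs


# Hypothetical traces, one entry per sample (1 = true, 0 = false).
powered = [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
busy    = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0]
print(wasted_on_intervals(powered, busy))             # [(1, 3), (5, 1), (8, 1), (11, 1)]
print(wasted_on_intervals(powered, busy, min_len=2))  # [(1, 3)]
```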
Another metric that can be used is millions of power state transitions per second (MSPS), Wingard said. It addresses the strength of the control system, including how quickly blocks can be moved through the different states. Does any single circuit need to go at a million power states per second? “We have seen some examples that are that high, but it’s probably not a very common thing. However, when we look across the whole chip, we find that these idle moments sometimes happen at the same time as each other, and sometimes they don’t. As a result, the other big challenge with doing it on a processor is that the processor can only do it for one part of the circuit at a time, because processors are serial. If we build it using hardware state machines, then we can actually go in parallel, so we can scale up the number of power states we can handle per second based upon the number of controllers that end up being built.”
Wingard said he has seen examples where between 10 million and 12 million power states per second could be used in just one subsystem of the chip. “If we compare that against what we think you could do with a processor, we think it’s hard to get above about 100,000 power states per second using a dedicated microcontroller. If you thought about doing it under the control of the operating system on the host processor, we think you’d be down in the tens of thousands of power states per second, so we’re really talking about a couple of orders of magnitude more capacity here. The result is that you can take advantage of shorter idle times or moments, but you can also be more deterministic.”
One of the biggest reasons system architects don’t take advantage of an idle moment to shut things down is that they don’t know when it will be time to wake back up, and they have a responsiveness requirement to meet. By the time they know it’s time to wake back up, it’s often too late. The alternative is an always-on design.
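Wingard’s capacity figures can be turned into rough arithmetic. This sketch assumes, hypothetically, that a controller handles transitions strictly one at a time, that each transition occupies it for 1/rate seconds, and that an idle window is only usable if it covers one power-down and one power-up. The 100,000 per second and 12 million per second figures come from Wingard; 20,000 per second is an assumed stand-in for his “tens of thousands” under OS control.

```python
import math


def min_exploitable_idle_s(transitions_per_s: float) -> float:
    """Shortest idle window one serial controller can use (off + on)."""
    return 2.0 / transitions_per_s


def controllers_needed(demand_per_s: float, per_controller_per_s: float) -> int:
    """Parallel controllers required to sustain a given transition demand."""
    return math.ceil(demand_per_s / per_controller_per_s)


for label, rate in [("dedicated microcontroller", 100_000),
                    ("host OS (assumed 20,000/s)", 20_000)]:
    print(f"{label:28s} min usable idle ~ {min_exploitable_idle_s(rate) * 1e6:.0f} us")

# Sustaining 12 million transitions/s with 100,000/s serial controllers:
print(controllers_needed(12e6, 1e5))  # 120
```

A single serial controller cannot exceed its own rate no matter how many domains go idle at once, which is the scaling argument for building many small hardware state machines instead.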
So how does the chip architect or design team choose which approach to take?
It’s not always clear. Boillet said these tradeoffs are very subtle, and power architects are relying on high-level modeling solutions that are, most of the time, incomplete.
“It’s an art, and it’s a question of understanding the implications not only in terms of power, but also in terms of design implementation complexity, which is also very important,” he concluded.