From initial concepts to final sign-off, power has become one of the most challenging design problems.
In the quest to get SoC power right as early as possible in the design flow, it still holds true that the biggest impact occurs at the beginning of the project, with diminished results as a design progresses through the flow toward tapeout.
ARM’s big.LITTLE architecture has gained a lot of traction here, prompting MediaTek to introduce its Tri-Gear big.Medium.LITTLE type of approach last month. Both of these approaches can have a profound impact on power consumption, so the question really becomes how can you make these decisions with confidence.
Fig. 1: Task load distribution of scenarios. Source: MediaTek
“Is it based on spreadsheets?” asked Frank Schirrmeister, senior group director, product management in the System & Verification Group at Cadence. “How do you legitimize trying it out? This is done with lots of simulation and analysis. Some things you even overlook, which is the notion of thermal. In the power flow, you put in more cores to run in parallel, and for a certain time you may run in overdrive and get the full performance. But then, over time, you run into thermal limits, which is where the 8-core/4-core discussion comes from. Some people argue that once you go beyond 6 cores, if you run them long enough in parallel, you will have thermal issues.”
As with almost anything, results vary greatly by implementation. Different processors using the same instruction set can have very different energy efficiency curves.
“The idea of big.LITTLE is that when you need a large amount of performance, you can be more energy efficient with a larger processor that can deliver that performance, and still retain its sweet spot,” said Drew Wingard, CTO of Sonics. “But there are other times in the running of the application where that kind of performance isn’t required, and there is a point at which slowing down the clock rate on the big processor doesn’t work well. And there’s a point where reducing the voltage on the big processor doesn’t work well because things start to get into really bad switching regimes.”
The microprocessor could stay running at a higher voltage, trying to get its job done, and then be power-gated, Wingard said. The challenge is that it’s difficult to figure out how long a general-purpose microprocessor will be in the off state, and when it needs to come back on. There is also a question about how much state needs to be restored before the voltage can be brought back up to another level. If it is shut off completely, the power is turned off.
“Microprocessors running an operating system often are idle until the next interrupt, and they don’t really know when the next interrupt is,” Wingard said. “The big.LITTLE idea is that when I get to that place where I’m really just waiting for the next interrupt, I might have a little bit to do, I’ll use a smaller processor that in general has a shallower pipeline. That’s one of the ways it ends up being more energy efficient at lower frequencies, and typically it doesn’t have quite as many tricky features to try to improve the instructions per cycle, which is a tradeoff that spends energy to try to get more performance. So you take the microarchitectural differences, you build a processor that has far fewer gates and therefore does a better job at any given voltage, but one of them still stays on essentially all the time. That’s a reasonable approach.”
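The tradeoff Wingard describes can be pictured as choosing, for each level of demand, the cheapest operating point that still delivers the required performance. The following is a minimal sketch of that idea, not a model of any real scheduler. The `OperatingPoint` type, the `pick_point` helper, and every number in the table are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    core: str        # "big" or "LITTLE"
    perf: float      # deliverable throughput, arbitrary units (hypothetical)
    power_mw: float  # power drawn at this point in mW (hypothetical)


# Made-up operating points; real curves depend on the implementation.
POINTS = [
    OperatingPoint("LITTLE", 1.0, 50.0),
    OperatingPoint("LITTLE", 2.0, 120.0),
    OperatingPoint("big", 2.0, 300.0),
    OperatingPoint("big", 4.0, 700.0),
    OperatingPoint("big", 6.0, 1500.0),
]


def pick_point(required_perf, points=POINTS):
    """Return the lowest-power operating point that meets the demand."""
    feasible = [p for p in points if p.perf >= required_perf]
    if not feasible:
        raise ValueError("no operating point meets the demand")
    return min(feasible, key=lambda p: p.power_mw)


if __name__ == "__main__":
    for demand in (0.5, 2.0, 3.0):
        p = pick_point(demand)
        print(f"demand {demand}: {p.core} core at {p.power_mw} mW")
```

With these assumed numbers, light demand lands on the LITTLE core and only demand above its top operating point moves work to the big core.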
But Sonics contends there are relatively few places across a big chip where that approach is attractive. And typically there isn’t such a huge dynamic range between the more active times and the less active times. In those cases, techniques such as power gating or retention voltage switching can greatly reduce leakage current. With retention voltage switching, the local supply voltage is reduced to a level where all of the internal registers and memories maintain their state, but the voltage is too low to safely run the logic.
“Those are powerful techniques that can easily take advantage of the highly modal behavior in some chips,” he said. “If I know in this mode that I’m never going to use that part of the chip, it’s really easy to say I’ll put it into a power domain that can be switched off. A lot of these blocks have these periods of time where they are relatively idle that are pretty short. We call them idle moments, and if you can implement all of the power control including detection of the idle moment, the sequencing machine that actually says, ‘Now we’re going to stop the clocks, isolate the outputs, apply the resets, unhook it from the power supply,’ then the impact can be significant.”
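The idle-moment detection and the power-down sequence in the quote above can be sketched as a small behavioral model. This is a software illustration of the ordering only, not Sonics’ implementation. The `IdleMomentController` class, the `Step` names, and the idle threshold are assumptions introduced here.

```python
from enum import Enum, auto


class Step(Enum):
    RUNNING = auto()
    CLOCKS_STOPPED = auto()
    OUTPUTS_ISOLATED = auto()
    RESET_APPLIED = auto()
    POWER_OFF = auto()


# Order taken from the quote: stop clocks, isolate outputs,
# apply resets, disconnect from the power supply.
POWER_DOWN_ORDER = [
    Step.CLOCKS_STOPPED,
    Step.OUTPUTS_ISOLATED,
    Step.RESET_APPLIED,
    Step.POWER_OFF,
]


class IdleMomentController:
    """Hypothetical model: gate a block after N consecutive idle cycles."""

    def __init__(self, idle_threshold=4):
        self.idle_threshold = idle_threshold  # assumed tuning parameter
        self.idle_cycles = 0
        self.state = Step.RUNNING
        self.log = []

    def tick(self, active):
        """Advance one cycle; `active` says whether the block did work."""
        if self.state is not Step.RUNNING:
            return  # already powered down; wake-up is not modeled here
        self.idle_cycles = 0 if active else self.idle_cycles + 1
        if self.idle_cycles >= self.idle_threshold:
            self._power_down()

    def _power_down(self):
        # Walk the sequence one step at a time, recording each step.
        for step in POWER_DOWN_ORDER:
            self.state = step
            self.log.append(step.name)


if __name__ == "__main__":
    ctrl = IdleMomentController(idle_threshold=3)
    for active in (True, False, False, True, False, False, False):
        ctrl.tick(active)
    print(ctrl.state.name, ctrl.log)
```

The wake-up path is deliberately left out, since the text does not describe its order.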
These sequencing steps can be applied in hardware where a hardware state machine is created to do that. This allows the detailed control to be done at the lowest level, because some of the steps have delays in them, and some of these steps have a little state machine associated with implementing these steps. The good news is that if all three of those things happen in hardware, he asserted, “you can go about 500 times faster than what you would expect to be able to do when you are running that control in software in a microprocessor, and that’s really interesting because it lets you do a couple of things. One, it lets you try to go after idle moments that are much shorter because you don’t spend as much energy in the transitions. Two, it lets you do it completely automatically, so you’re below the consciousness of the software stack. The buffer doesn’t even know you’ve shut this stuff off — it just happens. And if you can do that fast enough, then the driver software or the operating system never has to know.”
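Why faster transitions let a design go after shorter idle moments can be seen with a simple break-even estimate: gating only pays off if the leakage energy saved while the block is off exceeds the energy spent entering and leaving the off state. The sketch below assumes a very simple model in which transition energy is transition time multiplied by a constant transition power. The function name and all numbers are hypothetical; only the 500x ratio between the two transition times comes from the text.

```python
def breakeven_idle_us(transition_time_us, transition_power_mw, leakage_saved_mw):
    """Shortest idle period (in microseconds) for which gating saves net energy.

    Simplified model: no leakage is saved during the transition itself,
    and the transition costs transition_time_us * transition_power_mw.
    Units: mW * us = nJ.
    """
    transition_energy_nj = transition_time_us * transition_power_mw
    return transition_time_us + transition_energy_nj / leakage_saved_mw


if __name__ == "__main__":
    # Hypothetical figures: software-driven control vs. a hardware
    # sequencer that completes the same transition 500 times faster.
    software = breakeven_idle_us(500.0, 10.0, 5.0)
    hardware = breakeven_idle_us(1.0, 10.0, 5.0)
    print(f"software control pays off after {software} us idle")
    print(f"hardware control pays off after {hardware} us idle")
```

Under this model the break-even idle time scales directly with transition time, so a faster sequencer makes much shorter idle moments worth gating.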
This way, when looking back at the processor side, where big.LITTLE happens, the same concepts can be applied to the processor. This allows the engineer to imagine they’ve got a system where they start power gating the little processor, and wake it back up only when an interrupt happens. Previously this couldn’t be done because the processor did the control, but if this technique can be done fast enough, then the interrupt response deadline can still be hit.
The impact of packaging
Advanced packaging approaches affect the power flow from implementation to verification, as well. This needs to be mapped out at the very earliest stages of a design, however, and validated as early as possible, Schirrmeister said. That includes making sure that in the context of software, enough use cases and scenarios can be run through it to fully verify the power.
This affects activities such as power shutdown, which may have an impact on where the caches are at the time a certain region is shut down. Tools now can generate test cases to cover those types of scenarios.
“While you can do some relative assessment with higher-level models like the ARM Fast Models, when you want something more quantitative, then you’ll go to the cycle-accurate models, or run it in an emulator and use an engine to predict from RTL the right pieces,” Schirrmeister said.
There are many considerations that need to be taken into account in an SoC, though. It’s not just about the number of cores. Brandon Wang, solutions group director at Cadence, noted that 30% to 40% of power consumption occurs in other parts of the chip, such as massive I/O for DRAM. Reducing that power consumption will have a far greater impact than core power, he said.
This is where advanced packaging can play a significant role, because it shortens the distance between components such as memory and processors, while also improving the bandwidth for that communication.
“That’s another benefit of reducing the signaling power across different die, because those can be achieved at the same time when the area is also shrinking,” said Wang. “Basically, by reducing the I/O buffers, the reduction of the I/O buffer size will also reduce the parasitic capacitance of the I/Os. And since power is CV², reducing the voltage of supplies is one thing but reducing C is linearly reducing the dynamic power.”
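Wang’s point about capacitance and voltage follows from the standard expression for dynamic switching power, P = αCV²f, where the activity factor α and clock frequency f are added here from the textbook formula and are not mentioned in the quote. A minimal sketch with made-up values shows the linear dependence on C and the quadratic dependence on V:

```python
def dynamic_power_w(activity, capacitance_f, voltage_v, frequency_hz):
    """Dynamic switching power in watts: alpha * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


if __name__ == "__main__":
    # Hypothetical I/O: 20% activity, 2 pF load, 1.0 V swing, 1 GHz.
    base = dynamic_power_w(0.2, 2e-12, 1.0, 1e9)
    half_c = dynamic_power_w(0.2, 1e-12, 1.0, 1e9)
    half_v = dynamic_power_w(0.2, 2e-12, 0.5, 1e9)
    print(f"halving C scales power by {half_c / base:.2f}")
    print(f"halving V scales power by {half_v / base:.2f}")
```

Halving the capacitance halves the dynamic power, while halving the voltage cuts it to a quarter.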
System-level power analysis
All of this points to the need for a better understanding of how power will be used in a design and what the impact will be. The standardization of IEEE 1801/UPF 3.0 is a step in that direction, helping define how power models work with performance models.
“Given the standard is there, people are starting to use this to look at the combination,” said Pat Sheridan, product marketing senior staff for virtual prototyping at Synopsys. “How does this affect downstream implementation and sign-off? If you’re the architect, you do your studies and you can share your findings with the hardware team and the software team. What you are typically doing at that point is understanding, given a certain workload on top of your architecture, what is the amount of time spent in a particular piece of hardware, such as the relationship between activity, performance and power.”
The architect can start from a budget and then provide information about how much time a given application workload spends in certain subsystems, which power states need to be supported, and which voltage and frequency operating points are needed to achieve the overall goals for the architecture. The hardware team can look at that and do their work. Almost equally important is the software team that has to develop the power management software, because power management is sometimes implemented as software running on a dedicated processor. That processor may have a good understanding of the system and the power states, and a virtual prototype can then be used to develop and debug the power management software much earlier in the process.
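The kind of early estimate described above, time spent in each power state weighed against a budget, can be sketched as a residency-weighted average. This is an illustration only and does not represent the UPF 3.0 power model format or any vendor tool. The state names, power figures, and budget are hypothetical.

```python
def average_power_mw(residency, state_power_mw):
    """Average power given the fraction of time spent in each power state.

    residency:      state name -> fraction of time (must sum to 1.0)
    state_power_mw: state name -> power drawn in that state, in mW
    """
    total = sum(residency.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"residencies sum to {total}, expected 1.0")
    return sum(frac * state_power_mw[state] for state, frac in residency.items())


if __name__ == "__main__":
    # Hypothetical subsystem under one workload.
    residency = {"active": 0.2, "idle": 0.3, "off": 0.5}
    state_power_mw = {"active": 500.0, "idle": 50.0, "off": 1.0}
    budget_mw = 150.0  # assumed budget handed down by the architect

    avg = average_power_mw(residency, state_power_mw)
    verdict = "within" if avg <= budget_mw else "over"
    print(f"average power {avg:.1f} mW is {verdict} the {budget_mw} mW budget")
```

Running the same calculation for several workloads shows which power states dominate and whether an additional state or operating point is worth supporting.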
Power is now a cost function
Power experts have been warning for years that power needs to be considered at the concept stage of a design. That seems to have finally caught on.
“As I look at the industry, the one thing that’s clear is that power is a mainstream design concern for just about everybody,” said Piyush Sancheti, senior director of marketing for the verification group at Synopsys. “Ten years ago there was a class of customers that cared about power, and another class of customers that were concerned but not necessarily doing anything about it. That has changed. Power has become a cost function for just about every design function that you do, from architecture down into verification, down into synthesis, and place and route. If you are doing mobile, IoT, or end consumer devices, then it’s more than a cost function: it’s a design objective. It’s something that you differentiate over. But even for companies that are in networking, storage or the more tethered functions, power is a mainstream design cost function.”
For this reason, chip designers are allocating budgets both from a schedule and a resource standpoint. “It’s reached a point where you don’t sign off until the power objectives are met, whatever they may be,” Sancheti said. “It could be standby, peak power or dynamic switching power — or all of the above. A fundamental shift occurring is that power sign-off requirements are there at each stage of the design flow, and it’s no longer something people look at as an afterthought.”
But it’s also something where right and wrong aren’t always clear. “It’s a sliding scale—accuracy versus performance or access to the models,” he said. “Power has more complications associated with it than other design concerns because it has two fundamental elements, the hardware and what’s being exercised on the hardware. You could be super accurate on the hardware — you could be running a post routed database — but if the activity you’re providing it is completely off the charts, it’s ‘garbage in, garbage out.’”