Designing For Extreme Low Power

Power is becoming a differentiator in many designs, and for IoT and edge devices it may be the most important one.


There are several techniques available for low power design, but whenever a nanowatt or picojoule matters, all available methods must be used.

Some of the necessary techniques are different from those used for high-end designs. Others have been lost over time because their impact was considered too small, or not worth the additional design effort. But for devices that last a lifetime on a single battery or scavenge the power that they need to operate, no stone can remain unturned.

“For the past three decades, people have optimized and reduced the power by going to lower geometry nodes,” says Anoop Saha, senior manager of strategy and business development for the Calypto group of Mentor, a Siemens Business. “But for IoT devices that are operated by a battery, that is when people start to get concerned about microjoules and picojoules of energy. Then, extremely low power becomes much more critical. I see a gap between what exists now versus what the market needs.”

There are several levels at which power can be optimized. Dave Pursley, product management director in the Digital & Signoff Group at Cadence, equates this to Maslow’s hierarchy of needs for humans. “The most basic need for low-power design, or really any design, is that the silicon needs to work and work reliably.”

There is progress that can and should be made at this level of the hierarchy. “One of the most significant roadblocks in this area is the need for comprehensive low-power models,” says Roland Jancke, head of department for design methodology at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “These are provided by the design systems, parameterized by the foundries, and used by the designers during their development flow. Technology in this area seems to be progressing faster than development tools and standards.”

Moving up the hierarchy, the focus turns to the need for optimization, says Pursley. “Design and implementation teams focus on minimizing the power of the design. Multi-mode multi-corner optimization, multi-bit cell inferencing, multi-Vt leakage optimization, clock gating, power intent, etc., are all focused on minimizing the power of the design. In some cases — many cases, in fact — relying on optimization and sign-off will meet your needs. Your silicon can and will survive with that level of attention to power.”

One significant difference between high clock frequency designs and those that operate with lower frequencies is the amount of logic that can fit between flip-flops. “You can have deep data paths because you do not have to run these circuits at really high clock frequencies,” says Rob Knoth, product management director at Cadence. “However, this increases the amount of glitch that happens. This can significantly increase the total power, especially if it is a transport glitch. That is a glitch that does not get filtered by a gate and can cause an actual switch. So long as it settles out before the clock edge, there are no functionality problems, but the power switching between the edges is going to go up. That is causing people to look at more glitch-tolerant logic design, and it is also causing EDA vendors to get smarter about incorporating more intelligent glitch analysis and optimization.”
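At the architectural level, the effect Knoth describes can be approximated by scaling the functional switching activity with a glitch factor. The short sketch below only illustrates that arithmetic; the activity factor, capacitance, voltage, frequency, and glitch ratio are all assumed numbers, not figures from Cadence.

```python
# Rough model of dynamic power including transport glitches:
# P_dyn ~= alpha * (1 + g) * C * V^2 * f, where g is the ratio of
# glitch transitions to functional transitions on the switched nets.

def dynamic_power(alpha, c_farads, v_volts, f_hz, glitch_ratio=0.0):
    """Average dynamic power in watts for a given activity factor,
    switched capacitance, supply voltage, clock frequency, and extra
    glitch transitions per functional transition."""
    return alpha * (1.0 + glitch_ratio) * c_farads * v_volts ** 2 * f_hz

# Hypothetical numbers: 50 pF of switched capacitance, 0.8 V, 20 MHz.
base = dynamic_power(alpha=0.1, c_farads=50e-12, v_volts=0.8, f_hz=20e6)
deep = dynamic_power(alpha=0.1, c_farads=50e-12, v_volts=0.8, f_hz=20e6,
                     glitch_ratio=0.6)  # long combinational paths

print(f"without glitch: {base * 1e6:.1f} uW")
print(f"with 60% extra glitch toggles: {deep * 1e6:.1f} uW")
```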

Controlling power became more important when Dennard scaling broke down in the mid-2000s. While new nodes do provide lower power, they do so at increasing cost, and power does not shrink at the same rate as area, which means power density may increase. “Going to the lower node is not the only way to reduce power consumption,” says Mentor’s Saha. “I’ve seen examples where a special-purpose chip designed and optimized for your particular use case consumes less power than a general-purpose design implemented at 7nm, even though the optimized design is at a higher node.”

Most development teams are satisfied by these types of optimization. “Nanowatt and picojoule systems are achievable today, but require a number of power saving tricks to be deployed together,” says James Myers, distinguished engineer at Arm. “This includes power gating everything possible to suppress the leakage, stopping or slowing all the clocks to avoid wasted dynamic power, using vector processor extensions to reduce cycle counts, and then dialing down the voltage as far as possible.”

Continuing above that in Maslow’s hierarchy is a whole different world. “When designing for extreme low power, with nanowatt and picojoule concerns, it is insufficient to create the lowest-power implementation of the design,” adds Pursley. “You need to create the lowest-power implementation of the lowest-power design.”

All three levels are important. “To achieve lower power consumption, the industry will require improved SoC design techniques, design optimization and customization, and process technology scaling,” says David Su, CEO of Atmosic. “These techniques together will result in an extremely low-power wireless solution. Low-power radio technology is designed to enable connected devices to operate with minimal power, maximizing battery life. By reducing power consumption and extending battery life, we will see IoT solutions in which the battery will last the lifetime of the product itself.”

And it is important that none of the levels are ignored. “Capacitance, voltage, and frequency – those are things under the designer’s control,” says Preeti Gupta, head of PowerArtist product management at Ansys. “This is where we see a lot of people thinking about how to play around with the supply voltage – be it scaling it down or using multiple voltage domains or using power gating. We hear about dynamic frequency scaling. Clock gating is about shutting off redundant activity. There are a lot of techniques that are being applied early on through algorithmic considerations up until the last stages, where you are doing multi-Vt optimizations, or pin-swapping, or path-balancing to reduce power.”
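A back-of-envelope model shows how these knobs interact. The sketch below uses the standard dynamic-power relation (alpha * C * V^2 * f) plus a simple leakage term; all of the numbers are assumptions chosen only to illustrate the arithmetic, not data from any of the companies quoted here.

```python
# Illustrative comparison of the designer-controlled knobs mentioned
# above: supply voltage scaling, frequency scaling, clock gating
# (activity reduction), and power gating (leakage reduction).

def total_power(alpha, c, v, f, i_leak, gated_fraction=0.0):
    """Dynamic power (alpha*C*V^2*f) plus leakage (I_leak*V), with an
    optional fraction of the leaky logic shut off by power gating."""
    p_dyn = alpha * c * v ** 2 * f
    p_leak = i_leak * v * (1.0 - gated_fraction)
    return p_dyn + p_leak

# Hypothetical baseline: 100 pF switched, 1.0 V, 50 MHz, 20 uA leakage.
baseline = total_power(alpha=0.15, c=100e-12, v=1.0, f=50e6, i_leak=20e-6)

# Scale the supply to 0.7 V, halve the clock, gate clocks to cut the
# activity factor, and power-gate 80% of the idle logic.
tuned = total_power(alpha=0.05, c=100e-12, v=0.7, f=25e6,
                    i_leak=20e-6, gated_fraction=0.8)

print(f"baseline: {baseline * 1e6:.1f} uW, tuned: {tuned * 1e6:.1f} uW")
```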

Importance of architecture
Most IoT edge devices are basically fairly similar. “The chip basically has sensing, processing and communication,” says Kurt Shuler, vice president of marketing at Arteris IP. “There is usually one sensor, or multiple sensors attached to it. These things are polling or communicating periodically. They usually have a part of the chip that they call ‘always on’, even though it’s not always on. It’s doing the communications and checking to see if there’s anything from a sensor. Compared to a mobile phone, or some AI chips or an ADAS chip, these chips are not huge. These are really tiny chips, but the power management within them is really complex.”
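For a duty-cycled device of the kind Shuler describes, average power is dominated by the always-on domain plus the energy of each periodic burst. The sketch below is a minimal, illustrative calculation; every current, duration, and the battery capacity are assumed values.

```python
# Average power of a duty-cycled edge node: a low-power "always on"
# domain plus short active bursts for sensing and communication.
# All numbers are assumed values chosen only to show the arithmetic.

ALWAYS_ON_UW = 2.0        # retention logic + wake timer, microwatts
SENSE_UW = 300.0          # sensor read + processing burst, microwatts
SENSE_MS = 5.0            # burst duration, milliseconds
RADIO_UW = 12_000.0       # radio transmission burst, microwatts
RADIO_MS = 3.0            # transmission duration, milliseconds
PERIOD_S = 60.0           # wake-up interval, seconds

burst_energy_uj = (SENSE_UW * SENSE_MS + RADIO_UW * RADIO_MS) / 1000.0
avg_power_uw = ALWAYS_ON_UW + burst_energy_uj / PERIOD_S

battery_mwh = 800.0       # e.g. a small primary cell, milliwatt-hours
lifetime_years = (battery_mwh * 1000.0 / avg_power_uw) / (24 * 365)

print(f"energy per wake-up: {burst_energy_uj:.1f} uJ")
print(f"average power: {avg_power_uw:.2f} uW")
print(f"ideal battery life: {lifetime_years:.1f} years "
      "(ignoring self-discharge)")
```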

There are other applications that may look quite different. “There has to be domain-specific architecture innovation,” says Saha. “There has been quite a lot of research about where power is going. For example, in compute, a lot of power consumption is associated with the off-chip DRAM access. So how do you optimize that? You can reduce that by either changing your software, so that you minimize the number of DRAM accesses, or by changing the hardware so that you have more memory available closer to the compute unit.”
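The impact of keeping data close to the compute unit can be estimated with per-access energy figures. The numbers in the sketch below are assumed ballpark values, used only to show the order-of-magnitude effect of serving more accesses from on-chip memory.

```python
# Illustrative energy comparison for moving data accesses closer to
# the compute unit. The per-access energies are assumed ballpark
# figures, not measured values for any particular process.

E_DRAM_PJ = 1300.0   # assumed energy per off-chip DRAM access, pJ
E_SRAM_PJ = 10.0     # assumed energy per on-chip SRAM access, pJ

def access_energy_uj(total_accesses, local_hit_rate):
    """Total access energy when a fraction of accesses is served from
    on-chip memory and the remainder goes to off-chip DRAM."""
    dram = total_accesses * (1.0 - local_hit_rate) * E_DRAM_PJ
    sram = total_accesses * local_hit_rate * E_SRAM_PJ
    return (dram + sram) / 1e6   # picojoules -> microjoules

accesses = 1_000_000
for hit_rate in (0.0, 0.5, 0.9, 0.99):
    print(f"local hit rate {hit_rate:4.0%}: "
          f"{access_energy_uj(accesses, hit_rate):8.1f} uJ")
```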

For other applications, having small systems dedicated to a specific task works well. “On-demand wake-up technology allows end-point devices to listen for incoming ‘wake-up’ signals while remaining in a very low power state,” says Atmosic’s Su. “This not only reduces the system power consumption by an order of magnitude, but also reduces the collisions of signals in the air by keeping the beacons in stand-by mode.”
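A rough comparison shows where that order of magnitude comes from. In the sketch below, a conventional duty-cycled scan is compared with an always-listening wake-up receiver; all of the power and timing numbers are assumptions, not Atmosic figures.

```python
# Two ways for an end point to wait for traffic: periodically waking
# the main radio to scan, versus leaving a very low power wake-up
# receiver listening continuously. Numbers are assumed.

SCAN_POWER_UW = 6000.0     # main radio receive power while scanning
SCAN_WINDOW_MS = 10.0      # scan window
SCAN_INTERVAL_MS = 1000.0  # scan interval
SLEEP_POWER_UW = 1.5       # deep sleep between scans

WURX_POWER_UW = 3.0        # always-on wake-up receiver

duty = SCAN_WINDOW_MS / SCAN_INTERVAL_MS
scanning_avg = SCAN_POWER_UW * duty + SLEEP_POWER_UW * (1.0 - duty)

print(f"duty-cycled scanning: {scanning_avg:.1f} uW average")
print(f"on-demand wake-up receiver: {WURX_POWER_UW:.1f} uW average")
print(f"reduction: {scanning_avg / WURX_POWER_UW:.0f}x")
```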

Getting things right at this level requires analysis. “How do you find that best low-power architecture?” asks Pursley. “The solution space is huge. Algorithms can change; architectures can change; the hardware-software boundary can change. In years past, this tradeoff analysis was based on, at best, some back-of-the-envelope calculations and a heavy dose of, ‘I know how I did it last time.’”

There are many questions that need to be answered. “Where is the right hardware-software boundary?” asks Saha. “How you figure out the hardware and software co-design is a critical part of it. Which parts should go into hardware, and which parts should go into software? What is the right memory structure, what is the right quantization, and what is the right mix of micro-architectural features? If you don’t use high-level synthesis (HLS), a lot of these decisions are made before you start writing your Verilog code. That is a problem for low-power devices because you don’t know what the optimal architecture for your application or your design will be. You need flexibility. You need to be able to change things very quickly and measure them quickly.”

“For digital designers and architects, high-level synthesis (HLS) allows them to quantitatively evaluate these architectural decisions and quickly create and evaluate RTL for a wide range of architectural tradeoffs,” adds Pursley. “The integration of HLS with logic synthesis and power estimation gives designers and architects quick, early, and accurate power, performance, and area analysis.”
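HLS flows themselves start from C++ or SystemC, but the spirit of the exploration can be illustrated with a simple model that sweeps architectural candidates and compares early estimates. Everything in the sketch below, from the candidate list to the energy and area constants, is hypothetical.

```python
# Hypothetical architectural sweep in the spirit of an HLS-driven
# exploration: more parallel lanes allow a lower clock and a lower
# supply voltage for the same throughput, trading area for energy.
# All constants are invented for illustration.

OPS_PER_FRAME = 2_000_000        # work per processed frame
E_OP_PJ_AT_0V9 = 2.0             # assumed energy per operation at 0.9 V
AREA_PER_LANE_MM2 = 0.02         # assumed area per parallel lane

# (parallel lanes, clock in Hz, supply voltage) candidates.
candidates = [(1, 50e6, 0.90), (4, 12.5e6, 0.75), (16, 3.125e6, 0.60)]

for lanes, f_hz, vdd in candidates:
    latency_ms = OPS_PER_FRAME / lanes / f_hz * 1e3   # same throughput
    energy_uj = OPS_PER_FRAME * E_OP_PJ_AT_0V9 * (vdd / 0.9) ** 2 / 1e6
    area_mm2 = lanes * AREA_PER_LANE_MM2
    print(f"{lanes:2d} lanes @ {vdd:.2f} V: {latency_ms:5.1f} ms/frame, "
          f"{area_mm2:4.2f} mm^2, {energy_uj:4.1f} uJ/frame")
```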

Another recent power optimization strategy that exists at the boundary of hardware and software is to utilize extensible processor architectures. “I have seen people using high-level synthesis to create custom instructions in the processor,” says Saha. “This could be an instruction in the processor, which might make sense if this is a small repetitive task in your application, or you could keep it out of the processor as an accelerator. These are architecture decisions. For some applications, custom instructions will be optimal, but there will be many applications where doing things in an accelerator is more optimal.”

Sometimes where you place the power control can be important, too. “There are various combinations and permutations of clock gating, power gating or other power optimized states,” says Arteris’ Shuler. “There could be more than 20 different ones. And they’re turning on and off different things and clocking things at lower or higher frequency based on exactly what it is they’re doing. The state machine that has to deal with the power modes is quite complex. The network on chip (NoC) is the highways and byways between all these blocks and subsystems. And that’s where they gate the power and bring things back up very quickly. Not only do they have to go through all the states, they have to go through them very quickly. With power management in the NoC, it’s actually very advanced.”
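The kind of power-mode controller Shuler describes can be pictured as a small state machine with legal transitions and wake-up latencies. The toy model below is purely illustrative; the modes, transitions, and latencies are invented, and a real controller has many more states.

```python
# Toy model of a power-mode controller: named modes, the transitions
# the controller supports, and the latency of each. All values here
# are invented for illustration.

from enum import Enum, auto

class Mode(Enum):
    ACTIVE = auto()          # all domains powered and clocked
    CLOCK_GATED = auto()     # clocks stopped, state retained
    RETENTION = auto()       # logic power-gated, retention flops on
    DEEP_SLEEP = auto()      # only the always-on domain powered

# (from, to) -> transition latency in microseconds (assumed values).
TRANSITIONS = {
    (Mode.ACTIVE, Mode.CLOCK_GATED): 0.1,
    (Mode.CLOCK_GATED, Mode.ACTIVE): 0.1,
    (Mode.CLOCK_GATED, Mode.RETENTION): 5.0,
    (Mode.RETENTION, Mode.ACTIVE): 20.0,
    (Mode.RETENTION, Mode.DEEP_SLEEP): 50.0,
    (Mode.DEEP_SLEEP, Mode.ACTIVE): 300.0,
}

def wake_latency(path):
    """Sum the latency of a sequence of mode changes, rejecting any
    transition the controller does not support."""
    total = 0.0
    for src, dst in zip(path, path[1:]):
        if (src, dst) not in TRANSITIONS:
            raise ValueError(f"illegal transition {src.name} -> {dst.name}")
        total += TRANSITIONS[(src, dst)]
    return total

print(wake_latency([Mode.DEEP_SLEEP, Mode.ACTIVE]), "us to wake")
```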

Reducing voltage
Power consumption has a quadratic dependence on voltage. “But reducing voltage is the hardest power saving trick to deploy,” says Arm’s Myers. “It can return 4 to 10X gains at near- and sub-threshold voltages. What ultimately limits us here is DC-DC conversion efficiency and non-volatile memory access energy, though both are seeing improvements lately. But there are additional design costs that could be eased by improved foundry and EDA support. In particular, variability at low voltage calls for adaptive techniques, which can be difficult to sign off, and device leakage is not always accurately characterized to picoamp precision.”


Fig 1: Minimum energy point is usually slightly above threshold voltage. Source: Arm
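The shape of that curve can be reproduced with a toy energy model: dynamic energy per operation falls quadratically with voltage, while leakage is integrated over a cycle time that grows rapidly near threshold. The device parameters in the sketch below are assumed values chosen only to show why the minimum sits slightly above Vt.

```python
# Toy model of energy per operation versus supply voltage. Dynamic
# energy falls as V^2, but gate delay grows quickly near threshold,
# so leakage is integrated over a longer cycle. All parameters are
# assumed, not characterized device data.

C_EFF = 20e-12      # effective switched capacitance per operation, F
I_LEAK = 50e-6      # leakage current of the active block, A
VT = 0.35           # threshold voltage, V
T0 = 10e-9          # cycle time at the nominal 0.9 V, s

def cycle_time(v):
    # Very rough alpha-power style delay model, normalized to 0.9 V.
    return T0 * (v / 0.9) * ((0.9 - VT) / (v - VT)) ** 1.5

def energy_per_op(v):
    e_dyn = C_EFF * v ** 2                 # switching energy
    e_leak = I_LEAK * v * cycle_time(v)    # leakage over one cycle
    return e_dyn + e_leak

best = min((v / 100 for v in range(40, 91)), key=energy_per_op)
print(f"nominal 0.90 V: {energy_per_op(0.9) * 1e12:.1f} pJ/op")
print(f"minimum energy point ~{best:.2f} V: "
      f"{energy_per_op(best) * 1e12:.1f} pJ/op")
```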

One of the problems with very low voltage operation is that variability can have a significant impact. “The incorporation of a highly accurate PVT monitoring subsystem supports the semiconductor design community’s demands for increased device reliability and enhanced performance optimization,” says Ramsay Allen, vice president of marketing at Moortec. “This enables schemes such as adaptive voltage scaling (AVS) and power management control systems.”
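A closed-loop AVS scheme built on such a monitor can be sketched as a simple control loop. In the code below, the monitor readout is a toy stand-in and the voltage limits and margins are assumed; it is not a description of any particular vendor's subsystem.

```python
# Minimal sketch of closed-loop adaptive voltage scaling (AVS) driven
# by an on-chip monitor. The monitor model below is a hypothetical
# stand-in, not a real PVT-monitor or regulator interface.

V_MIN, V_MAX, V_STEP = 0.55, 0.90, 0.01     # volts
TARGET_MARGIN_PS = 50.0                     # desired timing slack, ps

def read_timing_margin_ps(v):
    """Stand-in for a delay/path monitor readout: a toy linear model
    of slack versus supply voltage, for demonstration only."""
    return (v - 0.60) * 1000.0

def avs_step(v):
    """One control iteration: harvest surplus margin by stepping the
    supply down, and step back up quickly if margin gets too small."""
    margin = read_timing_margin_ps(v)
    if margin < TARGET_MARGIN_PS:
        return min(v + 2 * V_STEP, V_MAX)    # recover quickly
    if margin > 2 * TARGET_MARGIN_PS:
        return max(v - V_STEP, V_MIN)        # still slack to spare
    return v                                 # inside the target band

v = V_MAX
for _ in range(25):
    v = avs_step(v)     # a real loop would also reprogram the regulator
print(f"settled at {v:.2f} V")
```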

Domain crossings are created across all voltage and frequency boundaries. “Anywhere that you are changing the clocks or creating a power domain — that’s a domain crossing,” says Shuler. “With power you need level shifters, and with clocks you’re looking at an asynchronous connection. We have dedicated tools for the design and verification of such crossings today, and in many cases this difficulty can be effectively hidden within the interconnect. What happens is that when they choose a process, the level shifters from that library go into these digital containers, and they have all been pre-verified.”

The importance of vectors
Still, there is plenty of opportunity to optimize for the wrong things. “You have to not only think about how to measure power, but also what you measure,” says Saha. “It’s a function of the architecture and the stimulus and how the system behaves. How you design the system is important. How you measure and figure out what you measure is important. And then the third part is how do you optimize what you have?”

Not considering use cases is a common mistake. “Many people do not think about the stimulus used for power estimation and optimization,” says Pursley. “It’s all too easy to miss power issues by being distracted by what turns out to be a local maximum, possibly something that can never be seen. At the exploration and early design phases, you are likely looking to minimize overall power. The stimulus you use for that is unlikely to show a worst-case power issue post-place and route, just as stimulus designed to expose a corner case is unlikely to be very useful when minimizing power under typical use models.”
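The difference between those two questions is easy to show numerically. In the sketch below, the scenarios, their estimated powers, and the fraction of time spent in each are all assumed values; the point is simply that a use-case-weighted average and a worst-case vector answer different questions.

```python
# Why the stimulus matters: the same design gives very different
# "power" answers depending on which activity you measure. The
# scenarios, powers, and weights are assumed values for this sketch.

# (scenario, average power in uW, fraction of deployed lifetime)
scenarios = [
    ("deep sleep",            1.2, 0.970),
    ("sensor sampling",     180.0, 0.025),
    ("radio transmission", 9500.0, 0.004),
    ("firmware update",   11000.0, 0.001),
]

weighted_avg = sum(p * w for _, p, w in scenarios)
worst_case = max(p for _, p, _ in scenarios)

print(f"use-case weighted average: {weighted_avg:.1f} uW "
      "(what drives battery life)")
print(f"worst-case scenario power: {worst_case:.1f} uW "
      "(what drives power grid and thermal sign-off)")
```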

Conclusion
There are many things that contribute to the total power or energy consumption of a device, but when looking to create extremely low-power devices, no stone can be left unturned. There is a saying that what once seemed like a difficult solution starts to look like the best option when the traditional solutions become increasingly complex or difficult. That is certainly happening in the power domain, where techniques such as near-threshold design are seeing increasing levels of attention.

Editor’s Note: A tutorial and panel session at DAC will address some of the challenges and opportunities for near-threshold design. The session airs Tuesday July 21st at 3:30pm Pacific Time.


