Low Power-High Performance

Reducing Software Power

Software plays a significant role in overall power consumption, but so far there has been little progress on that front.

September 12th, 2019 - By: Brian Bailey

With the slowdown of Moore’s Law, every decision made in the past must be re-examined to get more performance or lower power for a given function. So far, software has remained relatively unaffected, but it could be an untapped area for optimization and enable significant power reduction.

The general consensus is that new applications such as artificial intelligence and machine learning, where there is minimal legacy software, are the obvious places to introduce new techniques for software optimization. But others see more mature and stable areas as the obvious target. Areas where designs are changing rapidly are more likely to pursue large architectural changes than fine tuning.

“There was a time when the two worlds were disconnected but I see growing collaboration between software and hardware teams,” says Preeti Gupta, director for RTL product management at ANSYS. “There is an emerging realization that software impacts hardware, and recently some solutions are also emerging.”

Change is not yet widespread. “There is an awareness spectrum. It is very industry specific and it is not just software versus hardware, it is power-aware or not,” says Rob Knoth, product management director at Cadence. “It is the product space that determines how aware they are. We will continue to see hardware designs that are not particularly power concerned, and you will see software that is designed to run on many kinds of hardware that may never optimize for power. You will see a growing number of system companies that are leading the charge towards more power-aware software.”

Generic software will remain untouched. “When I look at software from an SoC perspective, for the most part, the software is a given at the high level,” says Johannes Stahl, senior director of product marketing at Synopsys. “Application software, games, apps for the phone, programs that run on the server – it is all a given. Could those software developers do more? That is questionable because they are not motivated to do more. The real opportunity is at the lower level of the software – firmware, power management software – they are the opportunity for optimization.”

It takes a concerted effort. “It is the companies that own the whole stack where they are responsible for the software and the hardware,” adds Cadence’s Knoth. “They are looking at co-optimization that can be done between the hardware and software together to see what will produce the most efficient end product. One way that we see this represented is with the concept of energy first. That is where you bring in more than just a static representation of the hardware, and you have to start looking at what the software is doing with the hardware to achieve some end goal.”

Even for high-level software there is some awareness. “When you are talking about apps running on a smartphone, there are some apps that clearly drain your battery while others do a much better job,” says Gert Goossens, senior director for ASIP tools at Synopsys. “There are times when it appears that it is not related to the complexity of the application and clearly indicates that it is a kind of bug. This says that some apps are not written with power in mind.”

But many of those bugs may still be low level. “It could be in the software, or in the hardware when the hardware reacts to software in an unexpected way,” explains Synopsys’ Stahl. “Or there is simply a register that is programmed the wrong way. These bugs are fairly low-level. It is likely that they switched on or off a certain processor or a region on the chip that has a huge impact on power. It is not that the application software was written in a certain way and that caused the power to be higher.”

The same problems can also be found in power-sensitive devices. “If you are writing a significant amount of new code, be it C or RTL, and you have an advanced power-sensitive platform with shut-offs and sleep modes, you can have power bugs,” adds Knoth. “Someone may have coded the wrong terms for the ‘enable’ in an ‘if’ statement, and you will not catch that with static analysis methods. You will only find it with functional methods where you are looking at the toggles or the watts associated with an operation in software. That requires lots of simulation and emulation.”
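The kind of power bug Knoth describes can be illustrated with a minimal sketch, written here in Python under invented assumptions: a hypothetical block that should be powered only while work is pending and no sleep has been requested. The function names and the trace are illustrative and do not come from any real platform. Both versions process the same amount of work, so a purely functional check passes, and only a count of powered cycles exposes the difference.

```python
# Hypothetical trace: each cycle is (work_pending, sleep_requested).
TRACE = [(True, False)] * 4 + [(False, False)] * 10 + [(False, True)] * 6

def enable_correct(work_pending, sleep_requested):
    # Power the block only when there is work and no sleep request.
    return work_pending and not sleep_requested

def enable_buggy(work_pending, sleep_requested):
    # Wrong term in the condition: 'or' where 'and' was intended.
    return work_pending or not sleep_requested

def run(enable):
    """Return (items_processed, powered_cycles) for one pass over TRACE."""
    processed = 0
    powered = 0
    for work_pending, sleep_requested in TRACE:
        if enable(work_pending, sleep_requested):
            powered += 1
            if work_pending:
                processed += 1
    return processed, powered

good = run(enable_correct)
bad = run(enable_buggy)
print("correct (processed, powered):", good)
print("buggy   (processed, powered):", bad)
```

In this toy trace the buggy condition keeps the block powered through every idle cycle that is not an explicit sleep request, while the functional result is unchanged.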

Power analysis
It is important that embedded software designers have an idea about where the power is being consumed, and that means they must be able to do power profiling.

“Processor vendors provide tools that do some level of power profiling,” says Synopsys’ Goossens. “This shows you where in the code most of the toggling happens, which is an indicator of where power is consumed. They also show memory accesses, which are a typical source of power consumption. A lot of power is consumed in memory accesses. That creates a feedback loop to the software developer, who can then refactor the code to reduce the number of memory accesses — or ensure that the data is kept closer to the processor, such as holding data in registers rather than in memory. Maybe the processor has a loop buffer, which is a tightly connected instruction memory where you keep the critical code for the loops in the application. This is closely coupled to the processor, and so consumes less power. Also, in the data memory, there may be a certain memory hierarchy and you want to optimize the usage of memory that is the closest to the processor.”
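The refactoring Goossens mentions, holding data in registers rather than in memory, can be sketched with a toy model. The `CountingMemory` class below is a hypothetical stand-in for a data memory that counts accesses, and a local variable stands in for a register. It is not a model of any particular processor, and it says nothing about the actual energy cost of each access.

```python
class CountingMemory:
    """Toy data memory that counts every load and store (illustrative only)."""

    def __init__(self, values):
        self.cells = list(values)
        self.loads = 0
        self.stores = 0

    def load(self, addr):
        self.loads += 1
        return self.cells[addr]

    def store(self, addr, value):
        self.stores += 1
        self.cells[addr] = value


def sum_via_memory(mem, n, out_addr):
    # Running total lives in memory: one extra load and store per element.
    mem.store(out_addr, 0)
    for i in range(n):
        total = mem.load(out_addr)
        mem.store(out_addr, total + mem.load(i))


def sum_via_register(mem, n, out_addr):
    # Running total lives in a local variable, standing in for a register.
    total = 0
    for i in range(n):
        total += mem.load(i)
    mem.store(out_addr, total)


N = 8
a = CountingMemory(list(range(N)) + [0])
b = CountingMemory(list(range(N)) + [0])
sum_via_memory(a, N, N)
sum_via_register(b, N, N)
print("memory-resident total:  ", a.loads + a.stores, "accesses")
print("register-resident total:", b.loads + b.stores, "accesses")
```

Both functions compute the same sum. The second touches memory roughly a third as often in this example, which is the sort of difference a power profiler would surface as reduced memory activity.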

Edge processing is one area where software power is being optimized. “Software developers are being forced to work with ever decreasing power budgets,” says Javier Orensanz Martinez, vice president and general manager for Arm’s Development Solutions Group. “Performance analysis tools, such as Arm’s Development Studio and Keil’s MDK, can connect to various debug probes to gather power consumption data from the target devices. The tools then present this data and other system information, including CPU and GPU activities, synchronously, which has proven to be a powerful means to understand and optimize the software for power consumption. Even in the absence of physical probes, tools help identify performance bottlenecks like memory bandwidth, which in turn helps reduce power consumption.”

Good analysis enables effective reduction. “Just saying my battery life will be ‘X’ doesn’t help much,” says ANSYS’ Gupta. “The analysis has to be predictable and it has to start earlier than RTL – even at the system level. They know they are getting estimated clock tree power, estimated wire capacitance and synthesis effects such as clock gating. Everyone is concerned about the predictability of power numbers, no matter what level of abstraction you are talking about. When I work with software teams, they want to understand trends. They don’t care if the power is 2X off. What they care about is did power increase because of my changes, or did it go down.”
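Why an estimate that is 2X off can still be useful for trends is easy to show with arithmetic. The sketch below assumes, as a simplification, that the estimator's error is a consistent multiplicative factor across both software builds. All numbers are hypothetical.

```python
def relative_change(before, after):
    """Fractional change in estimated energy between two software builds."""
    return (after - before) / before

# Hypothetical "true" energy per run, in millijoules, for two builds.
true_before, true_after = 10.0, 8.0

# Assumption: the early-stage estimator is consistently 2x too high.
scale_error = 2.0
est_before = true_before * scale_error
est_after = true_after * scale_error

true_trend = relative_change(true_before, true_after)
est_trend = relative_change(est_before, est_after)
print("true trend:     ", true_trend)
print("estimated trend:", est_trend)
```

The absolute numbers differ by a factor of two, but the relative change is the same. If the error is not consistent between builds, this no longer holds, which is one reason predictability matters more than absolute accuracy.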

But while models may exist for the processors, power models are required for the whole system. “I would love to call it a virtual prototype, but it will come down to the modeling paradigm that you use and if there is an industry standard way to model these architectures in a way that provides enough information about power,” says Stahl. “You can argue that it is a combination of modeling the algorithms with respect to what is important to power and then modeling the components of these chips from a power perspective. This is where I think the industry will go and we will be able to extract a power model at a high enough level for basic components, like a processor, and then apply some algorithms on top and execute that with decent speed.”

Where are the models? “The models available in the industry today that can provide the necessary levels of predictability or scalability are lacking,” says Gupta. “With RTL, we have Liberty, we have the standard cells to map to, and we can come up with reasonable estimates. At the system level, we do not have a good way to model that. But that does not mean that the need for high-level architectural decisions is not there.”

While IEEE 2416 was recently released, and provides a system-level power model, the industry has not yet assessed if this is a viable way forward. Today most system-level power analysis is performed on RTL using emulation to get enough data to be useful.

“People are putting more of their systems into an environment where they can simulate larger and larger workloads more efficiently,” says Knoth. “This is being driven not just from a functional standpoint, but from a power standpoint. The more accurately you are simulating how a device is going to be used, the more you can understand the power consumption of it.”

The industry has been building analysis tools around emulation. “At a very high level they are just looking at toggles, not power,” adds Knoth. “How much are the signals switching? This requires running hundreds of millions of cycles and helps them gauge the goodness of the software. That starts to get refined. You bring it down into the millions of cycles and you start looking at watts and not toggles. That continues to get refined down into the world that we are more familiar with, such as IR drop. But it is the front end of the funnel where system architects and designers are looking at hardware and software together to optimize energy.”
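Toggle counting, the coarsest stage of that funnel, can be sketched in a few lines. The example below counts bit flips between consecutive samples of a bus. The sample values are hypothetical, and a real flow would dump activity from an emulator rather than use hand-written lists.

```python
def toggle_count(samples):
    """Total number of bit flips between consecutive samples of a signal or bus."""
    return sum(bin(prev ^ cur).count("1")
               for prev, cur in zip(samples, samples[1:]))

# Hypothetical per-cycle values on an 8-bit bus for two software variants.
variant_a = [0x00, 0xFF, 0x00, 0xFF, 0x00]   # all eight bits flip every cycle
variant_b = [0x00, 0x01, 0x03, 0x07, 0x0F]   # one additional bit set per cycle

print("variant A toggles:", toggle_count(variant_a))
print("variant B toggles:", toggle_count(variant_b))
```

A toggle count carries no units of power, but comparing counts between software variants gives a cheap relative indicator before anyone looks at watts.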

Fig 1. Power analysis across the design flow. Source: Cadence

“Fast analysis techniques have emerged that can take the software and run that on an emulator, dump the activity, feed it into a power analysis tool and it creates a quick power profile,” says Gupta. “That profile is on a clock-cycle granularity. Now you know which IPs are consuming power and when. When is important, because average power does not give you the insight of exactly what signal costs where.”
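Gupta's point about average power can be made concrete with a small sketch. The per-cycle traces and block names below are hypothetical. The idea is simply that two blocks can have the same average power while only one of them has a sharp peak, and the profile shows when that peak occurs.

```python
# Hypothetical per-cycle power in milliwatts for two IP blocks.
profile = {
    "cpu": [5, 5, 5, 5, 5, 5, 5, 5],
    "dsp": [0, 0, 0, 20, 20, 0, 0, 0],
}

def average(trace):
    return sum(trace) / len(trace)

def peak_window(trace, width):
    """Return (start_cycle, mean_power) of the highest-power window of a given width."""
    best_start = max(range(len(trace) - width + 1),
                     key=lambda s: sum(trace[s:s + width]))
    return best_start, sum(trace[best_start:best_start + width]) / width

for name, trace in profile.items():
    print(name, "avg:", average(trace), "peak window:", peak_window(trace, 2))
```

Both traces average the same value, yet the second concentrates its consumption in two cycles. That timing is the information an average discards.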

Paradigm change
One of the techniques that some chipmakers have taken is to do more in hardware, and to tightly integrate whatever cannot be done in hardware with efficient software that is developed in conjunction with that hardware. This is particularly evident where software is embedded and can be highly optimized.

The low-hanging fruit in this area involves access to memory, such as in a microcontroller. “Sometimes the power consumption of the memory is not necessarily the most significant aspect,” said Paul Hill, director of product marketing at Adesto Technologies. “The MCU itself might be the highest-power user of the system. If you want to send a command, the MCU has to sit in idle mode and check every couple of seconds to see if it’s ready. It’s like when kids go on vacation and ask, ‘Are we there yet?’ Reducing power consumption of the memory is one thing, but if you can eliminate some of the work the MCU has to do, you can save significantly more power. You can’t do any of this without software, of course. But what you can do is add IP to the memory to tell the MCU when a particular process is completed. So you can send a program command to the memory and then the MCU can go to sleep. When the memory completes the task, it wakes up the MCU and says, ‘I’m done.’”
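A back-of-the-envelope model shows why letting the MCU sleep can matter more than the memory's own consumption. Every number below is hypothetical and chosen only to keep the arithmetic simple. The model also assumes the MCU stays fully awake while polling, which is a simplification of the idle-and-check behavior described above.

```python
# All values are hypothetical, for illustration only.
ACTIVE_MW = 10.0      # MCU power while awake and polling
SLEEP_MW = 0.01       # MCU power while asleep
WAKE_COST_UJ = 5.0    # one-time energy to wake on the memory's completion signal

def polling_energy_uj(op_time_ms):
    # MCU stays awake for the whole memory operation (mW * ms = uJ).
    return ACTIVE_MW * op_time_ms

def interrupt_energy_uj(op_time_ms):
    # MCU sleeps during the operation, then pays a fixed wake-up cost.
    return SLEEP_MW * op_time_ms + WAKE_COST_UJ

op_time_ms = 100.0    # duration of a hypothetical program command
print("polling:  ", polling_energy_uj(op_time_ms), "uJ")
print("interrupt:", interrupt_energy_uj(op_time_ms), "uJ")
```

Under these assumptions the MCU's energy for one command drops by more than two orders of magnitude. The real ratio depends entirely on the actual sleep current, wake-up cost, and operation time of a given part.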

Figuring out what gets offloaded to software and what gets done in hardware isn’t so simple, though. It frequently requires a level of domain expertise to understand the tradeoffs in power, performance and reliability. And in some markets, such as automotive and data centers, it requires a proven track record.

“We see this with SerDes, which is sort of equivalent to the software stack in the storage market,” said Hemant Dhulla, general manager of the IP Cores Business Unit at Rambus. “No customer will want to use your storage software stack when you are new. People always want to use a stack that has been beaten up for 5 or 10 years. That varies by market. In automotive and medical, it’s definitely true.”

The challenge is that balancing between hardware and software in some of these markets is new. Automotive electronics five years ago was nowhere near as sophisticated as it is today, and software is being developed by companies with no proven track record. But the challenges remain the same, even if the implementations are different. This helps explain why much of the analysis in a variety of companies is done by specialized power teams, and not by the software engineers. But that could change in the future, and both will require the same types of tools.

“The use case that we have for software developers is to give them a software platform on which they can execute software,” says Stahl. “We have no way to give them instantaneous feedback during the execution of the software of the cycle-by-cycle power. That is the task of a different group of people within the company — people that deal with power.”

This means the feedback loop is loose. “There are companies where power profiles have been created from emulator activity and the data has been exposed to the software team,” says Gupta. “The software and power methodology teams have started to talk to each other. This will increase over the next few years. The industry is heading in the right direction, and part of that is the IEEE 2416 models.”

The software paradigm has to change, though. “They have no choice,” says Goossens. “They have to care about power. Methodologies and tools that help software engineers understand more about the hardware and the cost of the hardware including power cost are becoming more important. There are multiple aspects to the design problem. You can ask the question, ‘Is the software good? Does it need to be rewritten?’ This is a question that is independent of the processor and doesn’t matter if it is running on a standard processor or an ASIP (application-specific instruction set processor). The way you write the software will have a power impact. The second part of the problem is the architecture of the processor itself. How can you optimize that to get better power consumption?”

Many hardware teams are facing a serious dilemma. “If you offload functionality from the processor and into hardware, you can improve aspects of the problem, but those blocks become fixed,” points out Goossens. “Today, every new product is rich in features and vendors must be able to add features quickly. That means that what you placed in hardware may have to change, and so you have to make them more flexible. Increasing amounts of functionality have to be in software, but standard processors will not do the job. That is why ASIPs are becoming so popular.”

Ethical concerns
The problem is bigger than just corporate profits. “We are at the stage where what we do in electronics and what we want to achieve with electronics is impacting the environment,” says Stahl. “We have to do something about it. Companies need to consider the entire context and see if there is a net benefit to using certain technologies. For example, you could argue that self-driving cars are a huge benefit for society, and if overall you can optimize the energy consumption of transporting people—including the energy consumption needed for the computers that are being used for that—then there will be a net benefit to the environment. That total analysis needs to be done.”

Gupta agrees. “It is high time that software teams started to take advantage of the evolving resources to their benefit. The result would be a greener planet.”

Brian Bailey

Brian Bailey is Technology Editor/EDA for Semiconductor Engineering.
