System-Level Power Modeling Takes Root

Why modeling power much earlier has suddenly become so critical for so many applications.


Power, heat, and their combined effects on aging and reliability are becoming increasingly critical variables in the design of chips that will be used across a variety of new and existing markets.

As more processing moves to the edge, where sensors are generating a tsunami of data, a number of factors need to be considered in designs. On one side, power budgets need to reflect that many of these devices will need to do more processing on a single battery charge, or will be used in safety-critical applications where power and related physical effects can impact reliability over the projected lifespan of a device. On the other side, these devices will be deployed across a wide variety of use cases that include inconsistent coverage for wireless communications, sometimes extreme environmental conditions, and varying interactions with other devices, any of which can impact how much power they consume.

All of this information needs to be modeled and simulated, and increasingly it needs to be simulated in the context of an entire system. But that all has to happen earlier in the design flow, when decisions about power can be corrected in the context of other effects. And it has to happen for markets where sometimes there is no history of advanced electronics, or where safety depends on sustained functionality over time.

“Besides the static and dynamic power consumption, this also provides power estimations related to expected use cases, particularly by integrating the application software and simulating realistic usage scenarios,” said Roland Jancke, head of the department for design methodology for Fraunhofer’s Engineering of Adaptive Systems Division. “Activity patterns, or so-called mission profiles, on a macroscopic level are as important as the duty cycle is on a microscopic level for assessing the overall power budget. The same holds true for reliability analysis when judging whether lifetime targets are met. It allows estimating the expected workload of the processor and/or the platform, and helps to identify bottlenecks.”

Starting with the big-picture architecture has always been important for performance and functionality, but the growing emphasis on power modeling this early in the design process is new.

“This approach provides early decision guidance for selecting a suitable overall architecture, dimensioning of the battery and power supply components, or even choosing a supply by energy harvesting,” said Thomas Markwirth, an engineer in Fraunhofer’s EAS Functional Modeling & Verification working group. “Moreover, system-level simulations are a starting point for further power-aware optimizations regarding memory concepts or algorithmic implementations, for example.”

Steven Woo, vice president of systems and solutions and distinguished inventor at Rambus, agrees. “You need to model things end-to-end, from one chip to another. There may be a low signal change, and you need to understand the power implications all the way through. But it’s more difficult to model that at the speed it’s going at. The bottom line here is that everything has an impact.”

Cascading effects
Power problems get worse at each new process node. Static leakage, dynamic power density, RC effects, and a variety of power-related problems used to be dealt with much later in the design flow. That’s no longer possible, even when designs include multiple chips in a package. And problems in one area of a design can easily cascade into problems in another area due to thinner gate dielectrics, rapid on-off switching over time, and even process variation.

There are a number of considerations that come into play here. “One aspect involves energy,” said Anush Mohandass, vice president of marketing and business development at NetSpeed Systems. “Good engineers talk about bandwidth in terms of gigabytes per second, or what the upper limit is in terms of wattage. The really sharp architects talk about energy, energy movement, and what that means. They don’t just talk about gigabytes per second. They say gigabytes per second per watt so they include the concept of energy. And as energy directly corresponds to the battery life, it’s pretty intuitive. You use more energy. But there’s a fixed amount of battery that you have, so you should care about energy because that impacts the standby battery or the per-use pack.”
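The difference between raw bandwidth and bandwidth per watt is easy to make concrete. The sketch below is purely illustrative (the configurations and numbers are invented, not figures from NetSpeed); it shows why a slower but more frugal design can move more total data on a fixed battery:

```python
# Compare two hypothetical interconnect configurations by energy efficiency,
# not just raw bandwidth. All numbers are illustrative.

def efficiency_gbps_per_watt(bandwidth_gbps: float, power_watts: float) -> float:
    """Bandwidth per watt: gigabytes per second per watt."""
    return bandwidth_gbps / power_watts

# Config A: faster, but power-hungry.
a = efficiency_gbps_per_watt(bandwidth_gbps=32.0, power_watts=4.0)   # 8.0 GB/s/W
# Config B: slower, but more frugal.
b = efficiency_gbps_per_watt(bandwidth_gbps=24.0, power_watts=2.0)   # 12.0 GB/s/W

# With a fixed energy budget (a battery), B moves more total data:
battery_wh = 10.0                # watt-hours available
data_a = a * battery_wh * 3600   # GB moved before the battery is empty
data_b = b * battery_wh * 3600
```

Config A wins on the spec sheet, but Config B moves half again as much data per charge, which is exactly the trade the "gigabytes per second per watt" framing exposes.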

Another aspect of power modeling is thermal efficiency and thermal heat mapping. “In regard to thermal maps, you’re worried about the thermal distribution and why it matters, because if a portion of the chip becomes really hot, you shut it down or you lower that characteristic,” Mohandass said. “This affects the performance. Thermal efficiency and heat mapping come into play when you’re looking at thermal characteristics and how they affect performance. That’s the primary aspect. The secondary aspect is task management. Because you have these heterogeneous architectures, what’s going to happen is my big CPU goes on for a short time and then it hands it over to the little CPU, after which it hands it over to the GPUs. Then there is coherency on top of it. Here, task management becomes important because task management impacts the thermal heat map, which affects performance. So when people talk about power management, they are just thinking battery life. Battery life is one thing, but in fact power management has a much more direct impact on your performance of the chip, especially in heterogeneous hardware.”

There is no shortage of tools to handle these issues. In fact, for the past decade EDA vendors have been turning out new tools to handle every aspect of power, heat, aging, and other physical effects. But in the past, the real focus was on power as a gating factor for performance rather than a primary concern at the early stages of the design. This shift is subtle but significant.


Fig. 1: The case for good power modeling. Source: Cadence

“For the architecture use case, it used to be all about performance,” said Tim Kogel, solution architect at Synopsys. “And for early software development, it used to be all about bringing up the software, making it work, debugging it, and if there is a problem, testing it. But the power at that level — system-level power analysis, as opposed to the detailed sign-off power analysis that you do at the gate level — for both of the virtual prototyping use cases, the system-level power model is like an overlay to that virtual prototype.”

Then along came the UPF 3.0 standard, ratified in 2016, which standardized the format for system-level power monitors. “That at least provided a way to express the power consumption of the component at the level, typically, of an IP block for one core or one cache or one accelerator or one memory,” Kogel said. “For each of those there are system-level power state machines that can calculate the power consumption. Then the power monitors overlay the virtual prototype, and when the virtual prototype is built, by adding the power models, system-level power analysis is possible more or less for free.”
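The power-state-machine idea Kogel describes can be caricatured in a few lines of code. The sketch below is hypothetical and greatly simplified (it is not UPF syntax, and the per-state power values are placeholders, not characterized data): each state carries a power figure, and the monitor accumulates energy as the simulated workload drives transitions.

```python
# Minimal caricature of a system-level power-state monitor for one IP block.
# State power values are placeholders; real models come from characterization.

class PowerStateMonitor:
    STATES = {"OFF": 0.0, "RETENTION": 0.002, "IDLE": 0.05, "ACTIVE": 0.8}  # watts

    def __init__(self, state="OFF"):
        self.state = state
        self.energy_j = 0.0  # accumulated energy in joules

    def advance(self, dt_s: float, new_state: str):
        """Accumulate energy spent in the current state, then transition."""
        self.energy_j += self.STATES[self.state] * dt_s
        self.state = new_state

# A toy scenario for one accelerator block:
mon = PowerStateMonitor("IDLE")
mon.advance(0.010, "ACTIVE")     # 10 ms in IDLE, then wake to ACTIVE
mon.advance(0.005, "RETENTION")  # 5 ms of work, then drop to RETENTION
mon.advance(0.100, "IDLE")       # 100 ms retaining state, then back to IDLE
```

One such monitor per IP block, overlaid on a virtual prototype, is the essence of the "more or less for free" system-level power analysis described above.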

To determine system-level power, many engineering teams today try to add up all the power consumption statically. The problem is that approach omits many details, particularly how the utilization and the activity of the different components change over time.

“It used to be okay to assume some average power consumption for each component, but today you have dark silicon effects,” said Kogel. “The SoCs are so big and so aggressively power managed that everything that’s not active is shut down, which means average power doesn’t really tell you anything. You really need to look at the use case. How does the use case activate the different components, which then start to consume power? That is changing all the time and is a transient power consumption wave that must be looked at. This is when you have a virtual prototype that simulates activity either at a high level, architecture workload level or, based on running software, that is what gives you much more realistic results for the power consumption over time.”
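The gap between a static sum of averages and activity-driven analysis is easy to illustrate. In the hypothetical sketch below (components, power figures, and schedules are all invented), the static estimate not only misses the transient waveform, it never reveals the peak:

```python
# Static sum of averages vs. activity-driven power over time.
# Components and duty cycles are invented for illustration.

components = {          # name: (active power in W, assumed average power in W)
    "cpu":   (1.2, 0.6),
    "gpu":   (2.0, 1.0),
    "modem": (0.5, 0.25),
}

# Naive static estimate: just add the assumed averages.
static_estimate = sum(avg for _, avg in components.values())

# Activity-driven estimate: per-millisecond on/off schedule. With aggressive
# power management (dark silicon), a gated block consumes ~0 W.
schedule = {            # name: 1/0 activity per ms over a 4 ms window
    "cpu":   [1, 1, 0, 0],
    "gpu":   [0, 1, 1, 0],
    "modem": [0, 0, 0, 1],
}
transient = [
    sum(components[name][0] * act[t] for name, act in schedule.items())
    for t in range(4)
]
peak_w = max(transient)  # the number the static sum never shows
```

The static estimate here is 1.85 W, but the transient profile swings from 0.5 W to a 3.2 W peak when the CPU and GPU overlap, which is the kind of use-case-dependent behavior only a time-based simulation exposes.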

System-level power modeling grows
The adoption of system-level power modeling began quite slowly after the UPF 3.0 standard came out about two years ago, Kogel observed. “We tried to raise the awareness through seminars and articles, but there was always pushback about the availability of power models and the accuracy of the results. Users typically compared system-level power models against sign-off power analysis. The reluctance is understandable, because it’s a different way of thinking. With modeling you need to take more responsibility for the results. For example, with power models the results depend on the characterization. In the beginning, the characterization is based on estimates.”

Much has changed recently, for several reasons. First, increasing complexity is pushing engineering teams to move to higher levels of abstraction. When established methods break, people are more willing to listen to new ideas. Second, computational intensity continues to grow, but the battery capacity in mobile devices and the cost of cooling for data center applications do not scale the same way.

“That has been true for years, but with state-of-the-art applications like 5G, AI, and autonomous driving/flying, we seem to have reached a point where engineering teams must look at power in a more holistic way,” Kogel said. “There are now deployment projects with system-level power analysis in advanced application domains including ADAS, 5G, and AI modems, which optimize the architecture for power, and 4G modems, which optimize power management software.”

In addition, the gap between fine-grain power management (clock-gating, power gating, voltage/frequency scaling) and high-level power management (OS power management) has become so large that it can only effectively be managed using a system-level model.
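The fine-grain techniques above all trade voltage, frequency, or switching activity against dynamic power, which to first order scales as P = α·C·V²·f. The back-of-the-envelope sketch below (the constants are illustrative and not tied to any real process) shows why DVFS pays off roughly cubically when voltage and frequency scale down together:

```python
# Back-of-the-envelope dynamic power: P = alpha * C * V^2 * f,
# where alpha is activity factor, C switched capacitance, V supply, f clock.
# Constants are illustrative, not from any real process.

def dynamic_power(alpha: float, c_farads: float, v_volts: float, f_hz: float) -> float:
    return alpha * c_farads * v_volts**2 * f_hz

nominal = dynamic_power(alpha=0.2, c_farads=1e-9, v_volts=0.9, f_hz=2e9)

# DVFS: scale V and f down by 20% together.
scaled = dynamic_power(alpha=0.2, c_farads=1e-9, v_volts=0.72, f_hz=1.6e9)

ratio = scaled / nominal  # roughly 0.8**3 = 0.512
```

A 20% voltage/frequency step cuts dynamic power nearly in half, which is why the OS-level decision of when to scale, informed by a system-level model, matters as much as the circuit-level mechanism itself.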

NetSpeed’s Mohandass takes that a step further, saying the practice of system-level power modeling is ubiquitous and pervasive, and for companies with sufficient resources, this is done with emulation. “You actually put the design in emulation and run it with the software. You bring up the OS and see what it means in terms of power, power-gating efficiency, or clock-gating efficiency. You can do it in emulation or you can do it in simulation. The key here is this is not one of those things where you say, ‘I’m going to do my chip, and whatever the power is, the power is.’”

Power modeling in context
Power modeling is complex enough by itself, given all the possible use cases. But it’s hard to strip power away from related effects such as thermal stress and aging, particularly when it comes to applications where safety is involved.

“Aging and other reliability issues used to be a niche application where you had two or three engineers in a company analyzing aging in isolation,” said Hany Elhak, director of product management and marketing at Cadence. “So they’d look at hot carrier injection, BTI (bias temperature instability), and self-heating, which in the past was separate from aging. But you need a way to combine all of these phenomena together. It requires a holistic approach to model device degradation and wear-out.”

The result is that thermal simulation increasingly is being combined with electrical simulation and power simulation to understand all of the possible impacts of power. “Near the edge, you may have a different temperature than in the middle of a device,” said Elhak. “That affects the aging analysis and the number of field failures, and the goal is to get the field failure rate down.”

It also has a spillover effect on IP. Most commercial IP on the market today is enabled with UPF, because that’s how SoC architects are modeling power, noted Navraj Nandra, senior director of marketing for interface IP at Synopsys. “[System architects] basically capture all sorts of power information, such as power domains, power states, state transitions, and power consumption, in a spreadsheet, which is then imported into a tool. They can include support for such things as power islands and retention cells, which allows them to define registers to retain state across a particular power transition. This is all done through UPF, which also allows engineering teams to run a bunch of different scenarios in simulation by changing the toggle activity in the various power states. The power consumption can then be predicted from that, along with the power model of the SoC.”

A very heterogeneous future
The list of drivers for power modeling is growing. Heterogeneity in designs is almost a requirement for energy efficiency, which explains why the focus is on highly specific accelerators rather than large CPUs or GPUs. “This is because there isn’t typically enough memory bandwidth, and the thermal image would all be red,” said Mohandass. “That’s the reason why there is coherency—so you don’t need to go all the way out to memory, and you don’t need to do cache flushes. This is why hardware coherency comes into the picture, because if this were done in software, the memory caches would have to be flushed. Not only are you wasting precious memory bandwidth, you’re wasting a lot of power.”

Additionally, the growth in the amount of data that needs to be processed for new applications such as machine learning and AI will require design teams to pay much more attention to power. “There’s a whole machine learning set of chips being developed with the idea of putting as much of that inference and training on the edge device,” said Nandra. “There are data centers doing all sorts of big data analytics and number crunching. Then there are edge devices, for example, like smartphones that have to do the facial recognition, or cars that need to quickly recognize a moving object. The peak demand on power in the edge device basically is when you’re trying to resolve an image, and the whole idea of the power modeling is to figure out when that training and inference is happening and to apply the optimum amount of power. At the same time, once you’ve got an idea that the image is in that recognition cycle—because you can slow down and do the rest of the calculations, and because the inference algorithm on the edge device made the major decisions about the power for the number crunching—you can then optimize the processor’s operations at a lower clock speed to save on power.”

To do all of this, UPF hooks are needed in the IP to communicate to the SoC, because a lot of the power consumption is happening at the interfaces, he said. “You want to make sure that once you’ve got that information and the processors are starting to run mathematics, you can then put those interfaces into a low power mode,” Nandra said.

Conclusion
A growing list of factors, ranging from an explosion in data to more use cases and new market applications, is forcing power to the forefront of the design process. In the past, power was considered an important factor, but not necessarily a critical one. Increasingly, it is being seen as a critical element in achieving the necessary performance and continued functionality of a device over its projected lifespan.

As a result, getting the power model right is no longer just about a few extra minutes of battery life. Increasingly, it involves functionality of a device across a wide variety of use cases and applications, and that is causing a lot more people to take it a lot more seriously than ever before.

—Ed Sperling contributed to this report.


