Maintaining Power Profiles At 10/7nm

Capturing what is driving power in a design means different things to different teams, but it’s essential at 10nm and below.

Understanding power consumption in detail is now a must-have in electronic design at 10nm and below, putting more pressure on SoC verification to ensure a device not only works, but also meets its power budget.

As part of this, the complete system must be run in a realistic manner — at the system-level — when the design and verification teams are looking at the effects of power during hardware/software validation. But the best way to do this involves a number of variables and concerns.

First, there could be a specific power budget requirement, such as 12 hours of video play on one charge, or a given performance within the thermal limit. There also may be concerns about the process of developing low-level power management software mechanics and validating the hardware against it.
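A requirement like "12 hours of video play on one charge" translates directly into an average power budget that the SoC team must hit. A back-of-the-envelope sketch, using hypothetical battery and subsystem numbers that are not from the article:

```python
# Hypothetical numbers: a 12 Wh phone battery and a 12-hour video target.
battery_capacity_wh = 12.0   # watt-hours (assumed for illustration)
target_hours = 12.0

# Average power the whole system may draw to meet the target.
avg_power_budget_w = battery_capacity_wh / target_hours  # 1.0 W

# If the display and radios are assumed to consume 0.6 W of that,
# the SoC's share of the budget is what remains.
display_and_radio_w = 0.6
soc_budget_w = avg_power_budget_w - display_and_radio_w  # 0.4 W

print(f"system budget: {avg_power_budget_w:.2f} W, SoC budget: {soc_budget_w:.2f} W")
```

The point of the arithmetic is that a use-case-level requirement bounds the SoC long before any RTL exists, which is why the power profile has to be validated at the system level.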

“The first concern is a power/thermal modeling question, and running a full system with real use cases is somewhere between non-trivial to impossible, depending on the system complexity,” observed Ashley Crawford, power architect and distinguished engineer at ARM. “There are ways of breaking that down to raise the abstraction a lot higher, with both hardware/software co-simulation and ESL approaches, that may yield a less accurate but still useful answer. The key is to use the highest possible abstraction that gets you what you need to know.”

The second concern is about running software, and it makes sense from a shift-left point of view as well as a validation activity, he said. “To make that tractable the hardware can be modeled at a higher abstraction than RTL and the scenarios can be focused just on what needs to be validated in terms of the mechanics alone, not end use cases.”

“When we think of power profile we think of what’s driving the power consumption for most modern chips,” said Piyush Sancheti, senior director of marketing, verification group at Synopsys. “It’s the actual software, and there’s multiple layers, but at the bare bones level it’s the power management software that’s eventually driving the power profile. And if you think about what’s driving power-management software, it’s actually the operating system or the layer above. Eventually it boils down to what you’re doing on the system itself—playing a game, making a phone call, whatever the end application may be. Everything that you do downstream is a means to the end. What people want to know is, when running a certain application, what it means for the power consumption on the device and how to make sure it’s still within normal operating conditions for all different types of applications.”

Fig. 1: Overheating phone screen.

This is not necessarily a new requirement. But what has changed in recent years is the number of design teams that seriously want to do something about it, particularly in connecting the software application to the power profile. “This is not a ‘nice-to-have,’ it’s a design imperative,” Sancheti said. “You can’t really put chips out unless you’ve done something in terms of managing your power profile.”

Also, software was not previously part of the deliverables for chip companies 10 or 15 years ago. “Today, you are not just supplying hardware,” he said. “You are also supplying the complete software stack. So in some ways it’s now become a problem that you have to tackle as a chipmaker. If you’re supplying the power management software to go along with the chip, then the onus is on you to make sure you’ve actually optimized your software for whatever power consumption characteristics the device has.”

This has major implications for design methodology. Traditional power analysis focused on short-duration windows, running the risk of missing power-critical events that may occur when the chip is exposed to its real traffic, according to Preeti Gupta, head of PowerArtist product management at Ansys.

Here again, early visibility into power and thermal profiles of real-life applications, such as operating system boot up or high definition video frames, can help avoid costly power-related surprises late in the design flow.

On the upswing today is the use of specialized hardware, such as an emulator, which runs at much higher speed and makes analysis based on real-life applications possible.

“However, running cycle-by-cycle power analysis of such real application activity can be very compute-intensive and can take weeks, rendering it impractical even with efficient flows for activity transfer from the emulator to the power analysis tool,” Gupta noted. “Using specialized high-performance RTL engines and modern data architectures based on big-data processing techniques, it is possible to generate an accurate per-cycle power profile for very long vectors. These can run several orders of magnitude faster than traditional methods, generating power data early so it can be acted upon for meaningful and timely design decisions.”

Further, fast per-cycle analysis makes it possible to compute power profiles for operating system boot-up comprising hundreds of milliseconds of data overnight. “Enabling coverage across realistic activity scenarios, fast RTL power profiling also makes it possible to perform a comprehensive analysis of the power signature of designs that are sensitive to security hacks, and you can identify modes and blocks that need to be optimized to smooth out the power profile,” she said. “The ability to quickly run thousands of RTL vectors with millions of cycles of activity also identifies peak switching power (di/dt), which can cause large power noise. By directing analysis to power-critical activity windows, risks of design failure from power noise can be mitigated. RTL chip current profiles based on real application activity also enable early and accurate co-design of the chip, the package and the board. At the system level, power consumption can have a direct impact on the thermal performance. Understanding power profile throughout the duration of real life simulation helps you determine and address areas of the design that are consuming most power and in turn causing thermal issues. A fast power profile across a long duration is also useful to model the low frequency current signature of the package and the board along with the high frequency chip current transients.”
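The peak-power screening Gupta describes can be sketched as a sliding-window scan over a per-cycle power trace: find the window with the largest power swing (a simple proxy for di/dt) and flag it for detailed power-noise analysis. A minimal illustration on a synthetic trace; the window length and the trace values are invented:

```python
def worst_didt_window(trace, window=4):
    """Return (start_index, max_swing) over all windows of the given
    length in a per-cycle power trace (arbitrary power units).
    The max-min swing within a short window approximates di/dt risk."""
    best_start, best_swing = 0, 0.0
    for i in range(len(trace) - window + 1):
        w = trace[i:i + window]
        swing = max(w) - min(w)
        if swing > best_swing:
            best_start, best_swing = i, swing
    return best_start, best_swing

# Synthetic per-cycle power profile: mostly quiet, one sharp ramp.
profile = [0.2, 0.2, 0.25, 0.2, 0.9, 1.6, 0.3, 0.25, 0.2]
start, swing = worst_didt_window(profile, window=3)
print(start, swing)
```

In a real flow the trace would span millions of cycles from RTL power analysis, and the flagged windows would then be handed to a power-integrity tool for detailed IR-drop and noise analysis.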

Timing is everything
The key is to get to this power data early enough so it can be acted upon, and that means ideally at the system level.

“The power profile is always at the system level in the context of scenarios and software,” said Frank Schirrmeister, senior group director, product management in the System & Verification Group at Cadence. “Remember the age of the teardown reports? They had power charts in them saying, ‘During boot up, this is what a power profile looks like. During video decode, this is what the power consumption looks like.’ Of course, there is the maximum power consumption I can deal with from a chip perspective, and a thermal perspective, as well. But when I’m in a mobile device, or when I am in a powered device that is plugged in, I need to observe thermal limits. Then all of that is connected to the scenarios. ‘While I’m at 100% compute, it’s that much, and I can do that for this time.'”

This is where intelligent power features in tools fit in. They can help tune an SoC for a particular power envelope. “At the upper level of the power envelope there’s the notion of where the physical limiters kick in, such as when the mechanisms on chip say if I exceed this power consumption then I will need to throttle down my processor otherwise my GPU will burn up,” Schirrmeister said.

As it has been for some time, particularly in the mobile space, engineering teams really push the limits of which components they can switch on and still not hit the thermal wall. “There are thermal sensors on the chip that are connected to the power management, which throttles the power, turns down voltage on silicon, or switches off components to make sure that the temperature and then the power remains bounded,” explained Tim Kogel, solutions architect in Synopsys’ Verification Group. “But it’s not just the hardware. It’s how the software is actually using the features that are available in the hardware. Hardware developers think of this in terms of all the low power features they provide with clock gating, power gating, and voltage frequency scaling, but in the end it comes down to how well the software can leverage these capabilities that determines how well you can keep your power profile.”
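The hardware/software interplay Kogel describes is, at its core, a closed control loop: thermal sensors feed a governor that steps the chip between DVFS operating points. A hypothetical sketch of one such loop (the operating points, thresholds, and temperatures are invented for illustration):

```python
# Hypothetical DVFS operating points: (frequency in MHz, voltage in V).
OPERATING_POINTS = [(600, 0.7), (1200, 0.8), (2000, 0.9)]
THROTTLE_C = 85.0   # throttle above this die temperature (assumed)
RESUME_C = 75.0     # step back up below this temperature (assumed)

def next_operating_point(level, temp_c):
    """One step of a simple thermal governor: step down one DVFS level
    when the sensor reads hot, step back up once the die has cooled."""
    if temp_c >= THROTTLE_C and level > 0:
        level -= 1          # throttle: lower frequency and voltage
    elif temp_c <= RESUME_C and level < len(OPERATING_POINTS) - 1:
        level += 1          # headroom available: restore performance
    return level

level = 2                   # start at the fastest operating point
for temp in (70.0, 88.0, 90.0, 74.0):
    level = next_operating_point(level, temp)
    print(temp, OPERATING_POINTS[level])
```

The gap between the throttle and resume thresholds provides hysteresis so the governor does not oscillate; how aggressively software exploits the available operating points is exactly the leverage Kogel points to.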

From a technical perspective, verifying that a design operates within its power budget begins with heat, because that heat must be dissipated, reminded Jamil Kawa, a fellow in the Solutions Group at Synopsys. “And packaging is expensive, so if you insist on going for a higher power consumption tolerance it means you are going to spend a lot of money on packaging.”

As such, verification at the top level of the chip is extremely tricky, he stressed. “Let’s say you have an iPhone or a Galaxy or one of those systems. There are so many functions with all of those utilities and apps that you can do. You can be talking and checking your calendar and sending a message, all at the same time. [With this,] there is a usage profile in the mind of the designer. Also, because of the turnaround time being very fast, this chip is the integration of many design teams. Each design team has a certain power budget, they have a certain stack in terms of hardware, firmware, and then software. Then, at the system level, somebody is integrating all of those blocks coming from all of those teams and running verification at the system or SoC level, assuming a certain utilization profile.”

If a mistake takes place at that level, that’s a big problem, Kawa said. “Let’s say that while streaming a video, which is a high frequency channel, you’re not going to have other functionalities available. But by some mistake in terms of power management, or in terms of power domains, that doesn’t happen, and there is a contention and some other high frequency channels are still on. Then the power dissipation of the chip causes it to overheat and shut down. So the verification of a system, or of a system on a chip at the top level—at the level of whoever integrated the whole blocks with each block having its hardware, firmware, and software stack—is the tricky part.”

On the other hand, system users will look at this from a slightly broader perspective. “They will basically say, ‘I not only have the chip, but I need to deal with the whole system,’” Cadence’s Schirrmeister said. “In this light, they will talk about the energy consumption of the display, what the external memories are consuming, and that’s all tied to the system-level simulation. How many transactions do I get out of the memory, and what does that mean from a power consumption perspective? Over time, the important aspect is not only power. It’s also thermal. I want to be able to identify from a thermal perspective whether I actually run my processors continuously for a long enough period of time, and whether the device’s ability to get rid of the energy will limit it and ring the alarm bells and say, ‘You’re getting too hot. Slow down or shut down those processors over here.’”

At the system level, the level of integration is even more complex. “The parts are coming from different vendors, different design teams, different software stacks, and so on,” Kawa noted. “Whoever is integrating them is starting from a certain power profile and a certain usage profile, which is also very important. And they are making certain assumptions, because if you make a worst-case assumption on utilization in your verification vectors, you might find out there is no solution. Essentially, nothing works. The higher the system level in terms of what you are integrating in it, the more difficult it is to have a comprehensive set of tests that will ensure that you are covering the usage profiles within the power budget, and making sure there are no contentions between various functionalities that would cause runaway power.”

Big changes
Given the huge impact of maintaining the power profile, it’s not a surprise that major changes are occurring.

“We are experiencing an industry shift on how to do power profiling, power exploration, and power analysis,” said Jean-Marie Brunet, director of marketing for the emulation division at Mentor, a Siemens business. “This is a move from traditional specifications. Within large companies that do semiconductor devices, there is the product marketing team, and the engineering organization. Usually the interaction between the product marketing team and the engineering organization is the spec. And over the years they were saying, ‘Performance, size, speed, and oh, by the way, power.’ The problem is that their end customers are not buying chips or making decisions of semiconductor selection with a spec anymore. About 5 to 10 years ago the concept of a benchmark reference really exploded. This is very new and up front for mobile multimedia, a little bit less for others. For mobile multimedia, what’s very interesting is particularly for power and functional behavior — but particularly for power — they no longer talk about a testbench. They no longer talk about test spec where the testbench is matching the spec. They are asking, ‘If you ran AnTuTu 5.7.1, did you run the GPU graphics benchmark GFXBench 4.0?’ Those applications are running very specific functions of your semiconductor in very specific modes that are very close to the real-life application.”

Getting the power profile from this requires a platform with multiple capabilities. Today, there are three options: simulation, emulation, and FPGA prototyping. Given the number of frames required to run these types of benchmarks, simulation is eliminated because it runs too slowly. And while FPGA prototyping might run fast, it doesn’t provide deep enough visibility into the design.

As such, Brunet stressed that emulation is the appropriate platform because RTL or an abstract view of a design is being run, “and you are going all the way to a complete software stack. Those GFXBenches are software stacks. We’re talking about a lot of cycles here, and you need something that has the speed and power to deal with full chips. When you do something like a mobile multimedia chip like Snapdragon from Qualcomm, this is a very large chip.” To get the power profile, the design is on the emulator, the operating system is booted, the application — the benchmark — is run. When the design is being run on the emulator and the application is invoked, all of the nets are traced for the switching activity, which results in an activity plot waveform.
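The activity plot Brunet describes is built from per-net toggle counts. Once the emulator has traced switching activity, dynamic power in each window can be estimated with the standard CMOS relation P ≈ α·C·V²·f, summed over nets. A toy sketch with invented net capacitances and toggle counts (not real tool output):

```python
def dynamic_power_w(toggles_per_net, cap_per_net_f, vdd, window_s):
    """Estimate dynamic power over one activity window.
    Each full toggle charges and discharges the net capacitance,
    dissipating roughly C * Vdd^2; power is total energy over the
    window length."""
    energy_j = sum(t * c * vdd * vdd
                   for t, c in zip(toggles_per_net, cap_per_net_f))
    return energy_j / window_s

# Invented example: three nets, toggle counts from one emulation window.
toggles = [1_000_000, 250_000, 4_000_000]   # toggles in this window
caps = [2e-15, 5e-15, 1e-15]                # net capacitance in farads
power = dynamic_power_w(toggles, caps, vdd=0.8, window_s=1e-3)
print(f"{power * 1e3:.3f} mW")
```

Repeating this calculation window by window over the traced benchmark run is what produces the activity plot waveform, with each window becoming one point on the power profile.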

While the top 10 semiconductor companies may be embracing this approach, others are still using the time-tested spreadsheet as the power profile.

The spreadsheet lays out the various corners, including which corners to simulate and which to focus on. “It comes down to knowledge and experience, because the number of corners that we characterize to has increased tremendously,” Kawa said. “In the past, we used to go with ‘fast-fast,’ ‘slow-slow,’ and ‘typical,’ but now there is worst-case capacitance, worst-case resistance, and so on. And some of those corners cannot happen at the same time.”

This means if characterization is done across the board at worst case on everything, it kills the yield, and there isn’t a solution to this problem, he explained. “On the other hand, if you don’t do all the guardbands needed for electromigration, for self-heat, you might have the wrong speed, or you could have electromigration problems in a serious way. This translates to failures in the field, so the power profile is going to be literally a spreadsheet covering for each corner. What is the power density? What is the power dissipation? What are the pass/no pass of certain electromigration limits?”
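The corner spreadsheet Kawa describes amounts to a table of per-corner checks: power density, power dissipation, and electromigration pass/fail. A minimal sketch of that bookkeeping, with made-up corner data and limits:

```python
# Made-up per-corner data: (power density in W/mm^2, EM current in mA/um).
CORNERS = {
    "fast-fast": (0.52, 1.9),
    "slow-slow": (0.31, 1.1),
    "typical":   (0.40, 1.4),
    "worst-cap": (0.48, 2.3),
}
POWER_DENSITY_LIMIT = 0.50   # W/mm^2 (assumed limit)
EM_LIMIT = 2.0               # mA/um (assumed electromigration limit)

def corner_report(corners):
    """Return pass/fail per corner, like one row of the spreadsheet."""
    report = {}
    for name, (density, em) in corners.items():
        report[name] = {
            "power_density_ok": density <= POWER_DENSITY_LIMIT,
            "em_ok": em <= EM_LIMIT,
        }
    return report

for name, checks in corner_report(CORNERS).items():
    print(name, checks)
```

The table makes Kawa's tension concrete: guardbanding every corner to worst case makes every row fail, while skipping guardbands risks electromigration failures in the field.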

While there still may be a number of ways to create and verify the power profile, making sure it is there—and thorough—is critical.