Experts At The Table: Low-Power Management And Verification

First of two parts: The problems, the myriad solutions, and the reality of getting a complex chip to tapeout.

popularity

By Ed Sperling

Low-Power Engineering moderated a panel featuring Bhanu Kapoor, president of Mimasic; John Goodenough, director of design technology at ARM; and Prapanna Tiwari, CAE manager at Synopsys. What follows are excerpts of their presentations, as well as the question-and-answer exchange that followed.

Bhanu Kapoor: There are two types of power you need to consider: Dynamic power, which is consumed because you are doing some useful activity, and leakage power, which gets consumed whether you’re doing something or not.

The dynamic power has dependence on switching activity, the frequency, the capacitance and the supply voltage. There are two components of leakage—sub-threshold and gate tunneling. Gate-tunneling is addressed by advances in process technology such as metal gates. Sub-threshold leakage grows exponentially with the decrease in threshold voltage. At 90nm it was significant, at 65nm it was equal to the dynamic power, and it grows from there.

If you look at the typical smart phone, it’s the same system-on-chip that is running different applications. These different modes of operation have different performance requirements. You can use different voltages to achieve those different levels of performance.

A typical power-managed SoC includes a power-management IC that provides different cores. One core can be a processor. And if it’s an ARM Cortex A9, there is power management in that core, as well. A second core might be for mixed signal, which potentially could require higher performance. And then this power controller, which is on all the time.

All of these power techniques have an implication on verification.

Slide5

If you look at standby leakage, one of techniques is power gating, which is cutting off power to certain regions. If you don’t need portions of the chip to be on, you can completely shut it down. That is power gating. But that has an effect on performance, because turning on and off a function is a long event compared to a clock cycle. You need to sometimes retain the state so you can come up fairly quickly.

All of this has an effect on verification, as you can see from the following chart.

Slide6

If you can do gate-level simulation, that is very helpful. You need input/ouput and power connected and you need to have appropriately modified your library definitions so power is one of the variables. With domain isolation, once you shut down you have to make sure you are not sending floating values to other regions. You have to isolate it to proper ones and zeros, which you can check with isolation gates using a rule-based checker.

If you have power in your simulation, a lot of rule-based issues can be addressed right up front. Over the years, simulation was not power aware. In the future, simulation will take a more and more important role. Simulation, by default, will incorporate power.

John Goodenough: We are verifying systems on chip. They’re large. They have lots of power domains to match all the application workloads that are going to be demanded on those devices. They have processors and software. Some of the domains are being switched on and off to meet the energy profile. They have virtually every technique available. The state space you’re trying to validate is therefore exploding by an order of magnitude.

One of the things we think about a lot at ARM is that it’s not so much the techniques that you can apply. It’s how you’re going to scale them to tackle these problems. There are lots of clever ways to validate, but not all of them scale effectively into workflows and onto your infrastructure. Power verification is not just about logical verification.

If you get a chip like the one below, you can mess it up in a lot of different ways.

Slide3

Usually, you can fix it in software. But you also can mess up the connectivity between the power domains. If you get your level shifter or always-on buffer or retention register wired up wrong, it’s not going to work. It’s going to be D.O.A. on the bench. A lot of chip failures are being caused by the failure to verify the integrity of the power network.

That’s a non-standard piece of verification, particularly where that interacts with the logical function of the chip and you’re trying to measure the maximum in-rush current and the average in-rush current. If you’re switching domains on and off, what’s the power domain going to look like from an electrical perspective? Is turning one domain on and turning another domain off going to put the voltages on either side of a level shifter into a pathological state that will damage or degrade the transistors and the level-shifting buffer?

There are some very interesting cross-coverage issues between what is traditionally more of the analog verification space on the power network and the logical verification space. We need, when considering power simulation, to run abstracted analog simulations, SPICE-level simulations, and cross between the two.

Unfortunately, the explosion in power states is also increasing because of the number of software states or the number of field configuration states. From a verification standpoint, not only are you adding a multiplier due to power states, you also have things like a secure or non-secure state. Will they work when a chip is configured for a single package and pinout if it uses another package and pinout? There’s an explosion in these operating modes.

The other pressure we have is making sure you’re going to hit a given schedule. In looking at the power metrics it’s important to see how they can be applied into practical workflows and how you can feed performance metrics from wherever you are in the process back up into reporting and closure reporting. If you combine the need for those two, one of the things it leads to is enterprise scaling, both in terms of infrastructure to support the simulation and how you scale this across workgroups that are not co-located.

The other problem you face is that if you do all of the verification, you’re never going to get the chip out the door. You’ve got to have a verification plan and really narrow down which of the power modes are going to be pathological and which ones can be worked around in software. A major part of thes power verification is the integration of a VP of engineering risk-reduction play into a more mainstream verification practice.

We’ve come a long way in a lot of the techniques, but at the end of the day you have a block diagram that needs to be simulated. Today that block diagram consists of RTL and some way of describing the power network or the power intent and power state space of the design. You also have to support the verification IP and transactors. You need coverage across the RTL and the power descriptions. It’s not rocket science. It’s just a more complicated block diagram.

Slide4