Verification implies comparison against an expected result, but the industry has yet to define how this works for power. How are power bugs found?
Functional verification continues to evolve, but power verification—a somewhat new concern—remains at levels of sophistication reminiscent of functional verification 30 years ago. When will power verification catch up, and what must happen to make that possible? These are questions the industry is still grappling with, and not everyone believes they require answers.
Functional errors produce wrong results that can compromise operational correctness or even safety, while a power bug in most cases just results in an inefficiency. That’s not always the case, of course. There are power bugs that can lead to product failures or accelerate aging. And for some industries, inefficiencies can quickly add up to a device that does not perform in the field.
And while the mobile industry led the way in power analysis, it is no longer the industry that has the most to lose from excess power consumption.
Verification is the act of comparing two independently derived descriptions of the same thing in order to find out where they differ. Power verification requires a second model that defines what the power consumption should be for any given operation. So far, no such model exists.
There also is no single type of power bug. But there are some commonly observed effects:
• Functional errors created by power optimization or incorrectly defined power control circuitry;
• Failures caused by a current spike or rapid change of current;
• Failures over time caused by aging or excessive use of stored charge, and
• Excess activity that produces no usable result.
This list goes on, and different types of bugs have different levels of importance depending on the end application.
Rupert Baines, CEO for UltraSoC, describes a typical power dilemma. “Which elements of the system can be put to sleep without affecting performance? How do you know if it’s asleep? Much like the Heisenberg principle, in checking if something’s awake, you wake it up. The developer believes the system to be functioning properly, but when using embedded analysis IP, it becomes possible to see that blocks never went to sleep. Blocks often have their sleep constantly interrupted by too many system checks on whether they were asleep or awake.”
Optimization can create problems, too. “Clock gating is a perfect example of how power-related issues arise,” says Roger Sabbagh, vice president of applications engineering at Oski Technology. “Clock gating reduces the switching power by turning off clocks to specific registers when there is no update in the register values, or when the updated value is not going to be used by downstream logic. However, this can introduce a bug if there are corner-case conditions where the clock is turned off when it actually needs to be on for correct operation of the design. Sequential equivalency checking (SEC) can find those corner-case bugs by detecting differences between the behavior of the original design and the design with clock gating.”
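As an illustration of that comparison, here is a minimal sketch (in Python rather than RTL, with an invented gating bug) that co-simulates an always-updated register against a clock-gated version and reports the first cycle where the two diverge. This is the essence of what an SEC tool automates at design scale.

```python
# Minimal sketch, not a real SEC engine: compare a golden register model against
# a clock-gated model over random stimulus and flag the first divergence.
import random

def reference_register(d_stream, en_stream):
    """Golden model: the register captures d whenever en is high."""
    q, out = 0, []
    for d, en in zip(d_stream, en_stream):
        if en:
            q = d
        out.append(q)
    return out

def gated_register(d_stream, en_stream):
    """Clock-gated model with a corner-case bug: the gate is driven by last
    cycle's enable, so the first update after enable rises is dropped."""
    q, out, gate_en = 0, [], 0
    for d, en in zip(d_stream, en_stream):
        if gate_en and en:      # clock reaches the register only when the gate is open
            q = d
        out.append(q)
        gate_en = en            # stale qualifier: lags the real enable by one cycle
    return out

random.seed(1)
d  = [random.randint(0, 255) for _ in range(1000)]
en = [random.randint(0, 1)   for _ in range(1000)]
for cycle, (a, b) in enumerate(zip(reference_register(d, en), gated_register(d, en))):
    if a != b:
        print(f"mismatch at cycle {cycle}: reference holds {a}, gated design holds {b}")
        break
```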
In that example, a second model existed, but in most cases there is nothing to compare power behavior to, and thus other metrics have to be utilized. “Consider where power analysis started,” says Rob Knoth, product management director in the Digital & Signoff Group at Cadence. “You assumed that every net switched 50% of the time and then checked power. That was highly pessimistic, and today you would never be able to tape out a chip. An improvement was to assume a certain amount of switching on clock nets separate from data nets. It was better, but you could still make a circuit look cold or red hot depending upon how you changed those activity factors. It is not that useful. Power has become a tradeoff against lifetime, and you have to have accuracy in the design and verification flow prior to tapeout or you don’t have a product.”
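For context, the activity-factor approach Knoth describes boils down to summing alpha * C * V^2 * f over the nets of a design. The sketch below uses made-up capacitance and activity numbers purely to show how far the estimate swings when the assumed activity changes.

```python
# Back-of-the-envelope dynamic power from assumed activity factors.
# All net counts, capacitances and activity values are illustrative, not real data.
def dynamic_power(net_caps_farads, activity, vdd=0.8, freq_hz=1.0e9):
    """Sum alpha * C * Vdd^2 * f over all nets (result in watts)."""
    return sum(alpha * c * vdd ** 2 * freq_hz
               for c, alpha in zip(net_caps_farads, activity))

net_caps = [2e-15] * 1_000_000      # one million nets at ~2 fF each (assumed)

pessimistic = dynamic_power(net_caps, [0.5]  * len(net_caps))   # every net toggles 50% of cycles
clock_aware = dynamic_power(net_caps, [0.15] * len(net_caps))   # lower blended activity assumption
print(f"50% toggle assumption: {pessimistic:.2f} W")
print(f"15% blended activity : {clock_aware:.2f} W")
```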
Other metrics are evolving. “Consider clock gating efficiency,” says Preeti Gupta, director of product management at ANSYS. “Given an idle-mode vector or an active-mode vector, or a sustained worst-case power vector, you compute the clock gating efficiency of every register in the design. For a register you have the data signal, the clock signal and a control signal. I can look at the control signal and find out how many cycles it is shut off, but what happens when the clock is not shut off and the data is not changing? The report may show if you have enough logic built into the design to help synthesis recognize registers that should be clock gated. How often has certain control logic really shut off each block? Do I have a control and how good is it? Now we can talk about the quality of the control signal. Design teams are able to qualify and quantify wasted activity on a register or cone of logic basis.”
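A rough sketch of the kind of per-register metric Gupta describes might look like the following. The traces are hypothetical, and the measure is simply the fraction of clocked cycles in which the stored value did not actually change, which is wasted activity that a better gating control could remove.

```python
# Sketch of a wasted-activity metric for one register, using invented traces.
def wasted_activity(clock_en, q_values):
    """Fraction of clocked cycles in which the stored value did not change."""
    clocked = wasted = 0
    prev = q_values[0]
    for en, q in zip(clock_en[1:], q_values[1:]):
        if en:
            clocked += 1
            if q == prev:
                wasted += 1       # register was clocked but held the same value
        prev = q
    return wasted / clocked if clocked else 0.0

# Example trace: the register is enabled most cycles, but its value rarely changes.
clock_en = [1, 1, 1, 0, 1, 1, 1, 1]
q_values = [5, 5, 5, 5, 5, 7, 7, 7]
print(f"wasted clock activity: {wasted_activity(clock_en, q_values):.0%}")
```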
Power models
Attempts have been made to produce high-level power models, but so far they have not been successful. Part of the problem is that accuracy requires detail that only exists below RTL, while finding power profiles over longer periods of time requires simulation with much higher-level models. This becomes even more problematic if the thermal effects of power are also being considered.
“If you are doing high-level synthesis and trying to get an early estimate of power, then that is one challenge,” says Jeff Miller, product manager for Tanner products within Mentor, a Siemens Business. “It is another thing to develop an IoT sensor where it will be asleep for an hour, then wake up for five seconds and it will take a measurement, it will do some computation to see if the measurement represents a significant change, and then decide if the event is significant to power up the RF transmitter and send the necessary data then power back down. It is a multi-level challenge where you are trying to optimize how much data you send. You are trying to optimize leakage, you have to optimize the CPU, and you want it to run as fast as possible and hurry up through the computation so that you can go back to sleep. All of these tradeoffs play together, and it is an interesting system-level challenge.”
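A back-of-the-envelope duty-cycle budget illustrates the tradeoff Miller describes. All phase durations and currents below are assumed, illustrative values; the point is that average power is dominated by the sleep floor plus short bursts of activity, which is why shortening the compute and transmit bursts pays off.

```python
# Duty-cycle power budget for a hypothetical IoT sensor node.
PHASES = [                      # (name, duration_s, current_mA), assumed values
    ("sleep",    3600.0,  0.002),
    ("measure",     0.5,  1.5),
    ("compute",     4.0,  8.0),
    ("transmit",    0.5, 25.0),
]
VDD = 3.0                       # volts, assumed coin-cell style supply

period_s = sum(t for _, t, _ in PHASES)
charge_c = sum(t * i_ma * 1e-3 for _, t, i_ma in PHASES)    # coulombs per wake/sleep period
avg_power_uw = charge_c * VDD / period_s * 1e6
print(f"average power: {avg_power_uw:.1f} uW over a {period_s / 3600:.2f} hour cycle")
```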
This level of modeling is higher than is typical within hardware design. “If you are modeling the processor, there is a difference in a memory access going to main memory or going to cache,” explains Kevin McDermott, vice president of marketing at Imperas. “You can assign certain elements as being more costly than others. If you are talking about higher complexity routines, turning on a radio, or system tasks — they can be assigned cost values. Then if the software changes, you look at the frequency with which each of those costs is deployed. The art is not to say what the absolute result is, but the relative change. Does this software run in the same envelope of power, or does it push some parts of the system harder and increase the cost, or does it improve them?”
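A sketch of that relative-cost bookkeeping, with hypothetical event weights and counts, could be as simple as the following: weight each platform event, total the weighted activity for two software builds, and report the relative change rather than an absolute wattage.

```python
# Relative software power comparison using assumed event cost weights.
EVENT_COST = {                  # arbitrary units, not calibrated to any silicon
    "cache_hit":    1,
    "dram_access": 30,
    "radio_on":  5000,
    "cpu_cycle":    1,
}

def weighted_cost(event_counts):
    """Total weighted activity for one software build."""
    return sum(EVENT_COST[event] * count for event, count in event_counts.items())

# Hypothetical event counts gathered from simulating two builds of the same workload.
build_a = {"cache_hit": 9_000_000, "dram_access": 40_000, "radio_on": 12, "cpu_cycle": 20_000_000}
build_b = {"cache_hit": 9_400_000, "dram_access": 22_000, "radio_on": 15, "cpu_cycle": 21_000_000}

a, b = weighted_cost(build_a), weighted_cost(build_b)
print(f"relative change: {100 * (b - a) / a:+.1f}% (build B vs. build A)")
```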
There are a few common requirements for which there seems to be industry agreement:
• Good scenarios that represent idle, typical or extreme activity;
• Keeping track of the power deltas as hardware changes so you are aware if a change significantly impacts power, and
• Experienced engineers who know where to look for power saving opportunities.
Future progress is likely to come by utilizing different techniques than have been used for functional verification in the past.
Power scenarios
Finding the needle in the haystack is more likely when you start with good, representative scenarios. “Testing the many combinations of power states and power domains requires navigating a sparse legal state space within a too-large-to-test possible state space,” says Larry Melling, product management director at Cadence. “This is one of the challenges that the Portable Stimulus Working Group was formed to address. The group does so by specifying test intent, relying on automation and randomization to create actual tests within the legal state space constraints, and stressing the design to verify that it will meet power requirements.”
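The core idea of navigating a legal state space can be sketched in a few lines. The power states and transition rules below are invented for illustration, but the pattern is the same: declare what is legal, then let randomization walk only the legal paths to produce stress sequences.

```python
# Toy sketch of constrained-random navigation of a legal power state space.
import random

LEGAL = {                      # state -> states reachable in one step (assumed rules)
    "OFF":       ["RETENTION"],
    "RETENTION": ["OFF", "ON_SLOW"],
    "ON_SLOW":   ["RETENTION", "ON_FAST"],
    "ON_FAST":   ["ON_SLOW"],
}

def random_legal_sequence(start="OFF", steps=12, seed=None):
    """Generate a random sequence of power states that only uses legal transitions."""
    rng = random.Random(seed)
    state, seq = start, [start]
    for _ in range(steps):
        state = rng.choice(LEGAL[state])
        seq.append(state)
    return seq

print(" -> ".join(random_legal_sequence(seed=7)))
```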
While Portable Stimulus helps with the creation of the vectors, it cannot find the worst-case scenarios without a power model to help drive it.
The migration to Portable Stimulus will take time. “It used to be that people didn’t even look at power waveforms, they just wanted a magic number,” points out Gupta. “The paradigm shift was that people recognized that activity on the design has a strong role to play in power consumption. People still ask for vectorless power consumption and to be within 5%. How can you do that across a range of complex scenarios? The industry has a lot of work to do here, coming up with the right activity scenarios based on design applications.”
But there are true benefits to be had. “There’s always a temptation to design conservatively, which in some cases can mean over-specifying at every point,” says UltraSoC’s Baines. “For example, to ensure that there will be sufficient compute power, or enough memory, to accomplish a particular task, people tend to add too much. Availability of real-world analytical information helps in two ways. First, it can help the designer to confidently implement a ‘leaner’ overall system, which generally will be inherently more power efficient. Second, it becomes possible to look at metrics like CPU/GPU utilization, and to identify scenarios in which it is possible to retain satisfactory system performance while operating one or more blocks in a low-power mode.”
Power regression
Power bugs can creep in at every stage of the design process. ANSYS’ Gupta describes the process adopted by AMD, in which the power profiles for blocks in a system were tracked. “A typical SoC may have 30 blocks, and for each of those blocks verification engineers write meaningful vectors, each of which represents a differing load level. They then run power analysis on each set of vectors and produce metrics that are actionable.”
Fig. 1: Power consumption plotted during design regression. Source: AMD
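A per-block regression check of this kind can be reduced to a simple delta comparison. The block names, baseline numbers and 5% threshold below are placeholders, but the flow mirrors the one described: measure each block per run, compare against the previous baseline, and flag anything that moved significantly.

```python
# Sketch of a per-block power regression gate with placeholder measurements.
BASELINE = {"cpu_cluster": 1.20, "gpu": 2.10, "ddr_ctrl": 0.45, "noc": 0.18}   # watts, prior run
CURRENT  = {"cpu_cluster": 1.22, "gpu": 2.41, "ddr_ctrl": 0.44, "noc": 0.18}   # watts, this run

THRESHOLD = 0.05   # flag any block whose power moved more than 5% in either direction

for block, new in CURRENT.items():
    old = BASELINE[block]
    delta = (new - old) / old
    if abs(delta) > THRESHOLD:
        print(f"POWER REGRESSION  {block}: {old:.2f} W -> {new:.2f} W ({delta:+.1%})")
```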
Regression also must be done over the lifetime of the product. “Building a device and the initially shipped software is only the first stage,” asserts McDermott. “There is constant updating and refinement, and so you have to plan the regression testing, version control of patches and updates. As the program runs over multiple years, the hardware will evolve, as well, and there will be a spectrum of slightly evolving hardware, different versions of software on different matchups. There also could be some local adaptation, possibly through AI, that will make each one slightly different in terms of characteristics, and then you have to manage the software patch.”
Rather than setting up a lab with each hardware variant, McDermott believes there is a role here for software models, often called digital twins. “Once you have the chip and board, some people think they don’t need simulation anymore. But then you need to keep all of the flavors of hardware and software combinations. You have to maintain and configure all of that hardware. It is much easier to do that in software, and you can build an array of combinations and look at the critical cases that cause problems.”
No substitute for experience
Once you have scenarios that define important use cases of the device, it becomes easier to analyze the system to locate areas in which power could be saved. “Activity profiles are detected at run-time and can be compared to stored signatures to estimate power consumption,” says Srikanth Rengarajan, vice president of products and business development for Austemper Design Systems. “Units of activity can then be traded by a control algorithm between the two blocks such that idle cycles are minimized. An energy-constrained device, such as a smartphone, typically takes a system view of power consumption. Use cases are rigorously analyzed offline to provide consumption estimates, and run-time power optimization is done based on fingerprinting the current use case.”
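The fingerprinting step Rengarajan mentions can be sketched as a nearest-signature lookup. The activity vectors and power values below are invented; at run time, the live profile is matched against the stored signatures and the closest one supplies the power estimate.

```python
# Sketch of matching a run-time activity profile to stored use-case signatures.
import math

SIGNATURES = {                          # use case -> (activity vector, estimated power in W)
    "video_playback": ([0.20, 0.90, 0.10, 0.60], 1.8),
    "web_browsing":   ([0.50, 0.30, 0.20, 0.40], 1.1),
    "idle_screen_on": ([0.05, 0.10, 0.00, 0.20], 0.4),
}

def estimate_power(profile):
    """Return the name and power estimate of the closest stored signature."""
    def distance(vec):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile, vec)))
    name, (vec, power) = min(SIGNATURES.items(), key=lambda kv: distance(kv[1][0]))
    return name, power

use_case, watts = estimate_power([0.45, 0.35, 0.15, 0.45])   # live counters (hypothetical)
print(f"closest signature: {use_case}, estimated power {watts} W")
```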
Until a tool can directly point to a power bug, there will be a role for engineers. “We provide the ability to collect statistics about what is happening, and you can use those to close the loop in terms of power control,” says Benoit de Lescure, director of application engineering for ArterisIP. “You can watch those statistics and decide that a block is not working hard, so you can reduce its voltage.”
There are some simple things that companies can do. “For early analysis, people were not familiar with ways to find power bugs,” says ANSYS’ Gupta. “We added a simple graphical display that broke the design into small chunks and put a color on them – red, yellow or green, based on power consumption. That was enough for some companies to find power bugs. A designer could look at that and see things that they did not expect. The human brain can use intuition to find problems.”
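That kind of coarse visual triage is easy to approximate. The block powers and thresholds below are placeholder values; each chunk of the design is simply bucketed into red, yellow or green so unexpected hot spots stand out.

```python
# Sketch of red/yellow/green bucketing of blocks by power, with placeholder data.
def color(power_mw, yellow=50.0, red=150.0):
    """Bucket a block's power against fixed thresholds (mW, assumed)."""
    return "red" if power_mw >= red else "yellow" if power_mw >= yellow else "green"

blocks = {"decoder": 210.0, "dma": 35.0, "audio": 64.0, "always_on": 3.0}   # mW, illustrative
for name, mw in sorted(blocks.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {mw:7.1f} mW  {color(mw)}")
```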
But when problems are identified, few tools exist to help with the fix. “The debug challenge is different than typical functional failures,” says Cadence’s Melling. “Failure to meet the power budget requires a broader context view of the power state across the design in order to identify domains where power can be reduced or eliminated without loss of functionality. Implementing the ‘fix’ is also different. Changing even the legal state space definition to correct the problem, or sacrificing performance to meet the power budget, are two examples of ‘fixes’ that might be implemented.”
Rethinking the future
Companies are still looking at ways to help with the problem of power verification. “There are many opportunities to create power tools using real world scenarios on entire SoCs and watching the block to block interaction,” says Gupta. “While metrics are evolving, we still need more work and help in creating the vector sets.”
Bigger changes are also required. “The world of point tool power analysis isn’t helpful anymore,” says Knoth. “Power analysis needs to be integrated into the verification flow, into the implementation flow, into your signoff flow. You cannot count on the human to go back and make the necessary changes. You want help and automation. Can power analysis guide synthesis? Can power analysis guide implementation? Can it be done accurately across 20 or 30 different operating modes that the device has? It has to be done in a reasonable time.”
Power verification is an area in which machine learning may have application. Designs tend to be variants of previous designs, so looking for patterns, and for changes to those patterns created by evolving hardware, may find power savings that otherwise might be missed. Will we ever get to true power verification? At this stage it appears unlikely, unless a way is discovered to predict dynamic power much faster and more accurately than is possible today.
Related Stories
Lots Of Little Knobs For Power
A growing list of small changes will be required to keep power, heat, and noise under control at 10/7nm and beyond.
Power Modeling And Analysis
Experts at the Table, part 3: Juggling accuracy and fidelity while making the problem solvable with finite compute resources and exciting developments for the future.
Verification Unification
Experts at the Table, part 3: Power, safety and security—and how Portable Stimulus and formal can help with all of these.