Finding the right abstraction for power analysis and optimization depends on tool integration.
Power-aware design is still a relatively new concern for many semiconductor products, and since its inception the focus has shifted several times and in different ways.
Initially, people were concerned about peak power. Today, they care about peak power, total energy, thermal and other effects. The industry has tried several abstractions, ranging from system-level analysis, which promised gains large enough to swamp implementation-level optimization, to gate-level analysis, where physical details provide accuracy. Today, the industry has centered on RTL power models and tools.
“Power has meant different things to different people over the years,” says Rob Knoth, product management director for Cadence Design Systems. “Part of that is a reflection of where the EDA tools are and what the tools can actually deliver. Another component is what people are being pushed to design. That has shifted from the frequency wars to power discussions, to looking now at energy rather than power.”
New implementation technologies are adding reasons why power has to be managed more than in the past. “Managing power consumption and power noise are huge concerns for designs at smaller nodes because they can directly cause design failures,” says Preeti Gupta, director of RTL product management at ANSYS. “Higher device capacitance, interconnect resistance, and current densities at 7nm/5nm underscore the importance of dynamic power and thermal management. Design and manufacturing costs for lower geometry nodes are mounting, and in order to manage cost and resources, design teams need to ensure first-silicon success. These considerations are mandating a shift in design methodology toward earlier analysis, which can influence better downstream decisions and catch design issues in a timely manner.”
Architectural power
Early analysis can lead to the largest gains. “Certainly, higher abstraction provides more analysis capabilities and is much faster,” says Mohammed Fahad, product specialist at Mentor, a Siemens Business. “Estimating power at a higher abstraction level becomes even more attractive to the user since it requires less preparation from a data readiness point of view.”
Early decisions determine the power and energy profile of a product. “Key choices such as how the design is partitioned into hardware versus software, the design architecture, and how the software controls the hardware have a high impact,” says Gupta. “Once the design architecture is locked in, logic synthesis and place-and-route can minimize the power required for that architecture.”
Knoth agrees, but with a caveat. “At the big architectural level, decisions such as the number of cores on the chip, or how you will manage distributed processing, can tolerate a considerable loss of accuracy. Certain architectural decisions can withstand gross inaccuracies, but the most productive architectural decisions still require the best accuracy to be available as early as possible. But with power, even more than with performance, you get a death by a thousand cuts that can really change what you thought were valid assumptions compared to the reality when it hits you.”
To understand what this means, consider an artificial intelligence (AI) application that contains a large number of MAC functions that account for a considerable portion of total power. “A large percentage of the power is actually glitch power,” says Kam Kittrell, product management group director for Cadence. “This is where a circuit switches multiple times within one clock cycle because of race conditions between different data paths or within a single data path. That requires knowing the exact timing, and you need layout to get that accuracy. You could estimate it based on levels of logic and arrival times coming from static timing analysis. So you can get fairly good estimates at the RT level. I can also see a SystemC model with an added power model that feeds back its consumption as operations are happening.”
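To make the arithmetic concrete, the sketch below folds a glitch multiplier into the textbook dynamic-power relation P = α·C·V²·f. The function, the per-level glitch rate, and every coefficient are assumptions invented for illustration, not any tool's actual model; they only show why logic depth and uneven arrival times inflate the estimate.

```python
# Minimal sketch: first-order dynamic power estimate with a glitch factor.
# All names and coefficients are illustrative assumptions, not a vendor
# model. Dynamic power follows P = alpha * C * V^2 * f, and spurious
# transitions caused by uneven path arrival times are folded in as a
# multiplier that grows with logic depth.

def dynamic_power_watts(alpha, cap_farads, vdd, freq_hz, logic_depth=1,
                        glitch_per_level=0.15):
    """Estimate dynamic power for one node.

    alpha            -- functional switching activity (toggles per cycle)
    cap_farads       -- switched capacitance
    vdd              -- supply voltage
    freq_hz          -- clock frequency
    logic_depth      -- levels of logic feeding the node (from STA)
    glitch_per_level -- assumed extra spurious toggles per logic level
    """
    glitch_factor = 1.0 + glitch_per_level * max(logic_depth - 1, 0)
    return alpha * glitch_factor * cap_farads * vdd ** 2 * freq_hz

# Example: a MAC output net with 40% activity and 8 levels of logic.
p = dynamic_power_watts(alpha=0.4, cap_farads=5e-15, vdd=0.8,
                        freq_hz=1e9, logic_depth=8)
print(f"{p * 1e6:.2f} uW")
```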
Transaction-level approaches are far from settled, however. “While early transaction-level power analysis can enable significant architectural power exploration benefits, that methodology is still evolving for mainstream use, and accuracy is a challenge.”
Considering stimulus
Early power analysis tools used statistical methods, but that proved to be too inaccurate. Realistic stimulus is required, but getting the right stimulus can be equally challenging.
“You can take any circuit and, depending upon the stimulus, you can make it look like an ice cube or a frying pan,” says Knoth. “There is a relationship between getting the right stimulus, the right activity information, and then being able to roll that up to the architectural level to make sure that the big decisions are actually right. There is a functional relationship that is not there with performance.”
Fig 1. Power issues considered in semiconductor design. Source: Cadence
How do you come up with a scenario that will create the worst power conditions? “First, you have to define what the worst type of power is,” says Kittrell. “Is it max power or di/dt? Is it max average power, or is it just average power? For optimization they often look at max average power, and for killing the chip it could be that power is ramping too fast, so they are looking for the di/dt cases. But where do you find these is the question. The key criterion is getting the right stimulus for the design in order to do the optimization. How do I come up with realistic and probabilistic scenarios that could happen on the circuit when I am running an application?”
Finding the worst case can be difficult. “To get to the worst, you have to examine your stimuli using high-capacity emulation with good accurate power analysis and a frame-based architecture that enables you to search and find the peak slice quickly,” says Knoth. “Once you have the pathological vectors, we use intelligent analysis to say where you should focus your time to improve power most effectively.”
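As a minimal sketch of that frame-based search, assuming a per-cycle power trace already has been produced by emulation or RTL power analysis, the snippet below slides a window to find the worst max-average frame and scans cycle-to-cycle deltas as a crude di/dt proxy. The trace values and frame size are invented.

```python
# Minimal sketch of a frame-based search over a per-cycle power trace.
# The trace, frame size, and numbers are illustrative assumptions; real
# flows get the trace from emulation or RTL power analysis.

def worst_frames(power_trace, frame_cycles):
    """Return (peak-average frame start, worst ramp cycle) for a trace."""
    n = len(power_trace) - frame_cycles + 1
    window = sum(power_trace[:frame_cycles])
    best_avg, best_i = window, 0
    for i in range(1, n):                      # slide the frame window
        window += power_trace[i + frame_cycles - 1] - power_trace[i - 1]
        if window > best_avg:
            best_avg, best_i = window, i
    # Steepest cycle-to-cycle rise approximates the di/dt hotspot.
    ramp_i = max(range(1, len(power_trace)),
                 key=lambda i: power_trace[i] - power_trace[i - 1])
    return best_i, ramp_i

trace = [0.8, 0.9, 2.4, 2.6, 2.5, 1.0, 0.7, 3.1, 0.9]  # watts per cycle
avg_i, ramp_i = worst_frames(trace, frame_cycles=3)
print(f"peak-average frame starts at cycle {avg_i}, worst ramp at cycle {ramp_i}")
```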
Kittrell adds a warning note. “How do you know you have the best power scenario? You really never do. Portable Stimulus (PSS) is excellent for SoC-type devices, especially multi-core devices with caches, because there are so many potential scenarios. How many cores are working at any one time? With PSS it is possible to do sweeps—power this on and then off, have eight cores running while I do a cache access, etc. There are lots of interactions between power and function, some of which were unexpected.”
When thermal is being considered, much longer simulations may be required. “Traditional methodologies of identifying appropriate activity modes focus on short-duration windows for power analysis, and they run the risk of missing power-critical events that may occur when the chip is exposed to its real activity,” says Gupta. “Having early visibility into power and thermal profiles of real-life applications, such as operating system (OS) boot-up or high-definition video frames, can avoid costly power-related surprises late in the design process, but such simulations can take weeks to run. High-performance RTL power engines and emulator-power flows have now evolved that can generate an accurate per-cycle power profile for very long vectors, several orders of magnitude faster than traditional methods. This makes it possible to compute power of a high-definition video frame comprising tens of milliseconds of activity within hours, as well as to analyze power profiles for operating system boot-up comprising hundreds of milliseconds of data within a day.”
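To see why long windows matter for thermal, here is a minimal sketch using a single-node lumped thermal RC model, an illustrative assumption only. Production thermal sign-off uses full 3D solvers, and the R_th, C_th, and power-profile numbers below are invented.

```python
# Minimal sketch: a single-node lumped thermal RC model driven by a power
# profile. R_th, C_th, and the profile are illustrative assumptions. The
# point: temperature integrates power over long windows, so a short vector
# can miss the thermal peak entirely.

def junction_temps(power_w, dt_s, r_th=2.0, c_th=0.05, t_amb=25.0):
    """Integrate dT/dt = (P - (T - T_amb)/R_th) / C_th with forward Euler."""
    t = t_amb
    temps = []
    for p in power_w:
        t += dt_s * (p - (t - t_amb) / r_th) / c_th
        temps.append(t)
    return temps

# A 10 W burst only heats the die noticeably if it lasts long enough.
profile = [2.0] * 100 + [10.0] * 100 + [2.0] * 100   # watts, 1 ms steps
temps = junction_temps(profile, dt_s=1e-3)
print(f"peak junction temperature: {max(temps):.1f} C")
```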
New packaging technologies are adding to these challenges. “The power aspects of 2.5D and 3D design are huge,” says Knoth. “You are putting these circuits into a thermal situation that they are not used to. There are new aggressors, from a thermal standpoint, that may not have been modeled in the past. Being able to effectively look at this from a systems perspective all the way down to the underlying silicon components that are causing, or being hurt by, a thermal situation is critical. This is the front lines of power.”
But some of these problems may not actually require detailed analysis tools. “We developed a technology around 3D stacking that does thermal analysis,” says John Ferguson, marketing director for Calibre DRC at Mentor. “We got a lot of people interested, and then nothing happened. The reason is that they use it, learn from it, and develop best practices for what kinds of configurations would cause problems. Then they develop design methodologies to avoid these issues, and they no longer have to run the analysis. The design rules will have certain settings or requirements that will start to tell you that you can’t have these dies too close together, or you do not want to stack a CPU on top of a GPU. You need things that are not switching so hot and fast next to each other. These become part of a design rule manual.”
There may still be reasons to run that analysis, however. “Focusing on power-critical activity areas can improve productivity and coverage of transient power delivery network analysis and mitigate risks of design failure,” says Gupta. “RTL chip current profiles, based on real application activity, are enablers for early co-design of the chip, the package and the board. At the system level, power consumption can have a direct impact on the thermal performance. Understanding power profile throughout the duration of real application scenarios can determine areas of the design that are consuming the most power, and in turn causing thermal issues.”
RTL provides a firm footing
RTL power analysis provides a balance between accuracy and performance. It is also early enough in the design flow that serious architectural problems can be identified and corrected. “The tolerance limits are more relaxed at the RT level compared to the gate level,” says Fahad. “It is a tradeoff between accuracy and time. It is a lot faster to ‘estimate’ RTL power compared to gates, plus it is happening at the very early stage of the design, where there is little or no physical information available. The designer is not really looking for an accurate measurement of the power dissipation, but is only concerned about a ball-park number or a rough distribution of power across the design. In other words, trying to see the trend of power and locating power hotspots, so that the power-hungry parts of the design can be considered for power reduction.”
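A minimal sketch of that hotspot-location step, assuming per-instance estimates already have been exported from an RTL power tool: roll the numbers up the design hierarchy and rank the biggest consumers. The instance names and milliwatt values are invented for illustration.

```python
# Minimal sketch: roll per-instance RTL power estimates up the hierarchy
# and rank hotspots. Names and numbers are invented; the estimates would
# come from an RTL power analysis tool.
from collections import defaultdict

estimates_mw = {
    "soc/cpu/alu": 12.4, "soc/cpu/regfile": 6.1, "soc/cpu/lsu": 4.8,
    "soc/gpu/shader0": 22.7, "soc/gpu/shader1": 21.9, "soc/noc": 3.2,
}

totals = defaultdict(float)
for path, mw in estimates_mw.items():
    parts = path.split("/")
    for depth in range(1, len(parts) + 1):    # credit every ancestor block
        totals["/".join(parts[:depth])] += mw

for block, mw in sorted(totals.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{block:20s} {mw:6.1f} mW")
```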
Accuracy has improved, though. “Traditionally, the concept of RTL power analysis always sounded good, but the accuracy was not there,” says Knoth. “Now, we see that with a tighter integration between the RTL power analysis tools and the implementation and sign-off technologies, we are able to deliver very good accuracy at the RT level.”
The time savings can be considerable. “Compared to the several hours it takes to synthesize the design and to compute gate-level power, RTL power analysis can be completed within minutes,” says Gupta. “It is also much easier to simulate design activity at RTL for high coverage.”
Emulation also can extend the reach. “Emulation basically counts toggles happening within the design as you are running software,” explains Kittrell. “If you are looking for peak power or max average power, then you can run lots of software scenarios and look for the frames where there is a lot of activity. Then, once you have the scenario, you can clip it out and see what it looks like at the gate level. That means you do a quick synthesis, and we can get the .lib power information and the known parasitics. We can load what was in the registers and do a fast cycle-based simulation in order to get the most accurate toggles, not just probabilistic numbers but accurate toggle counts through the datapath. If you are just using estimates of probabilistic toggles, you can be off by 10% or 15%. It is a matter of capturing what is an interesting scenario, and then focusing it down and not trying to do the power analysis of the entire software runtime at the gate level.”
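The toggle-to-power arithmetic Kittrell describes reduces to charging and discharging capacitance, with each toggle dissipating roughly ½·C·Vdd². The sketch below shows that accounting with invented net names, counts, and capacitances; a real flow would read the capacitance from .lib data and extracted parasitics.

```python
# Minimal sketch: turn per-net toggle counts (e.g., from an emulation run)
# into a dynamic power number. Net names, capacitances, counts, and the
# supply/runtime values are illustrative assumptions.

VDD = 0.8            # volts, assumed
RUNTIME_S = 1e-3     # duration of the captured frame, assumed

toggles = {"u_mac/sum[0]": 420_000, "u_mac/sum[1]": 390_000,
           "u_ctrl/state[2]": 12_000}
cap_f = {"u_mac/sum[0]": 4e-15, "u_mac/sum[1]": 4e-15,
         "u_ctrl/state[2]": 2e-15}

# Each toggle charges or discharges the net: E = 0.5 * C * Vdd^2.
energy_j = sum(0.5 * cap_f[n] * VDD ** 2 * toggles[n] for n in toggles)
print(f"dynamic power over frame: {energy_j / RUNTIME_S * 1e6:.2f} uW")
```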
Meaningful analysis can be performed that directly relates to power saving. “Eliminating redundant switching is a key component of managing dynamic power,” says Gupta. “RTL power efficiency metrics can identify wasted toggles in the design. RTL-based power reduction techniques also enable activity-aware complex analyses for automated identification of changes for clocks, sequential logic, and glitch-prone datapath logic to address wasted activity. Working at a higher design abstraction level, RTL provides the capacity to analyze large designs so power that is wasted from the interactions between blocks can be isolated. This is not possible with implementation tools.”
Knoth agrees. “We do some deep analysis within the circuit and look at the functional activity to understand where there is wasted power. We define that as power that does not produce a useful outcome. Perhaps the reset is enabled, but the clock is running. In analysis, we can subtract that power from the actual power to show what the theoretical ideal power would be. That kind of analysis helps designers focus their efforts to get the largest ROI.”
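A minimal sketch of that wasted-power bookkeeping, under the assumption that a cycle is wasted whenever a register's clock toggles while reset is asserted or the enable is low. The trace encoding and per-toggle energy are illustrative, not how any particular tool represents activity.

```python
# Minimal sketch of the wasted-power idea: any cycle where a register's
# clock toggles but no useful result can be produced (reset held, enable
# low) is counted as waste. Trace format and energy value are assumed.

E_CLK_TOGGLE_J = 2e-15   # assumed energy per clock toggle at one register

def wasted_energy(cycles):
    """cycles: iterable of (clock_toggled, reset_n, enable) tuples."""
    waste = sum(1 for clk, rst_n, en in cycles
                if clk and (not rst_n or not en))
    return waste * E_CLK_TOGGLE_J

trace = [(True, 0, 1), (True, 1, 0), (True, 1, 1), (True, 1, 0)]
print(f"wasted energy: {wasted_energy(trace) * 1e15:.1f} fJ")
# Subtracting this from the measured power gives the theoretical ideal,
# pointing clock-gating efforts at the biggest offenders.
```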
Tool integrations
Much of the improvement in power analysis has come through integration of different tools. “In the past, power analysis was more bifurcated because each team was using different power tools,” says Knoth. “The block implementation team may have been running gate-level tools. The power architect was running some higher-level analysis. Today, they are all running one tool that can speak the same language and can eat all of the right inputs and do the right analysis with respect to stimulus. This is allowing the teams to change their methodologies and to inject power earlier into their flows.”
That continuity also can lead to other tool possibilities. “Monitoring power data throughout the design process ensures that downstream design changes do not inadvertently affect its power performance,” says Gupta. “Power regression provides feedback on the effectiveness of various power reduction efforts and tracks power efficiencies across multiple operating modes. Being able to query the database and compare results as the RTL changes avoids unfortunate surprises caused by subsequent design changes. RTL power regressions provide timely feedback to the designer to fix power bugs versus later in the design flow when it becomes harder to relate power bugs to functionality.”
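As a sketch of what such a power regression check might look like, the snippet below compares per-block numbers across two RTL revisions and flags growth beyond a tolerance. The block names, values, and the 5% threshold are all assumptions for illustration.

```python
# Minimal sketch of an RTL power regression check: compare per-block power
# between two revisions and flag anything that grew beyond a tolerance.
# Block names, numbers, and the threshold are illustrative assumptions.

def power_regressions(baseline_mw, current_mw, tol=0.05):
    flagged = []
    for block, base in baseline_mw.items():
        cur = current_mw.get(block, base)
        if base > 0 and (cur - base) / base > tol:
            flagged.append((block, base, cur))
    return flagged

baseline = {"cpu": 23.3, "gpu": 44.6, "noc": 3.2}
current = {"cpu": 23.5, "gpu": 49.8, "noc": 3.1}

for block, base, cur in power_regressions(baseline, current):
    print(f"POWER REGRESSION {block}: {base:.1f} mW -> {cur:.1f} mW")
```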
Fig 2. RTL power exploration across different bandwidth scenarios. Source: ANSYS
This is a significant improvement. “Now, all three legs of the stool can benefit from power,” says Knoth. “The power architect, the RTL designer, and the block implementation team are all working together. That is where we are today. RTL power is incredibly useful, and the tools are there. People are actually leveraging it and making their designs better.”