Accelerating Development For Low Power

Are low power methodologies progressing fast enough? The industry is split and struggling for answers.


Power is a limiting factor in all devices these days, and while most of the industry has seen this coming for several process nodes and a succession of mobile devices with limited battery life, the power problem remains a work in progress.

No matter how much progress is made—and there has been plenty of work done in the areas of multiple power domains, dark silicon, dynamic voltage and frequency scaling, better materials and process improvements—it will never be enough. There are always more features to add, more leakage issues to contend with, dynamic as well as static, and more energy that needs to be saved for everything from lower power bills to better gas mileage in cars.

“Certain problems are very unsolved, such as low power,” says Randy Smith, vice president of marketing at Sonics. “We are just not progressing fast enough and we will not get what we need through process changes. Studies have shown that most of the power savings come from the architecture and yet we don’t have a standardized way to work on it.”

Such a statement has many parts and, as expected, people agree with certain parts but not others. Few dispute that development has not been fast enough, and it is natural to always want more. What is clear is that relying on back-end improvements is no longer enough, and the approaches used for power optimization have to broaden. “Low power is not a solved problem but it is solvable if approached properly,” says Prasad Subramaniam, vice president of design technology at eSilicon. “Implementation of power management in an IC requires a multifaceted solution spanning technology, IP, design techniques and methodology.”

Continued developments at the back end
A lot of work has been done on the process side to reduce power. Some of it was a by-product of shrinking dimensions, but a lot of it was an intentional concentration on power reduction. “Process technology for low power has come a long way,” points out Vinod Viswanath, director of research and development for Real Intent. “In the early days, a low power process typically meant a 20% hit in performance. This is no longer acceptable, and process technology has improved to offer several standard cell choices with different threshold voltages for various performance, power and density tradeoffs.”
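The multi-threshold tradeoff Viswanath describes can be sketched with the textbook subthreshold-leakage and alpha-power-law delay models. The constants below are illustrative placeholders, not foundry data:

```python
import math

def leakage_current(vt, i0=1e-6, n=1.5, vt_thermal=0.026):
    """Subthreshold leakage ~ I0 * exp(-Vt / (n * kT/q)).
    All constants are illustrative, not from any real library."""
    return i0 * math.exp(-vt / (n * vt_thermal))

def delay(vdd, vt, k=1.0, alpha=1.3):
    """Alpha-power-law gate delay: grows as Vt approaches Vdd."""
    return k * vdd / (vdd - vt) ** alpha

# Comparing low-Vt (fast, leaky) and high-Vt (slow, low-leakage) cells
for vt in (0.25, 0.35, 0.45):
    print(f"Vt={vt:.2f}V  leakage={leakage_current(vt):.2e}A  "
          f"delay={delay(1.0, vt):.2f} (a.u.)")
```

Raising Vt cuts leakage exponentially but costs delay, which is why libraries now offer several threshold-voltage flavors of each cell rather than a single low-power variant.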

“I would agree that CMOS planar technologies in terms of shrinkage have hit their limit with leakage power,” admits Mary Ann White, director of product marketing for the Galaxy Design Platform of Synopsys, “but even then the ULP versions of the more established process nodes, such as for 55nm and 40nm, show that it’s still possible to introduce some savings.”

How much can we rely on process technologies? “It is not something that will be solved by just making an assumption that the process will solve it,” says Paul Trayner, architect for the RTL power product at Ansys. “FinFETs do reduce static power and so only dynamic power is left, but there are many more problems to be solved related to power reduction.”

Even with advances in the fabrication technology, the tools may still not be providing accurate data. “The power and leakage predicted by FastSPICE may not match with silicon data once the chip is back from the foundry,” claims Bruce McGaughy, CTO and senior vice president of engineering for ProPlus Design Solutions. “In some cases, the estimation is off by more than 40%. Furthermore, traceability of currents is not possible in some cases because Kirchhoff’s current law (KCL) is usually broken by the FastSPICE algorithms.”

Additional improvements will be made in the process technologies, but they still provide only a small percentage of the possible gains. “Despite all these advances, process technology will not deliver all the gains needed in an optimal low power device,” says Viswanath.

Figure 1: Where the biggest savings come from.

RTL Optimizations
A lot of time and effort is spent on power optimization at the register transfer level (RTL) today, although it lacks rigor. “With necessity comes innovation, which generally means that companies are willing to adapt their design methodology with advanced power technologies to achieve more savings,” says Synopsys’ White. “They tend to find the right ‘recipe’ for achieving those savings.” In Synopsys’ 2014 Global User Survey, respondents tended to simplify and use fewer power domains, while usage of just one to three different voltage domains grew by 1.5X. However, the number of power gate domains also increased by 1.5X, with the median number of shutdown regions now more than 5.

Ansys’ Trayner also sees an ad-hoc approach being used. “There is no standard in the way that there was for RTL design, but there are techniques that are standardized such as clock gating. This is a standard approach to reducing dynamic power and while there is no standard attached to it, everyone knows how to do it.”
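Clock gating attacks the activity factor in the dynamic power relation P ≈ α·C·V²·f: when a block’s enable is low, the clock edge never reaches its registers, so the clock-tree and register capacitance is not switched. A minimal behavioral sketch (illustrative Python, not RTL; the duty cycle is invented):

```python
def simulate(cycles, enable_pattern, gated):
    """Count clock edges seen by a register bank.
    With gating, the clock only toggles when the enable is high."""
    clock_toggles = 0
    for cycle in range(cycles):
        enabled = enable_pattern(cycle)
        if enabled or not gated:
            clock_toggles += 1  # a clock edge reaches the registers
    return clock_toggles

# Hypothetical block that is busy only 1 cycle in 10
busy = lambda c: c % 10 == 0
ungated = simulate(1000, busy, gated=False)  # clock toggles every cycle
gated = simulate(1000, busy, gated=True)     # clock toggles only when busy
print(f"clock activity reduced by {100 * (1 - gated / ungated):.0f}%")
```

For this invented 10% duty cycle the gated clock toggles 90% less, which is the kind of dynamic-power saving the standardized technique delivers during idle periods.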

Figure 2: Latency associated with various power saving methods.

Power analysis that helps a designer reduce power has to be consistent in its accuracy, so that if the designer decides to implement clock gating, the impact of adding the logic is known. “You cannot implement clock gating randomly because you need to know if adding that extra logic will actually increase your power rather than reduce it,” points out Trayner. “Accuracy and consistency are equally important.”

Trayner notes that the initial approach to power reduction is based upon idle vectors. “You start by designing a block using functional vectors. The problem here is that they do not exercise the device for peak power or idle power. You will not get significant power reduction by trying to optimize peak power. You really need to find out the power during idle periods.”

Trayner also points out the dangers associated with unintelligent application of clock gating. “If you implement all possible clock gating, you end up with a complete mess at the physical level when trying to route the clocks.” He explains that you might find 1,000 or 2,000 opportunities to do clock gating, but after power analysis you might find that 10% or 20% of those opportunities give you 80% or 90% of the power savings and the rest produce little power savings but negatively impact clock tree synthesis. “There are tradeoffs and this makes it important that you have consistent power analysis.”
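The tradeoff Trayner describes is essentially a Pareto cut: rank the candidate gates by estimated savings and keep only the head of the distribution, leaving the long tail to the clock tree. A hypothetical selection sketch (the per-gate savings numbers are made up):

```python
def select_gates(savings, coverage=0.8):
    """Keep the fewest clock-gating candidates whose combined
    estimated savings reach `coverage` of the total available."""
    ranked = sorted(savings, reverse=True)
    target = coverage * sum(ranked)
    kept, running = [], 0.0
    for s in ranked:
        if running >= target:
            break  # the tail adds little power saving but hurts CTS
        kept.append(s)
        running += s
    return kept

# Invented per-gate savings (mW): a few big wins plus a long tail
candidates = [40.0, 25.0, 15.0, 8.0] + [0.2] * 100
kept = select_gates(candidates)
print(f"kept {len(kept)} of {len(candidates)} gating opportunities")
```

Here 4 of 104 candidates deliver 80% of the estimated savings, mirroring Trayner’s 10%-to-20% of opportunities yielding 80% to 90% of the benefit; consistent power analysis is what makes the ranking trustworthy.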

“Once you have RTL, we are doing well and we are squeezing power at every step of the way,” says Krishna Balachandran, product management director at Cadence. “The industry has been doing that and the flows have become very mature.”

Above RTL
As soon as the discussion gets above RTL, there are many attempts to save power, but few tools to help with analysis. Javier DeLaCruz, senior director for product strategy at eSilicon, lists several techniques his company is using, which cover a wide range. “In the architectural area, we have found success using static voltage scaling, customized low-power memory and multiple independent power domains, to name a few. The backside biasing offered by FD-SOI technology has also been effective. Leveraging new interfaces and the associated IP, such as Wide I/O, HBM and EHMB will help.”

Micro-architectural optimizations do come with some tool assistance from High Level Synthesis, but even here there is disagreement about what is important and how to provide the necessary tools and analysis. “We have tools that enable people to do clock gating but there are so many architectural decisions that are left on the table today,” says Mark Milligan, vice president of marketing for Calypto. “This is an opportunity.”

Many of the higher-level opportunities involve software, and this is where things start to get murky. “The key reason for this is that an optimal and holistic power management solution cannot be done in hardware or software alone,” says Viswanath. “There is a strong need for a synergistic bottom line that encompasses RTL, system-level, OS-level, compiler-level, and application-level specification of power intent. We need all levels of abstraction of the design to be able to communicate their power intent to get the most optimized solution.”

There are several standards groups working on this problem such as IEEE 1801, IEEE P2415 and IEEE P2416, but there are other dimensions to the problem that need incorporation as well. “You need the composite, board, package, chip all in one uniform view, in order to do a complete analysis and to identify the thermal hotspots,” points out Balachandran.

Trayner agrees, saying that the disciplines associated with developing products have been fragmented. “Not many people overlap or interact with each other, so the person defining the shell of the device has to work with whatever heat is being generated by the electronics. Inevitably it is necessary for all of these disciplines to interact with each other so that heat, in terms of where dynamic power is being consumed, gets communicated down to the physical level, to the guys doing power grid integrity design.” Trayner is ready to provide many more examples of failures in the process today. “It needs to become a process that encompasses the whole product design flow.”

“Running the software enables you to get realistic power profiles earlier,” says Balachandran, pointing out where emulation is playing an important role. “Then you can identify the blocks in the design that may cause problems and you can optimize those or make changes, but you still need to have all of the downstream tools to make that a reality.”

Lauro Rizzatti, a verification consultant, points out some complications with this strategy. “There are two sides to dynamic power estimation—average power consumption and peak power consumption. They are not closely correlated, and in fact they help chip designers make different critical choices. Evaluating average power consumption helps to establish the best die size, select the proper package, choose the battery size and calculate the battery life. Assessing peak power consumption helps to determine chip reliability, size the power rails for peak power loads, measure performance and evaluate cooling options.”
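Both of Rizzatti’s metrics fall out of the same cycle-by-cycle power trace; the point is that they answer different questions and are not closely correlated. A toy illustration (the waveform values are invented):

```python
def power_metrics(trace):
    """Average power sizes the battery and package;
    peak power sizes the rails and cooling."""
    avg = sum(trace) / len(trace)
    peak = max(trace)
    return avg, peak

# Invented per-cycle power samples (mW): mostly idle, one short burst
trace = [5] * 90 + [120] * 10
avg, peak = power_metrics(trace)
print(f"average={avg:.1f} mW, peak={peak} mW")
```

A design with a modest 16.5mW average still has to survive a 120mW peak, which is why the two estimates drive different critical choices.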

Rizzatti also talks about some of the issues related to standards and emulation. “The industry standard is a switching activity interchange format (SAIF) file for measuring and estimating power consumption by tracking switching activities. While it seems practical, it does not keep track of when the time or cycle activities occur—just the total amount of switching. It’s a coarse estimate for a granular problem, good for appraising average power consumption.”

“The other approach is to keep track of when the time or cycle activities occur in the industry-standard Fast Signal Database (FSDB),” continues Rizzatti. “FSDB can track switching activities, determine when peak power consumption occurs and when there is a problem. Unfortunately, the FSDB files are huge and take a massive amount of time to generate. Even worse, it takes up to a week to read the analysis.”
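The SAIF-versus-FSDB difference Rizzatti describes comes down to what is retained: total toggle counts versus a per-cycle record. A simplified sketch (these data structures are illustrative, not the actual file formats):

```python
def saif_style(toggles_per_cycle):
    """SAIF-like summary: total toggles per signal, no timing.
    Enough for average power, but peaks are invisible."""
    return sum(toggles_per_cycle)

def fsdb_style(toggles_per_cycle):
    """FSDB-like record: per-cycle activity is kept,
    so the peak-power cycle can be located."""
    peak_cycle = max(range(len(toggles_per_cycle)),
                     key=toggles_per_cycle.__getitem__)
    return toggles_per_cycle, peak_cycle

activity = [2, 3, 2, 50, 2, 3]  # invented toggle counts per cycle
print("SAIF-style total:", saif_style(activity))       # 62 toggles
print("FSDB-style peak at cycle:", fsdb_style(activity)[1])  # cycle 3
```

The aggregate total of 62 toggles says nothing about the burst in cycle 3; only the time-resolved record can flag it, which is what the much larger FSDB files pay for.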

Software is clearly part of the key to managing power. “But due to the lack of hardware/software cooperation in power management,” says Viswanath, “the platform as a whole cannot anticipate power requirements of the application ahead of time and, instead, has to perform power management reactively.”

Trayner sees that different people will have different ultimate goals. “Although activity is important, the metric is really energy for a given function. Some designers may choose to perform activity as quickly as possible with a high clock rate and then go into an idle state.” He points out that while the average power during that period is high, the overall energy may be less because you can gate during the inactive period.
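Trayner’s point, that energy for a given function matters more than instantaneous power, is the classic race-to-idle tradeoff. A hedged arithmetic sketch (all numbers are invented; which strategy wins in practice depends on leakage and voltage scaling):

```python
def energy(active_power, active_time, idle_power, idle_time):
    """Energy = power x time, summed over active and idle phases."""
    return active_power * active_time + idle_power * idle_time

# Same work, 10 ms window: sprint at a high clock rate then gate,
# versus running slowly for the whole window (invented numbers, mW/ms)
sprint = energy(active_power=200, active_time=2, idle_power=1, idle_time=8)
steady = energy(active_power=60, active_time=10, idle_power=0, idle_time=0)
print(f"sprint-then-gate: {sprint} mW*ms, steady: {steady} mW*ms")
```

With these invented figures the sprint burns 408mW·ms against 600mW·ms for the steady run: average power during the burst is high, but gating the idle period makes total energy lower, exactly the distinction Trayner draws.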

It comes back to the lack of information that the software people have available to them in order to know when to turn hardware off. Viswanath points out that software optimizations have typically been conservative. “Fine-grained dynamic power management along the software stack is where the bulk of the new-age power gains will come from.”

Sonics is one company actively pursuing finer-grained optimization with its recently announced ICE-Grain power architecture. It has a distributed control architecture that can be driven by either hardware or software events. The company believes that by relieving the apps processor of power domain control, dedicated hardware will enable much faster response times, making more optimization possible.

Where are we today? “Back in the early days of test, people would argue about who had the best fault simulator,” says Milligan. “That was not really the issue. It told you the bad news but not what to do about it. When design for test (DFT) tools came along the test problem was solved. Power will be much the same way. Today it is about who has the best RTL power analysis tools, but this is not really the answer. They don’t tell you anything that really helps. We need to get to the point where designers can look at higher-level decisions. Power analysis is necessary and helpful but only part of the issue.”

As the power problem becomes more difficult to solve using the techniques we have today, people will have to start looking at new approaches. Power is not a solved problem and there is lot more to do.