Power Limits Of EDA

Tools help with power reduction, but they can only deliver small, localized savings. To do more would require a new role for the EDA industry.

October 13th, 2016 - By: Brian Bailey

Power has become a major gating factor in semiconductor design. It is now the third factor in design optimization, alongside performance and area, and is becoming almost more important than area.

But there are limits to the amount of help that EDA can provide with power optimization. Power is not just an optimization problem. It is a design problem, and EDA has never been much help with design. The value of EDA comes from the automation and optimization of implementing a design once it has been designed. For power, that could be too late.

“Anyone building big chips will hit thermal walls just due to leakage,” says Drew Wingard, chief technology officer at Sonics. “They have to do more aggressive things than they have done in the past. Smaller chips that are battery-powered have to do it for other reasons because they want ever smaller form factors with lighter batteries to support those form factors. So it is inevitable. Intel ran into the power wall 10 or more years ago, and we all said we would focus on power. But we didn’t really do it. The honeymoon is over.”

The tradeoff today is really between performance and power. “Due to the generated heat, the number of transistors and the frequency cannot both be scaled at the same time,” says Preeti Gupta, director for RTL product management at Ansys. “Frequency has been limited to several gigahertz for a few years now, although multi-core architectures and 3D integration are design trends that have helped increase overall throughput. Power reduction is a key driver for handheld and wall-powered applications. Battery life is key for mobile applications, and energy and cooling costs are of paramount importance for data centers.”

Luke Lang, engineering director for low power products at Cadence, adds that “with shrinking geometries, even at 28nm, you are stuffing a lot of transistors onto a chip. The area is not so much of a concern these days.”

EDA has been attempting to reduce wasted power. But how effective has it been? “Industry-wide, there is a consensus that once you have written the RTL, 80% of the power is locked up,” says Lang. “So no matter what else you do, you are playing with the 20%.”

Is there any chance to expand beyond 20%? “Circuits are doing work that looks useful to them,” says Wingard. “The technology behind sequential clock gating is to look one state deeper into the state space. If the value computed in this cycle is thrown away in the next cycle, then it was not necessary to compute it. But we also need to look at the state space much further away. It is unlikely that EDA tools will ever get to the point where they can do this for the overall functionality of the device.”
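
The one-state lookahead Wingard describes can be sketched in a few lines of Python. This is a minimal illustration, not how any commercial tool works: it assumes a hypothetical two-stage pipeline in which a producer computes a value every cycle and a consumer register captures it one cycle later only when its enable is high. The function name and the trace are invented for the example.

```python
def redundant_producer_cycles(consumer_enable):
    """Return the cycles in which the producer's result is thrown away.

    consumer_enable[t] is truthy when the downstream register captures
    data in cycle t. A value computed in cycle t is only useful if the
    consumer captures it in cycle t + 1. The last cycle of the trace is
    skipped because its successor is unknown.
    """
    return [
        t
        for t in range(len(consumer_enable) - 1)
        if not consumer_enable[t + 1]
    ]


# Hypothetical enable trace for the consumer register.
trace = [1, 0, 1, 1, 0]
wasted = redundant_producer_cycles(trace)
print("producer could be clock gated in cycles:", wasted)
```

Looking only one cycle ahead is what keeps this tractable. Extending the same question across the full state space of a device is the part Wingard doubts tools will ever handle.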

Lang contends that building tools to recognize such opportunities is a very difficult problem. “Designers need to spend more time looking at power efficient architectures and build it into the RTL. You cannot put the burden onto the software to recognize some of these opportunities. What can the designers do to target the 80%?”

Alan Gibbons, power architect at Synopsys, says that “we can define wasted power as either a) power consumed while not doing useful work or b) too much power consumed unnecessarily while doing useful work.”

Where EDA can help
One area where EDA can help, and already does a good job, is within datapaths. “Within datapath components we are likely to see wasted power from both glitch and switching activity used to generate a datapath result that is not required,” says Gibbons. “By optimizing the implementation of datapath components we can create more balanced structures that exhibit fewer glitches and hence reduce wasted power. In addition, by implementing datapath gating we can prevent activation of a datapath component when its output is not required thereby providing significant savings in both dynamic and static power.”
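
The effect of datapath gating can be illustrated with a rough Python model. It is only a sketch under stated assumptions: input bit toggles stand in as a proxy for switching power, the gate simply holds the last useful operand, and the operand values and the `needed` flags are made up for the example.

```python
def toggles(values):
    """Count bit flips between successive values on a bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(values, values[1:]))


def isolate(operands, needed):
    """Hold the previous operand whenever the result is not needed."""
    held, gated = 0, []
    for value, use in zip(operands, needed):
        if use:
            held = value  # let the new operand through
        gated.append(held)  # otherwise the datapath input stays quiet
    return gated


# Hypothetical operand stream; only the first and last results are used.
operands = [0b0000, 0b1111, 0b0000, 0b1010]
needed = [True, False, False, True]

ungated_toggles = toggles(operands)
gated_toggles = toggles(isolate(operands, needed))
print("input toggles without gating:", ungated_toggles)
print("input toggles with gating:", gated_toggles)
```

The model ignores the cost of the gating logic itself and says nothing about glitches, which depend on path delays inside the component.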

Lang agrees that glitches are a problem. “It is difficult to get glitch information back into the synthesis tools to help reduce it. We are talking about significant amounts of power. But even if you can demonstrate that power is consumed by glitches, what can you do about it? The lack of ability to automatically reduce this power means that few people bother to analyze it.”

And it is not just during normal operation that EDA can help reduce power. “Functionally you may only have a 10% to 15% activity rate in a design, but in test you could approach 50%,” explains Lang. “You can analyze the registers in scan shifting and find which cause a lot of activity. When you look at which registers have the greatest cone of logic that could cause a lot of switching, you can gate off that register output so that you keep the logic quiet. You can only do that when it is not on the critical path. Otherwise it would upset timing.”
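
The register ranking Lang describes amounts to a graph traversal. The sketch below assumes a toy netlist expressed as a fan-out dictionary, with invented instance names, and treats the size of the combinational cone driven by a register as a stand-in for the switching it can cause during scan shift. Registers on a critical path are excluded, as the quote requires.

```python
def cone_size(fanout, start, registers):
    """Count combinational nodes reachable from a register output.

    Traversal stops at other registers, since they end the cone.
    """
    seen, stack = set(), list(fanout.get(start, []))
    while stack:
        node = stack.pop()
        if node in seen or node in registers:
            continue
        seen.add(node)
        stack.extend(fanout.get(node, []))
    return len(seen)


def gating_candidates(fanout, registers, critical):
    """Rank non-critical registers by the size of the cone they drive."""
    sizes = {
        reg: cone_size(fanout, reg, registers)
        for reg in registers
        if reg not in critical
    }
    ranked = [(reg, n) for reg, n in sizes.items() if n > 0]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Hypothetical netlist: r* are scan registers, g* are gates.
fanout = {
    "r1": ["g1", "g2"],
    "g1": ["g3"],
    "g2": ["g3"],
    "g3": ["r3"],
    "r2": ["g4"],
    "g4": ["r3"],
}
registers = {"r1", "r2", "r3"}

print(gating_candidates(fanout, registers, critical=set()))
```

A real flow would weight each cone by capacitance and toggle probability rather than counting nodes, and would take the critical-path set from timing analysis.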

Power-reduction techniques
Dual-edged flip flops are another area gaining interest. These can be triggered on both the rising and falling edges, meaning the clock frequency can be reduced by a factor of two. That can save significant amounts of power in the clock tree.
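
The clock tree saving follows from the usual first-order model of dynamic power, P = a * C * V^2 * f, where a is the activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. The Python sketch below applies it with made-up numbers. It assumes capacitance and voltage stay the same when the flops are swapped, which is optimistic, because dual-edge flops tend to be larger and may give back part of the saving.

```python
def dynamic_power(activity, capacitance_f, voltage_v, frequency_hz):
    """First-order dynamic power in watts: a * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical clock tree: 100 pF switched capacitance at 0.9 V.
CLOCK_ACTIVITY = 1.0  # a clock net makes one full transition pair per cycle
single_edge = dynamic_power(CLOCK_ACTIVITY, 100e-12, 0.9, 1.0e9)
dual_edge = dynamic_power(CLOCK_ACTIVITY, 100e-12, 0.9, 0.5e9)

print(f"single-edge clock tree: {single_edge * 1e3:.1f} mW")
print(f"dual-edge clock tree:   {dual_edge * 1e3:.1f} mW")
```

Data throughput is unchanged in this picture because each flop captures on both edges of the slower clock.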

Ansys’ Gupta adds a few steps that can be used to identify power wastage. “Power reduction requires a holistic approach with different techniques that are effective at different levels of abstraction,” she says. “One such analysis technique is to look at the cumulative activity of nets per design hierarchy. A block that is supposed to be active only during data transmission but is also on during data reception is a power ‘bug.’ This may not get exposed in functional simulations but will be exposed during such design activity analysis. Activity analysis runs fast. RTL power tools also provide hooks for designers to create their own rules for defining when and what activity is redundant for different modes of operation.”
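
The per-hierarchy check Gupta describes can be sketched as follows. This is an illustrative Python model, not the interface of any RTL power tool: the event format, the net names and the `expected_idle` rule table are all assumptions made for the example.

```python
from collections import defaultdict


def activity_by_block(toggle_events):
    """Sum toggles per (top-level block, operating mode)."""
    totals = defaultdict(int)
    for mode, net, toggles in toggle_events:
        block = net.split(".")[0]  # first level of the hierarchy
        totals[(block, mode)] += toggles
    return totals


def power_bugs(totals, expected_idle, threshold=0):
    """Flag blocks that toggle in a mode where they should be quiet."""
    return sorted(
        (block, mode)
        for (block, mode), count in totals.items()
        if mode in expected_idle.get(block, ()) and count > threshold
    )


# Hypothetical toggle counts gathered from simulation, per mode.
events = [
    ("tx", "tx_path.enc.d0", 120),
    ("rx", "tx_path.enc.d0", 95),  # transmit logic busy during reception
    ("rx", "rx_path.dec.q", 200),
    ("tx", "rx_path.dec.q", 0),
]
# User-defined rule: which modes each block is expected to sit idle in.
expected_idle = {"tx_path": {"rx"}, "rx_path": {"tx"}}

bugs = power_bugs(activity_by_block(events), expected_idle)
print("power bugs:", bugs)
```

The rule table plays the role of the designer-defined hooks mentioned in the quote. The design is functionally correct in both modes, which is why a functional simulation alone would not flag it.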

There is a class of power reduction techniques that are driven by analysis, but it is the designer that needs to make the changes. Gupta provides one example. “Memories consume significant power. A rather common activity bug is when a memory defaults to the read mode when not writing, even when data is not needed. RTL techniques easily identify such redundant activity cycles.”
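
A check for the memory bug Gupta mentions might look like the sketch below. The trace format is an assumption: each cycle records a chip enable, a write enable and a flag saying whether downstream logic actually consumed the read data. Signal names are hypothetical.

```python
def redundant_reads(trace):
    """Return cycles where the memory reads but nobody uses the data.

    Each trace entry is (chip_enable, write_enable, data_used).
    """
    return [
        cycle
        for cycle, (chip_en, write_en, data_used) in enumerate(trace)
        if chip_en and not write_en and not data_used
    ]


# Hypothetical five-cycle trace of a memory that defaults to read mode.
trace = [
    (1, 1, 0),  # write
    (1, 0, 1),  # useful read
    (1, 0, 0),  # read, data ignored
    (1, 0, 0),  # read, data ignored
    (0, 0, 0),  # memory disabled
]
wasted_cycles = redundant_reads(trace)
print("redundant read cycles:", wasted_cycles)
```

The fix still belongs to the designer, typically by deasserting the chip enable in the flagged cycles.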

Wingard points to another power savings option with memory architectures. “Systems that are constrained by the path to external memory often have processing blocks that are decoupled from those memories using FIFOs. When the processing block is done with the work unit and has committed the results into the FIFO, it can be shut down while the FIFO takes on the task of getting the rest of the data back to the memory. It doesn’t make sense to wake up the processing unit until the input FIFO has accumulated enough data that it can run at full speed.”
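
The wake-up policy can be explored with a small simulation. This is a sketch with invented parameters: data arrives in the input FIFO one item per cycle, the processing block drains up to `drain_rate` items per cycle while awake, it wakes once occupancy reaches `wake_threshold`, and it shuts down as soon as the FIFO is empty.

```python
def simulate(arrivals, wake_threshold, drain_rate):
    """Return (awake_cycles, wakeups) for a threshold-based wake policy."""
    occupancy, awake = 0, False
    awake_cycles, wakeups = 0, 0
    for arrived in arrivals:
        occupancy += arrived
        if not awake and occupancy >= wake_threshold:
            awake = True
            wakeups += 1
        if awake:
            awake_cycles += 1
            occupancy -= min(drain_rate, occupancy)
            if occupancy == 0:
                awake = False  # nothing left to do, shut down again
    return awake_cycles, wakeups


arrivals = [1] * 16  # hypothetical: one item arrives every cycle

eager = simulate(arrivals, wake_threshold=1, drain_rate=4)
batched = simulate(arrivals, wake_threshold=8, drain_rate=4)
print("wake on first item (awake cycles, wakeups):", eager)
print("wake at 8 items    (awake cycles, wakeups):", batched)
```

With the higher threshold the block spends far fewer cycles awake and pays for fewer wake-ups, at the cost of extra latency and data still waiting in the FIFO when the trace ends.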

Adds Gupta: “Focusing on the idle mode of operation has also emerged as a key methodology for redundant activity detection. Any and all activity can be targeted as redundant, in contrast to an active vector where more design knowledge is needed to qualify ‘useful’ activity from ‘redundant’.”

How to find power bugs
Finding the right vectors to identify power bugs can be a challenge.

“We have to fully understand the context in which the design is operating in order to determine if we are doing useful work – i.e. analysis of the design while it is operating under a full or representative software load,” says Synopsys’ Gibbons. “This means performing software- (or scenario)-driven power analysis where we can examine both the power consumption itself as well as the power state space for the design. How the hardware is being used absolutely determines how much power it consumes. So scenario-driven optimization of the power state space allows us to ensure that at any point in time during the scenario, only the power states necessary to provide functionality are active and the other power states are disabled.”

Where do those vectors come from? “When I talk about the use cases for estimating and measuring power, most people take the performance use cases and add 10% or 20% more,” says Wingard. “There is a huge overlap between performance optimization and what they do for power characterization. It can tell me if my power network is robust and if my packaging is capable of pulling away enough heat.”

But that may not be good enough. “Design teams competing on power are now investing more in writing vectors for power that exercise the right modes of operation for targeted power reduction,” says Gupta. “For each of these modes, various methods are used to uncover wasted activity.”

Lang agrees. “The verification folks write testbenches to verify that the chip works,” he says. “They are looking to find bugs in the design. Those activities do not reflect reality. It is only when the system folks get involved where they are putting the firmware and some software together and co-simulating them. With a smartphone, people boot up the chip in an emulator and simulate calls. That is where you get realistic power. With emulation we can do dynamic power analysis, where you create a profile of dynamic toggle activity over time. Where you have the peak toggle is probably where you have peak power. Then you can break it down and go after the blocks with the biggest power draw. This helps you focus.”
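
The profiling flow Lang outlines, finding the peak and then breaking it down by block, can be sketched in Python. The per-block toggle counts and the block names below are invented, and peak toggle activity is only assumed to track peak power, as the quote itself hedges.

```python
def peak_window(profile, window):
    """Return (start index, total toggles) of the busiest window."""
    totals = [
        sum(sum(sample.values()) for sample in profile[i:i + window])
        for i in range(len(profile) - window + 1)
    ]
    start = max(range(len(totals)), key=totals.__getitem__)
    return start, totals[start]


def rank_blocks(profile, start, window):
    """Rank blocks by toggles inside the chosen window, largest first."""
    sums = {}
    for sample in profile[start:start + window]:
        for block, toggles in sample.items():
            sums[block] = sums.get(block, 0) + toggles
    return sorted(sums.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toggle counts per block for four consecutive time slices.
profile = [
    {"cpu": 10, "gpu": 0, "modem": 5},
    {"cpu": 40, "gpu": 5, "modem": 30},
    {"cpu": 50, "gpu": 5, "modem": 70},
    {"cpu": 10, "gpu": 0, "modem": 5},
]

start, total = peak_window(profile, window=2)
ranking = rank_blocks(profile, start, window=2)
print("peak window starts at slice", start, "with", total, "toggles")
print("blocks to go after first:", ranking)
```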

Where EDA struggles
One reason EDA struggles with power is that it is a system issue. It involves everything from the fabrication technology to the SoC, and beyond to the firmware, OS and some software layers. In most cases the decisions are not simple and involve compromise. Lang describes a hypothetical design tradeoff that ARM engineers may have made when they introduced big.LITTLE. “If you use an application that requires a lot of horsepower, you would turn on the big processor, and that has a lot of speed and capability. If you are doing something simpler, such as listening to music, it can turn off the big processor and have the small processor running.”

“They could have just left the big processor and implemented dynamic voltage and frequency scaling (DVFS),” continues Lang. “Here you slow down the clock, lower the voltage and run the big processor at lower power. But this still consumes too much power. Putting another processor in is a better solution. Consider with DVFS, you need a voltage regulator and what area and power does it consume? With big.LITTLE they have increased area in order to save power.”

Lang takes the tradeoffs one step further. “In addition, people doing DVFS find that the voltage regulator has certain efficiency bands. It is most efficient when supplying a certain current. If you power down circuitry, even though it requires less current, say by 90%, the regulator will not reduce that much. It may only go down by 20%. So while the logic is in deep sleep mode, the regulator is still consuming power. In some cases, I have seen people put on a small regulator and a large regulator and switch between them. This again is giving up area to save power.”
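
Lang's figures imply a steep drop in regulator efficiency, which a short calculation makes visible. The sketch below takes one reading of the quote: the load falls by 90% while the power drawn by the regulator falls by only 20%. The 90% full-load efficiency is an assumed starting point, not a number from the article.

```python
def efficiency_after(eta_full, load_drop, input_drop):
    """Regulator efficiency after the load and input power both fall.

    Efficiency is output power over input power, so scaling the output
    by (1 - load_drop) and the input by (1 - input_drop) scales the
    efficiency by their ratio.
    """
    return eta_full * (1 - load_drop) / (1 - input_drop)


ETA_FULL = 0.90  # assumed efficiency at the regulator's design load
eta_sleep = efficiency_after(ETA_FULL, load_drop=0.90, input_drop=0.20)
print(f"efficiency in deep sleep: {eta_sleep:.1%}")
```

Under these assumptions efficiency falls to one eighth of its full-load value, which is the case for switching to a smaller regulator sized for the sleep current.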

Gibbons takes the discussion up another level. “Poorly designed software that arbitrarily wakes the processor for housekeeping type functions should be identified and fixed. Software that does not take full advantage of the shutdown and DVFS hardware features will yield an energy-inefficient platform with potentially considerable wasted power.”

Gibbons suggests that “examination of the power state space while the design is running under a software load is best attempted very early in the design process using system-level design techniques (like power aware virtual prototyping), where we can use abstract models of both the design and the power characteristics of the component IP. Running at higher levels of abstraction allows us to hide unnecessary complexity and enable near real-time performance during simulation. With this type of environment, we can very quickly assess and optimize the power state space by making the necessary changes to the software, the system power management and the hardware architecture of the design itself.”

Wingard would like to see a better power architecture be defined at the hardware level. “Hardware events are one way to do this. If there is something happening in one place that knows something isn’t useful, then we can translate that into power control decisions. Because we can do this so fast, we can recover the circuit into a functional state in a very short period of time compared to software controlled systems that operate 500X slower.”

“The first thing is to agree on a set of interfaces for communicating power information,” continues Wingard. “The single wire that says if I am active or idle is an incredibly valuable indicator for the rest of the system. That could trigger coarse level clock gating. You also need interfaces that are perhaps more detailed than this and could provide information about frequency needs. Some have been defined. ARM has the p-channel that has some multi-bit signaling information about power states.”

Conclusions
So how important is power? Has it become the most important factor in design? Do we need more standards?

“EDA came up with UPF and that is an excellent start,” says Wingard, “but it really just helps with the lowest layers of figuring out how to do this. It does not tell us anything about protocol or how to decide to make power state transitions and what order they happen in. We can work on trying to standardize that, but we have to be careful that we don’t do it too soon because that never works. Sophisticated companies probably have ways that they are doing it today, but we are still in the learning phase.”

Adds Gibbons: “When fine-tuning the performance of a design, we typically spend a great deal of effort extracting those last few pico-seconds to meet a pre-determined performance target and getting that last ounce of performance out of the design. This typically costs us a great deal in power consumption.”

Related Stories
Tech Talk: Power Reduction
Why getting granular about energy can yield huge savings and how to utilize idle time.
How Cache Coherency Impacts Power, Performance
Part 1: A look at the impact of communication across multiple processors on an SoC and how to make that more efficient.
Analyzing The Integrity Of Power
Making sure the power grid is strong enough to sustain the power delivery.
Reaching The Power Budget
Why power is still a problem, how it will get worse, and what can be done about it.
SoC Power Grid Challenges
How efficient is the power delivery network of an SoC, and how much are they overdesigning to avoid a multitude of problems?
Implementation Limits Power Optimization
Why dynamic power, static leakage and thermal issues need to be dealt with throughout the design process.
Designing SoC Power Networks
With no tools available to ensure an optimal power delivery network, the industry turns to heuristics and industry advice.

Brian Bailey

Brian Bailey is Technology Editor/EDA for Semiconductor Engineering.

3 comments

Kev says:

October 16, 2016 at 11:37 am

Really EDA needs to ditch RTL and move to asynchronous design and implementation approaches. That’s doubly true in high-variability Silicon where you have to add a lot of slack to get yield.

Given the big EDA companies’ revenue dependence on the dysfunctional RTL flow, I can’t see that happening anytime soon.

Kev says:

October 17, 2016 at 11:40 pm

To really get the power levels down you need to ditch RTL and go for asynchronous logic. You also want to do that when you are dealing with highly variable Silicon (sub 28nm) so you don’t have to design to the statistical slowest corners.

Unfortunately the current tools can’t even handle DVFS or back-biasing properly, and big EDA companies aren’t keen to change anything. UPF is just a distraction.

Anand says:

October 26, 2016 at 8:52 pm

Absolutely correct that power is not an optimization problem, but a design problem. But the entire design is now an optimization problem. It is a balancing act with area, timing, power and schedule. Kev’s point about asynchronous is relevant from this perspective. The bottomline for design success is a strict discipline in the design methodology, understanding power, area and timing as early as possible and addressing them throughout the flow and not just specific steps.
