Accounting For Power Earlier

Examining the power impact of design decisions much earlier can have a big effect on what else a chip can do.


Concerns about power usage in an SoC are far from new, but the adoption of power management techniques still varies by company and by project.

Leading semiconductor providers have made the necessary changes in tooling and methodology to account for power awareness because they have to, but the rest of the industry hasn’t necessarily caught up.

“The companies that are leaders are eking every little bit of added value that they can out of the process, out of their product, out of the software, and this kind of work is pervasive,” observed Rob Knoth, product management director in the Digital & Signoff Group at Cadence. “Where the big opportunities lie now are in the fast followers — the people who are sitting on a current design or a current node, and they may not be able to afford the move to the next node. Still, there is a huge opportunity to take a design that’s on the shelf, inject some new tooling in it that allows the designers to scrub the power more effectively, tape it out on the same node, and reduce the power by 25%. That’s money in the bank.”

This is a big incentive for companies that aren’t at the bleeding edge to adopt a new methodology. “They are forced into a corner. They’ve got shareholder pressures, and they have to try to find ways to improve year on year. If you can’t jump to the next hottest node, you have to look at what you can do with what you have. Doing a better job with power efficiency is a key way to deliver a more reliable part, a better and longer-lasting part for better battery life, or taking the tradeoff in power consumption and boosting the frequency. There are many different things that adopting a smarter power aware world will help,” Knoth said.

All design teams are concerned about power, but that concern isn’t necessarily shared by all the people within a design team. In particular, the big challenge is how to best educate RTL designers and get them to account for power earlier in what they are coding, noted Stuart Clubb, product marketing manager at Mentor, a Siemens Business. That is what is behind a recent uptick in low-power RTL design tools, which can automatically edit the RTL.

“That kind of alleviates the need for the engineers to really own power,” Clubb said. “But there are only so many edits that an automatic tool can make to any customer’s RTL. All of the architectural things need to be done manually — even something as simple as identifying a deep shift register and saying, ‘Hey, you know that you’ve got flop to flop to flop. Maybe you could make that a circular buffer to save power?’ Let’s say you have four registers, one after the other, and the final register feeds some logic. You don’t want to mess with the clk2q for that final one because that might mess up your timing. ‘But those back three—instead of making them a shift register, you could make them a circular buffer.’ You’re still using three registers, and just deciding which one gets written to on a round robin, and which one is de-multiplexed to provide the output. That still gives you a delay line. But now you’re doing much less register switching because you’re only writing to one flip flop and two are gated off.”
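The tradeoff Clubb describes can be illustrated with a small behavioral model. The sketch below is not taken from any vendor tool; the function names and the three-deep delay line are assumptions for illustration. It counts how many flops are clocked per cycle as a rough proxy for register switching, and ignores the cost of the write pointer and output mux that the circular buffer adds.

```python
def shift_register_delay(samples, depth=3):
    """Delay line built as a shift chain: every flop loads on every cycle."""
    regs = [0] * depth
    clocked_flops = 0
    outputs = []
    for s in samples:
        outputs.append(regs[-1])       # oldest value leaves the end of the chain
        regs = [s] + regs[:-1]         # all `depth` flops capture a new value
        clocked_flops += depth
    return outputs, clocked_flops


def circular_buffer_delay(samples, depth=3):
    """Same delay, but only one flop is enabled per cycle (round robin)."""
    regs = [0] * depth
    ptr = 0
    clocked_flops = 0
    outputs = []
    for s in samples:
        outputs.append(regs[ptr])      # slot about to be overwritten holds the oldest value
        regs[ptr] = s                  # single write; the other flops stay gated off
        clocked_flops += 1
        ptr = (ptr + 1) % depth
    return outputs, clocked_flops


if __name__ == "__main__":
    data = list(range(1, 11))
    shift_out, shift_clocks = shift_register_delay(data)
    circ_out, circ_clocks = circular_buffer_delay(data)
    print("same delayed output:", shift_out == circ_out)
    print("flops clocked (shift vs circular):", shift_clocks, circ_clocks)
```

Both versions produce the same delayed stream, while the circular buffer clocks one flop per cycle instead of three. In real silicon the net saving would also depend on the data pattern and on the added control logic.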

Some of the higher-end power-savvy companies do this in addition to traditional first- and second-level clock gating, but they have to go hunt for every optimization with a manual code review, Clubb said. Fortunately, there are tools available that actually tell engineers where those optimizations and wasted power are, so the engineer can then be tasked to fix the leaks. “They also get to know when there is nothing material left to fix. Zero is a good number, as you know you’ve done your job well. Manually hunting for savings that don’t exist is wasteful of engineering time and resources.”

At the same time, Knoth said it’s really cool to talk about the fact that the gigahertz war is dead, and now it’s all about performance per watt. “People have been saying that for a decade-plus. On one side, it’s a mindset change. It is a management change, but I feel like there has been a large gap in EDA tooling that only now is starting to get fixed. This is something we’ve plowed a tremendous amount of work into because the big gap here is that if you tell the RTL designers, ‘Go fix performance,’ it’s actually not that hard. It’s easy to get some rough static timing estimates of the speed of a design, and if they do some quick and dirty synthesis, run it through STA, and it doesn’t work, they can go re-pipeline it or something else. Those tools have been around for quite some time.”

But on the power side, it’s a different story. “You can still run it through a quick and dirty synthesis, no problem, but there’s a huge difference between timing and power, and that’s the functional activity of the circuit,” Knoth said. “That has a direct dependency on stimulus. You can make a circuit look like an ice cube or make it look like a hot griddle depending upon the stimulus you’re using on the same circuit. That’s been a big gap. For the longest time, people would use activity factors because it’s easy but again, ice cube to griddle, you can change the circuit so easily that you can have the RTL guys chasing ghosts. Or they could tell you they are done, the circuit is perfect, but you get it back and it’s red hot. That’s been the biggest limitation for people putting serious amounts of RTL work into this.”
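A rough sketch of why stimulus matters so much follows. It assumes the textbook first-order model for dynamic power, P = alpha * C * V^2 * f, where alpha is the switching activity. The capacitance, voltage and frequency values are made up for illustration, and the function names are hypothetical rather than part of any EDA tool.

```python
def toggle_rate(trace, width):
    """Average fraction of bits that flip per cycle over a trace of integer bus values."""
    if len(trace) < 2:
        return 0.0
    flips = sum(bin(a ^ b).count("1") for a, b in zip(trace, trace[1:]))
    return flips / (width * (len(trace) - 1))


def dynamic_power(alpha, cap_farads, vdd_volts, freq_hz):
    """First-order dynamic power estimate: alpha * C * V^2 * f (assumed model)."""
    return alpha * cap_farads * vdd_volts ** 2 * freq_hz


if __name__ == "__main__":
    # Two stimulus sets applied to the same hypothetical 8-bit bus.
    idle_stimulus = [0x00] * 8          # nothing ever changes: the "ice cube"
    busy_stimulus = [0x00, 0xFF] * 4    # every bit flips every cycle: the "hot griddle"
    for name, trace in (("idle", idle_stimulus), ("busy", busy_stimulus)):
        alpha = toggle_rate(trace, width=8)
        watts = dynamic_power(alpha, cap_farads=1e-9, vdd_volts=0.8, freq_hz=1e9)
        print(name, "activity:", alpha, "estimated dynamic power (W):", watts)
```

The circuit parameters are identical in both runs; only the stimulus changes, and the estimate swings from zero to its maximum. A single assumed activity factor hides that entire range.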

The first wave involved a large number of RTL simulations, because functional verification is critical. But the activity from those simulations correlated poorly once it was annotated onto the accurate gate-level netlist, where it could be analyzed. Gate-level simulation, in contrast, is extremely slow, so that didn’t offer much relief.

“You can maybe get a small, super tiny window to annotate on, and then you got good accuracy,” said Knoth. “But again, you had to wait for RTL to get synthesized, placed and routed, and your gap is ginormous. The older RTL power estimators suffered from accuracy and flexibility/capacity issues, so the guys who had to fix the RTL for power couldn’t work with the P&R guys who have the accurate design data, or the emulation/simulation guys who have all of the functional stimulus. This is the problem we focused on solving.”

Fixing power issues
Part of the challenge here is understanding what works best at what point in the design flow, and what tools are required to identify the issues and make those changes.

Nhat Nguyen, senior director of architecture and design engineering at Rambus, said that for RTL designers to achieve accurate estimates for their RTL blocks they need to use mainstream EDA tools to assist with power and area estimates. “Synthesis, place and route tools use standard cell libraries that are well-characterized for area, power, and I/O timing. Therefore, RTL designers can achieve accurate estimates for their RTL blocks.”

RTL and higher levels of abstraction make it possible to have a big impact on power reduction. Implementation, meanwhile, is restrictive and has only a marginal impact on power consumption, said Arti Dwivedi, principal technical product manager at ANSYS (See Fig. 1 below).

Fig. 1: What has the biggest impact on power. Source: ANSYS

“In an RTL to GDSII flow, design decisions can be made only at the RTL,” Dwivedi said. “Once the RTL code is frozen, synthesis and physical design tools focus on design implementation for a given set of design constraints to automatically implement the design. As such, a low-power RTL design and analysis environment is essential for power-efficient SoC designs.”

Power reduction efforts clearly have a higher ROI when they are focused on higher levels of design abstraction, such as blocks, macros and microarchitectures. “For instance, if all the flops in a block are not switching, it’s more effective to insert a clock gate at the block level instead of inserting multiple leaf-level clock gates,” Dwivedi noted. “Similarly, analysis of control structures can help eliminate redundant switching in blocks or microarchitectures when they are not being observed. While all such reductions may not be supported through an automatic RTL rewrite methodology, as they require designer’s knowledge of the RTL, RTL power tools can certainly guide designers to identify, visualize and analyze such reduction opportunities early in the design flow.”

It’s also important to note that the IEEE 1801 Unified Power Format (UPF) is used to define a power specification of the design. “RTL designers can use it to verify the power consumption early at RTL and ensure that the low-power strategy is efficient,” she said. “RTL power analysis can identify additional blocks, which can be shut off, enabling designers to revise the design’s power specification for optimum power efficiency.”

As RTL designers are also constrained by tight schedules, the focus often is on implementing the functionality. Here, an interactive power debug environment often goes a long way in exposing power hotspots. Power efficiency metrics such as clock gating and memory access efficiency can score RTL designs for different modes of operation and highlight power-inefficient lines of RTL code. In fact, power regressions also have emerged as a key methodology that assists the RTL designers in tracking power efficiency metrics, as well as keeping power in check as the design changes and ECOs are implemented.
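A minimal sketch of what such a metric and regression check might look like is shown below. The definition of clock gating efficiency used here (the fraction of flop-cycles in which the clock enable is low) is an assumption, since tools define the metric differently. The block names, power numbers and 5% tolerance are hypothetical.

```python
def clock_gating_efficiency(enable_trace):
    """Fraction of flop-cycles with the clock gated off.

    enable_trace maps a register name to a list of 0/1 clock-enable values,
    one per simulated cycle (assumed input format).
    """
    total = sum(len(values) for values in enable_trace.values())
    if total == 0:
        return 0.0
    gated = sum(values.count(0) for values in enable_trace.values())
    return gated / total


def power_regressions(baseline_mw, current_mw, tolerance=0.05):
    """Return blocks whose power rose by more than `tolerance` (fractional) over baseline."""
    flagged = {}
    for block, base in baseline_mw.items():
        cur = current_mw.get(block)
        if cur is not None and base > 0 and (cur - base) / base > tolerance:
            flagged[block] = (base, cur)
    return flagged


if __name__ == "__main__":
    enables = {"acc_reg": [1, 0, 0, 0], "cfg_reg": [1, 1, 0, 0]}
    print("clock gating efficiency:", clock_gating_efficiency(enables))

    baseline = {"alu": 10.0, "fifo": 4.0}   # mW, from a previous run
    current = {"alu": 10.2, "fifo": 5.0}    # mW, after the latest RTL change
    print("blocks over budget:", power_regressions(baseline, current))
```

Run as part of a nightly functional regression, a check like this would flag the block whose power crept up after an RTL change or ECO, rather than leaving the increase to be discovered near tapeout.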

At the same time, power-analysis-driven power reduction can help RTL designers focus on high-impact reductions, while minimizing the impact on other design parameters such as area, leakage and timing. “The RTL power methodology must account for physical effects such as clock tree modeling, wire capacitance, multi-threshold cell assignments, etc. in order to accurately predict power savings upfront and avoid iterations,” Dwivedi explained.

In addition to average power, designers also are required to manage peak power consumption of the design. High peak power can cause the design to fail due to excessive voltage drop. RTL power tools provide the capacity, performance and coverage to identify critical peak power and di/dt cycles for real application scenarios, which can involve millions of cycles. Software engineers also can benefit from early peak power profiling, which can be used to analyze how software impacts hardware. This has long been considered one of the big knobs to turn in low power design, but software teams historically did not have good tools for seeing how their code affects power budgets.
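As a simplified illustration of what hunting for peak power and di/dt cycles involves, the sketch below scans a per-cycle power trace. It assumes a constant supply voltage, so current is simply power divided by voltage, and it approximates di/dt as the change in current between adjacent cycles divided by the clock period. The function name and the sample numbers are hypothetical.

```python
def peak_and_didt(power_w, vdd_volts, clock_period_s):
    """Locate the peak-power cycle and the cycle with the largest current step.

    power_w is a list of per-cycle power values in watts (assumed input format).
    """
    if len(power_w) < 2:
        raise ValueError("need at least two cycles to compute di/dt")
    current_a = [p / vdd_volts for p in power_w]
    peak_cycle = max(range(len(power_w)), key=power_w.__getitem__)
    # |delta I| / delta t between each pair of consecutive cycles
    didt = [abs(b - a) / clock_period_s for a, b in zip(current_a, current_a[1:])]
    worst_step = max(range(len(didt)), key=didt.__getitem__)
    return {
        "peak_cycle": peak_cycle,
        "peak_power_w": power_w[peak_cycle],
        "didt_cycle": worst_step + 1,          # cycle at which the step lands
        "max_didt_a_per_s": didt[worst_step],
    }


if __name__ == "__main__":
    trace = [0.5, 0.5, 2.0, 1.0]               # watts per cycle, made-up values
    print(peak_and_didt(trace, vdd_volts=1.0, clock_period_s=1e-9))
```

A production flow would have to do this across millions of cycles and account for the power delivery network, but the underlying question is the same: which cycles draw the most, and where the current changes fastest.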

The human factor
But getting design teams to consider power earlier runs into another issue. Some engineers don’t have the time to focus on yet another issue because it’s hard enough to reach signoff with increasingly tight design schedules.

“Engineering team managers have a lot to worry about in terms of delivering functionality, cost-effective area, timing closure, verification coverage, and keeping to schedule,” said Mentor’s Clubb. “To keep an eye on power consumption and dedicate engineering time when they are not feeling the need from above is tough. Though it is far harder to fix power problems near the end of a project as a panicked tape-out looms, experience suggests that it is not uncommon for teams to drop power efforts as deadlines approach. Sometimes that’s okay if you’re close to the power budget. Sometimes that means far bigger problems for the chip down the road, such as design losses that in some cases turn into major stock-price-affecting news.”

So when RTL design work is in progress, a ‘little and often’ approach is probably the most effective, Clubb said. With automated tools checking the power quality of an RTL design as part of a functional regression flow, engineers and management get feedback on how much power they might be leaving on the table and how to act on what is wasted.

“With such reporting, managers can be data-driven in how they encourage, incentivize and reward individual engineers in their team to aim for zero,” he said. “Zero—or near-zero—power left on the table is a great number to see, as it means engineering effort can be confidently focused on other parts of the design that have problems, rather than spending time looking for things that aren’t there.”



Kev says:

Gate (cell really) level simulation isn’t necessarily slow vs RTL and it can tell you a lot more about your design. There’s an argument that the UVM methodology is not worth the bother unless you are doing simulation at post P&R level with models that understand power – RTL can be tested by formal means, constrained-random is the kind of approach that finds manufacturing issues.

Ann Steffora Mutschler says:

That’s a very good point about gate level simulation, but don’t design teams end up hurrying through projects because everything gets squeezed? My take is that they simply run out of time in the schedule. Also a good point about UVM; it definitely seems best suited when it is part of a full methodology.

Kev says:

Designers spend a lot of time waiting for validation/verification – just the SystemVerilog recompile can be significant, before you get to running long sequences of random test and coverage checking. Cadence, Synopsys and Mentor would like to sell you $1M emulator boxes to speed that up, so aren’t terribly motivated to make the basic simulators go faster. However, the emulators don’t support SV directly, so the test-benches become the bottleneck if you do go that way (and it’s not a turn-key process).

There are ways to speed stuff up. I was recently doing C++ rather than SV and suggested using GP-GPUs for running verification, but there was just a reluctance to try anything new mid-project. At the moment there is a bunch of hardware becoming available for evaluating neural-networks fast for deep-learning, and that problem looks much the same as fast-SPICE or real-number circuit simulation.

Ann Steffora Mutschler says:

I’m curious, how difficult is it to try something new mid-project?

Kev says:

At Intel/Samsung (my experience), pretty much impossible.

Also, there are no methodology people on the design teams these days, so they just go with what the EDA guys sell them.

At Intel we were modeling with VHDL. Functional simulation for a major block got held up because it was accessed through a level-shifter that was not wired for power, and all signals went to X because it used MVL7 (same as Verilog 0/1/X/Z). I pointed out that VHDL doesn’t have built-in types and you could switch to a type that kept the unpowered info in a separate bit and the 1/0 data would pass through the level shifter intact for downstream verification. There may still be documents to that effect in Sharepoint at Intel, but I’m 100% confident all (logic cell) models are still MVL7.

So it’s more “never”, rather than just “mid-project”. Intel was also stuck on VHDL ’93 because they wouldn’t send anyone to VHDL committees to fix backward compatibility issues, not to mention stupid arguments about whether wires were digital or analog (the copper doesn’t care, neither should a simulator).
