Watching And Waiting For DFP

While the industry has been talking about SoC power optimization for years, it is still in the early stages of shifting to a ‘Design for Power’ paradigm.


By Ann Steffora Mutschler

Although the semiconductor industry has been talking about the need to optimize SoC designs for power for many years, it is safe to say it’s still in the very early stages of the ‘Design for Power’ approach.

That’s not to say that methodologies and tools are not in place. There are actually a number of options available, depending on the level of abstraction the design team is focused on.

Abhishek Ranjan, senior director of engineering at Calypto, explained that where you are in the design flow (system, RTL, gate, etc.) determines the kinds of changes you can make to optimize your design for power.

“At the system level, which has the maximum flexibility, designers can play with: hardware/software partitioning, parallel architectures, pipelining, data precision, dynamic voltage/frequency scaling, bus/memory architectures and communication protocols,” said Ranjan. “At the RT level, the architecture is mostly fixed; still there is a lot of scope for making changes targeted towards power. Some of the most popular RTL techniques are memory banking, clock gating, memory gating, re-encoding of state machines, operand isolation, retiming, power gating, pre-computation, resource sharing, etc. To reach the gate level, designers use RTL synthesis tools, which have built-in optimizations for power, like clock-gating, mixing of multi-threshold cells, pin swapping, operator gating, low-power and arithmetic architectures.”
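Several of these techniques act directly on the terms of the standard first-order expression for CMOS switching power, P = α·C·V²·f (activity factor, switched capacitance, supply voltage, clock frequency). That formula is textbook background rather than something Ranjan cites, and the numbers below are hypothetical; the sketch only shows why clock gating (which lowers α) and dynamic voltage/frequency scaling (which lowers V and f together) are such strong levers.

```python
def dynamic_power_w(activity: float, capacitance_f: float, vdd_v: float, freq_hz: float) -> float:
    """First-order CMOS switching power P = alpha * C * V^2 * f, in watts.

    Leakage is deliberately not modeled here.
    """
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical block: 1 nF of switched capacitance, 20% activity, 1.0 V, 1 GHz.
baseline = dynamic_power_w(0.2, 1e-9, 1.0, 1e9)
# Clock gating halves the activity factor.
gated = dynamic_power_w(0.1, 1e-9, 1.0, 1e9)
# DVFS: drop both voltage and frequency by 20%.
scaled = dynamic_power_w(0.2, 1e-9, 0.8, 0.8e9)

print(f"baseline={baseline:.4f} W, gated={gated:.4f} W, dvfs={scaled:.4f} W")
```

Because voltage enters squared, cutting both V and f by 20% removes roughly half of the switching power (0.8² × 0.8 ≈ 0.51 of baseline), whereas scaling frequency alone would remove only 20%.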

On top of those techniques, semiconductor manufacturers keep tweaking the transistors themselves, lowering threshold voltages and adding high-k dielectrics to reduce the power each transistor inherently consumes.

Arvind Shanmugavel, director of applications engineering at Apache, observed that at the system level, power optimization is usually driven by high-level synthesis (HLS) tools. SystemC is a popular language for describing system-level behavior, and HLS tools can mostly calculate power that is transaction-accurate rather than cycle-accurate. They can optimize power by providing feedback on scheduling, clock selection and supply selection.
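To make the transaction-accurate versus cycle-accurate distinction concrete, here is a minimal sketch of transaction-level power accounting: each transaction type is assigned a fixed energy cost, and average power is total energy divided by a time window. The transaction names and energy values are hypothetical, not taken from any HLS tool; a cycle-accurate estimate would instead need switching activity for every clock cycle.

```python
from collections import Counter
from typing import Iterable

# Hypothetical energy cost per transaction, in nanojoules. A real flow would
# characterize these numbers from lower-level (RTL or gate) models.
ENERGY_NJ = {"bus_read": 1.5, "bus_write": 2.0, "dma_burst": 12.0}


def transaction_power_w(trace: Iterable[str], window_s: float) -> float:
    """Average power over a time window, computed from transaction counts.

    When each transaction happened inside the window is ignored; only how many
    of each kind occurred matters. That is what makes the estimate
    transaction-accurate rather than cycle-accurate.
    """
    counts = Counter(trace)
    total_nj = sum(ENERGY_NJ[kind] * n for kind, n in counts.items())
    return total_nj * 1e-9 / window_s


# 1,000 reads, 500 writes and 100 DMA bursts observed in a 1 ms window.
trace = ["bus_read"] * 1000 + ["bus_write"] * 500 + ["dma_burst"] * 100
print(f"{transaction_power_w(trace, 1e-3) * 1e3:.3f} mW")
```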

Next comes the RT level, where a designer writes the first micro-architecture specification in Verilog, SystemVerilog or VHDL, and power optimization is typically done through estimation and reduction. RTL power estimation is the first step in understanding how much power a particular micro-architecture consumes, and this is where designers can make meaningful changes to the micro-architecture to reduce power consumption. In fact, he said, reducing power at the RTL is the most meaningful to a designer, and it is the stage where the most power can be reduced.

Finally, at the gate level, the micro-architecture has been synthesized into logic gates. Although power estimation is very accurate at this stage, power reduction opportunities are minimal. “Understanding functionality in a sea of gates is usually difficult for a designer to debug, as opposed to a functional representation in RTL code. Leakage power reduction is the most common power reduction at the gate level. High-Vt swapping is done for instances that are not in timing-critical paths to reduce leakage. Other techniques such as fine-grain clock gating are also applicable, although the area and performance tradeoffs are quite high,” Shanmugavel noted.
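The high-Vt swapping Shanmugavel describes can be sketched as a slack-driven filter: a cell is moved to its slower, lower-leakage high-Vt variant only if its path still has enough slack after the added delay. The delay penalty, leakage ratio, margin and instance names below are hypothetical placeholders rather than real library data, and the sketch checks each instance in isolation, whereas a real flow must re-run timing because cells on the same path share that slack.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical library characteristics: the high-Vt (HVT) variant of a cell
# is slower but leaks only a fraction of the standard-Vt leakage.
HVT_DELAY_PENALTY_PS = 15.0
HVT_LEAKAGE_RATIO = 0.3


@dataclass
class Instance:
    name: str
    slack_ps: float    # timing slack with the current standard-Vt cell
    leakage_nw: float  # leakage of the current standard-Vt cell


def swap_to_hvt(instances: List[Instance], margin_ps: float = 5.0) -> Tuple[List[str], float]:
    """Select non-critical instances for HVT swap; return (names, leakage saved in nW).

    Each instance is judged on its own slack. Instances on a shared path would
    need incremental re-timing after every swap, which this sketch omits.
    """
    swapped: List[str] = []
    saved_nw = 0.0
    for inst in instances:
        if inst.slack_ps - HVT_DELAY_PENALTY_PS >= margin_ps:
            swapped.append(inst.name)
            saved_nw += inst.leakage_nw * (1.0 - HVT_LEAKAGE_RATIO)
    return swapped, saved_nw


design = [
    Instance("u_alu_add", slack_ps=12.0, leakage_nw=10.0),  # timing-critical: keep
    Instance("u_cfg_reg", slack_ps=40.0, leakage_nw=10.0),  # plenty of slack: swap
    Instance("u_dbg_mux", slack_ps=20.0, leakage_nw=20.0),  # just enough slack: swap
]
names, saved = swap_to_hvt(design)
print(names, f"{saved:.1f} nW saved")
```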

Overall, as Venki Venkatesh, senior director of engineering at Atrenta put it, “The higher the level of abstraction, the greater the ability to optimize power. It is like an inverted triangle.”

Knowing which technology to implement, and when, is also key to a design’s success, and engineering teams need to rely on tool guidance to make power decisions, Shanmugavel continued. “With the large number of power domains, clock domains and the complexity of today’s designs, it is difficult to rely purely on engineering judgment. Tradeoffs in power, performance and area are very important to understand in such designs.”

For example, adding clock gating to every un-gated register individually could cause a huge area overhead. However, if clock gating is applied at a higher level, based on an understanding of how the pipeline functions, the area overhead could be very low with good power reduction, he explained. “Another tradeoff is understanding the enable efficiency of clock gating cells versus the controlled downstream power. Achieving high dynamic clock gating efficiency is only useful if the controlled downstream power is also high. Without understanding this tradeoff, one could easily blow up the area without much power reduction.”
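The enable-efficiency tradeoff can be expressed as a simple net-benefit check: a gating cell saves the downstream power it controls multiplied by the fraction of cycles it is disabled, minus its own overhead. The function name and the power numbers are hypothetical, and area, which Shanmugavel also flags, is left out of this sketch.

```python
def clock_gating_net_mw(downstream_mw: float, idle_fraction: float, overhead_mw: float) -> float:
    """Net power saved by one clock-gating cell, in mW.

    downstream_mw: clock and register power the cell controls when always enabled
    idle_fraction: fraction of cycles its enable is low (its gating efficiency)
    overhead_mw:   power of the gating cell plus its enable logic
    """
    return downstream_mw * idle_fraction - overhead_mw


# Hypothetical candidates: (downstream mW, idle fraction, overhead mW).
candidates = {
    "pipeline_stage_bank": (2.0, 0.60, 0.05),  # many flops behind one gate
    "single_status_flop": (0.02, 0.90, 0.05),  # very efficient enable, tiny load
}
for name, (downstream, idle, overhead) in candidates.items():
    net = clock_gating_net_mw(downstream, idle, overhead)
    print(f"{name}: net {net:+.3f} mW -> {'gate' if net > 0 else 'skip'}")
```

The single flop has the higher gating efficiency (90% versus 60%) yet loses power once the overhead is counted, which is exactly the case of high enable efficiency paired with low downstream power.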

“Understanding the impact of power consumed on power integrity is another important aspect of design trade-offs. Designers cannot use global clock gating techniques without seeing the impact of the di/dt on the voltage drop. Accurate average currents (for high-power modes) and transient currents (for transition modes) need to be captured in a model for different blocks and used for power integrity validation by physical design teams,” Shanmugavel added. Such models can capture multiple operating states and transitions and provide feedback to power noise tools for verification.
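Why di/dt matters can be seen from a first-order, lumped supply-droop estimate, ΔV ≈ I·R + L·di/dt: the resistive term depends on average current, while the inductive term grows with how quickly current changes, for example when a large gated region switches on at once. The resistance, inductance and currents below are hypothetical round numbers, and a real power-integrity flow uses distributed grid and package models rather than this single formula.

```python
def supply_droop_mv(i_avg_a: float, r_ohm: float, l_h: float, delta_i_a: float, dt_s: float) -> float:
    """Lumped supply droop in mV: resistive I*R plus inductive L*di/dt."""
    return (i_avg_a * r_ohm + l_h * delta_i_a / dt_s) * 1e3


# Hypothetical grid: 10 mOhm effective resistance, 0.1 nH effective inductance,
# 2 A average current, and a 1 A step when a gated block wakes up.
R, L = 0.010, 0.1e-9
slow_wake = supply_droop_mv(2.0, R, L, 1.0, 10e-9)  # step spread over 10 ns
fast_wake = supply_droop_mv(2.0, R, L, 1.0, 1e-9)   # same step in 1 ns
print(f"10 ns wake-up: {slow_wake:.1f} mV, 1 ns wake-up: {fast_wake:.1f} mV")
```

The average current is identical in both cases; only the speed of the transition changes, which is why transient currents have to be captured separately from average currents.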

Teams within the engineering group are further divided, essentially along the lines of the design flow, Ranjan said. System architects decide the top-level architecture and the various power management and optimization strategies, then hand off the design to micro-architects or RTL designers. These engineers apply the techniques relevant to the RTL stage and optimize as much as their expertise and the schedule allow. From RTL synthesis onward, power optimization is pretty much left to the automated tools. RTL designers occasionally tweak the RTL based on feedback from the gate level, but this is uncommon. So from one stage to the next, both the designers and the optimizations change.

Power budgets are basically decided up front based on the requirements of the end product. Then, from system to RTL to gate, the various teams do whatever they can to meet that budget. The end product determines the power optimization strategy.

“For mobile and tablet devices (the majority of today’s applications), the designer has to use multi-voltage strategies, as well as power gating, to minimize dynamic and leakage power, and also provide adaptive power management based on the performance requirements of the product as it is being used. Chips meant for low-end applications such as toys might not require high-end power management, as that increases the cost of the product,” he added.
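Adaptive power management of this kind is commonly built on a table of voltage/frequency operating points, with a controller choosing the lowest-power point that still meets current demand. The sketch below assumes a hypothetical four-entry table and an assumed effective capacitance, and scores points by dynamic power only (α·C·V²·f); production governors also weigh leakage, transition latency and thermal limits.

```python
from typing import List, Tuple

# Hypothetical operating points: (supply voltage in V, clock frequency in MHz).
OPERATING_POINTS: List[Tuple[float, int]] = [(0.7, 400), (0.8, 800), (0.9, 1200), (1.0, 1500)]
C_EFF_F = 1e-9  # hypothetical effective switched capacitance (alpha * C), in farads


def pick_operating_point(required_mhz: float) -> Tuple[float, int, float]:
    """Return (volts, MHz, dynamic mW) for the cheapest point meeting the demand."""
    feasible = [(v, f) for v, f in OPERATING_POINTS if f >= required_mhz]
    if not feasible:
        raise ValueError("demand exceeds the fastest operating point")
    # Score by dynamic power alpha*C*V^2*f; leakage is ignored in this sketch.
    v, f = min(feasible, key=lambda p: p[0] ** 2 * p[1])
    power_mw = C_EFF_F * v ** 2 * (f * 1e6) * 1e3
    return v, f, power_mw


for demand in (100, 600, 1300):
    v, f, p = pick_operating_point(demand)
    print(f"demand {demand} MHz -> {v:.1f} V @ {f} MHz, ~{p:.0f} mW")
```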

Lawrence Loh, vice president of worldwide applications engineering at Jasper Design Automation, said that a company’s particular specialties tend to dictate its particular strategies. “Certain techniques, like clock gating, everybody does. By now most low-power mobile devices are divided into power domains, but there are more subtle techniques. What do you do when you turn off the power? Do you retain something? Sometimes it depends on the specialties of the company. For example, one company has its own foundry, so they know that they have very good low-power memories and they also have certain processes that give them some advantage on some techniques over others. The disadvantage is that they are taking third-party IP, so they are relying a lot on what the third-party IP is doing.”

Loh said another company, which doesn’t have its own fab but which designs a lot of its IP in-house, can use that expertise to optimize the IP. And because that IP is likely to be re-used, it can be enhanced. “With the first version of the IP, it’s about getting it right and making sure it works. With the second generation, we need to reduce the power and enhance it, so they have more capability of providing the right IP at the right time,” he said.

Still, it is important to remember that the lists of candidate techniques mentioned above for each abstraction level are neither comprehensive nor cast in stone, Atrenta’s Venkatesh pointed out. “New techniques will emerge over time in each level of design abstraction. Engineering teams should maintain a continuously updated list of techniques at each level and use them when they apply.”

Next steps

While progress has been made, more needs to happen, primarily because there are no standards for defining what a system is. “There are still no standardized ways of defining a complete system (with both hardware and software). Also, the system level is still very abstract for estimating power accurately. Any attempt at creating power models becomes too specific and cannot be generalized. Before automation can be achieved, effort has to be put into creating standards for describing a system,” Calypto’s Ranjan said.

Loh, for his part, sees the architectural front as the area most lacking. “Certain things require a lot of human brains. A heuristic human brain is hard to automate and the system architectural side is not easy to automate. How does that system at the end, with software and everything, capture performance information, capture the heuristics they can pass on to the next design? What’s the format of how things are captured? If we can standardize that, then it will make the whole process a lot better. Standardization of formats is always a difficult thing because the architect already has their own format, and they like to do things they are familiar with. So there are some human behavior things to win over.”

In closing, Shanmugavel observed, “In the past, EDA tools were more focused on functional aspects of a design. Today, the EDA industry is quickly responding to the needs of low-power designs. As power efficiency becomes a key metric for any design, we will see more power-aware EDA tools. We are still in the very early stages of the ‘Design for Power’ paradigm shift.”