I Say ‘High’ [Performance], You Say ‘Low’ [Power]

Optimizing for the lowest power in a high-frequency, high-switching design.

“…You say ‘why’, and I say ‘I don’t know…’”

Actually, I do know.

Everybody loves a high-performance product. Even just hearing that a product is high-performance sets higher expectations than if the product is simply described as “fast” or “powerful.” When it comes to SoC design, “high-performance” refers to designs that run at very high clock frequencies with fast-switching elements, typically with processing cores or sub-units operating in the multi-GHz range.

Pushing performance (that is, frequency) always comes with certain costs, which can inhibit the maximum returns of a high-performance SoC. These costs are generally related to two components from the Power, Performance and Area (PPA) triad: power and area. These problems lead physical design engineers to lose sleep (though I suspect their real reason for being up at night is Netflix, but they won’t admit it!).

Area limits are generally fixed by factors outside a physical design engineer’s control, such as package size, yield, foundry process limits, and so on. That leaves power as the most significant component of the PPA triad. Optimized in a smart way, power can give the SoC an edge by letting it fit the power envelope dictated by the end applications for which it is competing. Low power means longer battery life for mobile devices and wearables, not to mention lower cooling costs for datacenter ICs.

But how do you optimize for the lowest power in a high-frequency, high-switching design? Where do you even start: at the RTL level, or at the floorplanning stage? Do you need switchable power domains and clock gating? Can you get lower power within a single power domain through optimization techniques alone? What about the optimal power delivery network? These are the questions that keep hounding SoC architects, who must keep power in check while they push for higher performance.

Power intent, power optimization, and power integrity are often used interchangeably when discussing power improvement, and this can muddy up the power conundrum. Regardless of the terminology, power improvement must be achieved in a holistic way, taking all power-hungry components into consideration.

1. Power intent and low-power architecture
At the early stages of architectural exploration, many different functional architectures are possible. Using a good, high-level physical synthesis methodology together with a powerful RTL signoff flow can give designers multiple options for reducing power at the RTL level without compromising clock speeds. These days, RTL signoff tools have some sophisticated peak power profiling and scrubbing capabilities. Even so, the potential for pushing power lower through the power intent definition is only beginning to be realized.

A few years back, traditional power-focused architecture techniques (such as multi-supply voltage, multi-Vt optimization and clock gating) gained popularity for their ability to cut power, especially leakage, which was the bigger component of power usage back then. Later, complex power schemes with power shutoff and dynamic voltage/frequency scaling were adopted at the cost of added flow complexity.
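To see why voltage and frequency scaling can be worth the extra flow complexity, it helps to recall the usual first-order estimate of switching power, P = activity × C × V² × f. The short Python sketch below plugs in made-up numbers; the function name, the capacitance, the activity factor, and both operating points are illustrative assumptions, not figures from any real design.

```python
def dynamic_power(activity, cap_farads, vdd_volts, freq_hz):
    """First-order switching power estimate: activity * C * V^2 * f (watts)."""
    return activity * cap_farads * vdd_volts ** 2 * freq_hz

# Hypothetical block: 1 nF of switched capacitance, 20% average activity.
nominal = dynamic_power(0.2, 1e-9, 1.0, 2.0e9)  # 1.0 V at 2.0 GHz
scaled = dynamic_power(0.2, 1e-9, 0.8, 1.5e9)   # 0.8 V at 1.5 GHz

# Fraction of dynamic power saved at the lower operating point.
saving = 1.0 - scaled / nominal
print(f"nominal={nominal:.3f} W, scaled={scaled:.3f} W, saving={saving:.0%}")
```

Because voltage enters as a square, the 20% voltage drop contributes more to the saving here than the 25% frequency drop does, which is the basic argument for scaling both together.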

These days, new-generation synthesis and implementation tools have some pretty elegant ways of implementing and optimizing power across different power domains using advanced power intent definitions.

2. Power optimization
When it comes to power optimization during the implementation stage, backend engineers squeeze power out at every possible sub-stage of the flow. Vt swaps are a popular way of pushing frequency higher on paths that don’t meet timing, but there must be a smart way to make all optimization transforms power-aware. Power awareness means taking a more global approach and optimizing toward a global optimum rather than for a local path or path group. Opportunistic power reclaim from positive-slack paths is a popular technique, and most modern optimization engines are tuned to reclaim power effectively. With the move to FinFETs, there is a focus on dynamic power recovery, and therefore on the use of accurate activity information for dynamic power analysis and reclaim. Maintaining accurate activity information along the whole place-and-route flow (including while splitting and merging multi-bit flops) is a non-trivial task. Effective power optimization using activity information is only possible if all of the activity calculation, propagation, and annotation is accurate.
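As a rough illustration of slack-based reclaim, the sketch below spends one path's positive slack on swaps to slower, lower-leakage cell variants, taking the best saving per picosecond first. It is a toy model, not how any particular optimization engine works: the function name, the timing margin, and the per-cell numbers are all hypothetical, and it looks at a single path in isolation, whereas a real engine has to account for cells shared across many paths.

```python
def reclaim_path(slack_ps, cells, margin_ps=5.0):
    """Greedy Vt-swap reclaim on one path (illustrative assumption only).

    cells: list of (name, delay_penalty_ps, leakage_saving_nw) describing
    what a swap to a higher-Vt variant would cost and save.
    Delay penalties are assumed to be strictly positive.
    """
    budget = slack_ps - margin_ps  # slack we are willing to spend
    swapped, saved = [], 0.0
    # Visit the swaps with the best saving per picosecond of delay first.
    ranked = sorted(cells, key=lambda c: c[2] / c[1], reverse=True)
    for name, penalty, saving in ranked:
        if penalty <= budget:      # swap only if the path still meets timing
            budget -= penalty
            swapped.append(name)
            saved += saving
    return swapped, saved

# Hypothetical path with 30 ps of positive slack and three swap candidates.
swapped, saved = reclaim_path(
    30.0, [("u1", 10.0, 40.0), ("u2", 12.0, 36.0), ("u3", 8.0, 16.0)]
)
print(swapped, saved)
```

A path with no positive slack beyond the margin gets no swaps at all, which mirrors the point above that reclaim is opportunistic and must never cost timing.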

Clock tree design and synthesis is perhaps one of the most important aspects of high-performance designs. Generally, high-performance designs go for a spine or mesh methodology. Although these methodologies are good for balancing and for handling variability, they pose significant challenges for power optimization. Advanced concurrent “clock and data” optimization techniques, however, allow a combination of adaptive structured clock trees and leaf-level multi-tap clock tree synthesis (CTS), which can help bring the power numbers into balance by skewing the clocks. Just as useful skew can be used to push performance, CTS tools can skew the clock (or even add buffers purely for power purposes) to avoid too much switching at the same time. Smart implementation tools these days can also reduce clock power by optimizing buffer-to-flop placement.
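The idea of skewing clocks to avoid simultaneous switching can be pictured with a very crude model: bin each clock group's current draw by its arrival time and look at the worst bin. Everything in the sketch below is an assumption made for illustration (the function name, the 50 ps window, the group currents and offsets); a real CTS tool works from detailed timing and current waveforms.

```python
from collections import defaultdict

def peak_current(groups, window_ps=50):
    """Worst-case current in any time window (illustrative model only).

    groups: list of (clock_offset_ps, current_ma), one entry per clock
    group, where the offset is the skew applied to that group's clock.
    """
    bins = defaultdict(float)
    for offset_ps, current_ma in groups:
        # Groups whose clocks arrive in the same window switch together.
        bins[int(offset_ps // window_ps)] += current_ma
    return max(bins.values())

# Four hypothetical flop groups, each drawing 100 mA when clocked.
aligned = [(0, 100.0), (0, 100.0), (0, 100.0), (0, 100.0)]
skewed = [(0, 100.0), (60, 100.0), (120, 100.0), (180, 100.0)]

print(peak_current(aligned), peak_current(skewed))
```

In this toy setup the total charge drawn is the same in both cases; only the peak changes, which is what eases the burden on the power grid.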

In advanced-node designs, power optimization can continue through the routing and post-route phases, for example by choosing the metal layers for nets that switch more frequently than others. Enabling power-driven routing for certain buses through non-default rules (NDRs), and using structured straight routing for high-frequency buses, reduces wirelength and therefore power.

Using an embedded static timing analysis (STA) signoff engine during implementation is a popular feature of sophisticated flows these days. With signoff corners pulled into the implementation loop, it gives one more incremental push toward lower power.

3. Power integrity
The third and most important piece to consider in high-frequency designs is power integrity, which involves power grid optimization (including the regular power grid and clock shields). Because there are many long nets switching at high frequencies, the natural tendency of physical designers (particularly those who have been burnt in the past with dark silicon due to power grid issues) is to overdesign the power grid. Through painful experience, they have learned never to take chances with a weak power delivery network. In addition, clock shielding prevents signal integrity (SI) failures on fast-switching nets, but it takes away still more signal routing resources. The problem becomes even more complicated when routing on the lower metal layers, particularly at advanced nodes, due to self-aligned double patterning.

In reality, not all sections of a high-performance design switch at the same time, so there may be an opportunity to free up some of the tracks used by power rails for signal routing. A close integration between implementation and a signoff power/IR tool can explore grid-trimming scenarios that remove sections of the power grid and make room for high-frequency routes to take shorter paths, giving better wirelength and, in turn, better power. This can be a very powerful way to optimize power further in high-performance designs, but the implementation and analysis tools must be closely tied at the API level so that information is exchanged fast enough to make effective decisions.
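A back-of-the-envelope version of grid trimming might look like the sketch below. It assumes, purely for illustration, that each region is fed by identical parallel straps, so the effective resistance is R / n and the IR drop is I × R / n. Real IR analysis solves a full resistive mesh with per-region current profiles, and the function names, strap resistance, IR budget, and region data here are all hypothetical.

```python
import math

def min_straps(peak_current_a, strap_res_ohm, ir_budget_v):
    """Fewest parallel straps that keep I * R / n within the IR budget."""
    return max(1, math.ceil(peak_current_a * strap_res_ohm / ir_budget_v))

def trim_grid(regions, strap_res_ohm=2.0, ir_budget_v=0.05):
    """Return how many straps each region could give back to routing.

    regions: dict of name -> (straps_present, peak_current_a).
    """
    freed = {}
    for name, (straps, peak_a) in regions.items():
        needed = min_straps(peak_a, strap_res_ohm, ir_budget_v)
        freed[name] = max(0, straps - needed)  # never trim below the need
    return freed

# Hypothetical regions: a busy core and a lightly loaded I/O block.
freed = trim_grid({"cpu": (20, 0.39), "io": (20, 0.09)})
print(freed)
```

Even in this lumped model, the lightly loaded region frees far more tracks than the busy one, which is the opportunity that a tightly coupled implementation and IR flow is meant to find.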

All in all, power optimization can be applied using many techniques and at many stages of the design flow. Beyond holistically optimizing power architecture and integrity, however, physical design engineers also enjoy hunting for unexpected loopholes that save even more power. Pushing frequencies higher for a high-performance design while reducing power at each stage of the design flow becomes an exciting game, something to strive for, with more than a sense of accomplishment at the end. Designers know that fitting their mobile SoC into the necessary power envelope will mean fewer “goodbyes” and a whole lot more “hellos” with a “go, go, go, till it’s time to go,” thrown in.