Clock networks consume more than half the power on any chip. What are you doing about it?
By Arvind Narayanan
Among the perennial challenges of advanced-node IC design is power reduction. Clock trees are now the single largest source of dynamic power consumption, which makes clock tree synthesis (CTS) and optimization an important task for achieving overall power savings.
Building a well-balanced clock tree and effectively managing clock skew has been a challenge since the first transistor was invented and it still is today, especially at 28 and 20nm. The only difference is that now power is in the mix along with timing, which complicates things even more. At smaller technology nodes, the clock network is responsible for more than half the power consumed on any chip and the majority of it is dynamic power due to the toggling clock.
Traditionally, CTS engines are geared towards achieving the best possible skew and latency with power only as a secondary cost function. Run-of-the-mill low power CTS strategies such as clock gating, lowering leaf capacitance, minimizing switching activity, and minimizing area and buffer count in the clock tree help improve the power profile, but are not sufficient to meet the aggressive power targets for advanced process nodes.
Clock Power Challenges
There are two main characteristics of advanced-node designs that affect clock power: 1) the increase in the number of mode and corner scenarios, and 2) the effects of process scaling. At smaller technology nodes, interconnect resistance per unit length increases relative to capacitance. In addition, the large variations in resistance seen across process corners make it harder to balance clock skew across multiple corners. With the proliferation of mobile devices, clock trees have become extremely complex circuits with different clock tracing per circuit mode of operation. Further, building robust clock trees that can withstand process variation is a huge challenge for design teams.
Low-Power CTS Techniques
Because clock power consumption is a function of capacitance, switching activity, and wire length, reducing any of those will lower overall power consumption. The key techniques for reducing clock power are covered in the sections that follow.
Additional techniques, such as slew shaping and the ability to define skew groups, are also beneficial in reining in clock power. Slew shaping pushes the majority of cases closer to the target slew, eliminates transitions that are overly pessimistic, and meets timing requirements while minimizing dynamic power.
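To make the dependence on capacitance and switching activity concrete, here is a minimal sketch based on the standard first-order switching-power relation P = activity x C x V^2 x f. The function name and all of the numbers (200 pF of clock capacitance, 0.9 V, 1 GHz) are hypothetical values chosen for illustration, not figures from any real design. Wire length does not appear as its own term; in this simplified model it is assumed to contribute through the total capacitance.

```python
def clock_dynamic_power(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """Return switching power in watts using P = activity * C * V^2 * f.

    activity=1.0 models a free-running clock net that charges and
    discharges its load once per cycle (an assumed convention).
    """
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical clock network: 200 pF total load, 0.9 V supply, 1 GHz clock.
baseline_w = clock_dynamic_power(200e-12, 0.9, 1e9)

# Same network after trimming 15% of the capacitance (shorter wires,
# fewer buffers). The 15% figure is illustrative only.
trimmed_w = clock_dynamic_power(170e-12, 0.9, 1e9)

saving = 1 - trimmed_w / baseline_w
print(f"baseline: {baseline_w * 1e3:.1f} mW")
print(f"trimmed:  {trimmed_w * 1e3:.1f} mW")
print(f"saving:   {saving:.0%}")
```

Because the relation is linear in capacitance and activity, a given percentage reduction in either one gives the same percentage reduction in switching power under this model.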
Multi-Corner, Multi-Mode CTS is Key
Among all the techniques for low-power clocks, the best results come from a CTS engine that can synthesize the clocks for multiple corners and modes concurrently in the presence of design and manufacturing variability. Concurrent MCMM CTS allows dynamic tradeoffs among all corner/mode and power state scenarios simultaneously.
The experiences of designers using MCMM CTS (Figure 1) show significant reduction in area, number of buffers, skew, total negative slack (TNS) and worst negative slack (WNS), in addition to lower dynamic power.
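The reason a single-scenario view falls short can be shown with a small sketch. It computes skew per scenario from clock arrival times and reports the worst one. The scenario names, flop names, and arrival times (in picoseconds) are all made up for illustration, and this is only an analysis step, not how any particular CTS engine performs concurrent optimization.

```python
def skew_ps(arrivals):
    """Skew of one scenario: latest minus earliest clock arrival (ps)."""
    return max(arrivals.values()) - min(arrivals.values())


def worst_scenario(arrivals_by_scenario):
    """Return (scenario_name, skew_ps) for the scenario with the largest skew."""
    per_scenario = {name: skew_ps(a) for name, a in arrivals_by_scenario.items()}
    worst = max(per_scenario, key=per_scenario.get)
    return worst, per_scenario[worst]


# Hypothetical clock arrival times in picoseconds at three flops,
# for three mode/corner scenarios.
arrivals_by_scenario = {
    "func_slow": {"ff1": 1200, "ff2": 1250, "ff3": 1220},
    "func_fast": {"ff1": 600, "ff2": 680, "ff3": 610},
    "scan_slow": {"ff1": 1300, "ff2": 1310, "ff3": 1340},
}

name, worst = worst_scenario(arrivals_by_scenario)
print(f"worst skew: {worst} ps in scenario {name}")
```

In this made-up data the tree looks reasonably balanced in the slow functional corner, yet the fast corner has the largest skew, which is the kind of cross-scenario tradeoff a one-corner-at-a-time flow would not see.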
Other Techniques to Lower Clock Power
Clock gating reduces clock power by shutting off the clock to unused sinks. Identifying and performing netlist-level clock restructuring will improve clock gating coverage by finding missed clock gating opportunities. Placing the clock gates so that both the timing and power targets are met further improves power. Clumping registers during placement also helps minimize capacitance on the clock tree network. CTS should automatically perform clock gate cloning and de-cloning to optimize and balance the load on the clock tree network.
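As a rough illustration of why gating coverage matters, the sketch below estimates leaf-level clock power with and without gating. Each register group is described by a sink count and the fraction of cycles its clock is enabled. The group sizes, enable fractions, per-sink capacitance, voltage, and frequency are hypothetical, and the model deliberately ignores the power of the gating cells themselves and of the tree upstream of the gates.

```python
def leaf_clock_power(groups, cap_per_sink_f, voltage_v, frequency_hz, gated=True):
    """Estimate leaf-level clock switching power in watts.

    groups: list of (sink_count, enable_fraction) tuples.
    When gated is False, every sink is assumed to see the clock on every cycle.
    """
    effective_sinks = sum(
        count * (enable if gated else 1.0) for count, enable in groups
    )
    return effective_sinks * cap_per_sink_f * voltage_v ** 2 * frequency_hz


# Hypothetical register groups: (number of sinks, fraction of cycles enabled).
groups = [(1000, 1.0), (3000, 0.25), (2000, 0.10)]

ungated_w = leaf_clock_power(groups, 2e-15, 0.9, 1e9, gated=False)
gated_w = leaf_clock_power(groups, 2e-15, 0.9, 1e9, gated=True)
saving = 1 - gated_w / ungated_w

print(f"ungated: {ungated_w * 1e3:.2f} mW")
print(f"gated:   {gated_w * 1e3:.2f} mW")
print(f"saving:  {saving:.1%}")
```

Under these assumptions, every register group that is left ungated contributes at its full sink count, which is why finding missed gating opportunities shows up directly in the total.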
Because the leaf clusters (wire and pins) carry most of the capacitance in the clock tree, having dynamically updated RC calculation during CTS allows for leaf clustering that minimizes capacitance, and therefore reduces power. The CTS tool should also work on-the-fly with the global routing engine during clock buffer insertion so that the CTS engine sees more accurate topology and congestion.
Using Skew Groups to Improve Skew Balancing
CTS engines usually aim for zero skew by balancing the signal arrival time across all the flops regardless of which level of the clock tree they inhabit. However, not all clock end points need to be balanced with each other; the sets of end points that do need to be balanced against one another are known as ‘skew groups.’ Balancing them separately from each other traditionally means writing multiple CTS specs by hand and performing multiple CTS runs. The CTS engine should automatically analyze flop interactions to derive the exact skew balancing requirements at the different clock tree levels, and also across different voltage islands. The tool should also be able to discover skew groups by analyzing connected components in the timing data structure. Using skew groups saves processing time by eliminating manual CTS specifications and multiple CTS runs, and saves power by reducing the number of buffers inserted.
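The idea of discovering skew groups from connected components can be sketched as follows. Flops are treated as graph nodes and each timing path between a launch flop and a capture flop as an edge; flops that end up in the same connected component interact and so belong to the same group. This is a minimal illustration using a union-find structure, with a hypothetical function name and made-up flop names and paths. It is not a description of how any particular CTS tool represents its timing data.

```python
def discover_skew_groups(flops, timing_paths):
    """Group flops into connected components of the timing-path graph.

    flops: iterable of flop names.
    timing_paths: iterable of (launch_flop, capture_flop) pairs.
    Returns a sorted list of sorted flop-name lists, one per group.
    """
    parent = {flop: flop for flop in flops}

    def find(flop):
        # Walk to the root, halving the path as we go.
        while parent[flop] != flop:
            parent[flop] = parent[parent[flop]]
            flop = parent[flop]
        return flop

    # Merge the components of every launch/capture pair.
    for launch, capture in timing_paths:
        root_a, root_b = find(launch), find(capture)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for flop in parent:
        groups.setdefault(find(flop), []).append(flop)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical design: two interacting clusters and one isolated flop.
flops = ["a1", "a2", "a3", "b1", "b2", "c1"]
timing_paths = [("a1", "a2"), ("a2", "a3"), ("b1", "b2")]

skew_groups = discover_skew_groups(flops, timing_paths)
print(skew_groups)
```

Flops in different groups share no timing paths in this model, so they do not need to be balanced against each other, which is where the buffer savings come from.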
Using Slew Shaping for Optimizing Power
Slew is the clock transition time, that is, how long it takes for the clock to switch. Slower slew means slower timing and lower power, while faster transitions draw more power but can improve timing and signal integrity. Slew shaping is the ability to eliminate transitions that are overly pessimistic, reducing dynamic power while still meeting the timing constraints, as shown in Figure 2.
Summary
With the proliferation of mobile devices, clock trees have become extremely complex circuits with different clock tracing per circuit mode of operation. The growth of mode/corner/power states and the large variations of resistance seen across various process corners require designers to adopt smarter methods for CTS and clock optimization. Specifically, to lower power in clock trees, the CTS engine must handle MCMM scenarios and use advanced CTS techniques like intelligent clock gating, skew groups, and slew shaping. With power-aware CTS optimization, the designer can achieve the best QoR for both power and timing without sacrificing area or time to closure.
—Arvind Narayanan is a product marketing manager at Mentor Graphics.