Why is that laptop so hot? Why can’t the data center be cooled anymore? And why should you care?
By William Ruby
Low power seems to be on everyone’s mind these days, and not just among chip design teams. A common consumer complaint is that “battery life is way too short!” And of course, we all know this one: “OMG, that laptop is sure hot!” Even data center facilities managers lament, “We can’t supply enough power to the equipment, and when we do, we can’t cool it!”
But it wasn’t always like that. We didn’t use to design with power in mind. As long as the design met functional specifications and performance targets, it was ready to ship. Either power was not an issue for the smaller designs of the time, or a bigger heat sink or fan could be used.
Fast forward to now, where increased design complexity combined with the drive toward mobile applications requires a power-aware design methodology. One key to low-power design, in addition to a host of automatic optimization techniques, is to eliminate or reduce wasted power.
While working with a variety of customers on low-power designs, I found at least 20 reasons for wasted power. I’ve listed the top 5 here, including how customers are dealing with these issues in current design flows.
#1: Missed Global Clock Gating Opportunities
While local register-level clock gating has been automated with the aid of synthesis tools (see more below), global or “architectural” clock gating has not. To control the clocks at a global level you must understand the design intent, including under what operating conditions the clocks are required to run and when they can stop. Knowing the design intent is not something an EDA tool can easily achieve, and the issue goes beyond the clocks driving registers in the design. Clocks are also used by synchronous memories, which are the predominant type used today. Redundant memory read and write cycles, in which the address and data do not change from cycle to cycle, waste huge amounts of power. It is up to designers to understand the design and seek opportunities to stop the clocks when operation is not required.
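As a rough illustration of the redundant memory access point, the sketch below scans a cycle-by-cycle memory trace and counts enabled cycles that simply repeat the previous access. The `MemCycle` record and the `count_redundant_accesses` helper are hypothetical names invented for this example, and the check assumes a single-port memory that no other agent modifies between accesses.

```python
from typing import List, NamedTuple, Optional, Tuple


class MemCycle(NamedTuple):
    """One clock cycle at a memory port (hypothetical trace format)."""
    enable: bool   # memory is clocked/selected this cycle
    write: bool    # True for a write, False for a read
    addr: int
    data: int = 0  # write data; ignored for reads


def count_redundant_accesses(trace: List[MemCycle]) -> int:
    """Count enabled cycles that exactly repeat the previous enabled access."""
    redundant = 0
    last: Optional[Tuple[bool, int, Optional[int]]] = None
    for cyc in trace:
        if not cyc.enable:
            continue  # idle cycle: memory clock could already be stopped
        key = (cyc.write, cyc.addr, cyc.data if cyc.write else None)
        if key == last:
            redundant += 1  # same read or same write as last time
        last = key
    return redundant


trace = [
    MemCycle(True, False, 0x10),     # read 0x10
    MemCycle(True, False, 0x10),     # same read again: redundant
    MemCycle(True, False, 0x10),     # same read again: redundant
    MemCycle(True, True, 0x20, 7),   # write 7 to 0x20
    MemCycle(True, True, 0x20, 7),   # same write again: redundant
    MemCycle(False, False, 0),       # idle
    MemCycle(True, False, 0x10),     # read after a write: counted as needed
]

enabled = sum(1 for c in trace if c.enable)
redundant = count_redundant_accesses(trace)
print(f"{redundant} of {enabled} enabled memory cycles are redundant")
```

In a real flow the trace would come from simulation activity data, and each redundant cycle is a candidate for gating the memory clock or deasserting its enable.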
#2: Inefficient Design Implementation
This broad area covers every step in the flow after RTL design and functional verification: synthesis, placement, clock-tree synthesis, routing, and timing optimization and closure. Left uncontrolled, implementation tools have several opportunities to introduce power inefficiencies. Synthesis may oversize logic gates or pick a power-inefficient micro-architecture for an arithmetic component, all while still meeting timing constraints. During placement, cells connected by high-activity nets may be placed far apart, resulting in high wire capacitance and wasted power. Aggressive clock skew constraints lead to excessive clock buffering, and clock tree balancing may add further buffers. Routing constraints may produce long wires for high-activity nets, much like inefficient placement. The way to achieve an efficient implementation is to give the automatic implementation tools proper constraints and not be overly aggressive, especially in timing optimization and closure.
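To see why long wires on high-activity nets matter, here is a minimal sketch using the standard first-order switching power estimate, P = alpha * C * Vdd^2 * f, where alpha is taken as the fraction of cycles in which the net makes a power-consuming transition. All of the capacitance, voltage, frequency, and activity numbers below are illustrative assumptions, not measurements from any design.

```python
def switching_power_w(alpha: float, cap_f: float, vdd_v: float, freq_hz: float) -> float:
    """First-order dynamic switching power of one net, in watts."""
    return alpha * cap_f * vdd_v ** 2 * freq_hz


VDD = 1.0    # supply voltage in volts (assumed)
FREQ = 1e9   # clock frequency in hertz (assumed)

# Same high-activity net, placed close together vs. far apart (assumed caps).
busy_near = switching_power_w(alpha=0.5, cap_f=5e-15, vdd_v=VDD, freq_hz=FREQ)
busy_far = switching_power_w(alpha=0.5, cap_f=50e-15, vdd_v=VDD, freq_hz=FREQ)

# A low-activity net with the same long wire costs far less.
quiet_far = switching_power_w(alpha=0.01, cap_f=50e-15, vdd_v=VDD, freq_hz=FREQ)

print(f"busy net, short wire: {busy_near * 1e6:.2f} uW")
print(f"busy net, long wire:  {busy_far * 1e6:.2f} uW")
print(f"quiet net, long wire: {quiet_far * 1e6:.2f} uW")
```

Because power scales linearly with both activity and capacitance, the nets worth keeping short are the ones that toggle most, which is why activity-aware placement and routing constraints pay off.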
#3: Inefficient Design Architecture
Although much larger opportunities for power reduction exist at higher levels of abstraction, it is no big surprise that inefficient design architecture does not appear at the very top of this list. No designer sets out to create an inefficient architecture, but there are several aspects that must be considered. One of them, missed global clock gating opportunities, was covered above and is itself an architectural issue. However, true architectural considerations must go beyond that: how fast the clocks must be to meet a given functional or performance requirement, how many pipeline stages are needed to meet latency requirements, and how much work can or should be done per cycle. Another aspect is the organization of the memory subsystem. Once the amount of memory required is known, how should it be partitioned? What types of memories should be used? How often do they need to be accessed? All of these issues greatly affect power consumption, so designers must evaluate power-performance-area tradeoffs across alternative architectures in order to make informed decisions.
#4: Poor Local Register Enable Conditions
As mentioned above, register clock gating is well automated in modern logic synthesis tools. Given an existing enable condition for a register, a synthesis tool will insert a clock gating cell controlled by the enable signal instead of implementing a recirculating mux. Synthesis tools do this while meeting timing constraints and ensuring testability. So, where is the potential for wasted power here? It all comes down to how efficiently the enable condition gates the register clock when the clock is not required. High clock gating coverage, i.e. the percentage of registers with enable conditions, is a useful metric, but it does not always translate into high clock gating efficiency. By studying clock enable conditions, and understanding how much clock power is consumed downstream of each clock enable, designers can focus on the areas that are least efficient and offer the largest power savings.
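The gap between coverage and efficiency can be shown with a small sketch. Here efficiency is taken to mean the fraction of register clock cycles that are actually gated off, which is one reasonable definition and an assumption of this example; the `RegisterBank` type and the bank figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RegisterBank:
    name: str
    num_regs: int
    gated: bool          # True if a clock gating cell drives this bank
    enable_duty: float   # fraction of cycles the enable is high (clock passes)


def coverage(banks: List[RegisterBank]) -> float:
    """Fraction of registers that sit behind a clock gate."""
    total = sum(b.num_regs for b in banks)
    return sum(b.num_regs for b in banks if b.gated) / total


def efficiency(banks: List[RegisterBank]) -> float:
    """Fraction of register clock cycles actually suppressed by gating."""
    total = sum(b.num_regs for b in banks)
    suppressed = sum(b.num_regs * (1.0 - b.enable_duty) for b in banks if b.gated)
    return suppressed / total


banks = [
    # Gated, but the enable is high 95% of the time, so the gate rarely closes.
    RegisterBank("datapath", num_regs=900, gated=True, enable_duty=0.95),
    # No enable condition at all: the clock always runs.
    RegisterBank("control", num_regs=100, gated=False, enable_duty=1.0),
]

print(f"coverage:   {coverage(banks):.1%}")
print(f"efficiency: {efficiency(banks):.1%}")
```

With these assumed numbers, coverage looks excellent while almost no clock activity is actually removed, which is exactly the situation where a tighter enable condition on the large bank would pay off.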
#5: Lack of a Power Gating Strategy
Leakage power has been a large proportion of total power since 65nm designs, and is even more dominant at 40nm and below. Although automatic techniques such as multi-Vt cell optimization can be used downstream in the design flow to reduce leakage power, power gating (or power shut-off) is by far the most effective practical technique for reducing leakage power consumption. However, power gating can’t simply be left up to implementation tools. The design must be partitioned into power domains up front, and control signals must be designed to ensure proper operation, complete with state retention circuitry if required.
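One consideration when planning power domains and their control is whether a block stays idle long enough for shut-off to pay back its own cost. The sketch below is a simplified break-even estimate under stated assumptions: a fixed energy overhead per power-down/power-up cycle (covering switch control and any state save and restore) and constant leakage in the on and off states. The function names and all numbers are hypothetical.

```python
def break_even_idle_s(leak_on_w: float, leak_off_w: float, overhead_j: float) -> float:
    """Minimum idle time for which shutting a domain off saves net energy."""
    saved_w = leak_on_w - leak_off_w
    if saved_w <= 0:
        raise ValueError("power gating must reduce leakage to ever break even")
    return overhead_j / saved_w


def worth_gating(idle_s: float, leak_on_w: float, leak_off_w: float, overhead_j: float) -> bool:
    """True if an idle period of idle_s seconds is longer than break-even."""
    return idle_s > break_even_idle_s(leak_on_w, leak_off_w, overhead_j)


# Assumed figures for one power domain.
LEAK_ON = 10e-3    # leakage when powered, in watts
LEAK_OFF = 0.5e-3  # residual leakage when gated, in watts
OVERHEAD = 1e-6    # energy per off/on cycle, in joules

t_be = break_even_idle_s(LEAK_ON, LEAK_OFF, OVERHEAD)
print(f"break-even idle time: {t_be * 1e6:.1f} us")
print("gate for a 1 ms idle period: ", worth_gating(1e-3, LEAK_ON, LEAK_OFF, OVERHEAD))
print("gate for a 10 us idle period:", worth_gating(10e-6, LEAK_ON, LEAK_OFF, OVERHEAD))
```

An estimate like this is only a starting point; wake-up latency and the area of retention and isolation cells also feed into how a design is partitioned.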
As mentioned earlier, there are many reasons for power consumption waste including, but not limited to: redundant data signal activity when clocks are shut off; excessive margin in library characterization data leading to inefficient implementation; large active logic cones feeding deselected mux inputs; lack of sleep or standby mode for analog circuits; or insufficient software-driven controls to shut down portions of the design. It is clear that excessive power consumption is an “equal opportunity” problem, requiring a variety of design techniques, tools, and methodologies to achieve a successful low-power design.
–William Ruby is senior director of RTL power product engineering at Apache Design Solutions.