Power is a complex multi-dimensional, multi-disciplinary problem. Does your flow address all of the issues?
Power is the flow of energy over time. While both aspects of that equation are important, they are important to different people in different ways.
Energy that moves too quickly can cause significant damage. Too much energy consumed over time can mean a non-competitive product, whether that product is a battery-powered device or a system in the datacenter. When the industry talks about power analysis, it can mean any point on that spectrum. In addition, the techniques used to reduce total energy consumed are very different from those used to protect a device against electrostatic discharge.
At one extreme of the timescale, the total energy consumed performing a function is the most important. An implanted medical device means a surgery every time the battery needs to be replaced. If U.S. datacenters were considered to be a country, they would be one of the most power-hungry countries in the world.
At the other end of the timescale, we may see very high voltages and currents that exist only for nanoseconds when static electricity is discharged through a device. This energy has to be contained and dissipated. In the middle of the scale, the power consumed by a chip has run into thermal limits, and this restricts what can be accomplished in silicon. “When processor guys ran into the power wall it was about average power,” says Drew Wingard, chief technology officer at Sonics. “This is because it is dissipated as heat and to get rid of that requires cooling. So limits were set about the amount of heat we can effectively remove using conventional cooling schemes.”
It is probably not an overstatement to say that power has become both the limiter and the enabler for the semiconductor industry. Those who learn to tame power will be able to accomplish things that others cannot. The ability to impact power exists throughout the entirety of the product lifecycle, including software and algorithmic development, hardware architecture and implementation, layout and process technology. That affects both the analog and digital portions of the design.
In fact, more than just the time axis has to be considered. “The spatial scope is totally different based on global effects (e.g. overall consumptions) or local effects (e.g. local IR drop, voltage domain power grid integrity),” says Guillaume Boillet, power product specialist for the Emulation Division of Mentor Graphics. “In addition, because power mitigation solutions vary depending on the type of contributors (logic, memory, clock tree), analysis must report detailed breakdown information.”
Unsurprisingly, confusion is growing within the industry. “Some power-related terms are often used interchangeably,” says Tobias Bjerregaard, CEO for Teklatech. “‘Dynamic power’ for instance is used both in the sense of switching power, total (average) power caused by circuit switching activity, or ‘instantaneous’ peak power, as is often used when talking about dynamic power integrity, such as dynamic peak current or dynamic voltage drop (DVD).”
This leads to confusion when talking with designers. “While total power consumption and power integrity are both important key performance indicators, they are important for different – though interlinked – reasons,” continues Bjerregaard. “Total power shows up on the spec. Power integrity shows up in yield, and hence profitability.”
Total energy and leakage
With portable devices or things that run off of battery, we typically switch to talking about energy. “We are interested in how long the battery will last,” says Wingard. “They want to maximize that number and because of things like leakage, it can sometimes be more advantageous to run at higher power for a short period of time so that you can then shut off the leakage. Leakage is independent of the state of the circuit and the only way to affect it is to turn things off or take them to a retention voltage as often as you can. That can save a lot of energy.”
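The trade-off Wingard describes can be sketched with a few lines of arithmetic. This is a minimal illustration, not a measurement. All of the numbers are hypothetical, the function name `period_energy_mj` is invented for this example, and the model assumes that a power-gated block leaks nothing once it is switched off.

```python
def period_energy_mj(dynamic_mw, leakage_mw, active_s, period_s, power_gated):
    """Energy in millijoules used over one period of a repeating task.

    Dynamic power is paid only while the block is working. Leakage is paid
    for the whole period unless the block is power gated after the work is
    done, in which case it is paid only during the active time.
    (Simplification: a gated block is assumed to leak nothing.)
    """
    leak_s = active_s if power_gated else period_s
    return dynamic_mw * active_s + leakage_mw * leak_s


# Hypothetical numbers: the same work done slowly over the full period,
# or twice as fast at a higher (less efficient) operating point.
slow_mj = period_energy_mj(10, 5, active_s=1.0, period_s=1.0, power_gated=False)
fast_gated_mj = period_energy_mj(24, 5, active_s=0.5, period_s=1.0, power_gated=True)
fast_ungated_mj = period_energy_mj(24, 5, active_s=0.5, period_s=1.0, power_gated=False)

print(f"slow, always on : {slow_mj} mJ")
print(f"fast, then gated: {fast_gated_mj} mJ")
print(f"fast, not gated : {fast_ungated_mj} mJ")
```

With these assumed values, running faster only saves energy when the block is actually shut off afterward. Without gating, the higher operating point simply costs more.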
For an IoT application leakage is very important. “You need really long lifetimes between charging, and some are even trying to use ambient energy or harvesting,” says Krishna Balachandran, product management director for low power at Cadence. “You may be planning for 10 or 15 years between battery replacements, and when this is the case leakage is very important. It is true that leakage is less of an issue with the advanced nodes, but it will become more of an issue again as the nodes advance.”
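To see why leakage dominates at these lifetimes, it helps to turn a battery capacity into an average current budget. The sketch below assumes a hypothetical 225 mAh cell and ignores battery self-discharge and leap years. The function name `average_current_budget_ua` is invented for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # leap years ignored for simplicity


def average_current_budget_ua(capacity_mah, lifetime_years):
    """Average current, in microamps, that drains the cell over the lifetime."""
    lifetime_h = lifetime_years * HOURS_PER_YEAR
    return capacity_mah / lifetime_h * 1000.0  # mA -> uA


# Hypothetical 225 mAh cell, for the 10- and 15-year targets from the text.
for years in (10, 15):
    budget = average_current_budget_ua(225, years)
    print(f"{years} years: about {budget:.2f} uA average, leakage included")
```

Under these assumptions the whole device, including leakage in every always-on block, has to average only a few microamps.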
The datacenter also cares about energy. Total energy includes the energy spent on cooling as well as that consumed performing the intended task. It is not just about the chip, because all devices that require supplemental cooling have to be included in total energy consumption.
So what can people do to minimize energy? “There is no tool today that measures the energy consumed,” says Balachandran. “Tools that operate at the Register Transfer Level (RTL) do not consider energy. Energy decisions are made at the architectural level. If you do so much in hardware versus software, how will the energy be affected?”
There is some early progress in this area. One company, Aggios, received a research grant from the California Energy Commission and as part of that program they developed a Unified Hardware Abstraction (UHA) format that has been donated to the IEEE P2415 committee. The standard defines the syntax and semantics for energy-oriented description of hardware, software and power management for electronic systems. It enables viewing, modeling, verifying, implementing and testing device’s energy features, covering both the pre- and post-silicon design flow.
The aspect of power that most design teams concentrate on is average power.
“Average power has been a standard measurement within the industry for a while,” says Balachandran. “Any kind of implementation tool, such as synthesis or place and route, would attempt to optimize power by comparing the power consumed between two options. It would calculate the power for options one and two, and then assuming that power is part of a cost function along with area and performance, would choose a particular option. Average power was a pretty good proxy for overall power.”
Most of this analysis is performed at the Register Transfer Level. “Designers start exploring power efficiency using vector-less power analysis for scenarios such as idle mode and max-power even before simulation vectors become available,” says Arti Dwivedi, lead technical product manager for . “This early analysis enables them to compare the power consumption of different implementations and choose the most efficient architecture. When vectors are available, average power analysis is performed for different use cases to estimate the power for various modes of operation. Designers analyze power for dominant modes, which include idle mode and several active modes such as video or camera usage in mobile devices.”
Many techniques exist to reduce average power, such as “clock gating using stability and observability analysis, efficient operation of memories, and identification of redundant switching in data-path elements,” continues Dwivedi. “Designers also verify the power intent of the design using UPF or CPF and explore further optimization of power domains to reduce leakage power in their designs.”
Most people consider average power to be the combination of static or leakage power and switching power. “Switching power scales with the square of the supply voltage, which means that by reducing supply voltage by 10%, you overall gain 21% switching power,” says Bjerregaard. “This is particularly interesting with finFET devices, as they allow good performance at very low supply voltage. This is true until a critically low voltage is reached. Pushing the supply voltage down further puts an increasing pressure on power noise margins and reduces the acceptable dynamic voltage drop (DVD).”
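The square-law relationship is easy to check numerically. The sketch below assumes the usual first-order model, in which switching power is proportional to the square of the supply voltage while switched capacitance, activity and frequency are held fixed. The function name is illustrative only.

```python
def switching_power_ratio(voltage_scale):
    """Ratio of new to old switching power when the supply is scaled.

    Assumes switching power is proportional to V^2, with capacitance,
    activity and clock frequency unchanged.
    """
    return voltage_scale ** 2


lowered = switching_power_ratio(0.9)  # supply reduced by 10%
raised = switching_power_ratio(1.1)   # supply increased by 10%

print(f"10% lower supply : {(1 - lowered) * 100:.0f}% less switching power")
print(f"10% higher supply: {(raised - 1) * 100:.0f}% more switching power")
```

Under this model, a 10% lower supply leaves 81% of the switching power, a saving of about 19%. A figure of 21% corresponds to the same step taken in the other direction, since 1.1 squared is 1.21.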
There are times when the amount of switching activity within the design can start to create problems. This is normally referred to as peak consumption.
“You have to ask, ‘Peak over what time window?'” says Wingard. “Peak over a longer time period has to do with how quickly we can allow the temperature on the die to change. Because of thermal expansion we don’t want to see chips heat up very quickly. If you have a region of activity, say tens to hundreds of milliseconds, as opposed to the nanosecond time range where the power has increased significantly, then we can heat too fast. Failure scenarios associated with that result in things that were meant to be attached popping, connections getting broken, etc.”
Thermal analysis is itself a large problem that has to do with the power consumed over a period of time as well as the spatial aspects of the chip. Chip layout may attempt to spread computation cores out across the surface, so that if one area starts to get too hot it can be powered down and the processing moved to another cooler core. Alternatively, clock rates and voltages may be reduced to lower the power consumption until the temperature has become more manageable.
“Peak power is a newer concept,” says Balachandran. “It is not that it wasn’t measured before, but companies are getting more interested in it. If the peak power exceeds the rating of the package, then you will have problems. It could mean that you have crossed into a dangerous zone and is a reliability issue. Peak power analysis is done to ensure that you operate in a safe zone.”
When the peak power is over a shorter period of time, it has different impacts. “Peak over a very narrow timeframe is typically about peak current,” says Wingard. “That impacts the power delivery network.”
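Wingard's question, "Peak over what time window?", can be made concrete with a sliding-window average over a power trace. The sketch below is a minimal illustration using a made-up trace of evenly spaced samples. The function name `peak_windowed_power` and the sample values are hypothetical.

```python
def peak_windowed_power(samples_mw, window):
    """Highest average power over any run of `window` consecutive samples."""
    if window <= 0 or window > len(samples_mw):
        raise ValueError("window must be between 1 and len(samples_mw)")
    window_sum = sum(samples_mw[:window])
    best = window_sum
    # Slide the window one sample at a time, updating the running sum.
    for i in range(window, len(samples_mw)):
        window_sum += samples_mw[i] - samples_mw[i - window]
        best = max(best, window_sum)
    return best / window


# Hypothetical trace: one sharp spike, then a longer, lower burst.
trace_mw = [100, 100, 900, 100, 100, 400, 400, 400, 400, 100]

print(peak_windowed_power(trace_mw, 1))  # narrow window: finds the spike
print(peak_windowed_power(trace_mw, 4))  # wider window: finds the sustained burst
```

The same trace gives different answers depending on the window. The one-sample window reports the 900 mW spike, which is what matters for the power delivery network, while the four-sample window reports the 400 mW burst, which is closer to what drives heating.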
Power integrity is an increasingly difficult problem. “Dynamic power integrity did not play a very big role at early sub-micron technology nodes,” says Bjerregaard. “Back then, power integrity was about static IR drop. But with increasing transient vs average power ratio at scaling tech nodes, dynamic voltage drop has become the important power integrity sign-off metric.”
Power grid integrity and packaging are closely related issues. “Peak power analysis is an important part of low power design to identify peak power and di/dt scenarios,” says Dwivedi. “In order to ensure good coverage, peak power is analyzed for real vectors comprising millions and billions of cycles. Peak power analysis requires real vectors from live applications to ensure the critical peak power windows are not missed. Identification of peak power and di/dt cycles can drive power grid design and sign-off.”
But herein lies one of the problems. “Peak power is like finding the needle in the haystack,” says Balachandran. “You do not know when it is going to happen. It is very stimulus-based. It may take the viewing of a particular video that causes power to go beyond the limit. You may assume that peak power is good and below the thermal envelope. Then someone will run an application in the field, a new app that someone created, and that app stresses the hardware in a way that hadn’t been done before. Peak power is about anticipating the particular tasks that would be problematic ahead of time.”
The objective of mitigation is clear. “It is about trying to make sure we can get the power into the chip and have it be sufficiently stable and reliable, such that other circuits can continue to operate normally and avoid timing problems that result from the lower supply voltage,” says Wingard.
Included in the “other circuits” is analog circuitry, which may be a lot more prone to noise issues. “If we look at high-speed interfaces, you need a large current to support a long-distance communication, and that is all about how you design your transmitters,” says Navraj Nandra, senior director of marketing for DesignWare Analog and Mixed-Signal IP at Synopsys. “You need to ensure you have a stable enough voltage going into the transmitter so that it can support large swings down the channel to the receiver, which can be meters apart. In order to control the amplitude, the supply to the analog transmitter has to be clean. Otherwise, if there is noise on the supply, what you are transmitting is somewhat corrupted. IR droop has to be contained and the power supply rejection ratio needs to be sufficiently high such that you are rejecting any power supply noise.”
One aspect of power optimization already mentioned is power gating, which switches off areas of the design when they are not being used. This eliminates the leakage current from those blocks. However, another aspect of peak power occurs when one or more of those blocks are turned back on. This is often called in-rush current. “You have to recharge the internal nodes and wells,” says Wingard. “Rather than beefing up the supply, design teams tend to slow down the power ramp by using higher resistance transistors at first, and then switch on low resistance switches for better IR characteristics when close to operation.”
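The staged turn-on that Wingard describes can be approximated with a first-order RC model. The sketch below is a rough illustration under loud assumptions. The gated domain is treated as a single capacitor, each switch group as a single resistor, and the strong switches are modeled as acting alone after the handoff. All component values and function names are hypothetical.

```python
import math


def direct_inrush_peak_a(supply_v, r_strong_ohm):
    """Peak current if the low-resistance switches close on a discharged domain."""
    return supply_v / r_strong_ohm


def staged_inrush_peak_a(supply_v, r_weak_ohm, r_strong_ohm, handoff_fraction):
    """Peak current when weak switches charge the domain to a fraction of the
    supply before the strong switches are enabled."""
    weak_peak = supply_v / r_weak_ohm
    strong_peak = supply_v * (1.0 - handoff_fraction) / r_strong_ohm
    return max(weak_peak, strong_peak)


def weak_stage_time_s(r_weak_ohm, domain_cap_f, handoff_fraction):
    """Time for an RC charge to reach the handoff fraction of the supply."""
    return -r_weak_ohm * domain_cap_f * math.log(1.0 - handoff_fraction)


# Hypothetical values: 1.0 V supply, 10 nF domain, handoff at 90% of the supply.
direct_a = direct_inrush_peak_a(1.0, 0.5)
staged_a = staged_inrush_peak_a(1.0, 10.0, 0.5, 0.9)
ramp_s = weak_stage_time_s(10.0, 10e-9, 0.9)

print(f"direct peak: {direct_a:.2f} A")
print(f"staged peak: {staged_a:.2f} A after a {ramp_s * 1e9:.0f} ns ramp")
```

With these assumed values, staging cuts the peak current by roughly a factor of ten, at the cost of a longer wake-up time. That is the trade design teams accept instead of strengthening the supply.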
At the other extreme is current that exists for very short periods of time. “Electrostatic discharge (ESD) is a system-level issue, but it can mean a peak current of 9 Amps,” says Nandra. Typical timescales for ESD can have a ramp time of 10 nanoseconds and a decay time of 150 nanoseconds. “Packages are getting so big that things that used to be on the board are being integrated on chip.”
Power has become multi-dimensional and multi-disciplined. The industry is still learning lessons about not paying enough attention to power. It only takes one weak member of the team to throw away the work of all of the others, and today there is little in the way of tools that can span the types of analysis necessary for everyone to play as a team.
“We see enormous opportunity to save power in today’s designs by working at the architectural level,” says Wingard. “Thinking about power and energy as an afterthought is the root of the problem. If we think about optimizing power or energy as a first class piece of the process, then we would probably do things differently.”