Is DVFS Worth The Effort?

Dynamic voltage and frequency scaling can save a lot of power and energy, but design costs can be high and verification difficult.


Almost all designs have become power-aware and are being forced to consider every power-saving technique, but not all of those techniques are yielding the expected results. Moreover, they can add significant complexity to designs, increasing the time it takes to get to tapeout and boosting cost.

Dynamic voltage and frequency scaling (DVFS) is one such power and energy saving technique now being considered within high-performance systems, designs on the latest geometries, and for ultra-low power systems found on the edge. In the past, much simpler dynamic frequency techniques have been used. The basic idea is that if the frequency is divided by two, the time it takes to complete a certain task will double. All other things being constant, the total energy consumed would appear to stay the same.

The problem is that this formula doesn't account for leakage, whose power is roughly constant while the task runs. Over the doubled runtime, the energy lost to leakage doubles, and that cuts into total energy savings even though peak power has been reduced. When voltage also is reduced, both power and energy drop, because dynamic power scales with the square of the voltage. But leakage decreases only about linearly with voltage, so it starts to consume a greater percentage of total energy. At some point, typically around the threshold voltage, leakage becomes dominant.
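A toy model makes the arithmetic concrete. In the sketch below, the capacitance, leakage, and cycle-count numbers are invented for illustration. Halving the frequency alone leaves dynamic energy unchanged but doubles leakage energy, while dropping the voltage as well cuts total energy:

```python
# Toy energy model for a fixed task, with illustrative (not measured) numbers.
# Halving f doubles runtime, so leakage energy doubles while dynamic energy
# stays the same; lowering voltage is what actually saves energy.

def task_energy(c_eff, v, f, cycles, p_leak):
    """Energy (J) to run `cycles` clock cycles at voltage v and frequency f."""
    t = cycles / f                 # runtime in seconds
    p_dyn = c_eff * v**2 * f       # dynamic power ~ C * V^2 * f
    return p_dyn * t + p_leak * t  # dynamic energy + leakage energy

CYCLES = 1e9      # cycles in the task
C_EFF = 1e-9      # effective switched capacitance (F), assumed
P_LEAK = 0.05     # leakage power (W), assumed constant here

e_full = task_energy(C_EFF, 1.0, 1e9, CYCLES, P_LEAK)    # 1.0 V, 1 GHz
e_half_f = task_energy(C_EFF, 1.0, 5e8, CYCLES, P_LEAK)  # halve frequency only
e_dvfs = task_energy(C_EFF, 0.7, 5e8, CYCLES, P_LEAK)    # halve f AND drop VDD

# Frequency-only scaling ends up slightly *worse* than full speed, while
# scaling voltage along with frequency gives a real saving.
print(e_full, e_half_f, e_dvfs)
```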

Power and energy savings also must be viewed in the context of the rest of the system. Memory voltage cannot be decreased by the same amount as CPU voltage if the memory is to remain functional. While it's possible to reduce core memory voltage to a minimum at which the memory still retains its contents, at that voltage it cannot be read or written.

Even if the memory voltage is not scaled in the same manner as the processor's, gains still can be made, especially in applications that are memory-bound. For example, there may be less contention for the memory, which leads to fewer stalls.

Other memory, such as DRAM, has to be continually refreshed. And I/O circuitry often cannot have its voltage changed because it has to conform to standards.

Put simply, the total energy to perform the entire function has to be considered, and that means being aware of the typical use cases that the system is likely to be subjected to. Only then can the additional cost and complexity of the hardware and software be traded off against the power and energy benefits of deploying a DVFS-type of solution.

Basic techniques
Many of these techniques have been in use for a long time, just not in a dynamic manner. “If something is not needed, shut it down,” says Aleksandar Mijatovic, senior design engineer for Vtool. “If something does not need to operate at as high a voltage, then lower it. If something needs to be alive in order to respond to data when it arrives but is not processing anything, lower the frequency as much as you can. Dynamic scaling is just introducing new ways of doing old techniques.”

Traditional power-saving techniques may be enough for many people. “Recent developments in processor and memory technology have resulted in the saturation of processor clock frequencies and better idle/sleep modes,” says Progyna Khondkar, product engineer for Mentor, a Siemens Business. “Each of these limits the potential energy savings from DVFS. UPF already provides accurate guidance for implementing low-power designs, which further limits what DVFS can add. The difficulty of verifying and implementing a design with a DVFS methodology may also be a reason its usage has been restricted.”

Leakage can diminish returns. “What typically does not scale is the overhead power consumption,” says Tim Kogel, principal applications engineer for Synopsys. “Therefore, the most effective way to maximize computational efficiency and minimize energy consumption is to run fast then stop (RFTS). However, thermal constraints may require throttling of voltage and frequency, especially to prevent thermal runaway due to temperature dependent leakage power at small process nodes. Since the overhead power consumption of other components like memories and interconnect does not scale, stretching the execution over a longer period requires more energy to finish the job.”
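The run-fast-then-stop argument can be sketched with a single non-scaling overhead term. The numbers below are illustrative assumptions, not measurements; the point is that overhead power accrues for as long as the job runs, eroding the core's DVFS savings:

```python
# Sketch of the "run fast then stop" (RFTS) trade-off: system overhead power
# (memories, interconnect, DDR refresh) does not scale with the core's V/f,
# so stretching a job over a longer period accumulates more overhead energy.
# All numbers are invented for illustration.

P_OVERHEAD = 0.4   # W, non-scaling overhead power, assumed

def total_energy(core_energy, runtime):
    """Core energy plus overhead accrued while the job runs; the system
    powers down as soon as the job finishes (run fast then stop)."""
    return core_energy + P_OVERHEAD * runtime

# Fast case: core burns 1.0 J in 1 s, then the system powers down.
e_fast = total_energy(core_energy=1.0, runtime=1.0)
# DVFS case: core burns only 0.6 J but takes 2 s, accruing overhead all along.
e_slow = total_energy(core_energy=0.6, runtime=2.0)

# Both land at roughly 1.4 J: the overhead erodes the core's savings.
print(e_fast, e_slow)
```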

Figure 1 shows simulation sweep results of an AI inference design over different design parameters such as clock frequency, number of MAC units, and DDR memory speed. The end-to-end latency (red line) is correlated with the total energy (blue bar). “The faster the job finishes, the lower the energy consumption,” adds Kogel. “This comes at the expense of higher power consumption (green bars), since the work is done in a shorter period.”

Fig 1. Power, energy and performance tradeoffs in AI inferencing. Source: Synopsys

There are several techniques being used. “While voltage scaling is becoming more widely implemented, for many it is a new and certainly non-trivial feature,” says Richard McPartland, technical marketing manager for Moortec. “We distinguish between static voltage scaling (SVS), dynamic voltage and frequency scaling (DVFS) and adaptive voltage scaling (AVS).” (See figure 2.)

Some of these schemes are better suited for particular applications than others. “SVS schemes are well-suited for applications with continuous, often high workloads,” McPartland says. “For example, take a telecoms chip where data always arrives at a fixed rate and you cannot simply drop the clock rate and VDD. Here, SVS would be a good choice. By contrast, DVFS schemes are particularly good for variable workloads, where you may have periods with low activity or may need to reconfigure the SoC for different applications with different levels of processing.”

Fig 2. Static voltage scaling and dynamic voltage and frequency scaling techniques. Source: Moortec

Adaptive voltage scaling (AVS) is similar to SVS. “In AVS, there are a set of sensors, which are placed on the chip, and you measure the voltage at a particular location,” says Mallik Vusirikala, senior manager, product management at Ansys. “Then there is closed-loop feedback to the LDO (low dropout regulator), which raises the voltage or reduces the voltage. If the voltage in one region of the chip drops, you will probably want to scale up the voltage so that your frequency is met.”

AVS and SVS tend to utilize fine-grained voltage adjustment across a limited range. “DVFS typically has a fixed set of voltages,” adds Vusirikala. “For example, you may want the block to be at 300MHz, or at 400MHz. It will have fixed steps based on the application needs.”

The ranges on DVFS can be extreme. “DVFS is being used between 0.5 volts, all the way up to 0.95 volts,” says Mo Faisal, president and CEO of Movellus. “That is going from near-threshold all the way up to 0.95 volts, which approaches the electrical overstress limit. It gets really challenging to do a very wide DVFS. You need to have standard cell libraries that are characterized over that extreme voltage range, which often becomes a custom project for the company that wants to do it. Not everybody can afford DVFS, because you basically have to re-characterize your library.”

To know which technique will work best, and the range necessary, the application has to be understood. “In cases where the workload is memory-bound, voltage and frequency reduction through DVFS provide quadratic savings in energy and cubic reduction in power,” says Shidhartha Das, senior principal research engineer at Arm. “Alternatively, increasing voltage through DVFS enables a performance boost for workloads that demand greater power. This is where margins — particularly due to voltage-noise events — play a crucial role, as they limit the allowable voltage-boost due to peak-power caps. Design techniques that enable dynamic adaptation to voltage-noise effects can optimize these margins, thus enabling an effective application of DVFS, both for performance and efficiency gains.”
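The quadratic/cubic claim follows from the standard dynamic-power relation P ≈ C·V²·f when frequency tracks voltage roughly linearly. A minimal sketch, with an assumed 20% VDD reduction:

```python
# Back-of-envelope scaling behind the "quadratic energy, cubic power" claim,
# assuming frequency scales roughly linearly with voltage (f ~ V).
# Dynamic energy per task ~ C*V^2 (independent of f); power ~ C*V^2*f ~ V^3.

def relative_energy(v_scale):
    """Energy relative to nominal, for a VDD scaled by v_scale."""
    return v_scale ** 2          # E ~ V^2

def relative_power(v_scale):
    """Power relative to nominal, for a VDD scaled by v_scale."""
    return v_scale ** 3          # P ~ V^2 * f, with f ~ V

# Dropping VDD by 20% (v_scale = 0.8):
print(relative_energy(0.8))  # ~0.64, i.e. roughly 36% energy savings
print(relative_power(0.8))   # ~0.51, i.e. roughly 49% power reduction
```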

At the most advanced nodes, it is becoming a necessary technique. “Advanced finFET nodes offer leading-edge performance but are accompanied by higher process variation,” says Moortec’s McPartland. “Voltage scaling enables the VDD supply to be optimized on a per-die or even voltage-domain basis, so a die (or domain) with FF characteristics could be operated at a lower VDD, saving significant power and energy while still achieving the performance required. The aim is to optimize the VDD guard bands per die and minimize operating at worst-case, highest VDD if it’s not necessary.”

Clearly, this only can be done with sensors, either in the tester or on the chip. “We recommend including a full complement of in-chip monitors,” he says. “These can be used to dramatically reduce test times necessary to determine the operating point(s) — the voltage/frequency pair. They also provide visibility into chip conditions, including process speed, VDD directly at the critical circuit blocks, and temperature throughout the die. In-chip monitoring is an important element when implementing voltage scaling.”

Design concerns
There are some potential pitfalls to keep in mind when designing DVFS systems. “You have to close hold timing at the highest voltage, the highest temperature, and the fastest corner,” says Movellus’ Faisal. “A hold violation is basically a dead chip. It would be a statistical error, which is very difficult to debug. However, setup time is the hardest at the lowest voltage — the opposite end. In order to hit a fast enough frequency at the lowest voltage, you end up using a lot of low-threshold devices, which tend to be very leaky. Now, when you go to your high voltage operation, you’re paying a lot in leakage, and you’re paying a lot with margins and things like that.”

Another area of difficulty is the power supplies. “When you turn on the power switches, the power supply from the board cannot respond fast enough to power up that particular block,” says Ansys’ Vusirikala. “This is due to the inductance in the board and the package. When a block powers up, it discharges the decaps on the chip for parts of the circuit that already were functioning. That means there’s a huge amount of drop that happens on the always-on blocks and the existing functional logic, so you have to turn it on slowly, which means there’s some amount of time before the block is ready for computation.”

That is stage one. “First you crank up the voltage and give that time to settle,” adds Faisal. “Then you change the frequency, and you have to wait for the PLL to get to the new frequency. The time this takes is also wasted time, so looking for solutions that can do this quickly is important. When you have locked to the new frequency, you then can start to do work.”
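The two-stage up-transition described above can be sketched as follows. The regulator and PLL interfaces and the settle/lock times are hypothetical placeholders, not real driver APIs:

```python
# Sketch of a two-stage DVFS up-transition: raise voltage first and wait for
# it to settle, then retune the PLL and wait for lock. The callbacks and the
# timing constants are invented placeholders for illustration.

import time

VOLTAGE_SETTLE_S = 100e-6   # assumed regulator settling time
PLL_LOCK_S = 50e-6          # assumed PLL relock time

def dvfs_up_transition(set_voltage, set_frequency, new_v, new_f):
    set_voltage(new_v)            # stage 1: raise VDD at the old, slower clock
    time.sleep(VOLTAGE_SETTLE_S)  # wait for the supply to settle (dead time)
    set_frequency(new_f)          # stage 2: retune the PLL
    time.sleep(PLL_LOCK_S)        # wait for lock before issuing work
    return VOLTAGE_SETTLE_S + PLL_LOCK_S  # total dead time of the transition

log = []
dead_time = dvfs_up_transition(lambda v: log.append(("v", v)),
                               lambda f: log.append(("f", f)),
                               new_v=0.9, new_f=400e6)
print(log)   # voltage is always raised before frequency on an up-transition
```

Down-transitions run the sequence in reverse: drop the frequency first, then lower the voltage, so the logic is never clocked faster than the supply can support.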

Application-level concerns
While AVS and SVS are done in hardware, DVFS often involves software. “DVFS is mostly an application issue,” says Lauri Koskinen, CTO for Minima Processor. “The application has to be rewritten so that it can take advantage of the DVFS capability. There are design issues, such as verification and characterization in the case of ultra-wide DVFS, but if you know your application and you have good synergy between your hardware and software teams, DVFS is just for you.”

Others agree. “You can allow software to control it, and there should be hardware mechanisms to tell it when it can shut down some portions, or simply have parts of the chip that will go into a sleep mode by themselves until they are awakened by some external signal,” says Olivera Stojanovic, senior verification manager for Vtool. “It is up to the software either to enable a mode where the hardware decides it can power down after some period of inactivity, or to take responsibility for deciding when to shut something down. These are architectural choices. You need to understand your application and figure out a timeline for usage of each part of the chip.”

This requires the software team to understand the hardware architecture and the implications of their software. “If you want to turn off a certain bank of memory, you have to decide if its contents will be retained,” says Vtool’s Mijatovic. “This should not impact anybody, but it is something that software needs to understand. It means that software needs to be written in such a way that it allows power techniques to be efficient. You cannot simply tell the software to run forever, use all the resources all the time, and think that you will get power savings from automated hardware methods. Whoever is writing software needs to be aware of what they are allowed to use in a low-power mode.”

Verification concerns
The implementation of DVFS creates additional concerns at the voltage and frequency boundaries, and these need to be adequately addressed in the verification strategy. “One issue that happens is driving a signal from a low-voltage domain to a higher-voltage domain,” says Ansys’ Vusirikala. “If the driver of the signal pin is at a lower voltage compared to the receiver, you can end up with a crowbar current at the receiver. The input may not swing far enough past the receiver’s switching threshold to fully turn off one of its transistors, so both conduct and draw excessive current.”

Standards and tools may not directly support the methodology. “The frequency part of DVFS is not supported by UPF or power-aware simulation tools today,” says Mentor’s Khondkar. “However, users may write SystemVerilog assertions that can be coordinated through the simulation environment, with actual voltage changes either from the testbench or from voltage regulator blocks.”

But perhaps the biggest verification concern is the number of corners that have to be considered for timing closure. “You need to have your timing closed at multiple corners, at different frequency points, at different voltages,” adds Vusirikala. “That’s typically a multi-mode, multi-corner closure. Now you will have many more corners to close.”
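The corner explosion is easy to quantify. A rough sketch, with invented counts of operating points and PVT conditions:

```python
# Why DVFS multiplies timing-closure effort: every supported (V, f) operating
# point brings its own set of process/temperature corners. The operating
# points and corner lists below are illustrative assumptions.
from itertools import product

operating_points = [(0.6, 200e6), (0.8, 400e6), (0.95, 600e6)]  # (V, Hz)
process = ["SS", "TT", "FF"]
temperature = [-40, 25, 125]   # degrees C

corners = list(product(operating_points, process, temperature))
print(len(corners))   # 3 operating points x 3 process x 3 temps = 27 corners
```

A single-voltage design would face only the 9 PVT combinations; three operating points triple that, before even counting the transitions between them.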

The transitions between those corners also need to be considered. “From a power integrity standpoint, you need to do voltage drop analysis at these multiple additional corners to ensure that your frequency is met,” Vusirikala says. “Then, when you’re switching from low frequency to higher frequency, there will be increased power demand. You will need to verify whether the LDO is able to respond in time for your block to ramp up to a higher frequency and start computing. If the LDO is placed off-chip, overshoot/undershoot and package-based noise can become a concern. These problems do not exist if the LDO is on-chip because there is no inductance.”

Other concerns
Along with the additional design complexity, verification complexity, and software issues, there are other things that the team needs to keep in mind. “When considering the overall cost and benefit of DVFS, you need to consider security,” says Sergio Marchese, technical marketing manager for OneSpin Solutions. “Remote physical attacks based on triggering faults through circuit misuse are on the rise. Rowhammer is a well-known example. DVFS features also can be leveraged to provoke glitches and target faults. Depending on the application, it may be necessary to do a risk assessment, identify weaknesses, and introduce security measures that prevent malicious misuse. No pushbutton tools exist that address this security assurance challenge. Formal tools have the capacity and features to combine an exhaustive analysis of complex DVFS control functions with fault injection, which can identify issues and validate mitigation strategies.”

Perhaps one of the biggest issues is being able to predict outcomes. “Power architects understand that it is very difficult to estimate the dynamic effects of feedback loops,” says Synopsys’ Kogel. “The power manager determines the operating point based on the projected load, which in turn impacts the processing performance and hence the projected load. In addition, you need to consider constraints like application deadlines, thermal throttling, and even long-term effects like chip aging due to temperature variation, which makes it very difficult to assess the benefit and impact of dynamic power management with static spreadsheet analysis. Virtual Prototyping tools for joint power and performance analysis and optimization of DVFS policies are available. The idea is to simulate SystemC transaction level performance models in combination with UPF3 system level power models.”
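The feedback difficulty Kogel describes shows up even in a toy governor. The frequency steps, thresholds, and demand below are invented for illustration; with a poorly tuned policy, the operating point never settles:

```python
# Minimal simulation of the DVFS feedback loop: the power manager picks an
# operating point from measured utilization, which changes performance and
# hence the utilization it measures next. All numbers are invented; real
# governors (e.g. Linux cpufreq) are far more elaborate.

FREQS = [0.5, 1.0, 2.0]            # GHz, available operating points
UP_THRESH, DOWN_THRESH = 0.9, 0.5  # badly chosen thresholds, on purpose

def next_freq(idx, util):
    """Step up when overloaded, step down when underloaded."""
    if util > UP_THRESH and idx < len(FREQS) - 1:
        return idx + 1
    if util < DOWN_THRESH and idx > 0:
        return idx - 1
    return idx

demand = 0.47   # work units per second required by the application
idx, trace = 0, []
for _ in range(6):
    util = min(demand / FREQS[idx], 1.0)  # utilization at the current f
    idx = next_freq(idx, util)
    trace.append(FREQS[idx])

print(trace)   # [1.0, 0.5, 1.0, 0.5, 1.0, 0.5] -- the loop oscillates forever
```

A static spreadsheet would never expose this behavior, which is why dynamic simulation of the policy against realistic workloads matters.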

Conclusion
The power and energy savings from DVFS can be extensive, but those gains can be whittled away quickly if the entire system is not considered. Many designs enable blocks to be powered on and off, and many add separate subsystems for high-performance functions versus standby functions, allowing greater optimization at each power mode. But some designs can make significant gains by having a single system that is able to operate at multiple performance/power levels.

Additionally, when variation becomes a key concern, an adaptive system can bring outliers into conformance with the specification, effectively increasing yield.

Perhaps the biggest impediment to adoption is that dynamic systems rely on re-architecting of software, and that places additional cost and risk on a project. Over time, the power savings may force that to change.



1 comment

DrZ says:

Your last paragraph hits the nail on the head. However, “re-architecting” the software to improve power is actually pretty straightforward and already fully automated through firmware synthesis. Similar to virtual prototyping, compiler/processor co-design, HW-SW co-verification and many other challenges from the past, power optimization including DVFS is a typical HW-SW boundary problem. Clever tools can help with that.
