Near-Threshold Computing Gets A Boost

Why many leading-edge designs are beginning to look like near-threshold designs.


Near-threshold computing has long been used for power-sensitive devices, but some surprising, unrelated advances are making it much easier to deploy.

While near-threshold logic has been an essential technique for applications with the lowest power consumption, it always has been difficult to use. That is changing, and while it is unlikely to become a mainstream technique, it is certainly becoming a lot easier for those who want to try.

Operating logic at or near the threshold voltage usually comes at the expense of performance and area. There is a further catch. Because so few companies were willing to use the technique, tool vendors have not been willing to invest in it. As a result, it remains difficult to use.

But sometimes, unexpected elements come into play. What in the past was deemed too difficult has become the only, or at least the easiest, path forward. With each implementation node, and the corresponding shrink in feature sizes, many aspects scale well. But not all of them do. At about 90nm, nominal voltages started to flatten, and thus switching power stopped scaling.

The introduction of the finFET at 16/14nm (or 22nm for Intel) enabled lower operating voltages and reduced leakage. However, supply voltages have been dropping faster than threshold voltages, which leaves circuit designers less supply margin. This means many cutting-edge designs are beginning to look a lot like near-threshold designs.

Let’s take a quick review of the technique. Dynamic (switching) power is a quadratic function of voltage, normally stated as $P \propto CV^2f$, where C is the switched capacitance, V the supply voltage, and f the clock frequency. As the voltage drops, you get significant power savings at the expense of performance.
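The quadratic voltage dependence of switching power is easy to check numerically. The minimal Python sketch below does that. The activity factor, the 1 nF of switched capacitance, the 100 MHz clock, and the 10X slower clock at 0.45 V are all hypothetical values chosen for round numbers, not data from any real design.

```python
def dynamic_power(c_farads: float, vdd_volts: float, f_hertz: float, activity: float = 1.0) -> float:
    """Switching power P = a * C * V^2 * f (activity factor a is an assumption; 1.0 = every node toggles)."""
    return activity * c_farads * vdd_volts ** 2 * f_hertz

# Hypothetical design: 1 nF of switched capacitance clocked at 100 MHz.
nominal = dynamic_power(1e-9, 0.9, 100e6)
half_v = dynamic_power(1e-9, 0.45, 100e6)      # same clock, half the voltage
half_v_slow = dynamic_power(1e-9, 0.45, 10e6)  # assumed 10X slower clock at the lower voltage

print(f"nominal: {nominal * 1e3:.1f} mW")
print(f"0.45 V, 100 MHz: {half_v / nominal:.2f}x nominal power")
print(f"0.45 V, 10 MHz: {half_v_slow / nominal:.3f}x nominal power")
```

Halving the voltage on its own cuts switching power to a quarter. The assumed 10X slower clock represents the performance that near-threshold operation gives up in exchange.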

Because many IoT designs have low duty cycles, this is often an acceptable tradeoff. However, there are further complications. Total power is a combination of static (leakage) power and dynamic power. As the voltage is dropped toward the transistor threshold voltage (VT), switching energy decreases. At the same time, each operation takes longer, so the leakage energy consumed per operation increases (see figure 1). This means the optimal balance of leakage and switching energy has to be found.

Below a certain voltage, leakage energy increases faster than switching energy decreases, and performance also degrades. The optimum operating point is usually slightly above VT and is called the near-threshold operating point, or minimum energy point.

Fig. 1: Minimum energy point is usually slightly above threshold voltage. Source: Qian Yu/Arm
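A toy model can reproduce the shape of the tradeoff in figure 1 qualitatively. The minimal Python sketch below rests on three assumptions:

- On-current follows an EKV-style interpolation, which is exponential below VT and roughly quadratic above it.
- Leakage current does not change with supply.
- Operation time is proportional to gate delay.

VT, N_SLOPE, and LEAK_RATIO are made-up values, not parameters of any real process. All energies are in normalized units.

```python
import math

# Toy model: every parameter below is an assumption chosen for illustration.
VT = 0.35          # threshold voltage (V)
N_SLOPE = 1.5      # subthreshold slope factor
PHI_T = 0.0259     # thermal voltage kT/q at ~300 K (V)
LEAK_RATIO = 0.5   # hypothetical leakage-vs-drive weighting; moves the minimum

def drive_current(vdd: float) -> float:
    """Normalized on-current, EKV-style interpolation from sub- to super-threshold."""
    x = (vdd - VT) / (2 * N_SLOPE * PHI_T)
    return math.log1p(math.exp(x)) ** 2

def energy_per_op(vdd: float) -> tuple[float, float]:
    """Return (switching, leakage) energy per operation in normalized units."""
    switching = vdd ** 2                  # ~ C * Vdd^2
    op_time = vdd / drive_current(vdd)    # ~ C * Vdd / I_on
    leakage = LEAK_RATIO * vdd * op_time  # ~ I_leak * Vdd * op time (I_leak held constant)
    return switching, leakage

def minimum_energy_point(v_lo: float = 0.15, v_hi: float = 1.0, steps: int = 171):
    """Brute-force sweep of Vdd; returns (vdd, total energy) at the lowest total."""
    best = None
    for i in range(steps):
        v = v_lo + (v_hi - v_lo) * i / (steps - 1)
        total = sum(energy_per_op(v))
        if best is None or total < best[1]:
            best = (v, total)
    return best

v_mep, e_mep = minimum_energy_point()
e_nom = sum(energy_per_op(1.0))
print(f"minimum energy point ~{v_mep:.3f} V, {e_nom / e_mep:.1f}x less energy per op than at 1.0 V")
```

With these assumed parameters, the sweep lands a little above VT, as in figure 1. A larger leakage weighting pushes the minimum up and a smaller one pushes it down. In real silicon, the minimum can also sit below VT.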

Near-threshold designs aren’t new to the semiconductor market. “Energy harvesting and always-on IoT devices use this technique to reduce power consumption substantially, thus extending battery life or harvesting capabilities,” says Mo Faisal, president and CEO of Movellus. “As architects scale their process geometry to smaller nodes, they face a new obstacle — process variation. While process variation has widespread effects, it particularly impacts clock networks, resulting in lower performance (Fmax) or power efficiency.”

But near-threshold approaches are becoming more widespread. “The promise of sub-threshold and near-threshold computing is enormous,” says Scott Hanson, CTO and founder of Ambiq. “Billions upon billions of endpoint devices are demanding more intelligence without compromising battery life, and sub-threshold and near-threshold computing offer a vital lifeline. However, chip design in sub-threshold and near-threshold regions has traditionally faced a few major obstacles — poor model-to-hardware correlation for simulation models, extreme sensitivity to process variations, and extreme sensitivity to environmental variations.”

Most of the problems relate to the calculation of switching times. “If a cell is transitioning from zero to one and you have a very sharp angle to the transition, then the delay time is very sharp,” explains Brandon Bautz, senior product management group director in the Digital & Signoff Group at Cadence. “But when we’re dealing with ultra-low Vdd, they ramp up and then slowly reach the threshold. The tail of the transition becomes very critical. This is why you need to have SPICE accuracy delay calculation to handle these problems. It’s not like 20 years ago, where you could just do a simple lookup of the delay. Today, to get the delay number, you literally have to simulate the current behavior and capture the interaction of the cell with the parasitic load. That’s true with any advanced design, but with ultra-low voltages the transitions are so long, you have to have very accurate modeling.”

Figure 2 shows the ratio of threshold voltage to operating voltage. Normally VT is quite a bit less than Vdd, but as you approach the near-threshold regime, or even go to sub-threshold, that’s where things become much trickier. Even if you are using the 20nm node, you have to treat it as if it were an advanced node.

Fig. 2: Ultra-low Vdd Technology. Source: Cadence

How much power can be saved? “Power can be cut anywhere from one-fifth to one-fifteenth of what it would be for a standard design,” says Priyank Shukla, staff product marketing manager at Synopsys. “However, along with the power reduction comes a performance reduction. If the performance degrades too far, the longer time spent completing a task will overwhelm the lower power, resulting in a net increase in energy consumed.”
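Shukla’s point comes down to arithmetic: energy per task is power multiplied by the time the task takes. The hypothetical helper below makes that explicit. The one-fifth power ratio comes from the quote, while the slowdown values are illustrative assumptions.

```python
import math

def relative_energy(power_ratio: float, slowdown: float) -> float:
    """Energy per task relative to a standard design: power ratio x task-time ratio."""
    return power_ratio * slowdown

POWER_RATIO = 1 / 5   # the "one-fifth" end of the quoted range

for slowdown in (2, 5, 10):   # hypothetical task-time increases
    e = relative_energy(POWER_RATIO, slowdown)
    if math.isclose(e, 1.0):
        verdict = "breaks even"
    elif e < 1.0:
        verdict = "saves energy"
    else:
        verdict = "costs energy"
    print(f"1/5 power, {slowdown}x slower: {e:.2f}x the energy ({verdict})")
```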

One of the biggest problems in the past involved models, both in terms of their accuracy and in terms of what they could represent. Near the threshold voltage, waveforms become highly non-linear, so tools had to be updated to account for this. Ironically, early EDA tools considered only gate delay and ignored wire delays because they were insignificant. Today, the majority of the delay comes from the wires, and tools have evolved to deal with that. But as supplies approach the threshold and devices slow down, gate delay grows relative to wire delay. That needs to be taken into account, and some tools may not be up to the task.

New node challenges
With each new node, additional challenges must be overcome. The latest nodes are essentially becoming near-threshold, especially for companies in the mobile space that will use almost any technique to reduce power.

“The threshold of the device at newer technology nodes is getting closer and closer to the operating voltage,” says Cadence’s Bautz. “The technologies that have come into play in advanced node delay calculation are, in essence, near-threshold-type delay calculation. Variation is really the biggest challenge that we’ve had in terms of modeling. The industry’s ability to analyze silicon near threshold is a reality today, despite all of the challenges. Overcoming those challenges involved the introduction of new models from a static timing (STA) perspective, certainly new STA technologies, and pushing a lot of that technology back into place-and-route.”

Some of the largest nets on a chip are related to clocks. “Clock architects will protect against on-chip variation by guard-banding the clock and data paths for the worst-case corners and lowest voltages,” says Movellus’ Faisal. “However, designing for near- or below-threshold voltage requires advanced analysis. That takes immense resources, employee resources, and a ton of pessimism. Below the operating voltage, models deviate from Gaussian behavior and lack key parameters, such as sigma values and the distribution.”

Margin often has been used to deal with variation. “To save power, there’s a lot of high-VT cell usage in these designs, and those tend to be 300+ millivolts threshold voltage,” says Ankur Gupta, director of application engineering for the semiconductor business unit at Ansys. “That puts us firmly in the near-threshold compute domain, because you’ve got lower headroom. And now you are forced to design your margins down from 5% to 10%, which used to be the norm, to less than 5%.”

The steepness of the slope for transitions defines how much margin exists between a 1 and a 0. The steeper the slope, the greater the margin, making it easier to ensure that even with variation, the 1 and 0 ranges don’t ever collapse together.

Using high-VT transistors can help by moving the 1 and 0 states farther apart. “If you can use high-VT devices, then you’re in better shape overall,” says Bautz. “If you have to meet a performance target simultaneously and you go to low VT, or even ultra-low VT, that’s where you’re really seeing the maximum amount of variation. If you’re not operating in the super-threshold regime, I would certainly stay away from ultra-low-VT or low-VT cells.”

Variability becomes a bigger issue when operating near threshold, but this has become a problem associated with all advanced nodes. “The extreme sensitivities to process and environmental variations are much tougher problems to solve,” says Ambiq’s Hanson. “And they cannot be solved with a special transistor recipe or even a magic standard cell library. Instead, dealing with these extreme sensitivities requires a comprehensive approach spanning analog architectures, digital architectures, timing closure methodologies, and production test methodologies.”

Some of this is being addressed today. “The industry really evolved in terms of the need to model variation on a per-cell basis,” says Bautz. “There’s a whole history of variation modeling techniques in static timing. The industry needed a statistical model for variation. This was necessary for the advanced nodes, but it also turned out, though we didn’t know it at the time, to be highly applicable to ultra-low-voltage analysis. This is how the Liberty Variation Format (LVF) came about. The LVF model significantly improved on previous variation models. Once we had that from a timing perspective, we could build up the algorithms in STA and in place-and-route to support the highly variable nature of low-voltage design. And that, at least from a timing perspective, is the key evolution over the past couple of years that has enabled this style of design.”
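Per-cell statistical data matters partly because it lets a timing tool combine variation along a path, rather than stacking worst-case margins on every cell. The sketch below contrasts the two approaches under deliberately simple assumptions. It treats per-cell delay variation as independent and Gaussian, and uses a hypothetical 10-cell path with a 5 ps sigma per cell. This is not how any particular STA tool or LVF-based flow computes path variation. Near threshold, the Gaussian assumption itself breaks down.

```python
import math

def stacked_margin(sigmas_ps: list[float], k: float = 3.0) -> float:
    """Corner-style guard band: every cell assumed to sit at +k sigma simultaneously."""
    return k * sum(sigmas_ps)

def statistical_margin(sigmas_ps: list[float], k: float = 3.0) -> float:
    """Path-level k-sigma, assuming independent Gaussian per-cell delay variation."""
    return k * math.sqrt(sum(s * s for s in sigmas_ps))

# Hypothetical 10-cell path, each cell with a 5 ps delay sigma.
path = [5.0] * 10
print(f"stacked:     {stacked_margin(path):.1f} ps")
print(f"statistical: {statistical_margin(path):.1f} ps")
```

With these numbers, the stacked guard band is 150 ps, while the root-sum-square combination gives about 47 ps.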

Analysis has shown that performance variation due to global process variation alone grows from about 30% (1.3X) at nominal operating voltage to as much as 400% (5X) at 400 mV. And that is not the only source of variation. Small changes in supply voltage can easily add another 2X to variability.

Barry Pangrle, principal engineer and power architect at SiFive, has previously stated that, “another challenge to running at such low voltages is the variation in the threshold voltages of the transistors themselves on a chip. If the variation is large enough, and the supply is very close to the nominal threshold voltage, some transistors might actually fall into the sub-threshold range, adversely impacting the timing of the design.”
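A small Monte Carlo experiment shows the effect. The Python sketch below samples threshold voltage from a Gaussian with a mean of 0.35 V and a sigma of 20 mV, both assumed values. It evaluates a normalized gate delay with a toy EKV-style on-current model and reports delay sigma divided by mean at three supply voltages. Only random VT variation is modeled; supply and temperature effects are left out.

```python
import math
import random
import statistics

VT_NOMINAL = 0.35   # assumed mean threshold voltage (V)
VT_SIGMA = 0.02     # assumed threshold-voltage sigma (V)
N_SLOPE = 1.5       # assumed subthreshold slope factor
PHI_T = 0.0259      # thermal voltage at ~300 K (V)

def gate_delay(vdd: float, vt: float) -> float:
    """Normalized delay ~ Vdd / I_on, with an EKV-style on-current model."""
    x = (vdd - vt) / (2 * N_SLOPE * PHI_T)
    return vdd / math.log1p(math.exp(x)) ** 2

def delay_spread(vdd: float, samples: int = 20000, seed: int = 1) -> float:
    """Coefficient of variation (sigma / mean) of delay under random Vt."""
    rng = random.Random(seed)
    delays = [gate_delay(vdd, rng.gauss(VT_NOMINAL, VT_SIGMA)) for _ in range(samples)]
    return statistics.pstdev(delays) / statistics.fmean(delays)

for vdd in (0.9, 0.6, 0.4):
    print(f"Vdd={vdd:.1f} V: delay sigma/mean = {delay_spread(vdd):.1%}")
```

As the supply approaches the mean VT, the relative spread grows several-fold. This happens because a device's drive current depends exponentially on how far the supply sits above its own VT once it slips toward sub-threshold.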

The heart of the Liberty Variation Format is a statistical model of variation that uses moments. Because the distributions tend not to be Gaussian, moment-based LVF adds three parameters needed to describe them: mean-shift, variance, and skewness. These correspond to the first, second, and third statistical moments, respectively (see figure 3). Others could be added as well. The fourth moment, kurtosis, characterizes the tails of the distribution.

Fig. 3: A non-Gaussian distribution with the first three moments indicated. Source: Synopsys

These enhanced models let EDA tools do a better job of taking real variation distributions into account when predicting signal delays and power. Characterization, however, must populate that new data for the models to be effective.
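To make the three moments concrete, the Python sketch below computes them from a set of delay samples. It returns a mean shift relative to a nominal value, a standard deviation (the square root of the variance), and a skewness. The function name lvf_style_moments is hypothetical. The samples come from a lognormal stand-in with a 100 ps median rather than from real characterization, and the code does not read or write actual Liberty files.

```python
import math
import random

def lvf_style_moments(samples: list[float], nominal: float) -> tuple[float, float, float]:
    """Return (mean shift vs. nominal, standard deviation, skewness) of delay samples.

    Skewness is the third standardized moment: E[(x - mean)^3] / sigma^3.
    """
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n   # variance (second central moment)
    m3 = sum((x - mean) ** 3 for x in samples) / n   # third central moment
    sigma = math.sqrt(m2)
    return mean - nominal, sigma, m3 / sigma ** 3

# Stand-in for Monte Carlo delay data: a right-skewed lognormal whose median is
# a hypothetical 100 ps nominal delay.
rng = random.Random(7)
nominal_ps = 100.0
delays = [rng.lognormvariate(math.log(nominal_ps), 0.3) for _ in range(20000)]

shift, sigma, skew = lvf_style_moments(delays, nominal_ps)
print(f"mean shift {shift:+.1f} ps, sigma {sigma:.1f} ps, skewness {skew:.2f}")
```

Because the stand-in distribution is skewed to the right, the mean sits above the nominal delay and the skewness is positive. A Gaussian model with only a mean and sigma would miss that asymmetry.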

Other near-threshold problems
While this helps to solve the timing problem, there are other issues that face those trying to operate designs near the threshold. Rob Aitken, who was an Arm fellow at the time of this interview, describes the problem with memories. “A memory cell as provided by a foundry will not operate in the near-threshold region. If you need it to operate at a low voltage, you need a custom memory. This is because of the required stability of the bitcells. You are trading this off with area, access time, and stability. As you lower the Vdd on a six-transistor bitcell, the signal-to-noise ratio essentially goes to zero. In order to go below about 600mV, you need to augment the design, and that typically means separating the read and write functions within the cell, and that means you need to go to an 8-transistor cell or 10-transistor cell. That adds a lot of complexity. Most people will operate the memory at a higher voltage level and use level shifters.”

Bautz agrees. “The memories are still a sensitive piece of the design, and they are often still run at higher power domains. But that being said, we’ve done a lot of work on characterizing memories. We now have flows that can handle that, but the memories can still be sensitive to near threshold design.”

The variability also has a significant impact on clock tree design. When you lower the voltage, the delay increases, but the variability of the delay increases even more. With a clock tree operating at nominal voltage, delays may be well-behaved, but as you drop the voltage, variability increases much faster. This can cause the clock tree to become unbalanced. For near-threshold designs, you need to keep the trees more balanced. The standard technique for handling this has been to add buffers so that each path contains a similar number of gate elements.
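A deterministic sketch illustrates why matching gate counts helps. It assumes, for illustration only, that every gate slows by a common factor near threshold (10X here) while RC wire delay stays roughly fixed. It ignores the random variation discussed above. The ClockBranch type and all delay values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ClockBranch:
    gates: int              # buffers/inverters between the clock root and the sink
    gate_delay_ps: float    # per-gate delay at nominal voltage
    wire_delay_ps: float    # RC delay; treated as voltage-independent here

    def latency_ps(self, gate_slowdown: float) -> float:
        """Insertion delay when every gate is slowed by the same factor."""
        return self.gates * self.gate_delay_ps * gate_slowdown + self.wire_delay_ps

def skew_ps(a: ClockBranch, b: ClockBranch, gate_slowdown: float) -> float:
    """Absolute latency difference between two branches."""
    return abs(a.latency_ps(gate_slowdown) - b.latency_ps(gate_slowdown))

# Balanced at nominal voltage, but one branch leans on wire delay.
mismatched = (ClockBranch(4, 20.0, 40.0), ClockBranch(6, 20.0, 0.0))
# Both branches padded to six gates; only a small wire mismatch remains.
matched = (ClockBranch(6, 20.0, 40.0), ClockBranch(6, 20.0, 35.0))

for slowdown in (1, 10):   # 10X: an assumed near-threshold gate slowdown
    s_mis = skew_ps(*mismatched, slowdown)
    s_mat = skew_ps(*matched, slowdown)
    print(f"slowdown {slowdown}x: mismatched skew {s_mis:.0f} ps, matched skew {s_mat:.0f} ps")
```

Both trees are balanced to within a few picoseconds at nominal voltage. At the assumed slowdown, however, the branch that relies on wire delay drifts 360 ps away from its partner. Matching gate counts keeps the residual skew at the 5 ps wire mismatch.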

But there are more intelligent ways. “Instead of leaning into the pessimism, clock architects can use intelligent clock networks to limit pessimism in clock network design,” says Faisal. “Intelligent clock networks dynamically compensate for process variation and can remove as much as 30% of the cycle time pessimism from a typical near threshold design. With more useful clock period data being fed back to the design team, system architects can confidently realize their architectural goals with guard-banding for process variation in near-threshold designs.”

Retrofitting to older nodes
The work done on the latest nodes directly benefits older nodes. “You can now bring the operating voltage closer to the transistor threshold voltage to save power,” says Bautz. “In essence what you’ve got now is a legacy node, from a delay calculation perspective, that behaves like a very advanced node. But from a delay calculation perspective, those same models come to bear and enable that approach. This is something we’re starting to see at various fabs, and with customers, trying to push this envelope of threshold voltage to operating voltage.”

Not all designs make use of it. “Due to the breadth of solution required, I don’t expect to see near-threshold operation widely practiced across the semiconductor industry,” says Hanson. “However, that doesn’t mean that the technology won’t have an impact. On the contrary, I expect sub-threshold and near-threshold operation to have an outsized impact and be found in billions of devices.”
