A lot of changes had to come together to make near-threshold computing a technology that was accessible to the industry without taking on huge risk.
The emergence of the Internet of Things (IoT) has brought a lot of attention to the need for extremely low-power design, and this in turn has increased the pressure for voltage reduction. In the past, each new process node shrunk the feature size and lowered the nominal operating voltage. This resulted in a drop in power consumption.
However, the situation changed at about 90nm in two ways. First, nominal voltage scaling started to flatten, and thus switching power stopped scaling. Second, leakage current became a lot more significant, and for the smaller nodes even became dominant. Both of these made it difficult to continue any significant power reduction for a given amount of computation without incorporating increasing amounts of logic designed to manage and reduce power.
The introduction of the finFET at 16nm has improved both operating voltage and leakage, but the significantly increased costs of this node mean it is not amenable to designs intended for a low-cost market such as the IoT. There do not appear to be any signs that foundries will implement finFETs on older process nodes, though, so other avenues have to be investigated in order to get the necessary reduction in power.
Power consumption is a quadratic function of voltage, normally stated as power is proportional to CV²f. As the voltage drops, you get significant power savings at the expense of performance. Because many IoT designs have a long duty cycle, this is often an acceptable tradeoff, but there are further complications. Total power is a combination of static (leakage) power and dynamic (switching) power. As the voltage is dropped toward the transistor threshold voltage (Vt), switching power decreases but at the same time leakage current increases. This means the optimal combination of leakage and switching power has to be found. Reducing the voltage below a certain point will result in leakage increasing faster than switching power is decreased, and performance will also be degraded. The optimum operating point is usually slightly above Vt and is called the near-threshold operating point, or minimum energy point.
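The tradeoff can be sketched numerically. The model below is purely illustrative, in normalized units: it uses an alpha-power-law-style delay that blows up as the supply approaches Vt, so the leakage current integrates over a longer time per operation at low voltage. The threshold voltage, exponent, and leakage constant are all assumptions, not foundry data.

```python
# Illustrative sketch of the minimum energy point, in normalized units.
# Energy per operation = dynamic energy + leakage energy. Dynamic energy
# scales as C*V^2; leakage energy grows at low voltage because each
# operation takes longer. All constants are assumptions for illustration.

VT = 0.30        # threshold voltage (V), assumed
ALPHA = 1.5      # velocity-saturation exponent, assumed
K_LEAK = 0.01    # relative leakage strength, assumed

def energy_per_op(v):
    """Normalized total energy per operation at supply voltage v (V)."""
    e_dyn = v * v                      # switching energy ~ C*V^2 (C normalized to 1)
    delay = v / (v - VT) ** ALPHA      # gate delay blows up as v approaches VT
    e_leak = K_LEAK * v * delay        # leakage power integrated over the delay
    return e_dyn + e_leak

# Sweep the supply voltage and locate the minimum energy point.
voltages = [x / 100 for x in range(32, 111)]    # 0.32 V .. 1.10 V
v_opt = min(voltages, key=energy_per_op)
print(f"minimum energy point ~ {v_opt:.2f} V")  # slightly above VT, as described
```

With these assumed constants the minimum lands a few tens of millivolts above Vt; pushing lower makes total energy rise again as leakage dominates.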
The second problem is that the process libraries, as released by the foundries, are designed to operate at some nominal voltage and they may not guarantee operation of those devices any lower than 20% below the nominal operating voltage. Beyond that, variation plays a larger role, thus making the design process a lot more difficult.
Another problem has been memory. While logic can be scaled without too much difficulty, memories, and SRAM in particular, require higher voltages for read/write operations to be reliable, even though they can be dropped to lower voltages for retention. This has made near-threshold computing difficult.
A final area is that EDA tools have not been optimized for this type of design and in many cases, the available models do not provide the right amount of detail in order to be able to find the optimal operating point.
Today, we are seeing solutions to all of these areas, although it cannot be considered to be an easy path to follow.
There is no design today that could be done without a heavy dependence on EDA tools, and all of these tools rely on models. “Having an accurate transistor model is the key starting point for all tools,” says Hem Hingarh, vice president of engineering at Synapse Design. “One of the issues is that sub-threshold region models are inadequate. When transistors are operating near threshold they are going to take longer to drive loads. Near the threshold voltage, the waveforms become non-linear making it necessary to update tools to account for this. Most of the timing analysis tools today assume that delay is RC-dominated, but when considering subthreshold operation, gate delay dominates timing.”
One way to overcome some of these limitations is to restrict usage to a small number of cells in a typical library and only use those that are well understood and characterized.
All designs have to deal with a certain amount of manufacturing variability, but operating near the threshold voltage amplifies many of these effects. Dreslinski et al. analyzed variation in their paper titled “Near Threshold Computing: Reclaiming Moore’s Law through Energy Efficient Integrated Circuits.” They state that “performance variation due to global process variation alone increases by approximately 5X from 30% (1.3X) at nominal operating voltage to as much as 400% (5X) at 400 mV.”
Hingarh says that because of this variability, “designers must think very carefully about variation-tolerance in their circuits.” The alternative is to add larger margins, but this removes some of the gain of going towards the threshold voltage as it unnecessarily increases the leakage component. “These timing errors become more frequent in smaller gate lengths such as below 40nm CMOS, where process variations are high, but 65nm or 130nm CMOS technology can be used more successfully. In addition, FD-SOI or CMOS technology with back bias capability can help because back biasing can be used to reduce Vt variation.”
Another problem related to variation is the timing variation that can be caused by small changes in supply voltage. This means that IR-drop has to be very carefully assessed, otherwise timing could easily move outside of the characterized range. Dreslinski et al. estimate that this adds another 2X to the variability.
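A simple delay model shows why a fixed threshold-voltage shift hurts so much more at low supply voltage. The sketch below uses an alpha-power-law-style delay; the nominal Vt, the size of the process shift, and the exponent are all assumptions chosen for illustration, not characterized silicon data.

```python
# Why variation is amplified near threshold: with a delay model of the
# form delay ~ V / (V - Vt)^alpha, the same shift in Vt changes the
# overdrive (V - Vt) by a much larger fraction when V is close to Vt.
# All constants here are assumptions for illustration.

VT = 0.30       # nominal threshold voltage (V), assumed
DVT = 0.03      # +/- process shift in Vt (V), assumed
ALPHA = 1.5     # velocity-saturation exponent, assumed

def delay(v, vt):
    """Normalized gate delay at supply v with threshold vt."""
    return v / (v - vt) ** ALPHA

def corner_spread(v):
    """Slow-corner / fast-corner delay ratio at supply voltage v."""
    return delay(v, VT + DVT) / delay(v, VT - DVT)

print(f"delay spread at 1.10 V: {corner_spread(1.10):.2f}x")
print(f"delay spread at 0.40 V: {corner_spread(0.40):.2f}x")
```

Even this toy model reproduces the trend in the Dreslinski et al. numbers: a spread that is modest at nominal voltage grows several-fold at 400 mV.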
During a lunch sponsored by Cadence at the recent Design Automation Conference, Jayanta Lahiri, vice president of engineering for the physical IP group at ARM, pleaded for additional development of memory bitcells. “While the logic voltage has gone down, the bitcell voltages continue to remain quite a bit higher. We are talking about memories that require 0.85V and the periphery is operating at about 0.55V. That leads to a huge design challenge, and we have to deal with signals going from one domain to the other. We need to find ways and means to bring down the voltage of the bitcells.”
While the foundries may not be providing solutions today, others are looking at the possibilities. “If you want a DSP core and a memory operating on the same voltage rail and to be able to scale the voltage all the way down, then normally the memory spoils the party,” says Paul Wells, CEO of sureCore. “So we wanted to design a memory that could scale down to 0.6V, because at that point the logic can continue working, albeit at a very low frequency.”
SureCore has developed smart assist techniques that allow the bitcell, which is only specified for retention at 0.6V, to be fooled into thinking it is in a higher-voltage environment, both when flipping the cell for a write and when detecting the voltage on the bitcell in the standard manner using a sense amplifier for a read. “That means that we can effectively operate the memory at 0.6V at a low frequency of operation of between 15 and 20MHz. Then, when the system wants to do something more interesting and needs more horsepower, you can ramp the voltage back up to the nominal 1.1V and can operate around 300MHz.”
While it may be the IoT that is spurring development, there are other potential markets that could benefit from this technology. Vinod Viswanath, R&D director at Real Intent, pointed to neuromorphic computing. “The term ‘neuromorphic’ is used to describe mixed analog/digital VLSI systems that implement computational models of real neural systems. These systems directly exploit the physics of silicon and CMOS to implement the physical processes that underlie neural computation.”
Viswanath said the brain has on the order of 10^11 neurons and 10^14 synapses. “It performs on average 10^15 operations per second. The power dissipation of the brain is approximately 10^-16 J per operation, which results in about a total mean consumption of less than 10 watts. By comparison, today’s silicon digital technology can dissipate at best 10^-8 J of energy per operation at the single chip level. There is no way of achieving similar operations per second on a single chip with today’s technology. But even if this was possible, to do that amount of computation a digital chip would consume megawatts (the output of a nuclear power station). Any serious attempt to replicate the computational power of brains must confront this problem.”
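The order-of-magnitude arithmetic behind that comparison can be checked directly from the figures quoted:

```python
# Back-of-envelope check of the figures quoted above.
ops_per_second = 1e15     # operations per second, brain (quoted estimate)
brain_j_per_op = 1e-16    # energy per operation, brain (J)
chip_j_per_op = 1e-8      # best-case energy per operation, digital silicon (J)

brain_power = ops_per_second * brain_j_per_op   # ~0.1 W, well under the quoted 10 W
chip_power = ops_per_second * chip_j_per_op     # ~1e7 W, i.e. megawatts

print(f"brain: ~{brain_power:.1f} W")
print(f"digital chip at brain throughput: ~{chip_power:.0e} W")
```

The eight-order-of-magnitude gap in energy per operation is what turns a sub-10-watt organ into a megawatt-class digital machine.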
Viswanath noted that subthreshold analog circuits are also no match for real neural circuits, but they are a factor of 10^4 more power efficient than their digital counterparts.
“The main reason to choose subthreshold circuits in neuromorphic computing is that in the subthreshold region, the drain current of the transistors is related to the gate-to-source voltage by an exponential relationship. At high levels of abstractions it is possible to model a neuron’s response by using a transfer function that maps the input current it receives into the frequency of the spikes it generates. If the mapping is linear even a single transistor can implement such a model. Neurons of this type are called linear threshold units. If the mapping is sigmoidal, we can implement models of neurons using a differential transconductance amplifier (which is a differential pair subthreshold circuit with a differential voltage input and a differential current output). The transconductance amplifier operating range is linear for small differential voltages, and exponential as the variation increases, modeling a perfect hyperbolic spike.”
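The sigmoidal transfer function Viswanath describes is commonly modeled as a tanh: the subthreshold differential pair steers a fixed bias current between its two branches as a function of the differential input voltage. The bias current and gate-coupling coefficient below are typical textbook assumptions, not values from any specific design.

```python
import math

# Sketch of a subthreshold differential transconductance amplifier's
# transfer function: differential output current as a tanh of the
# differential input voltage. Constants are typical assumed values.

I_B = 1e-9          # tail bias current (A), assumed
KAPPA = 0.7         # subthreshold gate coupling coefficient, assumed
U_T = 0.0258        # thermal voltage kT/q at room temperature (V)

def diff_pair_output(dv):
    """Differential output current for differential input voltage dv (V)."""
    return I_B * math.tanh(KAPPA * dv / (2 * U_T))

# Linear for small inputs, saturating (sigmoidal) for large ones:
for dv in (0.005, 0.05, 0.2):
    print(f"dV = {dv * 1000:5.1f} mV -> I_out = {diff_pair_output(dv):.3e} A")
```

For inputs of a few millivolts the response tracks the linear small-signal transconductance; by a couple hundred millivolts nearly all of the bias current is steered to one side, giving the saturating neuron-like response described above.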
While both subthreshold and near-threshold computation have become far more practical than they were in the past, this remains a technology adopted out of end-application need and is not something to be taken on lightly. Expect plenty of surprises and patchy tool support unless you can persuade a large number of other designers to follow you down this path. Then you may catch the eye of both the foundries, which could do a lot to help, and the tool suppliers, which could build the necessary model accuracy and the highly specialized needs of this type of design into their tools.