Near-Threshold Issues Deepen

Process variation plus timing are adding to low-power challenges at the most advanced nodes.


Complex issues stemming from near-threshold computing, where the operating voltage and threshold voltage are very close together, are becoming more common at each new node. In fact, there are reports that the top five mobile chip companies, all with chips at 10/7nm, have had performance failures traced back to process variation and timing issues.

Once a rather esoteric design technique, near-threshold computing has become a given at the most advanced nodes. In order to extend battery life and functionality—two competing goals—chipmakers have been forced to use every possible technique and tool available to them. But at 10/7nm and beyond, process variation and complex timing are creating new issues related to near-threshold approaches.

“The operating voltages for the low-voltage corners at 10/7nm are sub-600 millivolts, if not sub-500 millivolts,” noted Ankur Gupta, director of application engineering for the semiconductor business unit at ANSYS. “Then, to save power, there’s a lot of high-Vt cell usage in these designs, and those tend to be 300+ millivolts threshold voltage. That puts us firmly in the near-threshold compute domain because you’ve got lower headroom, and now you are forced to design your margins down from 5% to 10%, which used to be the norm, to less than 5%.”
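As a rough, hypothetical illustration of those numbers (the supply and threshold values below are assumed, not taken from any specific process), the Python sketch below shows how little absolute headroom a few percent of margin represents at a sub-600mV corner:

```python
# A minimal back-of-the-envelope sketch with illustrative numbers only:
# how much headroom remains between a low-voltage corner supply and a
# high-Vt cell threshold, and what 10%, 5%, and 3% margins cost in mV.
vdd_mv = 550   # assumed low-voltage corner supply (sub-600 mV, per the article)
vt_mv = 320    # assumed high-Vt cell threshold (300+ mV, per the article)

headroom_mv = vdd_mv - vt_mv
for margin_pct in (10, 5, 3):
    margin_mv = vdd_mv * margin_pct / 100
    print(f"margin {margin_pct:>2}% = {margin_mv:5.1f} mV "
          f"({margin_mv / headroom_mv:.0%} of the {headroom_mv} mV headroom)")
```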

All of this points to the fact that near-threshold computing is here today, he said. “It’s not anywhere in the distant future. It’s happening now. Why should I worry about it? We’ve been called in by the top five mobile CPU manufacturers in the last eight months or so because they have had performance failures, whereby chips designed for a certain frequency were measuring in silicon about 10% lower frequency than what they thought they were achieving.”

Accounting for this in the design can be exasperating for design teams. “I’ve gotten my models, I’m running all my sign off tools, I’m doing EM/IR, and the right timing checks,” Gupta said. “Why am I not seeing the right performance in silicon?”

There are two likely answers here. “One is process variation. When you go into near-threshold compute, the process variation effects are extremely non-Gaussian and they have to be very accurately modeled. Modeling in a standard file format like LVF is not accurate enough. It’s not silicon-accurate. Second, the impact of voltage on timing at near-threshold compute is significant.”

Simply put, power is becoming even more of a problem at each new node. “Near-threshold design brings new challenges because a lot of the statistical parameters become non-Gaussian,” said Oliver King, CTO of Moortec. “This means careful consideration needs to be given to the simulation results. In addition, by definition, near-threshold design means the design is close to not being functional at all, and so monitoring of process, voltage, and temperature becomes critical to ensure that adjustments can be made in the supplies to take changes in process and temperature into account.”
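A simple synthetic example shows why this matters. The sketch below uses a log-normal distribution as a stand-in for skewed silicon behavior (an assumption, not measured data) and compares the real probability of landing beyond a “3-sigma” corner with what a Gaussian fit to the same mean and sigma would predict:

```python
# Hedged illustration with synthetic numbers: why Gaussian mean/sigma modeling
# underestimates the slow tail when delay variation is skewed, as it tends to
# be near threshold.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Assume cell delay variation is log-normally distributed (a stand-in for the
# non-Gaussian behavior described above, not real characterization data).
delays = rng.lognormal(mean=0.0, sigma=0.35, size=1_000_000)

mu, sd = delays.mean(), delays.std()
corner = mu + 3 * sd                     # a "3-sigma" corner from a Gaussian fit
actual_tail = (delays > corner).mean()   # what the skewed population really does
gaussian_tail = stats.norm.sf(3)         # what a true Gaussian would predict

print(f"P(delay > mu+3sigma): actual {actual_tail:.2e} vs Gaussian {gaussian_tail:.2e}")
print(f"skewness of the population: {stats.skew(delays):.2f} (0 for a Gaussian)")
```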

At finFET process nodes, supply voltages have been dropping faster than threshold voltages, which has left less supply margin for circuit designers. “In addition to this, the interconnects are becoming thinner and routing densities are increasing, which is pushing up resistance and capacitance,” explained Stephen Crosher, CEO of Moortec. “Compounding all of this is the dramatic increase in gate density seen as we move down through the process nodes, which itself increases power per unit area.”

Process variability in manufacturing always has been an issue, and design flows have evolved to minimize design risk against such variabilities, often by designing for extremely pessimistic corner cases. “Furthermore, with the advent of finFET processes, and the fabrication methods to allow for the densities seen on current leading nodes, the process variation is manifesting itself in different ways,” Crosher said. “Still, due to the limited availability of production data on these nodes, it is too early to say we fully understand the localized effects of process variation.”

The first thing designers should understand is what models are available to them and the assumptions used to develop those models, said Leah Schuth, director of technical marketing for the Physical Design Group at Arm. “The two main formats to address variation are Advanced OCV (AOCV) and Liberty Variation Format (LVF). But these formats do not define the sigma values, the distribution, or the moments (moments represent the asymmetry, or non-Gaussian behavior, of the variation), and as such, the models used by any designer can vary widely based on the underlying assumptions and choices made by the group generating the models.”

“When you look at the distribution of process and voltage variation at voltages near threshold, you see a non-Gaussian distribution. Existing models (such as AOCV and LVF) do not represent non-Gaussian behavior. However, the industry is aware of this, and Arm is helping drive new modeling parameters. Designers must be aware of what LVF models include, and whether moments are modeled, and must take this into account when using LVF as part of their implementation. Using LVF without moments on low-voltage designs can have a significant impact on design yield, and additional margin should be considered.

“Regardless of the supply voltage planned for a given design, the importance of the power grid cannot be overstated. Different finFET processes have a varying interdependence between cell architecture and optimal power grid options. Some of the power grid challenges are related to strict and/or complex design rules. However, power grid design is a key design element to limit the impacts of the wide range of wire resistance across temperatures at small geometries, and the significant increase in wire and via resistance from one finFET node to the next smaller finFET node,” she said.
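To make the role of moments concrete, here is a deliberately simplified, hypothetical sketch. It applies a mean shift and a first-order Cornish-Fisher skew correction to a 3-sigma delay corner; the numbers are invented, and real LVF moment semantics are tool- and library-specific:

```python
# Simplified sketch of why moments matter: correct a sigma corner for skewness
# using a first-order Cornish-Fisher adjustment. All values are hypothetical.
nominal_delay_ps = 100.0   # hypothetical nominal cell delay
mean_shift_ps = 2.0        # hypothetical first moment (mean shift)
std_dev_ps = 6.0           # hypothetical second moment (sigma)
skewness = 0.8             # hypothetical third moment (asymmetry)

z = 3.0                                    # target sigma corner
z_adj = z + (z**2 - 1) * skewness / 6.0    # Cornish-Fisher skew correction

corner_no_moments = nominal_delay_ps + z * std_dev_ps
corner_with_moments = nominal_delay_ps + mean_shift_ps + z_adj * std_dev_ps
print(f"3-sigma corner, symmetric model : {corner_no_moments:.1f} ps")
print(f"3-sigma corner, with moments    : {corner_with_moments:.1f} ps")
```

The gap between the two corners is the extra margin a designer would otherwise have to guess at.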


Fig. 1: Near-threshold computing. Minimum energy point is usually slightly above threshold voltage. Source: Arm/Qian Yu

Designing with near-threshold computing
While near-threshold is definitely an option to lower power consumption, it can’t reduce the power of the wireless transmitter and receiver, said Andy Heinig, group manager for system integration at Fraunhofer EAS. “To reduce the power of both of these components, the protocols have much more impact.”

Near-threshold approaches also aren’t free. They require a significant amount of analysis.

“At the larger nodes that we worked with previously, all of the distribution was exactly as expected,” said Seena Shankar, senior principal product manager in the Custom IC & PCB Group at Cadence. “It was pretty predictable, and we had these perfect Gaussian distributions. But now with the advanced nodes, we are having new challenges, and it’s mostly to do with very low and near-threshold voltages. The operating voltage is ultra-low, and now we see a very different nature for variation. The statistical parameters now exhibit a non-Gaussian distribution. The sensitivity of the parameters to measurements is nonlinear and the distribution of the measurements is non-Gaussian, so we have a lot of challenges ahead when designing at near-threshold or low voltages. We have to figure out how to handle all of these non-Gaussian distributions.”

This makes simulation particularly challenging. “Previously, we used on-chip variation, then we moved to advanced OCV, and then finally everyone has agreed on the LVF format, which is now in libraries to capture variation,” Shankar said. “However, with the non-Gaussian behavior of variation, we are having to look at new methods of generating variation data.”
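One way to picture that characterization step is sketched below: a synthetic Monte Carlo sample (standing in for SPICE results, which is an assumption) is reduced to the mean shift, sigma, and skewness that a moment-aware variation model would need to store:

```python
# Hedged sketch of the kind of characterization implied above: take Monte Carlo
# samples of a cell delay (here a synthetic skewed distribution, not real SPICE
# output) and reduce them to the moments a variation model would store.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
mc_delays_ps = 80.0 + rng.gamma(shape=4.0, scale=2.5, size=10_000)  # synthetic "MC" data

moments = {
    "mean_shift": mc_delays_ps.mean() - 80.0,  # shift from the assumed 80 ps nominal
    "std_dev": mc_delays_ps.std(ddof=1),
    "skewness": stats.skew(mc_delays_ps),
}
print({k: round(v, 3) for k, v in moments.items()})
```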
 
Timing effects
Timing is not immune from the impact of near-threshold voltage. The threshold voltage is the point at which a circuit starts to transition from a one to a zero, or from a zero to a one. In full-rail voltage applications, the input of these circuits has time to reach the rail voltage, which is well above the threshold voltage, according to Ruben Molina, director of product marketing for StarRC Extraction and In-design Rail Analysis at Synopsys.

“The voltage is generally very linear while it’s crossing the threshold voltage, and it has time to settle at Vdd,” Molina said. “If the circuit is operating at 1 volt and the threshold voltage is at 0.6 volt, the edge as it’s transitioning that threshold voltage is fairly linear and it usually reaches the rail voltage and is stable. Now with circuits where the rail voltage is actually pretty close to the threshold voltage, these signals — especially if you’re trying to switch at a high frequency, like let’s say 1 GHz or something like that — are just barely starting to ramp up. It hasn’t even gotten to a sharp edge before it has reached this threshold voltage.”

In this way, the signal is not very linear. “It’s still kind of ramping up, and it continues in this fashion when it gets to the actual Vdd of the circuit,” he explained. “For example, some of the foundries are operating their 7nm designs at something like 0.55 volt. It’s not even close to 1 volt. So signals don’t really have a chance to transition all the way to the rail voltage before the circuit starts transitioning. When the circuit is transitioning, and the input is very shallow [the waveform looks very shallow], then any kind of process variations or any kind of variations, whether they’re voltage variations or process variations, have a much bigger impact on the operation of the circuit because the signal is still in this kind of ‘no-man’s land.’”
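A first-order toy model makes the point. The sketch below uses a simple exponential (RC-style) ramp and assumed voltage values to show how much more a 5% supply droop moves the threshold-crossing time when the rail is close to the switching threshold:

```python
# Toy RC-ramp sketch (first-order exponential, hypothetical voltages) of the
# effect described above: the same 5% supply droop moves the threshold-crossing
# time much more when the rail is close to the switching threshold.
import math

def crossing_time(vdd, vth, tau=1.0):
    """Time (in units of tau) for v(t) = vdd*(1 - exp(-t/tau)) to reach vth."""
    return -tau * math.log(1.0 - vth / vdd)

vth = 0.35  # assumed switching threshold of the receiving gate, in volts
for label, vdd in (("full-rail ~0.9 V", 0.90), ("near-threshold ~0.5 V", 0.50)):
    t_nom = crossing_time(vdd, vth)
    t_droop = crossing_time(vdd * 0.95, vth)  # 5% supply droop
    print(f"{label}: crossing time shifts {100 * (t_droop / t_nom - 1):.1f}% "
          f"for a 5% droop")
```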

All of this has a dramatic effect on timing, especially for circuits that haven’t yet completed the transition from a zero to a one. In that region, the circuit is very sensitive to effects such as noise.

“Again, this is something that could be a result of a voltage variation in the design, and also noise that’s induced from other signals through crosstalk effects,” he said. “So when somebody is trying to design these near-threshold operational circuits, you can’t really treat them as though they are digital. You’re really talking about modeling the actual waveforms in digital tools.”

This is where advanced waveform propagation techniques come into play. They are used to model the shape of the waveform, because these circuits can’t be treated like digital circuits anymore. They behave much more like analog circuits than they used to.

Accounting for near-threshold
Based on the magnitude of the impact of near-threshold voltage, it is now imperative that design teams deal with this from the very start of the design.

“Let’s say you’re building a chip that had 1 million bits in it, 1 million bit cells or storage elements,” said Deepak Sabharwal, vice president of IP engineering at eSilicon. “Each bit cell is 6 transistors, so you have 6 million transistors representing the storage sitting on every chip. Now you think about designing the circuit so that whatever variation is going to occur across these 6 million transistors is covered in the margining. You can’t expect that the foundry is going to manufacture these 6 million transistors and they are going to be identical. It’s impossible. So the variations that occur during manufacturing lead to variations in the strengths of these devices, both in terms of saturation current and threshold voltage, and that gives you some transistors that are way weaker than, let’s say, the middle of your population.”

With a normal (Gaussian) distribution, most of the data points fall in the middle, with some outliers around the edges. That determines how much margin to include, with the final number based both on the designer’s experience and on the data from models and tools.
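As a rough sketch of that margining arithmetic, assuming a Gaussian tail purely for simplicity (which, as noted above, breaks down near threshold), the snippet below estimates how many sigma of variation roughly 6 million devices per chip force a designer to cover:

```python
# Hedged margining sketch: with ~6 million devices per chip, how far out in an
# assumed Gaussian tail must the design still work so that the expected number
# of failing devices per chip stays small? Targets below are illustrative.
from scipy import stats

n_devices = 6_000_000
for target_fails_per_chip in (1.0, 0.01):
    p_device = target_fails_per_chip / n_devices   # allowed per-device failure probability
    sigma_needed = stats.norm.isf(p_device)        # one-sided Gaussian quantile
    print(f"<= {target_fails_per_chip} failing devices/chip -> "
          f"design to ~{sigma_needed:.1f} sigma")
```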

“At the outset, what the designer does is determine what could be the strength of a tail-bit device and then put in the margin so the chip can still be successful if it encounters a tail bit cell,” Sabharwal said. “Experience plays a huge role here. Today, the foundries give you models of devices. They represent corner models, and in the past you were told the corner models capture the extremes of what they are going to be manufacturing for you. But that is no longer the case. Today, you have the corner models, and you also get variation models of two types—global and local. All of these things add up, and this kind of analysis is all experience-driven. You have to make sure you put in enough margin that you can sleep at night, and you also have to make sure you’re not killing your product by putting in extra area that you don’t need.”

At the end of the day, near-threshold voltages come down to living on the edge, said Magdy Abadir, vice president of corporate marketing at Helic. “With near-threshold voltages, everything is like living on the edge, and the margin for error is small. These are not designer errors. The design team is not to blame. Errors have more to do with the models they are using and the tools they’re using. Also, the process technology is not perfect, in the sense that what the manufacturer says it is going to build comes out looking different. There are these variations.”

These variations are not operating in the center of the lane, either. They are actually operating on the edge of the lane, whether from a power perspective, a performance perspective, or both.

“You’re operating at the edge, which means one little slip and you fall off. This is especially true with timing,” Abadir said. “Timing errors are catastrophic, and they’re not like power errors. When I estimate the power consumption of this particular block is X, using poor models because things were not perfect, the actual power number may come in differently. But because errors sometimes can go both ways and power is a summation, when you sum up the total power consumption of all the devices and all the blocks, chances are you will get some pluses and minuses. Some might be a little off, but if you margin, you might be okay.

“With timing, it is not the same. With timing, you’re dependent on every path making timing within the clock period they’re trying to target. If one of them misses, or has a problem with the modeling—or with crosstalk, or with electromagnetic coupling from whatever source, or with process variation—that one path can cause the clock to miss the signal. When that happens, the signal will not arrive at the right time and you get the wrong value, and you will have to slow down the clock to catch this late signal. One slip can cost you frequency in the whole design. Even with billions of paths, all it takes is one of them to have a bad model, to have bad EM or crosstalk, to have whatever the case might be, and there are plenty of reasons for this to happen.”
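The asymmetry Abadir describes can be seen in a small synthetic Monte Carlo experiment, sketched below with invented numbers: per-block power errors largely cancel when summed, while the maximum frequency is set by the single slowest of a million paths:

```python
# Synthetic sketch of the sum-versus-worst-case asymmetry: random errors in
# block power largely average out, but the clock is set by the single worst
# path. All numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)

# Power: 1,000 blocks, each with a +/-10% (1-sigma) estimation error around 1 mW.
block_power_mw = 1.0 * (1 + 0.10 * rng.standard_normal(1000))
total_err_pct = 100 * (block_power_mw.sum() / 1000 - 1)
print(f"total-power error after summing: {total_err_pct:+.2f}% (per-block sigma was 10%)")

# Timing: 1,000,000 paths nominally at 900 ps; the single worst path decides fmax.
path_delay_ps = 900 + 20 * rng.standard_normal(1_000_000)
worst_ps = path_delay_ps.max()
print(f"worst path: {worst_ps:.0f} ps -> fmax ~ {1e6 / worst_ps:.0f} MHz "
      f"(vs {1e6 / 900:.0f} MHz if every path were nominal)")
```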

EDA’s burden
This puts a big emphasis on extremely accurate (think Monte Carlo accuracy), high-performance process variation prediction on the order of 100X faster than SPICE, said ANSYS’ Gupta. This kind of technology creates transistor models with a level of accuracy within 2% of Monte Carlo SPICE, so that tens of thousands of critical paths can be run and true silicon behavior can be analyzed to understand what the true margins are.

Moortec’s Crosher added that accurate PVT monitors are key to implementing design optimization. “We all know the relationship between power consumption and supply voltage of CMOS logic. Being able to reduce the supply by even a few percent, based on that particular die’s process point combined with what the environmental conditions allow, will result in power savings worth having. The same is true with throughput performance, if a given clock speed can be met with a lower supply.”
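The arithmetic behind that claim is the familiar dynamic-power relationship, roughly P ∝ C·V²·f. The quick sketch below (dynamic power only; leakage, which also drops with voltage, is ignored) shows why even a few percent of supply reduction is worth having:

```python
# Worked example of P_dyn ~ C * V^2 * f: a few percent of supply reduction is
# worth nearly double that in dynamic power, with frequency held constant.
for v_reduction_pct in (2, 5, 10):
    v_scale = 1 - v_reduction_pct / 100
    p_saving_pct = 100 * (1 - v_scale ** 2)
    print(f"supply -{v_reduction_pct:>2}% -> dynamic power -{p_saving_pct:.1f}%")
```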

Finally, for interference such as noise, Synopsys’ Molina said one option is to make the power grid as robust as possible. “That may mean over-designing the width of the power grid in order to minimize dynamic IR drop issues. Some people would go so far as to try to minimize their dynamic IR drop by skewing their clocks, so that not all of the circuits are transitioning at the same time and not all of the flip-flops are switching simultaneously. That spreads out the current demands of the circuit and allows less dynamic IR drop. If you’re trying to operate near the threshold voltage to save power, I don’t think there’s a lot you can do at the circuit design/standard cell design level to help that. It really is a challenge in terms of the modeling, which treats these signals as though they’re almost analog in nature.”
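The clock-skewing idea can be illustrated with a toy model, sketched below with made-up triangular current pulses: staggering when groups of flip-flops switch lowers the peak current demand, and therefore the dynamic IR drop, without changing the total charge delivered:

```python
# Toy sketch (triangular current pulses, made-up numbers) of clock skewing:
# staggering the launch times of flip-flop groups lowers peak current demand.
import numpy as np

t = np.arange(0, 200)  # time in ps
pulse = np.clip(10 - np.abs(np.arange(-10, 11)), 0, None).astype(float)  # one group's pulse

def peak_current(offsets_ps):
    total = np.zeros_like(t, dtype=float)
    for off in offsets_ps:  # each flip-flop group launches at its own offset
        total[off:off + len(pulse)] += pulse
    return total.max()

simultaneous = peak_current([50, 50, 50, 50])
staggered = peak_current([40, 50, 60, 70])  # intentionally skewed clocks
print(f"peak current: simultaneous {simultaneous:.0f} vs staggered {staggered:.0f} "
      f"(arbitrary units)")
```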

While there is a significant amount of certification on the foundry side, it really falls on the EDA vendors to try to help the designers to make sure that they’re capturing these effects properly, he concluded.


