Low Power Meets Variability At 7/5nm

Reductions in voltage and margin, along with increases in physical effects, are making timing closure and signoff much more difficult.


Power-related issues are beginning to clash with process variation at 7/5nm, making timing closure more difficult and resulting in re-spins caused by unexpected errors and poor functional yield.

Variability is becoming particularly troublesome at advanced nodes, and it has multiple causes. One of the key ones is the manufacturing process, which can be affected by everything from variation in equipment—no two EUV scanners are exactly alike, for example—to impurities in materials such as substrates or gases. Variability also can creep in during process steps such as wafer cleaning or material deposition. All of this has an impact on how a chip functions, and with tighter tolerances at each new node due to higher transistor density and thinner insulation layers, the impact of variability is growing.

“Process variability is a key consideration for advanced node designers,” said Stephen Crosher, CEO of Moortec. “Worst-case simulation work will only get you so far, as assumptions are made on uniformity across the design at each PVT corner. However, in silicon we’re seeing stark thermal, IR drop and process regionality across the die, undermining assumptions previously made in simulation. This can affect timing performance regionally across each die and therefore impact overall functional yield. Schemes are emerging, attempting to compensate for this regional variability. The majority of schemes are dependent on understanding process spread and dynamic conditions across the die. Hence, we are seeing today the imperative nature of having accurate, well-distributed monitoring fabrics on chip.”

Technically, timing variability is a statistical matter. “In the past we would assume worst case everywhere just to be safe, since true statistical methods were too expensive for production deployment,” said James Chuang, product marketing manager for PrimeTime at Synopsys. “Parametric on-chip variation (POCV) was a great invention 10 years back that created a practical statistical model with feasible turnaround time and collateral requirements. POCV today is the industry standard to model timing variability of high-voltage threshold cells. Most, if not all, engineering teams designing low-power SoCs at 16/7nm and below are using POCV to sign off with great silicon success.”

This is not to say POCV is perfect. The methodology provides full statistics on the passing rate of each timing path, but it doesn’t calculate the statistical correlation between the billions of paths in an SoC, where any single failed timing path will result in a failed chip and lower yield. For example, if each of the starting players on a football team had a 50% chance of showing up to a game, the probability of having a full starting lineup on game day would be close to zero, Chuang said. However, using statistical correlation to calculate design yield requires exhaustive Monte Carlo simulation, which could take years to complete on large-scale SoCs—one of the top reasons statistical static timing analysis (SSTA) was not broadly applied.
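
That compounding is why per-path statistics alone can be misleading. The sketch below assumes fully independent paths (a deliberately crude assumption; modeling the correlation is exactly what Monte Carlo or SSTA addresses) and uses invented numbers to show how quickly individually good odds collapse at chip scale.

    # Illustration only: independent-path assumption, made-up probabilities.
    p_player = 0.5                 # each of 11 starters shows up with 50% probability
    print(p_player ** 11)          # ~0.0005, a full lineup is essentially never available

    p_path = 0.9999999             # a seemingly excellent per-path pass probability
    n_paths = 1_000_000_000        # a billion timing paths on a large SoC
    print(p_path ** n_paths)       # ~3.7e-44, chip-level "yield" collapses under independence

Because real paths share cells, clock branches and process gradients, the true yield is far better than this independent worst case, but computing it properly is what makes exhaustive Monte Carlo so expensive.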

Many designers are aware of this and choose to add significant margins globally as a preventive measure, but that approach is making PPA closure extremely challenging at 7nm and below. Recent breakthroughs in machine learning and static timing analysis (STA) technology now allow designers to do full-chip Monte Carlo simulation several orders of magnitude faster than before by combining Parametric OCV and ML-driven Monte Carlo simulation results. Those combined simulations can be performed for designs of any size within a few hours if not minutes, all while using the same input collateral as POCV for immediate deployment.
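
For intuition about what a Monte Carlo timing-yield run does, here is a deliberately tiny brute-force version. Every number in it (stage count, delay distribution, clock period) is an assumption for illustration, and the ML-accelerated flows described above exist precisely to avoid this kind of exhaustive sampling at full-chip scale.

    # Toy Monte Carlo yield estimate: sample stage delays, sum them per path,
    # and count the fraction of trials in which every path meets the clock period.
    import numpy as np

    rng = np.random.default_rng(0)
    n_trials, n_paths, n_stages = 100_000, 3, 20      # toy-sized problem
    mean_ps, sigma_ps, period_ps = 10.0, 0.5, 203.0   # assumed per-stage delay and clock period

    stage_delays = rng.normal(mean_ps, sigma_ps, size=(n_trials, n_paths, n_stages))
    path_delays = stage_delays.sum(axis=2)            # one arrival time per path per trial

    yield_estimate = np.mean((path_delays <= period_ps).all(axis=1))
    print(f"estimated parametric yield: {yield_estimate:.3f}")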

Beyond POCV
POCV is not the only OCV option, though. Marc Swinnen, product management director at Cadence, agreed that timing variability has become more pronounced at the lower operating voltages typically found in low-power designs. Moreover, the usual techniques for reducing variability—increasing buffer drive strength and increasing positive slack margins—are antithetical to the goal of achieving low power.

Over the past couple of years, as ultra-low voltage (ULV) designs have become more common, EDA tools have introduced several techniques and capabilities at the intersection of implementation, voltage drop analysis, and STA to address this variability challenge.

The first response from the industry, Swinnen explained, has been the broad adoption of statistical on-chip variation (SOCV) for STA at 7nm and below, supplanting the older, less accurate advanced on-chip variation (AOCV) technique. “Variability is statistical in nature and SOCV timing acknowledges this reality by representing all timing values in the design, as well as in the library, as normal (or ‘Gaussian’) probability distributions characterized by a mean (μ) and a standard deviation (σ),” he said. “SOCV uses a statistical timing engine to correctly combine and propagate these statistical quantities along a timing path. This is much more accurate than AOCV’s use of variability margins that are derated based on simplistic stage-counting.”
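
As a rough picture of what combining and propagating those statistical quantities means in the simplest case, consider independent Gaussian stage delays along a single path. The characterization numbers below are invented, and production SOCV engines go well beyond this (correlation handling, the non-Gaussian moments discussed next), but the root-sum-square idea is the core of it.

    # Minimal sketch: independent Gaussian stage delays along one timing path.
    import math

    stages = [(12.0, 0.6), (9.5, 0.4), (15.2, 0.8), (11.1, 0.5)]   # (mean_ps, sigma_ps) per stage

    path_mean = sum(m for m, s in stages)                          # means add
    path_sigma = math.sqrt(sum(s * s for m, s in stages))          # sigmas combine root-sum-square

    # A 3-sigma arrival estimate, rather than AOCV-style fixed per-stage derates.
    print(f"mean = {path_mean:.1f} ps, sigma = {path_sigma:.2f} ps, "
          f"3-sigma arrival = {path_mean + 3 * path_sigma:.1f} ps")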

Second, he said, SOCV can accurately capture a component of variability that emerges at ultra-low voltage. Delay variability becomes asymmetric (or non-Gaussian) at low operating voltage. “The asymmetry in the delay probability distribution means that it is more likely that the real delay will be greater than the Gaussian mean, and less likely that it will be smaller. Modeling this effect is critical for advanced-node, low-voltage designs at 7nm and 5nm. It can be captured in Liberty timing libraries through proper characterization and inclusion of the distribution ‘skewness’ (also called the third moment) for each timing arc. This modeling enhancement has been standardized and certified by the Liberty Technical Advisory Board.”
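
To see what that third moment captures, the snippet below contrasts a symmetric delay population with one that has the kind of long high-delay tail that shows up at ultra-low voltage. The lognormal stand-in and all of the numbers are assumptions for illustration, not a characterization method.

    # Skewness (third standardized moment) distinguishes a symmetric delay
    # population from one with a heavy tail toward large delays.
    import numpy as np

    def standardized_skew(x):
        x = np.asarray(x)
        mu, sigma = x.mean(), x.std()
        return np.mean((x - mu) ** 3) / sigma ** 3

    rng = np.random.default_rng(1)
    nominal_v = rng.normal(10.0, 0.5, 1_000_000)             # symmetric delays (ps)
    low_v = 8.0 + rng.lognormal(0.6, 0.5, 1_000_000)         # long tail toward large delays (ps)

    print(f"nominal voltage: skew = {standardized_skew(nominal_v):+.2f}")   # ~0
    print(f"low voltage:     skew = {standardized_skew(low_v):+.2f}")       # clearly positive

A mean-and-sigma model underestimates how often the skewed population lands far above its mean; carrying skewness per timing arc is what lets the library convey that heavier tail.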

A third impact of variability on low-power designs is caused by the extreme sensitivity of timing to supply-voltage variation in low-voltage designs. For example, the delay of a buffer operating at 1.2V will be quite insensitive to small changes in the supply voltage (also called IR-drop), Swinnen said. “However, that same buffer operating at 0.7V will show a high degree of delay variation with even small IR-drop. The usual dodge of simply increasing the supply voltage margin is, of course, a non-starter in this situation. The solution being adopted by the industry is to, once again, acknowledge the presence of this variability and make STA timing IR-drop aware.”

By accounting explicitly for the real, measured IR-drop, STA will reveal which paths can tolerate the IR-drop and which ones require optimization and fixing. This is a much more surgical approach than across-the-board max-voltage drop margins that penalize all paths regardless of the true impacts on timing.

A key consideration for setting up an IR-drop-aware STA flow is that the timing libraries must be characterized at multiple supply voltages to capture how delay varies with voltage, he said. For example, a nominal 1.0V library may need to be characterized at 1.0V, 0.9V and 0.8V. The STA tool then needs to apply interpolation techniques to calculate the cell delay at, say, 0.932V, or whatever the actual measured voltage is at that cell.
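
A bare-bones sketch of that interpolation step is shown below. The characterization points and delays are made up, and real tools use the library’s own corners and more sophisticated fitting, but the idea is the same.

    # Interpolate a cell delay at the actual (IR-drop-reduced) supply voltage
    # from delays characterized at a few library voltage corners.
    import numpy as np

    char_voltages = np.array([0.8, 0.9, 1.0])        # characterized supply voltages (V)
    char_delays   = np.array([21.4, 16.8, 14.1])     # assumed cell delays (ps); delay grows as V drops

    def delay_at(v_actual):
        return float(np.interp(v_actual, char_voltages, char_delays))   # linear between corners

    print(f"{delay_at(0.932):.2f} ps at 0.932 V")    # lands between the 0.9V and 1.0V corners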

The amount of analysis needed for advanced node designs, especially when factoring in large numbers of modes and corners, can make for a challenging flow to timing closure and signoff, asserted Dave Pursley, product management director at Cadence.

The problem is that various tools in the chain (especially place and route, timing signoff, and power signoff) may each do their analyses slightly differently, causing many closure iterations that may or may not converge. Inconsistent analyses can even necessitate a re-spin of silicon, he said.

To avoid that, especially for advanced-node designs, a flow that uses common multi-mode, multi-corner (MMMC) analysis engines throughout ensures a convergent path to signoff, Swinnen said. “For example, potential changes to power rails or driver sizes for power signoff can be evaluated for timing closure at the same time and in the same flow to achieve rapid, predictable convergence to signoff.”

The impact of low voltage
From a high level, variability always has existed in manufacturing processes. But as features shrink, they are more susceptible to thermal damage due to thinner insulation layers, static current leakage and dynamic current density. To minimize those impacts, chipmakers lower the voltage as far as possible. But that creates other issues as tolerances shrink and noise from power, electromagnetic radiation and a number of other physical effects begin to impact signal integrity and timing.

In fact, the impact of dynamic voltage drop on timing has become a key metric, noted Vic Kulkarni, vice president and chief strategist for the semiconductor business unit at ANSYS. “If you look at the progress with respect to the supply voltage, the supply voltage is going down dramatically, but the threshold voltage is not. So the Vdd minus Vt is a very difficult thing to achieve now. At 16nm there was enough margin between the supply voltage and the threshold voltage. You could play around with it. You could make some approximations for the designer. However, now when the delta of Vdd minus Vt comes down so dramatically close to Vt—to near-threshold operation—these issues start affecting you. The beautiful ramp, which you used to see as a designer, is not a straight-line ramp anymore. It has a non-Gaussian kind of distribution even in the RAM cycle. Then, as you move toward lower and lower Vdd, the non-Gaussian behavior at low voltage takes over, which means the variabilities start becoming severe. It’s a very important phenomenon. It’s what the physics is doing.”
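
The sensitivity Kulkarni describes can be seen with the classic alpha-power delay approximation, used here purely to illustrate the trend. The threshold voltage and exponent below are assumed values, not taken from any specific process.

    # Alpha-power-law trend: gate delay ~ Vdd / (Vdd - Vt)^alpha.
    VT = 0.35       # assumed threshold voltage (V)
    ALPHA = 1.3     # assumed velocity-saturation exponent

    def relative_delay(vdd, vt=VT, alpha=ALPHA):
        return vdd / (vdd - vt) ** alpha

    base = relative_delay(1.0)
    for vdd in (1.0, 0.9, 0.8, 0.7, 0.6, 0.5):
        print(f"Vdd = {vdd:.1f} V  headroom = {vdd - VT:.2f} V  "
              f"relative delay = {relative_delay(vdd) / base:.2f}x")
    # As Vdd approaches Vt, delay and its sensitivity to small voltage or
    # threshold shifts grow rapidly, which is the near-threshold effect above.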

What this means for designers is that the gap between the predicted and achieved frequency of the design starts to grow, he continued. “For example, people who are doing 2.1GHz-type designs now are ending up with 1.8GHz,” Kulkarni explained. “They’re literally leaving 300MHz on the table, because to compensate for this they are now creating a lot of margins. To be ‘safe,’ designers may start margining with 1% or 2% here or there, but cumulatively it’s not uncommon for margins to reach 10% or 15% of the design. This is an expensive proposition at 7nm.”
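
The arithmetic behind that cumulative figure is simple but easy to underestimate, because individually small derates compound multiplicatively. The categories and percentages below are hypothetical.

    # Hypothetical per-effect margins added "to be safe."
    derates = {"IR drop": 0.03, "OCV": 0.02, "jitter": 0.02, "aging": 0.03, "temperature": 0.02}

    total = 1.0
    for name, d in derates.items():
        total *= 1.0 + d

    print(f"combined margin: {(total - 1.0) * 100:.1f}%")   # ~12.6%, in the 10% to 15% range cited above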

In spite of these margins, silicon is still failing from these physics effects, which have an impact both locally and globally.

“The global design weakness happens with individual CPU cores because they are firing at different times. Hotspots are changing, or, alternately, the instance-level rail collapses with localized di/dt-type events. So you’re now dealing with a double whammy. You have localized issues and you have global issues, because you don’t know the sequence of events of these large blocks changing and switching. People are perplexed about how to manage this,” Kulkarni said.

The solution depends on being able to do concurrent chip-aware system design and system-aware chip design in order to answer questions about the root causes of the problems and how to solve them at each step of the way.

“People have to work together,” he said. “It’s not a final signoff anymore. The final integration and the final static timing analysis has to go back to block place-and-route and so forth. You can be in the loop here forever. Sometimes it can be 8 to 10 days in your design cycle because it becomes a Whac-A-Mole problem. You fix something, but something else pops up because root cause versus effect is not a very linear kind of behavior. Then there are outliers. These outliers can cause hotspots, and a massive amount of data is needed to visualize this. You can miss these if you do too small of a localization/visualization, so you need to see the bird’s-eye view.”


Fig. 1: Simulation is becoming much more complex at advanced nodes and in packages. Source: ANSYS

These issues are not strictly for digital designers. “Everyone now is embracing going down to the finFET nodes,” said Geoffrey Ying, director of marketing for AMS products at Synopsys. “Even the analog guys are going that direction. Supply voltages are continuing to drop, but the transistor threshold voltage doesn’t drop as much. That really puts a big squeeze on the headroom, so there’s very little margin for the transistor to operate under. Because of this, and due to variation and reliability, there really isn’t much to work with. So besides the variation problem, we’re looking at EM, IR drop and device aging effects. These effects are still related to variation, because the aging effect has to be considered together with variation.”

On the flip side, designers also are faced with increasing low-power requirements. “You can design a very robust chip, but if it consumes too much power, it’s not going to cut it,” Ying said. “Performance requirements continue to be very high. If you can make the chip bigger, you will avoid some of these problems. But obviously you cannot afford to do that, because all of this will translate into the cost of your chip, your competitiveness, and finally time to market. You can’t take too much time over-analyzing these effects, or you can’t deliver the product in time,” he explained.

This pushes designs right up to the edge of failure. “You don’t want to go over the cliff, but you don’t want to be far away from it because of cost, so timing closure is also very much impacted by this variation,” he noted.

Add to that list of concerns aging models at advanced nodes, and the whole design process gets even more complicated. Pre-stressed and aging-aware standard cell libraries are becoming more common. So are EM-aware libraries, which take current density and activity level into account to stay within electromigration limits.

Conclusion
To be better prepared for timing closure from the outset of an advanced-node design, teams would do well to keep in mind the small amount of headroom between the supply voltage and the threshold voltage.

“The designer has to keep that factor in mind, and really plan out their power and ground supply before they work on the building blocks,” Ying said. “In the standard cell library, they need to have a way to specify the power budget for each of the standard cells before they think about the bigger picture. They also must consider power and other related effects, such as aging and EM. Having a standard cell library that has taken that into consideration really helps make timing closure easier. It’s still not easy, but at least it helps point them in the right direction.”



