Corners Up, Margins Down

Complexity and less room for error are making it harder to account for all the corners; solution involves more selectivity

By Ed Sperling
Growing complexity, shrinking room for error and a reluctance to add extra wires or circuits, which may boost power consumption or alter the thermal profile, are making it more difficult to cover all the corners on an SoC.

The problem gets worse with mixed-signal chips, where the corners are far less definable. And it gets even more complex when multiple power islands inside an SoC are turned on and off, because there are often surges when the power comes back on.

“At 90nm and 65nm, all you had to deal with were the process corners like fast-fast and fast-slow,” said Amit Gupta, CEO of Solido. “Then you had to deal with temperature variations and high-low voltages, but that was still only about 20 different combinations. Now it’s increasing to hundreds or even a thousand. You’ve got process corners and environmental variations.”
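To see how quickly the count grows, here is a minimal Python sketch that enumerates corner combinations as a cross product of independent axes. The axis names and values are illustrative assumptions, not taken from any foundry kit; real corner sets come from the process design kit and the sign-off methodology. The first set is one way to arrive at the roughly 20 combinations Gupta describes, and the second shows how a few extra hypothetical axes push the total into the hundreds.

```python
from itertools import product

# Illustrative axes only (assumed values, not from any real PDK).
PROCESS = ["TT", "FF", "SS", "FS", "SF"]   # process corners
TEMPS_C = [-40, 125]                        # temperature extremes
VDD_SCALE = [0.9, 1.1]                      # low and high supply voltage


def enumerate_corners(*axes):
    """Return every combination of one value from each axis."""
    return list(product(*axes))


basic = enumerate_corners(PROCESS, TEMPS_C, VDD_SCALE)
print(len(basic))  # 5 * 2 * 2 = 20

# Hypothetical extra axes: a mid-range temperature and voltage,
# interconnect extraction corners, and power-island states.
EXTRACTION = ["typical", "cworst", "cbest", "rcworst", "rcbest"]
POWER_STATE = ["active", "sleep", "wake"]

extended = enumerate_corners(
    PROCESS, [-40, 25, 125], [0.9, 1.0, 1.1], EXTRACTION, POWER_STATE
)
print(len(extended))  # 5 * 3 * 3 * 5 * 3 = 675
```

Because the total is a product, every added axis multiplies the simulation workload rather than adding to it.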

So what do you do about all of these? Let’s take a look.

Corner reduction
A corner describes a boundary, and most of the time blocks within chips, or the chips themselves, never cross that boundary. The emphasis is on most of the time, because much of the current thinking about corners relies on statistical averages. A massive voltage spike can still kill a chip, but the voltage rarely gets that high.

To some extent this depends on the application. In a high-reliability chip, such as one in a missile guidance system or an industrial temperature sensor, all corners must be addressed and redundancy is required. In a consumer application, addressing all the possible corners is unnecessary and redundancy consumes battery power.

Tom Quan, deputy director of design methodology at TSMC, said one solution is to describe these corners statistically, which he said is more consistent and practical. “Looking at it as a corner is a more pessimistic approach,” he said. “Most times you really are not pushing into the corners.”
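A rough way to see why the corner view is more pessimistic: consider a path of several stages, each with some random delay variation. The sketch below compares a corner-style estimate, which assumes every stage sits at its worst case simultaneously, with a statistical estimate in which independent variations add in quadrature. The function names, the 3-sigma bound and the assumption that stage variations are independent and identically distributed are all illustrative; correlated (global) variation does not average out this way.

```python
import math


def corner_delay(n_stages, mu, sigma, k=3.0):
    """Every stage assumed at its k-sigma worst case at the same time."""
    return n_stages * (mu + k * sigma)


def statistical_delay(n_stages, mu, sigma, k=3.0):
    """k-sigma bound on the path when stage variations are independent,
    so the path sigma grows with sqrt(n_stages) rather than n_stages."""
    return n_stages * mu + k * math.sqrt(n_stages) * sigma


# Assumed example: 16 stages, 10 ps mean delay, 1 ps sigma per stage.
worst = corner_delay(16, 10.0, 1.0)        # 16 * 13 = 208 ps
stat = statistical_delay(16, 10.0, 1.0)    # 160 + 3 * 4 * 1 = 172 ps
print(worst, stat, worst - stat)
```

Under these assumptions the corner estimate carries 36 ps of margin that the statistical bound says is almost never needed.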

Restrictive design rules help in this regard because they determine layouts and minimize variation.

“When you’re printing a line, as the process gets smaller printing the gate gets harder,” said Quan. “You have variation associated with it, so at 40nm you may actually print 42nm. If you draw with a bent poly the variation gets even larger. With restrictive design rules, you have fewer corners, less turns and more lines are straight. Printing is easier and variation is less.”

Once that is understood, the next best step is to measure, quantify and understand the margins that exist in current designs.

“Some of this is through timing analysis tools that consider multiple corners, environment variation, noise/IR drop, etc., but another important chunk is through silicon margin validation using at-speed test techniques, record keeping on failure bins, and analysis on failing parts to identify weak points in the existing margining process,” said an ARM spokesman. “Beyond this, there can be a benefit to using methods that take into account variation in transistors and wiring (e.g. various advanced OCV approaches, or even statistical analysis of critical circuitry). This can again give insight into the types of margin in a design, and can also be employed in adaptive methods that take advantage of the fact that most silicon is not worst case and has much more margin than exists in sign-off corners, and that this margin can be exploited for power or performance gains by adjusting voltage and frequency.”

The spokesman noted that some types of circuits can be modified using Razor-type approaches—speculative execution designed to push margin boundaries by adapting to slow-moving variation such as temperature or load changes while surviving fast-moving variation such as jitter or power glitches.

“Each of these obviously has a cost/risk/reward profile that needs to be evaluated for a given design and places certain requirements on the methodology and IP used, both at the physical and RTL level. Thinking outside the chip can also help—changes in external power supply requirements, for example, can affect the margins needed on-chip, and careful understanding of temperature can lead to cost or reliability improvements,” the spokesman said.

Accuracy, accuracy, accuracy
At least part of the challenge in dealing with corners is also understanding how the pieces that might affect that corner are behaving—with far greater accuracy than in the past.

“Accuracy is at the center of all of this,” said Juan Rey, senior engineering director for Calibre Engineering at Mentor Graphics. “There is far less guard-banding allowed at advanced designs.”

He noted that the allowable variability at 28nm and 22/20nm is about 1% to 2%. “It keeps getting more stringent.”

The need to understand exactly which corner cases must be addressed is growing, too. Solido’s Gupta said brute force approaches don’t work anymore. “You need to intelligently pick out the ones that matter and fix them,” he said. “There are two ways the foundries model variations, five corners and random variations. But if you do a true statistical distribution for local and random variations using a Monte Carlo analysis it’s very time-consuming.”
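The cost Gupta refers to can be illustrated with a toy Monte Carlo loop. In the sketch below, `simulate_delay` is a hypothetical stand-in for a full circuit simulation, with made-up global and local variation terms. `samples_for_failures` estimates how many runs a plain Monte Carlo needs before it can expect to see a given number of failures at a chosen sigma level, assuming a normally distributed metric. None of this reflects any vendor's tool.

```python
import math
import random
from statistics import NormalDist


def simulate_delay(rng, nominal=100.0, global_sigma=3.0,
                   local_sigma=1.0, n_devices=10):
    """Toy stand-in for one circuit simulation (all numbers assumed)."""
    shift = rng.gauss(0.0, global_sigma)            # die-to-die variation
    local = sum(rng.gauss(0.0, local_sigma)         # per-device variation
                for _ in range(n_devices))
    return nominal + shift + local


def monte_carlo_yield(n_samples, spec, seed=0):
    """Fraction of sampled circuits whose delay meets the spec."""
    rng = random.Random(seed)
    passes = sum(simulate_delay(rng) <= spec for _ in range(n_samples))
    return passes / n_samples


def samples_for_failures(sigma_level, expected_failures=10):
    """Runs needed to expect that many one-sided failures at sigma_level."""
    p_fail = 1.0 - NormalDist().cdf(sigma_level)
    return math.ceil(expected_failures / p_fail)


print(monte_carlo_yield(2000, spec=115.0))
print(samples_for_failures(3.0))   # thousands of runs
print(samples_for_failures(6.0))   # billions of runs
```

When each sample is a real simulation rather than a few random draws, the jump from thousands to billions of runs is why brute-force sampling is impractical at high sigma and why picking out the corners that matter becomes the priority.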

A better approach is understanding exactly what needs to be measured, how it will be measured, how those corners will be addressed and how to check for ramifications in other parts of a design once those fixes are made. And after that, it’s all statistics.