At the latest nodes, it is becoming impossible to analyze IR drop correctly, leading to chip-killing problems.
The evolution of semiconductor manufacturing technology has led to chips with ever-higher power densities, which in turn is causing serious problems with on-chip power distribution. Specifically, the problems surrounding voltage drop, or IR drop (from V = I × R), have become so acute that we have seen multiple companies start to get dead silicon back from the fab.
For example, a recent 7nm chip designed to run at 3GHz failed to get above 2.7GHz in silicon. The failure was due to excessive IR drop on the power and ground supply lines, which remained undetected even though the design passed all signoff tool checks and followed all methodology recommendations. It is this unpredictability that is raising concern in the design community, and it indicates that we need to reconsider our approach to IR drop signoff.
On a detailed level, IR drop causes chip failures because timing delays of standard cells and macros slow down dramatically if their supply voltage is inadequate. This effect is not new, and designers have been dealing with it for years. But something has changed in manufacturing that is making existing verification methodologies obsolete. The main culprit is the dramatically increased resistance of semiconductor interconnect, which has seen an almost 10X increase from 28nm to 7nm. And this trend is expected to intensify for nodes below 7nm. By contrast, the capacitance has seen very little change across recent nodes. The other contributing factor is the heightened sensitivity of advanced node libraries to variations in supply voltage, particularly in ultra-low voltage and near-threshold operating regimes. High-Vt cells suffer particularly badly from this. We have seen cases of up to 25% variation in delay for a swing of just 10mV at 0.5V.
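To make the voltage sensitivity concrete, here is a minimal sketch using the textbook alpha-power law for gate delay, in which delay is proportional to Vdd / (Vdd - Vt)^alpha. The threshold voltages and the exponent below are illustrative assumptions, not values taken from any real library, and the model is only a rough approximation close to threshold. It is meant to show the trend, not to reproduce the 25% figure quoted above.

```python
def alpha_power_delay(vdd, vt, alpha=1.5, k=1.0):
    """Relative gate delay under the alpha-power law: k * vdd / (vdd - vt)**alpha."""
    if vdd <= vt:
        raise ValueError("model is only meaningful for vdd > vt")
    return k * vdd / (vdd - vt) ** alpha


def delay_increase_pct(vdd, drop, vt, alpha=1.5):
    """Percent delay increase when the supply sags from vdd to vdd - drop."""
    nominal = alpha_power_delay(vdd, vt, alpha)
    sagged = alpha_power_delay(vdd - drop, vt, alpha)
    return 100.0 * (sagged / nominal - 1.0)


# Assumed thresholds (hypothetical): 0.30 V for a low-Vt cell, 0.42 V for a high-Vt cell.
for name, vt in [("low-Vt", 0.30), ("high-Vt", 0.42)]:
    pct = delay_increase_pct(vdd=0.5, drop=0.010, vt=vt)
    print(f"{name}: {pct:.1f}% slower for a 10 mV sag at 0.5 V")
```

Under these assumed numbers the high-Vt cell slows down several times more than the low-Vt cell for the same 10 mV sag, because its supply sits much closer to its threshold.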
The increase in wire resistance has also conspired to invalidate traditional techniques for mitigating IR drop. Traditionally, IR drop was kept in check by over-dimensioning the power grid and by adding decoupling capacitors. But over-dimensioning is a brute-force approach that is becoming much too expensive in PPA (power, performance, area), and decoupling caps are no longer as effective because heightened resistance has made IR drop a very local phenomenon. Indeed, there is so much resistance between a local problem and distant capacitors that the resulting time constant is too long for those capacitors to help with local, instantaneous dips. This increased resistive shielding is what makes global solutions less effective. Another consequence is that the effect of local aggressors (nearby standard cells that cause the local voltage to dip when they switch) is accentuated and has a growing influence on the IR drop problem. Without understanding the impact of these local aggressors, it is becoming impossible to analyze IR drop correctly.
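The time-constant argument can be sketched with a first-order RC estimate: a decoupling capacitor reached through grid resistance R responds on a timescale of roughly R × C. The resistance, capacitance, and event duration below are hypothetical round numbers chosen only to illustrate the comparison. A real power grid is a distributed network, not a single RC.

```python
def decap_time_constant_ps(r_ohm, c_farad):
    """First-order RC time constant, in picoseconds, of a decap seen through grid resistance."""
    return r_ohm * c_farad * 1e12


def decap_can_respond(r_ohm, c_farad, event_ps):
    """True if the RC time constant is shorter than the duration of the current surge."""
    return decap_time_constant_ps(r_ohm, c_farad) < event_ps


# Assumed values (hypothetical): a 100 fF decap and a 20 ps switching event.
C_DECAP = 100e-15
EVENT_PS = 20.0
for label, r_ohm in [("adjacent decap", 50.0), ("distant decap", 2000.0)]:
    tau = decap_time_constant_ps(r_ohm, C_DECAP)
    ok = decap_can_respond(r_ohm, C_DECAP, EVENT_PS)
    print(f"{label}: tau = {tau:.0f} ps, responds in time: {ok}")
```

With these assumed values the adjacent capacitor responds within the switching event while the distant one does not, which is the sense in which rising resistance makes IR drop a local problem.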
Today's voltage-sensitive libraries mean that certain paths are inherently voltage-sensitive because of the combination of standard cells, slews, and loading that they contain. If just the right set of local aggressors all switch at the right time, there will be a significant local dynamic voltage drop, and it will make the delays of these paths wildly different from standard delay calculations, which do not consider this specific activity pattern around this specific path. Such paths can descend into timing failure even if they originally had plenty of positive slack and were on no one's radar as a "critical" path. In the IR timing failure example mentioned above, the culprit was shown to be a path that was not timing critical at all. In fact, it ranked beyond the 200,000th position in the list of critical paths.
That is why we are seeing some designs at 7nm and below pass all traditional IR signoff methodologies with flying colors, yet fail in silicon on the test bench.
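The way a comfortable path turns into a failing one can be sketched with a simple slack calculation. This assumes each stage's delay scales linearly with local supply sag. The path, its stage delays, and the per-stage sensitivities are all hypothetical. The 2.5% per mV figure for the most sensitive stages is a linear reading of the 25% per 10 mV case cited earlier, and extending it to larger drops is an assumption.

```python
CLOCK_PERIOD_PS = 1e6 / 3000.0  # 3 GHz target from the example above, about 333.3 ps


def path_delay_ps(stages, drop_mv=0.0):
    """Sum stage delays, each scaled linearly by its sensitivity to supply sag.

    stages: list of (nominal_delay_ps, sensitivity_pct_per_mv) tuples.
    """
    return sum(d * (1.0 + s * drop_mv / 100.0) for d, s in stages)


def slack_ps(stages, drop_mv=0.0, period_ps=CLOCK_PERIOD_PS):
    """Setup slack, ignoring clock skew, setup time, and other margins."""
    return period_ps - path_delay_ps(stages, drop_mv)


# Hypothetical five-stage path mixing insensitive and voltage-sensitive cells.
path = [(40.0, 0.5), (60.0, 2.5), (50.0, 2.5), (45.0, 0.5), (55.0, 2.0)]

for drop_mv in (0.0, 10.0, 25.0):
    print(f"local drop {drop_mv:4.0f} mV -> slack {slack_ps(path, drop_mv):7.1f} ps")
```

In this sketch the path has generous positive slack at nominal voltage and still goes negative at the largest assumed local drop, so a ranking by nominal slack alone would never flag it.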
Given this analysis, the outlines for a solution present themselves:
The solution methodology suggested above is not a completely new idea: the concept of timing-aware IR drop has been around for a decade or more. But the sheer volume of data that needs to be pulled together in one place for this (complete layout, full STA with SI, all timing window information, dynamic IR drop simulation, vectorless data generation, etc.) has made the approach impractical, particularly if P&R, IR drop, and STA are being done by three different tools from different vendors. So, over and above the detailed algorithmic challenges embedded in this vision, there is an overriding, urgent need for a deeply integrated full-flow solution.
I am confident that the EDA community will rise to this challenge, as it has done so many times in the past, but designers will have to follow and learn to design with a smarter, more integrated approach to power supply distribution and integrity.