Once a manageable effect, voltage drop is causing more problems at lower nodes.
At one time a relatively infrequent occurrence, voltage drop is now a major impediment to reliability at advanced nodes. Decades ago, voltage drop was only an issue for very large and high-speed designs, where there was concern about supply lines delivering full voltage to transistors. As design margins have tightened in modern advanced designs, controlling voltage drop has become a requirement for foundry sign-off.
As the industry keeps pushing into lower nodes, physics keeps pushing back. The cascade of problems starts with an elementary effect: advanced designs use thinner wires, and thinner wires have higher resistance, which leads to voltage drop and performance degradation. Demands for faster switching compound the problem. Higher switching speeds require current to be delivered more quickly, and those larger, faster current transients once again deepen the voltage drop.
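A back-of-envelope calculation makes the scale of both effects concrete. The sketch below (in Python, with every value hypothetical) shows how tripling wire resistance and quadrupling edge rates each carve tens of millivolts out of an already-thin supply:

# Illustrative only: thinner wires raise R, faster switching raises L*di/dt.
VDD = 0.65                 # supply voltage (V), typical of ultra-low-voltage nodes
I = 0.020                  # current drawn by a block (A), assumed
for R in (1.0, 3.0):       # grid resistance (ohms): thinner wires -> higher R
    v_drop = I * R         # resistive (IR) drop
    print(f"R={R:.1f} ohm -> IR drop {1e3*v_drop:.0f} mV ({100*v_drop/VDD:.1f}% of VDD)")

L_PKG = 0.5e-9             # package/grid inductance (H), assumed
for dt in (1e-9, 0.25e-9): # slower vs. faster current edge
    v_droop = L_PKG * I / dt    # inductive droop: v = L*di/dt, with di ~ full block current
    print(f"dt={dt*1e9:.2f} ns -> L*di/dt droop {1e3*v_droop:.1f} mV")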
There are two sub-categories of voltage drop. Static voltage drop occurs when the circuit is idle rather than switching, and is driven largely by leakage, such as gate and channel leakage. Dynamic voltage drop occurs while transistors are switching, and it has moved from minor artifact to first-order design challenge.
“Voltage droop was always an issue, but it was masked by the margins that we used to be able to afford, in the world where you could over-provision,” said Lee Vick, vice president of strategic marketing at Movellus. “Now, all the low-hanging fruit is gone. It’s become a game that the customer has to play. Do I want to be super aggressive and design with wafer-thin margins? Or do I want to be conservative, knowing that if I go too far, I lose competitiveness? It’s a very delicate balance.”
Fig. 1: Solving for voltage droop. Source: Movellus
Drop versus droop
It turns out there’s no agreement on whether “voltage drop” and “voltage droop” are two terms with distinct meanings or whether “droop” is just a typo. Some engineers say they’re interchangeable. Others claim precision: that drop is an unexpected artifact of operation, while droop is intentionally designed in. Or that drop should be used only for digital and droop only for analog, given the packetized breaks in digital signals and the continuous curves in analog.
“I’ve seen circuit descriptions in which droop referred to the source and drop referred to the external elements in the path of the flow,” said Guy Kelly, a retired communications engineer who’s heard these arguments for over half a century.
Still the arguments continue, as the terms are debated by everyone from perplexed engineering undergrads to confident industry veterans. “People also use ‘IR drop’ as a synonym for voltage drop,” said Preeti Gupta, director of product management at Ansys. “Yet even that is incomplete, because the inductance and the capacitance are not part of that IR.”
In the design workspace, syncing terms is critical. “Even if you want to quibble over definitions,” said Kelly, “what’s essential is that your team, your customers, and your vendors all use the same term to mean the same thing.”
Power, timing underscore importance of voltage drop
Whether called drop or droop, its effects are growing, because an energy-conscious industry now demands that chips operate with as little power as possible.
“As we’ve moved towards low power design, power has become an imperative,” said Marc Swinnen, director of product marketing for the semiconductor division of Ansys. “There’s been very intense pressure to lower the voltage as low as you can. And now you have these ultra-low voltage processes from TSMC and others, but the result is that you can no longer afford to lose a tenth of a volt on the way. You can barely make the transistor work with the supply voltage as it is, never mind losing any of it on the way to the transistor.”
There are additional problems that result from voltage drop, such as timing violations. “What has become prominent in the last six to eight years is the impact of voltage drop on the implementation flows and timing analysis flows,” said Manoz Palaparthi, senior staff product manager at Synopsys. “Specific to timing analysis, the voltage drop impact is in two places. When you’re doing timing analysis, the tables that come from the foundry assume the ideal voltage point. But in reality, when the chip operates, you don’t see that point. What customers do in response is put in some margin, but that user-set margin is a problem in two ways. If the user sets the margin very high, they are constraining the design from operating at its full potential. On the other side of the coin, if their margins are not enough, there’ll be some timing escapes, so their silicon might fail.”
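A rough model shows why that margin choice is so costly. Gate delay is often approximated with the alpha-power law, where delay scales roughly as Vdd/(Vdd - Vth)^alpha. The sketch below applies that model to illustrate the trade-off Palaparthi describes; the model choice and every number are illustrative assumptions, not any vendor’s flow:

# Alpha-power-law sketch: how much speed a voltage-drop margin gives away.
def rel_delay(vdd, vth=0.30, alpha=1.3):   # assumed device parameters
    return vdd / (vdd - vth) ** alpha

V_NOM = 0.70                          # voltage the foundry tables assume (V)
for margin_mv in (0, 30, 60):         # user-applied voltage-drop margin
    v = V_NOM - margin_mv / 1e3
    slowdown = rel_delay(v) / rel_delay(V_NOM) - 1
    print(f"{margin_mv:2d} mV margin -> ~{100*slowdown:.1f}% slower paths")

Under these assumptions, a 60mV margin costs roughly 13% in path delay, exactly the constrained performance Palaparthi warns about; too little margin, and real droop pushes paths past their timing budget instead.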
Adjusting for voltage drop can lead to other problems, explained Heidi Barnes, senior applications engineer at Keysight. “Voltage drop throws off the digital timing, and a lot of things start to have less margin, which leads to more bit errors and reliability issues. Even worse, if you turn off the current, and you don’t have the right charge storage, it ends up spiking the voltage. A lot of vendor data sheets state that if you go X% over the voltage, they’re not responsible for the reliability of the device.”
Trying to correctly assess voltage droop also can be confounded by noise, which affects the accuracy of readings. According to Vick, “You want to have the earliest possible indication of a droop event, so you set your droop threshold at 10 millivolts, but what if the power supply noise is on the order of 12 to 15 millivolts, plus or minus? You’ve just set your threshold inside the noise envelope that exists on your power rail. And the worst part of it is, the farther down the droop curve you are before you trigger, the less time you have to respond. So you have to set your first droop threshold below the noise envelope, but as close to it as you can, because when a droop starts you want to immediately start to mitigate it. You have to both detect and respond extremely fast to keep the system from violating timing.”
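Vick’s dilemma shows up even in a toy detector model. The Python sketch below (synthetic waveform, invented numbers) sets two thresholds against a rail with roughly +/-13mV of noise; the shallow one false-triggers on noise alone, while the deeper one fires only on the real droop, at the cost of reacting later:

import random

VDD, NOISE = 0.750, 0.013          # nominal rail (V) and +/- noise amplitude (V), assumed
random.seed(1)

def rail(t):                       # noisy rail with a droop event starting at t=200
    droop = 0.002 * max(0, t - 200)      # droop deepens 2 mV per step
    return VDD - min(droop, 0.060) + random.uniform(-NOISE, NOISE)

for thr_mv in (10, 15):            # detector threshold depth below nominal
    thr = VDD - thr_mv / 1e3
    first = next(t for t in range(400) if rail(t) < thr)
    label = "false trigger (noise)" if first < 200 else "real droop"
    print(f"{thr_mv} mV threshold: first trip at t={first} -> {label}")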
Traditional versus modern responses to voltage drop
Historically, the answer to voltage drop was to build in extra margin, but with ultra-low voltage processes at single-digit nanometer nodes, there just isn’t room. This means accurate analysis has become far more critical. In the past, these were static measurements, essentially statistical averages, rather than real-time analysis. Now, new tools and approaches are allowing for dynamic analysis.
“Signals are always point-to-point, but power supply is a complicated mesh that needs SPICE to analyze it,” said Swinnen. “In the early days, that work was reasonably easy. You calculated the resistance and whether your wires were thick enough for the current you’d need, which depended on whether it’s switching or not. You added up the statistics and came up with an average amount of power that would flow through the power supply network, and that was good enough.”
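A sketch of that “early days” static check, assuming a single power strap feeding three blocks with made-up average currents (a real grid is a two-dimensional mesh):

blocks = [("cpu", 0.015), ("sram", 0.008), ("io", 0.004)]   # average current (A), assumed
R_SEG = 0.4          # resistance of each rail segment between taps (ohm), assumed

v_drop = 0.0
downstream = sum(i for _, i in blocks)    # all current enters at the supply end
for name, i in blocks:                    # walk the strap from the supply inward
    v_drop += downstream * R_SEG          # every downstream amp crosses this segment
    downstream -= i                       # this block's current taps off here
    print(f"at {name}: cumulative static drop {1e3*v_drop:.1f} mV")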
In modern times, with the rise of dynamic voltage drop (DVD), especially with chiplets, finFETs, and other advanced designs, the difficulty has escalated. DVD occurs when multiple cells switch at the same time and affect their neighbors. “If you look at your victim cell, the one you’re trying to analyze the voltage drop for, you’ve got to look at possible aggressors in surrounding cells. The number of possible combinations of switching neighbors that could lead to a voltage drop is so huge that it’s unsearchable. This combinational explosion is called ‘the coverage problem,’” he explained.
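The arithmetic behind that unsearchability is straightforward. Even ignoring timing, each neighboring aggressor can either switch or not, so the vector space doubles with every cell (the counts below are illustrative; real analysis windows also span multiple clock cycles):

for n_aggressors in (20, 100, 500):
    combos = 2 ** n_aggressors        # each neighbor switches or stays quiet
    print(f"{n_aggressors:3d} aggressors -> {combos:.3e} switching combinations")

Five hundred aggressors alone yield on the order of 10^150 cases, which is why no amount of vector-driven transient simulation can enumerate them.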
The issue is that static analysis makes assumptions, according to Palaparthi. “It’s not going to take into consideration how the design is actually going to behave. It just assumes that over the life of this design, this is going to toggle at this rate; this is going to switch at this level. Based on that, I can do a static or average analysis that gives some indication of where the design will see voltage drop. In dynamic analysis, designers get activity files. When someone is designing the chip, they do verification and emulation, which capture the actual design activity, meaning how the design is going to operate. Based on that, you can get vectors,* such as which specific pin in the design is switching, how much, and at what time. Because the actual design activity is considered in the dynamic analysis, it’s much more accurate.”
*The value of all the inputs in a circuit at a given time is called a “vector.” An “activity” is reported as a sequence of vectors that capture how all the inputs go up and down as the circuit does something.
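In code, the footnote’s terms might look like this (pin names and the waveform are invented for illustration); the per-pin toggle counts are the raw material a dynamic voltage-drop tool consumes:

activity = [                       # an activity: one vector per time step
    {"clk": 0, "en": 0, "data": 0},
    {"clk": 1, "en": 1, "data": 0},
    {"clk": 0, "en": 1, "data": 1},
    {"clk": 1, "en": 1, "data": 1},
    {"clk": 0, "en": 0, "data": 1},
]

toggles = {pin: 0 for pin in activity[0]}
for prev, curr in zip(activity, activity[1:]):
    for pin in toggles:
        toggles[pin] += prev[pin] != curr[pin]    # count 0->1 and 1->0 edges

for pin, n in toggles.items():
    print(f"{pin}: {n} toggles in {len(activity)-1} steps")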
Mitigating voltage drop
Simulations and digital twins are coming into favor as approaches to deal with voltage drop, but they can have their own drawbacks, said Rajat Chaudhry, product management group director at Cadence. “There are infinite paths, meaning, how do you test every possible scenario? One solution people have tried is to run ‘worst case’ dynamic simulations, with multiple vectors in combination, to stress the chip in the belief that if you can pass that situation, you probably will not run into a problem.”
But these scenarios can wind up being unrealistic, so the best approach is to shift left. “You need to start thinking about it earlier in the design phase,” Chaudhry said. “This used to be done at sign off, with people thinking they’d just fix a few things at the end. Now, making those late changes is not really possible because the number of problems you find in the late stages is very large. You need to start moving earlier and fix those situations at the source.”
That said, the best bang for the buck is always at the architectural level. “How do I organize my circuit so that I can distribute my power requirements and my activity, and improve my IR drop by putting things further away? But then that also increases my parasitic resistance,” noted Joe Davis, senior director of product management for the mPower product line at Siemens EDA. “There are trade-offs you have to make at the architectural level. Then you get to the implementation level, and it even goes to the technologies you choose: how many layers of metal you’re using, whether you should add more capacitance into your network, and so on. There are trade-offs all along. The design process says you get the biggest bang at the highest level. But there are many tools all the way down to the bottom, where you’re doing the individual implementation for a given block.”
Ansys demonstrated a new solution to the problem at this year’s DAC Exhibitor Forum, one that calls for a reappraisal of voltage drop analysis similar to the seismic shift that happened when chip timing embraced static timing analysis (STA). Voltage drop analysis today relies on transient SPICE circuit simulation driven by activity vectors, just as timing did pre-STA. The coverage problem has shown that this approach is no longer sustainable. In response, Ansys offered a different, structural analysis that does not depend on vectors to deliver near-100% coverage for local voltage drop. Ansys claims to achieve full coverage with only a fraction of the transient simulation effort required by today’s solutions, along with other benefits for root-cause analysis.
“We saw that just doing more and more simulation wasn’t going to get you there, so we came up with a solution that isn’t based on activity vectors,” said Swinnen. “It looks at the coupling between the cells, and calculates what sort of contribution each one of these local aggressors makes to the others. Based on that, it can very quickly calculate all possible contributions and deliver a comprehensive solution for local DVD. In this way, we can help avoid voltage drop at the prototyping stage, optimize placement for voltage drop during implementation, and speed voltage drop sign-off.”
Fig. 2: DVD in center “victim” cell, caused by switching in neighboring cells. Source: Ansys
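Ansys has not published the algorithm’s internals, but the general idea of a vectorless, structural bound can be sketched generically: precompute each aggressor’s worst-case contribution to the victim’s rail, then combine contributions instead of simulating vectors (the contribution values below are invented):

# Generic illustration of a vectorless bound, not Ansys' published method.
contrib_mv = {"agg_N": 4.2, "agg_S": 3.1, "agg_E": 6.0, "agg_W": 2.4}

worst_case = sum(contrib_mv.values())      # all aggressors align (upper bound)
print(f"guaranteed-coverage bound: {worst_case:.1f} mV local droop at victim")
print("dominant aggressors for root cause:",
      sorted(contrib_mv, key=contrib_mv.get, reverse=True)[:2])

Because such a bound holds for every possible switching combination, coverage is structural rather than statistical, and the per-aggressor terms double as root-cause information.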
However, simulations and software solutions alone won’t be enough, said Movellus’ Vick. “People care about nanoseconds; it shows up on the spec sheet,” he said. “In the modern world, the competition is going to do everything they can to outperform you, so giving away any compute is problematic. When you design a droop solution, it’s not just about mitigating the droop. It’s about doing it in as efficient a way as possible, because you can’t afford to give up any more performance than is absolutely necessary. And it has to be done in hardware, not software, because changing workloads, changing dynamic system environments, interrupts, and exceptions all contribute to a very complicated software/hardware environment. Being able to definitively alleviate all of these in software is just an enormous lift.”
While voltage drop cannot be completely eliminated, combining several approaches can significantly reduce its impact, according to Preethi Govindaraj, of the Integrated Sensor Electronics group at the Fraunhofer Institute for Integrated Circuits IIS/EAS. “The key is to consider power integrity throughout the entire design process, from initial architecture to final implementation,” said Govindaraj. “Despite the challenges, tools and advanced techniques show that voltage drop is manageable. With ongoing advancements in design methodologies and manufacturing processes, we can ensure reliable chip operation even across multiple nanometer nodes.”
Achieving that, Govindaraj said, requires combining multiple techniques across the flow rather than relying on any single fix.
Test considerations with dynamic voltage drop analysis
By all accounts, dynamic voltage drop analysis is critical, and additional considerations come into play when testing the chip. “How you do this analysis is really important, because you need to look not only at the functional operation of the chip, but how it’s tested,” Davis said. “When you’re testing your chip to see if it works, you make sure that you’re testing the actual chip, not the test, and that you’re not overloading the circuit through test. You design your circuit so that it stays within a power window and the IR drop is manageable. That’s how you typically do sign-off for IR drop. You do this dynamic activity looking at worst-case situations for functional operation, which is how your chip is going to operate.
“What people then don’t think about is the scan test, the automatic test that gets run when you produce the silicon, and sometimes even at chip start-up. The way those tests function is that they try to parallelize all of those tests to achieve the shortest possible test time. That results in very high activity, far beyond what you would experience during functional operation of the chip. If you’re drawing more power and more current, you get IR drop, and during those tests you can get failures induced by the way those tests are run. But you can adjust the instructions to the test so that doesn’t occur. You can also offset different tests and the operation of different portions of the circuit so that you’re not overloading different portions of your power domain. But you have to do that intentionally.”
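Davis’ point about intentionally offsetting tests can be illustrated with a simple scheduler. The sketch below greedily packs hypothetical test groups into sequential slots so that concurrent test power never exceeds the budget the power grid was signed off against (group names and power figures are invented):

groups = {"core0": 0.6, "core1": 0.6, "sram_bist": 0.4, "io": 0.2}  # power units, assumed
BUDGET = 1.0     # max concurrent test power before IR drop induces failures

slots = []       # greedy first-fit: offset tests into sequential time slots
for name, p in sorted(groups.items(), key=lambda kv: -kv[1]):
    for slot in slots:
        if sum(groups[g] for g in slot) + p <= BUDGET:
            slot.append(name)     # runs concurrently within the budget
            break
    else:
        slots.append([name])      # needs its own time slot

for t, slot in enumerate(slots):
    print(f"slot {t}: {slot} (power {sum(groups[g] for g in slot):.1f})")

Here the test takes two slots instead of one, trading test time for staying inside the functional power window.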
In the end, design teams want the ability to analyze problems themselves. “The closer you get to that dangerous threshold level, the more you will have to rely on both pre-silicon and post-silicon techniques,” said Vick. “If my design is such that I can absolutely guarantee I’ll never have a voltage droop, I don’t need to test for it in silicon. But there are some things that only happen in the silicon. You can’t pre-compute every possible combination of code, environmental factor, aging, mechanical, and thermal stress to simulate all of those across all parameters. Having something in the circuit to catch and respond to what’s happening in the real world and physical silicon is an important counterpart to the pre-silicon work that’s being done. While there are significant benefits to pre-silicon simulation, it’s not the full answer. The problem is significant enough that it’s going to take both in combination.”
Related Reading
Where Power Savings Really Count
How chiplets and advanced packaging will affect power efficiency, and where design teams can have the biggest impact on energy consumption.
Using AI/ML To Minimize IR Drop
Heterogeneous and advanced-node designs are creating unexpected post-layout challenges for design teams, but some issues can be addressed earlier in the flow.