Failure to get sufficient current to devices when they need it results in voltage droop, timing delays, and functional failures.
IR drop is becoming more problematic for a growing proportion of designs, an indication that the power delivery network (PDN) is not providing enough current to parts of the design when required. Unfortunately, there is no easy fix to this problem.
In the past, when voltages were much higher, a small voltage droop didn’t really matter. At the same time, wires were much thicker and presented lower resistance. And finally, switching speeds were slower, creating smaller current spikes. All of these have gotten worse at recent technology nodes, and the result is that an increasing number of designs are having problems with timing.
“This used to be a second-order problem,” says Marc Swinnen, product marketing manager at Ansys, now part of Synopsys. “You have demands from very quick, large current spikes through very thin wires with very low tolerance for any voltage drop. Margining has been the easy way out, but margining is very expensive in terms of space, and it’s expensive in performance.”
To fix the problem, more attention needs to be paid to the design of the power delivery network. “The power grid development is one of the first steps after partitioning,” says Joe Davis, senior director of product management at Siemens Digital Industries Software. “But the key thing is at that time, before place-and-route, you don’t know where the gates are going to go and they’re not all the same.”
So the PDN design needs to start before all the details are known. “The basic architecture is a power ring around the rows of cells, and then the rows tap into the rings out at the edges,” Ansys’ Swinnen explains. “Because the power drops as you go down the row of cells, each cell draws some power, and so the voltage drops. In the middle of the row, you have the deepest drop. To support the power voltage, you draw straps across the ring. You draw a strap from the top of the ring to the bottom of the ring, and at every row it crosses you staple a via onto the local power rail to support it. You have this meshed structure. The number of power straps, the pitch between the power straps, the more you have, the better the voltage will be supplied.”
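A one-dimensional model makes Swinnen's row-and-strap picture concrete. This is a minimal sketch under stated assumptions: one row's power rail is purely resistive, every cell draws the same current, and the ring at both ends and any strap are ideal supplies at VDD. The function names and all numeric values are hypothetical. Each cell node obeys Kirchhoff's current law, which gives a tridiagonal system, and the code solves it with the Thomas algorithm.

```python
def solve_tridiagonal(a, b, c, d):
    """Thomas algorithm. a: sub-diagonal (a[0] unused), b: diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for k in range(1, n):
        m = b[k] - a[k] * cp[k - 1]
        cp[k] = c[k] / m
        dp[k] = (d[k] - a[k] * dp[k - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for k in range(n - 2, -1, -1):
        x[k] = dp[k] - cp[k] * x[k + 1]
    return x


def rail_voltages(n_cells, vdd, r_seg, i_cell, straps=()):
    """Node voltages along one row's power rail (hypothetical model).

    Nodes 0 and n_cells + 1 are where the rail meets the ring. Nodes 1..n_cells are
    cells, each drawing i_cell. Any node index in `straps` is tied to an ideal strap at vdd.
    """
    n = n_cells + 2
    fixed = {0, n - 1} | set(straps)
    a, b, c, d = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    for k in range(n):
        if k in fixed:
            b[k], d[k] = 1.0, vdd            # voltage pinned by the ring or a strap
        else:
            # KCL: (V[k-1] - V[k]) / r + (V[k+1] - V[k]) / r = i_cell
            a[k], b[k], c[k] = -1.0, 2.0, -1.0
            d[k] = -i_cell * r_seg
    return solve_tridiagonal(a, b, c, d)


VDD, R_SEG, I_CELL = 0.75, 1.0, 20e-6     # hypothetical: volts, ohms per segment, amps per cell
no_strap = rail_voltages(100, VDD, R_SEG, I_CELL)
one_strap = rail_voltages(100, VDD, R_SEG, I_CELL, straps=[51])
print(f"worst drop, ring only:        {(VDD - min(no_strap)) * 1e3:.1f} mV")
print(f"worst drop, one center strap: {(VDD - min(one_strap)) * 1e3:.1f} mV")
```

The deepest sag sits in the middle of the row. Because drop grows with the square of the unsupported length, a single strap at the center cuts the worst-case drop by roughly a factor of four.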
There is a secondary problem if the wires have higher resistance. “The problem with IR drop is thermal,” says Luca Vassalli, customer applications engineering director at Empower Semiconductor. “If there is current flowing through that resistance, the R in IR, it draws power. It generates power proportional to the current squared. It’s important to decrease that resistance as the currents are increasing. With today’s processors, the cores are pulling 1,000 amps. If you have 1,000 amps across 100 microohms, that’s 100 watts lost. And 100 watts could be 10% of the power of the processor.”
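Vassalli's arithmetic can be checked directly. The sketch below reuses only his two figures (1,000 A and 100 µΩ). The function names are illustrative. It also shows why loss scales so badly with current: the I²R term quadruples when current doubles, while the IR drop itself only doubles.

```python
def conduction_loss_w(current_a, resistance_ohm):
    """Power dissipated (W) by a DC current flowing through a resistance: P = I^2 * R."""
    return current_a ** 2 * resistance_ohm


def ir_drop_v(current_a, resistance_ohm):
    """Voltage lost across the same resistance: V = I * R."""
    return current_a * resistance_ohm


I_CORE = 1_000.0   # amps, the figure quoted above
R_PATH = 100e-6    # ohms (100 micro-ohms), the figure quoted above

print(f"loss: {conduction_loss_w(I_CORE, R_PATH):.0f} W")            # 100 W
print(f"drop: {ir_drop_v(I_CORE, R_PATH) * 1e3:.0f} mV")             # 100 mV
# Doubling the current quadruples the loss; halving the resistance halves it.
print(f"2x current: {conduction_loss_w(2 * I_CORE, R_PATH):.0f} W")  # 400 W
print(f"R / 2:      {conduction_loss_w(I_CORE, R_PATH / 2):.0f} W")  # 50 W
```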
As with everything, there are always tradeoffs. Making power delivery wires thicker makes the design harder to route, and that affects area. Pushing devices farther apart provides more room for the wires, or for thicker wires, but it reduces the area available to connect them up.
There are attempts to solve this problem by moving the power wires to the backside of the die. “The concept is to move power lines that are big and fat out of the way,” says Siemens’ Davis. “That does help, but it creates another problem. Instead of timing, it’s thermal. With the power wires out of the way, you can pack more stuff in, and things can be closer together. You can run things faster, but when that happens, they get hotter. The reliability of wires is exponentially degraded by heat, so the higher the temperature, the higher the resistance, the more IR drop, and the lower the reliability. There is competition between delivering more current with lower IR drop, but you are going to generate more heat, which has to be dissipated. To take advantage of the new trick, I have caused a new problem.”
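The exponential sensitivity to heat that Davis describes is captured by the Arrhenius term in Black's equation, MTTF = A · J⁻ⁿ · exp(Ea / kT), the classic electromigration model that foundry EM rules build on. The sketch below compares relative lifetimes under two conditions. The prefactor A, the exponent n = 2, and the activation energy of 0.9 eV are placeholder assumptions, not foundry data, and the function names are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5


def black_mttf(j, temp_c, a=1.0, n=2.0, ea_ev=0.9):
    """Black's equation, MTTF = A * J**-n * exp(Ea / (k*T)), in arbitrary units.

    a, n and ea_ev are placeholder values; real ones come from process characterization.
    """
    t_k = temp_c + 273.15
    return a * j ** (-n) * math.exp(ea_ev / (BOLTZMANN_EV_PER_K * t_k))


def lifetime_ratio(j1, t1_c, j2, t2_c, **params):
    """How many times shorter the lifetime is at condition 2 than at condition 1."""
    return black_mttf(j1, t1_c, **params) / black_mttf(j2, t2_c, **params)


# Same current density, 20 C hotter:
print(f"{lifetime_ratio(1.0, 85, 1.0, 105):.1f}x shorter life")
# Same temperature, 50% more current density (n = 2 gives 1.5**2 = 2.25x):
print(f"{lifetime_ratio(1.0, 85, 1.5, 85):.2f}x shorter life")
```

With these placeholder parameters, a 20°C rise shortens the modeled lifetime several times over. That is why packing devices closer together and running them hotter trades a timing problem for a reliability problem.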
A hierarchical problem
While chip engineers focus on the tracks on a die, that die is potentially sitting on an interposer inside a package on a board within a system, and that is where the primary power supply is located. That creates a long path, which needs to be understood. “There are a lot of methods available to verify this at the chip level, but for the chip/package/board problem, we see a lot of problems,” says Andy Heinig, head of department for efficient electronics at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “Nobody can really answer the question about how to do power delivery verification for the chip, package, board. It’s really an unsolved problem.”
The goal is to move energy to where it is needed, at the time it is needed, and with the least amount of loss. “What creates that IR drop is the distance between the power stage and the load,” says Empower’s Vassalli. “We are trying to make sure that we can put the power stages as close as possible to the processor. One way is to increase the bandwidth of the switching regulator, which in turn allows you to use fewer capacitors. If you’re increasing the bandwidth of the switching regulator by a factor of 10, you reduce the amount of capacitance needed by a factor of 10. That creates more space. The best compromise we found is to keep the voltage regulators outside of the package, but as close as possible to the package, and therefore on the backside of the PCB. There needs to be another level of improvement to really pull off the integration inside the package, but there are a lot of discussions about an integrated voltage regulator (IVR) going into the package.”
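Vassalli's "10x bandwidth, 10x less capacitance" rule follows from a first-order target-impedance view of the PDN. This is a sketch under simplifying assumptions. The PDN must stay below Z_target = ΔV / ΔI at every frequency. Above the regulator's control bandwidth, capacitors carry that burden, and a capacitance C presents 1/(2πfC). The requirement at the bandwidth edge is therefore C ≥ 1/(2π · f_bw · Z_target). The ripple and load-step numbers are hypothetical, and ESR, ESL, and the structure of a real capacitor bank are ignored.

```python
import math


def target_impedance_ohm(allowed_ripple_v, load_step_a):
    """Classic PDN target impedance: allowed voltage ripple / load current step."""
    return allowed_ripple_v / load_step_a


def min_bulk_capacitance_f(z_target_ohm, regulator_bw_hz):
    """Smallest C whose reactance 1/(2*pi*f*C) is at or below Z_target at the
    regulator's bandwidth, where capacitors must take over from the regulator.
    First-order only: ignores ESR, ESL and the spread of a real capacitor bank."""
    return 1.0 / (2 * math.pi * regulator_bw_hz * z_target_ohm)


z_t = target_impedance_ohm(allowed_ripple_v=0.030, load_step_a=300.0)  # hypothetical: 0.1 milliohm
for bw_hz in (100e3, 1e6):
    c_f = min_bulk_capacitance_f(z_t, bw_hz)
    print(f"regulator bandwidth {bw_hz / 1e3:>6.0f} kHz -> about {c_f * 1e3:.1f} mF")
```

Because the required capacitance is inversely proportional to the bandwidth, a 10x faster regulator needs roughly a tenth of the capacitance. That frees board area near the package for the power stages.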
Inside the package, many of the largest designs will include an interposer. “The interposer is more relaxed,” says Swinnen. “It’s usually like 16nm or 35nm technology, but you’ve got much bigger power. On the interposer, and through micro bumps, you need to have the power, not just for the chip, but maybe the chip above that. It has to feed through the chip using through-silicon vias (TSVs) and these tiny bumps. Hundreds of watts have to be fed through these tiny connections. It’s more complex. It’s different. There are new elements that come into play. But the same issues apply. Does it get worse? It’s definitely more complicated.”
The interposer also starts behaving more like a PCB. “When you start getting interposers with long traces, Ls [inductance] can very much become an issue,” says Davis. “You can have resonance. You start to have the same signal integrity issues in a 3D-IC that you have with a traditional package on a board, but you’re talking about shorter things, shorter traces than wires, so it’s less of an impact. But interposers are getting so large today, when you look at the roadmaps for the foundries and the systems that they’re planning to put together with hundreds of dies, Ls and Cs [capacitance] become a significant impact, and K [mutual inductance coupling], plus a whole bunch of other ones.”
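The resonance Davis mentions arises when interposer inductance meets on-die decoupling capacitance. At f = 1/(2π√(LC)), the PDN impedance can peak well above its DC value, with √(LC)'s companion quantity √(L/C) setting the scale of that peak before damping. The sketch below uses hypothetical values (100 nF of on-die decap, 20 pH versus 80 pH of path inductance) purely to show the trend as traces get longer.

```python
import math


def resonant_frequency_hz(l_h, c_f):
    """Resonance of an LC pair: f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2 * math.pi * math.sqrt(l_h * c_f))


def characteristic_impedance_ohm(l_h, c_f):
    """sqrt(L/C): sets the scale of the impedance peak near resonance (before damping)."""
    return math.sqrt(l_h / c_f)


C_DIE = 100e-9                 # hypothetical on-die decoupling, 100 nF
for l_h in (20e-12, 80e-12):   # hypothetical interposer path inductance: short vs. long route
    f = resonant_frequency_hz(l_h, C_DIE)
    z0 = characteristic_impedance_ohm(l_h, C_DIE)
    print(f"L = {l_h * 1e12:.0f} pH -> resonance {f / 1e6:.0f} MHz, "
          f"sqrt(L/C) = {z0 * 1e3:.1f} milliohm")
```

Quadrupling the inductance, for example with a much longer route, halves the resonant frequency and doubles √(L/C). The resonance moves toward frequencies where the workload's current changes can excite it, and the peak grows.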
Multi-die assemblies are new for many chipmakers, so engineers may be less familiar with the challenges. “Advanced packaging introduces non-traditional power delivery paths,” says Takeo Tomine, principal product manager at Ansys. “This includes TSVs, micro-bumps, hybrid bonds, and interposers. Each of these adds resistance and inductance, further complicating power integrity. In a DDR PHY, early analysis might reveal that placing I/O drivers too far from main power taps results in excessive IR drop during simultaneous switching output (SSO) events, prompting a floorplan revision to shorten power paths. Similarly, in HBM designs, where hundreds of I/O lanes operate concurrently, early detection of localized IR drop in the PHY or controller region can guide partitioning strategies to isolate high-current domains and improve power grid granularity.”
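Tomine's SSO example can be sized with a first-order estimate. When N drivers switch together through a shared power path, the path sees a resistive drop of (N · i) · R and an inductive drop of L · N · di/dt. Here di/dt is approximated as each driver's current divided by its edge time. The sketch returns the two terms separately because their peaks need not coincide. All values are hypothetical, and the function name is illustrative.

```python
def shared_path_noise_v(n_drivers, i_per_driver_a, rise_time_s, r_path_ohm, l_path_h):
    """Supply noise on a power path shared by n_drivers switching together (SSO).

    Returns (resistive drop, inductive drop). The inductive part is L * total di/dt, with
    di/dt approximated as each driver's current change divided by its rise time.
    """
    total_i = n_drivers * i_per_driver_a
    total_di_dt = total_i / rise_time_s
    return total_i * r_path_ohm, l_path_h * total_di_dt


# Hypothetical numbers: 5 mA per driver, 100 ps edges,
# 10 milliohm and 50 pH for a shared power path (tap + bumps + routing).
for n in (8, 32):
    ir, ldi = shared_path_noise_v(n, 5e-3, 100e-12, 10e-3, 50e-12)
    print(f"{n:>2} drivers: IR {ir * 1e3:.1f} mV, L*di/dt {ldi * 1e3:.0f} mV")
```

Both terms scale with the number of simultaneously switching drivers, and both grow with path length. This is why moving drivers closer to the power taps, as Tomine describes, helps on both counts.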
Co-designing them would be the obvious answer. “As engineers, we solve this the way we solve all problems — with boundary conditions,” says Davis. “I can only have so much drop in this die from this interface to this interface, from here to here. I can add it up within those boundary conditions. It gets partitioned and allocated, so that each piece gets its piece of the budget. If you look at HBMs, they are stacking the memory die 8 and 12 high. There’s a via between each one of those, and you have to supply power from the bottom one to the top one. IR drop, from that power pin all the way to the top, is an enormous portion of their performance envelope, and so they have to partition it up like that.”
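Davis's HBM example shows why stack height matters so much to the drop budget. In the sketch below, the stack is fed from the bottom. The connection into die k carries the current of die k and of every die above it, so the drop at the top die grows roughly with n(n+1)/2 rather than with n. The per-level resistance, per-die current, and 30 mV stack budget are hypothetical values, and the stack is idealized as identical dies with one lumped connection per level.

```python
def stack_drop_profile(n_dies, r_level_ohm, i_die_a):
    """Cumulative IR drop seen by each die in a stack fed from the bottom.

    Level k (1 = lowest) is the via/bump connection into die k; it carries the current
    of die k and every die above it, so lower levels carry more current.
    """
    drops, cumulative = [], 0.0
    for k in range(1, n_dies + 1):
        current_through_level = (n_dies - k + 1) * i_die_a
        cumulative += current_through_level * r_level_ohm
        drops.append(cumulative)
    return drops


R_LEVEL, I_DIE = 1e-3, 0.5   # hypothetical: ohms per level, amps per die
STACK_BUDGET_MV = 30.0       # hypothetical share of the total drop allocated to the stack

for n in (8, 12):
    top_mv = stack_drop_profile(n, R_LEVEL, I_DIE)[-1] * 1e3
    verdict = "within" if top_mv <= STACK_BUDGET_MV else "over"
    print(f"{n:>2}-high: top die drop {top_mv:.0f} mV ({verdict} the {STACK_BUDGET_MV:.0f} mV budget)")
```

Going from 8-high to 12-high is 50% more dies but more than twice the drop at the top. A fixed allocation from the partitioned budget can go from comfortable to violated.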
These problems become larger when safety and reliability are involved. “We have a problem in automotive because of functional safety requirements, which are totally unclear,” says Fraunhofer’s Heinig. “You spend so much effort for functional safety on the functional side, and then we have a lot of uncertainties on the power delivery network. We have single-point failures coming from the power delivery network.”
Automotive is facing many issues in this area. “Automotive is starting to move into advanced nodes, and they are facing a problem because if they follow all the foundry rules, they can’t design a competitive chip,” says Davis. “There is no solution space, and so they’re taking a look at the rigor of the electromigration rules the foundry uses. The rules that are used today are built on Black’s equation, but that equation ignores the fact that it is a network. It is a power delivery network. Only in a few places do I have one path to deliver power to a gate. If I have a small increase in resistance in one part, that power will be delivered through another route. A true physics-based reliability analysis shows that Black’s equation and the current models not only give pessimistic results, but completely wrong results in many cases.”
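Davis's network argument can be illustrated with simple current division. This is not a physics-based EM solver. It only shows the effect he describes: when one of several parallel paths degrades, current shifts to the healthy paths. That lowers the current density in the degraded segment, which slows its further wear, and the total IR drop rises only modestly. A single-path analysis would miss this redistribution. The four-path topology, the 50% resistance increase, and the load current are hypothetical.

```python
def branch_currents(total_current_a, resistances_ohm):
    """Split a load current across parallel paths in proportion to their conductance."""
    conductances = [1.0 / r for r in resistances_ohm]
    g_total = sum(conductances)
    return [total_current_a * g / g_total for g in conductances]


def effective_resistance(resistances_ohm):
    """Equivalent resistance of parallel paths."""
    return 1.0 / sum(1.0 / r for r in resistances_ohm)


I_LOAD = 2e-3                    # hypothetical 2 mA load
healthy = [1.0, 1.0, 1.0, 1.0]   # four equal paths, ohms (hypothetical)
degraded = [1.5, 1.0, 1.0, 1.0]  # path 1's resistance up 50%, e.g. from electromigration

for label, paths in (("healthy", healthy), ("degraded", degraded)):
    currents = branch_currents(I_LOAD, paths)
    r_eff = effective_resistance(paths)
    print(f"{label:>8}: path-1 current {currents[0] * 1e3:.3f} mA, "
          f"effective R {r_eff:.3f} ohm, IR drop {I_LOAD * r_eff * 1e3:.3f} mV")
```

In this toy case, the degraded path's current falls by about 27%, while the drop at the load rises by only about 9%.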
The functional hierarchy
In addition to the hierarchy of chip, package, and board, there is a functional hierarchy that impacts the design of the power delivery network. This also results in multiple issues, such as average current demand, peak demands, and requirements that vary over time. “We are not exactly sure, especially when we have to consider hundreds of use cases, what we require from the power delivery network,” says Heinig. “If we fulfill the requirements to the maximum constraints, we can’t design such a network. It’s impossible, and we have to deal with average cases. But nobody knows if these average cases represent the use cases.”
Averages typically don’t consider functionality. “At any point in time, some gates are switching, others aren’t. But you can average it out and, net net, when you have a power supply that has to feed 1,000 gates, it’s seeing an average draw on current,” says Swinnen. “You can design so that the average draw is well supported by the power network. Then you also have dynamic voltage drop, where specific gates at specific times all switch together, and you suddenly have a local spike that needs to be supplied. That’s a transient, time-dependent peak in the current.”
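The gap between average and peak draw can be seen in a toy activity model. Each of 1,000 hypothetical gates switches in a given cycle with a fixed probability and draws a fixed current when it does. The sketch tracks total current per cycle. Independent switching is a deliberate simplification: real logic is correlated (clock edges, bursts of related activity). That correlation is what produces the local spikes Swinnen describes, so this model understates real peaks.

```python
import random


def supply_current_trace(n_gates, p_switch, i_per_gate_a, n_cycles, seed=1):
    """Total current drawn each cycle when every gate switches independently with p_switch.

    Independence is a simplifying assumption; real logic has correlated activity.
    """
    rng = random.Random(seed)
    trace = []
    for _ in range(n_cycles):
        switching = sum(1 for _ in range(n_gates) if rng.random() < p_switch)
        trace.append(switching * i_per_gate_a)
    return trace


trace = supply_current_trace(n_gates=1_000, p_switch=0.1, i_per_gate_a=50e-6, n_cycles=1_000)
average = sum(trace) / len(trace)
peak = max(trace)
all_switch = 1_000 * 50e-6
print(f"average {average * 1e3:.2f} mA, observed peak {peak * 1e3:.2f} mA, "
      f"all gates at once {all_switch * 1e3:.1f} mA")
```

Sizing the grid for the average supports the static case. Sizing it for every gate switching at once is the far more expensive bound. Dynamic analysis has to find the realistic peaks between those two.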
But not all of those may be a problem. “Luckily, peak currents are usually very narrowly defined, and you don’t have a lot of gates that have their peak at the same time,” says Davis. “As designs push more nets toward criticality, you’ve got close to zero slack. You have to check both average and peak and ensure that you look at not just the IR drop on that gate, but its impact on timing. A gate might have 10% IR drop, which by most traditional metrics would be a violation. But if it’s in a path that is nowhere near critical, it may be irrelevant.”
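Davis's "10% drop may be irrelevant" point can be sketched as a timing-aware check. The sketch estimates extra delay from the droop and compares it against the path's slack instead of flagging the drop itself. The voltage-to-delay relationship uses the alpha-power law (Sakurai and Newton), delay ∝ V / (V − Vt)^α. The threshold voltage, α, supply, gate delay, and slack values are placeholders, not data for any process. Only one drooping gate is modeled, whereas real tools propagate droop along the whole path.

```python
def alpha_power_delay_scale(vdd, vt, alpha):
    """Relative gate delay from the alpha-power law: delay ~ Vdd / (Vdd - Vt)**alpha."""
    return vdd / (vdd - vt) ** alpha


def droop_breaks_timing(gate_delay_ps, slack_ps, vdd, droop_fraction, vt=0.3, alpha=1.3):
    """True if the extra delay the droop adds to this gate exceeds the path's slack.

    vt and alpha are placeholder device parameters, not values for any real process.
    """
    nominal = alpha_power_delay_scale(vdd, vt, alpha)
    drooped = alpha_power_delay_scale(vdd * (1 - droop_fraction), vt, alpha)
    extra_ps = gate_delay_ps * (drooped / nominal - 1.0)
    return extra_ps > slack_ps


# The same 10% droop on the same 20 ps gate, on two different paths (hypothetical numbers):
print(droop_breaks_timing(gate_delay_ps=20, slack_ps=1.0, vdd=0.75, droop_fraction=0.10))   # True
print(droop_breaks_timing(gate_delay_ps=20, slack_ps=80.0, vdd=0.75, droop_fraction=0.10))  # False
```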
Finding those peaks can be difficult, because not all peaks are actually possible. “If you have a multi-core processor system, and if you assume they can all operate at maximum rate, and then look at that power consumption, you are far away from what you can really provide,” says Heinig. “It is a dead use-case, because you can never use the system in this way. There’s no use case where you have all the cores running at full speed, and you can’t provide a power delivery network that can deliver this 100% load.”
That creates a dilemma. “People have gone to using what they call vectorless analysis, which is a made-up vector,” says Davis. “It says, ‘I don’t know what you’re going to run, so let’s take this hypothetical worst case and put it together.’ But when you’ve got such a large die, what else are you going to do? Other customers say, ‘I don’t trust that. I’m going to pick some vectors that I’m going to use.’ How do you do that scientifically? They have developed processes that seem to work, but is it sign-off in the sense that it has explored the worst corner? Good luck.”
It gets worse. “It’s also the software,” says Heinig. “The problem here is that they continue to develop the software during the development phase. You might have designed it for a certain profile that they provided at the beginning, but then they change the software. You have to do software development in parallel with hardware development. That means you need some assumption where you say, ‘We start with this use-case, but we have to add a margin of 20% or 30% because of potential software changes.’ But we don’t know that up front.”
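The use-case reasoning from Heinig can be sketched in a few lines. Size the PDN against the worst realistic use case rather than the "dead" all-cores-at-maximum case, then add a margin for software that is still changing. The core count, per-core power, and activity profiles are hypothetical. The 30% margin is the upper end of the range Heinig mentions. As he notes, the result is only as trustworthy as the list of use cases.

```python
N_CORES = 16
CORE_MAX_W = 10.0  # hypothetical per-core power at full speed

# Hypothetical activity profiles: fraction of full-speed power per core.
USE_CASES = {
    "idle": [0.05] * N_CORES,
    "single-thread boost": [1.0] + [0.1] * (N_CORES - 1),
    "all-core, thermally throttled": [0.6] * N_CORES,
}


def use_case_power_w(activity):
    """Total power for one activity profile."""
    return sum(a * CORE_MAX_W for a in activity)


theoretical_w = N_CORES * CORE_MAX_W  # every core at max: the "dead" use case
worst_case_name = max(USE_CASES, key=lambda k: use_case_power_w(USE_CASES[k]))
worst_realistic_w = use_case_power_w(USE_CASES[worst_case_name])
SOFTWARE_MARGIN = 0.30                # allowance for later software changes
design_target_w = worst_realistic_w * (1 + SOFTWARE_MARGIN)

print(f"all cores at max: {theoretical_w:.0f} W")
print(f"worst listed use case ({worst_case_name}): {worst_realistic_w:.0f} W")
print(f"with {SOFTWARE_MARGIN:.0%} software margin: {design_target_w:.0f} W")
```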
Models and simulation
To perform any kind of analysis, models are required. “Everything in EMIR is an approximation game because ultimately, to get the most accurate answer, we’re doing a full electrical simulation of the entire chip with parasitics,” says Davis. “That is perhaps the most costly step in the entire process. You have to approximate that for early analysis. And those models are built based on the history that they have with the designs, the architecture, and the partitioning.”
That doesn’t help when you have not designed everything in the system. “If the chip is a black box, the power delivery network of the chip is a black box,” says Heinig. “We do not know exactly how we have to model this. It’s a really abstract model we get for the chip, and it’s totally unclear for us if we are doing the right things with our package simulations. What I can tell you is that we definitely need more accurate models for the components, especially if they are black boxes. We also need models in the exploration phase so that we can do early predictions or routing studies.”
Chiplets may help. “If I’ve already fabricated this chip before, it can be characterized, and I know how it behaves under different activities,” says Davis. “Hard IP and chiplets are a very good way to be able to design your power delivery, because you can characterize it once it’s been implemented. Other than that, there are approximation methods that can always be improved. One of the fathers of statistical modeling said all models are wrong. Some are useful.”
Early analysis can save a lot of headaches. “Early-stage analysis plays a pivotal role in mitigating PDN-related risks, influencing architectural decisions such as partitioning, floor planning, and power grid topology,” says Ansys’ Tomine. “These avoid costly late-stage redesigns. By incorporating power integrity checks early, before layout finalization, designers can proactively identify regions prone to IR drop and adjust block placement, routing channels, and decap allocation accordingly.”
Issues detected late can be expensive. “That’s always been one of the bugaboos of IR drop analysis,” says Swinnen. “It’s very difficult to fix, because by the time you detect it and analyze it, you’re so far down the pike and your timing is being balanced that you really don’t want to start messing with your design to fix IR drop. You need to do good IR drop analysis earlier in the design, at the placement stage where you can still easily modify the placement, spread those aggressor cells further apart, and address the demand problem.”
Simulation runs are very expensive. “This simulation is a non-linear SPICE simulation with approximations,” says Davis. “Parasitics are approximated, where 5% for sign-off is typical. Everything is an approximation, so that it is a solvable problem. Otherwise, you wouldn’t even be able to do this. You typically do it at one temperature and with a few different scenarios. You identify hot spots, you analyze those, and iterate. You do this early on — as early as you can — because at the end it’s too late. If you wait until after DRC clean, LVS clean, you have no options. You can do a local reroute, or you’re just going to waive it with the foundry. People do spin chips for IR problems, and there are occasional escapes resulting in silicon failures. These typically identify some corner case that nobody thought would happen.”
“For a modern state-of-the-art chip, we’re looking at 60 billion to 100 billion electrical nodes in that model just for the power distribution network,” says Swinnen. “You need a solver that can take a 60 billion-node electrical circuit and solve it quickly, and do clever reductions on that. Traditionally, a customer would pick one corner, because it takes long enough as it is, and nobody is clear what multiple corners would imply for IR drop. That’s an emerging field.”
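At its core, the static part of the problem Swinnen describes is nodal analysis of a huge resistive network: one Kirchhoff's-current-law equation per node, forming a sparse linear system. The sketch below is a toy version. It uses a 21 x 21 mesh (441 unknowns rather than tens of billions), an outer ring held at VDD, identical hypothetical branch resistances, and a uniform hypothetical current sink at every node. It solves the system with plain successive over-relaxation (SOR). The article does not describe how commercial solvers or their reductions work, so this illustrates only the shape of the problem, not their methods.

```python
def solve_pdn_mesh(n, vdd, r_branch_ohm, i_node_a, omega=1.8, tol=1e-12, max_iter=10_000):
    """Static IR drop on an n x n resistive mesh whose outer ring is held at vdd.

    Every interior node sinks i_node_a; every branch between neighbouring nodes is
    r_branch_ohm. Solved with successive over-relaxation (SOR).
    """
    v = [[vdd] * (n + 2) for _ in range(n + 2)]  # rows/cols 0 and n+1 are the fixed ring
    for iteration in range(1, max_iter + 1):
        largest_update = 0.0
        for y in range(1, n + 1):
            for x in range(1, n + 1):
                # KCL: sum over 4 neighbours of (V_nb - V) / r = i  =>  V = (sum V_nb - i*r) / 4
                gauss_seidel = (v[y - 1][x] + v[y + 1][x] + v[y][x - 1] + v[y][x + 1]
                                - i_node_a * r_branch_ohm) / 4.0
                update = omega * (gauss_seidel - v[y][x])
                v[y][x] += update
                largest_update = max(largest_update, abs(update))
        if largest_update < tol:
            return v, iteration
    raise RuntimeError("mesh solve did not converge")


N = 21
v, iters = solve_pdn_mesh(N, vdd=0.75, r_branch_ohm=1.0, i_node_a=10e-6)
worst = min(v[y][x] for y in range(1, N + 1) for x in range(1, N + 1))
print(f"worst static drop {(0.75 - worst) * 1e3:.3f} mV after {iters} SOR sweeps")
```

Even this toy needs many sweeps over every node. Work of that kind, scaled to 60 billion nodes and repeated per corner, is why single-corner analysis has been the norm.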
Some parts of a design, such as analog and high-speed SerDes, are obvious places where careful design is required. “Some interconnects have critical signals, and we have to route them very carefully,” says Heinig. “It’s not only voltage drop, but also noise. It may not be obvious if there is coupling between domains, and the only guidance we receive from suppliers is to avoid couplings. But this is impossible. What is the maximum coupling that is allowed between domains, so that we fulfill the noise requirement targets?”
This is also where design tricks get utilized. “You try to increase the capacitance, not so much with explicit capacitors, but by overlapping the power and ground as much as possible,” says Swinnen. “Power wires running with just an insulator layer between them form a capacitor, a parasitic capacitor. But this is the one case where parasitic capacitance is actually a positive. You try to maximize the parasitic capacitance of power to ground in the PDN, to give you this capacitive decoupling.”
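The intrinsic decoupling Swinnen describes can be estimated with the parallel-plate formula, C = ε₀ · εr · A / d, for a power wire stacked directly over a ground wire. The sketch ignores fringing fields, so it somewhat underestimates the real value. The geometry (1 µm wide, 1 mm of overlap, 100 nm of dielectric with εr = 3) and the count of 1,000 overlapped pairs are hypothetical.

```python
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m


def overlap_capacitance_f(width_m, overlap_length_m, dielectric_thickness_m, eps_r):
    """Parallel-plate estimate C = eps0 * eps_r * A / d for a power wire over a ground wire.
    Ignores fringing fields, so it somewhat underestimates the real value."""
    area_m2 = width_m * overlap_length_m
    return EPSILON_0 * eps_r * area_m2 / dielectric_thickness_m


# Hypothetical geometry: 1 um wide straps overlapping for 1 mm through 100 nm of eps_r = 3 dielectric.
c_one = overlap_capacitance_f(1e-6, 1e-3, 100e-9, 3.0)
print(f"one overlapped strap pair: {c_one * 1e12:.2f} pF")
print(f"1,000 such pairs across the grid: {c_one * 1_000 * 1e9:.2f} nF")
```

Capacitance grows with overlap area and with thinner dielectric. That is why maximizing power-to-ground overlap in the PDN is a cheap source of decoupling compared with explicit decap cells.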
Conclusion
Without a robust power delivery network, modern designs will have significant IR drop problems. At the same time, more paths in the design are becoming critical, meaning that without significant analysis the chip will not operate at the target frequency, or it will simply fail.
The industry has not yet developed methodologies that satisfy all of the needs of sign-off, while also enabling early analysis using consistent models and scenarios. Additional investment is coming to address those needs, and that starts by going all the way back to the fundamental physics. With that as a firm foundation, better solutions may be possible.