
Redefining The Power Delivery Network

Getting power around a complex package that includes externally sourced chiplets may come with a higher cost than some are prepared to pay.

Reliably getting power around a package containing multiple dies, potentially coming from multiple sources, or implemented in diverse technologies, is becoming much more difficult.

The tools needed to do this in an optimized manner are not all there today. Nevertheless, the industry is confident that it can get there.

For a single die, the problem has evolved slowly over time. “For a single die, the power delivery problem hasn’t changed much in the last 10 or 20 years,” says Gary Yeap, senior R&D manager for the Design Group at Synopsys. “We use the same delivery method, and while the die gets bigger, I can still use the same mechanisms and it gets analyzed the same way. I can use the same algorithms. I can use the same methodology.”

The focus is changing, however. “We worked harder on design for performance and efficiency, and over-design is a critical concern,” says James Myers, distinguished engineer at Arm. “Static margins are no longer viable and we need systems that avoid creating large power spikes and adapt to unavoidable voltage droops.”

Much of this is a result of increased density on a single die and the difficulty in continued scaling. “If you look at a single monolithic die, that’s grown so big that it’s hitting the reticle limit,” says Ankur Gupta, senior director for product management at Ansys. “Because we are hitting the x,y max area that we can occupy, there’s a push towards the z direction. That is taking us to multi-die. Even if you look at a single die, we have moved beyond margining for the worst case, whether it’s static IR drop, or dynamic IR drop, or even voltage timing effects. At 7nm, 5nm, looking ahead at 3nm, you just can’t do any kind of worst-case margining anymore.”

This causes the problem to expand in several areas. “One of the areas that we’re seeing significantly more problems dealing with, is the thermal and mechanical integrity of these systems,” says Jim DeLap, electronics product manager for Ansys. “This is because you’ve got different multi-stacked dies, different materials, and dynamically changing power consumption and dynamic thermal maps of the different dies in different areas of this assembled system.”

Another problem area is power distribution. “You need to get power from the bottom die to the top, and there is only one way to get it there,” says Synopsys’ Yeap. “You have to use through-silicon vias (TSVs), and they are known to have resistance. If you don’t have enough of them, then you will get a power problem. Sending power through the dies to reach the final destination, where it’s going to be consumed, becomes more difficult further up the stack.”
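Yeap's point about TSV count can be made concrete with a first-order resistive estimate. The sketch below is illustrative only. It treats the power TSVs between two tiers as identical resistors in parallel, ignores the ground return path and the on-die grid, and assumes the bottom die sits on an ideal supply. The function names, the 50 mΩ per-TSV resistance, and the die currents are assumptions chosen for the example, not figures from anyone quoted here.

```python
import math


def stack_ir_drop(die_currents_a, tsvs_per_tier, r_tsv_ohm):
    """Return the cumulative supply-side IR drop (V) at each die, bottom to top.

    die_currents_a[i] is the current drawn by die i (index 0 is the bottom die).
    tsvs_per_tier[i] is the number of parallel power TSVs that carry current
    from die i up to die i + 1, so it has one fewer entry than die_currents_a.
    """
    if len(tsvs_per_tier) != len(die_currents_a) - 1:
        raise ValueError("need exactly one TSV count per gap between dies")
    drops = [0.0]  # bottom die: assumed to sit directly on an ideal supply
    for tier, n_tsv in enumerate(tsvs_per_tier):
        # Every die above this tier pulls its current through these TSVs.
        current_through = sum(die_currents_a[tier + 1:])
        r_parallel = r_tsv_ohm / n_tsv
        drops.append(drops[-1] + current_through * r_parallel)
    return drops


def min_tsvs_for_drop(current_a, r_tsv_ohm, max_drop_v):
    """Smallest number of parallel TSVs that keeps one tier's drop within budget."""
    return math.ceil(current_a * r_tsv_ohm / max_drop_v)


if __name__ == "__main__":
    currents = [10.0, 5.0, 5.0]  # amperes per die, bottom to top (made up)
    for count in (1000, 500):
        print(count, "TSVs per tier:", stack_ir_drop(currents, [count, count], 0.05))
```

In this simplified model the top die always sees the largest drop, because its current crosses every tier below it, and halving the TSV count doubles the drop at every level.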

A lack of pins is another issue. “Even now, complex SoCs have multiple voltage domains, and it is very difficult on the application board to bring the voltages to the SoC,” says Andy Heinig, group leader for advanced system integration and department head for efficient electronics in Fraunhofer IIS’ Engineering of Adaptive Systems Division. “So if we have more than one such SoC, plus a number of chiplets, it looks as if the complexity is increasing.”

Perhaps the biggest problem is heat. “With the end of Dennard Scaling, more power is pouring into every mm² of silicon, and with each new finFET node that is just increasing,” says Richard McPartland, technical marketing manager for Moortec. “Current levels easily run into tens, or even hundreds, of amperes, while core voltages are dropping, now well below 1 volt. Combine this with 2.5D and 3D packaging and software-driven systems, where the worst-case load may be different from that envisioned by the silicon team, and one can start to get a handle on the perfect storm surrounding power distribution networks for the latest generation of SoCs.”
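The arithmetic behind those current levels is simply I = P / V, and it also shows why falling core voltages squeeze the resistance budget of the whole delivery path. The following is a rough sketch; the 100 W load, 0.8 V rail, and 5% droop tolerance are assumed values for illustration and do not come from the article.

```python
def supply_current_a(power_w, vdd_v):
    """Current drawn from a rail: I = P / V."""
    return power_w / vdd_v


def max_path_resistance_ohm(power_w, vdd_v, tolerance_fraction):
    """Largest end-to-end resistance that keeps the static IR drop within tolerance."""
    allowed_drop_v = vdd_v * tolerance_fraction
    return allowed_drop_v / supply_current_a(power_w, vdd_v)


if __name__ == "__main__":
    power_w, vdd_v, tolerance = 100.0, 0.8, 0.05  # assumed example values
    print("current (A):", supply_current_a(power_w, vdd_v))
    print("max resistance (ohm):", max_path_resistance_ohm(power_w, vdd_v, tolerance))
```

Under these assumptions the rail carries about 125 A and the whole path from regulator to transistor has a budget of roughly a third of a milliohm. Because the allowed resistance scales with the square of the supply voltage at constant power, each voltage reduction tightens that budget sharply.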

But there is no cause for panic. “The good thing is that nothing is broken,” says Jerry Zhao, product management director at Cadence. “Nothing is totally out of control. People can still use the existing solutions, and we can improve it incrementally to handle the issues. In the old days, the IC guys didn’t talk to the packaging guys or the board guys, but right now they need to. We have implementation tools at the IC level, at the system level, and you can use those to put all this together.”

However, these changes are making the optimization problem a lot tougher.

Power control
Significant advances have happened in the past few years when it comes to controlling power within a die, but how does that translate to multiple dies? “You can imagine that one chip might be 1.2 volts, another chip on a more advanced node, could be 0.85 volts,” says Cadence’s Zhao. “Where do you get those voltages and how do you make them stable? You may have off-package technology from the board to supply the voltage and on the die, a voltage regulator to maintain a certain voltage.”

But where is power control performed? “It is rare to use one logic die to power down other silicon,” says Yeap. “I wouldn’t say it’s impossible, but probably fairly rare. In terms of efficiency, it is also not good.”

Ansys’ Gupta agrees. “We continue to see independent power control for each die. The number of pins isn’t changing. In fact, when you look at stacking, what becomes your weak point are the connection points. My boundary points have changed. TSVs are a proxy for the package pins in a stacked die environment. And you have to supply power — different types of power — through those TSVs. I still have the same analysis challenges to make sure that the integrity is maintained.”

If power control were to be centralized, it would most likely be through data signals. “Power has to reach the final, or near final destination, where you need to shut it down,” adds Yeap. “Then you see the power control transistors to shut it on and off. That seems to be quite classical compared to 2D.”

People are looking at several arrangements. “It is almost a miniaturized version of the board,” says Rita Horner, senior product marketing manager for the Design Group at Synopsys. “You may have devices that need multiple supplies, but then you utilize regulators that connect multiple supplies together and control them. There are people who are talking about integrating voltage regulators in the same package. Each one of the power supply domains can be independently treated.”

But that may result in waste. “One big problem is that there is no electronic format that describes if two domains can be handled by on-chip regulators or under what condition,” says Fraunhofer’s Heinig. “Because of this, the system designer is likely to connect every domain with an individual regulator. Reuse of regulators cannot be done. That also means that the number of domains is exploding in a chiplet.”

Is analysis enough? “Good simulation tools are available, and these will tell you what the IR drops and Ldi/dt should be, but not necessarily what they actually are,” says Moortec’s McPartland. “For this reason, including voltage monitors and measuring the VDD-VSS directly at critical circuit blocks, as shown in Figure 1, is invaluable for checking the power distribution network during chip bring up and optimization.”
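The two terms McPartland names combine in the usual first-order droop estimate, V_droop ≈ I·R + L·di/dt, and the value of monitors lies in comparing that prediction with what the silicon reports. The sketch below assumes a single lumped resistance and inductance, and the block names and readings are hypothetical. How monitor readings are actually collected is outside its scope.

```python
def predicted_droop_v(i_dc_a, r_ohm, l_h, di_a, dt_s):
    """First-order droop: static I*R term plus inductive L*di/dt term."""
    if dt_s <= 0:
        raise ValueError("dt_s must be positive")
    return i_dc_a * r_ohm + l_h * (di_a / dt_s)


def monitor_mismatches(predicted_v, measured_v, tolerance_v):
    """Names of blocks whose measured droop exceeds the prediction by more than tolerance."""
    return sorted(
        block
        for block, measured in measured_v.items()
        if measured - predicted_v[block] > tolerance_v
    )


if __name__ == "__main__":
    # Assumed values: 50 A through 0.2 milliohm, plus a 20 A step in 10 ns across 10 pH.
    droop = predicted_droop_v(50.0, 2e-4, 10e-12, 20.0, 10e-9)
    predicted = {"cpu": droop, "gpu": droop}  # hypothetical block names
    measured = {"cpu": 0.032, "gpu": 0.045}   # hypothetical monitor readings (V)
    print(monitor_mismatches(predicted, measured, tolerance_v=0.005))
```

A block that shows up in the mismatch list is one where the model and the silicon disagree, which is the situation bring-up measurements are meant to catch.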


Fig. 1: In-chip monitoring subsystem provides visibility of on-chip conditions. Source: Moortec

However, that may not always be the right answer. “In-chip monitoring or in-chip regulators are a good idea as long as the chips are not designed in 7nm or smaller,” says Fraunhofer’s Heinig. “In such technologies, it is very difficult to design an on-chip voltage regulator.”

New challenges for chiplets
If 3D-IC technology extends to include chiplets, additional problems surface. “What sort of information would you get from manufacturers?” asks Chris Ortiz, principal application engineer at Ansys. “Some vendors might provide very little information. Perhaps, at the architectural level when you’re putting this together, you may only have a golden value or a fuzzy value with error bars, and when you go through verification, you are looking at the best and worst case.”

There are a lot of open questions. “If it’s not designed by you, then you will have to conform to the spec of the power supply,” says Yeap. “If it’s a die not designed by you, then you will not be using it as the intermediary for sending power to somebody else. For that type of chiplet, they probably will only consume power.”

The problem is that you cannot avoid coupling between the various pieces of the system. “The power domains, regardless of whether they’re independent power domains or not, are always coupled in certain ways,” says Zhao. “For example, they may share the same ground pin.”

But there is an even bigger coupling – thermal. “Thermal has a very bad impact on power delivery,” says Yeap. “One of the problems is electromigration. If you increase temperature a little, electromigration could become so bad that your MTBF goes out of spec. These are complicated issues. People do analyze this for a single die, but now it becomes much more complicated. The effect of inductance also affects power delivery.”
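The temperature sensitivity Yeap describes is commonly modeled with Black's equation, MTTF = A · J^(-n) · exp(Ea / (k·T)), where J is current density and T is absolute temperature. Black's equation is stated in terms of time to failure rather than MTBF, but it illustrates the same trend. The sketch below compares lifetimes at two temperatures with current density held constant, so the A and J terms cancel. The 0.9 eV activation energy and the 105°C and 115°C operating points are assumed example values, not figures from the article.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def em_lifetime_ratio(t_ref_c, t_hot_c, ea_ev=0.9):
    """Ratio MTTF(t_ref) / MTTF(t_hot) from Black's equation at equal current density.

    A result of 2.0 means the interconnect is expected to last half as long
    at t_hot_c as at t_ref_c. ea_ev is an assumed activation energy.
    """
    t_ref_k = t_ref_c + 273.15
    t_hot_k = t_hot_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_ref_k - 1.0 / t_hot_k)
    return math.exp(exponent)


if __name__ == "__main__":
    print("lifetime ratio for a 10 degree rise:", em_lifetime_ratio(105.0, 115.0))
```

With these assumed values a 10°C rise cuts the expected electromigration lifetime roughly in half, which is why a hot neighbor in the stack can push a die that passed on its own out of spec.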

So chiplets would need detailed thermal models. “You’re not going to be getting a great deal of information about the thermal capacitance or thermal properties of the chiplet,” warns Ansys’ Ortiz. “You will have to make estimates as to what the IR drop may be like, what the power is going to be like, how temperature will affect the power overall, and how it will affect various parasitics. And you’ll need to worst-case it and verify that at least to some extent it’s going to be in line with what you want to do.”

If that sounds like a regression, it is. “In terms of the PDN, you need to have some margin, so that if the voltage drops, it can still function on the die,” says Zhao. “That’s not changing for the multi-die era, and people will still do that. But that leaves a big question about how much performance I’m leaving on the table. We are looking for ways to iterate for the multi-die level. How are we going to do the routing of those? How will you make connections between those two dies, and what’s the voltage drop on those routing layers that I’m adding to the outside of those die?”

Improving analysis
The only way to optimize that problem is through better analysis. “When you get into the 2.5D and 3D-IC advanced packaging system, you’re expanding the power grid to various pieces, including the interposer,” says Ansys’ DeLap. “You need to expand the analysis capability to take these into account. There are more interdependencies that come into play. One of the biggest areas is the multi-physics side which involves not only power delivery and power consumption, but also thermal and mechanical and those interdependencies.”

You cannot do the analysis separately. “You have to look at the whole thing starting from the package, then you look at the stacking, you look at the interposer or the base, and then you need to analyze the whole thing,” says Yeap. “It is very difficult to say that if I analyze each single die, and then for every die I assume an ideal power supply coming from the next die as you go through the die stack. You will have to analyze the whole system. I need to know the IR drop, going through these multi die stacks. I need to know the IR drop of every individual chip on the silicon interposer, including the silicon interposer itself. Take your classical 2D IR drop problem and extend it to multi-die 3D, including the package.”
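A toy calculation shows why assuming an ideal supply at each die's boundary is optimistic. The sketch below models the delivery path as a simple series chain, which is a strong simplification of a real grid. The segment names, resistances, and load currents are all assumptions picked for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str       # label for this part of the delivery path
    r_ohm: float    # series resistance of the segment
    load_a: float   # current consumed at the far end of the segment


def node_voltages(vdd_v, segments):
    """Voltage at the end of each segment of a series supply chain.

    Each segment carries its own load plus the load of everything after it.
    """
    voltages = {}
    v = vdd_v
    for i, seg in enumerate(segments):
        downstream_a = sum(s.load_a for s in segments[i:])
        v -= downstream_a * seg.r_ohm
        voltages[seg.name] = v
    return voltages


if __name__ == "__main__":
    top = Segment("top_die", 5e-4, 10.0)
    chain = [
        Segment("package", 1e-4, 0.0),
        Segment("interposer", 2e-4, 0.0),
        Segment("bottom_die", 1e-4, 30.0),
        top,
    ]
    full = node_voltages(0.8, chain)["top_die"]
    isolated = node_voltages(0.8, [top])["top_die"]  # ideal supply at the die boundary
    print("full stack:", full, "isolated:", isolated)
```

With these numbers the isolated analysis reports about 0.795 V at the top die, while the full chain gives about 0.779 V. The 16 mV difference is the drop in the package, interposer, and lower die that the per-die view never sees.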

And that requires a rethink about the information necessary for chiplets. “When looking at heterogeneous integration, users who are designing these packages may not have all the details about every single die,” says Synopsys’ Horner. “The practicality of running everything flat is impossible. It’s not just the time that it takes, but also the lack of information. That’s why you need to gather enough information from each die, to be able to do the multi die power analysis.”

Models are essential. “You have IPs at the chip level, which the IP provider might not give you the full description of it,” warns Zhao. “You need a model for that. And the same for the chiplet. If you supply a chiplet, the user of that chiplet needs to analyze it. They don’t have any power to change the chiplet, but they want to analyze the total impact on the multi-die package system. So the modeling could be a thermal model, or you can have a die model that can be used for the electrical analysis of the other dies in the system.”

We have to look at the requirements coming from in-house designed chiplets. “What we do today is create high fidelity models,” says Gupta. “If you are a package designer, who is tasked with looking at the interposer, and maybe one die is being supplied to you from a third party, that die can come in with its associated power model. Then you take that and extend it beyond just the power model to thermal, to signal integrity models. If something is a black box to you, somebody else is supplying it to you, you still need a high fidelity model, so that you can have a peek into the guts of that design without fully owning that design. That is table stakes.”

It may not be possible for all models to be at the same fidelity. “They may be analyzing one die that they’re designing right now,” says Zhao. “That may be their core technology competency. But they have other dies that they bought from some other sources or designed in a previous generation, and they want to reuse those. Those might not be analyzed at the same detail level as the die you’re designing. They may just generate for example, a die model or a thermal model when doing thermal analysis for the entire system.”

Going hierarchical
What will analysis tools look like? “Any EDA tool that tries to take a multi-die system, flatten it, and then use the single-die algorithm is likely not going to work,” says Yeap. “We’re not doing it that way. We are truly analyzing it in a multi-die manner. You can’t re-map it into a 2D problem and then solve a bigger 2D problem. That’s not going to work.”

Others agree. “Ideally, people want to do everything flat, because that is where accuracy comes from,” says Zhao. “In reality, a hierarchy flow is the only solution, because you don’t have that much machine power to run that size of simulation. So along the way people have to develop the power models, grid models, thermal models, and we also have the die models. Those are the things that people tend to use, but we just enhance it to make sure it can support the level of accuracy you want.”

It is an optimization problem. “If you think about that challenge, it has three axes: accuracy, capacity, and memory footprint,” says Gupta. “That’s what it always comes down to. We have developed a big data compute platform that is allowing us to handle these big systems and do the concurrent analysis, or model based analysis, do the multi-physics analysis, whether it’s power, thermal, or on-chip electromagnetics, all the way to system electromagnetics. But these are tough problems and there’s a lot of compute hours that go into providing provably correct, highly accurate answers.”

Continued innovation
It is not a fully solved problem today. “This is not only a very hot topic in product design, but also in academic and industrial research groups, because we know it gets harder for 3D-IC,” says Arm’s Myers. “This covers everything from rapid analysis flows to control algorithms and circuits for distributed clocking and regulation. There is even a technology aspect, with scaling boosters such as buried power rails offering potentially large IR drop improvements while freeing up backend routing resources.”

The design techniques may evolve, but users need the analysis solution today. “If you look at the EDA industry, we have a track record for identifying these challenges and then working with partner companies, the early adopters and key companies that are moving very fast in a certain direction, and then collaborating very tightly,” says Gupta. “First, we do this with customers, and then across EDA companies, as well. Only then can we look at what standards should be introduced into the community to make these workflows smooth across third-party data exchange. To get there, you need to first build a very smooth workflow working with partners and customers.”

One thing is certain — abstraction is essential. “When you get more and more chips, if you don’t abstract it, then you’ve got a capacity problem,” says Yeap. “The industry is going toward abstraction. However, because this is such a new problem, there is no standardization, not even common practices today. Everybody will be looking at different ways to abstract it.”

Conclusion
Power delivery networks may not be the sexiest part of a chip or package, but they are essential to its correct operation. As functionality becomes disaggregated, new techniques will have to be developed for power distribution and control. The EDA industry is doing the best it can to ensure power issues do not stop the industry from moving forward, but there are many areas of uncertainty today that may force technologies such as chiplets to wait until some of these issues are resolved.

Without the correct models being identified, the additional costs in terms of margin requirements may eat into the advantages that chiplets offer.

3 comments

BillM says:

Maybe it is time to go back in time (ala 1970-80’s) and do silicon breadboarding. This was a cheaper and faster way to check functionality/performance than all the various manual and archaic tools (SPICE) back then. Get the design to a ‘good enough’ confidence (with manual LVS checks) and DRCs, then cut the mebes and wait for silicon. Companies that are focused on selling millions of end products might start to create prototypes to check PDN, thermal, etc of their system with chiplets. Each customer has their own mfg as well as thermal, power, etc rules and unless the chiplets state “you can not do this”, end integrators will do all implementation combinations never thought of by the chiplet provider. Too much investment not to do prototypes to validate models/assumptions.

Unfortunately, over the past few decades, we have relied on models to do ‘everything’ and we might be at an inflection point (or delay in modeling).

Brian Bailey says:

An interesting thought, but I wonder if the additional RCL of the breadboard would make too big a difference from what the final package might look like. High frequency operation may require that model-based analysis.

BillM says:

Lots of issues to explore and determine different ways to verify systems. All depends on various risks, costs, schedules and tolerances for failure. Will be interesting to see how disconnects between real vs. virtual worlds are resolved (all of EDA is a virtual world/playground).
