Necessary fixes to reduce margin and improve power integrity.
Experts at the Table: Semiconductor Engineering sat down to talk about voltage droop in analog and mixed-signal designs, and the need for multi-vendor tool interoperability and more precision, with Bill Mullen, distinguished engineer at Ansys; Rajat Chaudhry, product management group director at Cadence; Heidi Barnes, senior applications engineer at Keysight; Venkatesh Santhanagopalan, product manager at Movellus; Joseph Davis, senior director for Calibre interfaces and mPower EM/IR product management at Siemens EDA; and Karthik Srinivasan, director for R&D, EDA Group – Circuit Design & TCAD Solutions, at Synopsys. What follows are excerpts of that conversation. Part one of this discussion can be found here. Part three is here.
L-R: Ansys’ Mullen; Cadence’s Chaudhry; Keysight’s Barnes; Movellus’ Santhanagopalan; Siemens’ Davis; Synopsys’ Srinivasan.
SE: What happens with voltage droop in analog or mixed-signal designs?
Srinivasan: On the analog side, traditionally it’s been about sign-off. You either use SPICE on smaller designs, or you use FastSPICE or other approaches to do sign-off. Things are shifting left just like digital, although there are not too many levers that designers have today to play around with in the analog space. But there is an increasing realization of the need for more IR drop awareness and EM awareness during the design phase. That’s a trend that’s going to continue. One of the challenges is also coverage. Unlike digital, there aren’t many ways of coming up with vectorless, guided-vectorless, or vectored approaches using functional simulations or emulation data. Analog is limited in terms of coverage, as well. People are looking for more innovative ways of dealing with this in order to address the coverage aspect for analog.
Barnes: Power integrity, the power droop problem, is about power delivery that is AC, not DC. You’re fighting the inductance in the path. It’s really a power integrity ecosystem — and that’s the challenge. You can’t just look at the die design in isolation. You have to look at how the power gets from the package to the die. There’s an inductance there they call the Bandini Mountain, and there’s a tradeoff. How many power and ground vias can I get versus how much capacitance can I put on the die? If I can’t put enough capacitance on-die, I’m probably going to need more power and ground vias to reduce the inductance, because every inductance has to be compensated with a capacitance to provide that charge delivery while the inductance is impeding it. The same thing goes for the package. How much capacitance I can put on that package is going to determine how many power and ground vias I have. Then, on the printed circuit board, how much inductance is getting into the package is going to determine how much capacitance I’m going to need on the board. So an engineer really needs a pre-layout, engineering design exploration that can look at that whole PI ecosystem. The challenge we have right now is that a lot of die designers just toss the problem over the wall to the package guy, and the package guy just tosses it over the wall to the end user, and we’re seeing a lot of frustration. The end users want an impedance model. This is something the die and package people could help with by giving that S-parameter impedance model. Maybe it’s as simple as, ‘How much capacitance are you guys providing? How much inductance is there? Then I can start to design my power delivery network.’ I still remember when I first got into the SI world back in 2005 and Xilinx had come out with the Virtex-4. It was a miserable failure in the sense that they put all of the ground pins together in the center, and then all the power pins around that, which caused high inductance. If you look at almost all the chips now, you see the power and ground pins alternating so that you can lower that inductance. Each power pin has a ground pin as a pair, and that’s trying to reduce the inductance getting into the package. A lot of our really large designs now are 90% power and ground pins to get 2,000, 3,000, or even 5,000 amps into some of these supercomputer chips for the cloud. A lot of the social media companies, too, are driving crazy designs with thousands of amps.
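To put rough numbers on the inductance/capacitance tradeoff Barnes describes, a common back-of-the-envelope exercise is a target-impedance budget. The sketch below is illustrative only; the supply voltage, ripple budget, current step, and package inductance are assumed values, not figures from the discussion.

```python
# Minimal sketch, assumed numbers: target-impedance view of Barnes' tradeoff.
from math import pi

V_DD = 0.8            # supply voltage [V] (assumed)
ripple = 0.05         # allowed ripple, 5% of V_DD (assumed)
dI = 50.0             # worst-case transient current step [A] (assumed)
L_pkg = 10e-12        # package/via loop inductance [H] (assumed)

Z_target = V_DD * ripple / dI              # rail impedance budget
f_L = Z_target / (2 * pi * L_pkg)          # frequency where L_pkg alone exceeds the budget
C_min = 1 / (2 * pi * f_L * Z_target)      # capacitance needed to carry the rail above f_L

print(f"Z_target = {Z_target*1e3:.2f} mOhm")
print(f"package inductance exceeds the budget above {f_L/1e6:.1f} MHz")
print(f"roughly {C_min*1e6:.1f} uF of on-die/package capacitance is needed beyond that")
```

With these assumptions the budget works out to 0.8 mΩ, the 10 pH of package inductance blows through it above roughly 13 MHz, and on the order of 16 µF of local capacitance has to supply the charge beyond that point. Fewer power/ground vias means higher loop inductance, a lower crossover frequency, and more local capacitance — which is exactly the tradeoff being described.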
Chaudhry: Yes, it’s very important to make sure that your inductance is not very high, and to determine how much current you actually can pass through that package. What’s happening now is that the inductance from the system side is no longer the only bottleneck. On advanced nodes, you have such high resistance at the lower metal levels that you start seeing a macro problem. Then you have the micro, or local, problem. For that local micro problem, even if you add a lot of decaps [decoupling capacitors], it doesn’t help with the droop because there’s so much shielding from the high resistance at the lower levels. So now we have these two problems. And you really cannot mitigate this second problem unless you have EMIR intelligence built into the place-and-route tools. It has to be done very early, because at the sign-off stage it’s very hard to fix. So more and more, the only way we’re going to actually solve this problem is to put more of this EMIR intelligence into the place-and-route tools.
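A simplified way to see the "shielding" effect Chaudhry mentions: the decap branch impedance can never drop below the series resistance between the switching cell and the decap, so a fast local current step has a droop floor of dI × R_local regardless of how much capacitance is added. The resistance, capacitance, and current values below are assumptions for illustration.

```python
# Minimal sketch, assumed values: decap effectiveness limited by local grid resistance.
from math import pi, sqrt

R_local = 10.0        # lower-metal/via resistance to the nearest decap [ohm] (assumed)
C_decap = 1e-12       # local decoupling capacitance [F] (assumed)
dI_local = 5e-3       # fast local current step [A] (assumed)

def z_decap(f):
    """Magnitude of the decap branch impedance (R_local in series with C_decap) at frequency f."""
    return sqrt(R_local**2 + (1 / (2 * pi * f * C_decap))**2)

for f in (1e8, 1e9, 1e10):
    print(f"{f/1e9:5.1f} GHz: |Z_decap| = {z_decap(f):8.1f} ohm")

print(f"droop floor = dI * R_local = {dI_local * R_local * 1e3:.0f} mV")
```

In this toy model the branch impedance asymptotes to the 10 Ω of local resistance, so the 5 mA step always costs about 50 mV locally — the only real fix is lowering the resistance or moving the current source, which is why the intelligence has to live in place-and-route.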
SE: What are EDA vendors, IP vendors, and foundries doing to reduce the effect of voltage droop, especially as we go to the advanced nodes?
Chaudhry: We are working with foundries on this mitigation problem, and we are working on adding more IR intelligence into our place-and-route tool and integrating with our system tools.
Mullen: Ansys has capabilities for power integrity analysis. We also partner very closely with Synopsys on the Fusion Compiler capability, so we provide that guidance to place-and-route. And we’re working closely with the foundries. We see a lot of things the foundries are coming out with, and we’re working to make sure those can be analyzed with high accuracy, and that the extraction and power integrity analysis is very complete. We do see with the foundries a trend toward backside power delivery. The main promise end users are looking for is a reduction in congestion for signal routing. That’s a tremendous benefit if it works. We hear some initial numbers on the voltage drop mitigation, but it’s a little too early to tell how that’s going to pan out. Still, it’s an interesting development.
Davis: In terms of levers, EDA companies can integrate tools. We can provide capabilities to bring that modeling earlier into the flow. The real levers, though, are in the design area — architecture, and so forth. Backside power gives you a fundamental shift in capability, because it improves your signal congestion, which then allows you to have higher utilization, which will then drive more droop. That’s going to be a battle that goes in both directions. The EDA companies are working on integration across EDA vendors, because most design teams use a mix of EDA tools for different things. Within the ecosystem, every tier-one company uses every place-and-route tool, and has a copy of just about every tool. There are best-in-class flows, there are mixed flows, and so forth. One of the key aspects here is the ability to integrate across the EDA landscape to make sure the capabilities are available not just within your own set of tools, but across tool sets, so that you’re more agnostic and your customer can get the value regardless of which tool set they use. Ultimately, no one company has the best tool for everything, and often that’s what customers want. On the EDA side we have to strike a balance between enabling our individual sales and the overall value to the customer. But the big levers for addressing voltage droop are in the designers’ hands.
Barnes: You talked about customers wanting everything in one tool, but customers also want a magic button they can just push and get an answer. The reality is it’s a really complex problem, and every day I’m amazed our simulation tools can tackle the complexity that we’re talking about — thousands and thousands of pins in the whole system. Going forward, though, if you look at a lot of SI and PI engineers, they’re multitasking while trying to get a product out the door. They’re looking at a lot of different aspects — thermal, electrical, SI, PI. They also have to get the design to work to get that 200-Gbit data transmission or communication going. So what you’re also seeing in the tools, as we try to address the bleeding-edge technology and design issues, is a push toward automation. One of the things we’re trying to do is enable the average hardware engineer to at least get a better understanding, and within a few mouse clicks be able to import a board, run a power integrity impedance simulation, and just get an idea of what’s happening. They also need to start looking at that PI ecosystem and bring in models so they can look at the stability of the power delivery, as well as maybe power supply rejection ratios. You usually have more than one power supply. There are also multiple power rails, and you have to look at noise between those. The system gets very complex very quickly. A lot of automation is needed going forward.
Srinivasan: 3D-IC is really changing the whole picture here. Whatever used to be a system designer’s expertise is now something a chip designer also needs to worry about, like thermal, SI, and PI. They have to talk in terms of inductance and capacitance, which they may not have dealt with in the past. There is a requirement to go from transistor to system, as well as from system to transistor, in one cockpit — either with a single set of solutions, which is highly unlikely, or with a suite of solutions that interoperate seamlessly. That’s something our customers are dealing with on a day-to-day basis, and they are asking the EDA community to interoperate in a more seamless fashion. It’s about taking a transistor-level model to a system-level or SoC-level analysis, taking a chip-level model into a 3D-IC-level analysis, and so on. That’s an evolving area.
Santhanagopalan: From an IP point of view, we’re looking at providing a fully integrated, turnkey solution to detect and respond to these types of di/dt voltage droop events. Designers want programmability and flexibility throughout this entire design cycle to deal with the complexities. Having that flexibility, programmability, and the ability to observe what is going on in real systems in each of these phases would provide a lot of value in designing the best possible system. Our focus has been on providing a platform from the IP side, which is a digital-based solution for the droop response.
SE: With early prediction of voltage droop/IR drop, what kind of precision is there today, and how do engineering teams determine the level of precision they need?
Chaudhry: If you have a very accurate vector, EMIR tools can be pretty precise. Customers have done some work on scan vectors and compared them to what they observe in production. The tools can be quite precise if you know exactly what you’re simulating. The problem is that this space is infinite. How do you cover all the different possibilities? Today, most people do a lot of margining. They just over-design and put in margins to account for worst-case-possible scenarios. So although we can be very precise, customers are putting in margins just to cover the full space. A lot of people do vectorless analysis and will run the chip at 2X or 3X power, sometimes even 4X to 5X of the actual power they will see on the chip. It’s mainly through margining right now, but we can be precise, and that speaks to the need to start bounding the problem in a more precise way.
Mullen: While the precision is there, it’s for a very specific case. It might be one PVT corner, one mode of operation, one very small set of possible vectors. Even within that vector, if you know the scan shift vector, you’re modeling instances switching at precise times. But in reality, there’s a lot of variation in when they could switch, and a couple of picoseconds can make a huge difference in the voltage drop on an instance.
Davis: We can be very precise. It’s a matter of the models that we’re using, although you have to realize everything about EMIR — especially in the digital space — is about approximation. We’re not using sign-off extraction, because if you did, it would take all year. You’re not using the same sign-off simulation for the same reason. EMIR is all about approximation. It’s good enough to get the scale, and then zero in. Accuracy is another matter. As everyone has already said, given a vector, you can get a pretty precise answer for a given process condition. What about process variation? What about environmental variation? What are all the environmental factors that have an effect? Even though people do sign-off at the “worst-case corner,” that’s one point within the distribution, and they’re designing for that worst case. There’s a philosophical question about whether you can design to accommodate variation, rather than just planning on absolutely no variation whatsoever. EMIR in both analog and digital is about scale. Can you get close enough to be useful? And can you get the coverage? Coverage is where the accuracy part comes in.
Barnes: One of the challenges goes back to where the term “power integrity” came from. It’s a wide-bandwidth problem, covering a wide range of frequencies, from DC up to the switching noise. Originally, if you just did a traditional power-electronics step load, you’d say, “Hey, my power rail looks great, no problem.” But if you have multiple resonances, and my dynamic digital system excites all of those resonances, I can get a rogue wave that greatly exceeds the power rail noise budget. Classic example: We did a simulation-to-measurement correlation with the Xilinx ZCU104, and I was sure 200MHz (20 amps) was going to have more noise than it did. It actually was like 30MHz, which is only a couple of amps. I was wrong, and that really shows you what power integrity is about. You have to look at the impedance domain. What is the impedance versus frequency? Where are those impedance peaks, and what happens when I excite them, even at the die level? We’re going to have to see if that world catches up and really starts looking at power integrity from the point of view of, ‘Let’s use impedance to get a glimpse of what is going to be worst case, and what is going on with that power delivery.’ With impedance versus frequency, the frequency domain gives us a lot of insight into how robust our power delivery network is. That’s what power integrity is about, and that’s how you fight this dynamic voltage droop. We’re going to see a lot more of that in the design of ICs going forward.
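Barnes’ impedance-versus-frequency view lends itself to a quick numerical experiment. The sketch below sweeps a heavily simplified lumped board/package/die PDN model and reports its anti-resonance peak; all element values are assumed for illustration and are not from any product mentioned in the discussion.

```python
# Minimal sketch, assumed element values: find the impedance peak of a lumped PDN model.
import numpy as np

f = np.logspace(4, 9, 2000)          # 10 kHz to 1 GHz
s = 1j * 2 * np.pi * f

# Assumed lumped elements, looking into the PDN from the die:
L_brd, L_pkg = 1e-9, 10e-12          # board and package loop inductances [H]
C_bulk, R_bulk = 100e-6, 2e-3        # bulk decoupling capacitance and its ESR
C_die, R_die = 100e-9, 0.5e-3        # on-die capacitance and its effective ESR

def par(a, b):
    """Impedance of two branches in parallel."""
    return a * b / (a + b)

Z_bulk = R_bulk + 1 / (s * C_bulk)               # bulk capacitor branch
Z_die = R_die + 1 / (s * C_die)                  # on-die capacitance branch
Z_up = s * L_pkg + par(s * L_brd, Z_bulk)        # package L in series with (board L || bulk cap)
Z_pdn = par(Z_up, Z_die)                         # impedance seen by the switching logic

peak = np.argmax(np.abs(Z_pdn))
print(f"peak |Z| = {1e3*np.abs(Z_pdn[peak]):.1f} mOhm at {f[peak]/1e6:.1f} MHz")
```

With these assumed values the anti-resonance between the package inductance and the on-die capacitance lands in the hundreds of megahertz, and a workload whose current spectrum sits on that peak produces far more droop than a larger current at a frequency where the impedance is low — consistent with the ZCU104 observation above, where a smaller current at a resonant frequency caused the worst noise.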
Srinivasan: I agree that EMIR is more about approximation than precision. However, you also need accuracy. As we move to smaller geometries, the resistances play a critical role. Traditionally it’s been dealt with as a divide-and-conquer approach, especially in the analog world. In digital, it’s a given that standard cells are characterized at different voltage levels, so you try to account for the impact of voltage droop in the currents, and so on. However, in the analog world, it’s extremely painful, or almost computationally impossible, to simulate everything together in FastSPICE simulators. Lately there have been several breakthroughs in simulation, especially with GPU architectures, taking advantage of those to handle large power distribution network problems. That’s one of the things that many of our customers see value in — taking in the power grid noise and its impact on timing. Looking at things together provides a lot more value than looking at them independently, because margin is something you live with when you don’t know the impact. If you don’t know how it’s going to impact the design, you assume a 10% margin and try to meet that target. But if you know precisely how it’s going to impact your timing or your functionality, then you’d rather focus on that instead of looking at the margin and trying to meet that target. The other part is coverage. Trying to come up with the right set of vectors is still an unsolved problem. And looking at other aspects of the picture, like variation, is something customers are trying to deal with in various ways, and it’s something the EDA industry needs to think through to see if we can have more innovative solutions there.
Santhanagopalan: You can be precise and accurate, as long as there is an understanding of what is actually causing these effects. From that point of view, having real observability into different portions of the system can come in handy. The insight you gain from a clear correlation between the effects of certain types of workloads and the effects of certain types of packaging can provide the kind of observability you need to be more precise in subsequent iterations. Having observability and visibility into what is actually going on in the system would be really useful.
Fig. 1: Voltage drop due to parasitic resistance in the interconnect between the supply pin and the cell. Source: Ansys