Power Integrity And Voltage Issues Get Harder To Detect And Solve

Inconsistent demand from multiple features can greatly increase the number of corner cases.

Voltage and power integrity are becoming increasingly critical and challenging for chip designers and architects, regardless of which process technology they are using or which market they are targeting.

An explosion of features vying unevenly for current is increasing the number of constraints and possible interactions that engineers need to sort through to ensure reliability. These include voltage conversion challenges, mixtures of low-voltage and higher-voltage features at different technology nodes, and management of heat that can vary by workload and usage. In general, the more transistors, the higher the demand for current. The problem is that demand isn’t consistent, and it can cause voltage drop and power integrity issues as multiple devices in an SoC or multi-die assembly try to draw current simultaneously. Higher current demands can cause wires and devices to fail, and this is exacerbated by the increased number of transistors and the higher operating frequencies of modern chips.

The inconsistencies compound, too. In systems like AI racks, there is a move from 12V to 48V power to deliver more power without increasing current. However, converting from 48V to the lower voltages used by semiconductor components is challenging and can result in power losses. Applications such as AI, IoT, HPC, and automotive require devices to operate at lower voltages to reduce power consumption and manage heat. This low-voltage operation increases sensitivity to process variations, narrows noise margins, and introduces greater timing uncertainties. As technology scales down to 7nm and below, the challenges related to voltage and power integrity become more pronounced. Lower voltages and smaller geometries increase the impact of manufacturing variations, making it harder to model and predict device behavior accurately. The integration of more devices and the use of advanced packaging techniques like 2.5D and 3D systems increase the thermal load on chips, requiring efficient thermal management to ensure reliable operation.

All of these factors need to be understood up front, too. According to Josefina Hobbs, senior director of product management for the Logic Library IP and IO IP product lines at Synopsys, design pressures around voltage are intensifying as applications demand ever-lower power consumption without sacrificing performance or reliability.

Application areas under pressure from voltage concerns include:

  • AI and IoT: Devices require extreme energy efficiency for longer battery life, often operating at very low voltages (0.4V and below). AI processors, especially for edge and wearable devices, must support local, low-power computation and frequent memory access.
  • High-performance computing: HPC systems face significant pressures to reduce heat and energy costs. Lowering voltage in server farms helps manage power consumption and cooling requirements.
  • Automotive: Electric vehicles need to maximize range and reliability, making low-voltage operation essential. Features like infotainment and sensor arrays must deliver high performance with minimal power draw.
  • Crypto: Crypto SoCs run massively parallel workloads at high activity rates, making power efficiency critical to maintaining profitability in cryptocurrency mining operations.

These market segments are particularly sensitive to the challenges posed by voltage and power integrity, as they require both high performance and energy efficiency to meet their specific demands.

“Most of chip design is about getting the right functionality and timing, but during much of the process, designers assume they can get perfect voltage from the pins to the devices to the gates,” said Joseph Davis, senior director of product management, Digital Design Platform Analysis at Siemens EDA. “But then, at the very end, they do an actual analysis that says, ‘Wait a minute, I put all these things on the chip. These things are all trying to grab current at the same time, which creates a voltage drop. Do I get enough juice to make it work the way it’s supposed to work?’ The power integrity tool is saying, ‘Can I get enough power to all these devices so that they work the way I intended them to work within their windows?’”
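
The effect Davis describes can be illustrated with simple Ohm's-law arithmetic. The sketch below is a deliberately crude model, not how a power integrity tool works: it assumes a single shared resistive supply path, and the rail voltage, path resistance, block currents, and minimum operating voltage are all made-up values chosen for illustration.

```python
def voltage_at_load(supply_v, path_resistance_ohm, block_currents_a):
    """Voltage left at the loads when all blocks share one resistive supply path.

    Simplified lumped model: drop = (sum of currents) * (path resistance).
    """
    total_current_a = sum(block_currents_a)
    return supply_v - total_current_a * path_resistance_ohm


# All values below are illustrative assumptions, not figures from the article.
SUPPLY_V = 0.75        # nominal rail voltage
PATH_R_OHM = 0.002     # 2 milliohm shared supply path
MIN_V = 0.70           # lowest voltage at which the logic still meets timing

one_block = voltage_at_load(SUPPLY_V, PATH_R_OHM, [10.0])
all_blocks = voltage_at_load(SUPPLY_V, PATH_R_OHM, [10.0, 10.0, 10.0])

print(f"one block active:    {one_block:.3f} V, ok={one_block >= MIN_V}")
print(f"three blocks active: {all_blocks:.3f} V, ok={all_blocks >= MIN_V}")
```

With one block drawing current, the rail stays inside its window. When three blocks draw current at the same time, the same wiring no longer delivers enough voltage, even though each block on its own looked fine.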

These growing complexities place immense pressure on every aspect of chip design, making it increasingly important to address voltage and power integrity challenges early and throughout the development process. As designers grapple with these realities, the conversation shifts to how these factors impact long-term device reliability and the practical hurdles faced in real-world semiconductor systems.

“If I’m pulling too much current, does that start causing the wires to fail or the devices to fail? This is getting worse because we do more, we pull more current, we operate at higher frequencies,” Davis said. “And the scale of what we’re doing, the number of transistors that you cram in there, and how big these devices are getting [make it even more challenging]. You look at the number of transistors, the number of gates on an NVIDIA chip, well, it’s no longer a chip. It’s a chip assembly. For all the biggest die, they’re not just a die anymore. They’re systems. It’s a 2.5D or a 3D system, rather than an individual chip. That means the biggest problem is just scale. It’s the same problem, only bigger because there’s more of it.”

An interesting issue is how some systems, like AI racks, are moving from 12V to 48V power into the server chassis and then converting down to 5V, 1V, or below for the semiconductor components.

“The move from 12V to 48V is needed so that more power can be delivered without increasing current, allowing existing wiring to be used,” said Steve Woo, fellow and distinguished inventor at Rambus. “However, converting from the higher 48V to the lower values used by the semiconductor components is more challenging, since there are some losses during the conversion. And it’s tricky to convert from higher voltages like 48V down to the standard voltages that semiconductor components use. One challenge, therefore, is providing efficient conversion, and new power management components may be used to improve conversion efficiency and to perform the conversion closer to the devices that consume the power so that the quality of the converted power remains high.”
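
The reasoning behind the higher bus voltage follows from P = V × I and resistive loss of I² × R. The sketch below works through one case. The 1,200W load and 5 milliohm wiring resistance are assumptions picked to make the arithmetic easy, and the losses in the 48V-to-low-voltage conversion stage that Woo mentions are not modeled.

```python
def bus_current_and_loss(power_w, bus_voltage_v, wiring_resistance_ohm):
    """Return (current in amps, resistive wiring loss in watts) for a given bus voltage."""
    current_a = power_w / bus_voltage_v                 # I = P / V
    loss_w = current_a ** 2 * wiring_resistance_ohm     # P_loss = I^2 * R
    return current_a, loss_w


# Illustrative assumptions only.
POWER_W = 1200.0       # power delivered into the chassis
WIRING_R_OHM = 0.005   # resistance of the distribution wiring

i12, loss12 = bus_current_and_loss(POWER_W, 12.0, WIRING_R_OHM)
i48, loss48 = bus_current_and_loss(POWER_W, 48.0, WIRING_R_OHM)

print(f"12V bus: {i12:.0f} A, {loss12:.2f} W lost in wiring")
print(f"48V bus: {i48:.0f} A, {loss48:.2f} W lost in wiring")
print(f"wiring loss ratio: {loss12 / loss48:.0f}x")
```

Quadrupling the bus voltage cuts the current to a quarter for the same power, so the wiring loss falls by a factor of 16. The cost is the harder step-down conversion near the load.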

IP design faces related but distinct issues. “Particularly for foundation IP embedded memories and logic libraries, engineers face a series of increasingly complex challenges,” observed Synopsys’ Hobbs. “Operating memory bit cells at or below 0.5V raises reliability concerns, as process variation and aging effects can degrade read and write stability. For logic libraries, deep low voltage operation heightens sensitivity to on-chip variation, narrows noise margins, and introduces greater timing uncertainties, complicating both characterization and verification.”

Integration across multiple power domains within a single SoC, especially in multi-rail architectures, further adds to design and testing complexity. “The move to advanced technology nodes, such as 7nm and below, amplifies these challenges, with greater transistor variability and a more pronounced impact of low voltage on overall performance and yield,” Hobbs said. “These challenges are most critical in high-performance, ultra-low-power applications like mobile AI, automotive safety, and large-scale HPC. Addressing them is key to enabling efficient and reliable next-generation solutions.”

These voltage-related effects also make foundation IP, including embedded memories and logic libraries, harder to model as technology scales to lower voltages and smaller geometries. Hobbs pointed to four reasons.

“At lower voltages and smaller geometries, manufacturing variations have a greater impact on device behavior, leading to non-linear and asymmetrical statistical distributions,” she said. “Second, reduced voltages weaken signal strength, making setup and hold time violations more subtle and harder to model accurately. Third, effects such as aging-induced write failures in memory cells may only appear after extended operation, requiring sophisticated long-term simulations. Fourth, as designs integrate more domains operating at different voltages, subtle interactions and corner-case failures are more likely — but harder to predict in testing.”

Verification issues
These increasing complexities in voltage management and process variability set the stage for even greater verification challenges in modern systems. As design teams face mounting pressures to ensure reliability across diverse operating conditions, the interplay between digital and analog verification becomes more pronounced, especially when dynamic loads and multi-domain power architectures are involved.

“Especially in systems that have totally different load conditions, the verification of all these different states and situations is very difficult,” noted Andy Heinig, group leader for advanced systems integration and department head for efficient electronics at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “If you have a multi-core system with four cores, for example, you have so many different situations. One can run at 10% while the others are off, then you switch one on, and it’s running at 90%. These dynamic loads make it really hard to understand the coverage you need for that.”
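
A rough count shows how quickly Heinig's four-core example grows. The sketch below assumes each core can sit at only three load levels (off, 10%, and 90%), which is a simplification of his example, and counts both the static load states and the ordered switches between them.

```python
from itertools import product

# Assumption for illustration: each core has only three discrete load levels.
LOAD_LEVELS = ("off", "10%", "90%")
NUM_CORES = 4

# Every combination of per-core load levels is one static load state.
states = list(product(LOAD_LEVELS, repeat=NUM_CORES))

# A dynamic event is a switch from one state to a different one,
# so count ordered pairs of distinct states.
transitions = len(states) * (len(states) - 1)

print(f"static load states: {len(states)}")
print(f"state-to-state transitions: {transitions}")
```

Even this coarse model yields 81 static states and 6,480 possible transitions. Real cores have far more than three load levels, so the space that verification would need to cover is much larger.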

This is especially problematic in mixed-signal designs. “What’s often a big problem here is the combination of digital verification tools and analog, because this dynamic behavior is always an analog behavior, and combining the verification methods is very difficult,” Heinig said. “If we want to check this together with the package, often it seems it’s not fully verified and fully validated, or it’s not verified and validated enough, as we would do. So, there’s a lot of verification on the digital objects themselves and the functionality. We believe we will have more problems in the future because of these uncertainties in the power delivery coming from the voltage regulator going through the package to the transistors.”

It’s also problematic in advanced packages and large SoCs. “You often see NVIDIA struggling with one of its latest products on power delivery,” Heinig said. “Some of the processors in the past from AMD and Intel were struggling with power delivery, as well. Intel even took some processors back. There was a lot of discussion around this, but it was not exactly clear in the end if it was even a power delivery issue. They brought forth some fixes in software. There are still problems with that, and there are failures, but it’s not exactly clear where they’re coming from. We expect that some of it comes from the power delivery network.”

Limited options
A closer look at voltage scaling trends reveals even more nuanced challenges. As technology nodes continue to shrink, questions about how far voltage can be reduced and the implications on reliability become increasingly urgent for both process engineers and IP designers.

“Lowering the voltage is hard because we’re getting closer to the threshold voltage,” Rambus’ Woo explained. “Process engineers work to provide different types of transistors at each new process node that balance transistor performance and power. It’s becoming more challenging as we move to smaller process geometries. Chip designers rely on the transistors provided in the process technology and use these building blocks to assemble their chips.”

From an IP design perspective, further voltage scaling at 3nm and 2nm is more complex because the operating voltage is now close to the transistor threshold, risking reliability and performance. “As a result, we focus on innovating in IP architecture, advanced assist techniques, and sophisticated characterization methodologies to address these challenges,” said Synopsys’ Hobbs. “In addition to implementing advanced voltage scaling, it is essential to consider co-optimizing IP design, implementing robust assist schemes, and leveraging advanced modeling to achieve power efficiency and maintain reliability at the most advanced nodes.”

These voltage and reliability concerns are tightly coupled with the physical realities of chip design, particularly as devices scale down to ever-smaller geometries. As the conversation shifts from the theoretical aspects of voltage scaling and power domain integration, it becomes crucial to consider the tangible effects on parasitics, wire delays, and the overall impact on signal timing and power integrity.

“The resistance goes up, the capacitance goes up, and that makes the parasitics between the devices all the more important,” Siemens EDA’s Davis said. “And there is an impact on the delay between devices. We’ve always had this same problem, but it gets worse. And then, as you’re pushing margins, one of the things decades ago we used to talk about is critical paths. What are the critical paths? If you only have a few critical paths, you’re wasting area, so you want to push as many of your nets to criticality, or as close to it as possible, so that you’re most efficiently using your area and your performance. The result is that everything is on the edge, and you have much lower margins. Therefore, your approximations have to be more precise, your margin of error goes down, and going from technology to technology, this problem only gets more difficult, and your error margin goes down. That means you need better modeling, you have to run more scenarios, and you have to be more careful.”

As these physical and architectural constraints continue to intensify, the pursuit of precise modeling and efficient verification grows more vital. This convergence of challenges underscores the importance of striking the right balance between accuracy and practicality in power integrity analysis, especially as designers navigate the shrinking margins inherent to advanced process nodes.

“Power integrity is really all about scale and approximation — good-enough accuracy,” Davis said. “To do the right thing, you’d never be able to finish. You’d do the full extraction and run a SPICE simulation, and then come back a couple of years from now. The reason power integrity exists as a market segment is that it’s impossible on any large circuit. So you have approximations for parasitics, and you have approximations for the device operation. You have all of these, and then you have approximations of the operation of the circuit. The accuracy of those approximations is very important, and as you get further down, you have to model more effects to get the approximation in the same bucket. Also, as part of that, pushing down to lower margins, you now get glitch power, which you used to be able to ignore. That’s when signals arrive either together, where they cause simultaneous switching that you really don’t want, or they come in slightly delayed, so it starts to switch.”

New approaches
Against this backdrop of ever-increasing modeling demands and shrinking design margins, engineers are exploring new strategies to address power integrity and reliability in real-world applications. One approach gaining traction is the adoption of more granular voltage management techniques, which offers both opportunities and tradeoffs for advanced system designs.

Rambus’ Woo emphasized that this is a tradeoff. “Having more voltage levels allows better tailoring of voltages to circuit needs,” he said. “But then, the system or the chip must generate a wider range of voltages, which can add cost and complexity.”

Active power management, i.e., having sensors in the die to measure voltage drop and make corrections to the clock or other avenues for affecting the power consumption and IR drop, is another option. Davis called this “an absolutely viable and useful tool. It is a way that you can say, ‘It’s impossible for me to always know everything up front. Therefore, I’m going to dynamically adjust for it in real life.’ That has been done in the industry for some time, and there have been innovations there in the last few years. In a sense, it’s your insurance. But you can take it further than insurance to actively manage. I can lower my margins because I can actively manage them. It is a way to be more robust and to refine the margins a bit. Those sensors and the controls that get implemented there take space, but if you can have a greater surety of meeting your reliability and your performance goals over time, that is a huge win, especially for long-term and mission-critical applications. We see a lot of this in high-performance computing, AI, and so forth.”
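
In its simplest form, the sense-and-correct loop described here can be sketched as a threshold controller that backs off the clock when a sensor reports droop and restores it when headroom returns. Everything in the sketch is hypothetical: the function name, thresholds, step size, frequency limits, and sensor readings are invented for illustration, and production implementations are considerably more sophisticated.

```python
def adjust_clock(freq_mhz, sensed_v, v_low=0.68, v_high=0.72,
                 step_mhz=50, f_min_mhz=500, f_max_mhz=2000):
    """Hypothetical controller: one clock adjustment per voltage sensor reading."""
    if sensed_v < v_low:
        # Droop detected: slow the clock to cut current draw and IR drop.
        return max(f_min_mhz, freq_mhz - step_mhz)
    if sensed_v > v_high:
        # Headroom available: recover performance.
        return min(f_max_mhz, freq_mhz + step_mhz)
    # Inside the dead band: leave the clock alone to avoid oscillation.
    return freq_mhz


# Made-up sensor readings in volts, one per control interval.
readings = [0.74, 0.70, 0.66, 0.67, 0.71, 0.73]

freq = 1500
trace = []
for v in readings:
    freq = adjust_clock(freq, v)
    trace.append(freq)

print(trace)
```

The dead band between the two thresholds is one possible design choice. It keeps the controller from toggling the clock on every small fluctuation, at the cost of reacting slightly later.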

Given these active management strategies and the ongoing evolution of both process technology and verification methodologies, how should architects and SoC designers practically integrate these insights into their design flows?

“Achieving reliable low-voltage design centers on a comprehensive, multi-level strategy,” Synopsys’ Hobbs said. “Advanced characterization and verification techniques, including machine learning-based LVF, moment-based modeling, and high-sigma Monte Carlo simulations are crucial for accurately capturing process variation and timing behavior. Design architects should utilize co-optimized IP, design flows, and EDA tools to address variability and ensure reliability. Implementing assist techniques such as advanced read/write schemes, power gating, and dynamic voltage and frequency scaling (DVFS) helps manage power while maintaining reliability. For applications with aggressive voltage or performance targets, collaboration with IP providers to develop custom memory architectures and logic cells may be necessary. Additionally, designing for robustness by incorporating extra timing margins, rail-to-rail pulse checks, and clock skew recommendations is essential. Ongoing innovation and teamwork are key to successful low-voltage solutions.”
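
One reason specialized high-sigma methods matter is the number of samples a plain Monte Carlo run would need to observe rare failures at all. The sketch below estimates that number under a strong simplifying assumption: a one-sided failure criterion on a Gaussian distribution. Hobbs notes earlier that real low-voltage distributions are non-linear and asymmetrical, so these figures only indicate the order of magnitude. The function names are illustrative.

```python
import math


def tail_probability(sigma):
    """One-sided probability of a standard normal value exceeding `sigma`."""
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))


def samples_for_expected_failures(sigma, expected_failures=10):
    """Plain Monte Carlo samples needed to see `expected_failures` on average."""
    return math.ceil(expected_failures / tail_probability(sigma))


for sigma in (3, 4, 5, 6):
    p = tail_probability(sigma)
    n = samples_for_expected_failures(sigma)
    print(f"{sigma} sigma: failure probability {p:.3e}, about {n:.2e} samples")
```

Under these assumptions, a 6-sigma target needs on the order of ten billion plain samples, each of which would be a circuit simulation. That is the gap high-sigma sampling and statistical modeling techniques aim to close.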

Again, part of the design of any chip involves understanding process rules and their impact on performance, power, and area. “Design teams study the impact of process rules and account for them in the design of their chips,” Rambus’ Woo said.

Some of the issues detailed here could be mitigated with much more simulation and verification earlier in the design process. But Fraunhofer’s Heinig said he is not sure the industry even sees power delivery as a big problem. “People spend so much time and so much money on verification of the functionality, that the logic really does what it has to do. And sometimes it’s really hard to understand why companies are doing so little on the power delivery network. Then, we have seen products from the automotive industry where companies spend so much time on functional safety, and we have seen how they design the power delivery network on the package. It never follows the functional safety aspects, and what they have done on the processors themselves.”

Conclusion
Faced with these multifaceted challenges and the evolving landscape of design methodologies, what practical steps can architects and design teams take today to proactively manage voltage concerns and avoid costly pitfalls in their projects?

This is not an easy question to answer because it is a design question, and there is no single tool that solves all these problems. “It’s more complicated than just using a tool, because it’s a methodology, and there aren’t perfect models for all of these pieces yet,” Siemens EDA’s Davis said. “Everybody’s working on it. It’s evolving very quickly, and the tools are evolving very quickly. But ultimately, right now it really comes down to careful planning, careful partitioning, and real engineering work.”

A final consideration is that these challenges are not just happening with the most advanced technologies. “You’re starting to see, especially with 2.5D and 3D, the mixing of advanced technologies and more mature technologies for sensors, for all kinds of different parts that you’re putting together in these more complex systems,” Davis said. “So even for mature technologies, a lot is happening in image sensors and other kinds of sensors these days. IR sensors, radio sensors, lidar sensors, and vision sensors are not all about automotive. Some of them are in automation applications like smart cities, and in all kinds of different things. They’re putting in image sensors. They’re taking those chip dies and gluing them together so that they’re working together. That means you now have all those stacking problems, and that might be on a 90nm or 180nm technology, not 2nm. A lot of these challenges that the leading edge is seeing are also driving the more mature technologies. So don’t think it’s only the pointy end of the spear.”

Fundamentally, for this power integrity problem, the absolute certainty of whether or not the chip is going to pass depends on the circuit design, how it’s being used, and where and how that design has been implemented. “To fully predict whether or not your chip is going to pass, you’d have to know all possible combinations that are real that you’re going to do in the implemented chip, and make sure that it can get enough current and voltage to everything,” Davis added. “That’s not a solvable problem. It can only be approximated.”


