Power Grid Analysis

Promising technologies are emerging to break through the capacity and performance wall.


By Christen Decoin
With increasing design size at each technology node, power grid analysis (PGA) has been stretching established software capacity and performance for some time. At 32/28nm, capacity and performance issues finally presented significant barriers to achieving signoff.

In this article, we explore existing approaches that EDA vendors have been trying to leverage to work around these problems, as well as current flows customers have put in place to complete full chip sign-off PGA. After that, we’ll examine some potential paths to truly resolve these problems going forward.

EDA Vendors: Tackling the PGA Capacity and Performance Wall
Hierarchical Power Grid Analysis
One of the most common approaches used by existing PGA solutions is hierarchical PGA. This flow is based on the fact that you can run PGA on IP blocks of a manageable size, then generate an IP power model that represents each IP in the chip-level analysis (Figure 1). Using this approach, users avoid the long turnaround times and memory usage issues they would encounter during a true full-chip PGA. While this approach is widely used in the industry, it has known shortcomings. The main issue is that several IP blocks often share the same power supply, which means an IR drop issue in one IP block can be directly linked to another IP block sharing that supply.


Figure 1 – Hierarchical PGA uses IP power models to perform full chip analysis

Additionally, this approach is not compatible with in-rush types of analysis. For instance, if you turn on IP1 in Figure 1 while using IP power models, you will not be able to see the in-rush current effect on IP2 and IP4.
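The shared-supply coupling that power models miss can be illustrated with a toy one-dimensional rail, in which each IP is abstracted to a single DC current sink. This is a minimal sketch for intuition only; the rail topology, resistance values, and currents are all hypothetical, and real IP power models are far richer than a single sink.

```python
# Toy 1-D shared supply rail: pad -> IP1 tap -> IP2 tap -> IP3 tap.
# Each IP is abstracted to a DC current sink (a crude "power model").
# All values are illustrative placeholders.

VDD = 0.9                       # supply voltage at the pad (V)
R_SEG = 0.05                    # resistance of each rail segment (ohm)
ip_currents = [0.4, 0.3, 0.5]   # current drawn by IP1..IP3 (A)

def tap_voltages(vdd, r_seg, currents):
    """Voltage at each IP tap of a linear rail fed from one pad."""
    volts = []
    v = vdd
    for i in range(len(currents)):
        # current through segment i is everything drawn downstream of it
        seg_current = sum(currents[i:])
        v -= r_seg * seg_current
        volts.append(v)
    return volts

v = tap_voltages(VDD, R_SEG, ip_currents)
# Raising IP3's draw (e.g., an in-rush event at power-on) also deepens
# the drop seen at IP1 and IP2, because they share the same rail:
v_rush = tap_voltages(VDD, R_SEG, [0.4, 0.3, 1.5])
```

Analyzing each IP against a fixed supply voltage would hide exactly this cross-IP effect.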

Power Grid Extraction Hierarchy Leverage
Within the PGA process, two main components are computationally expensive in terms of performance and memory usage: the power grid extraction (PGE), and the static or dynamic solve that generates the analysis data. To tackle the PGE capacity and performance issues, established solutions commonly use hierarchy leverage. The principle of hierarchy leverage is fairly simple: because the DEFs of full chips are hierarchical these days, the PGE software runs the sub-DEFs separately, then merges the resulting netlists into one (Figure 2). This approach has a dual advantage: not only can the PGE run each DEF on a separate machine (which mitigates performance and memory usage issues), but when a sub-DEF is instantiated several times, the PGE tool only needs to extract it once, then copy the results for each additional instance, providing a substantial performance gain.


Figure 2 – PGE leveraging design hierarchy
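The instance-reuse half of this flow amounts to memoizing extraction per master. The sketch below is a schematic illustration under stated assumptions: the function names, the fake resistor netlist, and the instance/DEF naming scheme are all hypothetical stand-ins for what a real PGE tool does on actual layout geometry.

```python
# Sketch of hierarchy leverage in power grid extraction (PGE):
# each unique sub-DEF (master) is extracted once, then its netlist is
# reused, under an instance-specific prefix, for every placement.

def extract_sub_def(def_name):
    """Stand-in for an expensive flat extraction of one sub-DEF."""
    return [f"{def_name}/R{i}" for i in range(3)]  # fake resistor netlist

def hierarchical_extract(instances):
    """instances: list of (instance_name, def_name) pairs."""
    cache = {}       # master netlists, keyed by sub-DEF name
    merged = []      # merged full-chip netlist
    for inst, def_name in instances:
        if def_name not in cache:            # extract each master only once
            cache[def_name] = extract_sub_def(def_name)
        # copy the master netlist under this instance's hierarchy
        merged += [f"{inst}/{r}" for r in cache[def_name]]
    return merged, len(cache)

netlist, n_extractions = hierarchical_extract(
    [("u1", "DEF2"), ("u2", "DEF2"), ("u3", "DEF3")])
# DEF2 is placed twice but extracted only once
```

The gain scales with how often masters repeat, which is why this helps most on regular, heavily instantiated designs.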

This approach does have some key issues that will generate systematic inaccuracies in the overall PGA results:

  1. When running extraction, the in-die variation model usually takes the nets’ environments into account, but that is not the case at the boundaries of sub-DEFs taken out of their full-chip context. For instance, while running extraction on DEF 2, the in-die variation does not account for the environment of the nets at the boundary of DEF 2. From a top-level viewpoint, DEF 2 and DEF 3 are adjacent, so the nets at the right border of DEF 2 have neighbor nets from the left border of DEF 3 (neighbors that are not seen when the extraction is run separately).
  2. When running the extraction on a sub-DEF, designers miss potential top-level nets that overlap sub-DEF areas.

Design Slicing
Although slicing a design so it can be run through PGA is a technique usually employed by designers (as discussed in the next section), some vendors are looking at ways to support it natively in their PGA solutions, so far with limited success. Native design slicing is not an easy task, as the slicing generates many accuracy issues (due to boundary effects) that can trigger false voltage drop or current density hotspots. Design teams have the advantage of knowing their designs very well, which is not the case with an automated approach that implements basic slicing based purely on design size and software limitations. Such an approach is not likely to be accepted by customers, who will prefer to control the process.

Semiconductor Companies: Process Flows to Complete Full Chip Sign-Off PGA

Design Slicing
Design teams that need to run existing PGA solutions on large designs often use a design slicing approach. To limit boundary effects, the designer needs to clearly understand the optimal slicing strategy for each design. Because they know their designs very well, and have past experience from previous projects, designers have the know-how that enables them to effectively leverage such techniques.

Even then, the process is extremely costly time-wise. Once each slice is run through the PGA, the merging and analyzing portions of the process are extremely slow, as the design team needs to study each reported error and determine whether the issue is linked to the slicing or is a real error in the power grid network. In fact, even though the runtime of PGA solutions on such large design slices is painfully long, designers commonly spend even more time analyzing the results of these runs.
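A common slicing tactic is to pad each slice with an overlap region (a "halo") and then discard violations reported inside that halo, where boundary effects make results unreliable. The sketch below is a simplified illustration of that bookkeeping; the die dimensions, tile size, halo width, and the violation check are all hypothetical placeholders, not any team's actual flow.

```python
# Sketch of a manual design-slicing flow: cut the die into tiles with
# an overlap ("halo"), run PGA per padded tile, and keep only the
# violations that fall inside each tile's core region.

DIE_W, DIE_H = 4000.0, 4000.0   # die size (um) -- illustrative
TILE = 2000.0                   # slice core size
HALO = 200.0                    # overlap to absorb boundary effects

def make_slices(die_w, die_h, tile, halo):
    """Return (core_box, padded_box) pairs covering the die.

    Boxes are (x_lo, y_lo, x_hi, y_hi); the padded box is what gets
    run through PGA, the core box is where results are trusted."""
    slices = []
    y = 0.0
    while y < die_h:
        x = 0.0
        while x < die_w:
            core = (x, y, min(x + tile, die_w), min(y + tile, die_h))
            padded = (max(core[0] - halo, 0.0), max(core[1] - halo, 0.0),
                      min(core[2] + halo, die_w), min(core[3] + halo, die_h))
            slices.append((core, padded))
            x += tile
        y += tile
    return slices

def keep_violation(v_xy, core):
    """Keep only violations located inside a slice's core region."""
    x, y = v_xy
    return core[0] <= x < core[2] and core[1] <= y < core[3]

slices = make_slices(DIE_W, DIE_H, TILE, HALO)  # a 2x2 grid of slices
```

Even with a halo, violations near slice borders still need manual review, which is where much of the analysis time described above goes.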

Hierarchical PGA
Another approach designers use to speed up their PGA flow and work around capacity issues is hierarchical PGA. Some designers, when dealing with repetitive types of designs (FPGA, GPU, etc.), do use the power model approach available in existing software. In doing so, they exploit the repetitiveness of their design, and the fact that they can over-design its power grid, to consolidate the flow and limit the risk inherent in this approach (as described earlier).

Coverage Limitation
One last approach used by designers is simply to reduce the coverage of their PGA flow. Sometimes design teams must over-constrain the design of their power grid, which they know will impact their margins, but they have no good alternative. Coverage limitation is used primarily to speed up the tedious turnaround time, but it also partially addresses the capacity issue, using three different approaches:

  1. Selective PGA: designers run only static PGA for their full chip sign-off, limiting the impact of the poor performance of established solutions. They either do not run dynamic PGA, or run it only on IP blocks or chip parts (if they need to fine-tune their analysis on an issue flagged by the static run).
  2. Nominal coverage: designers rarely run PGA on more than two corners, and it is not uncommon for them to run only nominal coverage, hoping that their over-designed power grid makes this coverage sufficient. Clearly, running PGA on only one or two corners (instead of a dozen) speeds up the sign-off PGA.
  3. Fewer clock cycles coverage: for dynamic PGA activity, propagation is key. Designers commonly run their dynamic PGA for 5 to 10 clock cycles to get optimal coverage of potential issues, but in some cases, they run dynamic PGA for only a couple of clock cycles.

A Look Ahead
As of today, there is no real silver bullet that can break through the PGA capacity and performance wall. Designers are stuck with painful working flows, while EDA vendors release workaround solutions to keep their aging PGA technology afloat. But there are some interesting paths to explore, especially for the PGE component, which is the main source of capacity and performance issues. Promising options include taking advantage of the multi-voltage domain definitions used for SoC implementation, and employing an enhanced hierarchical approach in which the environment could be taken into account at some level. On the solver side, existing offerings on the market could enable a distributed solver to be used in a PGA solution, which would drastically improve performance. While there is no clear road yet, new technologies and approaches are emerging that may lead us to a resolution.

—Christen Decoin is product marketing manager for new and emerging markets for Calibre Design Solutions at Mentor Graphics.
