Several technical trends have converged to make power grid design one of the most difficult problems in chip design today.
Creating the right power grid is a growing problem in leading-edge chips. IP and SoC providers are spending a considerable amount of time defining the architecture of logic libraries in order to enable different power grids to satisfy the needs of different market segments.
The end of Dennard scaling is one reason for the increased focus. With the move to smaller nodes, power per unit of area has been rising steadily since the mid-2000s. Combined with larger chips and increased interconnect resistance, that makes it critical to design the power distribution network carefully and then to monitor IR drop, especially at critical points during post-silicon validation.
“Designing to the wrong power grid, especially the lower grid, where robustness is required to ensure rapid, instantaneous current response, means the design may suffer in terms of utilization, and therefore area,” said Leah Schuth, director of technical marketing for Arm’s Physical Design Group. “In fact, designers may also find violations on design rules that they can’t fix, even manually, without growing area. In addition, especially for finFET processes, the increased current capabilities of the transistors and high wire resistance prove challenging for IR drop and EM closure. For designs using low-leakage process nodes, low voltages and/or high threshold devices, the variation of these devices adds additional complexity.”
Schuth recommends process and voltage variation be analyzed early in the design cycle so designers can understand how signoff will be impacted, particularly IR drop closure. “Often IR drop validation is done toward the end of a design flow, but if the impact of voltage variation has not already been assessed, a more robust power grid may be needed to help compensate for the large voltage variations seen for these low-power designs.”
A key part of power verification is power grid integrity. That requires an understanding of how low-power designs complicate the supply reaching out to various instances in the design, and all of that takes time. But even with unlimited time and resources, fully verifying a low-power design is still difficult to achieve.
“Static timing analysis can perform the timing analysis, but could we figure out ways of just saying, ‘Yes, this design is completely low-power verified’?” asked Preeti Gupta, director of RTL product management at ANSYS.
Gupta says this is difficult because power goes hand in hand with vectors and with the activity in the design, so it would require scanning through many realistic scenarios and using an approach for power verification that includes functionality, power budgeting, power inefficiency, and power grid integrity. “From those perspectives, many of them actually rely on vectors. The same block with the same set of logic would be consuming 1 milliwatt of power in a certain mode and consuming maybe 100 milliwatts of power in another mode.”
Functionality, budgeting, and power efficiency are not the only concerns. Power integrity and thermal integrity are important as well, particularly in low-power designs.
One semiconductor company found out the importance of this the hard way. “This company had a design that went from a non-low-power state to a low-power state, and then from that low-power state back into the active state. There was a huge increase in current, and because of that, it coupled with the package inductance and led to a high voltage drop. That high voltage drop subsequently impacted the timing on the design, and it ended up impacting the functionality such that a functional bug was introduced because the timing impact was such that the signal that needed to be registered did not get registered,” Gupta said.
Tools that identify such scenarios are extremely beneficial. They can flag the cycles where power changes sharply from one cycle to the next, across a four-cycle window, or over whatever interval and threshold the semiconductor company considers appropriate. The engineering team then can determine whether the power grid can handle that kind of current surge.
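This kind of screening can be sketched as a sliding-window comparison over a per-cycle power trace. The example below is a minimal illustration, not any vendor's implementation: the function name `find_power_surges`, the trace values, and the 30 mW threshold are all assumptions made for the example, and it only flags increases in power.

```python
from typing import List, Tuple

def find_power_surges(power_mw: List[float], window: int,
                      threshold_mw: float) -> List[Tuple[int, float]]:
    """Return (cycle, delta) for each cycle whose power rose by more than
    threshold_mw compared with the cycle `window` cycles earlier."""
    surges = []
    for cycle in range(window, len(power_mw)):
        delta = power_mw[cycle] - power_mw[cycle - window]
        if delta > threshold_mw:
            surges.append((cycle, delta))
    return surges

# Hypothetical trace: a block idles near 1 mW, then wakes up to about 100 mW.
trace = [1.0, 1.0, 1.2, 1.1, 40.0, 95.0, 100.0, 100.0]

one_cycle = find_power_surges(trace, window=1, threshold_mw=30.0)
four_cycle = find_power_surges(trace, window=4, threshold_mw=30.0)

print("cycle-to-cycle surges at cycles:", [c for c, _ in one_cycle])
print("four-cycle surges at cycles:", [c for c, _ in four_cycle])
```

The wider window catches the slower tail of the ramp that a cycle-to-cycle check misses, which is one reason the choice of window and threshold is left to the design team.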
Gupta cited another example from a leading cell phone company. “They introduced clock gating within a design, and a functional bug turned up in the design phase. A designer, in all their earnestness to employ this technique, marked that clock gate to turn on the clock all the time, thereby rendering the clock gate useless. The functional bug was resolved, but the power consumption went up, which shows how functionality and power get tied together in so many different ways.”
Fig. 1: How clock and power gating transitions cause large current swings that result in voltage drop. A chip going from “idle” to “on” results in a sudden increase in current, which can couple with package inductance and result in a drop in power supply, which increases circuit delay. Source: ANSYS
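To first order, the effect in Fig. 1 can be approximated as a resistive drop plus an inductive term, V = I*R + L*dI/dt. The sketch below is a lumped estimate only. The function name and every component value are illustrative assumptions rather than figures from ANSYS, and the model ignores on-die decoupling capacitance.

```python
def supply_droop_v(current_step_a: float, ramp_time_s: float,
                   package_inductance_h: float, grid_resistance_ohm: float,
                   final_current_a: float) -> float:
    """First-order droop estimate (hypothetical lumped model):
    inductive term L * dI/dt plus resistive term I * R."""
    inductive_v = package_inductance_h * current_step_a / ramp_time_s
    resistive_v = grid_resistance_ohm * final_current_a
    return inductive_v + resistive_v

# Illustrative numbers only: a 2 A wake-up step through 50 pH of package
# inductance and 5 milliohms of grid resistance, settling at 2.5 A.
slow_wake = supply_droop_v(2.0, 10e-9, 50e-12, 5e-3, 2.5)  # 10 ns ramp
fast_wake = supply_droop_v(2.0, 1e-9, 50e-12, 5e-3, 2.5)   # 1 ns ramp

print(f"slow wake-up droop: {slow_wake * 1e3:.1f} mV")
print(f"fast wake-up droop: {fast_wake * 1e3:.1f} mV")
```

In this model, shortening the ramp by a factor of 10 raises the inductive term by the same factor, which is why abrupt idle-to-active transitions are the ones worth checking.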
Leading-edge nodes
While there are benefits to advanced nodes, there are also a lot of challenges. “As engineering teams design on the 7nm, 5nm and small advanced nodes, one of the benefits is that they put more functionality on the same die,” said Jerry Zhao, product management director in the digital and sign-off group at Cadence. “Chips keep getting bigger and bigger, and the power consumption is huge — both static or dynamic. When you have that much power consumption on a piece of silicon, how to deliver that power is the critical design issue, which means that your power grid needs to be strong enough to deliver that.”
On the other side of the equation, the power supply is dropping to less than 1 volt, and the permissible margin of error is shrinking. “That is a huge challenge to design communities,” Zhao said. “You don’t want to over-design the power grid because that’s very costly. The silicon is very costly. Also, you definitely don’t want to under-design the power grid, either, because that’s devastating. A chip’s not going to work if it can’t deliver the power.”
Design engineers need to know whether any tool can analyze the full chip once all of the blocks are put together, Zhao noted. “Today’s designs have multiple power domains, some of which will be switching on and off during the function of certain activities. If you don’t put them together at the block level, you can never analyze whether the grid is strong enough or not. Part of the reason is due to the power grid being highly coupled. Somewhere at one corner of the chip, if it draws power, the other corner of the chip may not have enough power, so it will totally fail. If you don’t put them together, you won’t know that.”
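The coupling Zhao describes can be illustrated with a deliberately simple model: a single rail fed from one end, with a load at each tap. Real grids are two-dimensional meshes with many supply connections, so this is only a sketch, and the function name and all values are assumptions chosen for the example.

```python
from typing import List

def rail_voltages(vdd_v: float, segment_res_ohm: float,
                  load_currents_a: List[float]) -> List[float]:
    """Voltage at each tap of a one-dimensional rail fed from one end.
    Each segment carries the sum of all load currents at or beyond it."""
    voltages = []
    voltage = vdd_v
    downstream_a = sum(load_currents_a)
    for load_a in load_currents_a:
        voltage -= segment_res_ohm * downstream_a  # IR drop across this segment
        voltages.append(voltage)
        downstream_a -= load_a                     # this tap's load leaves the rail here
    return voltages

# Hypothetical 0.75 V rail with 10 milliohm segments and four blocks.
quiet = rail_voltages(0.75, 0.01, [0.2, 0.2, 0.2, 0.2])
busy = rail_voltages(0.75, 0.01, [0.2, 0.2, 0.2, 1.0])  # only the far block changes

for tap, (v_quiet, v_busy) in enumerate(zip(quiet, busy)):
    print(f"tap {tap}: {v_quiet * 1e3:.0f} mV -> {v_busy * 1e3:.0f} mV")
```

Only the last block draws more current in the second case, yet every tap sees a lower voltage. That is why analyzing each block in isolation can miss a failure that appears once the blocks share a grid.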
The second challenge involves ever-changing, increasingly complex design rules that the power grid has to obey. These include electromigration (EM) rules, which determine whether the wires can carry that much current.
“Right now, if you can carry the current, it’s one indicator that your grid is strong,” he said. “But you also need to consider the long-term effect, and whether the power grid can sustain this much current in 1 year, or 5 years or 10 years. That becomes a statistical failure because of the EM rules, and it must be considered, as well. It’s very important for automotive and those kinds of applications. For cell phones, if you change cell phones every two years, you may not have that problem. But if you have a car on the road, that’s huge.”
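One common way to reason about the long-term effect is Black's equation for electromigration, MTTF = A * J^(-n) * exp(Ea / (k*T)), which relates median time to failure to current density J and temperature T. The article does not say which model any particular tool uses, so the sketch below is only an illustration. The exponent n = 2 and activation energy Ea = 0.9 eV are placeholder assumptions, because real values depend on the metal and the process.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def em_lifetime_ratio(j_ratio: float, temp_ref_k: float, temp_new_k: float,
                      current_exponent: float = 2.0,
                      activation_energy_ev: float = 0.9) -> float:
    """Return MTTF_new / MTTF_ref under Black's equation.
    j_ratio is J_new / J_ref; the defaults are illustrative placeholders."""
    density_term = j_ratio ** (-current_exponent)
    thermal_term = math.exp(
        activation_energy_ev / BOLTZMANN_EV_PER_K
        * (1.0 / temp_new_k - 1.0 / temp_ref_k)
    )
    return density_term * thermal_term

# Doubling the current density at the same temperature.
double_current = em_lifetime_ratio(2.0, 358.0, 358.0)
# Same current density, but 40 K hotter.
hotter = em_lifetime_ratio(1.0, 358.0, 398.0)

print(f"lifetime ratio with doubled current density: {double_current:.2f}")
print(f"lifetime ratio when 40 K hotter: {hotter:.3f}")
```

Under these assumptions a wire that carries its current today can still fall well short of an automotive lifetime target, which is the distinction Zhao draws between phones and cars.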
Further, making the design more efficient requires a big-picture view of the entire design methodology and its flows.
“Sometimes that can be what’s called a full flow,” Zhao explained. “You need to consider them from a very early stage of the design and come all the way down to sign off. So, for example, place and route. That’s where the physical implementation is going to be done, and that has a direct impact on how power is delivered and consumed. That’s perhaps the most important way to see if you have a strong grid. In fact, you may need to consider the power grid as early as the floor-planning stage. You may have some rough numbers you want to throw on the chip. Previously, you just looked at the floorplan. But now you look at whether the floorplan will create a grid problem. You can also think about when you start doing the placement, finding hotspots because of IR drop, and whether they can be fixed by the placement. In this way, the physical implementation can help the IR drop. The IR drop also will lead the way to how the place and route should be done so that by the end of the design cycle, before you go home and you say that you’ve closed the design entirely, the physical is one area that needs to be considered along with timing and the voltage variations on the grid that will have a direct impact related to the timing.”
Not all paths are affected. Voltage variation on a non-critical path may not have an impact. But other paths may be very sensitive to even minor voltage changes on the grid, and that could have a big impact on timing. The challenge is finding those paths, which can vary from one design to the next, and that requires a good understanding of how to run the power analysis on the grid.
Machine learning is starting to be applied here to recognize problematic patterns faster. That has an impact on IR-aware static timing as well as thermal management.
“If you have 100 watts, or 200 watts on one die, some of the wires could actually melt,” Zhao said. “How are you going to analyze your thermal and your IR drop (which is electrical) together? This requires co-simulation of electrical and thermal, where you consider your electrical charge of all the current waveforms and how they’re going to impact the temperature on the die. The beauty of this is that some of the critical paths in a design may not be voltage-sensitive at all, and what will fail is the one they will pass when you run the traditional static timing. However, if you combine the voltage sensitivity to those paths, then all of a sudden they become dangerous because they will fail with the timing. That regular path becomes a critical path in this IR-aware static timing analysis.”
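The idea behind IR-aware timing can be sketched by derating each path's slack according to its local voltage drop. The example below assumes a linear delay sensitivity, expressed in picoseconds per millivolt, which is a simplification, and the path names and numbers are hypothetical. It is not a description of how any specific sign-off tool works.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TimingPath:
    name: str
    nominal_slack_ps: float       # slack from conventional static timing
    sensitivity_ps_per_mv: float  # assumed extra delay per mV of supply drop
    ir_drop_mv: float             # local IR drop seen by the path

def ir_aware_slack_ps(path: TimingPath) -> float:
    """Slack after subtracting the delay added by the local IR drop."""
    return path.nominal_slack_ps - path.sensitivity_ps_per_mv * path.ir_drop_mv

def newly_failing(paths: List[TimingPath]) -> List[str]:
    """Paths that pass nominal timing but fail once IR drop is included."""
    return [p.name for p in paths
            if p.nominal_slack_ps >= 0 and ir_aware_slack_ps(p) < 0]

paths = [
    TimingPath("tight_but_insensitive", 5.0, 0.1, 20.0),   # 5 ps -> 3 ps
    TimingPath("relaxed_but_sensitive", 40.0, 1.5, 30.0),  # 40 ps -> -5 ps
    TimingPath("already_failing", -2.0, 0.5, 10.0),
]

print("newly critical paths:", newly_failing(paths))
```

In this toy data set the path with the most nominal slack is the one that fails, which mirrors Zhao's point that a regular path can become a critical path once voltage sensitivity is considered.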
Connected and interdependent
Within a complex SoC design there are many interdependencies. Voltage, for example, is related to place and route, timing, and thermal. And while the power delivered to one area needs to be protected from big voltage drops, that same power increases the heat inside the chip, the package, and even the system. So what used to be point tools are now part of a broader system analysis.
Some of this needs to be measured internally, as well, both during the design phase and as chips are used in safety-critical and mission-critical applications.
“Did I design my power distribution network right, or have I got nasty unexpected voltage drops in the supply to my critical circuits in the middle of the chip?” asked Richard McPartland, technical marketing manager at Moortec. “Or is the process variation a bit more than I was expecting around my chip and power/performance/reliability optimization? All SoC teams want to optimize one or more of those and in-chip monitoring is necessary for that.”
This can help provide more precise measurements for the power grid design, as well, when designs are implemented in silicon.
“Designing a good power distribution network on an advanced node for a large chip (and some chips are really large) is not an easy task,” McPartland said. “So when you get the chips back from the foundry, it is extremely helpful to be able to check that you have the correct supply voltage at each critical block. Circuit speed is strongly dependent on supply voltage, so any unexpected/unplanned for voltage drop may cause a timing violation. With in-chip monitoring IP, voltage monitors can be easily embedded, each of which supports 16 sense points. Those provide visibility on the supply at the critical blocks, as well as the voltage drop between the supply pin and the critical block.”
A final consideration is the impact of the power grid design itself on overall power.
“When you want to do a very solid power analysis, but know your chip actually works, it is possible to measure the impact of a very solid power grid on actual power consumption,” said David Ratchkov, founder of Thrace Systems. “More metal is more overhead. It gets more difficult to route. You get more routing and you get lower Vt cells. So there’s a cascading loop there. For the very solid power grid, this can actually cause problems.”