Power Impacting Cost Of Chips

The cost of designing a power delivery network is rising, and that’s not likely to change.

The increase in complexity of the power delivery network (PDN) is starting to outpace increases in functional complexity, adding to the already escalating costs of modern chips. With no signs of slowdown, designers have to ensure that overdesign and margining do not eat up all of the profit margin.

The semiconductor industry is used to problems becoming harder at smaller geometries, but until recently power was rarely viewed as an extraordinary expense.

“In the good old days, before 130 and 180nm, the power grid usually was an afterthought and people did not even spend a lot of time to look at IR drop,” says Jerry Zhao, product management director, power signoff at Cadence. “People now spend more time to close power than timing because there are so many variables.”

A lot has changed in that time. “From being a non-issue, designers started to realize the issues surrounding power integrity,” explains Tobias Bjerregaard, CEO for Teklatech. “Then they started to analyze the problem, using power integrity sign-off tools to understand the implications. Finally, they have come to the stage of needing to optimize designs, in order to stop leaving money on the table. At 10nm and 7nm this is not nice-to-have but need-to-have in order to succeed business-wise in an increasingly competitive semiconductor market.”

How we got to this point
The power delivery network started life as an almost invisible part of the infrastructure of a chip and had negligible impact on power, performance and area (PPA). “Back in the ’90s the chip power ‘plumbing’ was implemented according to rules of thumb, and the sign-off check was about connectivity,” says Bjerregaard. “I’ve heard of one or two ASIC engineering runs that failed simply because power or ground pins were not electrically connected.”

As chips got larger, dynamic switching activity increased, and this started to create issues. “This made the resistance of the supply network an issue,” says Drew Wingard, chief technology officer at Sonics. “This was an issue even for 0.5 micron. We were already starting to see the need to calculate the IR drop of the PDN. We were starting to add dummy transistors, used as capacitors, to supply a local source of charge that would minimize the effect of di/dt on the bond wires.”
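As a rough illustration of the IR-drop calculation Wingard describes, the sketch below treats a supply rail as a chain of equal resistors fed from one end, with a current tap after each segment. The function name, the uniform segment resistance and the example values are assumptions made for illustration; none of them come from the sources quoted here.

```python
def rail_ir_drop(segment_res_ohm, tap_currents_a):
    """Return the cumulative voltage drop (V) at each tap of a rail fed from one end.

    Segment k carries the current of every tap at or beyond k, so the
    drop keeps accumulating toward the far end of the rail.
    """
    drops = []
    total_drop = 0.0
    downstream = sum(tap_currents_a)  # current entering the first segment
    for current in tap_currents_a:
        total_drop += segment_res_ohm * downstream
        drops.append(total_drop)
        downstream -= current  # this tap's current leaves the rail here
    return drops


if __name__ == "__main__":
    # Hypothetical rail: three 0.1-ohm segments, 1 A drawn at each tap.
    print(rail_ir_drop(0.1, [1.0, 1.0, 1.0]))
```

With these made-up numbers the far-end tap sees roughly 0.6 V of drop, twice that of the nearest tap. This is why thinner, more resistive wires push designers toward denser grids.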

But most of the problems were solved by using more wires for the PDN, which was not a great concern because metal layers were being added. “An increasing fraction of the total number of meters of wire on a chip is spent on the PDN, because you can afford to,” adds Wingard. “But those wires got a lot thinner and their resistivity did not stay constant. So the PDN got more complex because it used more wires. This was an issue by 0.35 micron.”

In fact, according to TSMC, the resistance of metal layers has doubled between the 40nm and 7nm nodes.

Analysis became essential. “In the ’00s, power integrity sign-off with dedicated IR drop analysis tools – static at first, then dynamic – became the norm,” says Bjerregaard. “There were specific voltage drop criteria to meet, that could be specified as 10% static/17% dynamic voltage drop. As long as you were within the margins, you were fine.”
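The margin-based sign-off Bjerregaard describes amounts to a simple pass/fail comparison. The sketch below uses the 10% static and 17% dynamic figures from his example as defaults; the function name and the sample voltages are hypothetical, and real sign-off tools apply such limits per instance across the whole design.

```python
def within_drop_budget(vdd, static_drop_v, dynamic_drop_v,
                       static_limit=0.10, dynamic_limit=0.17):
    """Return True if both voltage drops are within their fractional limits of vdd."""
    if vdd <= 0:
        raise ValueError("vdd must be positive")
    static_ok = static_drop_v / vdd <= static_limit
    dynamic_ok = dynamic_drop_v / vdd <= dynamic_limit
    return static_ok and dynamic_ok


if __name__ == "__main__":
    # Hypothetical 1.0 V supply with 80 mV static and 150 mV dynamic drop.
    print(within_drop_budget(1.0, 0.08, 0.15))
```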

But even then, timing and power became interrelated. “Timing constraints were impacted by voltage drop, so they had to create margins,” says Zhao. “When the voltage drops, you have to consider what happens to the clock.”

Andrew Cole, vice president of business development at Silicon Creations, explains this interaction. “Supply noise hurts the clock by creating jitter. Jitter is important because it reduces timing margin and can limit the speed of logic circuits, or even cause them to fail. When the clock is used for SerDes, the jitter can prevent the SerDes from operating as fast as it should. Jitter also can be seen as distortion, so that the signal-to-noise ratio of clocked analog circuits is lowered.”

This can be particularly problematic if a design requires low skew clocks. “Gate delay changes with supply voltage,” says Cole. “So if the supply voltage changes quickly (while a clock edge is inside a clock tree), the clock edge will be moved faster or slower. This means the edges will come out of a clock tree that has a noisy supply with period jitter, even if they were perfectly clean going in.”

But that was only the start of the problems. “Once we get to around 180, and certainly by 90nm, designs were having issues with leakage,” says Wingard. “Now, CMOS devices do not switch off completely. The fraction of the energy that is spent in switching versus static leakage begins to go out of whack, and there are only a few effective ways around that. Two options are Power Gating or Dynamic Voltage and Frequency Scaling, and they require that you can partition the PDN into multiple pieces so that you can either take a piece and shut it off or vary the supply voltage. Both of these make power distribution a lot more complicated. Alternatively, you can have transistors with multiple threshold voltages. Then you use the transistors that leak a lot only on the critical paths.”
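To see why voltage scaling is attractive, consider the standard first-order CMOS power model, in which dynamic power is proportional to activity, switched capacitance, the square of the supply voltage, and frequency, while leakage power is the supply voltage times the leakage current. The sketch below applies that textbook model. The model is not taken from the sources quoted here, and all parameter values are hypothetical.

```python
def dynamic_power(activity, cap_f, vdd, freq_hz):
    """First-order switching power: alpha * C * V^2 * f (watts)."""
    return activity * cap_f * vdd ** 2 * freq_hz


def leakage_power(vdd, leak_current_a):
    """Static power drawn by leakage current at a given supply (watts)."""
    return vdd * leak_current_a


if __name__ == "__main__":
    # Hypothetical block: 20% activity, 1 nF switched capacitance.
    nominal = dynamic_power(0.2, 1e-9, 1.0, 1.0e9)
    scaled = dynamic_power(0.2, 1e-9, 0.8, 0.8e9)  # 20% lower V and f
    print(scaled / nominal)
```

Under this simplified model, lowering both voltage and frequency by 20% cuts switching power roughly in half, whereas power gating targets the leakage term by removing the supply altogether.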

The problem space continues to grow. “Inductance (L) became important from a system-level perspective,” adds Brad Brim, senior staff product engineer at Cadence. “It is not just the inductance of the package. Coupled with the die capacitance, it can cause resonance, and that causes things such as droop. For FPGAs or large processors, it was not just the inductance but the amplitude of the current plus the di/dt. So whenever you had high switching activity and really high currents, that is when the L came into play.”
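The two effects Brim mentions can be approximated with ideal lumped-element formulas: the resonant frequency of an LC pair is 1/(2π√(LC)), and the voltage across an inductor is L times di/dt. The sketch below is a minimal illustration under that idealized assumption. It ignores resistance and damping, and the inductance, capacitance and current-ramp values are hypothetical.

```python
import math


def resonant_frequency_hz(inductance_h, capacitance_f):
    """Resonant frequency of an ideal LC pair: 1 / (2 * pi * sqrt(L * C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def inductive_droop_v(inductance_h, di_dt_a_per_s):
    """Voltage developed across an inductance by a current ramp: L * di/dt."""
    return inductance_h * di_dt_a_per_s


if __name__ == "__main__":
    # Hypothetical values: 100 pH package inductance, 100 nF die capacitance.
    print(resonant_frequency_hz(100e-12, 100e-9))
    # Hypothetical load step of 1 A per nanosecond through the same inductance.
    print(inductive_droop_v(100e-12, 1e9))
```

With these assumed values the resonance falls near 50 MHz, and a 1 A/ns current ramp produces about 0.1 V of droop. Switching activity with energy near the resonant frequency is what makes the droop hard to contain.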

As current density increases, additional problems come into play. “Electromigration (EM) gets more severe as you get down in node geometry,” says Zhao. “This affects the reliability of the chips.”

Self-heating is another problem created by the 3D structure of the finFET. Heat is trapped inside the transistor and it heats the wires above it. This can impact many of the factors already discussed.

“If you put that on a PCB, the current comes out of the chip into the PCB and generates heat, which causes expansion and can create warpage,” said Aveek Sarkar, vice president of product engineering and support at ANSYS. “That can cause solder balls to fall off or detach. This is where significant work needs to be done. The chip is the source of the heat, so you need EM solvers for the package and you need to understand the current flow and how that affects mechanical stress. This is all one chip-package-system workflow.”

Fig. 1: Self-heating in dielectric layers can cause temperature decay. Source: ANSYS.

A final factor that has added to the complexity is reducing voltage. “At ultra-low operating voltages, small voltage variation can have a significant impact on timing,” says Manoz Palaparthi, technical marketing manager at Synopsys. “Voltage drops differ significantly for different scenarios. Noise margins have shrunk to the level where using pessimistic voltage setup to drive full-chip timing analysis gives sub-optimal results.”

The situation today
Today, at 10nm and 7nm, things are coming to a head. “Power is now deeply entangled with everything,” says Bjerregaard. “Metal has become a scarce resource because routability is determining area, and so designers are scrambling to achieve the desired area utilization. At the same time, more metal is needed in the power grid to accommodate the increasing power density. In addition, on-chip decaps cannot solve the problems as they are getting less effective and have shorter reach.”

Power switching adds another layer. “You needed to make sure that the inrush current would not break the network, especially the blocks close to the ones that are switching,” says Zhao. “Neighboring functional blocks may experience a large IR drop and this may mean that they lose their functionality. To avoid that, people tended to overdesign the grid. But at the finFET nodes, transistor density is scaling more than the wiring and that leaves you with a routing congestion problem. If you overdesign the power grid, that will consume too much of the routing resource.”

This has had an impact on the tools and flows being adopted. “Power and IR drop analysis have become an integral part of the design implementation flow,” says Arvind Narayanan, product marketing architect for Mentor Graphics. “Tools and technologies have evolved to provide designers with the ability to accurately perform power analysis as early as the RTL stage to better understand the impact of power on the power network and also on the design.”

Tighter integration between the analysis and implementation tools has enabled designers to analyze the complex interdependency between power, voltage drop and timing. “Accurate power and IR drop analysis enable optimal power grid design without over-design,” adds Narayanan. “In turn, this enables optimizing the available routing resources and silicon area. Power optimization techniques such as finFET-aware optimization, activity-driven placement, power-aware clock tree synthesis, register clumping and others enable total power reduction.”

Some work can even start before RTL. “The current consumption of the functional blocks will have an impact on the grid,” Zhao points out. “So you can start as early as floorplanning. When you have the complete physical design done, you can determine where you need a stronger power grid and which areas do not have problems. This causes some people to want to do early analysis of the power grid. You need to look at how the current is distributed.”

For the back end, power analysis is now deeply embedded. “It is no longer just a sign-off check before tapeout,” says Bjerregaard. “This is done in order to avoid costly overdesign and to avoid power failures. Careful balancing is required, as generous margining impairs profitability.”

Emerging problems
The good news is that there do not appear to be any large new problems on the horizon. The bad news is that all of the existing problems continue to get worse.

“New nodes do have much stricter design rules,” says Zhao. “Many of these are related to EM and set maximums for each layer of metal. They also can define limits for connections between the layers using via technology, and they also look at the EM failures in a statistical way and people talk about failure in time.”
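“Failure in time” (FIT) is a standard reliability unit: one FIT is one failure per billion device-hours. The sketch below shows the unit conversion only. The function names and sample numbers are hypothetical, and it assumes a constant failure rate, which is a simplification for wear-out mechanisms such as EM.

```python
HOURS_PER_FIT_UNIT = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def fit_rate(failures, devices, hours_each):
    """Failures per billion device-hours observed across a population."""
    return failures / (devices * hours_each) * HOURS_PER_FIT_UNIT


def mttf_hours(fit):
    """Mean time to failure implied by a constant FIT rate."""
    return HOURS_PER_FIT_UNIT / fit


if __name__ == "__main__":
    # Hypothetical test: 5 failures among 1,000,000 parts run for 1,000 hours each.
    print(fit_rate(5, 1_000_000, 1_000))
    print(mttf_hours(10))
```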

But the biggest issue may be the problem itself. “The size of the design at 7nm and 5nm will make it very challenging for any power analysis tool because of capacity,” continues Zhao. “You may have multiple Vdd and Vss, and you want to analyze them together. It could exceed several billion instances. Power grids are very highly coupled, meaning that you have to analyze the full chip – flat. That is the biggest challenge.”

An orthogonal change that may add to complexity is packaging. “Fan-out packaging has a lot more area fills, so it is hybridizing what we think of as a package with continuous or contiguous planes,” says Brim. “That brings some of the electrical constraints, such as increased inductance and resistance, into those packages.”

The design of the power delivery network is certainly not an afterthought anymore. Aspects of it have moved all the way to the head of the design flow. They have become part of the partitioning and floorplanning task, putting pressure on tool vendors to provide estimates much earlier in the flow. But increasing accuracy will become a challenge as design sizes continue to grow.
