Adding power islands and reducing the voltage may save battery life, but they also can affect whether the chip works properly.
By Ed Sperling
Reducing the voltage in a system on chip is like turning down the water pressure on a home plumbing system. Pretty soon you find out that not all the faucets work properly because there isn’t enough pressure behind them.
While it’s vital to drop the voltage to boost battery life in mobile devices, not to mention reduce the overall power consumption in plug-in devices, the effects aren’t always well understood ahead of time. Power delivery changes with the voltage, and not always in anticipated ways. The problem is that chips are getting so complicated with power islands and multiple cores that it’s difficult to anticipate all the possible permutations up front.
“There are indeed challenges,” said Jan Rabaey, who heads the Wireless Research Center at the University of California at Berkeley. “Fluctuations in currents are an obvious result of turning domains on and off.”
In fact, the more abrupt the on/off states, the greater the likelihood of power delivery problems. “It’s like hitching a car to a trailer and taking off,” said Srikanth Jadcherla, group director for R&D in Synopsys’ verification group. “It doesn’t move the same way.”
And the more power islands, the worse those problems get. “This is something that’s well known in the cell phone industry,” said Bhanu Kapoor, head of Mimasic, a low-power consultancy. “They’ve got ARM cores, DSPs and memory blocks on a cell phone processor and they have a power supply for all of these different modules. But when you need to switch on a new block, the power supply has to deliver power to both. The power supply inductor tries to guard against any change, though, so it actually gives parts a lower voltage. That causes a temporary malfunction.”
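As a rough illustration of the effect Kapoor describes, the voltage lost across a supply inductance during a current step is approximately the inductance times the rate of change of current. The sketch below is a back-of-the-envelope estimate, not a model of any real chip. The inductance, current step and ramp times are hypothetical values chosen only for illustration, and `inductive_droop` is a made-up helper name.

```python
def inductive_droop(inductance_h, delta_current_a, ramp_time_s):
    """Approximate droop V = L * dI/dt, assuming a linear current ramp."""
    return inductance_h * delta_current_a / ramp_time_s


# Hypothetical values; none of these numbers come from the article.
SUPPLY_INDUCTANCE_H = 1e-9   # 1 nH of effective supply-path inductance
CURRENT_STEP_A = 0.5         # extra current drawn when a block switches on

# Abrupt wake-up: the full current step arrives within 1 ns.
fast_droop_v = inductive_droop(SUPPLY_INDUCTANCE_H, CURRENT_STEP_A, 1e-9)

# Staggered wake-up: the same step is spread over 100 ns.
slow_droop_v = inductive_droop(SUPPLY_INDUCTANCE_H, CURRENT_STEP_A, 100e-9)

print(f"abrupt wake-up droop:    {fast_droop_v * 1000:.0f} mV")
print(f"staggered wake-up droop: {slow_droop_v * 1000:.0f} mV")
```

Under these assumed numbers, spreading the same current step over a period 100 times longer cuts the estimated droop by the same factor, which is why abrupt on/off transitions are the ones most likely to cause trouble.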
Thinking about delivery in the architecture
While the effects of power islands have gotten the lion’s share of attention in low-power designs, they’re certainly not the only things that can go wrong. Failing to account for all possibilities up front can cause problems that grow as the chip moves from architecture to design and verification.
“Blocked frequencies and domains shutting off are a result of badly designed power distribution networks, which can happen even if you don’t have power islands,” said Rabaey. “By changing the resonant frequencies of the power network, you may see potential interplay with the clock frequency of the modules. But again, this is a generic problem with power distribution networks and has nothing to do with having power islands or not.”
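The interplay Rabaey mentions can be sketched with the textbook resonance formula for a simple inductor-capacitor network, f = 1 / (2π√(LC)). This is a simplification that treats the power network as a single lumped L and C. The component values, the clock frequency and the 20 percent margin are all assumptions for illustration, and the function names are hypothetical.

```python
import math


def resonant_frequency_hz(inductance_h, capacitance_f):
    """Resonant frequency of a lumped LC network: 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def near_resonance(clock_hz, resonance_hz, margin=0.2):
    """True if the clock lies within +/- margin (as a fraction) of resonance."""
    return abs(clock_hz - resonance_hz) <= margin * resonance_hz


# Hypothetical values; none of these numbers come from the article.
PACKAGE_INDUCTANCE_H = 0.1e-9    # 0.1 nH
DECOUPLING_CAP_F = 100e-9        # 100 nF
CLOCK_HZ = 50e6                  # 50 MHz module clock

f_before = resonant_frequency_hz(PACKAGE_INDUCTANCE_H, DECOUPLING_CAP_F)
# Doubling the capacitance on the network lowers its resonant frequency.
f_after = resonant_frequency_hz(PACKAGE_INDUCTANCE_H, 2 * DECOUPLING_CAP_F)

print(f"before: {f_before / 1e6:.1f} MHz, clock nearby: {near_resonance(CLOCK_HZ, f_before)}")
print(f"after:  {f_after / 1e6:.1f} MHz, clock nearby: {near_resonance(CLOCK_HZ, f_after)}")
```

With these assumed values the network initially resonates close to the clock frequency, and changing the capacitance moves the resonance away from it. The same shift in the other direction could just as easily move a safe design into trouble.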
Problems also grow as the semiconductor process shrinks. One of the problems in delivering power at smaller geometries is the width of the wires themselves. Most engineers went through school assuming that electrons move through wires at a fairly constant rate that depends on the type of wire rather than its thickness. That’s clearly not the case. Earlier in the decade, IBM began noticing that the resistance of smaller wires was increasing because electrons were colliding with the atoms in the wires. Increased density meant more collisions.
The typical route for chipmakers is to engineer a solution to these kinds of problems. But that also increases the complexity and the price, because it usually means more parts. A 10-cent decoupling capacitor for a chip that sells 50 million units adds $5 million to the overall cost. And that doesn’t include the additional cost of assembly, which typically adds another nickel per chip, or $2.5 million.
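The arithmetic behind those figures is simple enough to check in a few lines. The sketch below uses only the numbers quoted above. The helper name `added_cost_dollars` is made up for illustration, and the per-chip costs are kept in whole cents to avoid floating-point rounding.

```python
def added_cost_dollars(per_chip_cents, units):
    """Total added cost in dollars for a per-chip cost given in cents."""
    return per_chip_cents * units // 100


UNITS_SOLD = 50_000_000      # volume quoted in the article
CAPACITOR_CENTS = 10         # 10-cent decoupling capacitor
ASSEMBLY_CENTS = 5           # "another nickel" for assembly

capacitor_total = added_cost_dollars(CAPACITOR_CENTS, UNITS_SOLD)
assembly_total = added_cost_dollars(ASSEMBLY_CENTS, UNITS_SOLD)
combined_total = capacitor_total + assembly_total

print(f"capacitor: ${capacitor_total:,}")
print(f"assembly:  ${assembly_total:,}")
print(f"combined:  ${combined_total:,}")
```

Taken together, the part and its assembly come to $7.5 million across the production run.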
More parts also mean more complexity in the design. And more complexity means more things can go wrong.
“There was one chip we were developing where the clock gating domain produced a spike in current,” said one engineer, who asked not to be named. “We came up with logic to control the wake up, but when you shut down the clock it staggers it. As you’d expect, it got stuck. So we took off the clock-gating circuitry and there was a huge droop in voltage.”
In another real-world example, chip development was stopped the day before tapeout because there was insufficient decoupling capacitance, which affects timing. The chip reached tapeout two days later only because a crew of engineers worked solidly for 36 hours to fix the problem. Needless to say, they wished the chip architects had figured this out ahead of time.