Heat Wreaks Havoc

Problems caused by leakage continue to grow as density increases, but the effects are not always linear; accurately predicting power becomes essential.


By Ann Steffora Mutschler
As semiconductor manufacturing technology has scaled ever smaller, the density of power grid networks has caused on-chip temperatures to rise, negatively impacting performance, power, and reliability.

CMOS, still the predominant technology in SoCs, was originally conceived as a low-power technology when compared with the bipolar approach, which was a very high-power technology.

“For many years it has been a very low power, very power-savvy technology. Moving from one technology node to the next would basically cut the power consumption by half. This was great because you could basically integrate twice as many transistors within the same power budget,” explained Marco Casale-Rossi, production marketing manager for Synopsys’ implementation group.

That was ideal when electronic devices were powered by the plug in the wall and weren’t hampered by batteries, he said, but when we moved to mobile it was already too late. “Basically, what has happened in the last decade is that we have shrunk basically the width and length of the transistor but our ability to shrink the thickness of the transistor is much, much lower.”

With the move from 45nm to 32nm, and then from 32nm to 20nm, the number of transistors doubles each time. Without leakage, the power consumption would remain the same, but because of leakage it goes up quite significantly. At 45nm, in a typical process technology the total power consumption is dominated by leakage. There is more leakage power than active power, and the problem is that leakage is there whether the device is doing something or not, draining power from the battery.

“There are no secrets here,” said Greg Bartlett, senior vice president of technology and integration engineering at GlobalFoundries. “Power problems started at 130nm and have gotten worse since then. Historically, the problem was standby power, but it has shifted. There’s been a lot of talk about operating at a lower Vdd to help with this, but the only thing we’ve been able to do with every new technique is to forestall the problem. It comes back one generation later.”

Leakage goes up with each process shrink, and it also rises exponentially with temperature: by a couple of orders of magnitude when going from room temperature to 125 degrees Celsius.

“Heat is a killer of electronic components. One of the issues, especially as we’ve gotten into some of the smaller geometries, is that the leakage current becomes exponential with the temperature. Small increases in temperature can have a large impact in the amount of current and heat that’s being generated by the actual chip or silicon, and clearly that’s not a good situation because if you add heat to it, it generates more current, which generates more heat, which generates more current. It’s going the wrong way fast,” said Barry Pangrle, solutions architect for low-power design and verification at Mentor Graphics.
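The feedback loop Pangrle describes can be sketched numerically. The following is a minimal illustration, not a model from any of the vendors quoted: it assumes leakage grows by a factor of 100 between 25 and 125 degrees Celsius (one reading of "a couple of orders of magnitude"), and that die temperature is ambient plus a fixed thermal resistance times total power. The function names, the thermal resistance values, and the 150-degree runaway cutoff are all hypothetical.

```python
import math

# Assumption: leakage rises 100x over a 100 C span, so k = ln(100) / 100 per C.
K_PER_C = math.log(100) / 100


def leakage_power(temp_c, leak_at_25c_w, k_per_c=K_PER_C):
    """Leakage in watts, growing exponentially with temperature."""
    return leak_at_25c_w * math.exp(k_per_c * (temp_c - 25.0))


def settle_temperature(dynamic_w, leak_at_25c_w, theta_c_per_w,
                       ambient_c=25.0, max_iters=200, runaway_c=150.0):
    """Iterate the heat -> leakage -> heat loop.

    Returns (temperature_c, converged). converged is False when the
    temperature passes runaway_c or the loop fails to settle.
    """
    temp = ambient_c
    for _ in range(max_iters):
        total_w = dynamic_w + leakage_power(temp, leak_at_25c_w)
        new_temp = ambient_c + theta_c_per_w * total_w
        if new_temp > runaway_c:
            return new_temp, False  # hypothetical cutoff: treat as runaway
        if abs(new_temp - temp) < 1e-6:
            return new_temp, True
        temp = new_temp
    return temp, False


# Well-cooled part (hypothetical 10 C/W): the loop settles.
print(settle_temperature(dynamic_w=1.0, leak_at_25c_w=0.1, theta_c_per_w=10.0))
# Poorly cooled, leakier part (hypothetical 40 C/W): the loop runs away.
print(settle_temperature(dynamic_w=1.0, leak_at_25c_w=0.5, theta_c_per_w=40.0))
```

With the same dynamic power, the first case settles at a modest temperature rise while the second never finds a stable point, which is the "wrong way fast" behavior in miniature.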

“Any type of engineering problem really, if you’re going to address an issue like that, it’s important that you have tools that can actually give you an accurate analysis so that the designers know what’s happening with the design and they can take measures then to control that and change the design to keep the design within the parameters that they need to,” he continued.

Vic Kulkarni, general manager of the RTL business unit at Apache Design Solutions, agreed. He said that before power densities can be measured, the right power-sensitive stimulus must be selected, and that knowledge must be pushed all the way through the design planning process to package selection. “The stimulus is becoming more and more critical in terms of really looking at and predicting what will happen to power before it reaches the physical side and meets these power density issues. If you are not properly predicting the power in the beginning of the flow then everything becomes academic. You overdesign or underdesign by definition.”

He recalled the horror story of a system designer who thought he would be designing a chipset at 5 watts, but when the ASIC came back from the manufacturer it ran at 10 watts. The explanation from the ASIC manufacturer was that they could not predict power that well because they are at the back end of the flow, far from the RTL and microarchitecture decisions.

This illustrates the huge disconnect between true power predictions and what designers get from conventional spreadsheets, library element data, or guesses at activity factors. “The root cause of the issue is the ability to predict as much as possible, as close as possible to what your final, worst-case power states will be and then designing for such in terms of test patterns and so-called power patterns,” Kulkarni said.

Adding up the power
Add some 2.5D or 3D stacking to the power density mixture and things really heat up. Matthew Hogan, technical marketing engineer for LVS in Mentor Graphics’ Design-To-Silicon Division, observed that one of the big issues for designers dealing with power density and stacking is that the thermal profile is cumulative going up the stack.

“One of the big concerns that they are looking for is if they do have a hotspot on a die and they are stacking it with either the same type of die or different die, and if there is a way they could rotate or make sure that the hotspots are not coincident as they move in the vertical direction,” he said.

Engineers try to even out the thermal profile across normal operation, and to get a better understanding of what the dynamic thermal profile will look like. The goal is that in operational mode the bottom left-hand corner, for example, is not the hotspot for the entire stack while the top right-hand corner stays cooler. Designers want to know how to move the hotspots on each die around so they can create a more even thermal profile for the whole system rather than on a chip-by-chip basis. That turns it into a system and system-verification problem, Hogan said.
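The idea of rotating one die so that hotspots do not line up vertically can be illustrated with a toy search. This is a rough sketch under strong assumptions: each die is a square grid of per-cell power values, both dies are the same size, and the cell-by-cell sum of the two maps stands in for the stacked thermal profile. Real thermal analysis accounts for heat spreading, TSVs, and packaging; the function names and the numbers here are hypothetical.

```python
def rotate_90(grid):
    """Rotate a square grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def stacked_peak(bottom, top):
    """Peak of the cell-by-cell sum of two power maps (a crude hotspot proxy)."""
    return max(
        b + t
        for bottom_row, top_row in zip(bottom, top)
        for b, t in zip(bottom_row, top_row)
    )


def best_rotation(bottom, top):
    """Try 0/90/180/270 degree rotations of the top die.

    Returns (degrees, peak) for the first rotation with the lowest peak.
    """
    best = None
    candidate = [list(row) for row in top]
    for quarter_turns in range(4):
        peak = stacked_peak(bottom, candidate)
        if best is None or peak < best[1]:
            best = (quarter_turns * 90, peak)
        candidate = rotate_90(candidate)
    return best


# Two identical dies, each with a hotspot in the bottom left-hand corner.
die = [
    [1, 1],
    [5, 1],
]
print(best_rotation(die, die))
```

Stacking the two dies unrotated puts both hotspots in the same corner; any quarter turn of the top die separates them and lowers the peak of the combined map.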

When and where that analysis is done depends a lot on what internal flows and processes have been implemented.

“Ideally it would happen at a floorplanning stage where each of the design groups get a thermal budget or a power budget, because thermal and power are somewhat intertwined in the IC side. When they get their budget for their block and you’ve got a floorplanning region, you should have a reasonable estimate as to how much power is going to be used by this block, or at least what your budget is,” Hogan said.

Added Mentor’s Pangrle: “If you’re starting with cruder estimates at the beginning, as you get more information about what the final implementation is going to look like, you can improve those analysis and estimate numbers and continue to do a sanity check the whole way through. Any type of flow that somebody’s going to put together they’re going to want to make sure that as they’re crossing these different levels of abstraction that in fact they’ve got a framework where they can have these paths and loop back and put this information as they get it in to make sure that in fact it’s all still going to hold together.”
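The budgeting and repeated sanity checking that Hogan and Pangrle describe might look something like the following in script form. This is only a sketch: the block names, budgets, stage names, and estimates are invented for illustration, and a real flow would pull estimates from power analysis tools rather than hard-coded dictionaries.

```python
def check_budgets(budgets_w, estimates_w):
    """Return {block: overshoot_w} for blocks whose estimate exceeds its budget."""
    over = {}
    for block, budget in budgets_w.items():
        estimate = estimates_w.get(block)
        if estimate is not None and estimate > budget:
            over[block] = round(estimate - budget, 3)
    return over


# Hypothetical per-block power budgets handed out at floorplanning, in watts.
budgets_w = {"cpu": 2.0, "gpu": 1.5, "modem": 0.8}

# Hypothetical estimates, refined as the implementation firms up.
stages = [
    ("floorplan", {"cpu": 1.8, "gpu": 1.4, "modem": 0.7}),
    ("post-synthesis", {"cpu": 1.9, "gpu": 1.7, "modem": 0.7}),
]

for stage_name, estimates_w in stages:
    overshoot = check_budgets(budgets_w, estimates_w)
    status = "within budget" if not overshoot else f"over budget: {overshoot}"
    print(f"{stage_name}: {status}")
```

Running the same check at every abstraction level is the point: a block that fit its budget on early, crude numbers can be flagged as soon as a refined estimate pushes it over.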

The temperature-leakage loop
The temperature-leakage loop discussed above is the very reason why 3D IC is causing so much concern. Its structure hampers the ability of the silicon to dissipate the heat.

“My impression is that manufacturing will not help in terms of heat dissipation. The only way to reduce the power consumption and avoid the heat dissipation issues in the future will come from design techniques,” Synopsys’ Casale-Rossi said. “If you think of today’s processors they are built with voltage islands so you can turn on and off a portion of the IC when you don’t need it — this was not needed 10 years ago, but now it is a method of survival. Moving forward, design and of course design automation—because all these techniques are awfully complicated—will be important to mitigate the power related and heat thermal related aspects.”

But there is at least some continuity in the tool flows. To accurately model, analyze and predict worst-case power problems in today’s chips as well as future 3D ICs, it is now widely agreed that most EDA tools will undergo an evolutionary change, not a revolutionary one, as some had predicted.

“Evolutionary means, for example, a place and route tool will need to understand that a certain area is forbidden because there is a TSV there,” Casale-Rossi said. “But this is not a big deal. Test will evolve because the accessibility of the various tiers will go down. I anticipate that all of the JTAG and BiST related technologies and algorithms will have a great future ahead. Extraction will need to account for the capacitance and the resistance introduced by the TSV. It will be the same for DRC and LVS. There will be more rules to be verified but it’s not a revolution.”

The reality is that many people are quietly doing 2.5D and 3D IC experiments without much pain because there are workarounds and scripts. Later, when it is understood what is really necessary, the scripts will get incorporated into the code of the tools and become an integral feature of the tools. “For the time being, the amount of code that is really needed is minimal,” he concluded.