Heat is a multi-disciplinary problem, and it’s growing harder to deal with at advanced nodes.
Making sure that smartphone you’re holding doesn’t burn your face when you make a call requires a tremendous amount of engineering effort at all levels of the design – the case, the chips, the packaging. The developers of the IP subsystems in that smartphone must adhere to very strict power and energy thresholds so the OEM putting it all together can stick to some semblance of a product design plan.
Inside the smartphone, the temperature of the wires in a chip is critical information for determining the allowable currents on those wires to meet the expected mean time to failure described by Black’s equation, according to an ANSYS/Apache technical paper submitted to the IEEE Electronic Components and Technology Conference in May.
This measurement can be used to predict reliability failure from the electromigration (EM) phenomenon on metal wires, which over time generates undesired open or short circuits, said Norman Chang and Stephen Pan, the paper’s authors. “Wire/device temperature impacts power — particularly leakage power, an exponential function of temperature — along with resistance and the EM limit, and consequently EM, IR/dynamic voltage drop, signal integrity, ESD, and timing.”
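Black’s equation ties a wire’s expected lifetime to its current density and temperature. Below is a minimal sketch of that sensitivity, with illustrative constants only (the activation energy, current-density exponent, and prefactor are assumptions, not values from the ANSYS/Apache paper):

```python
import math

# Black's equation: MTTF = A * J**(-n) * exp(Ea / (k * T))
# Illustrative constants only -- real values are process- and metal-specific.
K_BOLTZMANN_EV = 8.617e-5   # Boltzmann constant in eV/K
ACTIVATION_EV  = 0.85       # assumed EM activation energy (eV)
EXPONENT_N     = 2.0        # assumed current-density exponent
PREFACTOR_A    = 1.0        # arbitrary scale; only ratios matter here

def mttf(j_ma_per_um2, temp_c):
    """Relative mean time to failure for a wire at current density J and temperature T."""
    t_kelvin = temp_c + 273.15
    return PREFACTOR_A * j_ma_per_um2 ** (-EXPONENT_N) * math.exp(ACTIVATION_EV / (K_BOLTZMANN_EV * t_kelvin))

# A 10 degree C local hotspot at the same current density cuts lifetime noticeably.
base = mttf(j_ma_per_um2=1.0, temp_c=105.0)
hot  = mttf(j_ma_per_um2=1.0, temp_c=115.0)
print(f"MTTF at 115C is {hot / base:.2f}x the MTTF at 105C")
```

With these assumed numbers, a wire running 10°C hotter at the same current density has roughly half the expected lifetime, which is why per-wire temperature feeds directly into EM current limits.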
Chang, who is vice president and senior product strategist at ANSYS, points to thermal coupling as one of the big new worries for finFET-based designs. “One of the problems we’re seeing at 16/14/10nm is thermal hotspots and thermal coupling on wires. At 16nm, in 1 cubic micrometer, there are 3,000 wires. If you go one more node, to 10nm, there are 5,000 wires. In such a small space, each wire is carrying a significant amount of current.”
Further, while overall system power is going down, largely due to more aggressive power management techniques, the power envelopes of the blocks within those systems are staying relatively stable. There are finite limits, for instance, on how far the voltage can be ratcheted down in a memory subsystem without losing data. This makes thermal management a particularly thorny problem.
“Every platform has a thermal envelope, and every module within that platform has a thermal envelope,” said Ely Tsern, vice president of the memory products group at Rambus. “So far, the envelopes for those modules are staying pretty stable.”
But that also means much more work to integrate an increasing number of functions into mobile devices such as smartphones.
“Quite simply, the superphone is trying to cram as much capability, as much performance, as much user experience into this handheld device as will physically fit,” said Jem Davies, ARM fellow and vice president of technology.
Physical fitness in this case depends on how much power a device can dissipate thermally, because almost all of the energy that goes in comes out as heat (a small amount escapes as light). This makes it a power problem rather than an energy problem.
“If battery capacities increased tenfold tomorrow it wouldn’t help me in this case one little bit because it’s a power problem,” Davies said. “The absolute magnitude of that number will be dictated by many different things, predominantly well outside of our control starting with case design. Is it an aluminum case, is it a plastic case, has it got holes drilled in it, has it not got holes drilled in it? The main applications processor chip is usually one of the biggest producers of heat in the device. What is the packaging made of? Is it cheap packaging? Is it expensive packaging? How is the heat being dissipated?”
That’s a key point to consider in subsystems, as well, because they contain a set of diverse pieces, according to Cadence Fellow Chris Rowen. As a result, engineering teams need to consider the physical composition of those pieces and whether the power is reasonably evenly distributed.
“Even if the average power is okay, if there are hotspots within it, you’re going to suffer some unexpected side effects,” Rowen said. “The reliability in those spots is not going to be good. You may not be able to stay within specified ambient temperature limits for a given packaging system, so you’re going to have to think a little bit about how to make sure the power is reasonably well distributed across the subsystem. Ultimately, most of that packaging question, most of the question of how you distribute the heat from around the area of a semiconductor device, is addressed at the whole chip composition level. We can provide some guidance there, but some of the problem is going to fall back on the chip constructor — the SoC guy.”
And while the SoC team won’t share the temperature limit number, there is a number, Davies said. “Different manufacturers, different devices, different form factors, is it a small phone, is it a big phone — there is an overall heat dissipation number. Maybe 4 watts or 5 or 3, and then they say, ‘You don’t actually get to spend all of that.’ Somebody is going to spend a whole bunch of that in the display, as displays are now a very big source of heat. And you don’t get to spend some of it directly. Some of it will be dissipated in the dynamic memory, which is therefore indirectly spent by me depending on how much bandwidth I use, because RAM that is quiescent doesn’t use much energy and therefore doesn’t create much heat. But if I use the RAM a lot, that will generate a lot of heat.”
He explained that after many more calculations of secret origin, “finally you peel down to a number, and that’s for everything we’re using. That might be CPU, interconnect, GPU, video processor, display processor, everything, and you usually get a number. That number, interestingly, is smaller than the sum of all the maximum dissipations of all of the components, so any one of us, flat out, could probably use that entire number. The name of the game then is in sharing that power out intelligently amongst the various parts. The handset manufacturers who make the most interesting and most successful devices are often those who have best solved this allocation problem. It’s a many-dimensional problem. It involves the individual IP blocks, the way the IP blocks talk to each other, the software, the operating system, and the scheduler: who gets to run fast and dissipate more heat, and who gets to run slowly and generate less heat.”
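That allocation problem can be pictured with a toy budget calculation. The block names and wattages below are hypothetical, not ARM or OEM figures, but they show why the budget ends up smaller than the sum of the per-block maximums:

```python
# Hypothetical platform thermal budget allocation (all figures in watts, illustrative only).
PLATFORM_ENVELOPE_W = 4.0          # assumed whole-device dissipation limit
DISPLAY_W           = 1.5          # spent outside the SoC
DRAM_W              = 0.5          # indirect: scales with the bandwidth actually used

soc_budget = PLATFORM_ENVELOPE_W - DISPLAY_W - DRAM_W   # what the SoC blocks share

# Peak dissipation each block could reach if run flat out (hypothetical numbers).
block_peak_w = {"CPU": 1.5, "GPU": 1.5, "video": 0.5, "display processor": 0.3, "interconnect": 0.2}

print(f"SoC budget: {soc_budget:.1f} W, sum of block peaks: {sum(block_peak_w.values()):.1f} W")
# SoC budget: 2.0 W, sum of block peaks: 4.0 W
# -> no static allocation lets every block run flat out at once; the scheduler and
#    power manager must share the 2.0 W dynamically among whichever blocks are busy.
```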
Measurement is critical
The key to thermal management in IP subsystem development is this: that which gets measured, gets done.
“Measurement of what you are doing must become a first-class citizen in the design flow,” Davies said. “What you find is you can send everybody on fantastic courses on low power digital design. They design things the way they think they ought to work, and lo and behold, it still uses too much power because there will be little bits that have been missed along the way. It’s very easy to leave a bit of the design ticking away in the background. Anything that’s actually working is using power. You will often overhear digital electronics design guys talking about, ‘Oh, I’ve got the latest waves back. I’ve had a look at the waves.’ All they are doing is feeding the RTL design through an RTL simulator, which runs on huge datacenters.”
ARM itself has tens and tens of thousands of CPUs in datacenters, consuming megawatts of power to run simulations overnight. The story is similar at the other major IP players such as Cadence and Synopsys.
“Yes, you can start with a great, efficient design, but unless you’re rigorously measuring everything you do and feeding back, it’s very easy to step into a power problem,” Davies said. “Rigorous testing and cycling round feedback of individual blocks—for that you’ve got to have measurement and testbenches front and center in your design methodology and validation methodology. Then, you have to build that up. You’ve now got your kit of parts, and as you put your LEGO blocks together you have to instantiate a measurement methodology on top of that to see how the blocks interact and make sure none of them are active when they don’t need to be. You finally get to the big block, such as the GPU itself, and then of course you’re going to connect that to an interconnect and a CPU, a DRAM controller, some real DRAM memory. Again, you have to go through the whole procedure again.”
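One concrete form that measurement discipline can take is a scenario-by-scenario check that blocks which should be idle really are quiet. Below is a toy sketch, with hypothetical block names and toggle counts standing in for data that would come from an RTL simulator’s activity report:

```python
# Hypothetical per-block toggle counts captured while the design should be idle.
# In a real flow these would be pulled from the RTL simulator's activity/power report.
idle_scenario_toggles = {
    "cpu_cluster": 120,
    "gpu": 0,
    "video_decoder": 48_500,   # suspicious: should be clock-gated in this scenario
    "dram_ctrl": 300,
}

IDLE_TOGGLE_LIMIT = 1_000  # assumed threshold for "effectively quiet"

for block, toggles in idle_scenario_toggles.items():
    if toggles > IDLE_TOGGLE_LIMIT:
        print(f"WARNING: {block} toggled {toggles} times in the idle scenario -- "
              f"check its clock gating / power-down request.")
```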
Start at the source
Another thing to keep in mind about thermal management is that it’s just power.
“You really have to think about what the power dissipation is of this subsystem under realistic application scenarios, so Design For Power is something that is absolutely pervasive in the way we approach the design of IP,” said Cadence’s Rowen. “It’s essential to figure out what the application scenarios of interest are, and what the power dissipation is that you’re going to see because thermal management at the IP level — since we don’t have any direct control over the package per se — is about how to not generate heat, not about how to dissipate the heat that does get generated. You have to start at the source.”
There are a number of approaches to take from here, starting with figuring out the optimal architecture for the application scenarios at hand.
“When you look at processors, for example, there are things that you can do in the microarchitecture, in the logic design and in the instruction set architecture for a class of applications that can have really quite a dramatic impact on the power dissipation,” said Rowen. “In fact, the biggest levers you have on power dissipation are really about what you can do architecturally.”
Specifically, the architecture can be tuned to the right kind of computation. If there is a choice between doing a task in floating point versus fixed point or integers, integer computation is lower power.
“If you can map it such that you can do integer computation instead of floating point, you may save a factor of two or three in energy,” said Rowen. “Then, if you can find the right kind of parallelism that’s expressed in your application — it may be instruction-level parallelism that allows you to do several independent operations, it may be data parallelism that allows you to do all of the computation on different data with the same instruction — those things will tend to increase the performance more than they increase the power because they reduce the energy.”
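That floating-point-to-integer mapping can be illustrated with a simple fixed-point conversion. This is not Cadence code, just a sketch of the numerics: the coefficients are quantized to Q15 integers so the inner loop becomes integer multiply-accumulate, which is the cheaper datapath on real hardware (Python here only demonstrates that the answers match):

```python
# Map a floating-point dot product onto Q15 fixed-point integer arithmetic.
Q15 = 1 << 15  # scale factor: 15 fractional bits

def to_q15(x):
    """Quantize a value in [-1, 1) to a 16-bit signed Q15 integer."""
    return max(-Q15, min(Q15 - 1, int(round(x * Q15))))

coeffs  = [0.25, -0.5, 0.125, 0.75]
samples = [0.1, 0.2, -0.3, 0.4]

# Floating-point reference.
ref = sum(c * s for c, s in zip(coeffs, samples))

# Integer version: each product is Q30, so the accumulator is divided by 2**30
# to compare against the float result (hardware would shift and keep Q15).
acc = sum(to_q15(c) * to_q15(s) for c, s in zip(coeffs, samples))
fixed = acc / (Q15 * Q15)

print(f"float: {ref:.6f}  fixed-point: {fixed:.6f}")
```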
One of the big advantages of parallel architectures is they can significantly lower clock frequencies required to do computation.
“Low clock frequency has lots of benefits — lower power, typically less electromagnetic interference, less possibility of hotspots,” Rowen explained. “If you can find a way to get the throughput you need at lower megahertz, that’s a good thing. In fact, some of it you see even with the emergence of multicore, which is a relatively crude form of parallelism. But generally, if you can distribute a task across multiple lower-performance processors, you will use less energy at the same level of overall performance than if you had one super-big processor that was optimized to the hilt for high megahertz.”
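The arithmetic behind this is the first-order CMOS dynamic power relation. As a sketch, assume (as an illustrative figure, not one from Rowen) that halving the clock lets each core run at about 0.8 of the original supply voltage:

```latex
P_{\mathrm{dyn}} \approx \alpha\, C\, V_{dd}^{2}\, f
\qquad\Rightarrow\qquad
P_{\mathrm{2\;cores}} \approx 2\,\alpha C\,(0.8\,V_{dd})^{2}\,\frac{f}{2} \;=\; 0.64\,\alpha C\, V_{dd}^{2}\, f
```

Two cores at half the frequency deliver the same nominal throughput at roughly two-thirds of the switching power, which is the energy argument for spreading work across slower processors.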
There are lots of little steps that can be taken that add up to big savings, as well, including clock gating and data gating in the architecture.
“There’s not a magic bullet, but there are several different techniques and architectural enhancements that people make across the board, whether it’s the subsystem or just the processor or peripheral development,” said Rich Collins, product marketing manager for IP subsystems at Synopsys.
And while not specific to subsystem development, the stalwart power reduction techniques apply.
“Those are important, too, whether it’s clock gating at the module level or just at the flip-flop level to shut down the clocks, because that’s probably what’s toggling the most, so that’s a big architectural tradeoff,” said Collins. “There is power islanding, so you can have separate voltage domains and shut off large chunks of the SoC, for instance, if they aren’t needed. Sub-threshold libraries are a newer development in the industry. If you look at the equation for switching power, it’s exponential on the voltage, so dialing down the voltage is the biggest knob to get the biggest bang for your buck in reducing power.”
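For context on why sub-threshold libraries and voltage scaling are such large levers, the standard first-order form of sub-threshold leakage is exponential in the threshold voltage and in temperature (the same temperature sensitivity the ANSYS/Apache authors point to above):

```latex
P_{\mathrm{leak}} \;\approx\; V_{dd}\, I_{0}\, e^{-V_{th}/(n\,kT/q)}
```

As the threshold voltage is scaled down, or as a local hotspot pushes T up, leakage rises steeply, which ties the voltage knob directly back to the thermal problem.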
At the end of the day, thermal management in IP subsystem development is a multi-disciplinary problem that includes software, drivers, and operating system kernels, as well as hardware. “It’s all of these things together. It’s hard, and when we get it right, we’re proud of it,” said Davies.
At the finFET level you really want to move to asynchronous design techniques for your logic so that you can tune its speed to match the thermal limits, rather than just have stuff fail because it got too hot. Unfortunately, some EDA vendors are still wedded to an archaic RTL-only flow with simulators that can’t verify asynchronous circuitry (outside SPICE).
If you die-stack, it gets worse: mechanical stress adds more variability to timing, you are more likely to get hot spots, and there are more CDC issues.
With regard to starting at the source: Richard Feynman, though obviously better known for his work in fundamental physics, wrote a book about computer science, Lectures on Computation, in which he tackled the fundamental energy needs of computation. He describes the ability to greatly reduce or eliminate the energy needed to perform computation by ensuring that every computation is reversible, which basically involves outputting ‘noise’ in addition to the answer (the noise is actually the information necessary to reverse the computation). Currently, that noise turns into heat through resistance. Perhaps new components could be developed which do not ‘resist’ this noise, but pass it along?