Advances in devices, materials, and packaging technologies all contribute to power problems. But do you need to be concerned about each transistor and wire?
With every new node there are additional physical effects that must be considered, but not all of them are of the same level of criticality. One that is being mentioned more frequently is self-heating.
All devices consume power, and that power is ultimately dissipated as heat. “In essence, all active devices generate heat as carriers move, creating channels for current to pass through the gates,” says John Ferguson, marketing director for DRC applications in the Calibre Design Solutions Group at Siemens EDA. “In the historical CMOS era, the impacts were largely manageable as the heat captured at the gate has a relatively simple path to dissipate through the silicon substrate. Unfortunately, the CMOS approach eventually hit physical limitations under which it couldn’t continue to shrink in dimensions while still operating reliably.”
Dennard scaling also had an impact. “Up until the ’90s, the problem wasn’t an issue,” says Victor Moroz, fellow in the TCAD product group of Synopsys. “That is because people were able to reduce the power supply voltage, and that was really helping with power consumption. But once people found the limits of technology, you cannot reduce your power supply voltage much below about 0.7V.”
Self-heating is activity-related. “Self-heating effects are highest in the most active parts of the chip, creating an additional bottleneck for designers,” says Jay Madiraju, product management director at Cadence. “When the circuit operates, the temperature of the devices and interconnect will increase based on the amount of activity. When a device consumes power during its operation, it generates heat. Similarly, for metal interconnects, the temperature rise will result from current flowing in the interconnect generating heat by the resistive losses, i.e., Joule heating. Both of these factors contribute to the heat being generated.”
Shrinking makes things worse. “As the geometries of devices and wires decrease, current densities increase,” says Rob Kuhn, physical design lead for Lightelligence. “That exacerbates self-heating and its associated effects on performance and reliability. Also in advanced nodes, finFET and nanowire devices further increase self-heating as the thermal conductivity decreases and heat is trapped near the device for extended periods.”
Materials have an impact. “The changes in materials used to fabricate the devices result in further increases of thermal effects,” says Cadence’s Madiraju. “For example, low-K dielectric has higher thermal resistance than previous inter-metal dielectrics. Self-heating started to be an issue for legacy node designs below 65nm and has only become more of a challenge as feature size has scaled down, and in particular for advanced-node processes, i.e., finFET designs.”
The shapes also create problems. “The self-heating effect on the new three-dimensional structure of the finFET is more significant than on planar devices,” says Tianhao Zhang, director of R&D at Ansys. “In addition, the poor thermal conductivity of materials used in finFET structures, and the difficulty of heat dissipation to the substrate from isolated fins surrounded by dielectric materials, result in higher temperatures of finFET devices. That, in turn, can cause higher thermal coupling effects to interconnect wires.”
While scaling helps with some issues, others get worse. “From finFET introduction, the power density became large enough for people to notice and start worrying about it,” says Synopsys’ Moroz. “Transistor density increases something like 10% to 15% per year annualized. The power consumption for one transistor reduces slowly, because people improve technology little by little. When you reduce geometries, capacitance is reduced. Power consumption is mostly about capacitance because all the circuits are switching. Whenever you switch you have to charge the capacitance, and that capacitance comes from the next transistor that you’re trying to switch. Every time you charge it, or discharge it, energy passes along the other components, the wires. So as you reduce transistor size, transistor capacitance reduces, and that helps you to reduce power. And because they are smaller, the wires between them become shorter, and that helps as well. But it doesn’t keep up with the density improvements, and overall you see your power density keeps increasing.”
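The scaling argument above can be illustrated with the standard switching-power relation P = a·C·V²·f. The sketch below is illustrative only: the capacitance, supply voltage, clock frequency, and the yearly rate at which per-transistor power falls are assumed placeholder values, not figures supplied by Moroz or any foundry. Only the 10% to 15% density growth range comes from the quote.

```python
def switching_power(activity, capacitance_f, vdd_v, freq_hz):
    """Average dynamic power of one switching node: P = a * C * V^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


def power_density_trend(years, density_growth=0.12, per_transistor_power_drop=0.05):
    """Relative power density per year, normalized to 1.0 in year 0.

    Both rates are assumptions chosen for illustration: density grows
    12% per year (inside the 10% to 15% range quoted above), while
    per-transistor power falls by a slower, hypothetical 5% per year.
    """
    trend = []
    for year in range(years + 1):
        density = (1.0 + density_growth) ** year
        power_each = (1.0 - per_transistor_power_drop) ** year
        trend.append(density * power_each)
    return trend


if __name__ == "__main__":
    # Hypothetical node: 0.1 fF load, 0.7 V supply, 3 GHz clock, 1% activity.
    p = switching_power(0.01, 0.1e-15, 0.7, 3e9)
    print(f"power per node: {p * 1e9:.3f} nW")
    for year, rel in enumerate(power_density_trend(5)):
        print(f"year {year}: relative power density {rel:.3f}")
```

As long as the density growth rate outpaces the per-transistor power reduction, the product rises every year, which is the trend the quote describes.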
While devices generate heat, heat is dissipated through the package, board, and heatsinks. “New materials and very thin layers with minimal direct interaction to the silicon heat sink increase thermal resistance,” says Siemens’ Ferguson. “This makes it more difficult to dissipate the heat. As a result, devices are forced to work under higher and potentially increasing temperature loads. These high temperatures can impact the device threshold voltages and performance, ultimately resulting in reliability challenges. The trend toward 3D-IC design may further exacerbate the issue, increasing the total heat dissipation path even further.”
Heat generation and dissipation have to be balanced. “Consider figure 1 (below),” says Moroz. “At some point, you start operating your device and the temperature goes up until it saturates and becomes stable at this equilibrium. This is because you have to manage your power budget for your chip to not overheat the whole thing. If you look at individual switches, and start with a planar transistor (on the left), it would turn on and then off. Locally, transistor temperature would get higher and then lower, higher and lower. But because there are many of them, the background will get to these limits and stay there. When people switch to finFET (middle), nothing changes overall because it is still dictated by your power budgets and your circuit activity and your package’s ability to dissipate heat. But the finFET has a narrow fin, which is not as good at conducting heat and letting it escape compared to planar, so locally the temperature would get higher. Now people are switching to gate-all-around (right), and it is even more difficult for heat to escape from these because they are small and surrounded by things that are not conductive. So locally, there is a bigger problem. But for chip scale, nothing changes.”
Fig. 1: Chip temperature and self-heating. Source: Cadence
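The chip-scale behavior Moroz describes, in which temperature climbs and then settles at an equilibrium, can be sketched with a single lumped thermal resistance and capacitance. This is a minimal model under stated assumptions, not the method behind the figure: the power, thermal resistance, and thermal capacitance values are hypothetical, and the model ignores the local per-transistor spikes entirely.

```python
def simulate_temperature(power_w, r_th_k_per_w, c_th_j_per_k,
                         t_ambient_c=25.0, dt_s=0.01, steps=2000):
    """Forward-Euler integration of C * dT/dt = P - (T - T_ambient) / R.

    Returns the temperature trace in degrees Celsius. The steady state
    is T_ambient + P * R, reached after a few time constants (R * C).
    """
    temps = [t_ambient_c]
    temp = t_ambient_c
    for _ in range(steps):
        heat_out_w = (temp - t_ambient_c) / r_th_k_per_w
        temp += dt_s * (power_w - heat_out_w) / c_th_j_per_k
        temps.append(temp)
    return temps


if __name__ == "__main__":
    # Hypothetical package: 100 W, 0.5 K/W to ambient, 2 J/K heat capacity.
    trace = simulate_temperature(100.0, 0.5, 2.0)
    print(f"start: {trace[0]:.1f} C, end: {trace[-1]:.1f} C")
```

In this model the equilibrium depends only on the power budget and the package's thermal resistance, which is consistent with the observation that moving from planar to finFET to gate-all-around changes the local picture but not the chip-scale one.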
Heat is the killer in chips. Even if temperatures do not get high enough to destroy a device, that device can be impacted long term. “For devices, self-heating impacts the mobility and the threshold voltage, which in turn will limit the device performance and increase power dissipation,” says Lightelligence’s Kuhn. “Long-term device reliability degrades through hot carrier injection, time-dependent dielectric breakdown (TDDB), and negative bias temperature instability (NBTI). The inherent resistance of wires increases with increasing temperature, and that affects chip performance by slowing data transport. Over time, self-heating also will accelerate electromigration effects that can result in chip failure. Both trends worsen with advancing technology.”
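The effect of temperature on wire resistance is commonly approximated with a linear temperature coefficient, R(T) = R_ref · (1 + TCR · (T − T_ref)). The sketch below assumes a coefficient of 0.0039 per kelvin, a textbook value for bulk copper. Thin on-chip wires can differ, so treat the number as a placeholder.

```python
def wire_resistance(r_ref_ohm, temp_c, temp_ref_c=25.0, tcr_per_k=0.0039):
    """Linear temperature model: R(T) = R_ref * (1 + TCR * (T - T_ref)).

    The default TCR is an assumed bulk-copper value, not process data.
    """
    return r_ref_ohm * (1.0 + tcr_per_k * (temp_c - temp_ref_c))


if __name__ == "__main__":
    # Hypothetical 100-ohm net characterized at 25 C, evaluated at 125 C.
    hot = wire_resistance(100.0, 125.0)
    print(f"resistance at 125 C: {hot:.1f} ohm")
```

Because RC delay is proportional to resistance, the same percentage increase applies to the wire's contribution to signal delay in this simple model.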
Action plan
As with all issues, the earlier a problem is understood, the easier and less costly it is to deal with. Problems with prematurely aging devices not only require corrective action to the design, but also much more costly field replacement of defective parts.
Ferguson outlines three overarching principles:
Most people will start at the highest level. “Due to the highly detrimental effects on chip reliability and performance, it is essential to model the heat flow of any chip or system-in-package (SiP),” says Kuhn. “Tools provided by companies such as Ansys and Cadence have become increasingly relevant in this area, as they allow the designers to identify reliability and performance issues and mitigate them through techniques such as increasing wire dimensions (lowering resistance) and improving thermal conductivity through the substrate.”
All analysis starts with models. “The need to account for self-heating has been recognized by the device modeling community [CMC], and recent device models BSIM-CMG and BSIM-IMG include self-heating effects,” says Art Schaldenbrand, senior product manager at Cadence. “These models, along with simulator enhancements, enable the calculation of the power dissipated in the devices and the interconnect. Foundries typically provide models with self-heating enabled for advanced node processes. Designers can account for the effect of self-heating on their designs when using SPICE simulations, providing insight into the changes in circuit performance due to its operation. While designers can include the effect of self-heating in simulations, there is a simulation performance cost, so they need to be strategic in how they analyze thermal effects. Designers need to be aware that the self-heating simulations do not consider mutual heating of neighboring devices and, depending on the device density, the simulation results may be optimistic.”
Fig. 2: Temperature profile of IC package interconnect structures. Source: Cadence
Other simplifications can lead to pessimistic results. “The uniform worst-case environment temperature across a chip is often too pessimistic,” says Ansys’ Zhang. “To have accurate, high-resolution results, a tile-based or even metal layer-based environment temperature, along with the self-heat ∆T, is necessary to analyze circuit reliability.”
Is modeling transistor self-heating enough? “The amount of heat you generate is proportional to your resistance,” says Moroz. “For the wire, it would have tens of ohms per micron. If you look at a signal net, it is the wire that connects your switch to the next one. That would usually be a couple of microns long and the resistance of that net would be several hundred ohms. If you look at a transistor, it has two states. There is a transient between the two states, but the two states are on and off. In the off state it has mega-ohms resistance. In the on state, usually it has something like several kilo-ohm resistance. It dominates the wire resistance. So, if a wire is 100 ohms and your switch is 10 kilo-ohms, then the wire would generate heat but it’s 100 times less.”
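The 100-to-1 ratio in the quote follows from Joule's law. When the same current flows through the switch and the wire in series, each dissipates I²·R, so the heat splits in proportion to resistance. The sketch below uses the 100-ohm and 10-kilo-ohm values from the quote. The 0.7V supply and the steady DC current are simplifying assumptions.

```python
def series_heat_split(wire_ohm, switch_ohm, vdd_v):
    """Power dissipated in a wire and a switch carrying the same current.

    Treats the on-state transistor as a fixed resistor in series with
    the wire, which is a deliberate simplification of real switching.
    """
    current_a = vdd_v / (wire_ohm + switch_ohm)
    p_wire_w = current_a ** 2 * wire_ohm
    p_switch_w = current_a ** 2 * switch_ohm
    return p_wire_w, p_switch_w


if __name__ == "__main__":
    p_wire, p_switch = series_heat_split(100.0, 10_000.0, 0.7)
    print(f"wire: {p_wire * 1e6:.3f} uW, switch: {p_switch * 1e6:.3f} uW")
    print(f"wire heat / switch heat = {p_wire / p_switch:.4f}")
```

The ratio is independent of the supply voltage, since the current term cancels.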
While self-heating of signal wires may not be an issue by itself, thermal coupling adds to those temperatures. “The higher temperatures of wires become a challenge for reliability since smaller allowable currents are defined for those wires to meet the expected mean-time-to-failure (MTTF),” says Ansys’ Zhang. “This is a failure from electromigration, which over time generates undesired open or short circuits.”
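The link between wire temperature, allowable current, and MTTF is often estimated with Black's equation, MTTF = A · J⁻ⁿ · exp(Ea / kT). The sketch below is not Zhang's or Ansys' method. It assumes an activation energy of 0.9 eV and a current exponent of 2, which are commonly cited ballpark values rather than foundry data, and it reports ratios only so that the unknown constant A drops out.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5


def mttf_ratio(temp_hot_c, temp_ref_c, activation_energy_ev=0.9):
    """MTTF at temp_hot divided by MTTF at temp_ref, at equal current density."""
    t_hot_k = temp_hot_c + 273.15
    t_ref_k = temp_ref_c + 273.15
    exponent = activation_energy_ev / BOLTZMANN_EV_PER_K * (1.0 / t_hot_k - 1.0 / t_ref_k)
    return math.exp(exponent)


def allowed_current_ratio(temp_hot_c, temp_ref_c,
                          activation_energy_ev=0.9, current_exponent=2.0):
    """Current density at temp_hot that keeps MTTF equal to the reference case.

    From J^-n * exp(Ea / kT) = constant, (J_hot / J_ref)^n equals mttf_ratio.
    """
    ratio = mttf_ratio(temp_hot_c, temp_ref_c, activation_energy_ev)
    return ratio ** (1.0 / current_exponent)


if __name__ == "__main__":
    # Hypothetical case: a wire running 10 C hotter than a 105 C reference.
    print(f"MTTF ratio: {mttf_ratio(115.0, 105.0):.3f}")
    print(f"allowed current ratio: {allowed_current_ratio(115.0, 105.0):.3f}")
```

Both ratios fall below 1 when the wire runs hotter than the reference, which matches the point in the quote that hotter wires are assigned smaller allowable currents.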
The wires within the power network have different concerns. “A power net is much more complex than a signal net,” says Marc Swinnen, director of product marketing at Ansys. “A signal wire is point-to-point or multi-point, but the power net is a grid. You can’t solve it using the same solvers. You have to use a SPICE-like circuit simulator. The network is huge. On a chip with 50 billion transistors, you have 50 billion power and ground points that you have to connect. That is more complex than the power grid for the entire U.S. Each little piece of wire has to be modeled as a resistor, so you have hundreds of billions of resistors, and you have to reduce that down so that you can simulate it. Only then can you tell exactly where the current is going, and the voltage at every point. EM analysis comes along for free – it is a reliability issue, but you need to know the current flowing through all of the wires. This is also temperature-dependent, so you need to know the global temperature, and that depends on the heatsink and the environment. But temperature varies across the chip. In the past it was considered a single temp across the whole chip, but now we need to do thermal modeling and include Joule self-heating.”
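The idea of modeling each piece of power wire as a resistor can be shown on a toy scale. The sketch below handles a single rail fed from one end, where segment currents follow directly from the tap currents. A real power grid is a two-dimensional mesh with billions of elements that needs a matrix solver and model reduction, as Swinnen describes. The supply voltage, segment resistance, and tap currents here are hypothetical.

```python
def rail_ir_drop(vdd_v, segment_ohm, tap_currents_a):
    """Node voltages and per-segment Joule heating along a rail fed from one end.

    Segment i connects node i-1 (or the supply pad for i = 0) to node i
    and carries the current drawn by every tap at or beyond node i.
    """
    voltages_v = []
    heat_w = []
    downstream_a = sum(tap_currents_a)
    voltage = vdd_v
    for tap_a in tap_currents_a:
        voltage -= downstream_a * segment_ohm           # IR drop across this segment
        heat_w.append(downstream_a ** 2 * segment_ohm)  # Joule heating, I^2 * R
        voltages_v.append(voltage)
        downstream_a -= tap_a                           # this tap's current leaves the rail
    return voltages_v, heat_w


if __name__ == "__main__":
    # Hypothetical rail: 0.7 V supply, 0.5-ohm segments, four 1 mA taps.
    volts, heat = rail_ir_drop(0.7, 0.5, [1e-3] * 4)
    for i, (v, h) in enumerate(zip(volts, heat)):
        print(f"node {i}: {v * 1e3:.2f} mV, segment heat {h * 1e6:.2f} uW")
```

Even in this small case the segment nearest the supply carries the most current and therefore heats the most, which is why current and temperature both have to be known per wire for EM analysis.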
The most direct way to reduce the impacts of heating is to reduce activity. This is often referred to as dark silicon. “High-end server packages can dissipate about 50 watts per square centimeter,” says Moroz. “The key is that you use a fraction of your switches on the chip to not exceed that. Otherwise, it overheats. And if you look at the technology today, you achieve that power consumption with about 1% activity factor. That may sound bad, but it is getting worse and creeps down little by little, by 5% or 10% every year.”
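The trend Moroz describes can be projected with simple compounding. The sketch below starts from the 1% activity factor in the quote and reads the 5% to 10% yearly change as a relative decline. That reading is an assumption, as is the ten-year horizon.

```python
def project_activity_factor(start_activity, yearly_decline, years):
    """Activity factor per year, assuming a constant relative yearly decline."""
    return [start_activity * (1.0 - yearly_decline) ** year
            for year in range(years + 1)]


if __name__ == "__main__":
    for decline in (0.05, 0.10):
        final = project_activity_factor(0.01, decline, 10)[-1]
        print(f"{decline:.0%} yearly decline: {final:.4%} activity after 10 years")
```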
It all starts with early system analysis. “Knowing your power budget starts with complex chip-package-board co-design,” says Zhang. “Late-stage thermal issues result in large ECO cycles, hard-to-fix issues, or even design failure. To overcome this, thermal effects should be considered during the early design stage, which includes thermal-aware functional block placement and thermal hotspot assessment. This consideration not only helps produce an optimal design, but also lowers the self-heating impact and improves overall design reliability.”
Getting heat out is becoming more difficult, especially with multi-die systems. “You have some bumps that connect a die to the circuit board,” says Moroz. “And then there’s some radiator that helps you to dissipate heat. There is silicon in the middle, and that’s a fairly good heat conductor, so the heat would be uniform everywhere, and you wait for the package to dissipate heat. Now, if you take that chip and start stacking things on top of it, what is the heat conductance of this material? If it’s a dielectric, that is an issue, because it is an additional barrier for heat escape. So you have to make sure that the dielectric is not that bad. It will definitely be worse than silicon. But the question is, ‘Is it still okay or not?’ And then, if you don’t cover the entire surface with chiplets and you have some gaps, this is also going to be filled with some organic glue, which is not a great heat conductor.”
Some new technologies are being looked at. “While I’ve done no studies on this, perhaps the trend toward backside power distribution may lend a hand,” says Ferguson. “By bringing power to the backside through TSVs, and in close proximity to the devices, we may decrease additive heat on the devices from the surrounding wiring resistance, and that may enable a bit of improvement in heat dissipation as device heat passes laterally and ultimately out through the TSVs.”
Conclusion
Self-heating may not be the biggest issue facing circuit designers today, but the problems have been growing enough to cause concern, and they will only get worse in the future. New devices, new materials, and new packaging technologies all are causing an acceleration in the problem, and if the maximum activity factors continue to decline, the amount of work that it is possible to do in any package may plateau.
FinFET self-heating can come as a nasty shock — especially in the latest processes with high-mobility SiGe PMOS, leading to a “WTF?” moment when you find the local self-heating is double compared to Si NMOS (or previous nodes)… 🙁