Cooling Chips Still A Top Challenge

Heat pipes, lids, thermal interfaces, and micro-channel cooling help remove the heat generated by chips.


Increasing levels of semiconductor integration mean more work needs to be done in smaller spaces, which in turn generates more heat that needs to be dissipated.

Managing heat dissipation in advanced node dies and in multi-die assemblies is critical to their functionality and their longevity. And while much of the focus has been on improving power efficiency, which reduces the rate of power growth, that alone is insufficient.

Also needed are a variety of techniques to help move the heat up, down, and out. And the good news is that progress is being made in multiple areas.

More work means more heat
The energy needed for circuits to perform work comes from the power pins, but not all of that energy results in work. Some of it is wasted as heat, which must be removed from its source and expelled into the environment. For a design to be successful, the rate of dissipation must be balanced against the rate of energy usage. But in addition to the power, the area within a die where the heat originates also needs to be factored in. The smaller the area, the greater the power density, and the greater the need for improved cooling strategies.

“The key is trying to remove watts from a small number of square centimeters,” said Dave Fromm, COO of Promex. “The power per unit of area is massive.”

It’s also becoming more problematic. “The power density is climbing,” said Mike Kelly, vice president for chiplets/FCBGA integration at Amkor. “That’s exacerbated by things like copper hybrid bonding, where the power of the 3D stack is still in the same x, y footprint.”

A silicon die is limited in its maximum size by the reticle (26 x 33 mm) used to pattern it, but packages have no such upper limit. Package sizes cannot be arbitrary, partly because the industry hasn’t required such large packages in high volumes and production lines aren’t yet equipped for them. Nevertheless, the effect of a larger package is to spread the heat out further, reducing the power density.

“It isn’t like we’re continuing to put all this content into a fixed size,” observed Kelly. “The size is growing, and that makes the power density maybe stay even or climb more gradually. That’s different from silicon dies, where you’ve got a reticle limit.”

That larger package, however, may be more prone to warping. “Currently, the 60 x 60 mm² body size [is common],” said YoungDo Kweon, senior director, chiplets/FCBGA development at Amkor. “Amkor also has an 85 x 85 mm² body size in production. In a few years, we will get over 100 x 100 mm². That means the thermal stress is potentially increased.”
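As a rough illustration of why a larger footprint eases power density, the following sketch divides one fixed power figure by the reticle-limited die area and by the package body sizes quoted above. The 1,000 W figure and the function names are assumptions chosen for illustration, and real heat is not spread uniformly over a package body.

```python
# Illustrative only: average power density = power / area.
# Dimensions come from the article; the power value is an assumed example.

def area_cm2(width_mm: float, height_mm: float) -> float:
    """Return the area of a rectangle in square centimeters."""
    return width_mm * height_mm / 100.0  # 100 mm^2 per cm^2


def power_density_w_per_cm2(power_w: float, width_mm: float, height_mm: float) -> float:
    """Return the average power density in W/cm^2 over a rectangular footprint."""
    return power_w / area_cm2(width_mm, height_mm)


FOOTPRINTS_MM = {
    "reticle-limited die (26 x 33)": (26.0, 33.0),
    "package body (60 x 60)": (60.0, 60.0),
    "package body (85 x 85)": (85.0, 85.0),
    "package body (100 x 100)": (100.0, 100.0),
}

ASSUMED_POWER_W = 1000.0  # hypothetical total power, not a figure for any real part

for name, (width, height) in FOOTPRINTS_MM.items():
    density = power_density_w_per_cm2(ASSUMED_POWER_W, width, height)
    print(f"{name}: {density:.1f} W/cm^2")
```

The same wattage over a larger footprint gives a lower average density, which is the effect Kelly describes for growing packages.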

The thermal conductivity of a material is measured in W/mK (watts per meter-kelvin). Conductivity itself is a material property, but the thermal resistance of a layer falls as the heat path through it gets shorter, so the thinner anything in the path is, the better.
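The effect of path length can be sketched with the standard one-dimensional conduction formula, R = t / (k·A), where t is layer thickness, k is conductivity, and A is cross-sectional area. The conductivity, area, and thicknesses below are assumed values for illustration, not data for any particular material.

```python
def slab_thermal_resistance(thickness_m: float,
                            conductivity_w_per_mk: float,
                            area_m2: float) -> float:
    """Return the 1D conduction resistance of a uniform layer in K/W: R = t / (k * A)."""
    if thickness_m < 0 or conductivity_w_per_mk <= 0 or area_m2 <= 0:
        raise ValueError("thickness must be >= 0; conductivity and area must be > 0")
    return thickness_m / (conductivity_w_per_mk * area_m2)


ASSUMED_CONDUCTIVITY_W_PER_MK = 5.0   # hypothetical layer material
ASSUMED_AREA_M2 = 26e-3 * 33e-3       # a reticle-sized footprint, for illustration

for thickness_um in (100, 50, 25):    # assumed layer thicknesses
    resistance = slab_thermal_resistance(
        thickness_um * 1e-6, ASSUMED_CONDUCTIVITY_W_PER_MK, ASSUMED_AREA_M2
    )
    print(f"{thickness_um} um layer: {resistance:.4f} K/W")
```

Halving the thickness halves the resistance, while the conductivity of the material stays the same.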

How heat moves in a package
Heat primarily is generated in active silicon layers. From there it can move up, which in a flip-chip package means through the bulk silicon to the backside and out the package. It also can move down through the various metal connections terminating on a PCB, and in some cases it might be able to move sideways. Which way depends on the application.

“If you look at things like notebook PCs, they extract heat from both the die backside and the other side of the main board,” said Kelly. “But for data centers and high performance, it’s a high resistance thermal path to go down through the board. So 95%-plus of all your heat is going out the top.”
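A simple way to picture that split is as two thermal resistances in parallel between the die and a common ambient temperature. The sketch below is a steady-state approximation under that assumption, and the resistance values are hypothetical, chosen only to reproduce a 95% top-side share.

```python
def heat_split(r_top_k_per_w: float, r_bottom_k_per_w: float) -> tuple[float, float]:
    """Return (fraction_out_top, fraction_out_bottom) for two parallel thermal paths.

    Assumes steady state and that both paths end at the same ambient temperature,
    so heat divides in inverse proportion to each path's resistance.
    """
    if r_top_k_per_w <= 0 or r_bottom_k_per_w <= 0:
        raise ValueError("resistances must be positive")
    total = r_top_k_per_w + r_bottom_k_per_w
    return r_bottom_k_per_w / total, r_top_k_per_w / total


# Hypothetical values: the board-side path is 19 times more resistive than the top.
top_share, bottom_share = heat_split(r_top_k_per_w=0.1, r_bottom_k_per_w=1.9)
print(f"top: {top_share:.0%}, bottom: {bottom_share:.0%}")
```

In this model, the board-side path has to be roughly 19 times more resistive than the top-side path for 95% of the heat to leave through the top.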

Heat sinks, some with built-in fans, have been standard in high-power packages for many years. They’re made of copper or aluminum, and the choice of metal can depend on where the heat goes after the heat sink.

Aluminum changes temperature faster as it draws in heat from the package. That greater temperature change makes heat exchange more efficient. “Changing the temperature of copper is harder than changing that of aluminum for an equally sized heat sink,” noted Fromm.

If a heat sink exchanges its heat with air, then the air must be moving, because air is a very poor thermal conductor. If the heat sink bolts to another thermally conductive solid, then copper may be preferable. Copper has a higher heat capacity per unit volume than aluminum (its specific heat per unit mass is lower, but it is more than three times as dense), meaning an equally sized copper sink can store more heat without rising in temperature as much. It’s therefore less effective at exchanging into air, but if attached to another solid, it can be very effective in channeling that heat into the follow-on sink.

If the computing work being performed is bursty, with long idle periods, then copper also can serve with a fan as it will have more time to exchange with air. “If it’s short pulses that are really high with a large downtime, copper is going to be better at dampening that out over time,” said Fromm. “Aluminum is going to get really hot instantaneously.”
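Fromm’s comparison of equally sized heat sinks comes down to heat capacity per unit volume, which is density times specific heat. The sketch below uses approximate room-temperature handbook values for the two metals. The pulse energy and block volume are assumptions, and the calculation ignores any heat leaving the sink during the pulse.

```python
# Approximate room-temperature handbook values.
MATERIALS = {
    "copper": {"density_kg_m3": 8960.0, "specific_heat_j_per_kgk": 385.0},
    "aluminum": {"density_kg_m3": 2700.0, "specific_heat_j_per_kgk": 897.0},
}


def volumetric_heat_capacity(material: str) -> float:
    """Return heat capacity per unit volume in J/(m^3*K)."""
    props = MATERIALS[material]
    return props["density_kg_m3"] * props["specific_heat_j_per_kgk"]


def adiabatic_temperature_rise(material: str, energy_j: float, volume_m3: float) -> float:
    """Return the temperature rise in K if all the energy stays in the block."""
    return energy_j / (volumetric_heat_capacity(material) * volume_m3)


ASSUMED_PULSE_ENERGY_J = 500.0   # hypothetical burst of heat
ASSUMED_VOLUME_M3 = 50e-6        # hypothetical 50 cm^3 heat sink

for metal in MATERIALS:
    rise = adiabatic_temperature_rise(metal, ASSUMED_PULSE_ENERGY_J, ASSUMED_VOLUME_M3)
    print(f"{metal}: {rise:.2f} K rise")
```

For the same volume and the same pulse, the aluminum block warms more than the copper one, which matches the "dampening" behavior described for bursty workloads.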

Hot spots
Die hot spots create another challenge. Rather than equipping the entire package to dissipate enough heat to handle all hot spots at the same time, heat spreaders can average the heat out within the package. A traditional metal spreader resides inside the package. It can be a separate slug of metal or it can be a metal enclosure with a thermal connection to the die.

“The best way to get good heat spreading is to efficiently remove it in the vertical direction,” said Kelly. “If you’re removing it really efficiently, the hot spot doesn’t get a chance to get hotter and spread its heat out.”

The materials used to attach spreaders and other elements are an area of active development. Called thermal-interface materials, or TIMs, they ensure a conformal layer between two surfaces. “You want it to be a glue, although people do greases, as well, if it’s not supporting the part,” explained Fromm. “The key is to get rid of air gaps. An ideal TIM is something that’s going to stay put, but which has a lot of conformity from a stress perspective.”

Typical packages can involve two TIMs, sometimes called TIM I (roman numeral 1) and TIM II. “You have two different interfaces inside the package,” said Kweon. “One is between [the heat sink and] heat spreader. The other one is between die backside [and the heat spreader].”

Fig. 1: Two typical applications of thermal-interface materials. TIM I lies between the die and heat spreader; TIM II lies between the heat spreader (in this case, the enclosure) and the heat sink. Arrows indicate heat removal. Source: Amkor

Metal TIMs coming
Traditional TIMs primarily have been polymers. But because polymers don’t conduct heat well, they’re often doped with conductive additives. “People are doping them with carbon or graphite or with various highly conductive metals,” said Fromm. “Diamond is another filler that people are starting to use. Diamond’s thermal [conductivity] might be five to ten times higher than that of copper.”

Even so, TIMs tend to have poor conductivity, so keeping their layers thin helps keep that thermal path as short as possible. They’ve worked adequately for packages dissipating heat around 100 W, but newer chips and advanced packages are anticipated to rate more like 1,000 W, challenging the current materials.
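To see why a stack that is adequate at 100 W struggles at 1,000 W, the path from die to air can be treated as thermal resistances in series, with junction temperature equal to ambient plus power times total resistance. Every resistance value and the ambient temperature below are hypothetical placeholders, not measurements of any package.

```python
def junction_temperature_c(ambient_c: float, power_w: float, resistances_k_per_w) -> float:
    """Return the steady-state junction temperature for series thermal resistances."""
    return ambient_c + power_w * sum(resistances_k_per_w)


# Hypothetical series stack, in K/W, from the die out to the air.
ASSUMED_STACK_K_PER_W = {
    "die": 0.010,
    "TIM I": 0.030,
    "lid / heat spreader": 0.005,
    "TIM II": 0.020,
    "heat sink to air": 0.050,
}
ASSUMED_AMBIENT_C = 35.0

for power_w in (100.0, 1000.0):
    t_junction = junction_temperature_c(
        ASSUMED_AMBIENT_C, power_w, ASSUMED_STACK_K_PER_W.values()
    )
    tim_drop = power_w * (ASSUMED_STACK_K_PER_W["TIM I"] + ASSUMED_STACK_K_PER_W["TIM II"])
    print(f"{power_w:.0f} W: junction {t_junction:.1f} C, drop across TIMs {tim_drop:.1f} K")
```

Because the temperature drop across each layer scales linearly with power, a TIM resistance that is negligible at 100 W can dominate the budget at ten times the power.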

Metal TIMs, and an indium alloy in particular, are now available with much higher thermal conductivity. Amkor found that switching to an indium alloy could reduce the junction temperature of a die by more than 10°C. “A 10°C increase [with a polymer TIM] usually means the die lifetime is cut in half,” noted Kweon. “Many customers now want metal TIMs [for chips having power] higher than 400 W.”
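Kweon’s rule of thumb can be written as a one-line model in which lifetime halves for every 10°C of added junction temperature. This is a simplification of the quoted heuristic, not a reliability model, and the function name and default step are assumptions for illustration.

```python
def relative_lifetime(delta_t_c: float, halving_step_c: float = 10.0) -> float:
    """Return lifetime relative to baseline under a 'halves every N degrees' heuristic.

    delta_t_c > 0 means the die runs hotter than baseline; < 0 means cooler.
    """
    if halving_step_c <= 0:
        raise ValueError("halving_step_c must be positive")
    return 0.5 ** (delta_t_c / halving_step_c)


for delta in (-10.0, 0.0, 10.0, 20.0):
    print(f"{delta:+.0f} C vs. baseline: {relative_lifetime(delta):.2f}x lifetime")
```

Under this heuristic, the 10°C reduction attributed to the indium alloy would roughly double the expected lifetime relative to the hotter case.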

Fig. 2: Molded FCBGA with metal TIM. Source: Amkor

TIMs expand with heat at rates different from their attached materials, so adhesives may experience more thermal stress than a grease would. This may be an issue for those larger packages Kweon sees coming in a few years. “That means that if you apply a polymer TIM, [it may not work] well because the tensile stress around the die edge [may cause] delamination,” he said.

System-side components
Moving air can provide only so much cooling, so for more challenging assemblies, liquid is being employed in multiple ways. Surrounding a package or subsystem with liquid (immersion) can pull heat away more effectively than air can.

“At some point, when you get up somewhere between 800 and 1,200 watts, depending on the package’s construction, you just can’t live with an air-cooled system,” said Kelly. “You’ve got to move to some kind of liquid cooling where you’re providing a cool temperature in immediate contact with the die.”

That requires a closed system, within which the liquid can circulate from the heat-generating components to an exchanger that can cool the liquid before returning it in a closed cycle. It also raises the temperature gradient between the die and cooling solution. “That makes stresses higher everywhere,” noted Kelly. “The good news is that the materials in IC packaging are so much better than they used to be 10 years ago.”
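For a single-phase loop, the coolant flow needed to carry a given power follows from the heat balance Q = ṁ·c_p·ΔT. The sketch below assumes water as the coolant, with approximate property values, and a hypothetical 1,000 W load and 10 K allowed coolant temperature rise.

```python
WATER_SPECIFIC_HEAT_J_PER_KGK = 4186.0   # approximate value for liquid water
WATER_DENSITY_KG_PER_M3 = 1000.0         # approximate value for liquid water


def required_mass_flow_kg_s(power_w: float, coolant_rise_k: float,
                            specific_heat_j_per_kgk: float = WATER_SPECIFIC_HEAT_J_PER_KGK) -> float:
    """Return the mass flow needed to absorb power_w with the given coolant temperature rise."""
    if coolant_rise_k <= 0:
        raise ValueError("coolant_rise_k must be positive")
    return power_w / (specific_heat_j_per_kgk * coolant_rise_k)


def to_liters_per_minute(mass_flow_kg_s: float,
                         density_kg_per_m3: float = WATER_DENSITY_KG_PER_M3) -> float:
    """Convert a mass flow in kg/s to a volumetric flow in L/min."""
    return mass_flow_kg_s / density_kg_per_m3 * 1000.0 * 60.0


ASSUMED_POWER_W = 1000.0        # hypothetical package power
ASSUMED_COOLANT_RISE_K = 10.0   # hypothetical inlet-to-outlet temperature rise

flow = required_mass_flow_kg_s(ASSUMED_POWER_W, ASSUMED_COOLANT_RISE_K)
print(f"{flow:.4f} kg/s, about {to_liters_per_minute(flow):.2f} L/min")
```

Allowing a smaller coolant temperature rise demands proportionally more flow, which is one of the tradeoffs in sizing the loop and its exchanger.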

Traditional liquid cooling relies solely on liquids, but a more advanced version uses liquid and gaseous phases. “The most advanced cooling methodology is two-phase boiling flow,” said Satya Karimajji, senior engineer at Synopsys.

Immersion takes liquid cooling one step further, plunging entire systems into a flowing liquid that’s far more effective at removing heat than other techniques. It’s complex and expensive, however, because the system must be sealed to contain the liquid. Research is focused on finding the most effective liquids. “They are looking at different types of dielectric fluids and refrigerants that they can use,” said Karimajji.

When space is limited
Liquid/gas also features in two different approaches. Vapor chambers, while not new, are becoming more popular as a means of spreading heat. “Nowadays, many customers are moving to vapor chambers with a cold plate on top of the package,” said Kweon.

Instead of a metal slug, vapor chambers feature a sealed chamber containing vapor that contacts the die on one side and a cooling plate on the other. These are two-phase systems, with the heat sources acting as evaporators and the cool side as condensers. They typically have some kind of wicking material inside that helps bring the condensed liquid back to the evaporator.

“Let’s say heat is dissipated in a small area, but you want to spread the heat into a bigger area,” said Karimajji. “[Vapor chambers] enhance the uniformity of temperature in the heat sink base.”

Heat pipes can move heat farther from the source in systems such as laptops and phones, which lack space for heat sinks. The condensed liquid will move by capillary action to the evaporator, pushing the vapor along on the other side. The generated heat drives the system.

“Let’s say in a laptop, you don’t have enough space to add a fan [near the CPU],” said Karimajji. “They run a heat pipe from the top of the CPU to the edge of the laptop, where they can put a fan. The advantage is that you don’t need a pump.”

The big benefit is the size of heat pipes, despite moderate cooling capability. “By themselves, they are probably not enough to cool GPUs,” noted Karimajji. Liquids employed in these structures are typically de-ionized water, but refrigerants also can serve depending on the operating temperature.
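The appeal of two-phase devices such as vapor chambers and heat pipes is that evaporation absorbs far more heat per unit of fluid than warming the liquid does. The sketch below uses approximate properties of water near its atmospheric boiling point. The latent heat varies with operating temperature, and the 50 W load is an assumed example.

```python
WATER_LATENT_HEAT_J_PER_KG = 2.26e6      # approximate, near 100 C at atmospheric pressure
WATER_SPECIFIC_HEAT_J_PER_KGK = 4186.0   # approximate value for liquid water


def evaporation_rate_kg_s(power_w: float,
                          latent_heat_j_per_kg: float = WATER_LATENT_HEAT_J_PER_KG) -> float:
    """Return the mass of working fluid that must evaporate per second to carry power_w."""
    return power_w / latent_heat_j_per_kg


def equivalent_sensible_rise_k(latent_heat_j_per_kg: float = WATER_LATENT_HEAT_J_PER_KG,
                               specific_heat_j_per_kgk: float = WATER_SPECIFIC_HEAT_J_PER_KGK) -> float:
    """Return the liquid temperature rise that would absorb the same heat as evaporation."""
    return latent_heat_j_per_kg / specific_heat_j_per_kgk


ASSUMED_POWER_W = 50.0  # hypothetical laptop-class heat load

rate_mg_s = evaporation_rate_kg_s(ASSUMED_POWER_W) * 1e6
print(f"{rate_mg_s:.1f} mg/s of water evaporated to carry {ASSUMED_POWER_W:.0f} W")
print(f"evaporation absorbs as much heat as a {equivalent_sensible_rise_k():.0f} K liquid rise")
```

Only milligrams of fluid per second need to circulate, which is why capillary wicking alone can drive the cycle without a pump.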

Topping it off — or not
Lids on packages provide protection and mechanical stability for the package contents. But exposing the die backside opens the door to different cooling techniques.

“The lid helps spread the heat, so that can help your total thermal performance,” said Kelly. “But there’s also a big benefit to having a protective structure during tests because those functional or system-level test insertions are mechanically very rigorous. So our customers that have lids appreciate having the lid. When they don’t have a lid, they’re always cautious about the mechanical integrity during testing.”

One of the cooling techniques being developed is water impingement, in which water is literally sprayed on the backside of the exposed lidless die.

“If you literally spray water onto the top of the silicon, you can remove a lot more heat than if the water is contained in a water jacket of some kind,” said Kelly. “The water doesn’t change phase, but the boundary layer of the water next to the silicon gets very thin, and so the thermal resistance is very low.”

For chips without the mechanical support of a lid, stiffeners such as rings placed around the edge of the substrate can help provide rigidity and mitigate warping as temperatures change.

Even more exotic is microfluidics, involving internal micro-channels through which a coolant can flow. Instead of simply surrounding the package, liquid flows through the channels, absorbing heat internally.

“A micro-scale heat sink has two parts, one that sits on top of the CPU block and then another part that has a heat sink with a fan attached to it,” said Karimajji. “There is a liquid loop connecting them. The liquid flows through the CPU block, picks up the heat, and then goes to the coolant reservoir, called the radiator, where the heat sink is located. It exchanges the heat back into the environment, and then the cold fluid is pumped back to the CPU block.”

This is particularly promising for cooling stacks of silicon, where the top of the stack can easily lose its heat to the environment while the dice in the middle must somehow push their heat through the stack. The microchannels now give those middle dice a more effective way to lose their heat. The tradeoff is complexity and expense.

These are primarily single-phase systems for the time being. “The industry is trying to make a two-phase [system] move from the research stage to the commercial stage,” added Karimajji.

Moving heat down to the PCB
Heat has a more complex path to travel down to the PCB and out into the rest of the system. The natural paths for heat to flow are through the interface between the die and the substrate — i.e., the die attach — and the metal leads that travel from the die down to the connections on the PCB.

In an advanced package, not all leads end up outside the package. Those internal signals transfer heat between components in the package. Those that go outside may have to travel through an interposer or silicon bridge before getting to the substrate.

“We can have up to six layers in the interposer,” said Karimajji. “But if that’s not going to be enough, then pulling the heat from the top side of the package is an alternative parallel path.”

More thermally conductive eutectic alloys can improve heat transfer through the die attach. The leads also play a role.

“Metal density will help you move heat out,” said Fromm. “Ground connections and planes are good for this. However, if high-interconnect regions of the die are actually generating the heat, then it’s a net source of heat, not a sink.”

“The maximum die temperature is dependent on the density of the [interconnect] bumps,” said Keith Lanier, product management director at Synopsys. “Using EDA optimization tools, you can change the bump density, and that affects the maximum temperature of the die.”

New solders and substrates
The type of solder also matters. Gold-tin solder performs well in this regard. “Standard solder is on the order of 20 to 30 W/mK,” said Fromm. “Gold-tin is about 60 W/mK, [a factor of two to three] better.”

Sintered silver is also receiving some attention, particularly for power devices. “There’s a class of materials that are pastes. They’re dispensed like epoxies,” said Fromm. “You sinter them, and they have very high thermal conductivities — 70 to 100 or 150 W/mK.”

Amkor also is working on copper-lead attach, according to Kweon, but it’s a more challenging material and requires more careful processing, raising its cost. “It can be done, but the surface has to be so clean and the surface oxidation has to be controlled, so you have to do these things in inert atmospheres,” said Fromm. The challenges mirror those necessary for copper-based hybrid die-to-die bonds.
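The conductivity figures Fromm quotes can be compared directly, since for the same bond thickness and area the thermal resistance of a die-attach layer scales inversely with conductivity. The sketch below uses only the ranges quoted above. The dictionary keys and function name are illustrative.

```python
# Conductivity ranges in W/mK as quoted in the article: (low, high).
CONDUCTIVITY_RANGE_W_PER_MK = {
    "standard solder": (20.0, 30.0),
    "gold-tin solder": (60.0, 60.0),
    "sintered silver": (70.0, 150.0),
}


def improvement_range(candidate: str, baseline: str) -> tuple[float, float]:
    """Return the (min, max) conductivity ratio of candidate over baseline.

    The minimum pairs the candidate's low end with the baseline's high end,
    and the maximum pairs the candidate's high end with the baseline's low end.
    """
    cand_low, cand_high = CONDUCTIVITY_RANGE_W_PER_MK[candidate]
    base_low, base_high = CONDUCTIVITY_RANGE_W_PER_MK[baseline]
    return cand_low / base_high, cand_high / base_low


for material in ("gold-tin solder", "sintered silver"):
    low, high = improvement_range(material, "standard solder")
    print(f"{material}: {low:.1f}x to {high:.1f}x the conductivity of standard solder")
```

A ratio of N here means the layer's thermal resistance drops to 1/N of the standard-solder value at equal thickness and area.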

All these potential heat paths travel through the substrate before arriving at the PCB, whether through leads or the die attach. Standard organic substrates conduct heat modestly, but the future may bring ceramic substrates with higher thermal conductivity.

“In my mind, the holy grail would be a high density, high thermal-conductivity ceramic that can take the heat and provide [sufficient] I/O density,” said Fromm.

Such substrates would be more expensive than organic versions, but they’re also flatter and stiffer than organics, which can improve production yield. “Maybe the assembly yield will drive the economics where the substrate costs more,” mused Fromm. “If I can build it with better yield or get more performance out of it, it might be worth it.”

Moving heat to the side
Moving heat out of the sides of a die adds yet one more thermal path to help cool a die. Although a single die may be too thin for this to have a big effect, a stack could benefit from a sideways path that doesn’t involve the cost and complexity of microfluidics. One approach is the molded flip-chip ball grid array (FCBGA).

In a standard FCBGA, air surrounds the components. With the molded FCBGA, that space is filled with a thermally conductive molding compound, allowing heat to move sideways from a die in a stack.

“In the case of a die stack, the sandwiched die has no good thermal dissipation path because the air around the die [inside the package] is a very poor thermal conductor,” said Kweon. The molding material replaces that air, improving the sideways thermal path.

This may become even more important with advanced silicon nodes with greater stresses. “[The silicon process] will soon go to 2nm,” added Kweon. “In this case, the inter-layer dielectric is very brittle. The molded FCBGA can reduce the thermal stress barrier.”

Fig. 3: A molded FCBGA. The molding material replaces air in the package and improves the sideways thermal path. Source: Bryon Moyer/Semiconductor Engineering

So many options
The number of cooling options continues to grow as chips and packages generate more and more heat. Given the number of interactions between components in a package, assembly changes tend to occur incrementally. It’s unlikely that a revolutionary new system will replace what we have, even if such a thing were in the offing. So the bits and pieces we’ve seen here will continue to evolve in varying combinations.

Getting an early design start matters. “We do see more work up front, with architecture exploration and even at the RTL level,” said Shawn Nikoukary, senior director of SoC engineering at Synopsys. “We have to influence the architecture of the chip to get the optimal thermal performance. The more work we do in the architecture phase, the easier it becomes at the end.”

It’s important not to lose sight of the cost ceiling dictated by the application. “The data center guys tend to have some pretty exotic solutions,” Kelly pointed out. “They’re in a market where it’s easier to afford that. But if you think about a notebook or desktop or some other edge device, we really have to be wary about cost and efficiently getting rid of heat.”
