Thermal Challenges Multiply In Automotive, Embedded Devices

The small sizes and tight tolerances of embedded devices create big problems for reliability.


Embedding chips into stacked-die assemblies is creating thermal dissipation challenges that can reduce the reliability and lifespan of these devices, a growing problem as chipmakers begin cramming chiplets into advanced packages with thinner substrates between them.

In the past, nearly all of these complex designs were used in tightly controlled environments, such as a large data center, where overheating could be quickly addressed by shifting some of the computing to a different server blade or rack. But these stacked-die assemblies are now moving into safety-critical applications, such as automotive sensors and pacemakers, where there are fewer options available. Fundamentally, reliability is affected by thermal cycling. Every time a chip is switched on and off, it heats and expands, then cools and contracts, and when it is tightly enclosed that cooling period can take significantly longer.

“Thermal management is absolutely necessary,” said Ketan Dewan, senior director of IoT, Compute, and Industrial MCUs at Infineon Technologies. “If not managed properly, it may result in performance loss, unreliable operation, device failure, and higher system costs.”

To make matters worse, many of these devices involve different materials. “You’re dealing with differential expansion,” noted Marc Swinnen, director of product marketing at Ansys. “If you take a piece of metal, heat one end and not the other, your chip is going to flex and warp.”

Fig. 1: Thermal simulation of heat flows in advanced package assembly. Source: Ansys

Warpage can break the bonds between various compute elements and substrates, forcing rerouting of signals and reducing performance. Heat-related stresses also can create thermal runaways, where connections become so hot that bumps melt or change circuit behavior. This latter effect can cause parametric drift, a condition in which a chip may initially work as intended, but eventually no longer fit its originally defined parameters. Thus, a device may appear to be fine when it’s released, only to behave oddly or outright fail months later.
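The differential expansion behind this warpage can be put in rough numbers. The sketch below is illustrative only: the expansion coefficients are approximate textbook values, and the 10 mm die edge and 100 K temperature swing are hypothetical figures chosen for the example, not data from any of the companies quoted here.

```python
# Approximate linear coefficients of thermal expansion (CTE) in 1/K.
# These are ballpark textbook figures used as assumptions; real values
# depend on temperature, alloy, and laminate construction.
CTE_PER_K = {
    "silicon": 2.6e-6,
    "copper": 17.0e-6,
    "organic_substrate": 15.0e-6,  # assumed in-plane value for a laminate
}


def free_expansion_um(material: str, length_mm: float, delta_t_k: float) -> float:
    """Unconstrained length change in micrometers: dL = alpha * L * dT."""
    length_um = length_mm * 1000.0
    return CTE_PER_K[material] * length_um * delta_t_k


def mismatch_um(mat_a: str, mat_b: str, length_mm: float, delta_t_k: float) -> float:
    """Difference in free expansion between two bonded materials."""
    return abs(
        free_expansion_um(mat_a, length_mm, delta_t_k)
        - free_expansion_um(mat_b, length_mm, delta_t_k)
    )


if __name__ == "__main__":
    # Hypothetical case: a 10 mm die edge and a 100 K temperature swing.
    for other in ("copper", "organic_substrate"):
        gap = mismatch_um("silicon", other, length_mm=10.0, delta_t_k=100.0)
        print(f"silicon vs {other}: {gap:.1f} um of mismatch")
```

Because the bonded layers cannot expand freely, a mismatch of this kind shows up as mechanical stress and bending rather than as a visible gap.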

Discrete devices are more isolated, but they’re not immune from thermal problems, either. “They may generate heat, but it’s a little bit easier to dissipate that heat because the chip is on a substrate, so the heat has a place to go,” said John Ferguson, product management director for Calibre nmDRC applications at Siemens EDA. “When you are stacking things, it gets much more difficult. Side-by-side is not as bad, but you can have interactions across boundaries.”

Physical issues can compound, depending upon the layout and the proximity to other circuitry.

For example, automotive sensors can be affected by both heat and vibration, depending on where they are placed, which means both thermal and mechanical stress become issues.

“When you talk about embedded systems, it’s not just the heat produced by the element itself, but the neighboring elements,” said Swinnen. “It’s embedded in something, and then something next to it can also be producing heat, so you suddenly have a flush of heat coming to your chip, which has nothing to do with what you’re doing. It’s the chip next door that’s getting hot, but your temperature goes up, and it seems like it’s your chip.”

Heterogeneity at the packaging level adds to the complications on the design side. “Historically, the way thermal analysis has been done, the chips that are in the package have been treated as not just a uniform material but as a uniform temperature,” said Ferguson. “But that view fails to encompass the full complexity of the interactions. Chips are a source of heat generation, but they are also potentially the victims of heat. It’s important to capture that if you’re thinking about it at the package level.”
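One common way to capture a die as both a source and a victim of heat is a steady-state thermal resistance matrix, in which each die's temperature rise is the sum of its own self-heating and the heating coupled in from its neighbors. The sketch below assumes linear, steady-state conduction. The matrix values, the two-die layout, and the power numbers are hypothetical and are not taken from any tool mentioned in this article.

```python
def die_temperatures(ambient_c, theta_k_per_w, power_w):
    """Steady-state die temperatures by superposition.

    T_i = T_ambient + sum_j(theta[i][j] * P[j]), where theta[i][i] is the
    self-heating resistance of die i and theta[i][j] is the coupling
    resistance from die j to die i, both in K/W.
    """
    temps = []
    for row in theta_k_per_w:
        rise = sum(r * p for r, p in zip(row, power_w))
        temps.append(ambient_c + rise)
    return temps


# Hypothetical two-die package: index 0 is a processor, index 1 is a sensor.
THETA_K_PER_W = [
    [20.0, 8.0],   # processor: self-heating, coupling from sensor
    [8.0, 25.0],   # sensor: coupling from processor, self-heating
]

if __name__ == "__main__":
    both_on = die_temperatures(40.0, THETA_K_PER_W, [2.0, 0.1])
    sensor_only = die_temperatures(40.0, THETA_K_PER_W, [0.0, 0.1])
    print(f"sensor with processor active: {both_on[1]:.1f} C")
    print(f"sensor with processor idle:   {sensor_only[1]:.1f} C")
```

In this made-up case most of the sensor's temperature rise comes from the processor next door, which is the effect Swinnen and Ferguson describe.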

Yet even when a designer is aware of these issues, they are not easy to mitigate in embedded devices. “It’s very important to consider thermal analysis for a cell phone, for example, because you cannot put a heatsink in a cell phone,” said Melika Roshandell, product management director at Cadence. “Also, it has no fan and you don’t want it to burn your face when you put it near your ear. If there is no thermal mitigation, it will get that hot. A cellphone designer has to think about how to have the maximum frequency in the chip, and at the same time, have it ergonomic and not burn skin. They have to do a lot of conduction and radiation analysis to determine thresholds. Since they are in a constrained space, they have more challenges to address, and they have to be more innovative in addressing the thermal issues.”

Even where a traditional cooling solution, such as a fan, can be used, it likely won’t be an ideal choice. “Fans cause electrical interference, but that’s just a side issue,” said Geoff Tate, CEO at Flex Logix. “The biggest problem with fans, besides being bulky and noisy, is that they can break. And if they break, then your cooling system dies. So now you’ve got a big problem. When you add more transistors, and the transistors run faster — which is what everybody wants — you burn more power. But packages can only dissipate a certain amount of power. And even if you put this package in cold air, at some point it can’t radiate enough heat. So then you put a heatsink on, and the heat sinks cover the whole size of the big fins. They can radiate a lot more heat. And then you can bolt this onto the side of a metal box, and the metal box becomes a heat sink also, and you can put it outside in the environment.”

But the problems don’t end there. In fact, depending on the environment, they can get worse. “A customer may say, ‘It’s got to be able to be outside, and we’re in Phoenix. That’s 125°F,’” said Tate. “Or maybe it’s got to work in Edmonton, Canada, at -40°F. You have to be able to operate over a very wide temperature range, so you’ve got to design your chip, figure out all the pieces of it, and do thermal analysis of the package and the cooling, and know how the customer is going to use it. Then, you’ve got to work backwards to understand what you can put in the chip and how fast you can let it be, because if you make the chip too powerful, it’ll burn too much power, and you can’t dissipate it. Then it’ll exceed the junction temperatures, and you have to stay within the junction temperatures, because if they are too high, they lower the reliability. Some customers will say, ‘I don’t want the junction temperature to go up to this, but I want it to be higher than that.’ It gets very complicated, very quickly.”
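The "work backwards" step can be sketched with the standard first-order relation between junction temperature, ambient temperature, and dissipated power, T_j = T_a + P × θ_JA. The 125°C junction limit and the 30 K/W junction-to-ambient resistance below are assumed values for illustration. Only the two ambient temperatures come from Tate's example.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def junction_temp_c(ambient_c: float, power_w: float, theta_ja_k_per_w: float) -> float:
    """First-order junction temperature: T_j = T_a + P * theta_JA."""
    return ambient_c + power_w * theta_ja_k_per_w


def max_power_w(ambient_c: float, tj_max_c: float, theta_ja_k_per_w: float) -> float:
    """Largest power that keeps the junction at or below tj_max_c."""
    headroom_k = tj_max_c - ambient_c
    return max(headroom_k, 0.0) / theta_ja_k_per_w


if __name__ == "__main__":
    TJ_MAX_C = 125.0   # assumed junction limit
    THETA_JA = 30.0    # assumed package-plus-cooling resistance, K/W
    for site, temp_f in (("Phoenix", 125.0), ("Edmonton", -40.0)):
        ambient_c = fahrenheit_to_celsius(temp_f)
        budget = max_power_w(ambient_c, TJ_MAX_C, THETA_JA)
        print(f"{site}: ambient {ambient_c:.1f} C, power budget {budget:.2f} W")
```

Under these assumptions the hot-site budget is the binding one, so the chip's power and clock speed would have to be sized for Phoenix even if most units ship somewhere cooler.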

Even in Phoenix-like conditions, a device could be placed in a controlled environment, such as a camera monitoring an air-conditioned store. But then the designer also must consider whether the customer will always observe those specs, and how much margin to build in. After all, even the best HVAC system potentially could fail, or a sensor intended to be in a shadowed corner might be moved closer to a large window and thus heated by direct sunlight.

Environmental considerations
Such environmental uncertainties, coupled with the critical situations in which many embedded devices are used, can lead to conservative design and packaging choices. This includes developing chips/chiplets at older process nodes for their proven reliability, as well as less-challenging assembly options. Those choices also may be driven by liability concerns, as well as engineering ones. “I see more legalese in the contracts,” Ferguson observed.

With embedded devices, engineering concerns often overlap with human safety issues. “If you’ve got a medical apparatus and you’re putting it into a human body and it heats up, what’s it going to do to the person it’s in?” Ferguson asked. “Or worse, if a body is undergoing stress, which is causing heat, could that affect the package? You want to be more conservative and extra cautious, taking into consideration how these things are being handled.”

Consider what happens with a pacemaker with poor thermal management. “A pacemaker should last more than 10 years,” said Infineon’s Dewan. “But if thermal management is not done for an embedded device, leakage and power consumption of the embedded device may require a battery change in less than 10 years.”

Even outside the body, extreme caution is required. The auto industry has established ISO 26262, the 12-part safety standard that dictates best practices for automotive electronics. The need to comply with ISO 26262 adds even more compelling reasons to overcome thermal issues, along with the simple economic fact that the hotter a device gets, the more money must be spent on cooling solutions.

There are several approaches to mitigating the thermal issues in advanced packages. From an EDA perspective, the favored method is to address them early in the design cycle, with extensive simulation and prototyping using real workloads. Increasingly, this can involve working with digital twins.

“The first thing is to think about the placement of your chiplets,” said Siemens’ Ferguson. “Can you make better choices of where things are placed so that you’re not having so much heat concentrated in one area? The next phase would be at the level of the chiplets themselves. Are there things you can do to get the heat out? If you can’t move anything around, how can you get the heat out of it?”

For embedded devices, there are additional considerations. “Solutions have to be modified a bit more,” said Ferguson. “You can do forms of cooling across the package, like heatsinks. You can put a TIM (thermal interface material) on top to help with cooling. If you’ve got a chip on top of another chip, maybe the bottom chip is heating the top chip. Now the top chip doesn’t have very much room to get rid of the heat, so you could put in copper pillars as a sort of chimney.”

Fig. 2: Cooling architectures for embedded IC. Source: Nordson Test & Inspection

From a multi-physics perspective, the big difference with thermal, compared to all other electronic simulations, is that a finite element mesh is needed. “It’s a volumetric mesh, not a surface mesh, because heat flows in all three directions on the surface,” Ansys’ Swinnen said. “With a volumetric mesh of your device, you can simulate how that power spreads out and also how ambient temperature can invade your system and spread throughout.”
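The idea of a volumetric grid can be illustrated with a much simpler stand-in for a finite element solver. The sketch below uses a finite-difference relaxation on a small cube of voxels, not a finite element mesh, and it assumes a uniform material with every outer face held at ambient temperature. The grid size and the source strength are hypothetical.

```python
def solve_steady_state(n, ambient_c, sources_k, sweeps=500):
    """Relax an n x n x n voxel grid toward steady-state temperature.

    sources_k maps an interior voxel (x, y, z) to its heat source term
    q * h**2 / k, expressed in kelvin. Boundary voxels stay at ambient_c.
    Each sweep sets every interior voxel to the average of its six face
    neighbors plus its share of the source (Gauss-Seidel iteration).
    """
    temp = [[[ambient_c for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for _ in range(sweeps):
        for x in range(1, n - 1):
            for y in range(1, n - 1):
                for z in range(1, n - 1):
                    neighbors = (
                        temp[x - 1][y][z] + temp[x + 1][y][z]
                        + temp[x][y - 1][z] + temp[x][y + 1][z]
                        + temp[x][y][z - 1] + temp[x][y][z + 1]
                    )
                    source = sources_k.get((x, y, z), 0.0)
                    temp[x][y][z] = (neighbors + source) / 6.0
    return temp


if __name__ == "__main__":
    # Hypothetical hot spot in the middle of a 7 x 7 x 7 block at 25 C ambient.
    grid = solve_steady_state(n=7, ambient_c=25.0, sources_k={(3, 3, 3): 60.0})
    print(f"hot spot:        {grid[3][3][3]:.2f} C")
    print(f"one voxel away:  {grid[2][3][3]:.2f} C")
    print(f"two voxels away: {grid[1][3][3]:.2f} C")
```

The six-neighbor update is what makes the model volumetric: heat leaves the hot spot along all three axes, and a change to the boundary temperature propagates inward the same way.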

Thermal issues are a persistent and worsening problem in design. “Thermal has always been an issue at some level for some devices, but it’s not one that chip designers traditionally had to deal with much because package designers dealt with it right at the end,” Swinnen noted. “You had an idea of how much power you could use, and if it turned out to be a little higher, no one was pulling the chip back because the power was a bit too high. Power was seen as a soft sign-off, but now it’s become a number one issue. Thermal is directly related to power, but it is a different thing. Power usage is money, but then power leads to thermal, because power eventually gets exuded as heat. Embedded devices, however you define them, are inside something, which means they’re harder to cool. You’re going to increase the power density, meaning how much power per cubic volume you’re using. Then you need to get all that heat out somehow, so the cooling is becoming more of a challenge. Leading-edge chip designs for 3D-IC are looking at ideas like liquid cooling and total immersion, but there’s only a limited set of things that can be solved that way [in embedded devices].”

Ultimately, solving the thermal problems that have been vexing both embedded and discrete devices may require new materials, like glass substrates, and new thinking about existing components. “Historically, we’re so engrained on the idea that we need the fastest interconnect possible,” Ferguson said. “But could we choose interconnects that are less thermally sensitive, potentially something like a TSV that’s a wide, vertical connection? The part that we need to figure out is the tradeoff. You could be taking up a lot of area that could be used for something else. What are the best choices? For example, you may want to make your connections ultra-thin, but that means higher resistivities, which could make more heat. You have to figure out all the side effects that come along with the benefits.”
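The area-versus-heat tradeoff Ferguson describes follows from two textbook relations, R = ρL/A for the resistance of a conductor and P = I²R for the heat it generates. The sketch below assumes a cylindrical copper connection with bulk resistivity, which ignores the extra resistance that very thin wires pick up from surface and grain scattering. The dimensions and current are hypothetical.

```python
import math

COPPER_RESISTIVITY_OHM_M = 1.7e-8  # approximate bulk value near room temperature


def via_resistance_ohm(length_um: float, diameter_um: float,
                       resistivity_ohm_m: float = COPPER_RESISTIVITY_OHM_M) -> float:
    """Resistance of a cylindrical connection: R = rho * L / A."""
    length_m = length_um * 1e-6
    radius_m = diameter_um * 1e-6 / 2.0
    area_m2 = math.pi * radius_m ** 2
    return resistivity_ohm_m * length_m / area_m2


def joule_heat_w(current_a: float, resistance_ohm: float) -> float:
    """Heat dissipated in the connection: P = I**2 * R."""
    return current_a ** 2 * resistance_ohm


if __name__ == "__main__":
    # Hypothetical 50 um tall connection carrying 10 mA at two diameters.
    for diameter_um in (10.0, 2.0):
        r = via_resistance_ohm(length_um=50.0, diameter_um=diameter_um)
        p = joule_heat_w(current_a=0.010, resistance_ohm=r)
        print(f"{diameter_um:4.1f} um wide: {r * 1e3:7.2f} mOhm, {p * 1e6:6.2f} uW")
```

At a fixed current, shrinking the diameter by a factor of five cuts the cross-section by a factor of 25 and raises the heat generated in the connection by the same factor, while the wider option consumes 25 times the area.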

Related Reading
Design Flow Challenged By 3D-IC Process, Thermal Variation
Rethinking traditional workflows by shifting left can help solve persistent problems caused by process and thermal variations.
Controlling Warpage In Advanced Packages
Mechanical stresses increase with larger sizes and heterogeneous materials.
