Auto Chip Aging Accelerates In Hot Climates

New data shows significant reduction in lifespan and potential new security issues as global temperatures rise.


Automotive chips are aging significantly faster than expected in hot climates with sustained high temperatures, raising concerns about the reliability of electrified vehicles over time and whether advanced-node chips are the right choice for safety-critical applications.

Many of the most advanced electronics used in vehicles today are ASIL D-compliant and are expected to function at temperatures up to 125° C. But during extended heat waves, those chips don’t last as long as expected. This is evident in new studies conducted in Phoenix, Arizona, which in 2024 recorded 64 days with daytime temperatures above 110° F (43.3° C), and 5 days with peak highs above 115° F (46.1° C). [1] At those temperatures, the cabin in dark vehicles with dark upholstery can approach 200° F (93° C), not far below the boiling point of water.

Predicting exactly how complex systems will behave under these conditions is difficult due to limited data, non-linear and dynamic interactions in systems, and insufficient predictive techniques, according to a new white paper issued by IEEE’s Functional Safety Standards Committee. “Environmental factors, such as temperature, humidity, vibration, altitude, or radiation can have a significant impact on the degradation and failure of systems,” the paper said. “Incorporating these factors into RUL (remaining useful life) prediction models can be challenging, and being able to measure their dynamic impact on the system can be limited.”


Fig. 1: SEM image of failure caused by electromigration in a copper interconnect. Passivation has been removed. [2]

“We have a number of OEM customers, and a couple of years ago they told us they didn’t have any problems and they weren’t concerned about their silicon because typically they were using 10-year-old technology,” said Steve Pateras, vice president of marketing and business development at Synopsys. “That’s no longer the case. Our automotive customers are now at the leading edge with 5nm and 3nm chips. You need to be able to measure what’s going on, not just assume what’s going to work or use past experience. So RUL is becoming a really big issue with many of these OEMs. And based on the Arrhenius Equation, which looks at how materials degrade based on temperature, and a certain number of hours operated under that period of time in the summer in Phoenix, we were able to look at the change of life expectancy in the silicon. It was quite dramatic.”
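Pateras doesn’t give the exact model, but the Arrhenius relationship is commonly applied as an acceleration factor between two temperatures, AF = exp[(Ea/k)(1/T_ref − 1/T_hot)], where k is the Boltzmann constant and temperatures are in kelvin. The Python sketch below is a minimal illustration under assumptions, not Synopsys’ analysis: the 0.7 eV activation energy and the 85° C and 105° C junction temperatures are placeholder values, since real activation energies depend on the failure mechanism.

```python
import math

# Boltzmann constant in eV/K
BOLTZMANN_EV = 8.617333262e-5


def arrhenius_af(t_ref_c: float, t_hot_c: float, ea_ev: float) -> float:
    """Arrhenius acceleration factor: how much faster a thermally activated
    mechanism proceeds at t_hot_c than at t_ref_c (both in degrees C).

    ea_ev is the activation energy of the failure mechanism in eV (an assumed input).
    """
    t_ref_k = t_ref_c + 273.15  # convert to kelvin
    t_hot_k = t_hot_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_ref_k - 1.0 / t_hot_k))


# Illustrative inputs only: Ea = 0.7 eV (assumed), 85 C nominal vs. 105 C junction.
af = arrhenius_af(85.0, 105.0, ea_ev=0.7)
print(f"Aging proceeds about {af:.1f}x faster at 105 C than at 85 C")
```

Because temperature sits in the exponent, a rise of a few tens of degrees multiplies the aging rate rather than adding to it.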

For a chip designed to last 30 years, high ambient temperatures reduced the life expectancy by an extra 10% a year, so after one year the lifespan dropped to 26 years, Pateras said.
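The quote leaves open exactly how the extra 10% is applied. One reading that reproduces the 26-year figure is that each year in a hot climate consumes one calendar year plus an extra 10% of the 30-year design life (30 − 1 − 3 = 26). The sketch below encodes that assumption; the function is hypothetical and the interpretation is an assumption, not Synopsys’ calculation.

```python
def remaining_life(design_years: float, hot_years: int, extra_fraction: float) -> float:
    """Remaining life after `hot_years` of operation in a hot climate.

    Assumption (one reading of the quoted figures, not Synopsys' model): each hot
    year uses up one calendar year plus `extra_fraction` of the original design life.
    """
    consumed_per_year = 1.0 + extra_fraction * design_years
    return max(0.0, design_years - hot_years * consumed_per_year)


print(remaining_life(30.0, 1, 0.10))  # 30 - (1 + 3) = 26.0
```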

Chipmakers are well aware of these trends, which are being exacerbated by climate change. Extreme temperatures are more frequent, and sometimes they can last for weeks rather than a few days. All of that needs to be incorporated into chip architectures, which may require different materials, extra margin, and some type of active cooling.

“There are two aspects to consider,” said Bill Stewart, vice president of marketing for Automotive Americas at Infineon Technologies. “One is the quality of the devices. Our automotive chips are at a 60 parts per billion failure rate. So for parts that are used at high temperatures, we have designed in margin. The second aspect is functional safety, and how you detect a failure in a system. Is it a software failure? Is it a hardware failure? Whether it’s our chip or someone else’s, how do you diagnose that and alert the operator so that you can either limp home, reset things, or turn on the ‘check engine’ light and go to the dealer.”
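Even a 60 ppb rate translates into some failures once enough parts are deployed, which is why detection and diagnosis still matter. The sketch below treats 60 ppb as the fraction of parts expected to fail and multiplies it by a fleet; the fleet size and per-vehicle part count are hypothetical, not Infineon figures.

```python
def expected_failures(rate_ppb: float, parts_per_vehicle: int, vehicles: int) -> float:
    """Expected number of failing parts, treating rate_ppb as a failing fraction."""
    return rate_ppb * 1e-9 * parts_per_vehicle * vehicles


# Hypothetical fleet: 1 million vehicles, each with 100 parts at 60 ppb.
n = expected_failures(60.0, parts_per_vehicle=100, vehicles=1_000_000)
print(f"Expected failing parts across the fleet: {n:.1f}")
```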

What’s important in complex systems is how various components and systems within a vehicle interact with each other. With this level of complexity, seemingly insignificant components can bring down an entire system. In addition, fail-over into other systems, which is required under ISO 26262, can cause unexpected interactions. The fail-over circuitry needs to be designed to the same ASIL level as the failing part, and it needs to function as expected even though it’s subject to the same conditions.

“We have not gotten into a situation where the display on a car in Phoenix is failing,” said Satish Ganesan, senior vice president and general manager for Synaptics’ Intelligent Sensing Division. “But there have been other components that failed due to heat. Our touch components and screens will likely still work even if other components fail. But any component that fails still can result in a system failure.”

What goes wrong
All of this assumes a normal workload, as well. With increasingly autonomous systems in vehicles, utilization of processing elements may be significantly higher. As with any electronics, higher utilization increases the temperature of circuits, resulting in accelerated aging.
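A common first-order estimate ties these effects together: junction temperature T_j ≈ T_ambient + P × θ_JA, where P is the chip’s power dissipation and θ_JA is its junction-to-ambient thermal resistance. Both a hotter surrounding and a heavier workload (more power) push T_j up. The values below are hypothetical, and real designs rely on detailed thermal simulation rather than this single-resistance model.

```python
def junction_temp_c(ambient_c: float, power_w: float, theta_ja_c_per_w: float) -> float:
    """First-order steady-state junction temperature: ambient plus power times thermal resistance."""
    return ambient_c + power_w * theta_ja_c_per_w


THETA_JA = 5.0  # hypothetical junction-to-ambient thermal resistance, C/W

scenarios = [
    ("mild day, light load", 25.0, 4.0),
    ("hot compartment, light load", 70.0, 4.0),
    ("hot compartment, heavy AI load", 70.0, 10.0),
]
for label, ambient_c, power_w in scenarios:
    tj = junction_temp_c(ambient_c, power_w, THETA_JA)
    print(f"{label}: Tj of about {tj:.0f} C")
```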

“When we qualify a part, we develop a mission profile,” said Ray Notarantonio, senior director of automotive vehicle body and infotainment at Infineon Technologies. “That mission profile includes temperature, voltage, and everything else. And, of course, the mission profile is not a car that is going to be sitting at maximum temperature for 50% of its life. That is not in the mission profile. But we are seeing cases where autonomous driving is going to change the mission profile because cars will be active more often and running AI. It’s a big factor. We recognize it, and there’s a lot that we do from a qualification standpoint to go beyond those mission profiles.”
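A mission profile can be thought of as a histogram of hours spent at each temperature. One simple way to compare profiles, sketched below under stated assumptions, is to convert each bin into equivalent hours at a reference temperature using an Arrhenius acceleration factor and sum them. The 0.7 eV activation energy and both yearly profiles are hypothetical; the second profile stands in for a vehicle that is active far more often, as in the autonomous case Notarantonio describes.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def equivalent_hours(profile, t_ref_c: float, ea_ev: float) -> float:
    """Convert a mission profile into equivalent operating hours at t_ref_c.

    profile is a list of (hours, junction_temp_c) bins; each bin is weighted by
    its Arrhenius acceleration factor relative to t_ref_c.
    """
    t_ref_k = t_ref_c + 273.15
    total = 0.0
    for hours, temp_c in profile:
        t_k = temp_c + 273.15
        af = math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_ref_k - 1.0 / t_k))
        total += hours * af
    return total


# Hypothetical yearly profiles: (operating hours, junction temperature in C).
commuter = [(400, 70.0), (100, 95.0)]     # about 500 operating hours a year
robotaxi = [(6000, 85.0), (2000, 105.0)]  # nearly continuous operation

for name, profile in (("commuter", commuter), ("robotaxi", robotaxi)):
    eq = equivalent_hours(profile, t_ref_c=85.0, ea_ev=0.7)
    print(f"{name}: {eq:.0f} equivalent hours at 85 C per year")
```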

Others agree. “If you have a combination electric car with autonomous capability, that might have a 100% duty cycle,” said Josh Akman, senior applications engineer at Ansys. “It may be driving around continuously, which is a completely different amount of usage than a commuter. And now you basically have a computer under the hood of the car. There are a lot more challenges to think about, and if you’re going down to really small nodes, like 5nm or 3nm, there are so many competing duties these things have to do, not just for thermal integrity, but also for electrical and mechanical integrity. And if you solve for one, sometimes that exacerbates the other. There are a lot of things to balance.”

Consider the interactions at the packaging level. “You can somewhat distinguish between an aging effect and a wear-out effect,” Akman said. “If you have continuous high heat, a common issue is that your solder joints become more brittle. When you first reflow your solder joints, you get a bulk solder, and at the interfaces to the package and the PCB you get what’s called intermetallics, which is a mixture of the solder and what’s on the PCB. They intermix when they reflow, and over time as the solder ages, that intermetallic layer will grow and become more brittle. So you can create new potential failure modes from that aging effect. Similarly, if you have fluctuations in temperature you get a lot of coefficient of thermal expansion (CTE) mismatch issues. Different materials expand and contract at different rates, causing mechanical stresses that can cause different kinds of failure modes, either on the package or in the solder joints, even down to C4 bumps, flip-chip bumps, or microbumps. And then you can have electromigration and dielectric breakdown at the die level, and a lot of other temperature-related issues.”
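The CTE mismatch Akman describes can be estimated, to first order, as the thermal strain between two bonded materials, ε ≈ |α₁ − α₂| × ΔT. The sketch below uses approximate handbook CTE values (about 2.6 ppm/K for silicon and about 16 ppm/K in-plane for FR-4 laminate, both assumptions that vary by material grade) and hypothetical temperature swings. It ignores geometry, stiffness, and creep, all of which real solder-fatigue analyses account for.

```python
def mismatch_strain(cte_a_ppm: float, cte_b_ppm: float, delta_t_c: float) -> float:
    """First-order thermal mismatch strain between two bonded materials (dimensionless)."""
    return abs(cte_a_ppm - cte_b_ppm) * 1e-6 * delta_t_c


CTE_SILICON = 2.6  # ppm/K, approximate
CTE_FR4 = 16.0     # ppm/K, approximate in-plane value

for swing in (40.0, 80.0):  # hypothetical temperature swings, C
    eps = mismatch_strain(CTE_SILICON, CTE_FR4, swing)
    print(f"dT = {swing:.0f} C -> mismatch strain of about {eps * 100:.3f}%")
```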

Changes ahead
There is no single best practice for addressing all of these issues.

“There is a sort of brute force approach,” said David Fritz, vice president of hybrid and virtual systems at Siemens EDA. “We have analytics that would actually be in the device, and they detect, ‘Oh, two years ago it took two milliseconds for this to happen. Now it’s taking 10 milliseconds.’ So the ‘check engine’ light comes on. But there’s another approach to this, too. I met with a vendor in China that is putting artificial intelligence into their chip using the latest, greatest technology. It’s called a Focused Transformer. It’s the same thing they’re using for these large language models, but it is scaled to go into a single chip. It monitors the situation and determines when there’s degradation, and then determines what kind of other changes it could make. So maybe I’m not at my frequency max. I may want to bump my frequency up in this vehicle by another 10 MHz, and therefore I can extend the life. It’s not just monitoring. It’s decision-making and changing the functionality of the device in ways that can prolong its life.”
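The first approach Fritz describes amounts to comparing a measured latency against a stored baseline and raising a warning when it drifts too far, for example from 2 ms to 10 ms. The sketch below is hypothetical: the `LatencyMonitor` class, the 2× threshold, and the printed warning are illustrative choices, not a Siemens implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LatencyMonitor:
    """Flags degradation when latency drifts past a multiple of its baseline (hypothetical)."""
    baseline_ms: float
    max_ratio: float = 2.0  # hypothetical alarm threshold
    history: List[float] = field(default_factory=list)

    def record(self, latency_ms: float) -> bool:
        """Store a measurement; return True if it should trigger a warning light."""
        self.history.append(latency_ms)
        return latency_ms / self.baseline_ms > self.max_ratio


mon = LatencyMonitor(baseline_ms=2.0)
for sample in (2.1, 2.4, 3.5, 10.0):
    if mon.record(sample):
        print(f"{sample} ms exceeds {mon.max_ratio}x baseline: turn on warning light")
```

A fielded monitor would likely smooth readings, for example with a moving average, before comparing them, so that a single noisy sample does not light the warning.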

This is similar to the approach Apple took with its iPhone, but in reverse. Apple reduced the clock speed of its application processors to prevent unexpected shutdowns as aging batteries lost the ability to deliver peak current. In a vehicle, by contrast, the chips’ draw on the battery is small relative to the energy required by the motor.

That kind of resiliency is difficult to manage, however, particularly in a system of complex systems. Chips do not age evenly, in part because thermal gradients create hot spots where electromigration (EM) accelerates. EM is the gradual displacement of metal atoms by the current flowing through a wire, which over time forms voids that raise resistance and can eventually open the interconnect. In a hot climate, this becomes even more difficult to manage. While redundant circuitry can be used to circumvent EM, that’s not a viable option at 5nm and 3nm because the added circuitry impacts overall performance. To make matters worse, at those advanced nodes the interconnects are extremely thin, which exacerbates any thermal effects in a hot compartment. The same is true for thinner insulating films, which break down over time (time-dependent dielectric breakdown, or TDDB).
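Electromigration lifetime is commonly modeled with Black’s equation, MTTF = A × J^(−n) × exp(Ea / kT), where J is current density and T is absolute temperature. Comparing two operating conditions cancels the constant A, so the sketch computes only the ratio. The exponent n = 2 and Ea = 0.9 eV are assumed typical values for illustration, and the current densities and temperatures are hypothetical; real values depend on the metal stack and process.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def em_mttf_ratio(j1: float, t1_c: float, j2: float, t2_c: float,
                  n: float = 2.0, ea_ev: float = 0.9) -> float:
    """MTTF(condition 2) / MTTF(condition 1) from Black's equation.

    The prefactor A cancels. j1 and j2 must be in the same units.
    """
    t1_k = t1_c + 273.15
    t2_k = t2_c + 273.15
    current_term = (j2 / j1) ** (-n)  # higher current density shortens life
    thermal_term = math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t2_k - 1.0 / t1_k))
    return current_term * thermal_term


# Hypothetical: a thinner wire carrying the same current has 1.5x the current
# density, and runs at 115 C instead of 105 C in a hot compartment.
ratio = em_mttf_ratio(j1=1.0, t1_c=105.0, j2=1.5, t2_c=115.0)
print(f"EM lifetime falls to about {ratio:.0%} of the baseline")
```

Under these assumptions, thinner wires and a hotter environment compound each other, because one factor is a power law and the other is exponential.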

So what comes next? “The most telling evidence is the next version of the ISO 26262 standard,” said Synopsys’ Pateras. “The working group that’s been working on predictive maintenance, which is being rolled into the third edition of the standard, really talks about monitoring and resiliency. It’s being able to take silicon data, monitor it, and use that as a way to predict failure. The industry is moving toward that approach, where you need to actively monitor the silicon, as opposed to just building in inherent resiliency through other means. Functional safety will always be there, and people will look at using various techniques. But monitoring will be a key tool in the box to manage that resiliency.”
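One generic way to turn monitored silicon data into a failure prediction is to fit a trend to a degradation metric and extrapolate to where it crosses a failure threshold. The sketch below uses an ordinary least-squares line; the metric (a critical-path delay in picoseconds), the readings, and the 560 ps threshold are hypothetical, and the method is a simple illustration rather than one taken from ISO 26262. Practical RUL models are often non-linear and must account for the environmental factors described in the IEEE white paper.

```python
from typing import Optional, Sequence


def predict_crossing(times: Sequence[float], values: Sequence[float],
                     threshold: float) -> Optional[float]:
    """Fit values = a + b*t by least squares; return the time the fit reaches threshold.

    Returns None if the metric is not trending upward toward the threshold.
    """
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired readings")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("readings must span more than one time point")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


# Hypothetical monitor data: critical-path delay (ps), read once a year.
years = [0.0, 1.0, 2.0, 3.0]
delay_ps = [500.0, 505.0, 510.0, 515.0]
crossing = predict_crossing(years, delay_ps, threshold=560.0)
if crossing is not None:
    print(f"Threshold reached around year {crossing:.1f}; "
          f"remaining useful life about {crossing - years[-1]:.1f} years")
```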

Security issues
The impact of accelerated aging and high ambient temperatures goes well beyond a single circuit. In automotive, security and safety can overlap in unique ways.

“There was a paper a couple of years ago at GOMACTech (Government Microcircuit Applications & Critical Technology Conference), where they put a PUF (physically unclonable function) into the programmable fabric of an FPGA, then powered up the FPGA over voltage, over temperature,” said Scott Best, senior technical director for silicon IP product management at Rambus. “They basically put it into an oven and did a rapid aging experiment. Then they put the PUF back in the fabric and could not recover the original key material because of the aging of the logic fabric.”
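A PUF (physically unclonable function) derives a key from device-specific physical variation, so the experiment Best describes comes down to how many response bits change after aging. A common metric is the fractional Hamming distance between the enrolled response and a later reading; if the number of flipped bits exceeds what the key-recovery error correction can fix, the original key is lost. The 128-bit response, 15% flip probability, and 10-bit correction budget below are hypothetical.

```python
import random
from typing import List


def hamming_fraction(a: List[int], b: List[int]) -> float:
    """Fraction of bit positions where two equal-length PUF responses differ."""
    if len(a) != len(b):
        raise ValueError("responses must be the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def key_recoverable(enrolled: List[int], aged: List[int], correctable_bits: int) -> bool:
    """True if the bit errors fit within a (hypothetical) error-correction budget."""
    errors = round(hamming_fraction(enrolled, aged) * len(enrolled))
    return errors <= correctable_bits


rng = random.Random(0)
enrolled = [rng.randint(0, 1) for _ in range(128)]
# Simulate aging by flipping each bit with a hypothetical 15% probability.
aged = [bit ^ (rng.random() < 0.15) for bit in enrolled]
print(f"Fractional Hamming distance: {hamming_fraction(enrolled, aged):.2f}")
print("Key recoverable:", key_recoverable(enrolled, aged, correctable_bits=10))
```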

One of the most common methods of blocking cyberattacks in the past has been obfuscation, which is essentially adding noise into a device to make it harder to pinpoint how a chip is working. The problem is that AI algorithms can identify noise that a human cannot, and once identified, that noise can easily be filtered out.

“I was at a meeting presenting power analysis side-channel countermeasures to a customer,” Best said. “They built a noise circuit, and it was just blasting a lot of noise onto the power supply. So if you measure the power supply, you’re going to be overwhelmed by this random noise and now you can’t see the signature of the crypto operation. Our response was that if you took two measurements and subtracted them, their random signal would go away. A human cannot look at this, but with some tools and 1,000 scope traces, it’s just doing math. It doesn’t get confused by the numbers or the images. ‘There’s a signal, there’s noise, and by the way, here’s your key value.’”
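The “it’s just doing math” point is easiest to see with trace averaging, a standard side-channel analysis technique, though not necessarily the exact method Best’s team used. When N aligned traces are averaged, noise that is independent from trace to trace shrinks roughly as 1/√N, while the repeatable signal of the crypto operation stays put. The signal shape, noise level, and trace count below are synthetic.

```python
import math
import random

rng = random.Random(1)
N_SAMPLES = 50
N_TRACES = 1000
NOISE_STD = 5.0  # synthetic noise, 5x the signal amplitude below

# A small, repeatable "leakage" signal: a bump in the middle of the trace.
signal = [1.0 if 20 <= i < 30 else 0.0 for i in range(N_SAMPLES)]


def capture_trace():
    """One synthetic power trace: the fixed signal plus independent Gaussian noise."""
    return [s + rng.gauss(0.0, NOISE_STD) for s in signal]


def average(traces):
    """Point-wise mean of equal-length traces."""
    n = len(traces)
    return [sum(col) / n for col in zip(*traces)]


avg = average([capture_trace() for _ in range(N_TRACES)])
# Root-mean-square of what remains after removing the known signal.
residual_rms = math.sqrt(sum((a - s) ** 2 for a, s in zip(avg, signal)) / N_SAMPLES)
print(f"single-trace noise std: {NOISE_STD}, residual after averaging: {residual_rms:.3f}")
print(f"1/sqrt(N) prediction: {NOISE_STD / math.sqrt(N_TRACES):.3f}")
```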

Conclusion
It’s not clear yet whether increasing the heat in a circuit will make those signals even more visible as dielectrics break down, but it’s certainly a topic for future discussion. The bottom line is that thermal is a problem for all circuits, but when compounded with hotter-than-expected ambient temperatures, thermal-related aging accelerates. That triggers a whole bunch of challenges that many automakers never anticipated, and a much greater need for either raising the maximum sustained operating temperature of automotive chips or figuring out better ways to monitor electronics in a vehicle to determine when they should be replaced and how to cool them.

These challenges only worsen as utilization rates increase with increased autonomy and as automakers use more advanced-node chips and chiplets, with thinner substrates, wires, and dielectrics. Safety and security, when combined with circuit aging, require a fine balance in automotive applications. But with 5nm and 3nm dies running advanced algorithms in places like Phoenix, Arizona, that balance becomes even harder to achieve.

References

  1. Weather Underground maximum daily temperatures in 2024 taken from daily reports at Phoenix Sky Harbor International Airport.
  2. Patrick-Emil Zörner, CC BY-SA 3.0, via Wikimedia Commons.

