Chasing Reliability In Automotive Electronics

Supply chain changes, resistance to sharing data and technology unknowns add up to continued uncertainty.


Assuring reliability in automotive electronics has set off a scramble across the semiconductor supply chain and unearthed a list of issues for which there is insufficient data, a lack of well-defined standards, and inconsistent levels of expertise.

Reliable functional safety that spans 18 to 20 years of service in harsh environments, or under constant use with autonomous taxis or trucks, is a massive undertaking that will require engineering advances in areas such as artificial intelligence, LiDAR, radar, and vehicle-to-vehicle communication. And it will require management of a global supply chain that is populated by startups, chipmakers with no automotive experience, and automotive suppliers with little experience in advanced electronics.

At this point no one knows exactly how reliable a 7nm AI system will be, or how effectively it will fail over to another system in case of a malfunction. In fact, no one is even sure what the right questions are to ask during testing. Communication among all the suppliers up and down the supply chain has to be clear and open, yet some suppliers protect their IP by withholding important data, leaving car manufacturers to discover some data for themselves. To make matters worse, the rules for pulling all of this together are spotty, at best.

“At this time, there is no generally agreed upon technical strategy for validating the safety of the nonconventional software aspects of these vehicles,” wrote Carnegie Mellon University’s Philip Koopman and Edge Case Research’s Michael Wagner in a paper presented at the 2018 SAE World Congress. “It seems that many HAVs [highly automated vehicles] will be deployed as soon as development teams think their vehicles are ready, and then they will see how things work out on public roads. Even if pilot deployments yield acceptably low mishap rates, there is still the question of whether a limited scale deployment will accurately forecast the safety of much larger scale deployments and accompanying future software updates.”

The lack of governmental regulations on self-driving cars leaves the consumer at the mercy of a competitive, nascent autonomous vehicle (AV) industry. But the companies in that industry have a lot to lose if they fail. That economic threat, combined with the continuing evolution of the ISO 26262 standard, may be the saving grace. ISO 26262 requires tracking all materials and parts at all points in procurement and manufacturing, setting the stage for a culture of safety behavior and cooperation among suppliers. A post-mortem diagnosis of failure looks like an aeronautical investigation. It almost goes without saying that testing and tracking are most expensive for safety-critical systems, although reliability and quality remain important selling points for non-safety-critical systems such as infotainment.

Flying blind at advanced nodes
The only way to really know the lifespan and reliability of advanced nodes is by looking backward. “The biggest problem with advanced nodes is that you need to get reliable data for the stress screening test, and you don’t have them before an advanced node has been in production for a while,” said Gert Jørgensen, vice president of sales and marketing at Delta Microelectronics. “You can use the old methods of simulating lifetime, but you actually don’t know if this model is okay before the time has gone. The tools are there because you impose the old model to the new technology, but you actually don’t know if it is waterproof before the time has gone.”

Confidence in the reliability of parts goes up with time. “When you have years to debug your process, you’re naturally going to have higher reliability,” said Jay Rathert, senior director of strategic collaborations at KLA-Tencor. “But when you’re putting 7nm and 10nm parts in there, those processes still have a lot of maturing to do. There are still a lot of systematic defects and integration challenges that haven’t been debugged yet.”

Most automotive chips aren’t developed at advanced nodes. But the ones that require massive compute power to make split-second safety-critical decisions, such as AI, will require the highest density available. That creates reliability questions that have largely been ignored at advanced nodes because most of the chips developed using those processes are used in consumer devices or controlled environments.

“Newer manufacturing processes generally will produce a greater number of defective parts than older process technologies that have had time to mature,” said Brady Benware, senior marketing director for the Tessent product group at Mentor, a Siemens Business. “The use of the latest process technologies in automotive applications presents two key challenges. This higher defect density means post-manufacturing test must achieve a higher level of defect coverage to achieve the same level of quality. The traditional approach of using abstract logical fault models to generate test sequences that detect defects is no longer sufficient. Achieving automotive-grade quality levels for sophisticated ICs using advanced process nodes requires that test pattern generation be aware of how and where defects can manifest physically, and it must be aware of how those defects behave in an analog sense, not just a digital sense.”

Benware sees more of the defects inside cells. “Prior to finFET process technology, it was common to see about a 50-50 split of defects inside logic cells and defects in the interconnecting wires. With the introduction of finFET, the process complexity to manufacture the transistors and associated logic cells has grown disproportionately as compared to the interconnecting layers. This discrepancy is expected to continue into 5nm, 3nm and below with the introduction of ever-more exotic transistor technologies. Now that automotive ICs will leverage these advanced nodes, more must be done specifically to test for defects inside the cells.”

All automotive electronics—especially safety-critical parts and systems—now undergo rigorous testing during and after manufacturing. The goal is to weed out chips with infant mortality or child-sick devices: the devices that are going to fail early.

“Every device is going through accelerated lifetime, and then you do that for let’s say 168 hours, one full week,” said Jørgensen. “You test the devices, you put them in ovens, accelerate the life and take them out after one week, and then you have simulated one year’s life approximately. Next you put the devices in the car or in the modules, which goes in the car, and they should last for the next 20 years. By [doing this], you get rid of what’s called infant mortalities or child-sick devices.”
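The claim that about a week in an oven stands in for roughly a year in the field can be sanity-checked with the Arrhenius model, which is widely used for temperature-accelerated life tests. The sketch below is illustrative only. The activation energy (0.7 eV) and the use and stress temperatures are assumptions, not figures from Jørgensen or Delta, and real qualification plans select them per failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(ea_ev, t_use_c, t_stress_c):
    """Acceleration factor of a stress temperature relative to a use temperature."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k)
    return math.exp(exponent)


def equivalent_field_hours(stress_hours, ea_ev, t_use_c, t_stress_c):
    """Field hours represented by a given number of hours under stress."""
    return stress_hours * arrhenius_acceleration(ea_ev, t_use_c, t_stress_c)


# Assumed values: Ea = 0.7 eV, 55 C average use temperature, 125 C oven.
af = arrhenius_acceleration(0.7, 55.0, 125.0)
field_years = equivalent_field_hours(168, 0.7, 55.0, 125.0) / 8760
print(f"acceleration factor: {af:.1f}")
print(f"one week (168 h) of stress ~ {field_years:.2f} years in the field")
```

With these assumed inputs, a week of stress lands in the neighborhood of a year of field life, which is consistent with the rough figure in the quote. Changing the activation energy or the temperatures moves the result substantially.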

Step two takes the testing further. “And then you have another part of the batch, which is a lot of the total produced batch that you put in the same chamber,” said Jørgensen, “but it stays there for 1,000 hours. It’s a lot of 1,000 components, and then you accelerate the life and then you see if those 1,000 components can last for three months corresponding to approximately 1,000 hours. And that should then yield 20 years simulated lifetime. So, we have 1,000 devices are passing that point, you conclude that the rest of the devices will also do that. So that’s how you do quality assurance on automotive parts, and that’s why they are so expensive. You have a lot of QA gates to pass before you can put them in the car.”
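How much a zero-failure sample actually demonstrates can be estimated with a simple binomial bound. The sketch below assumes that devices fail independently and with the same probability, which is a simplification. The sample size of 1,000 comes from the quote, while the 90% confidence level and the larger sample are hypothetical values chosen for illustration.

```python
def max_failure_fraction(sample_size, confidence):
    """Upper bound on the true failure fraction after observing zero failures.

    Solves (1 - p) ** sample_size = 1 - confidence for p.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / sample_size)


# 1,000 is the lot size in the quote; 100,000 is a hypothetical comparison.
for n in (1_000, 100_000):
    bound_ppm = max_failure_fraction(n, 0.90) * 1e6
    print(f"n={n}: failure fraction below about {bound_ppm:.0f} ppm at 90% confidence")
```

Under these assumptions, 1,000 clean devices bound the failure fraction only at the level of a couple of thousand parts per million, which suggests why sample testing is combined with screening of every device and many other QA gates.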

One of the issues with reliability is that it is directly proportional to cost. In the design of automotive safety-critical components and systems, each supplier up and down the supply chain has more steps to perform, which adds test time and requires more tests, which in turn drives up the cost. And while strategies are being developed to test more concurrently, the cost continues to rise.

“There’s definitely a lot more attention being focused on the early part of manufacturing and test,” said Anil Bhalla, senior manager at Astronics. “Automotive test is the most complicated and expensive, and right now everyone is trying to sharpen their pencils and figure out how to cut costs. Automotive is driven by a lot of data. It’s very careful and methodical, and it happens across wide temperature ranges. But there’s also a lot of redundancy in the test flow, and the focus is trying to find the right coverage at the right insertion point. This is made more difficult by the fact that for the first time, automotive is getting more leading-edge parts. We’re seeing 7nm parts in automotive, and if you look at where the growth is happening in semiconductors, automotive is one of the largest segments.”

There are two different approaches to solving this problem. One is to utilize system-level test, which is more expensive but allows testing to be done in the context of an actual system. It’s not clear yet whether system-level test will actually increase the overall cost, though, because temperature testing typically requires three different insertion points, while it could be just one with system-level test. The other is to focus on cost first and figure out what may or may not be necessary to test.

“The problem is you can’t do both, because there are too many moving pieces,” said Bhalla. “In a consumer device, you may change out parts every six months. But in automotive, they’re talking about zero defect and parts per trillion. That has to be balanced against who can afford that.”

Fig. 2: ISO 26262 fault reference. Source: Arteris IP

Not all faults are the same, and not all of them are predictable. ISO 26262 recognizes systematic faults, which are the faults we can find and perhaps predict and fix, and random faults, which fall into the category of “stuff happens.”

“Car manufacturers are registering all failures to see if it is a periodic failure or if it is a random failure,” said Delta’s Jørgensen. “You, of course, have fast reporting systems. When we find a failure, we need to determine if it has an effect on the rest of the population, whether this is a random failure.”

So there are quality measurements and methodologies for dealing with that, and procedures for what data needs to be stored. Everything is logged and registered the way it is done with airplanes, and it is supposed to be kept for 15 to 20 years. But even that may not be sufficient.

“While many reliability failures could likely be predicted by monitoring for subtle shifts in performance of the built-in self-tests, I don’t think that predicting failures will ever be 100% accurate,” said Mentor’s Benware. “Many reliability failures will have no indication before they occur. As long as 100% accuracy is not attainable, failure mitigation will be the priority over failure prediction.”

One big, tangled supply chain
The whole automotive supply chain now has to take part in a safety culture to make automotive systems reliable and safe. Reliability is a team effort.

“Everywhere we turn we hear about zero defects,” said KLA’s Rathert. “In the last two years, we’ve pivoted to align our tools, people, methods, partners, to help make that a reality.”

That’s one piece of the puzzle. The other is understanding exactly who plays where in the supply chain.

“You’re certainly seeing players in the game you wouldn’t have seen five years ago,” said Rob Cappel, senior director of marketing at KLA-Tencor. “There are people designing their own chips—Google, Apple, Amazon. That may not just be for cars. They’re looking at artificial intelligence. The ecosystem as we knew it five years ago is changing. With automotive, the ecosystem across the board, from the big players all the way down to the semiconductor fabs, all agree that quality and reliability are key.”

At the same time, those supply chain relationships are becoming more complex, explained Kurt Shuler, vice president of marketing at Arteris IP, in a paper about ISO 26262. “Traditional semiconductor vendors who are making or designing chips to enable autonomous driving applications are nowadays sometimes competing with Tier-1 electronic system designers and OEMs, who may be making their own chips or providing explicit requirements to their semiconductor vendor partners. Additionally, new entrants like Uber, Waymo and Apple are designing their own complete systems, despite their relative lack of experience in the automotive industry. ISO 26262 mandates high levels of collaboration and information sharing throughout the value chain that may be unfamiliar to new entrants.”

Fig. 3: Automotive supply chain. Source: Arteris IP

The ISO 26262 standard is a snapshot of the issues and of the lengths to which the whole supply chain has to go. Collaboration is key. Communication is now part of the safety standards up and down the safety-critical automotive supply chain. It’s built into the standards.

Sharing knowledge of a supplier’s crown jewels, its intellectual property, has to happen among suppliers and auto OEMs. “Participants in the semiconductor and software supply chains are usually secretive about how their IP was developed and how it works in detail,” said Shuler. Suppliers should remember that “your customer still has an obligation to confirm your compliance with ISO 26262.”

This presents some interesting challenges for companies leveraging IP, as well, because IP characterization can vary greatly. “If you want to compete in the marketplace, you need to leverage IP better than before,” said Ranjit Adhikary, vice president of marketing at ClioSoft. “This is why you hear a lot of auto companies talking about IP management. But you also want to make sure that people working on IP don’t see other IP.”

The value of IP goes up as it is certified and tested in silicon in real-world applications. “For us as an IP supplier into these markets, we also go through automotive qualification for our IP into these applications,” said Graham Allen, senior product marketing manager for DDR PHYs at Synopsys. “So when vendors purchase automotive-grade IP they know they are going to get IP that, once they get an automotive grade certification for their chip, their IP won’t cause them any problems in that regard.”

Carmakers verify and validate the parts for themselves, as well. “We actually take everybody’s designs apart and put them back together at Ford,” said Keith Hodgson, a senior reliability engineer at Ford Motor Co. “We go through a worst-case circuit analysis process where we actually help them redesign for our actual auto customer usage and mission profile.”

The data Ford wants from IC designers and manufacturers is how long a chip will actually function properly using worst-case temperatures and shock for the 99th percentile customer. “And then have [IC manufacturers] hopefully share the data with us so we get estimates on how long you think it will go for our worst-case customer and then try to have a way to mitigate that before it fails,” Hodgson said.

Failure is inevitable at some point, but what to do about it opens up all sorts of options. “At Ford, we assume that parts are going to fail, so we’re trying to mitigate failure with prognostics—a prognostics approach where we want the integrated circuit manufacturers to help us understand the models for degradation so we can build it into our software that runs the chip and do an estimate on what we think is useful life. Then, autonomous is easy. Just have the vehicle drive itself home and change modules that are on the edge of failure. It is the customers who would ignore a wrench light.”
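In its simplest form, a prognostics estimate of the kind Hodgson describes could extrapolate a measured degradation trend to a failure threshold. The sketch below assumes a linear drift model and uses invented numbers. The function name, the monitored parameter and the threshold are all hypothetical, and real degradation models would have to come from the IC manufacturer.

```python
def remaining_useful_life(hours, readings, failure_threshold):
    """Extrapolate a linear degradation trend to a failure threshold.

    Returns the estimated hours left after the last measurement, or None
    if the readings show no upward drift.
    """
    n = len(hours)
    if n < 2 or n != len(readings):
        raise ValueError("need at least two paired measurements")
    mean_t = sum(hours) / n
    mean_y = sum(readings) / n
    sxx = sum((t - mean_t) ** 2 for t in hours)
    if sxx == 0:
        raise ValueError("measurement times must differ")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(hours, readings))
    slope = sxy / sxx  # least-squares drift per operating hour
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_t
    crossing_time = (failure_threshold - intercept) / slope
    return max(0.0, crossing_time - hours[-1])


# Invented example: parameter drift in percent, logged every 1,000 operating hours.
hours = [0, 1000, 2000, 3000]
drift_percent = [0.0, 1.0, 2.0, 3.0]
print(remaining_useful_life(hours, drift_percent, failure_threshold=10.0))
```

A vehicle running an estimate like this could schedule a module swap once the remaining life drops below a chosen margin, rather than relying on the driver to react to a warning light.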

Automakers are looking to the chip industry for detailed data on degradation models, much of which doesn’t exist yet.

“One thing that drives us a little crazy with the semiconductor guys is that they use surprisingly sophisticated tools at these smaller process nodes to predict reliability of transistors,” said Craig Hillman, CEO and managing partner of DfR Solutions. “But then when their users ask about reliability, they say, ‘Well, it’s just 0.7 eV and constant failure rate, 77 devices, no failures.’”

DfR isn’t alone. “We’re in talks with a number of German carmakers, and they basically have the same issue,” said Roland Jancke, who heads the Design Methodology department in Fraunhofer’s Engineering of Adaptive Systems Division. “They do not get enough information from what the technology is capable of. If you are thinking about the latest technologies (10, 12, 7nm), then the issue is that they don’t get enough information.”

In the past, the supply chain used a waterfall model, where the OEM would give a spec to a Tier 1 supplier, and then they would decide which Tier 2 player to involve, and so on down to Tier 3 and Tier 4.

“This model is no longer working,” said Jancke. “It’s too slow and there’s really not enough information given along the line. This is a rather long line—a value chain of sorts—and there is some information that is not handed over the whole chain. Therefore, the carmakers don’t have the complete picture of what they’re getting when they involve certain technology in their cars. We hear from numerous OEMs that they are starting to break up this value chain. They are starting to directly contact the technology providers and the foundries because they want to know what the technology is really capable of, especially in the advanced technologies. And they want to know what foundries are testing, what they are doing to make sure that the technology will last over 20 years or whatever the requirements are.”

He noted that foundries are particularly interested in this because it provides a two-way information flow. “What one carmaker told me is they cannot penetrate the Tier 1 layer. They only pass the information that is minimally necessary and not all the information. The reason is there are business relations between the OEM and Tier 1. There are legal reasons for that. There are a number of reasons why they don’t divulge all the information, but from the OEM to the technology provider to the foundry, there is no direct legal connection. Therefore, they can talk on a business level, but not on a technical level with each other. The foundries are interested because they want to know what the end user—the OEM—is really doing with these chips. What are the application conditions that the chips are supposed to work in?”

Other reliability issues
The automotive supply chain reaches deeper and farther than just chips and IP. It also includes materials that are used to create chips in the first place, as well as the materials that are left when the manufacturing processes are complete.

“Reliability starts with the supply source and the engagement with the supplier,” said Terry Brewer, CEO of Brewer Science. “They have to bring in a certain level of quality and capability. So you need to look at the source, their relationship with the supplier, and the expectations of the supplier. In the old days, we never had in-person relationships with suppliers. Now we do, because you have to run hundreds of tests on materials. We construct new materials from scratch, so we need monitors and pre-monitors. If we don’t do all of that, there’s no chance we will get the reliability we need. If you look at More Than Moore, materials are synonymous with reliability, and the analyses are more sophisticated and complex.”

Brewer said that because tolerances are shrinking at advanced nodes, and in systems where there are more electronic components than before, customers are demanding even lower defect rates than in the past. “5 parts per billion probably exists somewhere, but with 5 parts per trillion, we’re not sure that even exists in reality. We’ve moved from Newton to quantum with 5 parts per trillion. And it’s not just the chip. It’s the system integration, which can be both the savior and the challenge.”

Some of this can be spread out over a system. “With a system, you can modify reliability,” he said. “So you might be able to lower resolution and still get the same performance out of a computer. So if you’re supplying to a chipmaker, they may want parts per trillion. If you’re supplying to a system vendor, they can be more comfortable with a lower number.”
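The pressure on per-part defect rates follows from simple probability: the more parts a system contains, the more likely it is that at least one of them is defective. A minimal sketch of that arithmetic, assuming independent defects and a hypothetical part count per vehicle (neither assumption comes from Brewer):

```python
import math


def prob_any_defect(per_part_rate, parts_per_system):
    """Probability that a system contains at least one defective part."""
    # log1p/expm1 keep precision when per_part_rate is tiny (ppb or ppt).
    return -math.expm1(parts_per_system * math.log1p(-per_part_rate))


PARTS_PER_VEHICLE = 5000  # hypothetical count of electronic components

for label, rate in (("1 ppm", 1e-6), ("5 ppb", 5e-9), ("5 ppt", 5e-12)):
    p = prob_any_defect(rate, PARTS_PER_VEHICLE)
    print(f"{label}: about 1 in {1 / p:,.0f} vehicles contains a defective part")
```

The same function shows the trade-off in the quote: a system that tolerates or masks a defective part can accept a higher per-part rate than one in which any single defect is a system failure.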

All of this has a big impact on reliability of automotive electronics. But it also raises some interesting questions about reliability in the first place.

“The key issues are whether you can make it more reliable, and whether you can tolerate it being less reliable,” said Sanjay Natarajan, corporate vice president at Applied Materials. “And when is it not reliable enough? There’s the how do you make it reliable, and even there we can cut it up into the traditional von Neumann AI, where we live today. There you’re talking about good reliability with a digital device, and you still want to scale. This all breaks if you don’t have more transistors and more power efficiency every couple years. If you don’t have that, you’re stuck. So you’re really talking about how to make things reliable in the presence of making them smaller and run cooler. That has its own massive collection of challenges. And then, say we go to a more brain-inspired approach. Then you can no longer hide the variation behind digital, which is what we do today. All our transistors have variation today, but that variation is all smaller than the clock speed, for example. So transistor A might switch fast, transistor B might switch slow, but as long as all of them finish switching within one clock period no one notices that variation. The digital world has hidden that variation. Now, if you want to go analog, it’s more energy efficient. But then you have to get the variation under control. Part of what we’re working on is, because you can’t hide the variation, now you have to eliminate or minimize the variation. And that’s where some of these integrated materials solutions come into play.”

There are two issues that are creating reliability concerns on the automotive side. One is soft errors in the electronics. The other is the more classic type of fault.

“With soft errors, the challenge we’re facing is how to isolate or reboot part of a chip,” said Kurt Shuler, vice president of marketing at Arteris IP. “With transient IP, you want to turn off part of the interconnect, flush the data, isolate it, turn it off and reboot and sync that back up. With a permanent fault, you want to isolate that but not reboot it. You want to make sure you can get to the shop using degraded or emergency mode.”
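The two recovery paths Shuler describes can be sketched as a small dispatch routine. This is purely illustrative: the `FaultKind` enum, the `Block` class and the step names are hypothetical and do not represent Arteris IP's interconnect or any real API.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class FaultKind(Enum):
    TRANSIENT = auto()  # soft error, for example a bit flip
    PERMANENT = auto()  # hard fault that a reboot will not clear


@dataclass
class Block:
    """Hypothetical stand-in for one region of an on-chip interconnect."""
    name: str
    log: list = field(default_factory=list)
    degraded: bool = False

    def step(self, action):
        # A real system would drive hardware here; the sketch only records the step.
        self.log.append(action)


def handle_fault(block, kind):
    """Apply the recovery sequence for one faulty block and return the steps taken."""
    if kind is FaultKind.TRANSIENT:
        # Transient fault: clean out the block, restart it and bring it back in sync.
        for action in ("flush", "isolate", "power_off", "reboot", "resync"):
            block.step(action)
    else:
        # Permanent fault: isolate without rebooting and limp to the shop.
        block.step("isolate")
        block.degraded = True
        block.step("enter_degraded_mode")
    return block.log


print(handle_fault(Block("noc_segment"), FaultKind.TRANSIENT))
print(handle_fault(Block("noc_segment"), FaultKind.PERMANENT))
```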

Shuler noted that at present, packaging issues are causing more problems than the silicon in regard to aging and reliability. (Those are standard plastic or ceramic packages, rather than advanced packaging approaches.)

What consumers want
Reliability in vehicles means users can depend on them to work when called upon, without performance issues or the need for repair. Over time, how often the vehicle doesn’t work as advertised or is completely out of commission will give users a feel for its overall dependability.

“There are a lot of differences in assumptions on the algorithms to drive a car,” said Jeff Phillips, head of automotive marketing at National Instruments. “Some want a smooth ride. Others are focused on 100% safety. There are a lot of decisions that need to be made in the algorithms themselves, independent of the supply chain. Across all of this, reliability will be a differentiator.”

“If you buy a car, you generally go by brand affinity and perception of car reliability,” said David Hall, principal product marketing manager at National Instruments. “The problem is that with electrification, reliability is an unknown. There isn’t 10 years worth of data for a [Chevy] Volt or a Tesla. And there’s another side to this, which is the service level (Uber, Lyft and others). That will be judged more by user experience and reliability of service than the car itself.”

Hall noted that code running on sensor fusion devices will change over time, as well.

“There are chip-level reliability concerns, too,” he said. “Most of the problem areas are electrical, where they use parts that were not designed for an automobile. A lot of what’s happening today is people are designing for a scenario rather than designing parts for automotive. That will change as the industry standardizes on ISO 26262 for ADAS across all auto models. That will speed everything up and help us get to full autonomy in hardware, but we also need this in the software development process. This is happening in places like Korea, where anytime there is an accident, they fill in a standardized form that is used to change sensor fusion algorithms. This is heavily mandated in Asia. In the U.S., though, car manufacturers don’t have to share crash data.”

In general, automotive reliability is improving. J.D. Power found that reliability rose 9% in three-year-old models (2015 models, with owners surveyed in late 2017). Three years may be a good point for an initial assessment of a car’s general dependability, but the expected lifetime of a car is 20 years, with zero defects in safety systems. Overall, cars now average 142 problems experienced per 100 vehicles (PP100). The most reliable vehicles in the study hit 99 to 100 PP100.

“In-vehicle technology continues to be most problematic,” J.D. Power reported. “Audio/Communications/Entertainment/Navigation (ACEN) remains a troublesome category for vehicle owners, receiving the highest frequency of complaints. The two most common problems relate to built-in voice recognition (9.3 PP100) and built-in Bluetooth connectivity (7.7 PP100).”

While this is good news for the auto industry, there’s still a very long way to go.

Related Stories:
Making Autonomous Vehicles Safer
What needs to be tested, and what’s the best way to make that happen?
AV Testing Advances Without Standards
While U.S. struggles to make rules for self-driving cars, industry works on streamlining validation.
Auto Chip Test Getting Harder
Each new level of assistance and autonomy adds new requirements and problems, some of which don’t have viable solutions today.
Who’s Paying For Auto Chip Test?
Complexity, advanced nodes, harsh conditions and safety concerns will make testing more time-consuming and expensive.
