Why Improving Auto Chip Reliability Is So Hard

Aging, adaptation, and new processes and technology require big changes on every level.


Tools and ecosystems that focus on reliability and the long-term health of chips are starting to coalesce for the automotive electronics industry. Data gleaned from a chip’s lifecycle — design, verification, test, manufacturing, and in-field operation — will become key to achieving the longevity, reliability, functional safety, and security of newer generations of automobiles.

Having sufficient amounts of good data in one platform means automotive companies and their supply chain can analyze it and act upon it, potentially predicting failures early enough to make changes in design, manufacturing, and in-field operations.

“It’s an entire ecosystem. Many things need to work together to create a positive feedback loop with respect to reliability and silicon design,” said Fadi Maamari, group director for R&D in Synopsys’ Digital Design Group.

This is harder than it looks. Some of these ecosystems consist of separate entities working together. They can include tools or platforms from multiple vendors, or ones developed in-house, to pull data from a variety of tools. In some cases, this involves collecting data from equipment that previously was not considered essential, because most of the chips used in automotive applications were actuators, MCUs, or programmable devices.

Much has changed with the progression toward autonomous driving, and data now needs to be harvested, correlated, and structured so it can be assessed and acted upon in the verification, test, yield, and field-monitoring processes. Data needs to be cleaned, converted into consistent formats, and analyzed. And while these steps in a feedback loop are not entirely new for some markets, they represent a significant shift for the car industry, which for many years has worked at a relatively slow, manual pace. The challenge now is to modernize the entire supply chain, from design through manufacturing and beyond, and to provide much more detailed information from more sources more quickly than ever before.

“OEMs are looking for scalable ways to support the growing performance envelope of advanced software technology, and in parallel maintain electronic reliability in advanced nodes. But what they need is data to know when and how to fail gracefully, and balance availability with safety in their fail-safe mechanism,” said Gal Carmel, general manager of proteanTecs’ Automotive Division. “It’s a tradeoff, and perfection does not exist today. But we are working hard to change that equation by obtaining accurate visibility on these systems’ health.”

Not all of this will happen at once, of course. “This is a phased delivery,” said Randy Fish, director of marketing for Silicon Lifecycle Management at Synopsys. “We’re delivering on some content today.”

The rest will take time, as this requires retrofitting a massive global supply chain, and it will impact every stage, from procurement of raw materials to final test and beyond. “Given the breadth and depth of technology and communication standards that are in an autonomous vehicle, there’s not going to be one or two companies that come together with a holistic end-to-end test solution for cars,” said Jeff Phillips, go-to-market lead for transportation at National Instruments. “It’s going to be different people in the ecosystem working together, collaborating on how to interoperate and integrate our solutions. There will be strengths from manufacturing and test analytics applied to radar, lidar, and I/O, using cloud and infrastructure processing. But there also are a lot of different technology vectors that have to come together.”

The AV, EV push
Some automotive chips are getting more complex and delicate. This is especially true of the central logic in a vehicle, which will need to manage all other systems to keep that vehicle out of trouble. Current designs use 7nm and 5nm logic, which is at the leading edge of chip manufacturing, but these devices will have to withstand a harsh environment and work correctly at least 10 years longer than consumer chips.

Although mature nodes are still a mainstay, automotive chips have grown into large and complex SoCs. These include advanced-node devices integrated into heterogeneous packages and configurations that are relatively untried. In the past, electronic control units typically had a single processor or memory unit. That’s no longer the case, and everything from verification to various types of testing, including compliance testing, has become more rigorous.

Part of this is due to more driver assistance and the progression toward full autonomy. “Self-driving cars and latest nodes are shaping a lot of what is being done right now,” said Synopsys’ Maamari.

Part of it also is due to better efficiency and reliability. “The second mega trend is electrification with emission control,” said Uzi Baruch, CSO at proteanTecs. “There’s so much that they can do in parallel when introducing a new vehicle to the market that needs to be both fully autonomous and fully electrified.”

Functional safety vs. reliability
A given in any discussion of automotive electronics is the tight relationship between reliability and functional safety. Functional safety focuses on avoiding injuries, whereas reliability is about whether the car works and does not need repair. But with increasing amounts of autonomy, there is plenty of overlap.

“What happens if a rock hits the sensor? In addition to reliability on its own, we have to look at functional safety for self-driving cars, and the standard driving some of this activity is ISO 26262. It’s at the heart of a lot of things we do at the design stage,” said Maamari. “It’s okay for the chip to fail as long as it fails safely. That’s the pure focus of functional safety. If in a self-driving car, whether a chip fails or lightning strikes, it’s critical that the car does not crash. No injury is caused. Reliability, of course, is important. You’d rather have the chip not fail in the first place. So that’s desirable both for functional safety, but also for quality.”

Understanding failure is essential for the auto industry. “There are many opportunities for failure,” said Baruch. “Getting to the point where you can repeat the process in a controlled manner and find root-cause issues inside those production lines, or between plants or different suppliers, is creating a significant challenge for reliability. Whatever you’re producing needs to be repeatable, and you need to be able to trust it. That’s causing a significant shift in how these companies are operating.”

Reliability requirements for automotive electronics have been defined and graded by the Automotive Electronics Council (AEC), with AEC-Q100 (for integrated circuits) and AEC-Q200 (for passive components) the go-to stress-test qualification standards. Heat, humidity, and vibration are all risk factors that can destroy a chip, but materials, design, and manufacturing processes also can make chips more or less susceptible to those risk factors. This gets complicated, and the details matter.

“Judicious use of thermomechanical modeling is needed throughout the development and qualification process,” writes Amkor’s R. Dias et al., in the 2019 research paper, “Challenges and Approaches to Developing Automotive Grade 1/0 FCBGA Package Capability.” “Polymeric materials undergo permanent changes when subjected to high temperatures for extended periods of time. Depending on the ambience, this may include material oxidation as well as mechanical property changes resulting in embrittlement. The presence of humidity can also lead to loss of adhesion at the die passivation and substrate solder mask interfaces.”

Beyond materials science, random errors such as an alpha particle striking a critical component can cause reliability issues. And then there is the issue of software reliability.

“Software is challenging because it doesn’t follow any rules of physics,” said Dennis Ciplickas, vice president of advanced solutions at PDF Solutions. “Hardware sounds hard, but it actually follows some boundary conditions. With software, you can change one thing and have massive unintended costs.”

One solution is redundancy, but that adds both expense and weight. Redundancy is the usual way to achieve reliability in aerospace, but in automotive, redundancy needs to be limited to specific systems within a vehicle.

“I come from an aviation background and redundancy is the way that we normally take care of reliability issues — three flight computers voting on who’s right. But we don’t have that luxury in an automobile, where we’re trying to save a nickel,” said Jay Rathert, senior director of strategic collaborations at KLA.

In a car, redundancy is balanced against cost. “With a larger SoC, redundancies come in many forms,” said Maamari. The techniques are “at the higher level literally duplicating some CPUs or some blocks that have a core function. You duplicate them, check the output, and make sure that the two get the same output, and then flag an issue if either of the two shows something that is not consistent. That’s quite expensive, so it’s done for core function, and it plays a dual role actually. You can constantly check whether they’re consistent, but it also allows you to do some level of self-test during operation. You can bring one of the two down, do a self-test on it while the system still functions, and then bring it back. That is expensive because you duplicate an entire block or processor.”
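The duplicate-and-compare scheme Maamari describes (often called lockstep execution) can be sketched as a small behavioral model. This is an illustration under stated assumptions, not how any vendor builds it in hardware: `LockstepPair`, `step`, and `run_self_test` are hypothetical names, both copies are modeled as Python functions, and the self-test runs sequentially rather than concurrently with live traffic.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LockstepPair:
    """Hypothetical behavioral model of a duplicated block: two copies of the
    same logic receive the same input and their outputs are compared."""
    primary: Callable[[int], int]
    shadow: Callable[[int], int]
    shadow_offline: bool = False  # True while the shadow copy is out for self-test
    mismatches: List[int] = field(default_factory=list)

    def step(self, x: int) -> int:
        out = self.primary(x)
        if not self.shadow_offline and self.shadow(x) != out:
            # The copies disagree: record the input so a safety
            # mechanism can flag the inconsistency.
            self.mismatches.append(x)
        # While the shadow is offline there is no cross-check, but the
        # system keeps running on the primary copy.
        return out

    def run_self_test(self, vectors: List[int], expected: List[int]) -> bool:
        """Take the shadow copy offline, apply known test vectors, and put it
        back into service only if every response matches."""
        self.shadow_offline = True
        passed = all(self.shadow(v) == e for v, e in zip(vectors, expected))
        self.shadow_offline = not passed  # a failing copy stays offline
        return passed
```

The cost Maamari mentions is visible even in the model: every protected operation is computed twice, which is why this is reserved for selected blocks.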

Other redundancies are more precisely designed in. “Sometimes it comes just at the memory register level, like a flip-flop in logic design, where you identify some registers that play critical functions. Either you replace it by a much more tolerant part, or you put in triple modular redundancy, where you have three of them and it is a vote of the three. So it is fine granularity, coarse granularity, and a variety of other techniques. It’s all a balancing act to keep the cost down.”
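At the register level, triple modular redundancy comes down to a majority vote. The sketch below is a minimal bitwise 2-of-3 voter, assuming each replicated register is represented as a Python integer; the function names are hypothetical. It masks any upset confined to a single copy at each bit position, but if two copies are upset at the same bit, they outvote the good one.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority over three copies of a register: each output
    bit is 1 exactly when at least two copies hold a 1 in that position."""
    return (a & b) | (a & c) | (b & c)


def tmr_upset_mask(a: int, b: int, c: int) -> int:
    """Bits where at least one copy disagrees with the voted value, which a
    design could log to track how often upsets are being masked."""
    voted = tmr_vote(a, b, c)
    return (a ^ voted) | (b ^ voted) | (c ^ voted)


stored = 0b1011_0010
hit = stored ^ 0b0000_1000                       # one copy takes a single-bit upset
print(bin(tmr_vote(stored, hit, stored)))        # 0b10110010
print(bin(tmr_upset_mask(stored, hit, stored)))  # 0b1000
```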

Not all data important for feedback loops
Collecting the useful data from cars and sending it into the design feedback loop is a routine that the industry will get better at in the next 5 to 10 years. “The car company, the chip design company, the semiconductor company is where we need to get much more detailed analytics to get the feedback — the positive feedback loop — so we can make the proper adjustments in the design technology and the libraries that are used,” said Maamari.

One of the key elements to making this work is transporting the least amount of data from the car. That means the car itself must be capable of sorting out the relevant data and events. “Some of it could be on the monitor itself that’s sitting inside the chip attached to a power supply, or to a thermal sensor or to a bus activity type monitor,” said Fish. “You can make a decision based on triggers. Do I want to bother keeping this information or not? Or maybe you only want the address and not the data portion that you’re monitoring. There are lots of local decisions, very local, that can be made.”
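The kind of local, trigger-based decision Fish describes might look like the following sketch. Everything in it is an assumption for illustration. The `Sample` record, the thresholds `V_MIN` and `T_MAX`, and the watched address window are hypothetical values, not figures from Synopsys or any real chip. Most samples are discarded on the chip, and for bus activity only the address is kept, not the data.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical trigger settings, for illustration only.
V_MIN = 0.72                               # volts: report supply droop below this
T_MAX = 125.0                              # degrees C: report readings above this
WATCHED = range(0x4000_0000, 0x4000_1000)  # bus addresses worth reporting


@dataclass
class Sample:
    kind: str           # "voltage", "thermal", or "bus"
    value: float = 0.0  # volts or degrees C for sensor samples
    address: int = 0    # bus address for bus samples
    data: bytes = b""   # bus data portion, never forwarded by this filter


def local_filter(samples: Iterable[Sample]) -> List[dict]:
    """Decide on-chip which samples are worth keeping; everything else is
    dropped before it ever leaves the monitor."""
    kept = []
    for s in samples:
        if s.kind == "voltage" and s.value < V_MIN:
            kept.append({"event": "voltage_droop", "value": s.value})
        elif s.kind == "thermal" and s.value > T_MAX:
            kept.append({"event": "over_temp", "value": s.value})
        elif s.kind == "bus" and s.address in WATCHED:
            # Keep the address only; the data portion is discarded locally.
            kept.append({"event": "bus_access", "address": s.address})
    return kept
```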

Managing all of this data will be critical. “In regard to too much data and how to put it together in a clean way, we’ve developed a notion called a semantic model,” said PDF Solutions’ Ciplickas. “Semantics is different than structure. It’s different than grammar and a schema. A schema is a way that you can relate the keys between all the different sources of data. But when you put some semantics on top of that, although they come from different sources with their own keys, you can see the data are actually very similar or the same to each other. So for different types of tools, or different types of sensors, even though they’re physically different and from different vendors, they’re logically the same thing. By identifying what’s semantically similar across all of the different sources of data, you can much more easily put that into a format to then extract out some useful results. By driving that notion of semantics up and down the chain, it might offer some way of putting everything together for that correlation analysis, or whatever we need to do with it.”
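A semantic layer of the sort Ciplickas describes can be reduced, at its simplest, to a mapping from each source’s own field names to shared concepts, with unit conversion along the way. The sketch below assumes two hypothetical data sources with made-up field names. It illustrates the idea of “physically different, logically the same”; it does not reproduce PDF Solutions’ actual model.

```python
from typing import Callable, Dict, Tuple

# Hypothetical semantic model: each (source, field) pair maps to a shared
# concept plus a converter into that concept's canonical unit.
SEMANTIC_MODEL: Dict[Tuple[str, str], Tuple[str, Callable[[float], float]]] = {
    ("tester_a", "TEMP_C"): ("die_temperature_c", lambda v: v),
    ("sensor_b", "dieTempK"): ("die_temperature_c", lambda v: v - 273.15),
    ("tester_a", "VDD_MV"): ("core_voltage_v", lambda v: v / 1000.0),
    ("sensor_b", "vcore"): ("core_voltage_v", lambda v: v),
}


def to_semantic(source: str, record: Dict[str, float]) -> Dict[str, float]:
    """Translate one source-specific record into shared concepts. Fields with
    no semantic definition yet are left out rather than guessed."""
    out: Dict[str, float] = {}
    for name, value in record.items():
        mapping = SEMANTIC_MODEL.get((source, name))
        if mapping is not None:
            concept, convert = mapping
            out[concept] = convert(value)
    return out
```

After translation, a tester record such as `{"TEMP_C": 85.0, "VDD_MV": 750.0}` and a sensor record such as `{"dieTempK": 358.15, "vcore": 0.75}` share the same keys and units, so they can be joined for correlation analysis.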

The key is understanding what’s important for reliability, and that in itself is a complex task. “There may be 10, 15, or even 100 process steps when you look at the components as they are built across time,” said Baruch. “That idea of being able to ingest data from multiple layers — different layers, different processes, different sensors, different equipment types — and combine them all together, if you’re not coming with a descriptive approach, you will end up either going back to your engineering team every time you need to do something, or not being able actually to fulfill that use case because you’ll get stuck in different areas.”

Put simply, pulling that data from a car is not a data free-for-all. “It’s very important to only gather valuable information,” said Fish. “I know that’s hard to quantify, but you can’t send all the data all the time.” Finding the important data means having analytic capabilities on the car or in the part. “From very simple at wherever you can do analysis effectively, you should do that to basically minimize data that’s being transported eventually back to the cloud.”

Taking data gleaned from a chip’s lifecycle and plugging it back into the feedback loop is the key to reliability and functional safety. “The feedback loop into design comes in many different ways,” said Maamari. “For functional safety, one of the things that we can do is, as you measure aging, we can then conclude at some point in time that the device has aged to the point where it’s not safe anymore. Then we can raise a flag and say this car needs servicing before an injury or failure happens. You can do preventive maintenance to some extent.”
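One way to picture the aging flag Maamari mentions is a slack check on a monitored critical path. In this sketch, the `PathReading` type, the 20ps guard band, and `needs_service` are all hypothetical. The point is that service is requested while the path still meets timing, before the margin is gone.

```python
from dataclasses import dataclass

GUARD_BAND_PS = 20.0  # hypothetical: minimum slack before service is requested


@dataclass
class PathReading:
    path_delay_ps: float    # delay reported by an on-chip path monitor
    clock_period_ps: float  # timing budget the path must meet


def needs_service(r: PathReading, guard_band_ps: float = GUARD_BAND_PS) -> bool:
    """Raise the flag once remaining slack falls inside the guard band, i.e.
    while the path still meets timing but has aged too far to trust."""
    slack_ps = r.clock_period_ps - r.path_delay_ps
    return slack_ps < guard_band_ps
```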

Predictive maintenance is a benefit of the feedback loop. “The capabilities for predictive maintenance are being designed into vehicle architectures today; the key is data. Data can be gathered from a huge number of sources from tiny MCUs running ML algorithms (for example, vibration sensors) to large central compute nodes,” said Tom Conway, director of product management, Automotive and IoT Line of Business, Arm. “That data can be interpreted locally then communicated throughout the network in the car. This vehicle-level data can be aggregated and further interpreted before being transferred to the cloud (vehicle edge processing). Vehicle data can then be aggregated in the cloud into fleet or vehicle model data by car OEMs (or fleet managers, logistics companies, etc.) to enable predictive maintenance, for example: ‘that tiny vibration on the wheel bearing [on aggregate for this vehicle type, in this geography, in this weather] can lead to failure in 1,000 miles’ time. Suggest maintenance on this vehicle within 500 miles.’”
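The tiered flow Conway outlines (interpret locally, aggregate per vehicle, then aggregate per fleet in the cloud) can be sketched as three small reduction steps, each of which shrinks the data before it moves on. The vibration windows, the (model, region) grouping, and the rule of flagging anything above 2x its group average are hypothetical stand-ins. A production system would likely use trained models rather than a fixed ratio, and this sketch does not attempt the mileage-to-failure prediction in the quote.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def edge_rms(window: List[float]) -> float:
    """Tier 1, sensor MCU: reduce a raw vibration window to one RMS value."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def vehicle_summary(vin: str, model: str, region: str,
                    windows: List[List[float]]) -> dict:
    """Tier 2, vehicle edge: keep only the worst RMS seen for this bearing."""
    return {"vin": vin, "group": (model, region),
            "peak_rms": max(edge_rms(w) for w in windows)}


def fleet_flags(summaries: List[dict], ratio: float = 2.0) -> List[str]:
    """Tier 3, cloud: group vehicles by (model, region) and suggest maintenance
    for any vehicle whose peak exceeds `ratio` times its group's average."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for s in summaries:
        groups[s["group"]].append(s["peak_rms"])
    return [s["vin"] for s in summaries
            if s["peak_rms"] > ratio * mean(groups[s["group"]])]
```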

Of course, some of the assumptions from the traditional car days are breaking down, and aging is one of them. The car electronics do not stop aging when the car is parked and turned off. “Even if it’s sitting in a garage, the self-driving cars will never shut down,” Maamari said. “They may be sitting in the garage, but they’re still alive, they still communicate through cellular, they update, they do self-checks in the background. The software updates will happen even if your car’s in the garage.”

Synopsys has a lifecycle management platform that pulls data from process/voltage/temperature (PVT) sensors, design for test (DFT), built-in self-test (BIST) resources, structural and functional monitors, embedded on-chip analysis, and data transport. The goal is to get information from a chip to the location where further analysis, control, and optimization occurs.

“The idea here is that with the sensors — and these tend to be process, voltage, temperature, and aging measurements with respect to path (for example, path delay) — and then the control IP that goes into the chip, to manage all of this,” Maamari said. “You can also use that data with analytics to sort out the outliers. That’s the feedback loop into design that comes in many different ways. For example, we can find the paths that are sensitive and that fail first, and then use this information when we do the design to actually increase the margin, to give us a little bit more slack in those, so that we can have a chip that will work for 15 years instead of 10.”
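The path-sensitivity analysis Maamari describes can be illustrated with a simple margin calculation. This sketch assumes each path’s aging shows up as a steady delay increase in picoseconds per year. Real degradation mechanisms are typically non-linear in time, so the linear model is a simplifying assumption, and `margin_report` is a hypothetical name. It ranks paths by how soon their slack runs out and reports how much extra slack each would need to last a target lifetime, such as 15 years.

```python
import math
from typing import Dict, List, Tuple


def margin_report(paths: Dict[str, Tuple[float, float]],
                  target_years: float) -> List[Tuple[str, float, float]]:
    """`paths` maps a path name to (initial_slack_ps, degradation_ps_per_year),
    assuming linear aging. Returns (name, years_until_slack_is_gone,
    extra_slack_needed_ps) rows, most sensitive path first."""
    rows = []
    for name, (slack_ps, rate) in paths.items():
        years = math.inf if rate <= 0 else slack_ps / rate
        extra_ps = max(0.0, rate * target_years - slack_ps)
        rows.append((name, years, extra_ps))
    rows.sort(key=lambda row: row[1])
    return rows
```

For example, a hypothetical path with 60ps of slack that degrades by 5ps per year runs out of margin after 12 years, so it would need 15ps more slack to reach 15 years.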

proteanTecs, meanwhile, monitors each SoC with deep data using Universal Chip Telemetry (UCT). By compiling cross-stage measurements for each chip, machine learning analytics software develops a baseline for the expected behavior of any specific chip and of the entire fleet. Systems are monitored against that baseline to detect potentially abnormal behavior caused by wear-out, aging, or random failures. The goal is deep data visibility for in-mission preventive actions, including dynamic adaptation and performance optimization. Adaptation is itself one of the main concerns around machine learning, because as these systems adapt, it’s not always clear how that impacts other systems.
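The article does not describe proteanTecs’ algorithms, so the following is only a generic illustration of the baseline idea: learn a chip’s own early-life behavior, then watch for sustained drift away from it. It uses an exponentially weighted moving average (EWMA), a standard technique chosen for this sketch. The baseline length, smoothing factor, and threshold are assumptions, and `drift_alarms` is a hypothetical name.

```python
from statistics import mean, pstdev
from typing import List


def drift_alarms(readings: List[float], baseline_n: int = 10,
                 alpha: float = 0.2, k: float = 4.0) -> List[int]:
    """Return indices where an EWMA of a chip's readings has drifted more than
    k baseline standard deviations away from its own early-life mean."""
    base = readings[:baseline_n]
    mu = mean(base)
    sigma = pstdev(base) or 1e-9  # avoid a zero threshold for a flat baseline
    ewma = mu
    alarms = []
    for i, x in enumerate(readings[baseline_n:], start=baseline_n):
        # Smoothing suppresses one-off spikes, so only sustained drift alarms.
        ewma = alpha * x + (1 - alpha) * ewma
        if abs(ewma - mu) > k * sigma:
            alarms.append(i)
    return alarms
```

The same comparison could be run against a fleet-wide baseline instead of the chip’s own history, which distinguishes a chip that is aging unusually fast from one that is simply different from its peers.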

“Even before the vehicle is deployed to the field, we must ensure all reliability and safety measures have been taken,” said Rafi Spiewak, proteanTecs’ Content Marketing Manager. “By combining UCT data extracted from different stages of the lifecycle and augmenting it with additional data sources, manufacturers along the value chain can improve quality tenfold, preventing quality escapes or ‘walking wounded’. This is obtained through highly advanced outlier detection methods that weed out undetected defects, without affecting good yield. Even during characterization and qualification, performance limits are tuned and optimized to ensure sufficient reliability margins.”
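Catching parts that pass every specification limit yet sit far from their peers is often done with robust statistics, in the spirit of part average testing. The sketch below uses the median and the median absolute deviation (MAD), assuming a single parametric measurement per part. It is a generic technique rather than proteanTecs’ method, and the threshold `k` is an assumption.

```python
from statistics import median
from typing import Dict, List


def robust_outliers(measurements: Dict[str, float], k: float = 6.0) -> List[str]:
    """Flag parts more than k robust sigmas from the population median. The
    robust sigma is 1.4826 * MAD, which estimates the standard deviation
    for normally distributed data while ignoring the outliers themselves."""
    values = list(measurements.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    sigma = 1.4826 * mad
    if sigma == 0:
        return []  # no spread to judge against
    return sorted(part for part, v in measurements.items()
                  if abs(v - med) > k * sigma)
```

With hypothetical leakage readings of about 10µA for seven parts and 14µA for an eighth, all under a notional 20µA spec limit, only the eighth part is flagged. Using the median and MAD rather than the mean and standard deviation keeps the outlier from inflating the limit that is supposed to catch it.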


Fig. 1: A feedback loop that feeds rich data sets from various stages of the product’s design and production into analytics engines that are connected to sensors and monitors embedded in the chips. Source: Synopsys

Ownership of data also makes the feedback loop and lifecycle management a possibility now. “We’re talking about gathering data from deep inside a chip inside your car, and we think the market is addressing that,” said Fish. “Today, data is being gathered from cars and is being shared within the automobile companies. Over time, depending on the region, the legalities, general comfort level of gathering data, and the ability to protect that data as far as confidentiality and different uses that we would have for it, ownership of data will be an enabler for us to do deeper analytics throughout the life of a design. The question comes up when the customer buys the car or maybe leases the car. Are they comfortable with that data being sent out? The markets kind of voted on that and said, ‘Yeah, we are.’ There’s data being sent all over the place all the time.”

Conclusion
One thing is certain — unreliability is not an option. “The car industry has made enormous progress on quality over the last few decades, and customers love it,” Maamari observed. “Nobody will buy a car that’s not reliable today. So reliability remains fundamentally important.”

The ecosystems, platforms, and monitoring tools that help ensure reliability have only just begun to improve. “Some of that technology is already here, and this is not just in the future. We have many of these components already in production in use today. And there are other pieces that we are adding to it quickly, working with customer partners to push the envelope,” he said.

But a variety of options and approaches to feeding data into the design feedback loop will continue to exist. “We believe the solution is a multifaceted approach. There is no one single approach that solves everything,” said KLA’s Rathert. “Well-designed, well-built, low-defectivity devices that are well-fabricated using tight processes are the foundation to build on. But we don’t think that’s the only answer. Nor do we think test by itself is the only answer. The merger of these, along with real-time diagnostics and the capability to look across the supply chain and find weak points from design to the final system — all of these things coming together are the industry’s best hope to create a zero-defect solution. It’s not just one of us.”



