Automakers Changing Tactics On Reliability

Focus shifts to more data-centric approaches as chip content increases.


Automakers are beginning to rethink how to ensure automotive electronics will remain reliable over their projected lifetimes, focusing their efforts on redundancy, more data-centric architectures and continued testing throughout the life of a vehicle.

It is still too early to really know how automotive chips actually will perform over the next 15 to 20 years, especially AI logic developed at leading-edge nodes. TSMC’s volume production of 7nm chips only started in April 2018, and none of those chips has been used under the intensive road conditions that automotive chips will have to endure. In fact, TSMC didn’t announce volume production of 28nm chips until October 2011. So if cars will be using the latest process nodes for autonomous driving in 10 years, they may be using transistor structures for which there is no history at all.

Moreover, they will be running applications, such as AI and 5G, both of which are brand new. And they will be using those applications to guide vehicles through a litany of unexpected hazards, which is why most automakers are focusing on assisted rather than autonomous technology. But the reality is many of these systems will have autonomous capabilities built-in, even if they aren’t used right away, as automakers assess how these systems behave under extreme stress.

5G communication, in particular, adds a whole new set of questions for which there are few good answers.

“What people are finding is that it is a lot more difficult to do than a typical digital baseband,” said Kurt Shuler, vice president of marketing at Arteris IP. “There is a lot more processing involved and the SoC architectures are a lot more complex. The industry is having to deal with this given how we’ve done things with the GSM and CDMA modems that we were used to. Here’s all this additional stuff we have to deal with for 5G.”

Most consumers will experience 5G as enhanced mobile broadband, typically using sub-6 GHz 5G, which spans from 450 MHz to 6 GHz. The millimeter wave bands (24 to 86 GHz) offer the fastest speed but are problematic, especially if anything is in motion. Rain, trees, walls and other cars all interfere with mmWave signals. A mmWave antenna would need to be outside the car because any part of the car, including the glass, would block the signals.

“What we’ve seen is a lot of nervousness around trying to implement a mobility test case using millimeter wave, that is, being able to follow the user as it moves, and then execute a handoff from base station to base station, like you would do with a 1 GHz cellular signal,” said Alejandro Buritica, senior solutions marketing manager at National Instruments. “So we know that up to the current 4G bands or frequencies, that’s been done, it’s proven, no problem with mobility at those frequencies. But when you’re trying to implement some of the technologies that 5G requires, like massive MIMO, and trying to track multiple users and do that at fast speeds at millimeter wave, then it becomes a really, really difficult problem.”

Processing goes up significantly. “You would have to re-compute the channel state eight times per millisecond,” Buritica said. “That becomes very computationally intensive and very, very difficult. The idea is to have those high-bandwidth channels exchange data with the vehicle once it’s static, such as at a stoplight. So you can have small cells that allow for a quick exchange of a lot of data between the vehicle and the infrastructure, but the car has to be completely stopped. The problem is that as vehicles become more autonomous, maybe in a few decades, we will not see stoplights anymore. The car will just negotiate crossing the intersection with other autonomous vehicles and you won’t have to stop. There are a number of conflicting ideas there. But what they know is that implementing mobility at millimeter wave is currently a really difficult problem.”

In fact, there are doubts about whether it is feasible at all. “It’s a much higher frequency and the range is shorter,” said Shuler. “In a phone or a car, you want to reduce the amount of power for transmission. The good thing is if you do beam steering and steer to the next node in the 5G network, you can save a lot of power, rather than doing something omnidirectional. But you need to have a lot of processing to figure out where that is. And you have to have some kind of predictive analytics, too. For our customers it has been new and exciting, and a lot of these folks that say they are coming from a traditional wireless background are trying to figure it out.”

Despite all of this, there is general agreement that some version of high-speed communication will be required. David Fritz, senior autonomous vehicle SoC leader at Mentor, a Siemens Business, said that 5G will be a tipping point for assisted and autonomous driving, but most likely in the sub-6 GHz range. “We’re worried about keeping decision-making in zones, so you may have a left front, right front, left rear, right rear. That allows these zones to react incredibly fast to information, such as when a light turns green and an ambulance is coming, no cars should be moving. The car knows this because of the 5G infrastructure. You can think of this as one extra human sense. But you probably don’t need millimeter wave for this. When two cars exchange information, it can involve a minimal amount of data.”

Less data transfer allows for quicker processing and lower latency, and it uses less power. In effect, this is like SMS between vehicles.

But how it will hold up over time isn’t clear. “When it comes to failure rates and design for life, we are just beginning, along with many of the other challenges with autonomous driving and ride sharing, to look at what it means to have people drive cars more than an hour and a half a day,” said Lance Williams, vice president for automotive strategy at ON Semiconductor. “This could be a ‘driving 22 hours a day’ type of scenario.”

Testing all the time
Ask two people how devices will hold up under this kind of strain and you are likely to get at least two different answers. But everyone agrees these systems need to be monitored throughout their lifetimes.

“Having parts last for 18 years is possible, but you cannot say there will be no failures,” said Evelyn Landman, CTO at proteanTecs. “What’s needed is the ability to send out alerts with enough time to be able to take action, so there is less damage and controlled RMAs. From a DPPM (defective parts per million) or DPPB point of view, you’re still going to have some defects that are so minor, they’re hard to see. You will have leakage current, electromigration and NBTI (negative-bias temperature instability). In the field, when these devices are in use, aging will take its toll and performance degradation will cause device wearout.”

Understanding their impact requires sufficient coverage in a device, and one important aspect of that is in-field and in-circuit monitoring.

“In order for this solution to work in safety critical applications, you need to go for higher coverage than in the past,” said Landman. “With increasingly autonomous driving, you need to track vast amounts of data at every point in time. You need to make sure the hardware does not fail on the road. And predicting problems is the key. If you can detect issues before they develop into system failures, you can proactively take action and avoid damages. You can gain that visibility by applying on-chip monitoring combined with AI-based analytics.”

Machine learning has a big impact in this area because it can find things in data that people cannot.

“We really need machine learning,” said Tomasz Brozek, a technical fellow at PDF Solutions. “We’ve been putting structures on silicon that provide data about quality and manufacturing. Those structures monitor degradation rates about drift in manufacturing. They also can monitor process windows for things like contact/gate weaknesses and a breakdown in leakage. That can be done on every wafer and scribe line.”

He said the key in automotive is identifying weak structures. “If there is not good coverage, you get leakages that are typically at the noise level. You need to design these devices for inspection, and that includes sub-micron test structures. This has no impact on area or power, because they’re generally embedded in a dark area of a chip, which most designs have. They also can be put into gray areas between blocks. They don’t participate in the operation of the chip, but this type of test structure can tell if a chip is at risk. Then you close the loop by collecting data from manufacturing, from test, and from every die.”

An important aspect of all of this is the ability to loop back data into the manufacturing process, so that defects can be analyzed and potentially fixed in future generations of products. Current data analysis has become sophisticated enough to trace problems back to a particular wafer in a particular manufacturing lot, and with machine learning that wafer can be compared to other wafers from the same day or different days to determine if something different happened with that particular chip or wafer. If not, it might be a more generalized problem, or it might be a random defect that will never crop up again.
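As a rough illustration of the wafer-to-wafer comparison described above, the sketch below flags wafers whose monitor reading sits far from the rest of the lot. It swaps in a simple robust statistic (a modified z-score based on the median absolute deviation) for the machine learning the text refers to. The wafer IDs, the leakage values and the 3.5 threshold are all assumptions made for the example, not figures from any vendor quoted here.

```python
from statistics import median


def flag_outlier_wafers(reading_by_wafer, threshold=3.5):
    """Return wafer IDs whose reading deviates strongly from the lot median.

    reading_by_wafer: dict of hypothetical wafer ID -> monitor reading
    (for example, normalized leakage from a scribe-line test structure).
    threshold: cutoff on the modified z-score; 3.5 is a common rule of thumb
    and is an assumption here, not a value from the article.
    """
    values = list(reading_by_wafer.values())
    lot_median = median(values)
    # Median absolute deviation: a spread estimate that one bad wafer cannot inflate.
    mad = median([abs(v - lot_median) for v in values])
    if mad == 0:
        # No spread among the typical wafers, so the score is undefined.
        return []
    flagged = []
    for wafer_id, value in reading_by_wafer.items():
        modified_z = 0.6745 * (value - lot_median) / mad
        if abs(modified_z) > threshold:
            flagged.append(wafer_id)
    return flagged


if __name__ == "__main__":
    lot = {"W01": 1.0, "W02": 1.1, "W03": 0.9, "W04": 1.05, "W05": 5.0}
    print(flag_outlier_wafers(lot))
```

A wafer that is flagged in one lot but has no counterpart on other days points toward a one-off event. If many lots flag wafers in a similar way, that points toward the more generalized problem the paragraph mentions.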

But there also is a big financial incentive to getting this right.

“Imagine if you have 40% scrap,” said Uzi Baruch, vice president and general manager of the automotive business at OptimalPlus. “So the real cost is the time it takes you to get to market and the amount of scrap. In dual-camera systems, you need to match two different cameras in a module so they behave the same. Once you glue them in, if they’re not the same performance, that whole module is done. What we’ve been doing is applying scoring mechanisms to measure and compare different cameras. So if you have 400,000 lenses and you’re grouping data on CMOS issues, you may see one or two modules that are different or things on the edges that you wouldn’t notice with one lens, and then you go back to the design.”

New techniques
During design and manufacturing, checking at the block and flip-flop levels will help. After manufacturing, logic built-in self-test (LBiST) and on-chip monitoring are essential.

“Ultimately, flops are those things that affect reliability,” said Steve Pateras, senior director of marketing for test automation at Synopsys. “If a flop can be made fault-tolerant, then you eliminate that contribution. If you go down to the flop level, we can provide a list of flops that are affecting the metric and we can automatically replace those flops with fault-tolerant ones.”

It’s not just about finding the errors but fixing them. “There are different levels of capability that we provide,” said Pateras. “But the long and short of it is, we can quickly estimate the metrics and we can provide direct actionable guidance on how to fix them. Depending on the ASIL Level A, B, C or D, you need to achieve certain metric levels, and those metrics need to be in a certain value: 90%, 95%, 99%. So our customers are faced with a couple of problems. One is how do you actually measure those metrics accurately? And then, once those metrics are measured, how do you actually improve them if necessary?”
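To make the measure-then-improve loop concrete, here is a minimal sketch. It assumes a deliberately simplified coverage metric: the share of total flop failure rate that sits in fault-tolerant flops. The flop names, failure-rate numbers and the greedy replacement order are hypothetical, and the 90% target is just one of the percentages quoted above. This is not the metric definition from any standard, nor the method of any tool mentioned in the article.

```python
from dataclasses import dataclass


@dataclass
class Flop:
    name: str                # hypothetical instance name
    failure_rate: float      # hypothetical contribution, arbitrary units
    protected: bool = False  # True once replaced by a fault-tolerant flop


def coverage(flops):
    """Fraction of the total failure rate that sits in protected flops."""
    total = sum(f.failure_rate for f in flops)
    if total == 0:
        return 1.0
    covered = sum(f.failure_rate for f in flops if f.protected)
    return covered / total


def harden_until(flops, target):
    """Protect the largest contributors first until coverage meets the target.

    Mutates the flops in place and returns the names that were replaced.
    """
    replaced = []
    for flop in sorted(flops, key=lambda f: f.failure_rate, reverse=True):
        if coverage(flops) >= target:
            break
        if not flop.protected:
            flop.protected = True
            replaced.append(flop.name)
    return replaced


if __name__ == "__main__":
    design = [Flop("a", 50.0), Flop("b", 30.0), Flop("c", 15.0), Flop("d", 5.0)]
    print(harden_until(design, target=0.90), coverage(design))
```

Sorting by contribution means the smallest number of flops gets replaced for a given target, which matters because fault-tolerant flops typically cost area and power.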

Another tack is to rethink how and where data gets processed in a vehicle. “Over the last couple of years, we’ve seen a whole slew of different approaches,” said Mentor’s Fritz. “Now they all seem to be coalescing and heading in the same direction. So rather than everything being processed in a single device, we’re seeing a push toward multiple levels of redundancy and isolation of sensitive compute elements. Some companies are going for liquid cooling, others air cooling. But the bottom line is you need to design an enclosure for all of these systems, and you need real data very early in the design cycle to make this work properly, which isn’t always available. You can’t just pull out a spreadsheet and expect it to work. You need to measure power, calculate area and do thermal analysis.”

The new direction is to do much more computation closer to the sensor in order to greatly reduce the amount of data that needs to be moved around a vehicle. But that also requires fundamental changes to how vehicles are designed today, basically using data as a starting point to design the system rather than creating a supercomputer to manage any possible corner case.

“You need to know more about a sensor’s capabilities,” said Fritz. “That’s essential to decrease the amount of bandwidth, decrease the amount of data that needs to be moved, which requires less power and less cooling. So if you think about algorithms for object detection and classification, the video is coming in at 30 to 60 frames per second in high resolution. That’s terabytes per second. But even in the worst-case scenario, which is a busy city, you only need a tiny fraction of that. Most of it can be discarded. The answer isn’t sensor fusion. It’s processing less data.”

Self-test and calibration
That also greatly simplifies the design, and makes it easier to test systems, and those tests can be run at any time over the course of their lifecycle.

“You want to be able to run a test of electronics at different times during the operation of the vehicle,” said Pateras. “The typical ones are called key-on and key-off. We turn a car on, you know you powered it up or powered down, you want to go through a certain level of testing for the electronics. Our customers also want to be able to run periodic tests while the vehicle is operating. You want to be able to run a certain level of testing on the electronics on some period on some interval.”

LBiST is not always reliable. “You’re definitely going to need a comprehensive on-chip test and reliability system for automotive,” said Pateras. “We’re seeing the requirements continue to evolve and expand. At first, for certain automotive parts, it was fine to just turn on the car and do a quick test. But with anything to do with self-driving, all of our customers are requiring periodic testing. They’re requiring that it all be a solid ASIL D, and you need to be able to verify that and achieve that.”

One test approach is not sufficient for that. “You want to have the ability to monitor certain things on chip, certain signals and things like power rails and PLLs and clocking structures to see if there are any shifts or drifts in those functionalities or the performance,” he said. “And then you want off-chip analytics to be able to take this data and analyze it to see if it’s trending in a direction that would predict some failure at some point in time.”
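The off-chip trending idea can be sketched in a few lines. The example below assumes the simplest possible model: a straight-line least-squares fit to periodic monitor readings, extrapolated to the point where it reaches a failure limit. The readings, the time units and the 0.90 limit are invented for illustration. Real degradation is often nonlinear, so this shows the shape of the analysis, not a production predictor.

```python
def fit_line(times, values):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two matching samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def time_to_limit(times, values, limit):
    """Estimate when the fitted trend reaches the limit.

    Returns None when the fit does not cross the limit after the last
    sample (flat trend, moving away from the limit, or already past it).
    """
    slope, intercept = fit_line(times, values)
    if slope == 0:
        return None
    t_cross = (limit - intercept) / slope
    return t_cross if t_cross > times[-1] else None


if __name__ == "__main__":
    # Hypothetical power-rail readings (volts) taken at periodic self-tests.
    hours = [0, 1, 2, 3]
    rail_volts = [1.00, 0.99, 0.98, 0.97]
    print(time_to_limit(hours, rail_volts, limit=0.90))
```

The useful output is the lead time: the gap between the last sample and the predicted crossing is the window in which an alert could still be acted on.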

This is particularly important for the sensors that provide the data upon which a car needs to react. As sensors age, they can drift. They also can become dirty, which can limit their effectiveness. The key here is to be able to have a benchmark against which to measure those changes.

“When you design a sensor, you really need to know its behavior,” said Fritz. “If it ages, the sensor company knows what’s happening. You don’t have to adjust the training on the AI systems, but you do have to incorporate that into your inferencing. As the sensors degrade, you need to be able to compensate for that degradation so that you are confident in the results.”
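One way to picture compensating at inference time against a benchmark is a two-point recalibration. The sketch assumes the sensor's drift can be modeled as a gain and an offset, and that two known reference levels can be measured periodically. Both are assumptions for the example, as are the function names and numbers. Real sensors may drift in ways a linear model does not capture.

```python
def drift_model(ref_low, ref_high, obs_low, obs_high):
    """Estimate (gain, offset) so that observed = gain * true + offset.

    ref_low / ref_high: benchmark values the sensor should report.
    obs_low / obs_high: what the aged sensor actually reports for them.
    """
    if ref_high == ref_low:
        raise ValueError("reference points must differ")
    gain = (obs_high - obs_low) / (ref_high - ref_low)
    offset = obs_low - gain * ref_low
    return gain, offset


def compensate(reading, gain, offset):
    """Map a raw reading from the aged sensor back to the benchmark scale."""
    if gain == 0:
        raise ValueError("sensor no longer responds; cannot compensate")
    return (reading - offset) / gain


if __name__ == "__main__":
    # Hypothetical: a sensor that read 0 and 100 when new now reads 2 and 92.
    gain, offset = drift_model(0.0, 100.0, 2.0, 92.0)
    print(compensate(47.0, gain, offset))
```

The correction is applied to the data fed into inference, which matches the point in the quote that the trained model itself does not need to change.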

The bottom line in automotive, whether it’s 5G communication or the design of sensors and data networks, is that reliability is a multi-faceted challenge.

“When these parts get manufactured and go through their first level of testing, you know — especially in markets like automotive — that you have defective parts per million requirements of zero,” said Brady Benware, vice president and general manager for Tessent at Mentor. “That’s the target. What that means in reality is there are very few single-digit kinds of defective parts per million, but that’s quickly transforming into defective parts per billion coming out of initial test.”

After that, failure mitigation strategies have to be in place once these devices are deployed in the end-use environment. “So functional safety is really all about ensuring that if a failure develops in the field, if these parts degrade in the field, that there is a safe way that those parts fail and the broader system can respond to that,” Benware said. “That requires a lot of analysis in the design phase of these devices to understand what areas of the device and what functionalities are susceptible to failure. What are the effects of those failures if they do occur? And then you need to insert additional circuitry within the devices to ensure that if there is a failure, that it is tolerated or detected so that it can be responded to in the system.”

— Ed Sperling contributed to this report.
