How To Make Autonomous Vehicles Reliable

Making sure ADAS designs function correctly over time will be an enormous challenge.

The number of unknowns in automotive chips, subsystems and entire vehicles is growing as higher levels of driver assistance are deployed, sparking new concerns about, and new approaches to, improving the reliability of these systems.

Advanced Driver Assistance Systems (ADAS) will need to detect objects, animals and people, and they will be used for parking assistance, night vision and collision avoidance. If all of that works as planned, they will improve the safety of automobiles and reduce the risk of dangerous accidents.

But ADAS involves a lot of technology. Some of it has been field tested for years, such as sensors and cameras. Some of it is brand new, such as advanced processors developed at the latest process nodes and intelligent software for continuously sensing and collecting data about the environment. On top of that, the data that is collected has to be processed in real time. And ADAS systems are further evolving to create a network of intelligent systems using vehicle-to-infrastructure (V2I) and vehicle-to-vehicle (V2V) communications to pave the way for autonomous driving.

Taken together, this is an unprecedented level of system complexity and integration. But the challenge goes well beyond just getting these systems to function properly. They also have to fail gracefully and according to plan. And all of this has to happen over extended periods of time, or in the case of vehicles as a service, for almost continuous use over shorter periods.

“The current state of the art is acquiring data,” said George Zafiropoulos, vice president of solutions marketing at National Instruments. “The next challenge is figuring out what big data analytics allows you to do with that data. Can you find something useful that you’re not necessarily looking for? This is the next wave that’s coming—intelligently pointing out trends and correlations. As with every discipline, there is more data than people need to analyze. You’re looking for insights into that data.”


Fig. 1: Testing ADAS Systems. Source: National Instruments

Zafiropoulos said this requires a very different approach to design. “As engineers, we guard-band around design. But if you guard-band everything, you stack up inefficiencies. If you can decrease guard-banding with certainty of reliability and performance, that would add huge value.”
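
To see why stacked guard-bands hurt, consider a back-of-the-envelope calculation. The per-stage margins below are hypothetical, but they show how independent margins compound into far more total overdesign than any single stage intended:

```python
# Minimal sketch of how per-stage guard-bands stack up (illustrative numbers only).
# Each stage adds its own safety margin; the margins multiply, so the system ends up
# overdesigned by more than any single stage intended.

stage_margins = {
    "process corner": 0.10,   # 10% margin for process variation (hypothetical)
    "voltage droop":  0.05,
    "temperature":    0.08,
    "aging":          0.07,
}

total_factor = 1.0
for stage, margin in stage_margins.items():
    total_factor *= (1.0 + margin)

overdesign = total_factor - 1.0
print(f"Stacked guard-band: {overdesign:.1%} total margin "
      f"vs. {sum(stage_margins.values()):.1%} if budgeted jointly")
```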

Improving reliability
The important term here is reliability, which is a measure of quality over time. While this sounds straightforward enough in theory, the reality is that ADAS involves a lot of different components. As such, it needs to be simulated and tested as a system, and that data needs to be considered from a very high level.

What can cause a failure in these systems, even for a fraction of a second? The answer could be anything from insufficient experience in handling certain situations on the software side, to power integrity, electromigration, thermal, stress, electro-magnetic compliance, and electrostatic discharge on the hardware side.

Thermal reliability of an ADAS chip-package-system (CPS) is considered mission-critical because those systems need to last more than 10 years in often hostile thermal environments, where the temperature under the hood can reach as high as 150°C. This requires multi-physics tools and approaches to manage heat, which can exacerbate electromigration, and to analyze thermal-induced stress in a chip-package-system implemented in an advanced package, such as an ADAS AI system, said Vic Kulkarni, vice president and chief strategist for the semiconductor business unit at ANSYS.


Fig. 2: Thermal analysis for chip and package co-simulation. Source: ANSYS

The number of electronic components needed to support these advanced technologies has increased dramatically. Among the issues that now need to be considered are high-power, advanced packaging for AI chips—basically the central brain of an autonomous vehicle—and all the communications and data collection systems that feed into those AI systems. ADAS places very high demands on thermal reliability: typical operating temperatures range from -40°C to 50°C, while certain ADAS and power management systems under the hood can be subject to device junction temperatures as high as 135°C to 150°C.

“Thermal reliability presents the toughest challenge because electronics systems under the hood cannot exceed the maximum operating temperature and are highly vulnerable to thermal-induced stress, and thermally aware electromigration effects,” Kulkarni said.
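
As a rough illustration of why junction temperature matters so much, the Arrhenius model is commonly used to estimate how much faster temperature-driven wear-out mechanisms proceed at elevated temperature. The sketch below assumes an illustrative 0.7 eV activation energy; real activation energies depend on the specific failure mechanism:

```python
import math

# Arrhenius acceleration factor: a standard way to relate operating temperature
# to temperature-driven wear-out (electromigration, oxide degradation, etc.).
# The 0.7 eV activation energy is illustrative only; it is mechanism-dependent.

BOLTZMANN_EV = 8.617e-5  # eV/K

def acceleration_factor(t_use_c, t_stress_c, ea_ev=0.7):
    """How much faster a part ages at t_stress_c than at t_use_c."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_use_k - 1.0 / t_stress_k))

# A junction held at 150°C vs. a nominal 50°C use condition (temperatures from the article).
print(f"Acceleration factor at 150C vs 50C: {acceleration_factor(50, 150):.0f}x")
```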

In order to be a reliable solution in the ADAS market, and to ensure the safety of the vehicle and its occupants, products must meet the following requirements, according to Andrew Klaus, director of automotive business development and architecture at Marvell:

  • The data streaming through the vehicle must be secure, and systems must be designed to prevent malicious attacks or compromises to the data. A completely secure Ethernet switch is paramount.
  • The semiconductor components must pass rigorous quality and long-term reliability tests, such as AEC-Q100 and even more stringent tests specified by OEMs and Tier 1s.
  • The communication signals must be robust against the high-noise environment of the vehicle. This requires tests such as bulk current injection, transient noise, strip-line, and other immunity and emission testing specified in CISPR, ISO, IEC, as well as additional tests specified at each OEM.
  • The communications between critical electronic control units (ECUs) must meet an extremely low bit-error rate (BER) even under adverse conditions, which may require the use of forward error-correction schemes such as Reed-Solomon (a minimal sketch of the principle appears after this list).
  • Safety-critical systems must meet the applicable ASIL (Automotive Safety Integrity Level) requirements, which may include redundancy.
  • Data must be guaranteed to arrive at the processing ECU when required, which may include features such as latency control, time synchronization, traffic shaping, bandwidth guarantee, and predictability as specified in the time-sensitive network standards of IEEE 802.1.
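
As a minimal illustration of why forward error correction matters for those BER targets, the sketch below uses a simple 3x repetition code with majority voting as a stand-in for a real scheme such as Reed-Solomon; the raw channel error rate is an assumed, illustrative figure:

```python
import random

# Minimal sketch of why forward error correction pulls down the effective BER.
# A 3x repetition code with majority voting stands in for a real scheme such as
# Reed-Solomon; the principle (redundancy corrects isolated errors) is the same.

random.seed(0)
RAW_BER = 1e-3          # assumed raw channel bit-error rate (illustrative)
N_BITS = 200_000

def noisy(bit, ber):
    return bit ^ (1 if random.random() < ber else 0)

uncoded_errors = 0
coded_errors = 0
for _ in range(N_BITS):
    bit = random.getrandbits(1)
    # Uncoded: a single flip is an error.
    if noisy(bit, RAW_BER) != bit:
        uncoded_errors += 1
    # Coded: send the bit three times, take the majority at the receiver.
    votes = sum(noisy(bit, RAW_BER) for _ in range(3))
    if (1 if votes >= 2 else 0) != bit:
        coded_errors += 1

print(f"uncoded BER ~ {uncoded_errors / N_BITS:.2e}")
print(f"coded BER   ~ {coded_errors / N_BITS:.2e}")   # roughly 3 * RAW_BER**2
```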

Package effects
Advanced packaging is a complicating factor. Fan-out wafer-level packaging is being used in ADAS systems to shrink the footprint, lower power and improve performance. 2.5D packages based on interposers, and 3D-ICs with through-silicon vias scattered around the chip, provide other options for improving the performance of ADAS systems. While all of these have been used commercially, none has been extensively tested under harsh automotive conditions for extended periods of time.


Fig. 3: FOWLP package structure for ADAS with one chip. An ADAS system can include multiple chips, stacked vertically or side-by-side in a package. Source: ANSYS

Field testing is necessary for electronics in ADAS applications, but it is time-consuming. Moreover, it doesn’t always uncover all of the potential failure mechanisms in such complex systems, which can vary by process node, by manufacturing run at a single node, and even by logistics handling during or after manufacturing.

“A key area of contribution relates to the specific silicon implementation and the associated process node,” said Neil Stroud, director of advanced technology marketing at ARM. “Any system can experience faults that may be either permanent or transient. In an ADAS system, where the electronics are taking responsibility for driver safety, it is critical that these faults can be detected and appropriate action taken to avoid a dangerous situation occurring, whether it is for cameras, sensor fusion or actuation.”

This is easier said than done, however. The mad dash toward autonomous driving adds yet another level of uncertainty. ADAS is a collection of brand-new and existing technologies that will be used in a safety-critical consumer market, a combination that is itself new. Almost all safety-critical markets—mil/aero, medical and industrial—are well outside of the consumer world, and the procedures and speed at which they move are radically different.

“The military and aerospace world is a lot slower, but you have to have everything documented, such as processes and procedures,” said Jon Sinskie, executive vice president at Astronics. “It’s becoming the same thing in the semiconductor space, only semiconductors move a lot faster. The ability to customize on the fly, but also to document it as we go, is where those two worlds come together. That needs to be done quickly and on a repeatable basis. So we may have 70 tools in the field, and 30 have changes to them. That requires logistics management, and you have to provide that in the military space. You need to have 20-year logistics changes all around the world.”

That requires a totally different way of approaching these issues.

“Is your manufacturing process good enough that the vast majority of what you’re shipping is good, or do you have enough of a quality firewall that, while it might not be the most perfect manufacturing environment in the world, all of the parts that are going to be problematic can be screened out?” said David Park, vice president of worldwide marketing at Optimal+. “This is where big data analytics comes in to ensure the quality of the devices, and that they will function the way the designers and the company expect.”
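
One way such analytics are applied is part-average-testing-style outlier screening, in which a die that passes the spec limits but sits far from the rest of its population is flagged as a reliability risk. The measurements, limits, and robust-sigma estimate in the sketch below are made up for illustration:

```python
from statistics import median

# Sketch of a part-average-testing (PAT) style screen: parts that pass spec limits
# but sit far from the rest of the population get flagged as reliability risks.
# The numbers, limits, and the simple robust-sigma estimate are illustrative only.

spec_low, spec_high = 0.90, 1.10                     # hypothetical spec limits
measurements = [1.000, 1.004, 0.998, 1.002, 0.996,
                1.001, 1.090, 1.003, 0.999, 1.000]   # one suspicious die

med = median(measurements)
# Robust spread estimate (median absolute deviation, scaled to ~1 sigma for normal data).
mad_sigma = 1.4826 * median(abs(x - med) for x in measurements)
k = 6.0                                              # common PAT multiplier

for die_id, value in enumerate(measurements):
    in_spec = spec_low <= value <= spec_high
    outlier = abs(value - med) > k * mad_sigma
    if in_spec and outlier:
        print(f"die {die_id}: {value} is in spec but {abs(value - med)/mad_sigma:.1f} "
              f"robust sigmas from the lot median -> screen out")
```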

Every step counts
While problems can be discovered after the fact, the real challenge is avoiding them in the first place.

“When you’re talking about reliability in ADAS, that’s ISO 26262,” said Park. “Reliability is less of a manufacturing issue than a design issue. Electronics tend to be binary. They either work or they don’t work. When you have hydro-mechanical types of systems like steering, for example, in an older car, if you let the power steering fluid get low it made a horrible screeching noise when you turned the steering wheel. But it’s not like all of a sudden the steering stopped working and you crashed. You can be the kind of person that ignores the screeching in their steering wheel, but there were audible and other feedback clues that something wasn’t right with the steering. In current cars, it’s all drive-by-wire. You turn the steering wheel but you’re turning it against some sort of sensor that’s picking up, ‘Oh, you’re saying I need to turn the wheel,’ and the car turns the wheels. There’s no connection between your brake pedal and the brakes, your gas pedal and the throttle on your engine. Those don’t exist anymore. Like airplanes, it’s all fly-by-wire. Reliability comes in when you just can’t have it like your computer giving you the blue screen of death. When you’re driving a car that’s drive-by-wire, that’s an unacceptable outcome. Someone’s going to die.”

Design teams today are trying to build electronic devices such that if something fails catastrophically, the system somehow allows the driver to get the car to a safe space so it can be repaired.
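
A minimal sketch of that idea is a degraded-mode policy: instead of failing outright, the system steps down to a limp-home state that still lets the driver reach a safe spot. The states and severity scale below are illustrative and not taken from any particular standard:

```python
from enum import Enum

# Sketch of a fail-degraded policy: rather than dropping dead on a fault, the system
# steps down to a limp-home mode that still lets the driver reach a safe spot.
# States and the severity scale are illustrative, not from any particular standard.

class Mode(Enum):
    NORMAL = "full functionality"
    DEGRADED = "limp home: reduced speed, hazards on"
    SAFE_STOP = "controlled stop requested"

def next_mode(current, fault_severity):
    """fault_severity: 0 = none, 1 = recoverable, 2 = critical (illustrative scale)."""
    if fault_severity >= 2:
        return Mode.SAFE_STOP
    if fault_severity == 1 and current is Mode.NORMAL:
        return Mode.DEGRADED
    return current

mode = Mode.NORMAL
for severity in [0, 1, 0, 2]:          # a recoverable fault, then a critical one
    mode = next_mode(mode, severity)
    print(f"fault severity {severity} -> {mode.name}: {mode.value}")
```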

“The concerns on the manufacturing side include making sure there is nothing being shipped that doesn’t meet the quality requirements of the semiconductor vendor, which is very different from reliability,” Park noted. “If someone builds a chip that’s inherently unreliable, such that it’s supposed to last for 10 years but realistically only lasts for 3 years, the manufacturing test process said, ‘You passed all the tests, so you’re good, it functions.’ But reliability is an add-on to that, which is why burn-in test is done to make sure the chip lasts as long as the expected lifetime.”

To be sure, reliability is a broad term because it includes both hardware and software, noted Maximilian Odendahl, CEO of Silexica. “For example, lock-step mode in hardware, where several cores are doing the same thing, has been used in functional safety for powertrain for a long time. Now we’re talking about adding a lot of fuzzing while engineering a chip to make it safe and reliable against hackers.”
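
Conceptually, lock-step execution runs the same computation twice and traps on any divergence. Real lock-step is implemented cycle-by-cycle in hardware; the sketch below, with its injected bit-flip, only illustrates the principle:

```python
# Conceptual sketch of dual-core lock-step: the same inputs are processed twice
# and a comparator traps on any divergence. Real lock-step is done cycle-by-cycle
# in hardware; the fault injection here is purely illustrative.

class LockstepFault(Exception):
    pass

def control_step(sensor_value, bit_flip=0):
    # Stand-in for the control computation; bit_flip models a transient fault.
    return (sensor_value * 3 + 7) ^ bit_flip

def lockstep_execute(sensor_value, inject_fault=False):
    primary = control_step(sensor_value)
    shadow  = control_step(sensor_value, bit_flip=0x4 if inject_fault else 0)
    if primary != shadow:
        # In a real system this would raise a fault-handler interrupt and
        # transition the ECU to a safe state.
        raise LockstepFault(f"divergence: {primary} != {shadow}")
    return primary

print(lockstep_execute(10))                 # normal operation
try:
    lockstep_execute(10, inject_fault=True) # transient fault in the shadow core
except LockstepFault as e:
    print("fault detected:", e)
```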

On the software side, reliability equates to how much the design can be trusted. “How much can I trust my software? How reliable is it going to be in terms of the variance? How much dynamic behavior is in my software?” The main problem there, Odendahl asserted, is that everything used to be static. “I defined the scheduling statically, I had static code generators, everything is certified. The goal was pretty easy. Now, with ADAS or deep learning, this static behavior comes to a halt just because I need all the performance. If everything is static, if I can pre-calculate everything, that’s a pretty simple algorithm. Now that everything is moving to more dynamic behavior, there is a huge fear about how to control this dynamic behavior. Do I actually know how much dynamic behavior I have? Is it completely dynamic? If it is, then it is not deterministic any more. If it’s not deterministic any more, how can I make sure it is reliable? How can I make sure it still works in all the scenarios, and all the different use cases?”
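
A small experiment makes the determinism concern concrete: time a task whose work grows with scene complexity and compare the observed spread against a deadline budget. The workload and the 5 ms budget below are made-up, illustrative numbers:

```python
import random, time

# Sketch of the determinism concern: time a task whose work varies with the scene
# (dynamic behavior) and check the observed spread against a deadline budget.
# The workload and the 5 ms budget are made-up illustrative numbers.

DEADLINE_S = 0.005

def detection_task(num_objects):
    # Work grows with the number of objects in the scene (dynamic behavior).
    total = 0
    for _ in range(num_objects * 2000):
        total += 1
    return total

durations = []
for _ in range(200):
    objects_in_scene = random.randint(1, 50)     # scene complexity varies per frame
    start = time.perf_counter()
    detection_task(objects_in_scene)
    durations.append(time.perf_counter() - start)

worst = max(durations)
print(f"worst observed: {worst*1e3:.2f} ms, budget: {DEADLINE_S*1e3:.1f} ms, "
      f"misses: {sum(d > DEADLINE_S for d in durations)}")
```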

Standards do help. Alongside the classic version of AUTOSAR, a new standard, Adaptive AUTOSAR, is emerging in Europe because the original doesn’t scale to ADAS applications.

“For applications that have a lot of dynamic behavior and need all the performance they can get, the core problem is that they want the performance and dynamic behavior, but they don’t know how to control it and there is no way to debug it,” he explained. “I write it once and wonder if it is a realistic scenario or was I just lucky? Now, I get into all those problems of ‘regular’ computing—not this very specific old-school automotive computing, but regular C++ programming, which is the whole point of going to this new standard. We’re at the point of everything being solved, but now we get all this complex behavior that cannot be controlled. No matter if I talk to the OEMs or the Tier 1s, it’s a huge risk, so what’s the answer? Either I can control it by tools or they go back to doing everything static, which is exactly the point of why they created this new standard.”

As such, the very definition of reliability has exploded because the entire problem has exploded, Odendahl said. “It’s not just that I have my fixed powertrain and software, and it’s not static. It means there are so many new possibilities, and so many more scenarios. Some crank handle going up and down is very different from some ADAS algorithm in 1 billion different scenarios, and it needs to work in all of them. This means the software changes rapidly, and needs to deal with it, which means the reliability and uncertainty is just getting bigger.”

At the same time, much depends on the details of the application. Geoff Tate, CEO of Flex Logix, pointed out there are different levels of safety requirements, depending on the part of the system. “[OEMs and Tier Ones] are looking for the technical solution—how do they process the data to achieve the objective they are looking for within a certain timeframe at a certain cost? This must then be overlaid with the safety requirements. What is the cost of the mistake? The higher the cost of a mistake, the higher the safety level is. In automotive applications, generally the perception is that automotive engineering teams are used to using commercial applications and then adjusting their architectures to duplicate or triplicate functions in order to achieve the safety level that they need.”

There’s also a lot riding on the response time of the system, Tate said. “One approach that can be used, which goes back to the moon flights a long time ago, is to implement three computing units with a box that compares the outputs. They vote, and if one of them has an error, presumably it’s the wrong one, and the two that have the similar outputs are the correct ones and you use the majority output. We’ve also seen applications where there are two units and they vote. If they vote the same you can assume with a very high confidence level that there has not been an error. If there is a mismatch, then one of them is wrong obviously, but you don’t know which one so you have to flush the pipeline, reset, and reprocess, which is acceptable in some cases—but not in others, depending on safety levels and response time. If you’re doing something that’s super-real-time, like a braking system, or a system that must make decisions about which obstacle to hit, you might have to use the triplicate approach. You don’t have the time to recompute.”
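
A minimal sketch of the triplicated approach Tate describes is a majority voter that outvotes a single faulty unit, while a dual setup can only detect the mismatch and must flush and recompute:

```python
from collections import Counter

# Minimal sketch of triple modular redundancy: three units produce an output,
# a voter takes the majority, and a single faulty unit is outvoted.
# A dual setup can only detect the mismatch and has to flush and recompute.

def tmr_vote(outputs):
    """Return (majority value, ids of units that disagreed). Assumes 3 outputs."""
    winner, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: more than one unit faulted")
    dissenters = [i for i, out in enumerate(outputs) if out != winner]
    return winner, dissenters

# Unit 1 suffers a transient fault; the other two agree, so the output is still valid.
value, faulty = tmr_vote([0x3A, 0x7F, 0x3A])
print(f"voted output: {value:#x}, suspect units: {faulty}")

def dual_compare(a, b):
    """Dual-unit variant: detection only; on mismatch the pipeline must be reprocessed."""
    return a if a == b else None
```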

Conclusion
ADAS increasingly looks like the crossover point between the consumer and safety-critical markets, and there are many pieces to this incredibly complicated puzzle. From the silicon and software design to the verification, data management, system validation, adherence to standards and manufacturing quality control — all of it has to happen correctly the first time. These are immense system challenges that also bring huge opportunities for the technology providers that get it right.

—Ed Sperling contributed to this report.

Related Stories
Auto Suppliers: More Than Chips
Not all suppliers to the automotive industry are providing semiconductors and system-level products.
Tech Talk: ISO 26262
What’s new in the automotive standard and how to design cars that can fail safely.
Advanced Packaging Moves To Cars
Supply chain shake-up as OEMs look to fan-outs and systems in package for differentiation and faster time to market.
Tech Talk: DO-254
A look at the safety-critical standard for aerospace and what it means for automotive electronics.
Automotive’s Unsung Technology
Audio technology is making big strides alongside autonomous vehicles and vehicle-to-infrastructure communication.


