Improving Reliability In Automobiles

Focus turns toward predictive and preventive maintenance, with an emphasis on resiliency and recovery rather than maintenance schedules.


Carmakers are turning to predictive and preventive maintenance to improve the safety and reliability of increasingly electrified vehicles, setting the stage for more internal and external sensors, and more intelligence to interpret and react to the data generated by those sensors.

The number of chips inside of vehicles has been steadily rising, regardless of whether they are powered by electric motors or internal combustion engines, as carmakers replace or supplement mechanical parts with electronics. This includes everything from electronic control units (ECUs) to wireless and networking circuitry, more batteries with sophisticated power management, and many more components devoted to safety and diagnostics. The challenge now is to use the data those components generate more effectively.

“The concept of ‘safe state’ is critical,” said Frank Schirrmeister, vice president for solutions and business development at Arteris IP. “A sensor that measures tire pressure, for instance, can guide the driver to check if and when variation from the norm is significant. For preventive maintenance without human intervention, there are questions about whether predictive maintenance concepts like digital twins also can be applied here, and what is the respective safe state of the electronic system.”

At the ECU level, end-to-end virtualization can open the door to integrating cross-domain vehicle functions for software-defined vehicles. Virtual ECUs only have access to, and control of, specific aspects of the electronics. They are electronically isolated from other virtual ECUs, providing independent responses to faults.

“If and when a fault occurs, the system can notify the virtual ECU software,” Schirrmeister said. “It also can cause a reset of the virtual ECU without affecting other areas of the system and report the failure of the virtual ECU externally. From here, these concepts can extend further into predictive maintenance.”
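The isolation behavior Schirrmeister describes can be illustrated with a small model. This is only a sketch under stated assumptions: the `VirtualEcu` and `Supervisor` classes, their fields, and the fault reason string are hypothetical names invented for illustration, and a real system would enforce this isolation in a hypervisor and hardware rather than in application code.

```python
class VirtualEcu:
    """Hypothetical stand-in for one isolated virtual ECU."""

    def __init__(self, name):
        self.name = name
        self.state = {}     # private state; no other virtual ECU touches it
        self.resets = 0

    def reset(self):
        """Clear this ECU's state only."""
        self.state.clear()
        self.resets += 1


class Supervisor:
    """Hypothetical system-level monitor that handles faults per virtual ECU."""

    def __init__(self, ecus):
        self.ecus = {ecu.name: ecu for ecu in ecus}
        self.fault_log = []  # stands in for external failure reporting

    def report_fault(self, name, reason):
        self.fault_log.append((name, reason))  # report the failure externally
        self.ecus[name].reset()                # reset only the faulting ECU


brakes = VirtualEcu("brakes")
infotainment = VirtualEcu("infotainment")
brakes.state["pressure_kpa"] = 120
infotainment.state["volume"] = 7

supervisor = Supervisor([brakes, infotainment])
supervisor.report_fault("infotainment", "watchdog timeout")
```

After the fault, the infotainment ECU has been reset and logged, while the braking ECU's state is untouched. The fault log is the kind of record that predictive maintenance could later draw on.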

Predictive maintenance relies on understanding how an ECU should behave, and when or how often it is no longer functioning within acceptable parameters. Preventive maintenance, meanwhile, is typically scheduled by odometer reading. The goal is to push preventive maintenance from a passive, schedule-driven service toward on-board resilience where possible, through either a software fix or some sort of automated fail-over, similar to the way error-correcting code is used in memory.

“The larger ECUs and ICs at the heart of ADAS, autonomous driving, and electric vehicle functions address tasks requiring large complex processing,” according to Charles Battikha, functional safety consultant at Siemens EDA Consulting & Learning Services. “Volatile memory and flash devices are susceptible to aging effects. They also have complex interfaces, such as Ethernet. Larger ECUs and ICs fall under the ISO 26262 purview for managing functional safety, and thus already have hardware and software to detect errors due to random hardware faults, including Error Correction Codes (ECC) and retry techniques to detect and manage correctable errors. For instance, ECC can correct a 1-bit error in memory. The Ethernet protocol allows for retry if a packet is corrupted. It’s a simple step to log and track the rate of these correctable errors to predict when an ECU may need replacing, which potentially could prevent failure during operation. Most of these ECUs and ICs are expensive to replace, setting a barrier for swap. However, now that vehicles have OTA updates, collecting data across large fleets to create correlation between correctable error rates and eventual hard failures is easier.”
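The logging step Battikha describes can be sketched in a few lines. This is a minimal illustration rather than a production design. The class name `CorrectableErrorTracker`, the one-hour window, and the threshold of three errors are all assumptions made for the example. In practice, thresholds would have to come from the fleet-wide correlation data mentioned above.

```python
from collections import deque


class CorrectableErrorTracker:
    """Counts correctable errors (e.g. ECC single-bit fixes) in a sliding time window."""

    def __init__(self, window_s, max_errors_in_window):
        # Both limits are hypothetical; real values would come from fleet data.
        self.window_s = window_s
        self.max_errors_in_window = max_errors_in_window
        self._timestamps = deque()

    def record(self, timestamp_s):
        """Log one correctable error and drop events older than the window."""
        self._timestamps.append(timestamp_s)
        cutoff = timestamp_s - self.window_s
        while self._timestamps and self._timestamps[0] < cutoff:
            self._timestamps.popleft()

    def errors_in_window(self):
        return len(self._timestamps)

    def needs_service(self):
        """True when the recent error count exceeds the assumed threshold."""
        return len(self._timestamps) > self.max_errors_in_window


tracker = CorrectableErrorTracker(window_s=3600.0, max_errors_in_window=3)
for t in (10.0, 500.0, 900.0):
    tracker.record(t)
healthy = tracker.needs_service()     # three errors in the hour: within the limit

for t in (1200.0, 1300.0):
    tracker.record(t)
degrading = tracker.needs_service()   # five errors in the hour: flag for service
```

A rate over a window, rather than a lifetime count, is used here because the passage is concerned with a rising error rate as a sign of aging, not with the occasional isolated error.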

More complexity
At the same time, these ECUs and ICs require substantial power and generate large amounts of heat, so they need proper cooling, which can include heatsinks, fans, and even liquid cooling.

“Improper heat removal impacts the lifespan and could result in a hard failure during operation,” Battikha said. “Required preventive maintenance will help ensure ECUs free of dirt/dust, air flow free of blockages, and tight connections. Preventive maintenance steps could include vacuuming, cleaning, resetting/recalibrating connections of heatsinks, visual inspections of connectors and cables for corrosion, etc.”

That will require significant changes in the electronic content of vehicles. According to CALSTART, a 300-member, nonprofit consortium whose mission is to help build a clean transportation industry, about 70% of the components in EVs will differ from those of internal combustion engine vehicles. Electric motors, controllers, batteries, and chargers will replace components such as combustion engines, gas tanks, carburetors, smog control, starters, exhaust systems, generators, and gas and oil pumps.

Preventive maintenance also will change as ICE vehicles are supplanted by EVs. For example, EVs don’t require engine oil changes or the replacement of engine components, and transmission fluid requirements depend on the design. EVs with single-gear motors do not require transmission fluid changes, while the higher-performance models with two-gear motors do.

“These are redesigned from the ground up,” said Lars Ullrich, vice president of Automotive Americas at Infineon. “The traditional car had miles of wires. We’re transitioning to a kind of computer on wheels. With electrification comes new opportunities for decentralized compute architectures, new secure connections, and the ability to look at it from a systems perspective.”

Modeling reliability on industry
Fortunately, there is some historical precedent for using sensors to improve reliability. Industry 4.0 uses the connectivity of the IIoT, as well as AI and lots of sensors to predict when a particular system or component needs to be serviced. One of the key goals of Industry 4.0 is to improve uptime, and that is heavily dependent on predictive and preventive maintenance.

By installing multiple wired or wireless sensors on production equipment with 24/7 AI monitoring, factory operators have been able to keep equipment running around the clock. AI systems predict when equipment is about to break down. For example, sensors can now detect equipment temperature changes while the machinery is running. A motor that is overheating, exceeding its normal temperature profile, may be about to break down. In addition, these systems can automatically schedule a maintenance team to replace components and even order the necessary parts well ahead of a failure.
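A simple version of the temperature-profile check can be written without any AI at all. The sketch below assumes a statistical baseline (mean plus `k` standard deviations) and requires several consecutive high readings before raising a flag, so a single spike is ignored. The function name, the parameters `k` and `consecutive`, and all temperature values are made up for illustration. Real systems would use learned profiles rather than a fixed rule.

```python
from statistics import mean, stdev


def overheating(baseline, readings, k=3.0, consecutive=3):
    """Return True if `consecutive` readings in a row exceed baseline mean + k * stdev."""
    limit = mean(baseline) + k * stdev(baseline)
    run = 0
    for value in readings:
        run = run + 1 if value > limit else 0
        if run >= consecutive:
            return True
    return False


baseline_c = [61.0, 62.5, 60.8, 61.7, 62.1, 61.3]  # made-up normal motor temperatures
normal_run = [61.9, 62.4, 61.0, 62.8]
single_spike = [62.0, 70.0, 62.0, 61.0]
hot_run = [62.0, 66.5, 67.2, 68.0]

normal_flag = overheating(baseline_c, normal_run)
spike_flag = overheating(baseline_c, single_spike)
hot_flag = overheating(baseline_c, hot_run)
```

Only the sustained excursion in `hot_run` is flagged. A maintenance system could use such a flag as the trigger for scheduling a service visit.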

Much of this learning can be applied to the automotive world, where AI can be used to monitor the core functions and schedule maintenance as needed, rather than according to a fixed schedule.

Amol Borkar, director of product management, marketing and business development for Tensilica Vision and AI DSPs at Cadence, said this data can be collected from sensors specifically designed to detect vibration, gas, moisture, temperature, pressure and rotation/speed. “For example, with a vibration sensor, is this unit vibrating outside a range of desirable frequency? Is the vibration on the outside or in the individual components? With a moisture/humidity sensor, if moisture is present or absent, is it within a tolerable range? Then, for a security sensor, has the unit been opened or sealed improperly?”

Having sufficient sensors that are strategically floor-planned into a system is step one. The next challenge is to make sense of all of the data collected by those sensors.

“Using AI/ML to do predictive maintenance has been around for some time,” Borkar said. “Like any AI/ML problem, if you want your system to run reliably, you need to train it with lots of data. Usually, predictive maintenance cannot be accomplished by looking at a single data point in time. Rather, one needs to process sequences of data to extract patterns and predict possible points of failure from the training data. Therefore, to accomplish this task, typically recurrent neural networks (RNNs) — or more specifically, long short-term memory (LSTM)-based networks — are used because they are great for processing data sequences and time series.”
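Borkar's point that a single data point is not enough can be made concrete with the data preparation step that sequence models typically need. The sketch below only cuts a sensor log into fixed-length windows, each paired with the value that follows it. It does not train an RNN or LSTM. The function name `make_sequences`, the window length, and the vibration readings are assumptions for illustration.

```python
def make_sequences(samples, seq_len):
    """Split a time series into (input window, next value) pairs for a sequence model."""
    if seq_len < 1 or seq_len >= len(samples):
        raise ValueError("seq_len must be at least 1 and shorter than the series")
    pairs = []
    for start in range(len(samples) - seq_len):
        window = samples[start:start + seq_len]
        target = samples[start + seq_len]
        pairs.append((window, target))
    return pairs


vibration_mm_s = [0.8, 0.9, 0.9, 1.1, 1.4, 1.9]  # made-up vibration readings
pairs = make_sequences(vibration_mm_s, seq_len=3)
```

Each pair gives a model a short history and the reading that came next, which is the shape of input a recurrent network would consume when learning how a component drifts toward failure.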

For electric motors, sensors can be added to monitor temperature, noise, vibration, changes in torque, and other parameters. Similarly, for batteries, sensors can monitor voltage, charging time, intermittent voltage drops, and other unusual behavior.

Sensors also can be used to detect tampering with the software running in a vehicle, or to detect whether one or more components are generating an abnormal amount of data.

“There are a lot of techniques available today such as secure boot and measured boot that would allow carmakers to detect when something is not right security-wise with a vehicle in the field,” said Maarten Bron, managing director of Riscure. “Device attestation is one such technique, which enables the relying party such as the vehicle’s OEM to be sure of the trustworthiness of the information they receive back from their vehicles.”
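The idea behind attestation can be shown with a simplified challenge-response exchange. This sketch makes a large assumption: it uses a shared secret and an HMAC for brevity, whereas deployed attestation schemes generally rely on hardware-protected keys and signed measurements. The function names, the key, and the firmware strings are all hypothetical.

```python
import hashlib
import hmac
import secrets


def measure(firmware):
    """Hash of the firmware image, standing in for a boot-time measurement."""
    return hashlib.sha256(firmware).digest()


def attest(device_key, nonce, firmware):
    """Device side: bind the measurement to the verifier's fresh nonce."""
    return hmac.new(device_key, nonce + measure(firmware), hashlib.sha256).digest()


def verify(device_key, nonce, expected_firmware, report):
    """Verifier side: recompute the report for the expected firmware and compare."""
    expected = attest(device_key, nonce, expected_firmware)
    return hmac.compare_digest(expected, report)


key = secrets.token_bytes(32)    # hypothetical per-device key
nonce = secrets.token_bytes(16)  # fresh challenge, so old reports cannot be replayed
good_fw = b"release-1.2.3"
tampered_fw = b"release-1.2.3-modified"

ok = verify(key, nonce, good_fw, attest(key, nonce, good_fw))
bad = verify(key, nonce, good_fw, attest(key, nonce, tampered_fw))
```

The verifier accepts the report from the expected firmware and rejects the one produced by the modified image. What the OEM does after a rejection is a separate safety question.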

There are many use cases today in the mobile industry, for instance, where device attestation techniques are used to avoid fraud in mobile payment scenarios. “What makes the automotive use case so special is the fine line between safety and security,” Bron said. “In a payments scenario, the expected response to fraud would be to disconnect or disable the payments device remotely. For a vehicle, such action potentially could put the safety of the passengers and other users of the road at risk. So even though the attacker model could be similar, the response to an attack is probably not.”

Given that EVs today contain more than 100 distributed ECUs, and over time some of these will be replaced by a centralized controller ECU, how can the system be designed to prevent these electronic components from failing? Even though most of the chips used in EVs meet automotive-grade requirements, there is no guarantee they won’t fail.

One approach is to add redundancy to the design. For example, in one ECU, there may be two identical controller chips that share the workload while operation is normal. However, if one chip is acting abnormally, most of the critical workload can be shifted to the functional chip, or the functional chip can completely shut down the questionable chip. This approach can significantly reduce the risk of critical component failure.

“In driving, real-time safety is important,” said Robert Day, director of automotive partnerships for Arm’s Automotive Line of Business. “For example, in an accident, airbag malfunction is unacceptable. To increase CPU, SoC, or ASIC reliability, a dual-core lockstep design can be deployed. These two processors will be working side-by-side and comparing results in real time, as in the case of Cortex-A78AE. Depending on the operation, the core also can leverage its ‘split-lock’ capability, which can split the lock-stepped cores to perform different functions. If one core fails, the other core could take over, preventing a dangerous situation from happening.”
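The compare-and-flag principle behind lockstep can be modeled in software. This is a conceptual sketch only: real lockstep cores compare results in hardware, and nothing here reflects how the Cortex-A78AE implements it. The function names and the torque formula are invented for illustration.

```python
class LockstepMismatch(Exception):
    """Raised when the two redundant computations disagree."""


def run_lockstep(primary, shadow, *args):
    """Run the same computation twice and accept the result only if both copies agree."""
    a = primary(*args)
    b = shadow(*args)
    if a != b:
        raise LockstepMismatch(f"primary={a!r} shadow={b!r}")
    return a


def brake_torque(pedal_pct, speed_kph):
    # Toy calculation; the formula is made up for illustration.
    return pedal_pct * 12 + speed_kph // 10


def faulty_brake_torque(pedal_pct, speed_kph):
    # Simulates a core whose result has one flipped bit.
    return brake_torque(pedal_pct, speed_kph) ^ 0x4


agreed = run_lockstep(brake_torque, brake_torque, 50, 80)

try:
    run_lockstep(brake_torque, faulty_brake_torque, 50, 80)
    fault_detected = False
except LockstepMismatch:
    fault_detected = True
```

Two copies are enough to detect a disagreement, but not to tell which copy is wrong. The system still needs a defined response, such as moving to a safe state or handing over to a core known to be healthy.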

Systems also need to be partitioned between emergency and non-emergency functions. Electric motor failure would be considered an emergency and should be avoided at all costs. On the other hand, when one of the headlights fails, it’s not an emergency if the other one is still working.

The road ahead
Not all of the pieces are in place for automated recovery. For example, while more sensors are being deployed throughout a vehicle, what will prevent these sensors from failing? And while self-healing components are being researched and tested, they are still years away from commercial use.

As technology rapidly changes, reliability remains a challenge. According to Consumer Reports’ 2021 study, electric SUVs were the least reliable of the 17 vehicle categories ranked, with defect rates higher than most internal combustion-powered vehicles, partly because EVs were relatively new and manufacturers were still working through the learning curve.

EVs also are limited in driving range, which means they need to be connected to public chargers. But those chargers also can introduce cyber risks. Pen Test Partners tested several different types of public chargers, including Project EV/ATESS/Shenzhen Growatt, Wallbox, EVBox, EO Hub and EO mini pro 2, Rolec, and Hypervolt. All but one were found to have security vulnerabilities, highlighting the risk of exposing EVs to cyberattacks that use public chargers as a back door.

However, the vehicles themselves face roughly the same threats as regular gasoline-powered automobiles. “There are various components of vehicles to consider when thinking about potential threat vectors, the first of which is the vehicle’s connection to external networks,” said Bart Stevens, senior director of product marketing for security IP at Rambus. “When vehicles are connected to a precarious cellular network for telematics or Wi-Fi for entertainment purposes, the network connection can be hacked and exploited, giving a cybercriminal access to vehicle electronics. Steering and braking systems could be vulnerable.”

In addition, automotive networks often lack confidentiality. As a result, an attacker could reverse engineer ECU messages to impersonate other internal devices.

“Attackers also can exploit vulnerable diagnostic ports, such as onboard diagnostics (OBD) and OBD-II ports, to install unauthorized configurations or malicious software updates,” Stevens explained. “Further, wireless key fob interactions are typically fairly simple and can be spoofed by hackers using a phone application, making it easy for attackers to lock and unlock vehicle doors as they please.”

These vulnerabilities are potentially present in both EV and internal combustion engine vehicles.

“While EVs have additional systems in place for battery and motor management, the EV charging port is another accessible entry point into an EV,” he said. “Public chargers such as Electric Vehicle Supply Equipment (EVSE) typically use the Open Charge Point Protocol (OCPP) and could be manipulated into charging fraud through vehicle impersonation. The ability to charge the vehicle could be disrupted or aborted. These charging connections are using proper encryption techniques, but if keys are not managed correctly and securely, these links could become vulnerable. Once compromised, data can be harvested, or the electric vehicle’s internet activity can be monitored.”

What’s next
Much work still needs to be done before vehicles are fully autonomous.

“‘Fail Operational’ systems can continue to function in the presence of a failure,” said Ken Boorom, functional safety consultant at Siemens EDA Consulting & Learning Services. “These systems have fault tolerance built into them. Fail operational is absolutely required for Level 5 ADAS systems, so considerable work already has gone toward developing systems that can perform some type of self-repair. Electronic systems that can heal themselves at the device level, for example, by repairing a break in a metal trace, are currently not commercially feasible. Instead, fault tolerance is achieved through redundancy by duplicating design circuitry, developing failure detection methods, and swapping circuitry in the event of a failure. This approach can come at a cost, as it may require fabricating extra circuitry.

“Homogeneous redundancy, adding a resource that is duplicated but can also be used during operation, is one way to reduce this cost. For example, adding the ability to ‘lock out’ portions of bad memory at the software level allows the system to keep operating even in the presence of a memory failure, without having to duplicate all the memory. A similar approach is used in disk drives to lock out bad blocks. The approach can even be used in systems with multiple CPU cores. For example, in an 8-core system, if one core fails, the others may be able to keep the system operational until the vehicle can be serviced.

“One should assume that all circuits and electronics can be broken and will break. What is more likely is that higher levels of redundancy will be more available. So as costs decrease and efficiencies improve, adding triple-level redundancy will become more common. Currently, logic redundancy is used to protect IC internals (such as lock-step CPUs), and functional redundancy is used to protect systems (such as lidar and vision, both responsible for object detection). This level of redundancy assumes failures mean loss of function. At triple (and higher) redundancy, the complete function/operation is maintained, even with the fault.”
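The difference between dual and triple redundancy can be shown with a majority voter. This is a minimal sketch in which the function name `majority_vote` and the channel values are assumptions for illustration. In a vehicle, the voting would typically happen in hardware or in safety-certified software.

```python
from collections import Counter


def majority_vote(readings):
    """Return the value reported by more than half of the redundant channels, else None."""
    if not readings:
        return None
    value, count = Counter(readings).most_common(1)[0]
    return value if count > len(readings) / 2 else None


all_agree = majority_vote([42, 42, 42])
one_faulty = majority_vote([42, 17, 42])   # the faulty channel is outvoted
no_majority = majority_vote([42, 17, 99])  # no two channels agree
dual_disagree = majority_vote([42, 17])    # two channels detect a fault but cannot resolve it
```

With three channels, a single fault is masked and the function keeps operating. With two, a disagreement can only be detected, which matches the distinction drawn in the quote above.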

While EVs are the transportation of the future, many questions remain. For example, how secure are OTA updates, and who will own the data generated by the hundreds or thousands of sensors in a vehicle? As more and more sensors are used to monitor everything inside EVs, what impact will that have on the risk of failures? And finally, as more and more AI is deployed in EVs, who will determine how that AI should perform and whether the AI itself is reliable?

So far, there are no definitive answers to these questions. But over the next decade, all of this will need to be well understood for the automotive chip industry to make progress on reducing accidents and improving vehicle reliability.
