Hidden Impacts Of Software Updates

Over-the-air code changes can stress systems in unexpected ways.


Over-the-air updates can reduce obsolescence over longer chip and system lifetimes, but those updates also can affect reliability and performance, as well as how resources such as memory and processing elements are used.

The connected world is very familiar with over-the-air (OTA) updates in smart phones and computers, where the software stack — firmware, operating systems, drivers, and applications — requires frequent infusions of code for workarounds, added features, and defenses against new security threats. But in applications like industrial machinery, cars, and data centers, where devices are expected to last for much longer periods of time, those updates can stress different parts of a device or system in unexpected ways.

Unlike a smart phone, where a glitch from one update may require a quick follow-on update — which in turn may affect a series of updates in other software or firmware — there is far less room for error when safety- or mission-critical functionality is involved. Any update of multiple systems can further stress systems that already are working under extreme conditions, utilizing circuitry in ways it was never designed to be used.

On the mechanical side, these systems may be stressed by vibration and occasional shocks, as well as wide swings in temperature. On the electrical side, circuits can be stressed by rapid inrush current from quick startups, by aging effects, and by multiple types of noise from a variety of sources. How over-the-air updates impact those systems needs to be well understood, and it may vary from one implementation to the next.

Still, these updates are essential to avoid — or at least postpone — physical replacement of electronic components. In automotive applications, vehicles may be on the road for a couple of decades. During that time, the ability to interact with other vehicles and infrastructure will evolve, and so will protocols and standards.

Updates can change the fundamental behavior of individual circuits or entire systems. This is particularly evident with battery management, where aging (typically a measure of the number and speed of charges, rather than months or years) may reduce the time a battery can hold a charge. Apple has been offering a “performance management” option that reduces overall performance for older iPhones in order to compensate for those aging effects. The same approach can be used in electrified vehicles, but instead of diminishing performance, the range per charge typically is reduced.
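The compensation policy described above can be sketched in a few lines. This is a hypothetical illustration, not Apple's or any automaker's actual algorithm; the state-of-health formula, the 80% derating threshold, and the power floor are all invented for the example.

```python
# Hypothetical sketch: scaling peak output as a battery's state of
# health (SoH) degrades -- the kind of policy an OTA update might tune.
# All names and thresholds here are illustrative, not from any vendor.

def state_of_health(full_charge_capacity_mah: float,
                    design_capacity_mah: float) -> float:
    """SoH as the ratio of measured full-charge capacity to design capacity."""
    return min(1.0, full_charge_capacity_mah / design_capacity_mah)

def allowed_peak_power(soh: float, rated_peak_w: float) -> float:
    """Derate peak power once SoH drops below an (assumed) 80% threshold."""
    if soh >= 0.8:
        return rated_peak_w
    # Linear derating below 80% SoH, with a floor at half of rated power.
    factor = max(0.5, soh / 0.8)
    return rated_peak_w * factor

soh = state_of_health(full_charge_capacity_mah=3000, design_capacity_mah=4000)
print(round(soh, 2))                                   # 0.75
print(round(allowed_peak_power(soh, rated_peak_w=150.0), 1))  # 140.6
```

An OTA update could then adjust only the threshold or the floor as fleet data accumulates, without touching the rest of the control stack.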

“Battery technology is the same sort of process, where the system collects data on how the batteries are performing,” said Lee Harrison, automotive test solutions manager at Siemens EDA. “As regular over-the-air updates come to the vehicle, they’re tweaking the battery management to give the best performance based on how the batteries are aging. We can do the same sort of thing with the other electronics in the vehicle. But that also relies on the assumption that these systems suffer from the same sort of aging effects. If it’s completely random, then there’s not a great deal you can do with that data.”

Updates don’t work everywhere, though, and even in places where they do work, at some point new hardware still may be required.

“We did speak to one OEM that actually factored into the design cycle and lifecycle of the vehicle at least one hardware replacement to address any challenges that might crop up during the life of the vehicle,” Harrison said. “What we’ve tried to do with the embedded analytics technology is to make it as configurable as possible. So you can update it throughout the lifecycle of the vehicle, and hopefully we’ve made it flexible enough to address some of those emerging threats as they appear. We’re not going to catch all of them, but it’s flexible enough to do a pretty good job in that area.”

Planning for updates
The impact of updates may be felt well beyond an individual device or sub-system. They often impact other parts of the supply chain. All of that needs to be considered at the outset of any design.

“It starts with the architecture,” said Rob Aitken, R&D fellow at Arm. “You have to think about what actually needs to be present in a CPU, in the surrounding logic, in the I/Os, and so on. What actually has to be there in order to provide the data? What can you do with the data? What we ran into a lot in the IoT space was that if you’re going to do device management of some kind as part of your silicon lifecycle management, how do you do upgrades? How does software get updated? How does a device trust the software provider? How does the cloud service know to trust the device? There are a lot of problems and challenges all throughout this process.”
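One link in the trust chain Aitken describes — the device verifying that an update image actually came from a trusted provider before applying it — can be sketched as follows. Real deployments typically use asymmetric signatures anchored in a hardware root of trust; this example uses a symmetric HMAC only to stay self-contained, and the key and image contents are invented.

```python
# Illustrative sketch of OTA update authentication on the device side:
# reject any image whose authentication tag does not match before
# flashing. HMAC stands in here for the asymmetric signature schemes
# used in production systems.
import hashlib
import hmac

DEVICE_KEY = b"provisioned-at-manufacture"  # assumption: shared secret

def sign_update(image: bytes, key: bytes) -> bytes:
    """Tag an update image (done by the update server)."""
    return hmac.new(key, image, hashlib.sha256).digest()

def verify_and_apply(image: bytes, tag: bytes, key: bytes) -> bool:
    """Device-side check before any flashing happens."""
    # Constant-time comparison avoids timing side channels.
    if not hmac.compare_digest(sign_update(image, key), tag):
        return False  # reject: unauthenticated or corrupted image
    # ... flash the image, then confirm boot before committing ...
    return True

image = b"firmware-v2.1"
tag = sign_update(image, DEVICE_KEY)
print(verify_and_apply(image, tag, DEVICE_KEY))         # True
print(verify_and_apply(image + b"x", tag, DEVICE_KEY))  # False
```

The harder problems Aitken raises — how the device trusts the software provider in the first place, and how the cloud trusts the device — sit above this check, in key provisioning and attestation.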

Fig. 1: Arm’s containerization approach, now part of the Scalable Open Architecture for Embedded Edge (SOAFEE), helps isolate and minimize the impact of OTA updates. Source: Arm/soafee.io

In most mission- and safety-critical applications, systems are connected to other systems. Typically, they need to be updated synchronously, which means updates must be extremely well thought out.

“If you look at automotive design automation, it’s not just the chipmaker,” said Steve Pateras, vice president of market and business development at Synopsys. “So you may be talking to the integrators, the Tier 1s, the OEMs, but it’s also the end users of those systems. You want to optimize performance over time. So the cone of opportunity just expands wider as you go down to the later lifecycle stages.”

But the larger and more diverse the supply chain, the greater the potential for data glitches. “That’s an issue, because you do want to share data across the lifecycle stages,” Pateras said. “If I have knowledge about wafer level test, or if I have design characterization information, I may want to use this in the field to understand trends. And likewise, if I get field failure information, like degradation of a signal over time, I want to be able to cross-correlate that with my original wafer data. There’s definitely a desire to feed data forward and backward.”
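The cross-correlation Pateras describes — joining field degradation reports back to original wafer-level test records — amounts to a keyed join. A minimal sketch, with entirely hypothetical die IDs, parameter names, and thresholds:

```python
# Minimal sketch of feeding data backward: join field-degradation
# reports to wafer-level test records by die ID, then flag dies where
# field drift co-occurs with marginal wafer data. All values invented.

wafer_test = {
    "die-001": {"vth_mv": 412, "leakage_na": 3.1},
    "die-002": {"vth_mv": 398, "leakage_na": 7.9},
}

field_reports = [
    {"die": "die-002", "signal_degradation_pct": 12.5},
    {"die": "die-001", "signal_degradation_pct": 1.2},
]

# Join each field report with its original wafer record.
joined = [
    {**wafer_test[r["die"]], **r}
    for r in field_reports
    if r["die"] in wafer_test
]

# Flag dies whose field degradation coincides with high wafer leakage.
suspects = [j["die"] for j in joined
            if j["signal_degradation_pct"] > 10 and j["leakage_na"] > 5]
print(suspects)  # ['die-002']
```

In practice the join keys span multiple supply-chain databases, which is exactly why the data-sharing issue Pateras raises matters.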

That data involves more than just a chip or a system in use in the field. It also involves the equipment used to make those chips, which also is undergoing regular OTA updates.

“There are lots of synergies at our adjacent spaces — design and test — and it gets harder and harder as you get further away,” said Jay Rathert, senior director of strategic collaborations at KLA. “What’s causing your failures? Are tools in the right places? Are recipes doing the right thing? Is our data being used in the right way? The knee-jerk reaction of the industry has always been, ‘When in doubt, shut off the data flow and keep everything internal.’ But to get to the next level of what you’re trying to do, you have to start sharing some of this data because it needs to come all the way through the supply chain. And now that the supply chain is much tighter and much more integrated than it was, there are things that happen in design that process inspection would benefit from knowing. And there are things that happen in process that inspection would benefit from knowing, and further on through burn-in and SLT, all the way into the car and monitoring the data.”

In the case of automotive applications, a chip’s lifetime generally is a decade or more. For industrial equipment, it might be 25 years. Another challenge on the equipment side is that chips may need to be replicated years later, but the equipment also has to be current enough to work with other equipment in the fab.

“New designs fly off the shelves,” said Don Blair, business development manager at Advantest. “The difference in the automotive industry is they tend to stay on the shelves much, much longer. The life of a cell phone might be 18 months, and then they’re on to something totally different. In automotive, most of our customers require us to guarantee for 10 years the availability of the systems, the instruments, and our systems themselves, including everything they need to make the test cell work. So we have to always guarantee a 10-year availability. That’s one difference with automotive. They get new designs all the time, but they tend to stay on the books for quite a long time, too. The newer cars are getting design wins for the newer chips, but they’re also producing the same model cars for years using the older chips.”

Security updates
Security plays a big role in every aspect of a chip’s production, as well as in the field. As vulnerabilities are discovered, security updates need to be installed. And no matter how good the security today, it’s unlikely to be considered state-of-the-art security a decade down the road.

“The German government two years ago developed a label, which is voluntary in the beginning, that was first targeted for routers,” said Thomas Rosteck, division president at Infineon. “One of the conditions is that you update security over the router’s lifetime, or at least for a certain period of time. For security purposes, that’s super-necessary.”

Rosteck said this type of approach is likely to spread to other areas, as vendors begin pitching continuous security as a differentiator, and as standards are established for what is considered acceptable. This is particularly important in areas such as automotive and aerospace, where safety and security are tightly interwoven.

“Nowadays, we’re getting questions from government agencies, which are worried about their entire automotive population,” said Marc Witteman, CEO of Riscure. “Imagine that your country has an enemy that wants to paralyze all of your automobiles. That would be a disaster. Shops would run out of food. No public services would work anymore. That’s not just a consumer threat. It’s a systems threat.”

Other sectors are wrestling with these challenges, as well. “With banks we sometimes see them using old, insecure algorithms,” Witteman said. “The reason is there may be some people who haven’t updated through a browser, and they want to make sure those people can still access internet banking. There also are hackers who claim they can hack into cars with a piece of foil by wrapping it around an OTA antenna. That degrades an LTE signal to 2G, which is full of known problems. The reason it’s not being disabled by the carmaker is that you may be driving around in the back streets of some town where there is no LTE coverage. So those insecure algorithms are still around. Even though you have a technologically advanced mindset of always wanting to be on the bleeding edge of what security can offer, change is slow sometimes. There are people who don’t own the latest technology, and some carmakers want 100% coverage in the United States or elsewhere. This is why OTA is so interesting.”

Security is an ongoing concern, and many updates contain at least some security modifications to stay current with ever-changing cyber threats.

“Security is rarely a permanent state,” said Mark Knight, director of architecture product management at Arm. “A product manufactured in 2012 is unlikely to be secure in 2022 without maintenance, and a product manufactured and considered secure in 2022 may not be secure in 2032. A vital goal of a secure development lifecycle is to determine the appropriate response to foreseeable security threats, so that a product will be protected throughout the intended lifecycle. This involves understanding the likelihood and potential impact of a threat so that products can be positioned on the right part of a risk curve. Mitigations to security risks can take many forms — technical, compensating controls, or commercial measures. Penetration testing and evaluation by an experienced third-party or independent test lab are two of the best ways to be assured that a product is secure against the latest attack techniques and can therefore increase the product’s durability.”

Evolving products and processes
Reducing obsolescence and improving reliability is a winning market strategy, particularly for high-ticket items such as cars and appliances. But technology also builds on itself, and as more data becomes available from end devices in the field, it can be applied to new and existing devices.

“One of the reasons companies want data is to improve their product,” said Infineon’s Rosteck. “But there also is a value to the user if a product improves over time. If motor algorithms improve, they can be upgraded. You can download something that has immediate value to you as a consumer. Or your machine can call you or another machine and say it has a problem.”

Others agree. “People want data coming off the chip in order to understand how it’s aging,” said John Kibarian, CEO of PDF Solutions. “You’re starting to see the IP industry provide additional sensors. You need to be measuring a lot of things. But it’s not all that different from any big control system. If you have a big office building, you’re going to want to measure temperature and humidity and air quality. The same thing is happening for chips, because a chip needs to report back to the cloud because it has an ADAS chip and it’s aging quickly. That’s a big deal, and the first place we’re seeing sensor adoption is in mission-critical applications like ADAS.”
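The on-chip aging monitoring Kibarian describes can be sketched as a simple drift check: compare a sensor reading (such as a ring-oscillator frequency) against its time-zero baseline and report to the cloud when drift crosses a threshold. The baseline, threshold, and sensor choice below are assumptions for illustration, not PDF Solutions' implementation.

```python
# Hedged sketch of on-chip aging telemetry: compare a ring-oscillator
# frequency against its at-manufacture baseline and flag when the
# aging-induced slowdown exceeds an alarm threshold. Values invented.

BASELINE_MHZ = 1000.0   # frequency recorded at manufacture (assumed)
DRIFT_ALARM_PCT = 2.0   # assumed alarm threshold

def drift_pct(reading_mhz: float, baseline_mhz: float = BASELINE_MHZ) -> float:
    """Percentage slowdown relative to the time-zero baseline."""
    return (baseline_mhz - reading_mhz) / baseline_mhz * 100.0

def should_report(reading_mhz: float) -> bool:
    """True when the drift warrants a report back to the cloud."""
    return drift_pct(reading_mhz) > DRIFT_ALARM_PCT

print(should_report(995.0))  # 0.5% drift -> False
print(should_report(970.0))  # 3.0% drift -> True
```

For a mission-critical part like an ADAS chip, a report like this is what lets the fleet operator correlate fast-aging dies with their manufacturing history.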

This potentially gets more complicated in advanced packages, where chips may share memories or I/Os or other resources. “In the automotive safety world, we’re seeing more of these designs with multiple die in a package,” said Chuck Carline, senior manager of factory applications for Precision Power & Analog at Teradyne. “That certainly has an impact on what’s tested on a wafer because you don’t have all of those nodes coming out of the package. They’re just interconnected. Some functionality can’t be tested once you have both chips together.”

That means chips in a package need to be monitored, and there at least needs to be a way to connect them to any updates that are required for reliability. And it adds yet more challenges for keeping everything in sync throughout the flow and into the field.

As more AI is included in devices, it adds yet another level of complexity in terms of updates, because AI algorithms are updated regularly. That includes everything from the logic used in a car to identify objects on the road, to the equipment used to ensure that chips are fully inspected.

“We have IP that we established many years ago that are feed-forward algorithms,” said Hector Lara, director and business manager at Bruker. “So as we’re scanning a structure, we can really learn the topography. If we see any repetition, we start applying that through some AI algorithms to speed up the scanning and maintain the accuracy that you would have in a very slow scan. If there are predictable repeating structures, we can speed things up even more. We apply some of that in AI, and some of the things we’re looking at are larger areas, using a combination of AFM and a profiler, all at the AFM resolution. But if you do that same thing again, you have to make sure you don’t crash the tip into a structure. We’re essentially navigating to an exact point.”
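The feed-forward idea Lara describes reduces to a scan-rate policy: when recent scan lines repeat, step faster; when new features appear, slow back down to full resolution. This is a conceptual sketch only, not Bruker's algorithm; the tolerance and step sizes are invented.

```python
# Conceptual sketch of a feed-forward scan policy: take coarser steps
# when the last two scan lines repeat (within a tolerance), otherwise
# scan at full resolution. Purely illustrative, not Bruker's IP.

def lines_match(a, b, tol=0.5):
    """True when two height profiles agree within the tolerance (nm)."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))

def choose_step(history, default_step=1, fast_step=4):
    """Speed up when the two most recent scan lines are near-identical."""
    if len(history) >= 2 and lines_match(history[-1], history[-2]):
        return fast_step
    return default_step

flat = [[10.0, 10.1, 10.0], [10.0, 10.0, 10.1]]   # repeating topography
edge = [[10.0, 10.0, 10.1], [10.0, 14.0, 18.0]]   # new feature appears
print(choose_step(flat))  # 4: repetition detected, scan faster
print(choose_step(edge))  # 1: new features, full resolution
```

The tip-crash concern in the quote is the flip side of this policy: coarser steps trade away the very resolution that protects the probe, which is why the speed-up has to back off at unfamiliar structures.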

Put simply, precision counts. And for all equipment and processes in the supply chain and design-through-manufacturing flows, updates can impact that precision in unexpected ways.

Still, updates are essential in every process, in nearly every chip, and for every level of software that runs inside or on top of those chips. But OTA updates also can complicate the long-term reliability and performance of chips, and of other chips or systems that are in proximity or connected to whatever is being updated.

At older nodes, when there was limited connectivity and chips were largely designed for sockets, this typically passed well under the radar. But as the expected lifetimes of electronic systems increase, and as more are tied to safety- or mission-critical applications, getting this right is becoming increasingly complex and difficult.
