Degradation in older chips can create new vulnerabilities; dealing with them involves some complex tradeoffs.
The longer a piece of silicon is out in the field the more prone it becomes to a cyberattack, raising questions about the optimal longevity of circuits and the impact of extending their lifetimes.
This is particularly challenging for safety- and mission-critical applications, where the cost of development can run as high as $100 million for some of the most complex designs. Chipmakers want to be able to amortize that investment over longer lifetimes. But as devices become more complex and heterogeneous, that needs to be analyzed in the context of the overall cost of ownership, which increasingly includes security.
“As chips age, you slowly erode the margin between the definition of a working device and a failing device,” said Lee Harrison, director of automotive IC solutions at Siemens Digital Industries Software. “Typically, you have the process of aging that takes you from where you are at time zero, and as the chip slowly ages you get to a point where things like timing margins slowly erode. A lot of the function of the device being secure is being resilient to attacks, especially side-channel attacks. The closer you get to a failing device, the more vulnerable it becomes, and the greater the chance of a side channel attack actually having an impact on the functionality of the device.”
That increased vulnerability is the result of a couple of factors. The first is obvious and has little to do with any technical aspects. The longer a device is in the field, the more time attackers have to find vulnerabilities and exploit them. This is particularly true for chips that are accessible to hackers, such as those in a vehicle.
On top of that, inaccessible hardware is increasingly vulnerable as more devices are connected to each other and to the internet. With multi-die devices, this can include connections within a package as well as over the air.
And to make matters worse, margins for resiliency on temperature, voltage, and other aspects weaken as circuits degrade. The solution is to add more granular monitoring and a design-for-security approach, but that also requires sharing of data across the entire semiconductor and electronic system supply chain, and many companies are unwilling to share data that might be seen by their competitors.
Temporal threats
On the flip side, there is no shortage of hackers. Cyberattacks on electronics are big business for criminal organizations, state and corporate hackers, and even some independent programmers. Many of them share a deep understanding of technology and the ingenuity and persistence to break into a device, whether that is through the hardware to the software, or the other way around. And given enough time and compute resources, any design can be hacked, with the electronic blueprints sold for cryptocurrency on the dark web.
“With any type of aging technology over time, hacks and vulnerabilities get found, whether deliberately or even stumbled upon accidentally,” said Simon Rance, general manager and business unit leader for process data and management at Keysight. “There’s usually always access points, either from a hardware perspective, and it’s usually ports — buses, for example — as well as software, which is often more vulnerable. You can hack not just the software, but the access to the hardware through the software and the firmware. That is a challenge, because as devices age, their vulnerabilities will always come out. They’re very openly shared, even on sites like Reddit.”
Defending against future attackers comes down to one central problem: The hardware remains the same, while the attackers’ approaches evolve. Attackers have plenty of time to detect vulnerabilities, but designers cannot go back and fix those vulnerabilities in silicon that is already in the field.
“It’s not an easy challenge to solve,” said Rance. “It’s costly to predict and prevent this upfront, and so it’s usually hardware that’s going to go into controlled systems, where security is of utmost importance that they’ll invest in. If it’s hardware, like secure buses on a chip, the bus data may be encrypted from the CPU or the processor. But when you start getting into those types of architectural choices and decisions, the cost of the chip goes up.”
Where to look
Even if most circuits can eventually be hacked in some way, that’s not to say that every one of them is equally vulnerable. Different types of semiconductors, as well as different components of SoCs, can be more open to attack as they get older. According to a 2024 paper by Virginia Tech researchers, SRAM showed gradual analog domain-level changes over time, a condition that could be simulated by an attacker.
“These devices are well verified for power, performance, area, and thermal these days,” said Rance. “Those who design them know that if the thermal, for example, does go slightly out of range, that impacts the system functionality and system application, and it can be exploited. If you heat something up to a certain point, it will make other things shut down. Or, you can freeze it, and that often gives you back-door access to certain things. Power is the other one in timing. Having different types of aspects, whether it’s electromagnetic pulses or any of that type of stuff, can throw timing off for data that’s been synchronized either within the device or from one device to another. It’s not necessarily that it exposes a danger or problem, but it can throw the information out of whack to the point where it doesn’t perform as intended.”
The vulnerability level can ramp up at advanced nodes, where thin films and wires are more susceptible to aging effects such as time-dependent dielectric breakdown and electromigration. At the most advanced nodes, security requires aggressive attention to clock speed and localized temperatures. This is why safety-critical circuits, such as some of those found in automotive, tend to be on larger, older nodes.
“The size of the transistor has to be bigger, and the size of the metal interconnect has to be bigger,” said Harrison. “It’s a tradeoff between smaller geometry processes and margin. The more advanced nodes are always going to be easier to attack, because just the technology itself is more vulnerable. As you go to smaller nodes, you have to be more aware of that, and basically factor that into your physical design.”
Sitting ducks?
Designs that don’t change over time may be no more secure than those that are constantly patched and updated. This is a plus for programmable logic, which can be updated in the field. In contrast, ASICs are “as good as it gets on day one,” said Mike Borza, a scientist at Synopsys. “Everything that you can do to repair any security defects that are discovered is going to be built around software. So in a sense, as attacks get better and as the attackers learn more about a system, the number of ways they might be able to exploit it just increases over time.”
Degradation plays a role here. Scott Best, technical director at Rambus, said there are three kinds of degradation. Electromigration is widely understood, and can be accounted for by not exceeding the maximums in power supply and signal traces, or by implementing side channel countermeasures such as random masking. However, other kinds of degradation, such as those affecting pFETs and nFETs, can accumulate charges that affect threshold voltages.
“This doesn’t seem terrifically important at first, but a lot of these designs are dialed in within an inch of their lives, or within a micron of their lives,” said Best. “If you do have a probe, it feels like a process shift.”
Those shifts are not felt equally across all components. A 2018 paper from a handful of universities and imec found that workloads had a major effect on degradation of components, which in turn made them vulnerable to attack. While memory was particularly at risk, Borza said it’s hardly the only aspect that becomes increasingly open to attack over time.
“Memories are one of the main things that do age, but the other things that age are various kinds of oscillators and even the data paths,” he said. “The time it takes to transit a data path changes. These are not large changes, and sometimes they’re very, very small changes, but they’re large enough that you can detect changes in them over time. Anytime you have that ability, there’s an opportunity that someone might find to exploit that.”
Other risk factors
Peter Laackmann, senior vice president of security at Infineon, observed that aging in unprotected chips can simulate the effects of techniques used to break low-end security on standard microcontrollers.
“One of the most important threats against non-secured, but also so-called ‘security-hardened’ microcontrollers, is the group of ‘fault induction attacks,’” Laackmann said. “Utilizing such an attack, an adversary is trying to induce errors in data processing, storage, or transport inside a chip. If a security chip would perform an erroneous calculation or data retrieval, an attacker could circumvent access rights. Furthermore, if the attacker would succeed in modifying a cryptographic calculation at the right time, secret private keys could be compromised.”
The vulnerabilities inherent in SoCs extend to chiplets. But there is an additional threat, due to the integration of multiple dies in a system.
“The thing that a chiplet package design exposes you to is that, physically, it’s larger, and you have ‘on’ and ‘off’ die interconnects to hook the chiplets together,” said Borza. “Whether that’s in a vertical stack, or whether it’s on a planar architecture, or some combination of them, you have the possibility that somebody is able to get a probe into the interconnect level and be able to intercept or modify data that’s moving between chiplets. That’s what the increased risk or increased exposure is in that kind of design, but the underlying principles remain the same.”
Design for security
While aging is unavoidable, that doesn’t mean it can’t be accounted for. As Infineon’s Laackmann observed, microcontrollers from the 1990s often relied on sensors to detect improper environmental conditions, such as unusual supply voltages, temperatures, clock frequencies, or laser irradiation.
Advances in the technology available to attackers today require sensor implementations to be even more robust.
“The sort of things you need to look for when you do your monitoring of the silicon is, if you have monitors that are slowly seeing the degradation of the silicon, that’s acceptable. That’s the profile that you would expect to see,” said Harrison. “But then, all of a sudden, if you were to see an anomaly or a spike, you know that’s not part of the natural aging process. It’s an attack on the actual silicon itself. By monitoring the general aging process of the silicon, you also can put in place the ability to monitor for these certain amount of side channel attacks.”
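The distinction Harrison draws between an expected aging profile and a sudden anomaly can be sketched in a few lines of Python. This is only an illustration, not how any vendor's monitors work: the function name `find_spikes`, the window size, the threshold, and the path-delay readings are all hypothetical. It compares each reading against the median of the previous few readings, so slow drift moves the baseline along with it while an abrupt jump stands out.

```python
from statistics import median

def find_spikes(readings, window=5, threshold=5.0):
    """Return indices of readings that jump away from the recent trend.

    The baseline is the median of the previous `window` readings. Gradual
    drift shifts that baseline slowly, so only abrupt changes exceed
    `threshold`. Both parameters are illustrative assumptions.
    """
    spikes = []
    for i in range(window, len(readings)):
        baseline = median(readings[i - window:i])
        if abs(readings[i] - baseline) > threshold:
            spikes.append(i)
    return spikes

# Hypothetical monitor readings (path delay in picoseconds) that drift
# upward by 0.5 ps per sample, standing in for natural aging.
aged = [100.0 + 0.5 * k for k in range(20)]

# The same trace with one abrupt 30 ps excursion, standing in for an attack.
attacked = list(aged)
attacked[12] += 30.0

print(find_spikes(aged))      # []
print(find_spikes(attacked))  # [12]
```

A median is used for the baseline rather than a mean so that a single spike does not drag the baseline up and mask, or falsely flag, the readings that follow it.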
Functional monitoring is also a key security measure against a hacker’s time advantage. Because the monitors are configurable by software, Borza noted that designers are able to update their profiles and alter the type of data being collected, and to watch for new types of attacks that develop over time.
As Laackmann put it, architectures that prioritize security must be included as the “inner foundation” of certified security controllers. “As a countermeasure against physical attacks, but also against potential threats due to natural or accelerated aging, modern certified security microcontrollers like those offered by Infineon must be built on the principle of design-for-security. In terms of security, this means that no matter where a physical effect comes from, the chip’s task is to detect errors and initiate countermeasures or alarms. Modern certified security controllers typically utilize hardware crypto accelerators, comprising internal data masking and heavy use of randomization features, as well as internal software that carries out the cryptographic implementation. These measures take care that highly effective barriers against side-channel analysis methods exist to efficiently protect the chip against attacks, including accelerated and natural aging.”
Another approach is to build in triple modular redundancy, which can measure and compare glitches.
“TMR, or just doing just the calculation redundancy is, in general, a way of mitigating those single event upsets that are part of RAD hard by design,” said Rambus’ Best. “This is really important not just in satellites, but in the terrestrial automotive space, as well. There are some safety-critical systems inside automotive that require lockstep cores — functional redundancy side-by-side. You perform the same calculation in lockstep in two different processor cores, and they’re monitoring each other the whole time to make sure that they remain in lockstep. Now, if an adversary is trying to use aging effects to attack your chip, it turns out that these redundancies help mitigate those attacks because now your adversary has to make their attacks as redundant as the defenses were, and that can be surprisingly hard to do.”
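Best's point about redundancy can be illustrated with a minimal software sketch of a majority voter. Real TMR and lockstep cores are implemented in hardware, so the Python below is only an analogy, and `tmr_vote`, `checksum`, and the `flip_bit` fault model are hypothetical names invented for this example. The same calculation runs three times, a single faulted replica is outvoted, and the disagreement is flagged.

```python
from collections import Counter

def tmr_vote(a, b, c):
    """Majority-vote three replica results.

    Returns (value, disagreement). Raises if all three replicas differ,
    because no majority exists to trust.
    """
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count == 1:
        raise RuntimeError("no majority: all three replicas disagree")
    return value, count < 3

def checksum(data, flip_bit=None):
    """Toy calculation; flip_bit simulates a fault injected into one replica."""
    total = sum(data) & 0xFFFF
    if flip_bit is not None:
        total ^= 1 << flip_bit
    return total

data = [1, 2, 3, 4]

# One replica is faulted: the voter still returns the correct result
# and reports that the replicas disagreed.
result = tmr_vote(checksum(data), checksum(data, flip_bit=3), checksum(data))
print(result)  # (10, True)
```

To change the voted result without triggering an error, an attacker would have to induce the identical fault in at least two replicas at once, which is the redundancy burden Best describes.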
Security in the margins
Part of designing for security is understanding how attackers work. As Harrison explained, side channels are made possible by stressing a device or altering an environmental parameter to jolt it out of its stable state. This can be done by techniques such as spiking voltage or temperature and watching to see if it grants access to security channels.
There are tradeoffs to designing for that kind of resiliency, however. “It’s obviously going to impact the overall functional timing of the device internally,” Harrison said. “If you’ve got a healthy margin, then you’re going to have to try really, really hard to attack the device to get it to a point where it will go into a state that’s unstable. But as it gets old, and you get to a point where you’re getting close to those margins, the device may be getting very, very close to failure. The timing between two functional flip flops in that timing path, for example, are getting longer and longer.”
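The margin erosion Harrison describes can be made concrete with simple setup-slack arithmetic. Every number below is invented for illustration, and the helper names `setup_slack_ps` and `glitch_breaks_timing` are hypothetical. The sketch assumes a path between two flip-flops whose delay grows with age, and an attacker who can add a fixed amount of extra delay, for example through a voltage droop.

```python
def setup_slack_ps(clock_period_ps, path_delay_ps, setup_time_ps):
    """Setup slack: time left in the clock period after the data path
    delay and the capturing flip-flop's setup time are accounted for."""
    return clock_period_ps - (path_delay_ps + setup_time_ps)

def glitch_breaks_timing(slack_ps, glitch_delay_ps):
    """True if an induced extra delay exceeds the remaining slack."""
    return glitch_delay_ps > slack_ps

# Illustrative values only; none of these come from a real design.
CLOCK_PERIOD_PS = 1000
SETUP_TIME_PS = 50
FRESH_DELAY_PS = 800   # path delay at time zero
AGING_SHIFT_PS = 100   # assumed delay added by years of degradation
GLITCH_DELAY_PS = 80   # assumed delay an attacker can induce

fresh_slack = setup_slack_ps(CLOCK_PERIOD_PS, FRESH_DELAY_PS, SETUP_TIME_PS)
aged_slack = setup_slack_ps(
    CLOCK_PERIOD_PS, FRESH_DELAY_PS + AGING_SHIFT_PS, SETUP_TIME_PS
)

print(fresh_slack, aged_slack)                             # 150 50
print(glitch_breaks_timing(fresh_slack, GLITCH_DELAY_PS))  # False
print(glitch_breaks_timing(aged_slack, GLITCH_DELAY_PS))   # True
```

Under these assumed numbers, the same disturbance that a new device absorbs is enough to push the aged device into a timing violation.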
Conclusion
As chips age, they become more open to attacks, particularly side-channel attacks. This is due to degradation over time, but also because as time in the field increases, hackers have more opportunities to find weaknesses.
A design-for-security approach that includes active monitoring for activity such as voltage and temperature spikes can help ward off these attacks, but it also may impact performance and power. So while it’s important to build realistic margins for temperature and voltage into a design, no device can ever be completely secure over its lifetime. The challenge is to find the right balance.