Finding Security Holes In Hardware

An emphasis on performance, backward compatibility and system complexity is creating vulnerabilities that are difficult to fix.


At least three major security holes in processors were identified by Google’s Project Zero over the past year, with more disclosures expected in coming months. Now the question is what to do about them.

Since the beginning of the PC era, two requirements for hardware were backward compatibility and improvements in performance with each new version of processors. No one wants to replace their software every time they upgrade their hardware, and companies that tried to buck that trend never fared well.

But it turns out that security doesn’t age well, and what worked in the past may not work in the future. Security that was considered state-of-the-art five years ago may be broken in a matter of hours today using more powerful computers and information available all over the Internet. And adding in a level of abstraction with virtual machines, which can isolate one application from another in a multi-tenant cloud environment, is no longer a guarantee there won’t be breaches from one application into another.

These same vulnerabilities apply to the AI/machine learning world, as well, where the focus is on performance using a variety of accelerators, different memory architectures, and algorithms that are almost in a constant state of flux. Some of this hardware is brand new and untested. No one is sure how secure it will be over time, or what effect tuning hardware and software algorithms together will have on security.

“There are a huge number of security challenges with respect to the way AI is being used,” said Paul Kocher, an independent cryptographer and computer security expert. “That includes everything from poisoning of the training sets to adversarial input and unpredictable corner cases. The overall complexity of these algorithms that are effectively being created automatically adds a whole new set of issues on top of the problems we have with algorithms that are being created by smart people. There are lots and lots of security papers showing attacks against AI systems, accelerated or not. There have been a bunch over the past few years and there are going to be a whole lot more coming. It’s an area of very active research, and right now it’s a huge mess.”

Unlike in the past, when software was adapted to the hardware, new AI chips are being defined by the software. How that affects security remains to be seen, but the fact that the software is still in flux makes it harder to nail down what these systems ultimately will look like.

“There’s no killer strategy that addresses the issues, and a lot of the results are fairly ad hoc, which makes it hard to replicate,” said Kocher. “So it’s even hard to know whether a security strategy that worked for one thing and seemed to work in one academic setting actually works in the field.”

This is a looming issue in the non-AI world, as well, where the number of edge devices is growing rapidly. Meltdown and Spectre have raised the anxiety level of CIOs, who worry about data leakage, data theft and ransomware. Large chip companies have been quick to develop security patches, but those patches eat into performance. Inside of data centers, that performance is tied to real money because it affects the speed at which data can be processed, how much cooling is required, and how much extra capacity is required just to stay even.

“Most attacks happen in software, and that is still the majority of them,” said John Hennessy, chairman of Alphabet (Google’s parent company), in a presentation at the recent Hot Chips 30 conference. But he noted that attacks on hardware are becoming a much bigger concern. “Meltdown and Spectre were just the start. There are now multiple versions, and there is a suspicion those were just the first variants. L1TF (L1 Terminal Fault, also known as Foreshadow-NG) and Foreshadow can break a virtual machine.”

Hennessy said those holes are nearly impossible to repair in software. There is a big performance hit, and not all of the vulnerabilities can be addressed that way. “There are lots of processors out there with holes in them,” he noted. “The average response time to fix holes is about 100 days.”

Speculative execution
One of the main hardware attack vectors identified so far involves speculative execution, which is roughly the equivalent in a processor of what pre-fetch does for search. With speculative execution, a processor predicts the results of upcoming instructions and executes them ahead of time in order to optimize performance.

The problem is that what worked well in the past is now considered a liability due to side-channel attacks. Paul Turner, a Google software engineer, said during a Hot Chips presentation that for the past two decades speculation was assumed to leave no observable trace. That turned out to be an erroneous assumption. He noted this is particularly troublesome with shared memory, because mis-speculation on a boundary can open a security hole. Hackers can use that approach to tap into software kernels and hypervisors through restricted memory and look at anything running in the system.

“There is nothing today that tells hardware where the boundaries are,” Turner said. He noted that hackers can use a side-channel to attack private memory using its own execution environment.
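Turner’s point about missing boundaries can be made concrete with the shape of the classic Spectre variant 1 “bounds check bypass” gadget. The sketch below is a toy model, not working attack code: all names (`array1`, `probe`, `victim`) are illustrative, and the `touched[]` array stands in for microarchitectural cache state, which a real attack would have to recover by timing loads (e.g., flush+reload). On vulnerable hardware, a trained branch predictor can execute the body of the `if` speculatively with an out-of-bounds `x`, leaving a secret-dependent footprint in the cache even though the out-of-bounds access is never architecturally committed.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE 64  /* typical cache-line size in bytes */

static uint8_t array1[16] = {1, 2, 3, 4, 5, 6, 7, 8,
                             9, 10, 11, 12, 13, 14, 15, 16};
static size_t  array1_size = 16;
static uint8_t probe[256 * LINE]; /* attacker-observable probe array */
static int     touched[256];      /* toy stand-in for cache state */

/* Victim code with the classic Spectre v1 gadget shape. The bounds
 * check is architecturally sound: an out-of-bounds x never executes
 * the body as far as the program's visible state is concerned. But
 * on affected CPUs, once the predictor is trained to expect "in
 * bounds," the body can run speculatively with a bad x, and the
 * data-dependent load below warms a cache line chosen by the
 * (possibly secret) value v. */
static void victim(size_t x) {
    if (x < array1_size) {
        uint8_t v = array1[x];
        touched[v] = 1;           /* real attack: cache line warmed */
        (void)probe[v * LINE];    /* line index depends on v */
    }
}

/* Attacker readout. A real flush+reload attack would time accesses
 * to each probe line and pick the fast one; here we just inspect
 * the toy cache model. */
static int recover_value(void) {
    for (int i = 0; i < 256; i++) {
        if (touched[i]) {
            return i;
        }
    }
    return -1;
}
```

The key observation is that nothing in the instruction set tells the hardware that `probe` is attacker-visible while the memory beyond `array1` is not, which is exactly the missing boundary Turner describes.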

Speculation, which has been in use for decades, is a valuable way of improving performance in hardware. Mark Hill, professor of computer science at the University of Wisconsin, said speculation has provided a 20X improvement in performance over the years, or roughly the difference between current processor speeds and the days of 200MHz chips.

“The question is whether we solve this in hardware or just mitigate,” said Hill. “It will take a long time before that answer comes back.”

Fig. 1: How speculation works. Source: Mark Hill/Hot Chips 30

Safety, security and complexity
Finding security holes is only one piece of the puzzle. The next question is what to do about them.

“If the fundamental processors have a newly discovered vulnerability, that has big impact, but it also gets fixed relatively quickly—except that you have all the existing systems,” said Synopsys chairman and co-CEO Aart de Geus. “One of the key challenges with security is the timing of knowing about a vulnerability, including when do you communicate it. Let’s say you find that you are delivering a system that has a vulnerability. How do you communicate that to your customers? Do you inform them all at the same time? Do you inform them loudly? Do you inform them quietly? If you inform them loudly, you’re simultaneously informing all of their enemies. A lot of enemies are very fast.”

This is one of the reasons that new vulnerabilities identified by groups such as Project Zero are disclosed publicly only after a delay.

“When somebody finds a vulnerability, they tell the affected vendors, and then there’ll be an embargo period that runs usually in the range of three to six months,” said Kocher. “It can be a little longer, a little shorter depending on the situation. Right now we’re in the waves of vulnerabilities coming out from people who discovered and heard about Spectre and Meltdown after the embargo ended. They started doing research after that, looking at what else they could do to it and how to adapt it and apply it in other ways. The embargo for the Foreshadow attack started in January, which was right about the same time as the other embargo was ending, so we should expect a couple more durations of the embargo length before the first set of academic researchers thinking about these problems have published their results.”

There will certainly be more to come, and not all of them will be discovered by white-hat hackers. At this point no one knows where the next vulnerabilities will hit. But some of these chips are beginning to find their way into safety-critical applications in automotive or industrial markets. In the past, no one ever considered a car to be something that could be successfully hacked. Yet as cars increasingly are connected to each other and the Internet, and electronics are added to control a vehicle’s movement, malware and ransomware suddenly become a much greater threat. The issue is made much worse by the fact that some of these chips are expected to be in use for 10 to 20 years.

Designing for security
The ideal solution is to design in best practices for security from the start, but that’s made more difficult by the security features themselves.

“Security and safety are a big concern for customers, and the costs are intertwined for both,” said Frank Schirrmeister, senior group director for product management and marketing at Cadence. “There’s a huge awareness about how to deal with security. There are high-security modules somewhere in the chip, which nobody else should be able to look at, and there are formal tools to help you to validate that. You’re creating a chain of trust. But this affects other parts of the design, too. You want to make sure that for debug you can’t just read out the chip because you have a JTAG interface into all of the secure zones. There needs to be technology in there so that when you put keys into chips, it can’t get out. Otherwise you have a problem.”

Security also doesn’t get applied evenly, which adds its own set of concerns as more devices are connected.

“For each kind of device you have different threats and different costs associated with that,” said Mike Eftimakis, senior IoT product manager at Arm. “For example, the level of protection you need for a light bulb and for a smart camera are very different. From those threats you develop different architectures to counter those threats. This is more about spreading best practices in the industry, not reinventing the wheel, because not all of the many IC devices are developed by security experts. They need this knowledge and understanding of how to match the threats and the architecture.”

And even then, weaknesses will crop up.

“The more connected things are, the more kinds of devices that are available, the more opportunities to get in,” said Martin Scott, CTO of Rambus. “If you look at the mobile platform, that’s mature enough, standardized enough, and has been in production long enough that it may be difficult to get into those devices. Applications and memory are tied down better. But if that device connects into a fob for your car, or water filter replacements for your refrigerator, or your home gateway, your wearables—somewhere in that there isn’t the same level of standardization or protocol compliance or security testing. That is the interesting and worrisome attack surface. It’s all this stuff you don’t think is very valuable, but you can get in through your toaster to drive off in your car.”

Hardware security risks will continue to escalate for several reasons. First, the value of data is increasing and there are better tools to sift through large amounts of data. Second, increasing connectivity and more sophistication allows devices to be attacked remotely. And third, hardware is being developed as part of increasingly complex systems, with some elements expected to be in use for a decade or more.

Security experts have been warning about risks to hardware for more than a decade, with little hard evidence to back up their claims. Much has changed in the past year. And while hardware is still far from secure, a lot more people are paying attention these days.

Related Stories
Security Holes In Machine Learning And AI
A primary goal of machine learning is to use machines to train other machines. But what happens if there’s malware or other flaws in the training data?
Designing Hardware For Security
Most attacks in the past focused on gaining access to software, but Meltdown and Spectre have changed that forever.
Why The IIoT Is Not Secure
Don’t blame the technology. This is a people problem.
Hardware Security Threat Rising
Why hardware is now a target and what’s driving this shift.
