Verifying Side-Channel Security Pre-Silicon

Complexity and new applications are pushing security much further to the left in the design flow.


As security grows in importance, side-channel attacks pose a unique challenge because they rely on physical phenomena that aren’t always modeled for the design verification process.

While everything can be hacked, the goal is to make it so difficult that an attacker concludes it isn’t worth the effort. For side-channel attacks, the pre-silicon design is the best place to address any known or less-obvious attack vectors. And the earlier in the flow that can be done, the more likely it is to stand up to attempted breaches.

“There’s been a huge realization that the overall system is only as secure as the hardware foundation,” said Pete Hardee, product management director for formal solutions at Cadence. “Once you’re aware of how these attacks are happening, you can design the processor to be a lot less vulnerable to those sorts of attacks.”

That’s becoming increasingly vital as chips become more complex, and as they are used in safety-critical and mission-critical applications. “You really have to build in security from the start,” said Steve Hanna, distinguished engineer at Infineon Technologies. “We’ve learned over the years it is not something you can slap on later. You really have to design it in. What that means is that you want to have some secure execution area within the processor where you can put your most security-critical elements, whether that’s a hardware security element, a separate core, or perhaps it’s just some circuitry as part of the processor that has that capability. Some of our customers even use their memory as the secure domain. But in any case, you need to have a secure domain where you can keep things like cryptographic keys.”

That requires verifying that the hardware is secure. Still, immunity to some types of side-channel attacks is easier to verify than immunity to others. Tools are becoming available that can help to vet a new chip for resistance to such attacks. But some vulnerabilities cannot be directly checked, leaving some of the verification for confirmation once silicon is available.

“If you really start going after the hardware root of trust and determine the value of the secret key, that opens up everything above that level,” said William Ruby, COO of FortifyIQ. “Once you break the hardware, you can then expose the operating system, the applications, the files, and the entire network.”

Breaking the hardware involves one or more of three basic categories of side-channel attacks:

  • Physical attacks, which are typically non-invasive;
  • Timing attacks; and
  • Fault injection attacks, which are invasive.

While the three approaches are very different, they share one goal: to expose secrets the chip is supposed to protect, whether by quietly observing its operation or by disrupting it.

Multiple approaches are used for pre-silicon verification, as well. But even with the best tools and methodologies, they may not be exhaustive. There are no completely hands-off automated ways to verify a design’s security, so engineers must be aware of the attacks that might be tried and identify what is potentially of interest to attackers.

Physical attacks
This type of attack looks for information that might inadvertently leak from a device while operating. It’s not logical information, but rather indirect physical indicators that provide clues as to what’s going on inside.

“There are a whole host of vulnerabilities and leakages that rely on crosstalk and similar phenomena,” said Hardee.

The two best-known ways of doing this are by watching the power and detecting electromagnetic signals emanating from the device.

“Using differential power analysis has been the most common way of performing a general attack to extract confidential information or cryptographic keys,” said Vikas Gautam, vice president of engineering for Synopsys‘ Systems Design Group. “You apply patterns, you study power consumption, and then you try to analyze and extract keys or parts of the keys one by one.”

In the early days of such attacks, the power signal might move visibly during a critical operation, providing a fairly direct indication of what was happening. These days, the effect is more subtle. “There’s a notion of simple power analysis, where you can look at the waveform and actually see a spike,” said Jason Oberg, co-founder and CTO at Tortuga Logic. “But most of the time, if you look at it, it looks like there’s no discernible difference. But if you do it over a thousand or a million cycles, you’ll start to see correlations.”
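
To make the statistics concrete, here is a minimal correlation power analysis (CPA) sketch in Python. It assumes synthetic traces in which a single sample leaks a faint multiple of the Hamming weight of plaintext XOR key; the trace count, leakage model, and noise level are all illustrative assumptions, not any vendor's methodology.

```python
# Minimal CPA sketch: no single trace reveals the key, but correlating a
# leakage hypothesis against thousands of traces does. All parameters
# (trace count, leak location and strength) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N_TRACES, N_SAMPLES, TRUE_KEY = 5000, 50, 0x3C

def hw(x):
    """Hamming weight (number of set bits) of each byte in x."""
    return np.unpackbits(x.astype(np.uint8)[:, None], axis=1).sum(axis=1)

# Simulate traces: sample 25 leaks a faint multiple of HW(plaintext ^ key),
# buried in noise, so it is invisible in any individual trace.
plaintexts = rng.integers(0, 256, N_TRACES, dtype=np.uint8)
traces = rng.normal(0.0, 1.0, (N_TRACES, N_SAMPLES))
traces[:, 25] += 0.15 * hw(plaintexts ^ TRUE_KEY)

def peak_corr(guess: int) -> float:
    """Max |correlation| between a guess's predicted leakage and any sample."""
    model = hw(plaintexts ^ np.uint8(guess))
    return np.abs(np.corrcoef(model, traces.T)[0, 1:]).max()

best = max(range(256), key=peak_corr)
print(f"recovered key byte: {best:#04x}")  # 0x3c, given enough traces
```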

Statistical tools are required to expose those correlations, but there are few surprises here. These attacks have been “productized” more than any others.

“You can buy surprisingly cheap and easy-to-use devices, whose sole purpose is to mess up NFC payment chips and gain access to money,” noted Hardee.

Fig. 1: A typical setup for performing a differential power analysis (DPA) attack. Source: Wikipedia [1]

The benefit of pre-silicon verification, at least when it comes to power, is that a simulation model provides access deep into the circuit. So while an actual attacker can observe only an external power pin — unless they are grinding off the package and probing it with sophisticated tools — simulation software can help to identify which modules are telegraphing these secrets.

“What you want to know is, ‘What module shows the vulnerability? Where am I leaking my secret key information? Which module is responsible for it?’” said Ruby.

But power or EMI models don’t usually have the resolution to simulate those phenomena outright, which has made this a challenge. “If you do analysis pre-silicon, the quality of your analysis is very much dependent on your power model,” said Oberg. “You can use a very simple power model, and if that says that you’re vulnerable, you definitely are vulnerable.”

Instead, signal transition density serves as a reasonable proxy, so simulation tools watch how signals toggle to estimate their likely impact on power and radiation. But the vector suites can be very long.
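
The per-trace computation behind that proxy is simple; scale is the hard part, as the next quote makes clear. Here is a rough sketch that counts per-cycle toggles and treats the total as a relative power estimate. The in-memory trace format (a dict of per-cycle integer signal values) is an assumption for brevity; a real flow would parse a VCD or FSDB waveform dump.

```python
# Sketch of transition density as a power proxy: per-cycle toggle counts
# stand in for dynamic power. The trace format is an assumption; production
# flows would read a simulator's VCD/FSDB output instead.

def toggle_counts(trace: dict[str, list[int]]) -> list[int]:
    """Total bits flipped per cycle across all signals (crude power proxy)."""
    n_cycles = len(next(iter(trace.values())))
    return [
        sum(bin(vals[t] ^ vals[t - 1]).count("1")  # bits that changed
            for vals in trace.values())
        for t in range(1, n_cycles)
    ]

# Example: a 4-bit counter flips a different number of bits each cycle,
# and those toggle peaks are what correlate with measurable power.
print(toggle_counts({"count": [0, 1, 2, 3, 4, 5, 6, 7, 8]}))
# -> [1, 2, 1, 3, 1, 2, 1, 4]
```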

“Some of these compliance and certification standards specify the number of vectors or traces that you need to run to ensure that your design is not vulnerable to these types of attacks,” said Ruby. “The number can be in the billions. With a decent number of machines in your compute farm, you can literally turn billions and billions of traces overnight.”

One of the harder problems to solve in this area involves dark silicon, which is a portion of a chip or system that is put into a sleep mode to conserve power when it is not in use. “An adversary can always figure out what wakes up a system,” said Scott Best, technical director at Rambus. “And if the system needs to wake up and behave securely, that’s staggeringly similar to what your phone does. It may take a minute from a full cold boot to get itself into a secure execution place before going back to sleep. And then it may take only 10 or 15 milliseconds each time from there, which you probably don’t even notice, and it may only stay on for another 10 seconds before going back to sleep. So you may think you don’t have to worry about security because it’s off most of the time. But that 10 seconds is an eternity for staging any sort of attack that might reveal private or otherwise secret contents that are in the system.”

Timing attacks
Timing attacks take advantage of subtle differences in timing that may be driven by some detail that can give key information away. “A new class of vulnerabilities based on transient execution attacks, such as Spectre and Meltdown, has gained much attention,” said Raik Brinkmann, senior director of engineering at Siemens EDA. “Because they require no physical access to the target system, they pose a genuine threat to the security of modern computing systems.”

Code execution flow, caches, and bus contention are examples of how this might happen. In the first case, different code trajectories may take differing amounts of time to complete. If one knows something about the code (not unreasonable, given the amount of open-source code available), then watching the timing provides clues as to the data the code is working with.

“The classic example is if you have an encryption key and you’re performing RSA, for example,” said Oberg. “Depending on how you implement it, it has a data-dependent computation that happens on the key bit. There’s either a square or multiply, and there’s an IF condition based on a key bit. There can be a time delay based on the value of the key if it’s implemented poorly, so the attacker can determine you’ve used key one versus key two. If you do that for a lot of computations, you actually can figure out what the secret value is.”
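
The branch Oberg describes fits in a few lines. Below is a toy left-to-right square-and-multiply loop whose per-bit work depends on the key bit, next to a variant that always performs the multiply and then selects the result, one common mitigation (the select shown here would be branchless in hardware or careful C; Python is used only for clarity). The values are toy numbers, not a real RSA implementation.

```python
# Naive square-and-multiply: the 'if' makes runtime depend on each key bit.
def modexp_leaky(base: int, exponent: int, modulus: int) -> int:
    result = 1
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus      # square for every bit
        if bit == "1":                            # key-dependent branch:
            result = (result * base) % modulus    # extra work only for 1-bits
    return result

# Mitigated variant: always multiply, then select, so each bit costs the same.
def modexp_ct(base: int, exponent: int, modulus: int) -> int:
    result = 1
    for bit in bin(exponent)[2:]:
        squared = (result * result) % modulus
        multiplied = (squared * base) % modulus   # unconditional multiply
        result = multiplied if bit == "1" else squared
    return result

assert modexp_leaky(7, 11, 143) == pow(7, 11, 143) == modexp_ct(7, 11, 143)
```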

Such leaks are largely driven by the software being executed, rather than the hardware itself, so pre-silicon verification of the hardware might not apply in this case.

Watching caches, in contrast, has been a component of a number of well-known attacks. Data needed for execution may or may not be in the cache when requested. If not, a cache miss occurs and the data must be fetched from memory. This takes more time than a cache hit would take, providing information about data flow.

Attacks via the cache can be subtle and complicated. “Cache-timing attacks distill down to an attacker forcing the victim’s data into the tag of the cache and then extracting that data through the difference in cache hit/miss timing,” explained Oberg. “The cache tag is related to the memory address, and dictates which cache line an address maps to. The tag is important because the attacker cannot arbitrarily read data from the cache, but it can figure out what addresses belong to the victim from the hit/miss latency difference.”
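
That mechanism can be modeled with a toy direct-mapped cache and a prime+probe sequence: the attacker fills every set, the victim makes one secret-dependent access, and the single slow probe access reveals the secret-derived set index. The geometry, addresses, and latencies below are illustrative assumptions.

```python
# Toy prime+probe: hit/miss latency alone reveals which cache set the
# victim touched. Cache geometry, addresses, and latencies are illustrative.
LINE, SETS = 64, 16                       # 64-byte lines, 16 sets

class DirectMappedCache:
    def __init__(self):
        self.tags = [None] * SETS
    def access(self, addr: int) -> int:
        """Return latency: 1 cycle for a hit, 100 for a miss (illustrative)."""
        s, tag = (addr // LINE) % SETS, addr // (LINE * SETS)
        if self.tags[s] == tag:
            return 1
        self.tags[s] = tag                # fill on miss, evicting the old line
        return 100

cache, attacker_base, secret = DirectMappedCache(), 0x10000, 11

# Prime: attacker loads one of its own lines into every set.
for s in range(SETS):
    cache.access(attacker_base + s * LINE)

# Victim: a single secret-dependent access evicts the attacker in one set.
cache.access(0x20000 + secret * LINE)

# Probe: only the evicted set is slow to re-access, leaking the secret index.
latencies = [cache.access(attacker_base + s * LINE) for s in range(SETS)]
print("leaked set index:", latencies.index(max(latencies)))  # prints 11
```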

Those clues aren’t likely to be secrets themselves, but they are part of what is typically an elaborate sequence of steps that eventually can lead to secrets by handing over control, providing access to privileged memory, or “inverting” data by switching it from secure to unsecure.

“These vulnerabilities are implications of micro-architectural design decisions and may be introduced unintentionally or, even worse, deliberately,” noted Brinkmann. “Although in principle such backdoors or Trojans can be analyzed and prevented by pre-silicon verification, they are extremely hard to find, as they neither add redundant logic nor corrupt intended design functionality.”

The ability of speculative execution to load an instruction that may never be executed has been a particular issue in famous attacks like Meltdown and Spectre. That tends to be more of a vulnerability on a fully booted system. “Your secure boot processor is usually a much simpler processor that really doesn’t use any speculative execution,” noted Hardee.

Bus contention attacks, meanwhile, allow a program controlled by an attacker to attempt bus accesses while another process is also trying to access the bus. The timing involved in either being granted access or having to wait to gain access yields yet more information.

“There’s a whole class of growing cyber-attacks in the time domain caused by contention on buses or other shared resources like floating point units,” said Oberg. “The attacker can manipulate the bus when there’s some secret transaction also hitting the bus. The attacker is able to observe that because of the contention that they’re driving on the bus, as well. So even though they’re not reading a secret value — they may be getting garbage back — at least they’re learning information from time delays because of the contention.”

“There’s going to be a whole slew of those that continue to pop up, especially within heterogeneous shared architectures,” he added. “That’s going to be an attack vector that continues to grow. A lot of our customers are building out pretty complex environments that look at this contention because they’re concerned about Meltdown/Spectre-type stuff.”

For these attacks, an accurate source of timing information is needed. “In most real cases this is done from software using existing timers baked into processor cores,” said Oberg.

The higher the timing resolution, the more information can be learned from execution. If the resolution of the timer is reduced enough, it becomes harder to distinguish between events with small differences in timing.
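
A few lines make that mitigation concrete: quantizing timestamps to something coarser than the hit/miss gap collapses the two readings into the same value. The latencies and resolutions below are illustrative numbers.

```python
# Coarsening the timer below the event gap hides the timing difference.
def quantize(t_ns: float, resolution_ns: float) -> float:
    """Round a timestamp down to the timer's resolution."""
    return (t_ns // resolution_ns) * resolution_ns

hit_ns, miss_ns = 4.0, 90.0               # illustrative latencies
for res in (1, 10, 1000):                 # timer resolution in ns
    print(res, quantize(hit_ns, res), quantize(miss_ns, res))
# 1 ns:    4.0 vs 90.0  -> trivially distinguishable
# 1000 ns: 0.0 vs  0.0  -> the hit/miss gap disappears
```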

All of these timing attacks relate to the flow of information through the system as execution proceeds. Pre-silicon verification involves tracking those flows to identify vulnerabilities. “We identify how the victim’s information flows in the design, including how the victim’s information can influence time,” said Oberg. “If you can prevent the flow of the victim’s data to regions an attacker can access, then you can make extremely strong statements about the security of the system.”
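
Commercial information-flow tools work on RTL or gate-level netlists, but the underlying idea can be sketched with a taint label that propagates through every operation and is checked at attacker-observable points. This toy model illustrates the concept only; it is not a description of any particular product.

```python
# Toy information-flow (taint) tracking: label the secret, propagate the
# label through operations, and flag flows into observable locations.
from dataclasses import dataclass

@dataclass
class Tainted:
    value: int
    secret: bool = False                   # the information-flow label

    def __xor__(self, other: "Tainted") -> "Tainted":
        # Any result that mixes in a secret operand is itself secret.
        return Tainted(self.value ^ other.value, self.secret or other.secret)

def observe(signal: Tainted, where: str) -> None:
    """Model an attacker-visible point (debug bus, pin, timer value)."""
    if signal.secret:
        print(f"VIOLATION: secret-labeled data reaches '{where}'")

key = Tainted(0x5A, secret=True)
data = Tainted(0x33)
result = data ^ key                        # label propagates with the value
observe(result, "debug_bus")               # flagged for review
observe(data, "debug_bus")                 # untainted: no violation
```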

But the verification engineer must be sure to identify the assets that might be attacked. Otherwise, the overwhelming number of irrelevant flows and timing relationships may flood the results with noise. “We’re big fans of defining your security requirements based on an asset, and then based on some security objective — confidentiality, integrity, or availability,” Oberg added.

Fault-injection attacks
The third attack category is harder to nail down because it tends to involve unpredictable behaviors, such as when a chip is somehow abused. “For example, say you’re in the process of doing a secure boot,” said Oberg. “You can glitch the power line and maybe cause a secure boot to jump into safe mode, and now you have root access.”

There’s no strict definition of what counts as a fault-injection attack. The basic idea is to find some way to put a processor into an unexpected state, flip register bits, or otherwise change the state of the system in a way that may give the attacker an advantage.

“A lot of what hackers are trying to do is to make attacks on vulnerable signals that put the system into a different mode,” said Hardee. “Test-control signals and debug mode are a couple of areas that hackers tend to attack. But they’re also attacking the power lines to try and get the system into a different state. And what that is really trying to do is to put the system into a boot mode, and then interrupt the secure boot.”

The three most common ways of doing this are by messing with power, the clock, or even temperature. Glitching VDD or playing with clock timing, for example, can have unexpected results. It’s just hard to tell, a priori, what effect anything like that will have. An attack may end up being a process of trying a lot of different things randomly until something interesting happens, and then figuring out what happened and how that can be exploited.

This tends to be harder to verify pre-silicon. While there aren’t models that predict what will happen if VDD is glitched at a particular time, for example, it is possible to check the impact of artificially injected faults, regardless of how they were injected. But it helps to think through what an attacker’s approach is likely to be in order to identify where to inject faults.

Fault injection is also a tool used for verifying functional safety in systems that must prove themselves to be safe. “There’s an overlap between functional safety solutions and security solutions,” said Hardee.

But security testing can be harder than its safety equivalent. “What you’re generally looking for in a functional safety solution is, ‘Is there a single fault I can inject that causes a problem that’s not detected?’” Hardee explained. “What you’re looking at in a security solution is often a lot more targeted, but it’s, ‘Is there a combination of faults that can put the design into a non-secure state?’”
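
Pre-silicon, this often takes the form of a fault campaign over a model of the design. The sketch below injects every single bit-flip, and then every pair of flips, into the state register of a toy secure-boot state machine, and reports any combination that reaches the unlocked state without a valid signature. The state machine, its dense two-bit encoding, and the flip-based fault model are illustrative assumptions.

```python
# Toy fault campaign: flip state-register bits (however a real attacker
# might induce them) and check whether boot can end up unlocked without a
# valid signature. The FSM and fault model are illustrative.
from itertools import combinations

RESET, VERIFY, LOCKED, UNLOCKED = 0b00, 0b01, 0b10, 0b11

def boot_step(state: int, sig_ok: bool) -> int:
    if state == RESET:
        return VERIFY
    if state == VERIFY:
        return UNLOCKED if sig_ok else LOCKED
    return state                          # LOCKED / UNLOCKED are terminal

def run_boot(sig_ok: bool, flips=()) -> int:
    """Run four boot cycles, XOR-ing fault masks into the state register."""
    state = RESET
    for cycle in range(4):
        for fault_cycle, mask in flips:
            if fault_cycle == cycle:
                state ^= mask             # the injected fault
        state = boot_step(state, sig_ok)
    return state

faults = [(c, m) for c in range(4) for m in (0b01, 0b10, 0b11)]
scenarios = [(f,) for f in faults] + list(combinations(faults, 2))
for flips in scenarios:
    if run_boot(sig_ok=False, flips=flips) == UNLOCKED:
        print("escapes without valid signature:", flips)
```

Several single flips already escape in this toy, precisely because the dense two-bit encoding places UNLOCKED one bit-flip away from legitimate states; one-hot or error-detecting state encodings are common hardening responses.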

Unfortunately, this can be one of those areas where it’s common to test for vulnerabilities to known successful past attacks. It’s much harder to ensure that there isn’t some new attack that no one has tried yet.

Pre-silicon verification is coming around
While there are no — and can probably never be any — exhaustive pre-silicon verification suites that cover all side-channel attacks, this is an area being addressed by an increasing number of tools.

It’s early days for verifying against physical attacks, but at least one tool from FortifyIQ claims to be able to do that. The company doesn’t track power or radiation directly, but rather looks at transition densities to see if any such sequences can be correlated with internal secrets.

There also are tools that can trace internal events to help identify timing vulnerabilities, such as the information-flow tracking tools from Tortuga Logic.

Formal analysis can be helpful, too. Most straight simulation approaches require stimulus vectors with a high degree of coverage. “When you’re talking about dynamic verification, it’s only as good as the testbenches you use,” said Hardee. “Formal is different in that, unless you tell the tool otherwise, it’s going to throw every combination of possible inputs at the design. How formal works is that you describe the behavior that should happen as a property, and then the formal tool tries to disprove that. It’s negative testing, as opposed to positive testing, in that we’re throwing everything we can at the design and trying to break it. All of the processor vendors that I’m aware of are using formal verification to ensure the systems are not vulnerable to these kinds of attacks, and that out-of-order execution is not causing these kinds of problems.”
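
The contrast Hardee draws can be imitated in miniature: where a testbench drives selected vectors, the loop below enumerates every input combination of a small access-control function and hunts for a counterexample to the stated property, which is what a formal tool does symbolically and at scale. The decision logic, with its deliberate debug-mode hole, is invented for illustration.

```python
# Brute-force stand-in for formal property checking on a toy access gate.
# Property: a non-secure master must never be granted access to a secure
# region. The logic below hides a deliberate debug-mode bypass.
from itertools import product

def access_granted(secure_master: bool, secure_region: bool,
                   debug_mode: bool) -> bool:
    return secure_master or not secure_region or debug_mode

for sm, sr, dbg in product([False, True], repeat=3):
    if access_granted(sm, sr, dbg) and not sm and sr:
        print(f"counterexample: master={sm} region={sr} debug={dbg}")
# Prints the debug-mode bypass, the kind of counterexample a formal tool
# returns when it disproves a property.
```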

“Recently, formal verification has been shown to be effective for hardware sign-off at the RTL level, which guarantees the security of processors,” said Brinkmann. “Together with academic and industry research partners, we are currently developing a formal-proof methodology that can be used to systematically detect all vulnerabilities to transient execution attacks in RTL designs. As of today, it already scales to a wide range of hardware designs, including in-order processors, pipelines with out-of-order writeback, and processors with deep out-of-order speculative execution.”

Fault injection, however, has no exhaustive set of tools. These are largely ad-hoc attacks that are challenging to assess systematically. The best approach today is to identify what kinds of faults could cause a system to misbehave. Then developers either can make it hard to inject that fault, or change the design to eliminate the misbehavior — or both.

But declaring success can be difficult. “Safety standards give me some data that I can test against,” said Hardee. “There’s no equivalent to that for security.”

Security tests gradually are being accumulated, some of which can be thought of as a regression suite for security. “There are libraries of security tests we can run, derived from the portable stimulus spec, to try and see if non-secure processors can access secure memory areas and other secure resources,” said Hardee.

While it’s not possible to deterministically account for all possible attacks, lists of potential problems are being built — even for hardware. “There are a number of different databases of known vulnerabilities,” explained Hardee. “And one of the most common is from mitre.org called CWE, the Common Weakness Enumeration. As security weaknesses become known, there are various ‘white-hat’ hackers, academic groups, and industry groups who test for vulnerabilities and publish results.”

Only in the last couple of years has the CWE begun to include hardware. “The last time I looked, the last version had something like 104 defined hardware weaknesses,” Hardee said. “Those weaknesses are not specific bugs, but they’re classes of vulnerabilities — often the very side-channel attacks that we’re talking about. No doubt there are a lot of customers who have their own collection of vulnerabilities that are not published to the world.”

Buying pre-certified IP can also help, although it’s not a panacea. “What people are doing now is they’ll buy an AES encryption core or root of trust that already has side-channel mitigations built into it,” said Oberg. “They’ll put that in their SoC, and they say, ‘I have mitigations. I’m good.’ Because they have the mitigations, they tend to check the box and get their certification, but they really don’t have a good understanding of the vulnerability points in there. From an attacker’s perspective, they could have a bunch of other issues they’re unaware of.”

Conclusion
Even with the best of verification approaches, however, there are no absolute guarantees. “The objective of secure design is to make a successful attack as difficult as possible,” said Synopsys’ Gautam. “Nobody can guarantee that a device is completely attack-proof. It’s impossible.”

Or, as Oberg put it: “The challenge is knowing that you’ve done enough so that you’re done, so you feel good about it.”

Reference
[1] Image source: Mark Pellegrini, composed from File:Oscilloscope diagram.svg (GFDL), File:Computer-aj aj ashton 01.svg (public domain), File:Gnome-dev-smartcard.svg (LGPL), and a self-made SVG derived from File:Carte vitale anonyme.jpg (public domain). Licensed CC BY-SA 3.0.


