Why it’s so difficult to ensure that hardware works correctly and is capable of detecting vulnerabilities that may show up in the field.
Verification for security requires new practices in both the development and verification flows, but tools and methodologies to enable this are rudimentary today.
Flows are becoming more complex, especially when they span multiple development groups. Security is special in that it is pervasive throughout the development process, requiring both positive and negative verification. Positive verification shows that intended functionality is present, and this is usually done through example. Negative verification attempts to demonstrate that certain things cannot happen, and this is often a lot more difficult.
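To make the distinction concrete, here is a minimal Python sketch using a made-up access-control rule (not any particular design): positive verification demonstrates an intended behavior with a single example, while negative verification has to argue over the whole input space that a forbidden behavior can never occur.

```python
# Minimal sketch (hypothetical secure-region rule) contrasting positive
# verification (show intended behavior by example) with negative
# verification (show a forbidden behavior cannot happen).
from itertools import product

SECURE_REGION = 0x1000          # hypothetical secure address range start
SECURE_SIZE   = 0x100

def access_allowed(addr: int, privileged: bool) -> bool:
    """Toy access-control rule: the secure region requires privileged mode."""
    in_secure = SECURE_REGION <= addr < SECURE_REGION + SECURE_SIZE
    return privileged or not in_secure

# Positive verification: one directed example shows the intended feature works.
assert access_allowed(0x1010, privileged=True)      # privileged access succeeds

# Negative verification: show the forbidden case never happens, here by
# brute-force enumeration of a toy space (in real flows, by formal proof).
for addr, priv in product(range(0x0000, 0x2000), [False]):
    in_secure = SECURE_REGION <= addr < SECURE_REGION + SECURE_SIZE
    if in_secure:
        assert not access_allowed(addr, priv), f"leak at {hex(addr)}"
print("positive example passed; negative property holds over the toy space")
```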
Safety and security represent a new axis in the matrix of issues associated with system verification.
But security also is very similar to verification. “Tools are not perfect, and each of them needs to be combined,” said Mike Eftimakis, vice president for strategy and ecosystem at Codasip. “This is what we call the Swiss-cheese model. Imagine you have slices of cheese with holes in them. The bugs can go through the holes, but if you stack up plenty of cheese slices, you end up finding ways to block the bugs. We need to accumulate these methods, these tools, to make sure that we end up with good coverage. But indeed, it requires a lot of different methodologies, different tools.”
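A back-of-the-envelope illustration of the Swiss-cheese idea, with entirely made-up per-layer escape rates: stacking imperfect layers multiplies their miss rates, so the fraction of bugs slipping through all of them drops quickly, provided the layers' holes do not line up.

```python
# Back-of-the-envelope sketch of the Swiss-cheese model: each verification
# layer misses some fraction of bugs ("holes"); stacking layers multiplies
# the miss rates, so escapes drop quickly. Per-layer rates are hypothetical.
layers = {
    "lint/static checks": 0.60,   # fraction of bugs that slip through this layer
    "simulation":         0.20,
    "formal (block)":     0.30,
    "emulation + SW":     0.40,
    "attack lab":         0.50,
}

escape = 1.0
for name, miss_rate in layers.items():
    escape *= miss_rate
    print(f"after {name:20s} -> {escape:.4f} of bugs still escape")
# With these made-up numbers, roughly 0.7% of bugs slip through all layers,
# assuming the layers' "holes" are independent -- which real flows try to
# approximate by combining dissimilar methods and tools.
```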
The security problem adds another wrinkle, though, because it is dynamic. For most systems, once they are manufactured, the hardware has a fixed function, and that is what verification targets. But there are constantly new attack vectors that appear for security, and it is important that systems operating in the field can have their software updated to address as many of those new threats as possible.
Two approaches
There are two aspects to the verification of security. One is to verify the features that are inserted into the hardware to try and make it secure. The other is to verify that the system can detect and deal with threats that were not completely covered by the inserted hardware.
The verification of security hardware is tough. “The challenge with security is that people can’t do functional coverage easily on things like an MMU,” says Simon Davidmann, founder and CEO of Imperas Software. “They have many views and virtualization layers, and because of that you have to understand it in a white-box manner. It is a lot of effort just to work out the functional coverage for it. There are lots of challenges in verifying the quality of the hardware that is meant to make things secure.”
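A rough sketch of why that white-box coverage effort balloons: even a toy cross of a few hypothetical MMU dimensions produces a large bin space that someone has to define, justify, and then hit with tests. The dimensions and mode names below are illustrative only.

```python
# Minimal sketch of why white-box MMU coverage explodes: a cross of just a
# few hypothetical dimensions (privilege level, translation stage, page
# permissions, access type) already yields hundreds of bins to hit.
from itertools import product

priv_levels  = ["U", "S", "HS", "M"]               # hypothetical modes
stages       = ["stage1", "stage2", "nested"]      # translation stages
permissions  = ["r", "rw", "rx", "rwx", "none"]
access_types = ["load", "store", "fetch"]

bins = set(product(priv_levels, stages, permissions, access_types))
hit  = set()

def sample(priv, stage, perm, acc):
    """Record that a test exercised this combination."""
    hit.add((priv, stage, perm, acc))

sample("U", "stage1", "rx", "fetch")               # one directed test
print(f"{len(hit)}/{len(bins)} cross bins covered "
      f"({100 * len(hit) / len(bins):.1f}%)")
```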
Many processor systems are being shipped with built-in security hardware. “One of the first things that you need in security is a platform that you can trust,” says Manish Pandey, vice president of engineering for the EDA Group at Synopsys. “If we have a design with a trusted mode, you may have registers that contain encryption keys or other digital certificates. These ensure you cannot have unauthorized access to secure information. Making sure all of these parts are clearly specified and can be verified against a security spec is one of the first steps. Going beyond to the next level, security is not just in hardware, but at the firmware level and beyond.”
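The kind of rule being checked against a security spec can often be stated very simply. The sketch below is a toy Python model, with a hypothetical key register and trusted-mode flag, of the sort of property (key material is never readable outside trusted mode) that would be proven formally or checked in simulation on the real design.

```python
# Minimal sketch of a security rule one might verify against a spec:
# a key register that is only readable when the design is in trusted mode.
# Names (KEY_REG, trusted) are hypothetical stand-ins for a real register map.
class SecureRegisterFile:
    KEY_REG = 0x40

    def __init__(self):
        self._regs = {self.KEY_REG: 0xDEADBEEF}   # stand-in key material
        self.trusted = False                       # boot in non-trusted mode

    def read(self, addr: int) -> int:
        if addr == self.KEY_REG and not self.trusted:
            raise PermissionError("key register read outside trusted mode")
        return self._regs[addr]

# Property to check (by simulation here; by formal proof in a real flow):
# no read in non-trusted mode ever returns the key value.
rf = SecureRegisterFile()
try:
    rf.read(SecureRegisterFile.KEY_REG)
    print("VIOLATION: key leaked in non-trusted mode")
except PermissionError:
    print("rule holds for this scenario")
```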
While foundational technologies like TrustZone are now commonplace, there are efforts to take this further. “CHERI (Capability Hardware Enhanced RISC Instructions) came out of Cambridge University, which adds capabilities in hardware for security,” says Imperas’ Davidmann. “It is putting in hardware that monitors all accesses. If you have a pointer that’s defined as ‘this size of array,’ you give that pointer the capability of only accessing those bits and the hardware enforces it. You can’t get a foreign agent coming in and manipulating your pointer. People are looking at all the vulnerabilities and all things that are broken in previous generations of processors, and they’re putting new layers of hardware in. The first was TrustZone. The next was virtualization, and the third is things like CHERI.”
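As a software analogy for the bounds-carrying pointers Davidmann describes (a conceptual model only, not the CHERI ISA itself), a capability can be thought of as a pointer bundled with a base and length, with every dereference checked against them:

```python
# Software analogy for the bounds-checking idea described for CHERI:
# a pointer that carries base and length, with every dereference checked.
# This is only a conceptual model, not the CHERI ISA itself.
class Capability:
    def __init__(self, memory: bytearray, base: int, length: int):
        self._mem, self._base, self._len = memory, base, length

    def load(self, offset: int) -> int:
        if not 0 <= offset < self._len:
            raise MemoryError(f"capability bounds violated at offset {offset}")
        return self._mem[self._base + offset]

mem = bytearray(256)
buf = Capability(mem, base=16, length=8)    # pointer to "this size of array"
buf.load(3)                                  # in-bounds access: fine
try:
    buf.load(64)                             # out-of-bounds: trapped, not leaked
except MemoryError as e:
    print("trapped:", e)
```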
When does a feature become a bug that becomes a security vulnerability? “Bugs like Spectre and Meltdown in processor architectures provide a hack to take over an otherwise secure processor,” says Pete Hardee, group director for product management at Cadence. “You have a functional test bench that has good coverage on the good machine, and then you could introduce faults and measure that on the bad machine. But that doesn’t really work with security, because most of the modes that are used for security attacks would not necessarily be represented in a functional testbench at all. They’d be devious methods that the verification engineer cannot really conceive of. There are so many ways that the system could be attacked.”
Instead, it becomes important to identify what weaknesses might look like. “The way customers do this today is to define monitors that look at certain conditions that are not allowed in a secure context, that would violate security requirements,” says Johannes Stahl, senior director of product line management for the Systems Design Group at Synopsys. “Whether these are created within their own methodologies, or through commercial tools, these monitors land on an engine like an emulator. The full SoC has to be emulated with the software stack so customers can see if the full SoC is violating any of the security rules in operation.”
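Such a monitor can be as simple as a rule over observed bus traffic. The sketch below uses a hypothetical transaction format and secure address window; in an emulation flow the same check would run as checker logic or a transactor alongside the design and the software stack.

```python
# Minimal sketch of a security monitor of the kind described above: it watches
# a stream of bus transactions (hypothetical format) and flags any access to a
# secure address window that occurs while the master is not in secure state.
SECURE_WINDOW = range(0x8000_0000, 0x8001_0000)

def check_trace(transactions):
    violations = []
    for cycle, addr, secure in transactions:   # (cycle, address, secure_state)
        if addr in SECURE_WINDOW and not secure:
            violations.append(cycle)
    return violations

trace = [
    (100, 0x8000_0040, True),    # secure-state access: allowed
    (250, 0x2000_0000, False),   # non-secure access outside window: allowed
    (412, 0x8000_0400, False),   # non-secure access inside window: violation
]
print("violations at cycles:", check_trace(trace))
```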
There is a limit to the number of simulation vectors that can be run, however. “Just doing simulation alone is not sufficient,” says Synopsys’ Pandey. “Hard as that may be, customers are persevering, using formal to complement everything that’s done through simulation-based verification for detection. But formal has its limits, even at the module level. Some are trying to extend this to the full-chip level with an appropriate methodology. While formal has a very critical role to play, if formal were scalable to full chip, everyone would be adopting it.”
Common weaknesses
Every tool has its limit. “The only way to ensure completeness in verification is by applying formal methods,” says Roland Jancke, design methodology head in Fraunhofer IIS’ Engineering of Adaptive Systems Division. “Unfortunately, this is only possible for the digital part of a design, and only for limited complexity. Therefore, in security, the common method is to set up databases of common threats and test against them. This approach is very similar to a virus database in software security, and its success very much depends on being up-to-date and easily accessible. There are well-known databases of this kind, such as MITRE’s Common Weakness Enumeration (CWE).”
This can help identify the types of behaviors that have been known to cause problems. “There is a great opportunity in describing those types of behaviors that could be inserted into chips, whether intentionally or unintentionally, but somehow manifest themselves in the behavior of the chips,” says John Hallman, DVT solutions manager at Siemens Digital Industries Software. “We can describe those types of behaviors with formal properties and be able to detect and make decisions about whether or not those should remain. There’s a tremendous amount of opportunity in being able to characterize a lot of these vulnerable types of behavior, and both detect and react, or even remove them when the risk is too high.”
One of the main challenges with those methods is showing completeness. “There is an ISO cybersecurity standard for automotive — ISO 21434,” says Cadence’s Hardee. “There are a lot of commonalities between ISO 21434 and ISO 26262. A lot of the processes are the same. The big difference is there are no specific metrics. There’s no single point failure metric, no failure rates. There is no equivalent when you’re talking about cybersecurity. The threats are unknown, largely.”
While the CWE does collect attributes of threats, that database is not complete. “A lot of companies that are involved in secure designs will also be maintaining their own databases,” adds Hardee. “Their database might contain weaknesses that are not published to the world. It’s a combination of maintaining a database of known threats that are public, and known threats that people might want to keep private. That is really what you’re verifying against.”
Sometimes people wish to keep quiet about vulnerabilities in their systems. “One company we work with had problems with a number of security break-ins,” says Davidmann. “They put in place a methodology to reduce future threats. They would inject threats they’d found and see why their systems weren’t good enough to detect them. With full observability they could see what was going on. Then they would work out scenarios to put in their hardware and software, which could detect the malware when it was injected. They could build resilience into their operating system processes. There is security from a malicious intent, and then there’s the failure where something’s gone wrong. And if there are failures because the system’s not detecting these, they tend to open doors — vulnerabilities that people can explore and try to break in.”
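That methodology amounts to a regression loop over previously observed attacks. A minimal sketch, with hypothetical scenario names and a stand-in detector, might look like this:

```python
# Sketch of the regression loop described above: previously observed attack
# scenarios are replayed against the system model, and the run fails if the
# detection/resilience logic misses any of them. Scenario names and the
# detector are hypothetical placeholders.
known_threats = {
    "rogue_dma_to_secure_sram": {"addr": 0x8000_0100, "master": "dma",  "secure": False},
    "debug_port_key_read":      {"addr": 0x8000_0040, "master": "jtag", "secure": False},
}

def detector_fires(event) -> bool:
    """Stand-in for the system's own intrusion/malware detection logic."""
    return 0x8000_0000 <= event["addr"] < 0x8001_0000 and not event["secure"]

missed = [name for name, ev in known_threats.items() if not detector_fires(ev)]
assert not missed, f"regression failed, undetected threats: {missed}"
print(f"all {len(known_threats)} known threat scenarios detected")
```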
Some of this is being integrated into tools. “I see automation being built into verification tools that are looking to detect common threats, common vulnerabilities, and ways that could be used to exploit those,” says Siemens’ Hallman. “That may inform you about where you need to insert certain protections. Or it may be enough just to be able to prove that those vulnerabilities or weaknesses cannot be exploited in your chip. You can make conscious decisions at that point to save the real estate and know that you don’t have to protect against that particular threat to your IC.”
Side-channel attacks
There is a limit to what can be done pre-silicon, however. “Side-channel attacks go beyond traditional functional verification,” says Pandey. “In traditional functional verification you are technically looking at the logic level. But for side channels, modeling the system appropriately becomes even more difficult. You have to consider how information is leaked, and what information can be gleaned from it. You have to develop appropriate models for that.”
A dual approach is required. “We can do a lot pre-silicon, but I don’t think it fully replaces the role of the attack lab,” says Hardee. “That takes the actual post-silicon prototype and does security testing on that, doing differential analysis on the power lines, trying to mess it up through different power sequences, all these kinds of things. The earlier you can detect any issues, the better. We can absolutely detect issues early enough where people might want to consider an architecture change. It is architecture changes that you need to fix things like Spectre or Meltdown.”
Creating the right models is not easy. “The problem with emulation is that the physical representation of the circuitry is not close enough to the actual device,” adds Hardee. “We can only really work at the logical or physical layer. We can work at RTL, we can work at gate level, we can work at gate level with timing. But to be able to do differential power analysis, both emission and susceptibility, requires being able to measure power lines to recover keys from the noise on the power lines. Similarly, can I manipulate the power lines and force power interrupts and power transitions with some kind of timing that puts me into a non-secure mode that bypasses the secure boot, bypasses the root of trust, these kinds of things? Exploring those vulnerabilities can only reasonably be done on an accurate physical representation of the circuitry.”
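A toy illustration of the kind of correlation an attack lab exploits: with a deliberately simplified Hamming-weight leakage model (purely illustrative, not a real device model), a key byte can be recovered from noisy "power" measurements by ranking key guesses against the traces. The point is that this only works when the target's data-dependent power behavior is physically faithful, which is what pre-silicon models struggle to provide.

```python
# Toy illustration of differential power analysis: if leakage correlates with
# data-dependent switching, a key byte can be recovered from power traces.
# The leakage model (Hamming weight plus noise) is a simplified stand-in.
import random

def hw(x: int) -> int:                 # Hamming weight = number of set bits
    return bin(x).count("1")

random.seed(1)
secret_key = 0x5A
plaintexts = [random.randrange(256) for _ in range(500)]
# "Measured" leakage: switching activity of (pt XOR key) plus Gaussian noise.
traces = [hw(pt ^ secret_key) + random.gauss(0, 1.0) for pt in plaintexts]

def corr(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Rank every key guess by how well its predicted leakage matches the traces.
best = max(range(256), key=lambda k: corr([hw(pt ^ k) for pt in plaintexts], traces))
print(f"recovered key byte: {best:#04x} (actual {secret_key:#04x})")
```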
Conclusion
Security is taking verification beyond its current set of capabilities. While the capabilities inserted into hardware can be verified, it is the negative verification that adds the challenge. Formal verification is the only tool capable of performing that task, but today it is not able to tackle the full SoC, let alone a system running software.
The dynamic nature of security threats also means that systems must have the ability to add resilience over their lifetime, and software plays a big role. But this only works if sufficient capabilities are inserted into the hardware for it to work with. Companies are beginning to take security more seriously as they discover, to their financial peril, the true cost of assuming it will never happen to them.
Related Stories
Verification And Test Of Safety And Security
Why It’s So Difficult To Ensure System Safety Over Time
Shift Left, Extend Right, Stretch Sideways