Why It’s So Hard To Stop Cyber Attacks On ICs

Experts at the Table: Any chip can be reverse-engineered, so what can be done to minimize the damage?


Semiconductor Engineering sat down to discuss security risks across multiple market segments with Helena Handschuh, security technologies fellow at Rambus; Mike Borza, principal security technologist for the Solutions Group at Synopsys; Steve Carlson, director of aerospace and defense solutions at Cadence; Alric Althoff, senior hardware security engineer at Tortuga Logic; and Joe Kiniry, principal scientist at Galois, an R&D center for national security. What follows are excerpts of that discussion, which was held live at the Virtual Hardware Security Summit. To view part one of this discussion, click here. Part two is here.

SE: As over-the-air updates and security patches are added for a lot of different devices, does that impact performance and power?

Borza: Yes, and we see that happening with a lot of security algorithms. As we’ve discovered more and more ways in which attacks can first be accomplished, and then defended against, we’ve seen how badly some of the x86 computers have slowed down to address a lot of the microarchitectural attacks. There is a cost, and it comes in terms of performance, power consumption, and sometimes area or memory footprint.

Carlson: There is a tradeoff you can make between area cost and performance penalty. There are people who have security co-processors that watch what’s going on, and they’ll interrupt when something happens. Those are always on and running, so you get a power penalty, as well. But you can lessen the performance impact. It’s not a nice continuous tradeoff, but there are some things you can do to help decide how you want to be impacted.

Handschuh: The RISC-V work currently is looking at whether anything could be done at the ISA level, but also at providing some guidance to help avoid these Spectre-style, or speculative-execution-style, attacks. It’s not easy to solve, but there are small things you can start doing. One group is working on figuring out the best way, for example, to flush the caches only when needed, but not so often as to impact performance too much. But you still need to do it at least on every security context switch so that you know for sure your data is gone. There’s this style of work going on, and they’re thinking about how to make the best of it. But there’s no general solution. Way more research is still needed.
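The flush policy Handschuh describes can be illustrated with a toy model: flush the cache only when a switch crosses a security boundary, not on every task switch. The `Cache` and `Core` classes here are purely illustrative, not any real RISC-V mechanism.

```python
# Toy model of "flush only on a security context switch":
# ordinary task switches within the same security domain keep the
# cache warm; crossing a domain boundary clears it.
class Cache:
    def __init__(self):
        self.lines = {}          # addr -> cached value

    def load(self, addr, value):
        self.lines[addr] = value

    def flush(self):
        self.lines.clear()

class Core:
    def __init__(self):
        self.cache = Cache()
        self.domain = None       # current security domain

    def switch(self, task, domain):
        # Flush only when crossing a security boundary, so the
        # performance cost is paid only where the data must be gone.
        if self.domain is not None and domain != self.domain:
            self.cache.flush()
        self.domain = domain

core = Core()
core.switch("taskA", "secure")
core.cache.load(0x1000, "secret")
core.switch("taskB", "secure")       # same domain: cache kept warm
assert 0x1000 in core.cache.lines
core.switch("taskC", "normal")       # boundary crossed: cache flushed
assert 0x1000 not in core.cache.lines
```

The tradeoff the group is studying is exactly where to draw that boundary: flushing too eagerly costs performance, flushing too lazily leaks data across contexts.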

Kiniry: In circumstances where we have control, or at least what I would call ‘über transparency’ about the components being used, and you have extremely strong guarantees about their behavior and correctness, it means you actually can build a system that’s faster with better performance than before. Now you have guarantees about interfaces, which basically means you can fulfill preconditions and invariants that otherwise you had to check for or monitor in the first place. It actually ends up that the systems we build using those techniques are significantly smaller, because you don’t have to do defensive programming, either in software, firmware, or hardware, and you get to really focus on only solving your problem and not exceptional cases. That can trickle all the way up from the hardware into the operating system. But it’s an extraordinary circumstance to have those sorts of guarantees.

Althoff: There’s an opportunity on top of that with respect to usability and discovering which aspects of performance really matter. We can say, ‘Well, we’ll just speed the whole thing up and everyone will be happy with the performance.’ But I have really fast processors and my Windows machine still freezes. It’s like, ‘Why does that happen?’ It’s because of priorities. And we have the opportunity now to prioritize certain features of performance and segment performance along many axes. It’s a great opportunity.

Kiniry: Indeed, and in much of the work we’ve been doing in SETH (Secure and Trustworthy Hardware), which is about measuring power, performance, area, and security across product line families, we find that people might spend a lot of time focusing on making one piece faster, like making the CPU 20% faster. And in the end, it made the area larger, it made the energy go up, it made the security go down, and yet the overall system isn’t faster because you’re waiting on the network and I/O all the time.

Borza: Yes, you’re chasing bottlenecks around the system.

Kiniry: You’re chasing the wrong things all the time. Remember the old efficient computers? I miss those days, but we can return to that in some sense by getting more assurance about our pieces and doing away with a lot of the nonsense baggage that we have to deal with today.

SE: Some of this used to be built into the margin in your design. We don’t have that margin anymore, particularly as we push down into the AI world, where we’re starting to really tighten everything up. What happens when you start losing that margin? If you have to add security to a car, does that slow down all the cars on the road?

Handschuh: At least for now, security updates are not done in real time. They try to wait until you’re at home, plugged in somewhere, to do that. They don’t send it immediately. There is a lot of monitoring going on in the vehicle itself. You collect the data and send it back to the analysis servers, so you already know what will be, or what might be, the next update you want to do. But yes, updating in the field seems a little tricky.

Kiniry: With regard to matters of margin, most of the work we do avoids conversations around margins entirely, in part because of the push toward higher performance and smaller nodes and the like, but in part because of the way we build things — the way we do systems engineering. It’s common enough for us to do worst-case execution time analysis on software, so you actually know how bad it is and how you can schedule things. Or in the case of hardware design, we don’t use clocks, and therefore we’re not pushing boundaries with regard to clock designs. We end up building asynchronous designs that are robust in the presence of any kind of tolerance, all the way down to the threshold voltage. Sometimes it is smart to think outside the box and not chase these hard problems in a way that might be comfortable and well-trodden, but inappropriate in the modern context.

SE: There are a lot of errors that can creep in. Some are just plain programming errors, some are design errors. Are we getting better at tracking down whatever caused a problem and fixing it? Are the tools and methodologies getting better?

Carlson: It depends on what point in the lifecycle you’re talking about. Certainly, the connection between the design environment and the lab environment is there. Now, extending that out across the lifecycle, the notion of digital twins lets you have a continuous path of information at any point in the lifecycle. That can take you back into the design database, where you have full visibility into the operation of the design and the state at every point in the system, so that you can diagnose how you got there.

Borza: We’re starting to see a lot of work on the formal verification front, which is extending some of the formalisms and mathematics to chip verification and, to a lesser extent, to software, to try to prove that a program is correct and that the implementation on a particular processor is correct. That’s quite helpful in understanding what’s going on in these systems and tracking down when things go awry.

Kiniry: On those occasions where we are able to use those tools and techniques, it is very effective — especially for the tools that we use or build that produce minimal counter-examples, shortest traces, these kinds of things. But when I watch traditional hardware engineers trying to debug, ‘Why is there an X on this line?’ at cycle 1 million, hand-tracing it back through, I just don’t understand how anything works given traditional design and verification techniques. I have to admit, it’s a miracle my machine works. More powerful tools do lend themselves to this, but even the best tools I use for hardware design are still pretty poor compared to what I’m used to in software and firmware analysis. There’s real fertile ground for new R&D and impactful tools.

Borza: I agree.

Handschuh: We’re starting to see some tools that can do almost direct power-analysis-style attacks, and we’re seeing quite a bit of effort to make sure that at the lowest level, while you’re writing your code, you see where it’s going and you can measure and analyze things. So that provides some hope for the future. But there’s a lot of room for companies to come up with stuff.

Carlson: With a combination of formal techniques and fully homomorphic encryption, we’re all set.

Althoff: We started with who’s responsible and who we trace this to, and then we started talking about debuggability, which reveals a lot about responsibility, how we think about it, and where we really apply pressure. That’s why we’re focused on integrating tools into the process and making them important in the process. The tools in software are awesome in comparison, and there’s still a ways to go because they can get a lot better. But they’re also integrated into the developer’s process. In the hardware engineer’s process, not so much.

Carlson: There’s a lot of capability that is not being applied. It requires learning new tools, and spending more money on new tools and compute resources to execute them. Back to the carrot and the stick conversation, the stick is going to be instrumental in that. Companies can get fined by the government today because of inadequate security practices. And if you see that happening, it’s really easy for the accountants in the company to say, ‘Let’s spend the money on those security tools and get this right.’

Kiniry: I look forward to the day when I license a piece of IP for a lot of money and it actually comes with a verification bench that uses Tortuga, Yosys, VC Formal, and Jasper, or even any subset thereof, because suddenly now I have reproducible evidence that helps me design my system better. This doesn’t happen today.

SE: It’s a similar approach to verification IP, right?

Kiniry: Yes, and it isn’t just a bunch of testbenches somebody wrote randomly while eating a hamburger. Sorry, that just doesn’t cut it.

Handschuh: We have built tools in software. We need the same thing in hardware, and we need to integrate the security piece — a nightly security regression of some sort. That would make total sense.
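The nightly-security-regression idea can be sketched as a small runner that treats each security check as a command and fails the run if any check regresses. The check names and placeholder commands here are made up for illustration; a real flow would invoke project-specific tool command lines (lint, information-flow analysis, formal property runs) in their place.

```python
import subprocess
import sys

# Hypothetical nightly security regression: each entry is
# (check name, command). A command exiting nonzero counts as a
# regression. The sys.executable placeholders stand in for real
# vendor tool invocations.
CHECKS = [
    ("lint",      [sys.executable, "-c", "print('lint ok')"]),
    ("info-flow", [sys.executable, "-c", "print('no leaks found')"]),
]

def run_regression(checks):
    """Run every check; return the names of the ones that failed."""
    failures = []
    for name, cmd in checks:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(name)
    return failures

failed = run_regression(CHECKS)
print("security regression:", "PASS" if not failed else f"FAIL: {failed}")
```

Hooking such a runner into a scheduler (cron, Jenkins, or a CI pipeline) is what turns one-off security checks into the standing regression Handschuh describes.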

Kiniry: Part of what I hope we can publish, both as open source and in publications coming out of FETT (Finding Exploits to Thwart Tampering), is a continuous integration, continuous verification system we built that spans all the different SoC platforms, compilers, and verification with regard to both correctness and security. I’m currently using the cloud to run those evaluations on pull requests that are going up for the FETT program, as well as on-premises FPGAs in my lab at Galois. There’s a lot that can be done, where we take modern development practices — not DevOps, but modern rigorous dev practices from the software and firmware world — and apply those to hardware.

Borza: There are a lot of people who are doing that. We’ve adapted a whole Jenkins flow to be able to do that for our IP products, and they’re constantly running in regression. So there is hope that we’re each learning some things, the software people from the hardware people, and vice versa. It’s telling that the debugging tools for software are so much better. It indicates that many companies and many people develop their software and then test in quality, as opposed to designing it to be high-quality, secure, and safe right from the get-go. So there are absolutely great debugging tools for software, while you don’t have tools as good in the hardware world. On the hardware side there tend to be a lot more tools for verification, meant to run billions of cycles through things and get to as many of the corners as possible, which is different from the approaches taken in software.

SE: So what happens when quantum computing comes online? We’re starting to see real progress there. What does that do to anything we develop now?

Borza: It means that anything you’ve encrypted recently using a public key algorithm becomes known.

Handschuh: Public key encryption becomes public is the short answer.

Kiniry: It depends on which algorithm you’re using.

Borza: It does, but in a sense we’re already behind schedule in adopting post-quantum-resistant algorithms. And the whole process really started later than necessary, because we’re likely to have a quantum computer at about the same time as we have agreement on which algorithms will stand up to one.

Handschuh: There’s a pretty good set of candidates that NIST has been looking at. There are still seven in the running, plus another backup list. They seem to like lattices a lot. There are a couple of other schemes that are a bit older, that have been out there for a while and studied enough to inspire confidence. But the more interesting ones are newer. People are putting a lot of hope behind them.

Kiniry: And there’s already early adoption of some of them, especially in experimental settings, both in software and hardware at major Fortune 100 companies. If you Google just right, you can even find some formal verification of post-quantum algorithms we’ve done for Amazon.

Althoff: When is NIST making a decision?

Borza: January 2022 is the target right now.

Kiniry: That’s right. And if you sniff the wind and listen to NSA proclamations, you can guess where things might land.

SE: As we move down to 7, 5, and 3nm, the dielectrics get thinner, the density increases significantly, and the ability to tap into on-chip communication or subject things to electromagnetic interference rises fairly significantly. Is there a way to shield all this stuff?

Borza: It’s a pretty daunting challenge, but people have been doing it for a while. As line widths get smaller, it’s harder to attack them. You need a better lab to start figuring out what’s going on in there. So you get some benefit from getting smaller. But at the end of the day, the art of physical attacks has been improving, and it continues to improve. So you have both sides of the coin. Physical attacks always have been the most difficult to defend against, and that’s not going to change.

Carlson: There is a cost to being able to reverse engineer or perpetrate an attack. The physical access aspect is one. But with the resources a nation-state might bring to bear, it’s extremely difficult to prevent. FICS (the Florida Institute for Cybersecurity Research) has a nice reverse engineering lab, where they can figure out what’s going on in every bit in a system. They have 14nm capability. And this is just a university. So you can imagine what a nation-state would have in terms of lab capability, or just a well-heeled commercial company that’s interested in a competitor’s work. At the same time, you can look at defeating security in the systems you’re developing by using reverse engineering analysis. I know some folks at a cell phone chip company who have a nice attack lab, and they break into their competitors’ systems for fun. They’ve been doing it at 7nm most recently, and they’re going to continue to do that. I don’t see where you’re really going to be able to stop it. It’s harder, and the tools they bring to bear are getting more robust.

Handschuh: In general, I would argue that if you can build it, you can break it, because if you can build it, that means you’re able to debug it. And if you can debug it, that means you can attack it at the lowest level and at the smallest gate size. Now, there will be mechanisms we can invent that are a bit different for smaller nodes, but there are already some things that can be done to protect these designs.

Althoff: This is a good case for educating the end user about survivability, at whatever level, and communicating very clearly: if you use this component in this environment, you are at risk from anyone with $100 worth of equipment, or access to a makerspace, or whatever. I think that kind of transparency is really important, so that people know what to expect when they set their cell phone down in a library and walk away. It shapes the attitude we have about our systems, as the public and as individual system manufacturers, and how they install and configure components. And just to finish one last thought, it really highlights one of the problems with mass-market electronics in general, which is that you’d like to avoid the ‘break once, break all’ kinds of attacks. You want an individual system to be protected with its own unique secret data that’s buried in there. You’re not going to stop the physical attack that lets someone take apart one chip and get the data. But you can stop that attack from turning into an attack against every instance of that design. It means you’re doing things like uniquely encrypting the software that’s on that chip, so that it alone uses a unique key, and that key is not shared by everybody else.

