Data Leakage In Heterogeneous Systems

What’s needed to secure data across multiple chiplets and interoperable systems.


Semiconductor Engineering sat down with Paul Chou, senior director of security architecture at NVIDIA, to discuss data leakage in heterogeneous designs. What follows are excerpts of that one-on-one interview, which was held in front of a live audience at the conference.

SE: We think about hardware in terms of a chip, but increasingly there is data moving through different systems, and the value of that data is increasing. So it’s no longer enough to secure a chip. We now need to secure lots of chips, chiplets, and the interactions between them. How do we do that?

Chou: If you look at the general trend, you need more compute power to be able to process all this data, and the slowdown of Moore’s Law means we need to do this in parallel. We’ve gone from single dies to homogeneous computing with multiple CPUs, heterogeneous with CPUs and GPUs and other things, and multiples of those on boards, and multiple boards in systems that are connected over the network. Instead of securing a single CPU, we’re talking about securing this distributed, heterogeneous blob of compute. It’s getting bigger, but the individual chips are getting smaller. Instead of a single die on the chip, it’s multiple dies within a package. That additional complexity relates to security. It’s no longer atomic, and the traditional ways of protecting things are breaking down. Ideally, you want a root of trust in everything with all links encrypted, but with the additional complexity that makes it a lot more difficult. Dielets are another axis of complexity that makes securing data even more difficult. We’re kind of resigned to the idea that some leakage is going to happen. Pursuit of performance will require it.

SE: In the past, you’d hack into a chip. But now there are lots of different pieces. How does that change security?

Chou: Data is the new focus. We have to protect it for privacy reasons and for competitive reasons. It’s IP. In the past, you’d protect things like movies or video games. But moving forward, it’s protecting neural networks, models and weights, as well as training data, because without access to that you can’t create those neural networks. The importance of data has skyrocketed, and the need for security is rising across the board. And we’re seeing things accelerated by the move to the cloud, and by people working from home who want to manage their own data. But this allows security to have a direct relation to revenue. Before you put your most sensitive data on the cloud, you’d better trust that entity.

SE: Do we still need techniques like perimeter security or anti-tamper? Or are these now outmoded?

Chou: They’re increasingly relevant. Obviously, cost is a big factor. But because of the importance of the data, depending on where you deploy it, nation-state attackers may be in scope. Certainly in defense, that’s relevant. Surprisingly, you need it even in areas where you wouldn’t typically consider physical attacks to be within scope, such as servers or data centers, where it’s behind guns, gates, and guards. But increasingly, there are point deployments that are on-premise because they don’t trust the cloud. So there may be cases where the owner of the machine doesn’t trust where it’s located, and there is some IP that you want to protect from attack. For certain applications, anti-tamper doesn’t allow access for reverse engineering. We’re also seeing things for the supply chain, particularly if it’s defense-related. Given the capabilities and the importance of data, it’s increasingly important.

SE: We tend to think of security in terms of what it costs to build a more secure chip. What does it cost to build more secure software? The reality, though, is this is a risk management type of tradeoff. Is that security worth it? What’s the cost if I get this wrong and somebody hacks into this? And alongside that, the data has become more valuable.

Chou: A lot of it comes down to cost. But part of the challenge is quantifying security. One of our jobs is to try to quantify the risk with regard to that. With the move to the cloud, customers care a lot about whether or not they can trust that deployment.

SE: What happens when we get to increasingly dense chips or heterogeneous chiplets in a package?

Chou: It’s an extra axis of complexity, and that makes it more difficult. If everything is on a single die, it’s pretty much protected equally. But if you chop it up into four dies, the atomicity is broken. There are different lifecycle states, different debug states, and keys are spread throughout. Your root of trust is now spread over those four entities. That complexity makes it much more difficult to secure. The problem we’re facing as an industry is distributed compute of disparate parts. They’re all executing independently. If one acts as the governor, how does it know the others are secure? That’s the challenge. We can put a root of trust in every single die and encrypt everything, but that adds another level of complexity, which we’d like to avoid. There’s also this move toward attestation, so before I allow you to play a certain movie, it checks if you are running the latest firmware and whether you have the right ID number. Before I release access, I want to attest and trust you, and then come back. As you can imagine, it’s multiplied by ‘n.’ With n chips, in the worst case, you would have to attest all of them. This isn’t something we want to do. It’s already complex, and it’s getting much more complex. So as we proceed to high performance, all of this is creating complexity that’s very difficult to defend against.
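
Editor's sketch: the attestation flow described above, in which a verifier checks the firmware version and ID of each of n dies before releasing access, might look roughly like the following Python. Everything here is an illustrative assumption and not NVIDIA's design: the `Die` class, the `MIN_FIRMWARE` threshold, and the function names are hypothetical, and a shared-secret HMAC stands in for the asymmetric signatures and certificate chains a real root of trust would typically use.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass

MIN_FIRMWARE = 3  # hypothetical minimum acceptable firmware version


@dataclass
class Die:
    """One die in a package (hypothetical model)."""
    die_id: str
    firmware: int
    key: bytes  # per-die secret also known to the verifier (simplifying assumption)

    def attest(self, nonce: bytes) -> tuple[str, int, bytes]:
        # Bind the report to a fresh nonce so an old report cannot be replayed.
        msg = nonce + self.die_id.encode() + self.firmware.to_bytes(4, "big")
        tag = hmac.new(self.key, msg, hashlib.sha256).digest()
        return self.die_id, self.firmware, tag


def verify(report: tuple[str, int, bytes], nonce: bytes,
           known_keys: dict[str, bytes]) -> bool:
    die_id, firmware, tag = report
    key = known_keys.get(die_id)
    if key is None:  # unknown ID: refuse
        return False
    msg = nonce + die_id.encode() + firmware.to_bytes(4, "big")
    expected = hmac.new(key, msg, hashlib.sha256).digest()
    # Constant-time comparison of the tag, then the firmware policy check.
    return hmac.compare_digest(tag, expected) and firmware >= MIN_FIRMWARE


def attest_package(dies: list[Die], known_keys: dict[str, bytes]) -> bool:
    # Worst case: every one of the n dies is challenged before access is released.
    for die in dies:
        nonce = os.urandom(16)
        if not verify(die.attest(nonce), nonce, known_keys):
            return False
    return True
```

In this sketch the cost grows linearly with the number of dies, and a single stale or unrecognized die blocks the whole package, which is one way to read the "multiplied by n" concern.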

SE: On top of all of that, chips also are being used for longer periods of time than in the past. Over time, hackers get smarter and more sophisticated, and chips begin to age in unusual ways. How do you deal with that?

Chou: Longevity is a big deal on a number of different axes. But even if you assume no physical attacks or aging, we have to deal with post-quantum cryptography sometime around 2030, where companies with a quantum computer can attack classic cryptography. If you have a car with a lifespan of 15 years, you want to be able to cover that even if it’s being shipped today. We have to look forward to things like that. From a security perspective, the risk of bricking is a big problem. So certain things will break down, but if keys break down, then the chip is dead.

SE: And with cars, the longer it’s been in the market, the more data it has generated or acquired. It has details about what you’re doing in the cabin, where you’ve driven, how many times you’ve been in the shop. How do you protect that data?

Chou: These systems are complex, and they may be talking to a mother ship to share that data. But cars are basically supercomputers on wheels. You’ll typically sync it with your phone, and your phone knows quite a bit about you. A car will have all your contact information and your driving information. This adds another challenge because in places like France, you have the right to be forgotten. If you’re watering the lawn in your front yard, and someone drives by with a camera, they can’t share that video externally. So there are all these concerns with regard to privacy, both for the user and for the people outside the car.

SE: Because there are so many components going into these designs, and these designs are so complex, it’s difficult to understand how data is moving through these systems. How do you keep track of it? Do we need AI monitoring AI?

Chou: There is tracking, and there are security applications using AI. We’ve got a networking division that is monitoring networks. That’s one application. We also are using AI to improve manufacturing processes, and to create better architectures. Using AI to improve security in chip design is still relatively nascent, though.

SE: Another source of data leakage is when engineers change jobs. Can you architect a design so that people don’t understand enough pieces to apply that somewhere else?

Chou: The first order requirement is to design systems so that, even if you knew everything, it’s still secure and resilient. We’re seeing some of that with things like open-source root of trust, which will raise the bar across the board to at least a minimum. That will help in the data center. It’s scalable, certified, and reviewed, and it supports use cases that our customers are interested in. Of course we also want to protect our IP, but that IP is mostly crypto, so it should be shareable and still impervious to attack. The goal is it should be thoroughly reviewed, with a good baseline for everyone in the ecosystem.

SE: So what happens when people do leave for another job at a different company?

Chou: That’s a challenge. Security professionals have a good career ahead of them, because there’s a lot more security work to do, and not enough capable people to do it. Security architects and researchers are in very high demand right now, and we’re trying to keep them as happy as we can and hire more. But we’re also trying to make sure their designs are resilient even if they know about the design. And there are cases in which you really want to make sure that access is on a need-to-know basis. But that’s kind of a secondary or tertiary thing. We really want to make our designs secure by design.

SE: We’ve always assumed that data at rest is more secure than data in motion. Is that still true?

Chou: Data at rest is not secure. We need to secure data at rest and data in motion. But there’s also a post-quantum threat we need to think about. Even though there is not a quantum computer today that attacks classical crypto, people are storing data today that’s encrypted, thinking that whenever a quantum computer comes along, it will decrypt everything. So even data at rest is not secure. It may be today, but in 10 years it probably will not be. There’s also this notion of confidential compute, and we’re working to secure data in motion. So if you look at video games, cheating is one factor that kills a game. That requires protection of data in motion, and it means CPUs and GPUs are working together and protecting data from being stolen. In the cloud you’ve got multi-tenancy concerns. You rent time or a slice of the processor or server, and you don’t want person A to leak data to person B, right? In that case you’re trying to make predictions about data in motion. It’s not just when it’s being stored.

Related Reading
Data Leakage Becoming Bigger Issue For Chipmakers
Increasing complexity, disaggregation, and continued feature shrinks add to problem; oversight is scant.
