New regulations make end-to-end security non-negotiable, but multi-die assemblies and more interactions at the edge are creating huge hurdles.
Looming financial penalties for data breaches are forcing chipmakers to confront end-to-end security, an increasingly complex and daunting problem because no single company controls all the pieces anymore.
This is especially apparent in multi-die assemblies, in use today in data centers, and under consideration in automotive and other applications. Multiple chiplets can push performance well beyond what is possible with a single reticle-sized SoC, while adding flexibility about which features to include in a device and at what process node. Those chiplets can even come from different foundries, and they can be integrated into a customized advanced package. But they also make it much more difficult to keep track of all the components in the supply chain.
And that’s just for starters. Chiplets may age at different rates under different workloads, which can create new vulnerabilities that were never considered in the design process. And because some of those components are unique and new — particularly logic developed at 3nm and below — there is little or no contextual information about what can go wrong, such as silent data corruption.
“This is one of the problems of taking a design and disaggregating all the components into a set of heterogeneous chiplets,” said Scott Best, senior director of silicon security products at Rambus. “In the past, at least it was vertical. There was one company that owned the chip and the finished product. That company was responsible for characterization data, manufacturing and test data, and software updates. So even though the logistics of putting together the software bill of materials and the hardware bill of materials were really complicated in state-of-the-art SoCs, it was all an internal effort. And if you have an engineering staff of 1,000 people, you can figure this problem out. But if you take that state-of-the-art SoC and break it up into 10 heterogeneous chiplets, all manufactured by different people, your problem scales by a factor of 10. It becomes 10 times more complicated because now you’re dealing with 10 different vendors, potentially with incompatible systems.”
Large chipmakers typically track data about a chip from design through manufacturing, and for some applications, into the field. “They can look at the unique ID on that chip and trace it all the way back to who was the operator on that wafer sort equipment on that day on that wafer, because the chips during that person’s shift may not have been correctly screened,” Best said. “And now you can proactively go find the bug in your manufacturing flow and figure out which other chips were associated with that and where they are in the world, contact those customers, and arrange for some return before there’s a field failure. Other companies do that, too. They have terabytes of data they can look at. But it’s completely incompatible.”
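The kind of trace Best describes can be pictured as a lookup over manufacturing records. The following is a minimal sketch, not any vendor's actual system: the record fields (`chip_id`, `wafer_id`, `sort_operator`, `sort_shift`, `shipped_to`) and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SortRecord:
    """One wafer-sort record. All field names are hypothetical."""
    chip_id: str
    wafer_id: str
    sort_operator: str
    sort_shift: str
    shipped_to: str


def chips_from_same_shift(records, suspect_chip_id):
    """Return every other chip screened by the same operator in the same shift."""
    by_id = {r.chip_id: r for r in records}
    suspect = by_id[suspect_chip_id]
    return [
        r for r in records
        if r.chip_id != suspect_chip_id
        and r.sort_operator == suspect.sort_operator
        and r.sort_shift == suspect.sort_shift
    ]


def customers_to_notify(records, suspect_chip_id):
    """Group the potentially affected chip IDs by the customer they shipped to."""
    affected = {}
    for r in chips_from_same_shift(records, suspect_chip_id):
        affected.setdefault(r.shipped_to, []).append(r.chip_id)
    return affected


# Invented sample data for illustration only.
RECORDS = [
    SortRecord("C-001", "W-17", "op-42", "shift-A", "customer-x"),
    SortRecord("C-002", "W-17", "op-42", "shift-A", "customer-y"),
    SortRecord("C-003", "W-18", "op-42", "shift-A", "customer-x"),
    SortRecord("C-004", "W-18", "op-07", "shift-B", "customer-x"),
]
```

Calling `customers_to_notify(RECORDS, "C-001")` groups chips C-002 and C-003 under their respective customers and leaves out C-004, which was screened on a different shift. The query itself is simple. The difficulty Best points to is that each vendor stores these records in its own incompatible format, so a multi-vendor version would first need a shared schema.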
This is one of the key reasons why the commercial chiplet marketplace concept has been slow to launch. Even companies that tightly monitor every step, from design through manufacturing, lack consistent data. And with chiplets from multiple vendors, as well as designs that are customized for specific workloads — and which stress different parts of a multi-die assembly in unique ways and under different operating conditions — trying to figure out where the potential security risks can crop up is a massive challenge.
“The whole ecosystem is becoming much more complex,” said Dana Neustadter, senior director of product management for Security IP Solutions at Synopsys. “If you have one rogue chiplet, you can undermine not just one family of products. You can undermine multiple families of products. They can end up in totally different devices and categories. So you need the capability and process and means to check that you’re doing the right thing, from when you build the silicon to the end product through the whole supply chain and lifecycle.”
This requires a deep understanding of what’s at risk, as well as what may be at risk in the future. And for products with long lifecycles, there needs to be a way to plug security holes that may crop up years later.
“There are some things where you have to be more diligent, rather than creating something novel, building trust throughout the development process, the definition, and then throughout the supply chain and the whole lifecycle,” Neustadter said. “You need to start with security by design, going through the threat modeling and the security requirements. What’s important to protect and why? A unique identity that is built into silicon will be fundamental for building end-to-end security. But then you have all the other layers that you have to build on top, too.”
Erik Wood, senior director of cryptography and product security at Infineon, breaks end-to-end security down into two sections. “One is that I make a wafer, and I have a part in finished goods inventory that I ship to a customer. In that section, we allow provenance to mitigate attacks and make sure that what we’re providing a customer is secure. So, we’ve got a cryptographic identifier that lets a customer verify that when they’re doing their programming, it is, in fact, an Infineon chip. And then, we have provisioning tools that a customer uses after they load their software and shift the part into a secure lifecycle state, which locks all their security down and obfuscates their keys, and does all that stuff at the same time. We are verifying that there was no firmware loaded in the supply chain. It’s a programming provisioning step at their factory. We call it an entrance exam. We verify the chip has no additional firmware that it’s not supposed to have.”
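The two checks Wood describes, proving the chip is genuine and confirming no unexpected firmware is present, can be sketched as follows. This is an illustration under stated assumptions, not Infineon's tooling: an HMAC over a random challenge stands in for whatever cryptographic identifier the real part uses (a production scheme would more likely rely on an asymmetric signature), and the key, the blank-flash image, and all function names are hypothetical.

```python
import hashlib
import hmac
import os


def device_response(device_key: bytes, challenge: bytes) -> bytes:
    """What the chip computes. HMAC is a stand-in for a real device signature."""
    return hmac.new(device_key, challenge, hashlib.sha256).digest()


def verify_identity(expected_key: bytes, challenge: bytes, response: bytes) -> bool:
    """Check the response against the key the manufacturer provisioned."""
    expected = hmac.new(expected_key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


def entrance_exam(flash_image: bytes, allowed_digests: frozenset) -> bool:
    """Pass only if the flash contents hash to a known-good value."""
    return hashlib.sha256(flash_image).hexdigest() in allowed_digests


# Hypothetical reference values held by the provisioning tool.
FACTORY_KEY = b"hypothetical-per-device-key"
BLANK_FLASH = b"\xff" * 1024
ALLOWED = frozenset({hashlib.sha256(BLANK_FLASH).hexdigest()})


def provision_check(device_key: bytes, flash_image: bytes) -> bool:
    """Run both checks: the chip is genuine and carries no unexpected firmware."""
    challenge = os.urandom(32)  # fresh challenge so a recorded response cannot be replayed
    response = device_response(device_key, challenge)
    return (verify_identity(FACTORY_KEY, challenge, response)
            and entrance_exam(flash_image, ALLOWED))
```

A part with the wrong key fails the identity check, and a part whose flash differs by even one byte from the expected image fails the entrance exam.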
Cloud vs. edge
End-to-end security isn’t just about what’s inside a chip or what software is running on that chip. It also involves what it’s connected to, which usually isn’t known until it’s being used in the field.
Data centers have strong perimeter security, but plugging in different components, such as PCIe cards or replacement power supplies, can open the door to malware or Trojans. This is the whole idea behind the Security Protocol and Data Model (SPDM), a DMTF specification that provides a standardized but limited approach to verifying those components. Under SPDM, a host can challenge newly installed hardware to prove its identity and report measurements of its firmware before that hardware is trusted.
But the story becomes much more complicated in multi-die devices, and at the edge, where devices are meant to interact with other devices over the internet, not all of which have the same security. This is the impetus behind the EU’s Cyber Resilience Act (CRA), which establishes heavy financial penalties for failure to report and fix breaches within a short period of time.
“There was a big lag in terms of security awareness because there was no one way. Nobody said, ‘You have to do it like this,’” said Yan-Tarō Clochard, chief marketing officer for Secure-IC, a Cadence company. “CRA makes it mandatory for the customer to know what their security problem might be, to anticipate their supply chain, their operations, and how they will think about security during the lifetime of the product. For the entire end-to-end chain, it forces everybody to think about how sensitive their assets are and how they are going to use them over time.”
Still, there is a cost associated with security, and in price-sensitive markets, that cost can be prohibitive for the application.
“We have chips that we make that are security purpose-driven parts, meaning their sole source of existence is security — passports, payments, secure elements, TPMs (trusted platform modules),” said Infineon’s Wood. “And then you have things like MPUs, MCUs, Wi-Fi, Bluetooth, which are more broad chips that have security as a part of it. But those don’t exist today because of security. They exist for all the features that they bring. And, by the way, you want those features to be secured. But after looking at, interviewing subcontractors, and seeing what it costs to go into their secure rooms for manufacturing with all the additional costs, we estimate that an MCU we sell today for $2.50 would cost $5, and that became a non-starter. So then what you do is guarantee-ish. You mitigate problems such that you can claim provenance, meaning I can assert this is an Infineon chip cryptographically. And second, you break down what the real attack vectors are, end-to-end. You prove it’s not a clone and ensure no Trojan horse firmware has been loaded somewhere along its lifecycle for the manufacturing process.”
What comes after that is more challenging to manage. More data from more sensors and AI is being moved between more devices, particularly at the edge. So, in addition to hardware security, the data that moves between various pieces of hardware needs to be tracked, as well.
“Connectivity is central to everything, especially for AGI (artificial general intelligence) and almost everything that’s moving to the edge,” said Venkat Kodavati, senior vice president and general manager of Synaptics’ Wireless Products Division. “You need to transfer that data fast, either to the data center or between the devices. Let’s say you have AR glasses and a phone. There has to be much faster communication with lower latency. There are more video capabilities. There is voice. And now, people are coming up with lots of variables. So we’re talking about multi-modal communication. There are a lot of things happening all at once, and they need the highest levels of security. That includes PSA Level 3, secure boot, and all these things we include, so no one can break into it. It has to be very secure, and at the same time, it needs to be ultra-efficient. So we’re building in advancements on the communication side, especially on the energy consumption side, with security in mind.”
No silver bullet
The higher up the food chain, the bigger the end-to-end security challenge. Safety-critical applications, such as robots, automobiles, or military applications, can include thousands of different components, not all of which are developed in-house. And because of the long lifecycle of some of these complex systems of systems, it’s also likely that at least some of the companies supplying those components will not be in business by the end of the system’s expected lifetime.
The key is understanding exactly what the most likely parts are that will need to be replaced or fixed, and making projections about what will likely change, or need to change, starting from the date a device was manufactured.
“In our discussions with the military, aerospace, and defense industries, a vehicle may have a 40-year lifespan, and half of these parts will be replaced three times by different vendors before the 40 years are up,” said David Fritz, vice president of hybrid-physical and virtual systems, automotive, and mil/aero at Siemens EDA. “How do we know they will actually work? If you’re going to provide a part, there needs to be a model for that part. ‘Here are the requirements for the model. It’s got to plug into our digital twin. We need to verify that empirically to be a second or third source.’”
The problem isn’t confined just to the chip. An OEM is responsible for all the components on a PCB, as well. “It’s not just your own silicon on that board anymore,” said Maarten Bron, managing director at Keysight EDA. “You could have BCMs (automotive body control modules) and memories and storage controllers from third parties in there, as well. So if a manufacturer is going to vouch for the security of a system, they’d better have a lot of visibility into the supply chain, as well as the ability to trust your vendors. It’s like security at scale. In data centers, they scale up, or hyperscale. And from a security assurance point of view, it’s what we call ‘security at scale.’”
Factoring time into security
Time is a critical dimension in end-to-end security when it involves devices with long lifespans, and it’s not just the hardware. That includes encrypted data, which could be valuable at some point in the future. As quantum computers become more accurate and as the coherence of qubits improves, the time needed to breach current encryption schemes is expected to shrink by orders of magnitude.
That needs to be addressed early in the design phase, which is a challenge in itself because the quantum technology at this point is largely academic, and no one is quite sure how it will ultimately affect security or how to plan for it. Quantum security isn’t a one-time fix. It will be an ongoing challenge.
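One rule of thumb often used for this planning problem, commonly attributed to cryptographer Michele Mosca and not drawn from the sources quoted here, compares three durations: how long data must stay secret, how long a migration to quantum-resistant cryptography would take, and how long until a quantum computer can break current schemes. If the first two add up to more than the third, some data is already exposed to capture-now, decrypt-later attacks. The sketch below is a minimal illustration. The function name is hypothetical, and any numbers passed in are placeholders rather than forecasts.

```python
def exposure_years(shelf_life_years: float,
                   migration_years: float,
                   years_to_quantum: float) -> float:
    """Years during which data that must stay secret could be decrypted.

    Returns 0.0 when migration finishes early enough that the data's
    secrecy period ends before a capable quantum computer exists.
    """
    return max(0.0, shelf_life_years + migration_years - years_to_quantum)
```

For example, a system whose data must stay confidential for 30 years, with a five-year migration and an assumed 15 years until a capable quantum machine, would be exposed for 20 years. A product with a two-year secrecy requirement and a three-year migration would not be exposed under the same assumption.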
“Automotive is really mature in this respect,” said Sylvain Guilley, co-founder and CTO of Secure-IC, a Cadence company. “They have TARA — Threat Analysis and Risk Assessment — which is the way they specify cars and envision the future, because automotive chips must be kept functional and up-to-date and safe, usually for more than 30 years. And when we assess a risk, it includes the outcome of a quantum computer breaking everything. There are also factors like the likelihood and impact and topology of the attacker, because clearly there won’t be a proliferation of quantum computers at first. Is a nation-state interested in attacking a car? That’s one question. But when we analyze the risk, we also need to consider what’s at stake. Is it financial? Is it reputation? The carmakers don’t want to recall millions of cars for an update, or even worse, to change parts. That would be a complete disaster, but it’s weighted with a low probability.”
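The weighting Guilley describes, a severe outcome discounted by a low probability, can be sketched as a simple likelihood-times-impact score. This is a simplified illustration, not the rating scheme of any formal TARA method, which uses its own tables. The 1-to-5 scales, the threat names, and the scores are all invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    name: str
    likelihood: int  # 1 (rare) to 5 (expected); hypothetical scale
    impact: int      # 1 (minor) to 5 (catastrophic); hypothetical scale

    @property
    def risk(self) -> int:
        """Simplified risk score: likelihood multiplied by impact."""
        return self.likelihood * self.impact


def rank(threats):
    """Order threats from highest to lowest risk score."""
    return sorted(threats, key=lambda t: t.risk, reverse=True)


# Invented example values, chosen only to illustrate the weighting.
THREATS = [
    Threat("quantum break of vehicle cryptography", likelihood=1, impact=5),
    Threat("unsigned firmware via diagnostic port", likelihood=3, impact=4),
    Threat("replay of key-fob signal", likelihood=4, impact=2),
]
```

With these placeholder scores, the quantum scenario has the highest impact but ranks last overall because of its low likelihood, which mirrors the point that such an outcome would be a disaster yet is weighted with a low probability.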
AI’s role
AI adds another dimension to this picture. On the negative side, AI agents can hunt for security holes and never give up trying to find weaknesses or watching for changes. On the plus side, those agents can be deployed to plug holes, as well.
“AI agents are very native at finding some of the security loopholes that humans are not very good at,” said William Wang, CEO of ChipAgents. “It’s very difficult for a human to look for a particular combination that will break a design. But having AI look at corner cases, based on all the training data in the past, is a very good use case. It can look at all the design specs to see if there’s a loophole. It can look at RTL code and a testbench. It can look at the whole stack. It’s not just about the RTL or hardware. It’s also the embedded systems, because there are a lot of security bugs and issues with drivers. Part of the reason is that drivers don’t actually follow the hardware specifications. You can connect the driver and the C code with the hardware specification to allow you to complete a better system with fewer vulnerabilities.”
At the same time, AI agents themselves need to be tightly controlled, because agents can talk to each other independently. “That’s the nuance,” said Eddie Ramirez, vice president of marketing for Arm’s Infrastructure Business. “When you do confidential compute, you assume that you don’t want these containers to talk to each other. You literally want to separate them. The memories are encrypted. When you talk about agents, you have a lot more coordination between them. That doesn’t mean they still can’t have their own security silos. It just means that an agent now has to pass information and state to another agent. We can leverage a lot of that, but there isn’t a formal standard on that. Discussions are going on in the Confidential Computing Consortium and in OCP. At some point, all of those will gear toward one place to set a standard.”
Conclusion
Security has moved from what was largely an afterthought to a mainstream consideration for every facet of an electronic system, from initial concept and design, through manufacturing and packaging, and into the field. Looming financial penalties, exclusion from partnering opportunities, and potential damage to a company’s reputation pose serious threats to any company that doesn’t take security seriously. That almost certainly will lead to lots of finger-pointing and weed out any companies that fail to add sufficient security.
At the same time, the ability to fully secure a device is becoming more difficult, particularly with multi-die solutions, increased connectivity at the edge to more devices, and an almost endless series of updates to software and firmware.
Whether end-to-end security is truly possible anymore isn’t clear. But there certainly are more reasons why companies should strive to achieve it, as well as more hurdles preventing them from doing so.