Bug, Flaw, Or Cyberattack?

Tracking the cause of aberrant behavior is becoming a much bigger challenge.


The lines between counterfeiting, security, and design flaws are becoming increasingly difficult to determine in advanced packages and process nodes, where the number of possible causes of unusual behavior grows exponentially with the complexity of a device.

Strange behavior may be due to a counterfeit part, including one that contains a trojan. Or it may be the result of a cyberattack. It also may be due to complex interactions between heterogeneous components in a certain sequence, basically a corner case that produces a silent data error. It may even be caused by process variation, which can leave a latent defect that later causes an open or short in one or more regions of a design. No matter what the cause, all avenues need to be explored.

Of all these possibilities, counterfeiting is the easiest to identify, and it’s an area where there has been significant progress.

“There hasn’t been a new or emerging counterfeiting approach over the past several years except recycled, cloned, re-marked, over-produced, and forged,” said Mark Tehranipoor, chair of the ECE department at the University of Florida. “Hence your solution is based on the fact that one of those five scenarios is going to happen, and you develop solutions for it. Security, however, deals with the intelligence of an attacker. By the time you think that you’ve figured them out, there may be a new vulnerability in the systems. Or you hear on the news that a new vulnerability showed up that you never thought about during the security assessment. Security is a cat-and-mouse game. Counterfeiting is not, because we know the different types of counterfeiters. They’ve been doing this for the past 20 years.”

Mike Borza, Synopsys scientist, noted that the University of Florida has been working on technology to authenticate a chip right on the tester, which is especially important in multi-die/multi-chiplet implementations. “This lets you get away from one kind of counterfeiting, if you can do that accurately at high volume,” he said.

Other approaches include PUFs (physically unclonable functions), which are just one aspect of a solution. “A PUF is good once you have the die identified and you’ve been able to initialize it,” said Borza. “But you can’t just turn on a PUF and use it as a way to validate that you’ve got the right chip, because every PUF looks similar.”

There are statistical measures that might allow a PUF to serve as a weak indicator that a die is an authentic copy of the design. The problem is that those measures require a huge amount of data to model how the PUFs should behave, and there really isn’t much data about PUFs across a substantial volume of chips, such as how a PUF changes from one chiplet, die, or wafer to the next.

“People don’t have enough data to know how those things change, but they know that there are some effects,” Borza said. “You control that PUF with helper data, because the helper data has real random characteristics created from a random data stream and elsewhere and married to whatever is in the PUF. Still, before you do that initialization, the PUF itself will have systematic variations that need to be managed, and that creates an opportunity to identify something unique to that design.”
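The role of helper data can be illustrated with a minimal sketch of a code-offset construction, one common textbook way to pair a noisy PUF with random data. It is not necessarily the scheme Borza describes. The repetition code, the `REP` parameter, and the `enroll` and `reconstruct` names are all assumptions chosen for illustration.

```python
import secrets

REP = 5  # assumed repetition factor: tolerates up to 2 flipped bits per group


def enroll(puf_bits):
    """At initialization: pick a random key and derive public helper data."""
    n_key = len(puf_bits) // REP
    key = [secrets.randbelow(2) for _ in range(n_key)]
    codeword = [bit for bit in key for _ in range(REP)]   # repetition encode
    helper = [c ^ p for c, p in zip(codeword, puf_bits)]  # bind codeword to PUF
    return key, helper


def reconstruct(noisy_puf_bits, helper):
    """In the field: recover the key from a noisy PUF re-read plus helper data."""
    noisy_codeword = [h ^ p for h, p in zip(helper, noisy_puf_bits)]
    key = []
    for i in range(0, len(noisy_codeword), REP):
        group = noisy_codeword[i:i + REP]
        key.append(1 if sum(group) > REP // 2 else 0)      # majority vote
    return key
```

In this sketch the randomness of the key comes from the random data stream rather than from the PUF, which is why the PUF is only useful for authentication after it has been initialized. A production design would use a much stronger error-correcting code and would account for systematic bias in the PUF.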

There are other issues, as well. Each chip is unique, and what works well for one chip may not be ideal for another. Infineon, for example, has developed about 30,000 different chips, but some are so small they cannot be individually marked for identification purposes. Best practices can vary greatly, as a result.

“We do contact our customers through associations such as the Component Obsolescence Group Deutschland (COGD), which is a European association, or the International Institute of Obsolescence Management (IIOM), an international association with local chapters, in which many of our customers are members, where we can raise awareness of the risk of purchasing on the open market,” said Konrad Bechler, security consultant, brand protection and anti-counterfeiting at Infineon Technologies.

Attack vectors
Counterfeit chips may cause security breaches, as well, which is a problem that will be magnified with the introduction of third-party chiplets.

Alan Porter, vice president electronics and semiconductor segment at Siemens Digital Industries Software, advises engineering teams to think about the permutations when they start adding dies, chiplets, or interposers from multiple sources before they’re assembled into their package. “There may even be different substrates and different materials in them, which just expands the complexity. The transparency there is paramount,” he said. “What’s important here is, regardless of who is in that supply chain, maintaining processes and protocols are paramount. If we focus on that, and we’re doing what we need to do to protect the integrity of that supply chain, then we can perhaps mitigate the geopolitics and those concerns. I’m not saying we can get rid of them, but certainly we can mitigate it. In an earlier part of my career, I worked for a very large OEM, and that was part of what we needed to do. When you’re working with companies like a Foxconn, even TSMC has concerns given the situation. It’s really about gaining trust by transparency, making sure that the flow of data is controlled, and that the digital footprints and things that can be processed and tracked are in place.”

Another piece of this is preventing attacks whenever possible, and dealing with them whenever necessary.

There are two general approaches to combating counterfeit-related attacks. “One allows a chip to reliably authenticate its digital identity, and of course protect that digital identity from being easily cloned,” said Scott Best, senior director of product management at Rambus and director of anti-tamper technology development. “The other approach allows a chip to be securely manufactured, even in the context of a so-called ‘zero trust’ manufacturing flow. This second aspect, known as supply-chain security, is as important as chip security, because the chip itself cannot tell you whether it’s been securely manufactured.”

Within those areas, there also are about two dozen relevant countermeasures. What gets used will vary by application, and the value of the target. For example, chips in printer cartridges and set-top boxes may be intrinsically low value, but they provide a gateway into multi-billion dollar businesses. Other chips may be more complex, but the potential damage will be lower. The higher the value of the target market, the greater the resources applied to prevent attacks.

However, what may appear to be a supply-chain issue may in fact be a hacker leveraging a weakness in a design or system. The more elements in that design, the more difficult it is to identify the source of an attack.

“When it comes to security, you’re always behind because the attacker, who happens to be very intelligent and has a lot of resources, may come up with a vulnerability that security engineers could not find,” said Tehranipoor. “Sometimes companies may need to ship your design for fabrication and into the market quickly with known bugs. The attacker had ample opportunity to attack the chip and expose the vulnerabilities, but unfortunately the designers did not have ample opportunity to fix it.”

Hardware-based attacks typically fall into three categories:

  • Non-invasive: Mostly inexpensive attacks that attempt to extract a chip’s digital identity as quickly and cost-effectively as possible. This includes “basic theft,” which is an effective way of compromising the chip’s supply-chain integrity.
  • Semi-invasive: Slightly more sophisticated attacks, typically attempted when a non-invasive attack has not succeeded. Semi-invasive attacks often are beyond the budget of “garage hackers,” but they are still within the scope of what a university lab might do, and then publish a paper about.
  • Fully-invasive: Expensive, sophisticated attacks that go after a chip at the transistor level. This type of attack is typically beyond the budget of a university lab, but well within the capabilities of commercial labs and state-funded actors.

“There are about two dozen countermeasure techniques that mitigate the attacks that fall into these categories,” Best explained. “But as the attacks grow in complexity and cost, so do the countermeasures. As a result, a lot of the countermeasures for fully-invasive attacks are only included in the most secure silicon, such as where large commercial revenue streams could be impacted, or where the safety of allied troops is at stake.”

Variation and other challenges
More process variation at leading-edge nodes makes it harder to understand where an attack is coming from, and that problem is compounded when there are different chips or chiplets in the same design.

The semiconductor ecosystem has dealt with variation for a long time on a statistical basis, which helps design teams understand the probability of a problem that stems from a particular manufacturing lot or wafer. But there are so many new processes, and differences from one leading-edge foundry to the next, that trying to statistically identify every aberration is nearly impossible.

“It’s always challenging every time we go down a node to deal with what’s new about this node, and what’s different,” Borza said. “You can plan at the high level for what you’re going to see, but as you get down to the lower levels you start to see things that are different or that you haven’t encountered before. One of the things that the semiconductor industry has solved better than almost anybody else is dealing with the unknowns as you move into a new era of manufacturing at a new density. As a result, there will be opportunities to try to analyze that. Another opportunity that is inherent in that is for generating random numbers. I look for any source of potential entropy that you can exploit to use for that purpose, and digital design people are not accustomed to thinking about that problem in that way. They don’t view it as something desirable. It’s something completely undesirable for them, but for us, it’s the opportunity to get more, and higher quality, random data.”

One way to approach this is from inside the chip or package, and that works for a variety of issues, from variation to identifying counterfeit chips. “If you have a device that’s made up of chiplets, it becomes ultimately more complex,” said Lee Harrison, director, product marketing Tessent group at Siemens Digital Industries Software. “Embedded analytics technology is able to monitor and create a fingerprint or a token to apply attestation. This means on power-up, instead of having a specific key encoded into the chiplets, which can then be changed and counterfeited, we use attestation, and use this monitor to create a digital fingerprint, which will disappear once the device is turned off. But then it’s run again on power up. That token is basically the fingerprint that gets collected from each of the chiplets on to the main die. Then the Root of Trust authenticates it so that the overall signature for all the chiplets is correct, and the device is what it’s saying it is.”
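The power-up flow Harrison describes can be sketched roughly as follows. This is a simplified illustration, not Siemens’ implementation. It assumes each chiplet’s monitor yields a stable byte-string measurement, that the Root of Trust holds reference measurements for the package, and that a fresh nonce is issued at each power-up. The function names are hypothetical.

```python
import hashlib
import hmac


def chiplet_token(measurement: bytes, nonce: bytes) -> bytes:
    """Ephemeral token for one chiplet, valid only for this power-up's nonce."""
    return hashlib.sha256(nonce + measurement).digest()


def package_signature(tokens) -> bytes:
    """Fold the ordered chiplet tokens into a single package-level digest."""
    digest = hashlib.sha256()
    for token in tokens:
        digest.update(token)
    return digest.digest()


def root_of_trust_verify(reported_tokens, reference_measurements, nonce) -> bool:
    """Compare the reported package signature with the expected one."""
    expected_tokens = [chiplet_token(m, nonce) for m in reference_measurements]
    expected = package_signature(expected_tokens)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(package_signature(reported_tokens), expected)
```

Because every token depends on the nonce, nothing persistent is stored in the chiplets, and a token captured during one power cycle does not verify in the next. Real monitor data is noisy, so a practical design would need a tolerance-aware comparison rather than an exact hash match.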

Solutions and future concerns
Rambus’ Best noted that some countermeasures are aimed at protecting the integrity of the design from the earliest phases of engineering, including when EDA tools are being used by the design team. “For example, the RAMP program in the U.S. has worked with major tool vendors to ‘cloudify’ their platforms, so that design teams can use an engineering flow that has been reliably secured against insider attacks, e.g., when malicious tools or malware is infecting the integrity of the EDA software itself,” Best said.

However, broadly speaking, the technologies and techniques for anti-counterfeiting are not enabled in EDA tools. “Most anti-counterfeiting countermeasures require specific engineering intent and are not automatically included as part of a chip’s design flow,” he said, noting that each situation is unique. “A great deal depends on the value of what’s being protected – is it the revenue from a single chip or is it the entire annuity revenue stream of an OEM’s platform? Are there national security concerns at stake? In addition, exactly how is the chip being verified? For example, as a chip is being manufactured, when the equipment has real-time access to an online, cloud-based database, there are many times when a chip is being verified as authentic. That vastly simplifies most considerations, as the ‘secret codes’ which activate a chip in the final stages of manufacture can be withheld if the chip’s provenance cannot be immediately confirmed.”

That said, there are some fundamental best practices which should be included in every COTS design. “For example, every chip should contain a unique device identity where the authentication protocol for that chip is cryptographically tied to the digital identity value,” he said. “Similarly, every chip should utilize a ‘secure lifecycle’ such that a chip simply operates differently when it’s in an early manufacturing lifecycle, compared to how it operates when it’s been deployed into ‘mission mode’ within a customer’s system.”
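Those two practices can be sketched together in a few lines. The example below is a toy model, not Rambus’ design. It assumes a symmetric per-device key and an HMAC challenge-response for brevity, whereas real chips often use asymmetric keys and certificates. The `Chip` class, the `Lifecycle` states, and the `verify` helper are hypothetical names.

```python
import hashlib
import hmac
import secrets
from enum import Enum


class Lifecycle(Enum):
    MANUFACTURING = 1
    MISSION = 2


class Chip:
    def __init__(self, device_id: bytes, device_key: bytes):
        self.device_id = device_id
        self._key = device_key
        self.state = Lifecycle.MANUFACTURING

    def respond(self, challenge: bytes) -> bytes:
        # The response is bound to both the secret key and the device identity.
        message = self.device_id + challenge
        return hmac.new(self._key, message, hashlib.sha256).digest()

    def debug_read_key(self) -> bytes:
        # Test access that exists only in the early manufacturing lifecycle.
        if self.state is not Lifecycle.MANUFACTURING:
            raise PermissionError("debug access is disabled in mission mode")
        return self._key

    def advance_to_mission(self) -> None:
        # One-way transition: no method moves the state back.
        self.state = Lifecycle.MISSION


def verify(chip: Chip, expected_key: bytes) -> bool:
    """Verifier side: issue a fresh challenge and check the chip's response."""
    challenge = secrets.token_bytes(16)
    message = chip.device_id + challenge
    expected = hmac.new(expected_key, message, hashlib.sha256).digest()
    return hmac.compare_digest(chip.respond(challenge), expected)
```

A clone that copies only the public identity fails verification because it lacks the key, and the same chip refuses debug access once it has been moved into mission mode.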

Additionally, Infineon’s Bechler pointed to other questions that should be raised. “How good can these tools be? Can they detect recycled components being sold as new? What is of interest here is the definition of counterfeit and one of the definitions is that recycled and renewed components are sold as new. This means we also have authorized suppliers who provide longtime storage, so the question is how the tools can find the difference between old and genuine parts and recycled parts.”

As designs become more complex, so does security.

“We keep going up and down through these integration levels, and 3D-IC is a way to increase density, but it’s also a way to disintegrate the chip,” Synopsys’ Borza said, “either because you have different functionality that benefits from putting dissimilar technologies together in the same package, or to break up functions that you can re-use across many integrated packages again and again. That creates an opportunity. It’s easier to reverse engineer or to attack the integrated package than it is to attack an integrated circuit. It’s not as easy as attacking a board-level product, but it’s easier than attacking the die. Some types of 3D packaging are very difficult to penetrate, but in general, as soon as you have to go from one die to another you know that there’s a path between them, and that creates the opportunity. Somebody knows that there’s a path to go and look for.”

The challenge then becomes determining what caused a problem. Was it a third-party chiplet, a flaw in the design, or some manufacturing or packaging issue? Finding the answer is becoming much more difficult.

