What Makes A Chip Tamper-Proof?

Identifying attacks and protecting against them is still difficult, but there has been progress.


The cyber world is the next major battlefield, and attackers are busily looking for ways to disrupt critical infrastructure.

There is widespread evidence this is happening. “Twenty-six percent of the U.S. power grid was found to be hosting Trojans,” said Haydn Povey, IAR Systems’ general manager of embedded security solutions. “In a cyber-warfare situation, that’s the first thing that would be attacked.”

But not all attacks are software-based. Some are very physical. In particular, the Internet of Things (IoT) represents a huge number of new ways to get onto sensitive networks. “The IoT market isn’t talking about tampering. But because there are so many new IoT devices, especially for industrial, there has been an increase in physical attacks,” said Mike Dow, senior product manager of IoT security at Silicon Labs. To address this, anti-tampering features are appearing on a broad range of chips.

Protecting secrets
Security for connected devices involves cryptographic functions for encrypting messages and ensuring that all parties in any communication are who they say they are. But such functions require cryptographic keys, certificates, and other artifacts, some of which must remain secret to be effective. Attackers have increasingly turned to physical attacks in an attempt to retrieve these secrets and defeat the security. The purpose of anti-tampering efforts is to protect those secrets.

In some cases, however, the goal may not be to steal secrets, but rather to disable or sabotage a system. Alric Althoff, senior hardware security engineer at Tortuga Logic, provided one example. “Per FIPS [Federal Information Processing Standard], an RNG [random-number generator] needs to have a built-in health test. You can use [electromagnetic fields] to flip as many bits as possible. The RNG health test will then fail. After some number of failures, the device will disable itself.”

The disabled device is no longer allowed to perform cryptographic functions because of a loss of entropy, and the system it supports no longer can function properly. “This takes the RNG, which is intended to add to security, and makes it an attack vector,” said Althoff.
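The failure-counting behavior Althoff describes can be sketched in a few lines. The thresholds here are purely illustrative (real health tests follow NIST SP 800-90B), but the sketch shows how forcing an entropy source to a stuck value turns the self-protection into a kill switch:

```python
# Toy sketch of how an RNG health test can become a denial-of-service
# vector (illustrative thresholds; real tests follow NIST SP 800-90B).

class RngHealthMonitor:
    """Repetition-count-style health test: too many identical
    consecutive samples means the entropy source looks broken."""

    def __init__(self, cutoff=8, max_failures=3):
        self.cutoff = cutoff              # identical samples allowed in a row
        self.max_failures = max_failures  # failures before self-disable
        self.failures = 0
        self.disabled = False

    def check(self, samples):
        run, last = 1, None
        for s in samples:
            run = run + 1 if s == last else 1
            last = s
            if run >= self.cutoff:        # health test fails
                self.failures += 1
                if self.failures >= self.max_failures:
                    self.disabled = True  # device disables its crypto
                return False
        return True

mon = RngHealthMonitor()
# An EM glitch that pins the source to one value fails the test;
# enough consecutive failures and the device disables itself.
for _ in range(3):
    mon.check([1] * 16)
print(mon.disabled)  # True
```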

Physical attacks require the attacker to have physical possession of the unit being attacked. If the attack is successful, then the attacker will have gained access to that unit. For this reason, it’s very important that each unit have its own unique keys, because cracking one device won’t then reveal the secrets in other devices.

Having access to the one device isn’t worthless, however. Often it’s not the device under attack that contains something valuable. Instead, that unit is simply a way to get onto a network for access to more valuable assets elsewhere.

Direct physical attacks
Tampering usually has one goal, which is to extract cryptographic keys by any means possible. The most obvious approach is to break open a chip to find evidence of stored keys. This might come from visual clues in memory, by sensing voltages at key points in the circuit, or by physically altering active circuits — even if only temporarily.

One common tool is the focused ion-beam tool, or FIB. It allows precise sensing and drilling in a manner that’s much more surgical than the older approach of delaminating the entire chip layer by layer. Done carefully on an unprotected chip, the technique can leave power on so that critical metal lines can be probed live. Depending on the type of memory used, the attacker may try to inspect stored values either visually or electrically.

Of particular recent concern is the use of lasers. By focusing a laser on a diffused region, the photoelectric effect can cause an associated node to change state. If the node is combinational logic, the result is likely a temporary “glitch” that, timed correctly, can then be captured in the following flip-flop. Such an event is referred to as a “single-event transient,” or “SET.”

It’s also possible to attack the flip-flop directly, causing it to flip state. This is referred to as a “single-event upset,” or “SEU.” Once you can change the value of a node or a flip-flop, you have much more ability to probe the rest of the circuit to see how it responds.

Side-channel attacks
Some non-invasive attacks don’t probe directly into the circuits. Instead, they rely on information that leaks from the circuit while it’s running. These so-called “side-channel attacks” involve analysis of the power line or of electromagnetic radiation to give clues about what’s happening internally. As unlikely as it might sound, it is possible to derive keys by watching these artifacts as computations involving the key take place.


Fig. 1: A conceptual setup for executing a differential power analysis (DPA) attack. Source: Wikipedia

Fig. 2: A power trace indicating the difference between no multiplication (left peak) and multiplication (right peak) while processing an RSA key. Source: Audriusa, recorded during a system-security laboratory exercise at ETH Zurich; via Wikipedia
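As a toy illustration of why this works, the following sketch models power as the Hamming weight of a secret-dependent intermediate value plus noise, then recovers a key byte by correlating each key guess against the “measured” traces. Everything here — the leakage model, the noise level, the trace count — is an assumption for illustration, not a real attack setup:

```python
# Toy differential power analysis: an illustrative model, not a real attack.
# We pretend power consumption tracks the Hamming weight of the secret
# intermediate "plaintext XOR key", and recover the key byte by correlation.
import random

SECRET = 0x5A
random.seed(1)

def hw(x):  # Hamming weight: number of 1 bits
    return bin(x).count("1")

plaintexts = [random.randrange(256) for _ in range(500)]
# "Measured" traces: leakage of the secret intermediate plus Gaussian noise
traces = [hw(p ^ SECRET) + random.gauss(0, 0.3) for p in plaintexts]

def corr(xs, ys):
    # Pearson correlation coefficient
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs) ** 0.5
    vy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (vx * vy)

# For each key guess, predict the leakage and correlate with the traces;
# the correct guess correlates best.
best = max(range(256), key=lambda k: corr([hw(p ^ k) for p in plaintexts], traces))
print(hex(best))  # 0x5a
```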

Finally, a separate class of tampering attempts relies on the hope of finding some anomalous behavior that can be leveraged to leak secrets. In these “fault injection” or “fault induction” attacks, odd voltage or logic glitches have unexpectedly resulted in the loss of secrets. It’s not just random “fuzzing” that’s used, but “specific patterns, glitches, fast/slow, over- and under-voltage, or speed,” said Michael Chen, director of Design for Security at Mentor, a Siemens Business. Such attacks can be harder for chip designers to anticipate. “[These] vulnerabilities are more difficult to address because they are either intentionally designed-in, unintended bugs, or truly never-seen-before attacks,” he added.

Strong RF signals also can cause unexpected behavior. The user manual of the ChipSHOUTER tool says that, “Changing magnetic fields cause induced currents in the [device under test], resulting in changing voltage levels on internal signals. These changing voltage levels can cause incorrect read (or write) operations, affecting results of latches, registers, and more. Corrupting memory, resetting lock bits, skipping instructions, and inserting faults into cryptographic operations are all applications of EMFI [electromagnetic fault injection].”

Side-channel attack types are numerous. “Not all side-channels are power/electromagnetic, as they could be timing or bus monitor, register, cache or memory attacks,” Chen said. “There must be a million other ways that we have not thought of yet.”

Protecting the case
Countermeasures against these attacks are new for Internet of Things (IoT) devices, but they’re well established for chips in point-of-sale (PoS) systems. Used in the payment card industry (PCI), these units have secure elements (SEs) that have been locked down tight by regulation and by the expectations of the major financial institutions.

These rules require that the case be tamper-proof, that any internal chips containing credit card data also have a tamper shield, and that any cables connecting a credit-card reader to the PoS system also be shielded. These techniques are now migrating to more prosaic chips, and they are also being applied at the system level. “If you can’t get the case open undetected, then it’s hard to get at anything else,” said Silicon Labs’ Dow.

At the system case level, metering companies — electrical and water — have been implementing tamper protections for years to keep people from artificially reducing their bills by rolling back the meters. “The metering industry is used to tamper-proofing. Medical is [also] starting to look at it,” said Dow. Outside those industries (including PCI), it’s not so common to find case-level protections. These can be added by mounting a few switches in places where opening the case will change the state of a switch. That can alert the system, which then can engage countermeasures.

This security measure requires a source of power even when the system is unplugged. Coin-cell batteries are typically used for this, and PCI regulations say they must last two years — with enough power for a one-time event of responding to a tampering attempt.
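The arithmetic behind that requirement is worth sketching. Assuming a typical 220 mAh CR2032 cell — a datasheet-style figure, not from the article — and holding back a small reserve for the one-time tamper response:

```python
# Back-of-the-envelope check of the two-year coin-cell requirement.
# The 220 mAh capacity is a typical CR2032 figure (an assumption here),
# as is the 5 mAh reserve held back for the one-time tamper response.
capacity_mah = 220.0
reserve_mah = 5.0
hours = 2 * 365 * 24  # the two-year PCI window, in hours

budget_ua = (capacity_mah - reserve_mah) / hours * 1000.0
print(f"{budget_ua:.1f} uA average for always-on tamper sensing")  # ~12.3 uA
```

Roughly 12 µA of average current is all that’s left for continuous sensing, which is why always-on tamper-detection circuits must be designed to sip power.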

If tampering at the case level is detected, then the system inside the case must take defensive measures. Exactly what those will be is determined by the designer. But in general, the tamper-alert signal must be available to the CPU, possibly through a general-purpose I/O. “There are several architectures,” noted Brent Wilson, senior applications manager of IoT products at Silicon Labs. “With one CPU, you can detect the breach and respond via software. With two CPUs, one of them secure, you have an input pin for events, and the unsecured CPU can send a signal to the on-chip SE.”

Protecting chips
Once inside the case, individual chips with sensitive contents must evade tampering attempts. Some countermeasures are physical, while others involve active sensors. Even providing evidence of tampering can be helpful. “Tamper-evident technology often is used to protect the packaging, labeling, seals, markings, and physical security,” said Chen. “Everything from watermarking, heat- and UV-sensitive [materials], and shatter/glass materials are used to leave visible marks of any physical tamper attempts.”

While hardware roots of trust (HRoTs) tend to operate primarily at boot-up to ensure a clean system, they shut down after that task is complete. “It takes about 100 ms to do what’s needed,” said Scott Jones, managing director of embedded security at Maxim Integrated. While protecting against boot-up attacks may take design effort and code or circuitry, it’s unlikely to be a further concern for battery-powered devices. “Crypto operations aren’t done constantly, so there’s no real power budget issue,” said Jones. Tamper-detection sensors, by contrast, always must be on so they can detect a breach at any time.

One technique for protecting sensitive signals is to bury them underneath other metal layers. Some chips will shield the circuit using metal that doesn’t carry a functional signal. These days, that metal is also active so that it can detect whether it’s being removed. DC signals on the metal are common, while state-of-the-art meshes may use AC signals instead.

Some designers may intentionally run other functional lines — or even gates — above sensitive circuits. Others may include purpose-built lines, or involve strategically placed dummy metal fill. In all of these cases, the metal mesh above the circuits both helps shield the circuits from eyes and probes and keeps electromagnetic radiation from escaping.

If memory is used to store keys, that memory needs to be of a type that can’t have its contents visually inspected — and which also resists attempts to detect cell contents electrically. Stored keys may be encrypted, but then a key is needed to decrypt those keys. This can be an area where a physically unclonable function (or PUF) can create the “root” key that enables all of the other security functions.
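The key hierarchy described here can be sketched as follows. The conditioning and wrapping functions are stand-ins for illustration only — real designs use a fuzzy extractor to stabilize the PUF response and a vetted scheme such as AES key wrap for the stored keys:

```python
# Sketch of the key hierarchy: a PUF-derived root key decrypts the keys
# stored in memory, so the stored blobs alone are useless to an attacker.
# XOR with an HMAC-derived keystream stands in for a real cipher here;
# this is an illustration, not a vetted scheme.
import hashlib
import hmac

def root_key(puf_response: bytes) -> bytes:
    # Condition the device-unique PUF response into a fixed-size key.
    return hashlib.sha256(b"root-key-v1" + puf_response).digest()

def wrap(root: bytes, key: bytes) -> bytes:
    # Derive a keystream from the root key and XOR it over the key.
    stream = hmac.new(root, b"wrap", hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(key, stream))

unwrap = wrap  # XOR wrapping is its own inverse

puf = bytes.fromhex("a1b2c3d4")          # stands in for silicon-unique bits
device_key = b"sixteen byte key"
blob = wrap(root_key(puf), device_key)   # only this blob ever hits storage
assert unwrap(root_key(puf), blob) == device_key
```

Because the root key is regenerated from the PUF on demand and never stored, reading out the memory yields only the wrapped blob.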

Photon sensors also are becoming more common. The idea is to detect any attempts to “probe” the circuit with a laser.

Defeating side-channel attacks
Many countermeasures are available to thwart side-channel attacks. These may help to discourage both the stealing of secrets as well as the general reverse-engineering of the chip. “The goal is to insert at least one more countermeasure into a product than your adversary is willing to overcome,” said Scott Best, Rambus’ technical director of anti-counterfeiting products. Approaches include:

  • Run critical code for short periods of time and then power down to minimize the opportunity for analyzing chip behaviors while that code is running.
  • Run “reverse” instructions — instructions whose effect will neutralize the effects of the intentional instructions.
  • Add dummy instructions to muddy the internal signatures.
  • Deliberately add noise to obfuscate any electromagnetic signals.
  • Turn on the true random number generator (TRNG).
  • Monitor everything, including the power distribution network, critical voltages and currents, the clock, signal timing, memory-access speed, radiation, and heat.
  • Use dual-rail transition logic (DTL). This involves a signal that would normally run on a single line being split over two lines. Rather than each line indicating the static value being transmitted, a transition (in either direction) on one line indicates a 1 while a transition on the other line indicates a 0. This ensures balanced transitions, keeping the power from indicating which value is being used. There are a number of variations on this technique.
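The dual-rail transition encoding in the last bullet can be modeled in a few lines of purely illustrative code:

```python
# Toy model of dual-rail transition logic as described above: a bit is
# sent as a toggle on one of two lines, so every symbol costs exactly
# one transition regardless of its value.

def dtl_encode(bits):
    line0 = line1 = 0
    events = []
    for b in bits:
        if b:
            line1 ^= 1          # a toggle on line 1 means "1"
        else:
            line0 ^= 1          # a toggle on line 0 means "0"
        events.append((line0, line1))
    return events

def dtl_decode(events):
    bits, prev = [], (0, 0)
    for cur in events:
        bits.append(1 if cur[1] != prev[1] else 0)
        prev = cur
    return bits

msg = [1, 0, 0, 1, 1]
assert dtl_decode(dtl_encode(msg)) == msg
# Every symbol produces exactly one transition -- a constant activity
# profile that gives power analysis nothing to latch onto.
```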

Responding to tampering
The next obvious question is, once tampering is detected, what to do then? A range of responses is available, and the point at which each level of escalation happens will vary by designer and application. No response at all is one possible action. The next level might be to inform the application so that it can take action. At the next level, a system reset may be generated to provide a clean boot from which to start over. If the severity of the breach is high enough, the nuclear option is to “brick” the system: make it permanently unrunnable.

Bricking the system could be done by erasing keys or portions of keys. It could involve destruction of the data used to recreate a PUF output. In any of these cases, either a pre-provisioned key or a “native” (i.e., PUF) key is being eliminated — a step that is irreversible. Theoretically, it’s possible to re-provision (or re-enroll) the device — in other words, no physical destruction may be involved — but those remedies would mean putting the unit back into the manufacturing flow, so for the purposes of an attacker the damage is permanent.


Fig. 3: One example of a decision tree for resolving tampering issues. The left shows aspects of the chip or the SE that must be monitored, and the right shows the response according to the level returned by the trigger. Some clarifying details: “Filter Counter” refers to a feature that counts events triggered in a filter against spurious events; DCI stands for “Debug Challenge Interface”; “Decouple BOD” refers to the brown-out detector on the main VDD (which requires a big decoupling cap), and “Secure Lock” refers to the lock on the debug port. Source: Silicon Labs

The big risk in all of this lies in creating a response that’s too sensitive, resulting in accidental bricking. “The only downside to case detection is that they can be sensitive, making it tough to deal with in manufacturing,” said Dow. The events that lead to bricking need to be filtered carefully, since it’s an irreversible step. Environmental extremes — like very cold temperatures — or events like dropping the unit must not trigger a tamper alert. This makes it a challenge to strike the right balance — detect true tampering while not misinterpreting other events as tampering.
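A filtering scheme of this kind might be sketched as follows, with purely illustrative thresholds and escalation levels — the windowing, counts, and response names are assumptions, not any vendor's actual design:

```python
# Sketch of a "filter counter" that only escalates to the irreversible
# response after repeated events inside a short window, so a dropped unit
# or a cold snap doesn't brick the part.  Thresholds are illustrative.

class TamperFilter:
    def __init__(self, window_s=60.0, brick_threshold=5):
        self.window_s = window_s
        self.brick_threshold = brick_threshold
        self.events = []                 # timestamps of recent events

    def report(self, t):
        """Record a tamper event at time t; return the chosen response."""
        # Drop events that have aged out of the window.
        self.events = [e for e in self.events if t - e < self.window_s]
        self.events.append(t)
        n = len(self.events)
        if n >= self.brick_threshold:
            return "brick"               # erase keys: irreversible
        if n >= 3:
            return "reset"               # clean reboot
        return "log"                     # note it and carry on

f = TamperFilter()
# One isolated glitch per hour never escalates...
assert all(f.report(t * 3600.0) == "log" for t in range(10))
# ...but a burst of events within a few seconds does.
g = TamperFilter()
print([g.report(t) for t in range(5)])
# ['log', 'log', 'reset', 'reset', 'brick']
```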

For devices with built-in anti-tampering circuits, it may be possible to configure the tamper responses. If that configuration is stored in some kind of reprogrammable memory, an attacker might try to change that setting before attempting an attack. Silicon Labs’ Wilson said that Silicon Labs stores its configuration in one-time programmable (OTP) memory, so once the configuration is set, it’s permanent and can’t be altered.

Avoiding fault-injection attacks
Fault-injection attacks are much harder to design against, beyond applying best practices learned from prior failures. However, it is possible to do a certain amount of pre-silicon verification using fault-injection tools. “A fault injection tool can generate single vs. multiple faults, static vs. transient faults, and global vs. local faults,” said Zongyao Wen, R&D director in the Verification Group at Synopsys.

Understanding susceptibility is important. “Susceptibility analysis can be performed during design, but because much can go awry during implementation and manufacturing, testing also must occur on the finished parts as the final word,” said Steve Carlson, director, Aerospace and Defense Solutions at Cadence. “It’s always better to find the problems during design rather than after manufacturing.”

Chen concurred. “If a specific attack surface is identified and a high-security asset needs to be protected, simulating attacks is very possible. Just identifying the security need is half of the battle.” That said, developing a security threat model is an important prelude to any verification or testing, but it may be too easy to get carried away with elaborate scenarios.

“We’re seeing a divergence between realistic attacks, which can be dead simple, and academic attacks, which can be very clever and novel,” said Tortuga Logic’s Althoff.

IAR Systems’ Povey agreed. “The challenge is not to over-engineer these things,” he said. “The reality is that security is largely common sense.”

Althoff said that model also should blend with a safety model, because some tampering incidents may be accidental rather than intentional. “With security, we’ve been focused on intentionality. In real life, people screw things up,” he said.

Physical testing
Because simulation can’t guarantee tamper-resistance, new silicon must be tested before it ships. This used to mean hiring someone experienced at trying to break into chips — and that’s still an option. But there also are tools available to put the chips through their paces and find any weaknesses. One popular open-source tool is called ChipWhisperer, and it can perform a number of these side-channel attacks for very little money. It doesn’t apply attacks that use strong RF signals, so there’s a (more expensive) ChipSHOUTER box that can handle that type of attack.

More sophisticated commercial-grade tools also can be acquired from IP and chip companies. “There are other tools, but they are commercial-grade ones,” said Rambus’ Best. “For example, the Rambus Differential Power Analysis Workstation (DPAWS) is a white-hat product for testing the robustness of power-supply information leakage. ChipWhisperer is more of a hobbyist-level product for doing ‘binary’ testing of a device. ‘Is it broken or not?’ vs. ‘How robust is it?’”

These may do more than ChipWhisperer, but they can also be harder to use. “Chips can always be analyzed electrically to give up their secrets,” said Best. “However, attacking a chip with … ‘fully-invasive’ FA [failure analysis] tools is expensive and requires mastery of difficult commercial equipment. This scenario greatly increases the skills needed by the attacker.”

What appears clear from all of this is that there is no one right way to make a chip tamper-proof. Like so many security issues, decisions must be made early in the design phase based on the expected attack model. Cost and power must be balanced against the consequences of a successful attack. And there is a laundry list of potential protections, some or all of which can be implemented in a given design.

For those shopping for a secure chip, there’s also no simple checklist that applies to all devices. Cost is a consideration, and more money pays for more protection. “What to look for depends on what they’re willing to pay,” said Maxim Integrated’s Jones. “For [a chip that costs] less than 20 cents, there’s not a lot possible. For 50 cents, one can do some of it. For $5, one can do top-of-the-line.”

Specific techniques also will vary depending on the type and scope of system. “[Tampering with an] embedded microcontroller … used for a single off-line function with firmware in ROM is very different from a server multicore CPU with cache in the cloud. Even when the same Intel/AMD processor is used, if it’s in a consumer PC vs. server-farm vs. military application, the secret/high-value assets that need to be protected are very different,” said Chen.

No industry standards exist, and even discussions of large classes of protections are hard to find online. In reality, this business proceeds with some amount of obscurity, and that’s intentional. No company wants to brag about its protections with any specificity, because doing so simply invites attackers to work harder.

“In many devices, how the mitigation is used or implemented is hidden,” said Chen. “Marketing or documenting the technique used can give too much information to the adversaries.”
