Whether caused by cosmic radiation, voltage glitches, or adversarial attacks, bit flips threaten data integrity, safety critical operation, and the foundations of hardware security.
The flip of a tiny bit can bring down even the best-architected and most secure systems of systems, and despite decades of awareness, there still is no easy way to stop it.
The recall last fall of entire fleets of Airbus A320 aircraft is a case in point, where solar radiation corrupted data that was critical to the functioning of the airplane’s flight controls. At the chip level, bit flipping is a major challenge because it can cause silent, unintentional data corruption, where a 0 becomes a 1 or vice versa, due to cosmic rays, electrical interference, or hardware aging. As components shrink and density increases, these soft errors can lead to unexpected system crashes, data corruption, and critical security issues like rowhammer.
Bit-flipping glitches have long been a topic in hardware security. They can bypass secure boot processes, corrupt memory, disable protections, or enable features such as debug ports that are usually restricted. Bit flips are dangerous because they allow cryptographic keys to be extracted and memory to be dumped, force JTAG/debug access, support the “attack once, rule them all” hacking strategy, and can be just one step in a larger attack plan. What is new is that complex systems are more vulnerable to attacks due to low voltage and high clock frequency, which equates to increased interference.
“You can easily gain some privileges on the platform, because it’s so simple to tamper with the clock. After all, a clock is just that — something you can manipulate,” noted Sylvain Guilley, CTO at Secure-IC, a Cadence company. “You mess with the clock, so you have one clock period that is shorter, and this will trigger many bad things in the chip. If you are reaching a critical section, you will basically skip an instruction. You’re supposed to read some memory, but because your clock glitch is making the clock shorter, you don’t have time to read, so basically, you don’t read. It’s very easy to bypass this. The effect is not usually to crash the system. It’s to limit the security features when a glitch is injected in the security portion. Glitches have become popular because they’re cheap and effective. I recall attacks on gaming consoles to prevent you from using a game that you didn’t buy. Obviously, this industry has been attacked like crazy, and the easiest attack was to bypass the authentication of the cartridge and the game firmware by glitch, and it worked.”
This may seem complicated, but a processor and its memory are nothing more than a collection of binary circuits. “As those who have studied introductory engineering classes know, ‘binary’ means that every bit of data in the system is encoded as either a 0 or 1,” said Scott Best, senior technical director, Silicon IP at Rambus. “But in slightly more advanced engineering classes, engineers learn that it’s not just the computer’s data expressed in binary terms. It’s the computer instructions, as well. For this reason, some ‘bit flips’ are substantially more impactful than others. For example, if the computer is calculating ‘13 + 0’ and a bit flip causes the operands to become ‘13 + 1,’ no great harm might befall the system. But if the bit flip causes the instruction to become ‘13 ÷ 0,’ all sorts of havoc might occur. For mission-critical systems, including safety systems and cryptographic ones, one can understand why the most important data and instructions require protection against bit flips.”
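Best's arithmetic example can be sketched in a few lines of Python. This is only a toy model: the 2-bit opcode table and the `flip_bit` and `execute` helpers are invented for illustration and do not correspond to any real instruction set.

```python
import operator

# Hypothetical 2-bit opcode encoding, chosen so ADD and DIV differ by one bit.
OPCODES = {
    0b00: ("ADD", operator.add),
    0b01: ("DIV", operator.floordiv),
    0b10: ("SUB", operator.sub),
    0b11: ("MUL", operator.mul),
}

def flip_bit(value: int, bit: int) -> int:
    """Return value with one bit inverted (XOR with a one-hot mask)."""
    return value ^ (1 << bit)

def execute(opcode: int, a: int, b: int):
    """Run one toy instruction and report a division by zero as a fault."""
    name, fn = OPCODES[opcode]
    try:
        return name, fn(a, b)
    except ZeroDivisionError:
        return name, "fault: division by zero"

# A flip in the data: 13 + 0 becomes 13 + 1, a wrong but harmless result.
print(execute(0b00, 13, 0))               # ('ADD', 13)
print(execute(0b00, 13, flip_bit(0, 0)))  # ('ADD', 14)

# The same single flip in the instruction: ADD becomes DIV.
print(execute(flip_bit(0b00, 0), 13, 0))  # ('DIV', 'fault: division by zero')
```

The flip is identical in both cases. Only its location decides whether the outcome is a small numerical error or a fault.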
Chip architects and design engineers should be familiar with bit flipping because of rowhammer-type attacks, which can have security implications such as corruption of page tables and even key extraction. “Specifically for the memories in today’s technology, they are evolving a lot,” noted Dana Neustadter, senior director of product management for Security IP Solutions at Synopsys. “DRAM cells are shrinking more and more, and that makes it even more challenging. As the technology of DRAM evolves, because there are smaller capacitors, the noise margins become lower. So this is not going away.”
Just how vulnerable today’s advanced automotive and aerospace chips are depends on the memory and CMOS technology on which those chips are built. “The latest CMOS processors, etc., are less susceptible because they are using finFET transistors, which perform better than standard planar transistors,” said Helmut Puchner, vice president and fellow, Aerospace & Defense, at Infineon Technologies. “When it comes to memory type and level of susceptibility, the more logic content they have, the more vulnerable they are to functional interrupts caused by particles that can only be recovered by power cycling. DRAMs and DRAM-based technologies like GDDR and HBM are typically relatively stable for the memory cells but have high logic content. Any functional interrupt will cause failures that cannot be recovered from in real time and will require power cycling.”
DRAM is particularly sensitive to radiation and will be sensitive to X-rays, but it’s also sensitive to cosmic-type radiation events. “In a DRAM are bit cells that store a 1 or a 0, consisting of a capacitor and a transistor,” noted Christopher Egan, principal product development engineer at Nordson Test and Inspection. “The X-rays that I’m interested in affect the transistors, and they cause leakage across them. The transistor is like a tap. You switch it on and off, and if the tap is leaking, it isn’t going to work properly, but they can tolerate a certain amount of leakage over time. The actual bit of information, whether it’s 1 or 0, is stored as charge on this capacitor, whether it’s full or empty. The capacitor will lose charge over time, but it also loses more charge over time if the tap is leaking. In the simplest terms, X-rays cause the tap to leak more. Eventually, if there’s so much radiation damage from X-rays that this tap is just free flowing, it doesn’t work anymore, it doesn’t store charge properly anymore. In an aviation scenario, they’re more susceptible to these individual events — protons, neutrons, and cosmic rays. They smash in and deposit a whole load of charge. If this capacitor was empty to start with, which meant it was a 0, then suddenly the load of charge appears, and it becomes a 1.”
That is effectively a bit flip. “It can happen in other places, as well, where there is a huge deposit of charge at the wrong time,” said Egan. “It’s random. If you’ve got more aviation applications coming in, these things are happening a lot, and it does change the memory read of that cell, and then it happens to all the cells, depending on how much radiation is going in. The ultimate thing is that it doesn’t work properly. Airbus will no doubt have X-ray equipment and will want to inspect its electronics. They’re interested in not giving their electronics too much X-ray radiation, so that this tap is working. When the plane is flying, they’re concerned about these cosmic ray radiations, neutrons, protons, etc., that are causing these individual events, and which caused the bit flips in the Airbus A320.”
Security risks caused by bit flipping
No matter what specific memory they are working with, electrical engineers and SoC designers need to know how to mitigate bit flips and protect the underlying data and memory systems. “Bit flipping can occur elsewhere,” said Synopsys’ Neustadter. “It can be specifically related to networks. It can be specifically related to the CPU or the firmware. For example, we’re talking nowadays about AI/machine learning, and flipping one bit in a neural network-type weight can help insert a backdoor. Then, you can affect models themselves and the integrity of those models. That means the types of attacks and the bit flipping have implications that can affect the security of the system. It can bypass cryptographic-type operations. It can also influence and be relevant for recovering key secret material, open back doors, or even bypass secure boots. As such, it is something very important that designers need to take care of, and there are a lot of mitigations to be considered.”
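To give a sense of why one bit in a neural network weight matters, the sketch below flips individual bits in a 32-bit floating-point value. It assumes the weight is stored as an IEEE 754 single-precision float, which the article does not specify. The helper name `flip_float32_bit` is made up for this example. It shows only how far a value can move, not how a backdoor would be built.

```python
import struct

def flip_float32_bit(x: float, bit: int) -> float:
    """Flip one bit (0 = least significant) in the float32 encoding of x."""
    (raw,) = struct.unpack("<I", struct.pack("<f", x))  # float32 -> 32-bit int
    raw ^= 1 << bit                                     # invert the chosen bit
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw))
    return flipped

weight = 0.5

# Lowest mantissa bit: the weight moves by about 6e-8.
low = flip_float32_bit(weight, 0)

# Highest exponent bit: the weight jumps from 0.5 to 2**127.
high = flip_float32_bit(weight, 30)

print(low - weight)
print(high == 2.0 ** 127)  # True
```

Under this assumption, the same physical event is either invisible or catastrophic for the model, depending on which bit of the weight it hits.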
In fact, there are many aspects of security and cryptography where a flipped bit can lead to either a benign hiccup or a disastrous result, depending on exactly where in the digital circuit it occurred. The flip may be accidental, caused by random power supply noise or cosmic ray exposure, for instance. It may also be malicious, caused by an adversary’s ‘glitch’ attack on the system’s power supply or by a precision laser-based fault-injection attack.
“Though it is exceedingly difficult for an adversary to cause a specific bit flip at exactly the right circuit at exactly the right time, a determined and well-funded adversary has both the time and expertise to research this attack. It might take weeks or even months to analyze a system sufficiently, all for an attack that takes microseconds to take effect,” Rambus’ Best said. “In one possible, perhaps worst-case example, an adversary could cause a bit flip in the ‘Authentic or not?’ check that was about to prevent a piece of malware from executing. In this imaginary attack example, the bit flip would cause the check to pass instead of fail, allowing the malware to execute in the system, potentially permanently modifying its security strength in subtle but critical ways going forward.”
In the automotive sector, there is the infamous case in which researchers Charlie Miller and Chris Valasek, while hacking a Jeep Cherokee, used glitching in one of their attacks.
The big issue with glitches is that they can be induced without direct physical contact, and while it is common to assume that glitching requires the use of probes such as an oscilloscope to interface with a system, it is possible to trigger a fault remotely. “One common method is through a rowhammer attack, which involves repeatedly accessing certain memory rows,” said Secure-IC’s Guilley. “This process exploits inherent memory vulnerabilities. Despite high reliability standards (e.g., 99.999%), persistent hammering may eventually result in a bit flip.”

Fig. 1: Examples of types of remote execution of bit flipping. Source: Secure-IC, a Cadence company
These vulnerabilities highlight the evolving challenges that engineers and designers face as technology advances. As systems become increasingly complex and exposed to both accidental and intentional threats, it becomes crucial to distinguish between random errors and deliberate attacks, ensuring robust safeguards for both safety and security.
Marc Witteman, senior director of Device Security Testing at Keysight EDA, explained that semiconductor chips work much like humans, thriving within specific environmental boundaries. “For people, comfortable temperature and sufficient oxygen are crucial. We perform best when these parameters are met, and our effectiveness drops outside them. Devices like chips also have key environmental limits. Staying within them ensures proper function, while going beyond can cause errors that start as safety problems and may evolve into security threats. Safety deals with accidental issues, while security involves intentional attacks. We mainly test for security issues — intentional faults — whereas industries like aerospace or automotive focus on safety to guarantee their products survive harsh conditions. This is critical in aerospace due to mission importance and exposure to extreme factors, such as low oxygen and high radiation, which affect both humans and electronics.”
Unintentional chip errors occur randomly and less frequently, while intentional errors happen at precise moments, often exploited by attackers. “For example, a payment chip verifies your PIN, which is a single atomic event that, if manipulated, could allow theft,” Witteman said. “To counter unintentional errors, triple redundancy is used: every process step is performed three times and then voted on, tolerating one mistake. This redundancy can be applied to processes of varying lengths, but shorter steps require more frequent voting, increasing system load. The longer you wait before voting, the lower the impact on performance.”
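The voting scheme Witteman describes can be sketched as follows. This is a minimal software model, assuming each process step is a pure function that can simply be run three times. The names `vote` and `run_tmr` and the injected fault are illustrative, not taken from any product.

```python
from collections import Counter

def vote(results):
    """Majority vote over three results; raise if no two of them agree."""
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: more than one replica is faulty")
    return value

def run_tmr(step, x, faulty_replica=None):
    """Run one step three times and vote on the outcome.

    faulty_replica optionally names one replica (0, 1, or 2) whose result
    gets a simulated single bit flip before the vote.
    """
    results = []
    for replica in range(3):
        result = step(x)
        if replica == faulty_replica:
            result ^= 1 << 3  # simulated upset in bit 3 of this result
        results.append(result)
    return vote(results)

def square_plus_one(x):
    return x * x + 1

print(run_tmr(square_plus_one, 5))                    # 26
print(run_tmr(square_plus_one, 5, faulty_replica=1))  # 26, the flip is outvoted
```

Calling `run_tmr` once per short step means more votes and more overhead. Wrapping a longer computation in a single call means fewer votes, which is the performance trade-off described in the quote.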
As error rates escalate and both accidental and deliberate threats become more prevalent, traditional approaches like triple redundancy may no longer suffice. This growing complexity demands a careful reevaluation of mitigation strategies, prompting industry experts to explore advanced solutions that address the limitations of current methods while balancing safety and security requirements.
“I don’t think triple redundancy is false, as it’s a solution that can be adapted to the error rate,” Witteman noted. “If you have a very low error frequency, then you only need to check at regular flow intervals. But if you have a high rate, you need to check more often. That could be the solution, checking this more often. If your error rate gets very high due to high radiation intensity, then you should take more of a safety/security approach rather than just a safety approach.”
There is, however, a difference between safety and security. “The difference is that with safety, you want to defend against incidental, accidental problems,” he said. “You take your solution and bring it into an environment where there are random problems over time. Then you test it for a certain amount of time. If you see that all the problems that occurred over that period of time are nicely mitigated, then you’re going to be happy. But if it’s a statistical process, you wait for some time trying to see if any issues that randomly happen are all mitigated, and then you’re okay. That doesn’t prove that you would mitigate all issues. It proves that you mitigate a certain number of issues over time. When it comes to security, we know that the problems are generated intentionally, so we can’t satisfy ourselves by saying, ‘Okay, let’s just do some random testing and hopefully cause everything.’ An intelligent attacker would target the exact moments where I’m most vulnerable, which means I should know exactly when those moments are and test what happens if a problem occurs at that time. You could also brute force the problem and test everything. If I test everything, then I’ve also tested my most vulnerable moments, so that would be the security approach.”
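The brute-force idea at the end of that quote can be illustrated with a toy fault-injection campaign. The sketch assumes a very simple fault model in which a glitch skips exactly one step. The PIN check is written with an unsafe default on purpose, so there is something to find. All names are hypothetical.

```python
STEP_COUNT = 3  # number of steps in the toy PIN check below

def check_pin(entered, stored, skip=None):
    """Toy PIN check made of three steps; skip is the index of one step
    that a simulated glitch suppresses (None means no fault)."""
    state = {"granted": True}  # deliberately unsafe default value

    def compare():
        state["match"] = (entered == stored)

    def decide():
        state["granted"] = state.get("match", False)

    def log():
        state["logged"] = True

    for index, step in enumerate((compare, decide, log)):
        if index != skip:
            step()
    return state["granted"]

def exhaustive_campaign():
    """Inject the fault at every step and list the steps where a wrong
    PIN is accepted."""
    return [i for i in range(STEP_COUNT) if check_pin("0000", "1234", skip=i)]

print(check_pin("0000", "1234"))  # False, the wrong PIN is rejected without faults
print(exhaustive_campaign())      # [1], skipping the decide step grants access
```

Only one of the three fault positions breaks the check here. A campaign that samples positions at random can miss it, while the exhaustive sweep cannot. Real devices have far more fault positions and fault types than this model.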
For chip architects, these challenges demand a disciplined approach to designing resilient systems. They must not only anticipate accidental faults but also proactively address sophisticated attack vectors, ensuring that both safety and security considerations are integrated from the earliest design stages. This perspective is crucial as the industry moves toward implementing layered defenses and robust error mitigation strategies.
Preventing bit flipping
Approaches to bit flips have evolved over the decades. “Some years ago, I was developing turbo codes, which are a form of forward error correction capable of correcting a huge number of errors, and during the tape out of a chip, its SRAM was experiencing bit flips, and our turbo code kept fixing these errors so efficiently that we couldn’t initially identify the source of the flips,” recalled David Garrett, vice president of technology and innovation at Synaptics. “Our robust error correction was masking underlying yield problems. Today, for bit flips, we use ECCs (error correcting codes) for resilient computing. In some caches, we implement SEC-DED (single error correction and double error detection) to safeguard critical areas against such bit flips.”
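As a rough illustration of what SEC-DED means, the sketch below uses an extended Hamming (8,4) code, a standard textbook construction. It is not the code used in any cache mentioned above. Production memories protect much wider words, and the function names here are invented for the example.

```python
def encode(nibble: int) -> list:
    """Encode 4 data bits into an 8-bit extended Hamming code word.
    Index 0 holds overall parity; indexes 1..7 are Hamming positions."""
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2            # overall parity over the other 7 bits
    return code

def decode(word: list):
    """Return (nibble, status). Corrects one flipped bit, detects two."""
    code = list(word)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position           # XOR of the positions of all set bits
    parity_error = sum(code) % 2 == 1      # an odd number of bits has flipped
    if parity_error:
        code[syndrome] ^= 1                # syndrome 0 points at the parity bit
        status = "corrected"
    elif syndrome != 0:
        return None, "double error detected"
    else:
        status = "ok"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status

word = encode(0b1011)
print(decode(word))                        # (11, 'ok')

single = list(word)
single[5] ^= 1                             # one bit flip
print(decode(single))                      # (11, 'corrected')

double = list(word)
double[5] ^= 1
double[6] ^= 1                             # two bit flips
print(decode(double))                      # (None, 'double error detected')
```

Garrett's anecdote has a counterpart in this sketch. A caller that ignores the `corrected` status would never learn that the underlying memory is misbehaving.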
Strategies to prevent bit flipping vary by application, but more is typically better. Secure-IC’s Guilley believes defense in depth provides the strongest protection. Any single technology has a limitation. Even error correcting codes cannot detect an error that turns one valid code word into another valid code word.
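The code-word limitation is easiest to see with the weakest possible code. The sketch below uses a single even-parity bit, which only detects errors and is chosen purely for illustration. Stronger codes tolerate more flips, but each has a similar threshold beyond which a corrupted word looks valid again.

```python
def add_parity(bits: list) -> list:
    """Append one even-parity bit so the word has an even number of 1s."""
    return bits + [sum(bits) % 2]

def parity_ok(word: list) -> bool:
    """A word is accepted as valid when its number of 1s is even."""
    return sum(word) % 2 == 0

word = add_parity([1, 0, 1, 1])  # [1, 0, 1, 1, 1]

one_flip = list(word)
one_flip[0] ^= 1                 # a single upset

two_flips = list(word)
two_flips[0] ^= 1
two_flips[1] ^= 1                # two upsets in the same word

print(parity_ok(word))           # True
print(parity_ok(one_flip))       # False, the error is detected
print(parity_ok(two_flips))      # True, the corrupted word is another valid code word
print(two_flips[:4])             # [0, 1, 1, 1], wrong data accepted as correct
```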
“Everything has a limitation. Even lockstep has a limitation. So the best way to drastically reduce bit flips is to combine multiple detection mechanisms, like physical detection on the memory,” Guilley said. “We know error correcting codes. That’s pretty simple. On top of that, you get, for instance, some integrity of the control flow. Usually, we call that CFI (control flow integrity). The more protections you stack, the more you are sure that the residual faults will vanish, which is what we call defense in depth.”
Preventing bit flips starts in the circuitry where the most mission-critical algorithms are executing. “For any type of security or cryptographic calculation, it simply requires a fraction of the instructions to perform a calculation by utilizing hardware-based algorithm accelerators, as opposed to performing the algorithm in software via a general-purpose processor circuit,” Rambus’ Best said. “For this reason, hardware-based security is considered to have a substantially smaller attack surface than software-based approaches. That being said, there is no one silver bullet. Security engineers are combining hardware-based fault-detection, redundant calculations, and multi-bit secure comparisons to reduce the efficacy of single-bit disturbances within their circuits. Additionally, it is no less important to design the general-purpose processing circuit and memory subsystems with both performance and resiliency in mind. That is, these circuits can be built to expect they’ll be executing software in an imperfect environment with occasional bit flips, both of the accidental and adversarial variety.”
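One way to read "multi-bit secure comparisons" is that a security decision should never hinge on a single bit. The sketch below is a software analogy under that reading. The two 32-bit constants and the `check` function are hypothetical, and the redundant calculations and hardware fault detection that Best also mentions are not modeled.

```python
# Hypothetical verdict encodings: bitwise complements, so all 32 bits differ.
AUTH_OK = 0x5AA5C33C
AUTH_FAIL = 0xA55A3CC3

class FaultDetected(Exception):
    """Raised when a verdict is neither of the two legal encodings."""

def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def check(verdict: int) -> bool:
    """Accept only exact encodings; anything else is treated as a fault."""
    if verdict == AUTH_OK:
        return True
    if verdict == AUTH_FAIL:
        return False
    raise FaultDetected(hex(verdict))

# With a one-bit flag, a single flip turns "fail" (0) into "pass" (1).
print(bool(0 ^ 1))                           # True

# With the multi-bit encoding, 32 flips are needed to do the same.
print(hamming_distance(AUTH_OK, AUTH_FAIL))  # 32

# Any single flip of a "fail" verdict is caught instead of accepted.
caught = 0
for bit in range(32):
    try:
        check(AUTH_FAIL ^ (1 << bit))
    except FaultDetected:
        caught += 1
print(caught)                                # 32
```

A fault that skips the comparison itself would still defeat this sketch, which is one reason the quote lists several techniques to be combined.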
Error correction code (ECC, mentioned above) is another approach that, while common, doesn’t solve all the problems of bit flipping. “It can help with the integrity of the data to some extent, but it doesn’t solve the problem,” Synopsys’ Neustadter noted. “Common techniques, such as targeted row refresh, can reduce bit flips and help maintain integrity, though they do not fully address security issues or perform memory scrubbing. While memory scrubbing is used, hardware security also requires focusing on confidentiality and stronger integrity protection, where encryption and cryptography-based solutions are essential. Encrypting data ensures that if someone gains access to it, they cannot understand it, making data confidentiality a crucial concern addressed through methods like AES-XTS-based cryptography.”
Additionally, various algorithms, such as AES-GCM, go beyond just securing data to ensure integrity. “Secure enclaves are used for access control, key generation, and management, along with fault detection mechanisms and zero polarization,” she said. “These are established techniques, but their effectiveness depends on consistent and organized implementation, often described as security by design. These considerations are especially important for electrical engineers and SoC designers who work with systems vulnerable to bit flipping. Addressing these threats requires hardware-level security measures to protect both confidentiality and key integrity.”
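The integrity side of that argument can be sketched with an authentication tag. Python's standard library does not include AES-GCM, so this example swaps in HMAC-SHA256 as a stand-in for the tag. It does not encrypt anything, and the key and function names are placeholders invented for the illustration.

```python
import hashlib
import hmac

KEY = b"demo-key-not-for-production"  # placeholder key for the sketch

def seal(data: bytes) -> bytes:
    """Compute an integrity tag over the data (HMAC-SHA256 stand-in)."""
    return hmac.new(KEY, data, hashlib.sha256).digest()

def verify(data: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(seal(data), tag)

stored = bytearray(b"page table entry")
tag = seal(bytes(stored))
print(verify(bytes(stored), tag))     # True

stored[3] ^= 0x10                     # a single bit flips in memory
print(verify(bytes(stored), tag))     # False, the modification is detected
```

Unlike an error correcting code, a keyed tag is designed so that an attacker who lacks the key cannot feasibly forge a matching tag for modified data. It detects tampering but cannot repair it.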
Big picture, if systems are designed without considering the environment or trust assumptions, increased complexity makes them more vulnerable. “With smaller voltage supplies, even minor disturbances can cause errors, so decoupling capacitors are needed off-chip to prevent IR drops and other issues, even at the clock pads,” Guilley said. “These help sharpen the signal edges and so on. However, within the chip itself, the signal is quite weak due to the small voltages and tiny transistors involved. Everything inside is susceptible to malfunction unless the environment is ideal, so it is only reliable under favorable conditions.”
To further improve safety protection, it is essential to avoid dependent failures. Even systems employing lockstep — which duplicates or triplicates subsystems — can be vulnerable if all copies run synchronously, as simultaneous failure remains possible. “Consequently, safety mechanisms are typically designed with slight offsets to reduce the risk of correlated failures,” Guilley said. “That ensures that if one is getting faulty, then the other one will be faulty, but in a different way. And there will be a mismatch that can be observed. For security, we need to go much further beyond that because it cannot be just that we could use the probability of an attack. Nobody will trust your system to be secure if you just say, ‘My probability of failure is half.’ That’s nothing for an attacker. We are using many technologies, such as dedicated sensors.”

Fig. 2: Illustration of a sensor detecting temperature, voltage, and other modalities of a failure. Source: Secure-IC, a Cadence company.
These layered safeguards highlight the ongoing challenge of balancing hardware and software strategies to mitigate bit flips and security vulnerabilities. As engineers implement increasingly sophisticated error detection and response systems, understanding their limitations and unintended consequences becomes critical.
Conclusion
For chip architects and designers, bit flipping should be treated as a foundational design assumption rather than an exceptional failure mode, influencing decisions from power margins and memory architecture to control logic and security boundaries. As faults increasingly blur the line between reliability events and exploit vectors, resilience must be engineered into the architecture itself through intentional redundancy, observability, and fault‑aware execution paths.
The call to action is clear: design silicon that expects faults, exposes them early, and responds deterministically. In advanced nodes, unmodeled bit flips are no longer just errors. They are gaps in the design.
Very nice article, thank you. I would like to point out that bit flips also occur due to Single Event Upsets (SEU) caused by charges resulting from radiation (packaging material, space, …). IROC Technologies has EDA tools to analyze these under different conditions, to help the designer mitigate the effect and improve the FIT rate.
I hope that at some point, side-band ECC becomes mandatory for consumer memory (a new standard like DDR6). It could raise all prices by 12.5%, but what does that matter after people lived through 5-10x memory price increases?