Securing DRAM Against Evolving Rowhammer Threats

A multi-layered, system-level approach is crucial to DRAM protection.

Advanced process nodes and higher silicon densities are heightening DRAM’s susceptibility to Rowhammer attacks, as reduced cell spacing significantly decreases the hammer count needed for bit flips.

Rowhammer exploits DRAM’s single-capacitor-per-bit design to trigger bit flips in adjacent cells through repeated memory row accesses. This vulnerability allows attackers to manipulate data, recover sensitive information, and crash processes or systems. Rowhammer was first identified in 2014, and evolving variants continue to target DRAM, successfully bypassing mitigations such as error correction code (ECC) and target row refresh (TRR).

Fig. 1: DRAMs on a DIMM, with corresponding mapping of row addresses and DRAM banks. A Rowhammer attack can flip bits in the same victim row in multiple DRAMs, overwhelming ECC protection. Source: Rambus

Effectively protecting DRAM against Rowhammer requires a multi-layer, system-level implementation of robust security techniques, from encryption and obfuscation to enforced data isolation and advanced error correction schemes. This is easier said than done, however, as countermeasures can potentially impact power, performance, and area (PPA). Engineers should therefore evaluate PPA-security tradeoffs alongside key features and components at the start of the design process.

A top-down, system-level approach to securing DRAM
“Security is always a cat-and-mouse game, and the evolution of Rowhammer attacks and defenses is no different,” said Nicole Fern, senior security analyst at Riscure. “Researchers have demonstrated successful Rowhammer attacks on commercial DRAM modules employing both TRR and ECC, recovering TLS signing keys in several cryptographic libraries: Amazon-s2n (CVE-2022-42962), WolfSSL (CVE-2022-42961), and LibreSSL (CVE-2022-42963). Many speculate that real-world attacks are imminent. For countermeasures, the question should not be, ‘Will they ultimately be able to counter Rowhammer attacks in general?’ Rather, the question should be: ‘For a specific system and threat model, is the attack effort greater than the value of the assets being targeted and costs of a successful attack?’”

Traditionally, only PPA tradeoffs are considered during the silicon design process. However, recent hardware-based attacks, including Rowhammer, Meltdown, and Spectre, as well as those that exploit DVFS features to inject faults from software (such as CLKSCREW and Plundervolt), highlight the importance of prioritizing security during the design process. “Often, it is new features added for performance that create a foothold for attacks,” explained Fern. “As DRAM technology [nodes] shrink over time, with density and performance improving, susceptibility to Rowhammer increases. [Engineers] need to be aware of this effect and proactively design in appropriate countermeasures, with thorough testing ensuring these perform as expected as DRAM technology evolves.”

Jason Oberg, co-founder and CTO at Cycuity, agrees. “Hardware susceptibility is a key component of a larger chain of weaknesses used to exploit vulnerabilities. Rowhammer, a physical attack that’s done remotely, is one of those easy-to-exploit vectors, because if you can flip or modify a bit, you can chain that together with other software-based exploits. In isolation, it may be less of an issue, but in the context of a bigger strain of weaknesses that someone is exploiting, it’s problematic. Many systems vulnerable to Meltdown and Spectre, for example, are also points of concern for exploits like Rowhammer. You wouldn’t worry about these attacks on your smart light bulb or robot vacuum, but I would be concerned about my phone or laptop.”

To address these concerns, various encryption and obfuscation techniques have been proposed to protect DRAM from Rowhammer attacks. “If you encrypt or obfuscate your data, and then someone hammers a row and causes bits to flip, they won’t be able to target a specific bit,” Oberg explained. “They won’t know what the specific bit is. Whereas if it’s just plain text and it’s like a supervisor bit and they know where that supervisor bit is, then they can be very direct with what they’re doing.”

Although these techniques are crucial, Oberg emphasized that security considerations must be part of the design process, starting at the architectural level. “If I’m building a chip using licensed IP, I need to take a step back, analyze its function, and determine the assets that need to be protected,” Oberg noted. “From there, you can license a hardware-based root of trust. Maybe you trust one and not the other, even though it’s cheaper. These are the kind of decisions you should drive at the top level, and then try to manage as best you can without having full control of everything in your supply chain.”

Analyzing a system holistically also allows the design team to reduce the impact of security mitigation on PPA. “If you jump straight into saying, ‘I am concerned about memory,’ then you’re already very isolated,” he said. “If you start picking at each of the weaknesses independently, then the overhead goes up a lot higher because there may be an overlap between [mitigation techniques]. So you should take a higher-level view. It’s important to look at that top level and then drive your security program from that level. If you drive it from the bottom up, you’re going to have huge overheads, a lot of complexity, and you’re going to have problems.”

Ultimately, Oberg sees a combination of system-wide hardware and software solutions, paired with strict access controls and enforced data isolation, as a more effective method of countering exploits like Rowhammer. “In any multi-tenant or shared environment, containers are needed to isolate data. Data should also be assigned, for example, to processor thread A where it can’t be read by another thread. Of course, it can’t just be software. Foundation-level hardware protections are required. Otherwise, software protection will be subverted.”

Siloing processes and tagging memory
Kos Gitchev, senior technical market manager at Cadence, pointed to Arm’s confidential compute architecture (CCA) and memory tagging extension (MTE) as examples of a multi-layered, system-centric defense strategy against various attacks and exploits, including Rowhammer and RAMBleed. CCA ensures data protection during processing by isolating or siloing computation in a secure, hardware-backed environment, while MTE tags memory allocations with metadata that is verified during runtime operations. Although not specifically designed to counter Rowhammer or RAMBleed, both mechanisms help protect against such exploits.

“A Rowhammer attacker can’t say: ‘Well, I’ve taken over the machine and I want to go read this memory,’” Gitchev explained. “If you don’t have the appropriate MTE tags for your process, then you won’t be able to read it. The system will basically block it.”
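The tag check Gitchev describes can be pictured with a toy model. The sketch below assumes MTE's 4-bit tags applied to 16-byte granules; the `TaggedMemory` class, its method names, and the tuple-based pointer are hypothetical simplifications for illustration (real hardware carries the tag in the upper bits of the pointer and performs the comparison in the memory system, not in software).

```python
GRANULE = 16   # MTE tags memory in 16-byte granules
TAG_MASK = 0xF  # each tag is 4 bits wide


class TaggedMemory:
    """Toy model: every granule stores a tag, every pointer carries one."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.tags = [0] * (size // GRANULE)

    def allocate(self, start, length, tag):
        """Tag the granules covering [start, start + length) and return a
        (tag, address) pair standing in for a tagged pointer."""
        first = start // GRANULE
        last = (start + length + GRANULE - 1) // GRANULE
        for granule in range(first, last):
            self.tags[granule] = tag & TAG_MASK
        return (tag & TAG_MASK, start)

    def load(self, pointer, offset=0):
        """Return one byte, but only if the pointer tag matches the memory tag."""
        tag, base = pointer
        address = base + offset
        if self.tags[address // GRANULE] != tag:
            raise PermissionError("tag check fault")
        return self.data[address]


mem = TaggedMemory(64)
owner = mem.allocate(0, 16, tag=3)    # process A's buffer
other = mem.allocate(16, 16, tag=9)   # process B's buffer

mem.load(owner)                       # allowed: tags match
try:
    mem.load((3, 16))                 # A's tag, B's address: mismatch
except PermissionError as fault:
    print("blocked:", fault)
```

In this model, a pointer that reaches another allocation without the matching tag is rejected at access time, which is the behavior the quote refers to.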

To protect data held in DRAM, 128-bit or 256-bit AES encryption is also essential. “This is generally done by the memory subsystem, not the DRAM itself,” Gitchev noted. “Blocks of data will come in, they’ll get encrypted, and then pass to the memory. If anything happens to the encrypted data, it won’t properly decrypt. Encryption is almost always done in conjunction with ECC, so there are almost two layers of protection when you implement this scheme.”

Gitchev emphasized that encryption is only effective if keys are properly managed and secured. “A memory subsystem does the encryption. It has the algorithm and adds the XTS extension. Even when you write two blocks of the same data, they’ll look different on the bus to the memory. Of course, all of this can be overcome if someone compromises the encryption key.”

AES encryption can be added without major PPA penalties, making it an optimal choice for memory subsystems. “There are many different encryption schemes out there, but AES is easiest to implement,” said Gitchev. “Adding encryption, however, does increase the number of gates and power. To be fair, most of the memory subsystem power goes into driving the interface [for transferring data off-chip to the memory and back]. There is also a little bit of performance and area cost. The memory subsystem is now bigger because it needs to execute complex mathematical calculations for encryption and decryption in real time without significant latency.”

Tightly coupling encryption and decryption ciphering functions inside the DDR or LPDDR controllers facilitates maximum memory efficiency and lowest overall latency. “When doing both functions separately, certain functionality may have to be repeated, such as bus interface logic or support for read-modify-write operations,” said Ruud Derwig, system architect, solutions group at Synopsys. “When tightly integrated, the scheduler inside the controller can request encryption and decryption at the most optimal times, for example, when overlapping other controller operations or while waiting for data.”

Rowhammer and its variants aren’t necessarily the primary drivers for memory encryption solutions that require secure key management. “Inline memory encryption (IME) is mainly intended to defend against cold-boot attacks and provide confidential compute features,” Derwig said. “For example, a newly created virtual machine (VM) or process may get access to physical memory pages used previously by another VM or process when memory is not erased first, compromising the confidentiality of that previous computing context. With proper key management, IME mitigates these compromises. Or, when the hypervisor itself cannot be trusted, confidentiality of user data is still guaranteed by using different IME keys for different privilege levels and VMs.”

Nevertheless, IME contributes to Rowhammer attack countermeasures, as post-encryption data in the memory appears random to attackers. “Certain data patterns — rowstriped or checker patterns, for example — give the highest success rate for row hammering,” Derwig elaborated. “Moreover, when a single or a few bits are flipped, this is amplified to a full 128-bit decrypted block getting random data, so exploiting bit flips becomes much harder. When there is no attacker control over the changes, it is more likely to get detected by causing malfunctioning. IME [also offers] cryptographically strong integrity protection that mitigates bypassing less strong ECC protection.”
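Both properties mentioned here (identical data looking different at different addresses, and a single flipped bit garbling a whole 128-bit block on decryption) can be illustrated with a toy address-tweaked block cipher. The sketch below is an assumption-laden stand-in: it uses a four-round Feistel network built on HMAC-SHA256 from the Python standard library, not AES-XTS, and it is neither secure nor representative of any vendor's IME. All function names are hypothetical.

```python
import hashlib
import hmac

BLOCK_BYTES = 16          # 128-bit block, the same size as an AES block
HALF = BLOCK_BYTES // 2
ROUNDS = 4


def _round(key, address, round_no, half):
    # Round function keyed by the secret key and "tweaked" by the address.
    msg = address.to_bytes(8, "big") + bytes([round_no]) + half
    return hmac.new(key, msg, hashlib.sha256).digest()[:HALF]


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key, address, block):
    left, right = block[:HALF], block[HALF:]
    for r in range(ROUNDS):
        left, right = right, _xor(left, _round(key, address, r, right))
    return left + right


def decrypt_block(key, address, block):
    left, right = block[:HALF], block[HALF:]
    for r in reversed(range(ROUNDS)):
        left, right = _xor(right, _round(key, address, r, left)), left
    return left + right


def bit_diff(a, b):
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


key = bytes(range(32))                 # placeholder key for the demo
plaintext = b"supervisor_bit=0"        # exactly 16 bytes

c0 = encrypt_block(key, 0x1000, plaintext)
c1 = encrypt_block(key, 0x1010, plaintext)
print("same data, different addresses differ:", c0 != c1)

# Simulate a Rowhammer flip of one stored ciphertext bit.
flipped = bytes([c0[0] ^ 0x01]) + c0[1:]
garbled = decrypt_block(key, 0x1000, flipped)
print("plaintext bits changed by one flip:", bit_diff(plaintext, garbled))
```

Because the attacker cannot steer which plaintext bits change, a targeted flip of a known field such as a privilege bit becomes a random corruption of the whole block, which is far more likely to be detected.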

The cycle of Rowhammer attacks and countermeasures will continue as new vulnerabilities are identified and addressed. “Multi-level defenses and mitigations, such as hardware design of memory chips and memory controllers, as well as system software mitigations in hypervisors and operating systems, are needed to [counter] evolving threats,” Derwig added.

Bolstering DRAM reliability in data centers
Although Rowhammer can target any device equipped with DRAM, protecting the data center remains a priority for the semiconductor industry and many security researchers. “New memory used to debut in high-performance PCs and then move into servers,” said Steven Woo, fellow and distinguished inventor at Rambus Labs. “These days, new memory technologies debut for AI [applications] in data centers. The concern is, ‘What if somebody gains access to many servers in the data center and launches programs that intentionally try to repeatedly activate addresses?’ If enough bits flip and can’t be corrected, it could cause what looks like a large hardware fault. You might have to take down memory channels or a machine.”

While the risks of Rowhammer and other exploits in the data center are well known, the semiconductor industry may need more time to comprehensively bolster DRAM security and reliability at the design and system levels. “If you go back 25 or 30 years, nobody was really that concerned about power,” Woo stated. “You can dissipate the heat. You just burn a little more power to get more performance. But today, power is a first-class design parameter that everybody thinks about. Reliability is in that same place that power was in the 2000 to 2005 timeframe, where people are starting to realize, ‘Well, wait a minute, things aren’t infinitely reliable. We’re now going to have to consider DRAM reliability as a first-class design parameter.'”

As DRAM process geometries continue to shrink, electronic engineers will need to develop new or improved architectures and techniques that resist deliberate and repeated errors caused by attackers. “And the tradeoff is, ‘What are you willing to pay to do that? Is there a performance hit? Is there an area hit? Are we storing lots of extra bits?’ In 10 years, we’ll look back and we’ll be talking about reliability in the same way that we talk about power today,” he said.

Bolstering DRAM security and reliability without significantly impacting PPA was the primary driver behind the development of Rambus Labs’ RAMPART: Rowhammer mitigation and repair for server memory systems. Essentially, RAMPART mitigates Rowhammer attacks and improves server memory system reliability by remapping addresses in each DRAM, confining bit flips to a single device for any victim row address. When paired with existing error detection and correction methods, such as single-device data correction (SDDC) and patrol scrub, the system successfully detects and corrects bit flips. To effectively minimize mitigation overhead, RAMPART employs BRC-VL, a variation of DDR5’s bounded refresh configuration (BRC).

Fig. 2: RAMPART row address mappings produce unique neighbors, so Rowhammer attacks have different victim addresses in each DRAM. (a) Circular left shifts of controller row addresses based on unique DRAM IDs are shown. The tables at the bottom illustrate how controller row addresses map to internal bank rows in each DRAM. Row addresses 0x0000 and 0x0001 are bolded to highlight increasing separation with larger shifts. (b) Hammering controller row address 0x0001 flips bits in controller row addresses 0x0000 and 0x0002 in DRAM 0, but controller row addresses 0x8000 and 0x8001 in DRAM 1. A subsequent read to controller row address 0x0000 sees errors only from DRAM 0 that can be corrected with SDDC ECC. Source: Rambus
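The remapping in Fig. 2 can be reproduced with a few lines of arithmetic. The sketch below assumes a 16-bit row address and a circular left shift by the DRAM ID, as the caption describes; the function names are hypothetical, and the model ignores everything else in a real DRAM (banks, internal repair remapping, and hammering effects beyond immediate neighbors).

```python
ROW_BITS = 16                      # assumed width, matching 0x0000-0xFFFF in Fig. 2
MASK = (1 << ROW_BITS) - 1


def rotl(value, shift):
    """Circular left shift within ROW_BITS bits."""
    shift %= ROW_BITS
    return ((value << shift) | (value >> (ROW_BITS - shift))) & MASK


def rotr(value, shift):
    """Circular right shift, the inverse of rotl."""
    return rotl(value, ROW_BITS - (shift % ROW_BITS))


def internal_row(controller_row, dram_id):
    """Row a given DRAM actually activates for a controller row address."""
    return rotl(controller_row, dram_id)


def victims(aggressor_row, dram_id):
    """Controller row addresses physically adjacent to the aggressor
    inside one DRAM."""
    internal = internal_row(aggressor_row, dram_id)
    neighbors = [n for n in (internal - 1, internal + 1) if 0 <= n <= MASK]
    return sorted(rotr(n, dram_id) for n in neighbors)


for dram_id in range(4):
    rows = ", ".join(f"0x{row:04X}" for row in victims(0x0001, dram_id))
    print(f"DRAM {dram_id}: victims of controller row 0x0001 -> {rows}")
```

For DRAM 0 and DRAM 1 this yields the victim rows given in the caption (0x0000 and 0x0002, then 0x8000 and 0x8001). Since no victim address is shared between devices in this example, a read of any one victim row sees errors from a single DRAM, which is the case SDDC is designed to correct.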

Assuming 70% area utilization and conservative routing, RAMPART reaches a speed of 2.85GHz in an area of 3910µm², or roughly 51K NAND2 gates. For a server with 1,024 banks, the total area required is only 0.1251mm². “We did a sample implementation at TSMC’s 7nm process, showing RAMPART’s small [footprint],” Woo said. “The controller side of it that does the tracking and figures out how often to issue a mitigation operation is very small, just a few gates. It’s very reasonable to implement something like this in a memory controller, and it has no die size impact as far as we can tell. There’s no latency impact on the accesses. It’s a very simple remapping change. And the DRAM is already doing remapping, so it’s not like asking for a new function. It’s simply modifying an existing function.”

Conclusion
The continued proliferation of new and improved Rowhammer variants highlights the critical importance of implementing multi-layered, system-level countermeasures to protect DRAM, alongside other key components and features. These should encompass a wide range of security techniques, from encryption and obfuscation to advanced error correction, address remapping, and data isolation. Still, to fully optimize performance and minimize latency, PPA-security tradeoffs must be assessed from the top down at the start of the design process.



