DRAM Chips That Employ On-Die Error Correction & Related Reliability Techniques


This new PhD thesis, titled "Enabling Effective Error Mitigation in Memory Chips That Use On-Die Error-Correcting Codes," by ETH Zurich researcher Minesh Patel, won the IEEE William C. Carter Award in June 2022. Abstract: "Improvements in main memory storage density are primarily driven by process technology scaling, which negatively impacts reliability by exacerbating various circu...

HARP: Practically and Effectively Identifying Uncorrectable Errors in Memory Chips That Use On-Die Error-Correcting Codes


Abstract: "State-of-the-art techniques for addressing scaling-related main memory errors identify and repair bits that are at risk of error from within the memory controller. Unfortunately, modern main memory chips internally use on-die error correcting codes (on-die ECC) that obfuscate the memory controller's view of errors, complicating the process of identifying at-risk bits (i.e., error pr...
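To make the "obfuscation" concrete, here is a minimal sketch that assumes a textbook Hamming(7,4) single-error-correcting (SEC) code as a stand-in for a chip's real on-die ECC. The function names (`encode`, `syndrome`, `decode`, `flip`) and the bit layout are hypothetical illustrations, not HARP's method or any vendor's design. A single raw bit error is silently corrected, so the memory controller never sees it. Two raw errors, however, produce a nonzero syndrome that points at a third, healthy bit, which the decoder then "corrects" into an error.

```python
from typing import List

# Hypothetical layout: textbook Hamming(7,4) SEC code standing in for on-die ECC.
DATA_POSITIONS = [3, 5, 6, 7]  # 1-indexed codeword positions holding data bits
PARITY_POSITIONS = [1, 2, 4]   # parity bits sit at the power-of-two positions


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into a 7-bit codeword (list index 0 is position 1)."""
    code = [0] * 7
    for pos, bit in zip(DATA_POSITIONS, data):
        code[pos - 1] = bit
    for p in PARITY_POSITIONS:
        # Even parity over every other position whose index has bit p set.
        code[p - 1] = sum(code[i - 1] for i in range(1, 8) if i & p and i != p) % 2
    return code


def syndrome(code: List[int]) -> int:
    """XOR of the positions of all 1-bits; 0 for a valid codeword."""
    s = 0
    for pos, bit in enumerate(code, start=1):
        if bit:
            s ^= pos
    return s


def decode(code: List[int]) -> List[int]:
    """SEC decode: flip the bit the syndrome points at, then return the data bits."""
    fixed = list(code)
    s = syndrome(fixed)
    if s:
        # Assumes at most one error; with two errors this miscorrects a third bit.
        fixed[s - 1] ^= 1
    return [fixed[pos - 1] for pos in DATA_POSITIONS]


def flip(code: List[int], positions: List[int]) -> List[int]:
    """Inject raw bit errors at the given 1-indexed positions."""
    out = list(code)
    for pos in positions:
        out[pos - 1] ^= 1
    return out


data = [1, 0, 1, 1]
stored = encode(data)

# One raw error at position 5: corrected on-die, invisible to the controller.
single_ok = decode(flip(stored, [5])) == data

# Two raw errors at positions 3 and 5: syndrome = 3 ^ 5 = 6, so the decoder
# also flips healthy position 6 before data leaves the chip.
read = decode(flip(stored, [3, 5]))
wrong_positions = [pos for pos, a, b in zip(DATA_POSITIONS, data, read) if a != b]
print(single_ok, wrong_positions)
```

The controller observes errors at positions 3, 5, and 6, even though only 3 and 5 were physically at risk. Which bit gets miscorrected depends on the ECC's parity-check layout, and that dependence is what makes identifying at-risk bits from outside the chip difficult.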

What Designers Need to Know About Error Correction Code (ECC) In DDR Memories


As with any electronic system, errors in the memory subsystem are possible due to design failures/defects or electrical noise in any one of the components. These errors are classified as either hard errors (caused by design failures or defects) or soft errors (caused by system noise or memory array bit flips due to alpha particles, etc.). To handle these memory errors during runtime, the memory subsyst...