
Error Correction Code (ECC)

Methods for detecting and correcting errors.

Description

Error correction codes, or ECC, are a way to detect and correct errors introduced by noise when data is read or transmitted.

ECC covers a wide range of mathematical techniques for dealing with errors. The most common type uses Hamming codes, which, when extended with one extra parity bit, can correct one error and detect two errors. This “single-error-correct, double-error-detect” approach is often abbreviated SECDED. The second generation of ECC can correct for the failure of a whole memory device, while the third adds ECC inside the memory chip itself.
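As a rough illustration of SECDED, the sketch below extends a Hamming(7,4) code with one overall parity bit, so 4 data bits are protected by 4 check bits. The function names secded_encode and secded_decode and the bit layout are assumptions made for this example. Memory ECC typically protects much wider words and is done in hardware, so this is only a small-scale model of the idea.

```python
def secded_encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8                             # index 0 holds the overall parity bit
    # Hamming(7,4) layout: parity bits at positions 1, 2, 4; data at 3, 5, 6, 7.
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2                # makes the parity of all 8 bits even
    return code


def secded_decode(code):
    """Return (status, nibble); status is 'ok', 'corrected', or 'double_error'."""
    code = list(code)                          # work on a copy
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                    # XOR of the positions of all set bits
    overall = sum(code) % 2                    # 0 when overall parity still holds
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips, assumed to be one: the syndrome is its position
        # (a syndrome of 0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two bits were flipped.
        return "double_error", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = secded_encode(0b1011)
word[5] ^= 1                       # one bit flipped by noise
print(secded_decode(word))         # single error: located and corrected
word[2] ^= 1                       # a second bit flipped
print(secded_decode(word))         # double error: detected but not corrected
```

Three or more flipped bits can be miscorrected or missed by a code of this strength, which is one reason stronger codes are used when higher error rates are expected.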

In memory, the principal purpose of ECC has been to correct for noise that may randomly occur while reading. The strength of the ECC block, and hence its size and cost, depends on the number of bits to be corrected and detected.
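To make the size relationship concrete for the Hamming-based SECDED case only, the following sketch counts check bits using the standard Hamming condition 2^r >= m + r + 1 for m data bits, plus one overall parity bit. The function names are hypothetical, and codes that correct more than one bit follow different formulas and need more check bits.

```python
def hamming_check_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def secded_check_bits(data_bits):
    """SECDED adds one overall parity bit on top of the Hamming check bits."""
    return hamming_check_bits(data_bits) + 1


for m in (8, 16, 32, 64, 128):
    r = secded_check_bits(m)
    print(f"{m:4d} data bits: {r} check bits, {100 * r / m:.1f}% overhead")
```

Under these assumptions a 64-bit word needs 8 check bits. The relative overhead shrinks as the protected word gets wider, although wider words also mean more bits are exposed to errors per code.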

In general, the more capable the approach, the more computationally expensive it is. If done in hardware, that means more silicon area. If done in software, that means more CPU cycles.

While noise can show up anywhere, ECC can also correct deterministic errors, such as those caused by faulty cells. This makes it possible to develop a design and test strategy that uses some of the ECC correction capacity to handle faulty cells rather than repairing them outright. Given three-bit correction, for example, one could elect to use two of those bits for covering faulty cells and one bit for run-time read noise. This creates a trade-off among the amount of sampling done for trimming, the amount of repair capacity in place, the size and strength of the ECC, and the amount of noise to be corrected.

ECC and DRAM

How ECC is applied to DRAM depends on how the memory chip and controller interact. There are four main approaches.

Fig. 3: Four types of DRAM ECC. (a) Side-band ECC, where codes are stored in a memory chip separate from the data. (b) In-line ECC, where the internal memory of each chip is divided between data and code. For both (a) and (b), ECC work is done in the controller. (c) In-chip ECC, where the data as read is checked with ECC before being sent to the controller. By itself, this doesn’t catch transmission errors. (d) Link ECC, which catches transmission errors, but by itself doesn’t detect array errors. (c) and (d) need to be combined with each other or another technique to provide end-to-end coverage. Source: Bryon Moyer/Semiconductor Engineering

The most common approach has been so-called “side-band” ECC. With this approach, each memory chip on a DIMM is fully used to store data. Extra chips are added to the DIMM for storing the error codes. This widens the bus so the data and code can be written at the same time. The controller is responsible for calculating the code when writing data and for verifying the code when reading it back.

While this works for some types of DRAM, LPDDR DRAM needs a different solution because it uses a 16-bit bus. First, adding side-band memory would make the bus much larger. Second, the codes are typically 7 or 8 bits, which makes for an inefficient use of a 16-bit memory structure. Both concerns are addressed by using the same memory chip for data and codes.

This is referred to as “inline” ECC. The controller has to do two sets of writes or reads, one for the data and one for the code, which adds latency to each access. Some controllers can pack multiple codes together for sequential data, making it possible to read or write several at once. If sequential data access is common, that reduces the latency caused by the codes.
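The effect of packing codes can be sketched with a simple count of memory accesses. This is a toy model, not a description of any particular controller: it assumes purely sequential data, that every access costs the same, and that one code access can serve a fixed number of data accesses. The name inline_ecc_accesses and the parameter codes_per_access are hypothetical.

```python
import math


def inline_ecc_accesses(data_accesses, codes_per_access=1):
    """Total accesses for inline ECC when codes are packed codes_per_access at a time."""
    code_accesses = math.ceil(data_accesses / codes_per_access)
    return data_accesses + code_accesses


unpacked = inline_ecc_accesses(8, codes_per_access=1)  # one code access per data access
packed = inline_ecc_accesses(8, codes_per_access=8)    # one code access covers all eight
print(unpacked, packed)
```

In this model, unpacked codes double the number of accesses, while packing eight codes together adds only one extra access for eight sequential data accesses. Random access patterns would see little of that benefit.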

In each of the above cases, it’s the controller that handles the ECC calculations. “On-chip” ECC (labeled in-chip ECC in the figure) is new with DDR5, and it places the ECC inside the memory chip itself. Single errors can be corrected before the data is sent to the controller. However, if there is an error in transmission, the on-chip ECC won’t catch it, so side-band ECC may still be useful alongside it for end-to-end protection.

Finally, “link” ECC protects just the communicated data. It’s calculated at both ends of the link and doesn’t involve any stored codes. On-chip and link ECC could be combined to provide end-to-end coverage.

A cyclic redundancy check (CRC) is another option for checking whether data arrived reliably.
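As a minimal sketch of the idea, the example below uses CRC-32 from Python’s standard zlib module. The helper names send and arrived_intact are assumptions for illustration, and real memory interfaces use their own CRC polynomials and widths. Unlike ECC, the check here only detects a problem; it does not locate or fix the flipped bit, so the usual response is to retry the transfer.

```python
import zlib


def send(payload: bytes):
    """Sender side: return the payload together with its CRC-32."""
    return payload, zlib.crc32(payload)


def arrived_intact(payload: bytes, crc: int) -> bool:
    """Receiver side: recompute the CRC and compare it with the one received."""
    return zlib.crc32(payload) == crc


data, crc = send(b"burst of read data")
print(arrived_intact(data, crc))                # payload unchanged in transit

corrupted = bytearray(data)
corrupted[3] ^= 0x04                            # flip a single bit in transit
print(arrived_intact(bytes(corrupted), crc))    # mismatch reveals the error
```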

Adapted from More Errors, More Correction In Memories by Bryon Moyer.

