Taming Novel NVM Non-Determinism

The race is on to find an easier-to-use alternative to flash that is also non-volatile.


New memory technologies may have non-deterministic characteristics that add calibration to the test burden, and they may require recalibration during their lifetime. Many of these memories are being developed as part of the search for a storage-class memory (SCM) technology that can bridge the gap between larger, slower memories like flash and faster DRAM.

There are several approaches to such a memory underway, including magnetic RAM (MRAM), phase-change RAM (PCRAM), and resistive RAM (RRAM or ReRAM).

“Progress is being made with storage-class memory, but there are a lot of challenges,” said Rick Gottscho, EVP and CTO at Lam Research. “With 3D XPoint, which is phase change memory, the chalcogenides are notoriously sensitive to ambient conditions and process chemistry. There are a variety of technical strategies to deal with all of those things, and it’s in volume production. There is certainly a lot of activity looking for alternatives to that particular storage class solution, and they encompass ReRAM and MRAM, and it’s fair to say we’ll see new devices and new architectures into the future. We’re seeing a variety of new materials, new architectures, and in general our product lines are pretty well suited for dealing with those new architectures and new materials.”

That’s the initial assessment, but things could change. “We may find out as we develop solutions that we need a new product capability or a new feature,” Gottscho said. “We don’t envision entirely new product families, though, which were required for 3D NAND. For example, the mold stack deposition of ONON (oxide/nitride) or OPOP (oxide/polysilicon) was really a whole new product line that we created, and we have been evolving it ever since. The high-aspect-ratio etches were a derivative of high-aspect-ratio etch technology we already had. In the case of OPOP, a new product had to be developed. It’s possible we’ll see something similar in new storage-class solutions.”

While none of these three memory types has yet met all of the desired criteria for SCM, each is under consideration for a range of non-SCM applications that can leverage its non-volatility and its ease of reading and writing compared with flash memory.

Spin-transfer torque (STT) MRAM, in particular, is already in production for dedicated memory chips, but it’s also being developed as an embedded memory IP block for use within systems-on-chips (SoCs). While dedicated memory manufacturers can develop in-house techniques to optimize manufacturing quality and reliability, SoC designers must have access to such techniques to ensure that the embedded MRAM (eMRAM) blocks they include in their SoCs will provide the quality and reliability expected of their specialized counterparts.

A hint of random
A new challenge is the fact that programming an MRAM memory cell doesn’t provide a deterministic result. There’s an element of randomness that requires extra steps to ensure reliable reading and writing of the memory.

In fact, this non-determinism isn’t unique to MRAM; PCRAM and ReRAM behave similarly. At first glance, this might look like the normal variation that designers already deal with. But that variation comes from the manufacturing process, whereas this non-determinism stems from the physics of the memory’s write operation itself.

“It seems to be a ‘property’ of these new types of non-volatile memories to have a certain degree of non-determinism. It is best explained with PCRAM and RRAM, which have a visible component to this,” said Lori Schramm, product manager for the Tessent Platform and Memory Test, at Mentor, a Siemens Business. “In the former, the change of state (amorphous vs. crystalline) might not always succeed completely, or in the latter, the forming of the conducting filament is not complete or not to the full width. In either case, the resulting cell might not ‘store’ the requested information at all, or might show a ‘weak’ behavior, such as a higher-than-assumed resistance. It must be understood that such cells are not considered defective, since subsequent writing to the very same cell might succeed just fine.”
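Because a cell that fails to store a value is not defective, one generic way to cope is to write, read back, and retry. This is a common pattern for probabilistic writes, not a mitigation the article attributes to Mentor, Arm, or Samsung. The Python sketch below assumes a toy cell whose write takes effect with a fixed, made-up probability; `ToyNvmCell` and `write_with_verify` are hypothetical names, not part of any vendor API.

```python
import random


class ToyNvmCell:
    """Hypothetical cell model: each write takes effect only with probability p_success.

    A failed write leaves the old content in place. The cell is not defective,
    so a later write to the same cell may still succeed.
    """

    def __init__(self, p_success: float, rng: random.Random) -> None:
        self.p_success = p_success  # assumed probability, not measured data
        self.rng = rng
        self.value = 0

    def write(self, bit: int) -> None:
        if self.rng.random() < self.p_success:
            self.value = bit  # this attempt switched the cell

    def read(self) -> int:
        return self.value


def write_with_verify(cell: ToyNvmCell, bit: int, max_attempts: int = 16) -> int:
    """Write, read back, and retry until the read matches; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        cell.write(bit)
        if cell.read() == bit:
            return attempt
    raise RuntimeError(f"write of {bit} not verified after {max_attempts} attempts")


if __name__ == "__main__":
    cell = ToyNvmCell(p_success=0.7, rng=random.Random(1))
    attempts = [write_with_verify(cell, i % 2) for i in range(1000)]
    print("mean attempts per write:", sum(attempts) / len(attempts))
```

The retry cap matters: every extra attempt costs time and, as the next sections discuss, stressing a cell harder or more often has its own price.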

Parallel or anti-parallel?
An MRAM cell consists of a magnetic tunnel junction (MTJ) that, depending on its state, has either a low resistance or a high resistance. The MTJ consists of three layers: a fixed layer, called the pinned or reference layer, whose magnetization does not change; a thin insulating tunnel barrier; and the free layer. By running a high current through the junction, the magnetization of the free layer can be set either parallel or anti-parallel to that of the pinned layer. If it’s parallel, the resulting resistance of the cell will be low. If anti-parallel, the resistance will be high. The state of the programmed cell is then read by running a current lower than the write current through the cell and measuring the resistance.
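The two states and the read decision can be captured in a few lines. This Python sketch uses made-up resistance and current values and assumes that the parallel state maps to logic 0; real values and the bit mapping are process- and design-specific. The read compares the voltage that the same small read current develops across the cell and across a reference resistance.

```python
from enum import Enum

# Hypothetical values for illustration only; real MTJ parameters vary by process.
R_PARALLEL_OHMS = 2_000.0
R_ANTIPARALLEL_OHMS = 4_000.0
I_WRITE_UA = 100.0  # write current in microamps (hypothetical)
I_READ_UA = 10.0    # read current, kept well below the write current so reads don't disturb the cell


class MtjState(Enum):
    PARALLEL = "P"        # free layer aligned with pinned layer -> low resistance
    ANTI_PARALLEL = "AP"  # free layer opposed to pinned layer -> high resistance


def resistance_ohms(state: MtjState) -> float:
    return R_PARALLEL_OHMS if state is MtjState.PARALLEL else R_ANTIPARALLEL_OHMS


def write(bit: int) -> MtjState:
    """Ideal, deterministic write; assumes logic 0 -> parallel, 1 -> anti-parallel."""
    return MtjState.ANTI_PARALLEL if bit else MtjState.PARALLEL


def read(state: MtjState, r_ref_ohms: float) -> int:
    """Drive the read current through cell and reference, then compare the voltages."""
    v_cell_uv = I_READ_UA * resistance_ohms(state)  # microamps * ohms = microvolts
    v_ref_uv = I_READ_UA * r_ref_ohms
    return 1 if v_cell_uv > v_ref_uv else 0


print(read(write(0), 3_000.0), read(write(1), 3_000.0))  # 0 1
```

This ideal model has no randomness: every write lands exactly on one of two resistances. The rest of the article is about what happens when that assumption breaks.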


Fig. 1: A magnetic tunnel junction (MTJ) that, depending on the state, will have either a low resistance or a high resistance. Source: Mentor

What causes non-determinism in the MRAM? “MRAM behavior is more subtle [than PCRAM and ReRAM], since the storage of the content is non-visible,” said Schramm. “Public information from literature points to thermal fluctuations in the cell, which then cause a distribution of the magnetization in the free layer, which in turn implies a certain probability of a write error.” She noted that write success can be improved by, essentially, driving things harder when writing. But that increases the risk of cell wear-out, so a balance has to be struck.
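The balance Schramm describes can be made concrete with a toy model. The exponential write-error-rate curve, the power-law endurance curve, and all constants below are placeholders chosen for illustration, not MRAM measurements. Under those assumptions, a reasonable policy is to pick the weakest drive level that still meets a target write error rate, since anything stronger costs endurance for little benefit.

```python
import math
from typing import Optional, Sequence

# Toy models with made-up constants; they only illustrate the shape of the trade-off.
K_WER = 12.0      # hypothetical steepness of the write-error-rate curve
E0_CYCLES = 1e12  # hypothetical endurance at nominal drive (drive = 1.0)
N_WEAR = 6.0      # hypothetical exponent: how fast endurance falls as drive rises


def write_error_rate(drive: float) -> float:
    """Assumed: write error probability falls exponentially with normalized drive."""
    return math.exp(-K_WER * drive)


def endurance_cycles(drive: float) -> float:
    """Assumed: endurance falls as a power law of normalized drive."""
    return E0_CYCLES * drive ** (-N_WEAR)


def pick_drive(target_wer: float, candidates: Sequence[float]) -> Optional[float]:
    """Return the weakest candidate drive that meets the WER target, or None if none does."""
    for drive in sorted(candidates):
        if write_error_rate(drive) <= target_wer:
            return drive
    return None


candidates = [0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
chosen = pick_drive(1e-6, candidates)
print(chosen, f"{endurance_cycles(chosen):.3g} cycles")  # drive 1.2 under these assumptions
```

Tightening the target pushes the chosen drive up and the modeled endurance down; with a target the curve cannot reach, the function returns `None`, meaning drive alone cannot fix the problem.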

Because of the probabilistic nature of the write operation, the measured resistance will land within some range, and that range can vary non-deterministically with each write operation. Reading — that is, deciding whether the resistance is high or low — is done by comparing the measured resistance with a reference resistance.

Figure 2 illustrates the reference resistance. The left side shows an ideal situation. Even though there is a range of resistances for a 0 state and a range for a 1 state, the reference resistance falls cleanly between those two ranges. Reality, however, may look more like what’s on the right, where the as-built reference resistance falls near or within one of the read-state ranges and needs to be trimmed to fall between the two as-built resistance ranges.
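The cost of a poorly placed reference can be illustrated by sampling the two resistance distributions and counting misreads at different reference values. This sketch assumes Gaussian distributions with made-up means and sigmas (in ohms); real distributions would come from silicon characterization, and no measured data is implied.

```python
import random
from typing import List, Sequence


def sample_resistances(rng: random.Random, mean: float, sigma: float, n: int) -> List[float]:
    """Draw n cell resistances from an assumed Gaussian distribution."""
    return [rng.gauss(mean, sigma) for _ in range(n)]


def count_misreads(r0_cells: Sequence[float], r1_cells: Sequence[float], r_ref: float) -> int:
    """Cells written 0 misread if above r_ref; cells written 1 misread if at or below it."""
    return sum(r > r_ref for r in r0_cells) + sum(r <= r_ref for r in r1_cells)


rng = random.Random(42)
r0 = sample_resistances(rng, 2_000.0, 100.0, 10_000)  # hypothetical "0" population
r1 = sample_resistances(rng, 4_000.0, 200.0, 10_000)  # hypothetical "1" population
for r_ref in (2_100.0, 3_000.0):  # a reference inside the "0" range vs. one in the gap
    print(r_ref, count_misreads(r0, r1, r_ref))
```

With the reference sitting one sigma above the "0" mean, roughly one in six "0" cells misreads; moved into the gap between the populations, misreads essentially vanish. That is the job of trimming.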


Fig. 2: Reference resistance. Source: Mentor

This trimming operation must be done before the memory can be read in any meaningful way. The ability to test the array for faulty cells, and then to repair any such cells, relies on the ability to read and write to the memory with confidence. All MRAM-related test operations rely on the reference resistance being trimmed first.

Tuning the reference
Mentor, Arm, and Samsung are collaborating on a project to develop this trimming capability as a part of the MRAM built-in self-test (BiST) function. Arm has an eMRAM IP block on which this is being developed, with the assistance of chip data from Samsung. The solution will be included in Mentor’s Tessent design-for-test (DFT) offering. Figure 3 shows their planned architecture, which is under the control of the memory BiST (MBiST) controller and driven by the test access port (TAP).


Fig. 3: A trimming architecture for an MRAM built-in self-test (BiST) function, planned by Mentor, Arm, and Samsung. The architecture is under the control of the memory BiST (MBiST) controller and driven by the test access port (TAP). Source: Mentor

Successive read operations determine the trim level across the cells, since a single reference resistance serves multiple cells. Ideally, one would check all cells when trimming the reference resistance, but as a practical matter, one may use a sufficiently large sample to reduce the time it takes to determine the trim value. In Figure 3, the blocks used for trimming the reference are shown in red. The left-most yellow blocks (including BIRA, built-in redundancy analysis) help implement any repairs needed once trimming is complete.
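A trim search of this kind can be modeled in software. The article does not describe the actual search algorithm, so this sketch makes several assumptions: a linear trim DAC (`reference = r_base + code * step`), sampled cells pre-written with known data, and an exhaustive sweep of every code. A hardware implementation would use counters and comparators rather than Python lists; `trim_search` is a hypothetical name.

```python
from typing import Optional, Sequence


def trim_search(r0_sample: Sequence[float], r1_sample: Sequence[float],
                r_base: float, step: float, n_codes: int) -> Optional[int]:
    """Sweep every trim code, count read failures on the sampled cells, and return
    the code at the center of the zero-failure window (None if no code passes)."""
    passing = []
    for code in range(n_codes):
        r_ref = r_base + code * step  # hypothetical linear trim DAC
        failures = (sum(r > r_ref for r in r0_sample)      # known-0 cells read as 1
                    + sum(r <= r_ref for r in r1_sample))  # known-1 cells read as 0
        if failures == 0:
            passing.append(code)
    if not passing:
        return None  # distributions overlap in the sample: no reference separates them
    # Because the reference rises monotonically with the code, the zero-failure codes
    # form one contiguous window; its center leaves the most margin on both sides.
    return (passing[0] + passing[-1]) // 2


# Hypothetical sampled resistances (ohms) and a 5-bit trim range from 1,500 to 4,600 ohms.
code = trim_search([1900.0, 2050.0, 2100.0, 2200.0], [3800.0, 3900.0, 4100.0],
                   r_base=1500.0, step=100.0, n_codes=32)
print(code)  # 14, i.e. a 2,900-ohm reference
```

Sampling fewer cells shortens the sweep but, as the ECC discussion below notes, can leave unsampled cells sitting too close to the chosen reference.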

A given die may have one reference resistance that works for the entire array, or there may be localized reference resistances to accommodate local variations. This is determined by the specific foundry being used to manufacture the chip. If more than one reference exists, each reference must be independently trimmed. The resulting trim value or values are then programmed into the chip as a permanent calibration.
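Why local references can help is easy to see in a small model. The sketch below (hypothetical region names, resistances, and a linear trim DAC) computes the window of passing trim codes for each region. Trimmed independently, each region gets a usable reference; forced to share one reference, the regions may have no code in common.

```python
from typing import Dict, Iterable, Optional, Sequence, Set, Tuple

# (sampled "0" resistances, sampled "1" resistances) for one region, in ohms
Region = Tuple[Sequence[float], Sequence[float]]


def passing_codes(region: Region, r_base: float, step: float, n_codes: int) -> Set[int]:
    """Codes whose reference is >= every sampled 0 cell and < every sampled 1 cell."""
    r0, r1 = region
    lo, hi = max(r0), min(r1)
    return {c for c in range(n_codes) if lo <= r_base + c * step < hi}


def center(codes: Iterable[int]) -> Optional[int]:
    ordered = sorted(codes)
    return (ordered[0] + ordered[-1]) // 2 if ordered else None


def trim_each_region(regions: Dict[str, Region], r_base: float, step: float,
                     n_codes: int) -> Dict[str, Optional[int]]:
    """One independently trimmed reference per region."""
    return {name: center(passing_codes(reg, r_base, step, n_codes))
            for name, reg in regions.items()}


def trim_single_reference(regions: Dict[str, Region], r_base: float, step: float,
                          n_codes: int) -> Optional[int]:
    """One reference for the whole die: the code must pass in every region at once."""
    common = set(range(n_codes))
    for reg in regions.values():
        common &= passing_codes(reg, r_base, step, n_codes)
    return center(common)


# Hypothetical die whose bank1 sits about 1,800 ohms higher than bank0.
regions = {"bank0": ([1900.0, 2100.0], [3600.0, 3800.0]),
           "bank1": ([3700.0, 3900.0], [5200.0, 5400.0])}
print(trim_each_region(regions, 1500.0, 100.0, 40))       # {'bank0': 13, 'bank1': 30}
print(trim_single_reference(regions, 1500.0, 100.0, 40))  # None
```

Each entry of the per-region result is one value that would then have to be stored permanently, which is the subject of the next section.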

Persistent storage — and renewal
Storing the trim values can, at least in theory, be done using any kind of non-volatile memory cell. Mentor points out that it would most simply be done with eFuses, but conceptually, it could even be MRAM cells. A practical implementation of the latter would need to take into account a possible “bootstrap” issue: if the MRAM cells storing the trim values need to be read before setting the reference levels, then that read might be hampered by the fact that the reference levels haven’t been set yet. Said Schramm, “This might not be technically feasible, but theoretically, it would be fitting if it could be pulled off.”

The challenge with eFuses is that they require high voltages to program, which would typically be provided by test equipment. If the trimming is done at wafer sort, that high-voltage access could be provided through pads that aren’t bonded out in the packaged unit, making it impossible to apply high voltages later in the device’s lifecycle. Even if those access points were made available in the package, the system itself would then have to supply the high voltages.

That’s a critical consideration, since it’s likely that these devices will need to be recalibrated over time. Said Schramm, “The current In-System operation mode of Tessent Memory BiST is also taking control of the trimming circuitry. A new, adjusted reference value can be computed [in a deployed unit].” The challenge is then storing the new values in eFuses if there’s no high voltage available. In addition, if the eFuse technology is one-time programmable, spare eFuses would need to be made available for future trim values.
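The spare-fuse idea can be modeled as a small one-time-programmable (OTP) store. In this hypothetical design, each slot has value fuses plus a "valid" fuse, blown fuses can never be cleared, and the highest-numbered valid slot holds the active trim. `OtpTrimStore` is an illustrative name, and the `hv_available` flag stands in for the high programming voltage that may be missing in the field.

```python
from typing import List, Optional


class OtpTrimStore:
    """Hypothetical OTP eFuse bank holding a trim value plus spare slots for re-trims."""

    def __init__(self, n_slots: int, bits_per_value: int) -> None:
        self.bits = bits_per_value
        # Each slot: value fuses followed by one "valid" fuse. 0 = intact, 1 = blown.
        self.fuses: List[List[int]] = [[0] * (bits_per_value + 1) for _ in range(n_slots)]

    def program(self, value: int, hv_available: bool) -> int:
        """Burn value into the next unused slot and return that slot's index."""
        if not hv_available:
            raise PermissionError("eFuse programming voltage not available")
        if not 0 <= value < 2 ** self.bits:
            raise ValueError(f"trim value {value} does not fit in {self.bits} bits")
        for slot, fuses in enumerate(self.fuses):
            if fuses[-1] == 0:  # valid fuse intact -> slot unused
                for i in range(self.bits):
                    if (value >> i) & 1:
                        fuses[i] = 1  # blowing a fuse is irreversible
                fuses[-1] = 1  # mark the slot valid only after the value is written
                return slot
        raise RuntimeError("no spare eFuse slots left for a new trim value")

    def active_value(self) -> Optional[int]:
        """Return the value in the highest-numbered valid slot, or None if unprogrammed."""
        value = None
        for fuses in self.fuses:
            if fuses[-1]:
                value = sum(bit << i for i, bit in enumerate(fuses[: self.bits]))
        return value


store = OtpTrimStore(n_slots=3, bits_per_value=5)
store.program(14, hv_available=True)  # initial trim at wafer sort
store.program(17, hv_available=True)  # in-field re-trim, if the voltage were available
print(store.active_value())  # 17
```

The number of slots caps how many re-trims a part can absorb, which is exactly the sizing question raised above.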

Schramm also notes that the basic MBiST and repair operations may be executed on each power-up. This isn’t a new requirement driven by MRAM; it’s already common practice for existing memories like SRAM. “Memory test must run at power-on due to automotive requirements. A subsequent repair step does not take much additional time. Executing a trim-search, however, takes significantly more time than a standard MBiST test of a memory of the same size.” That makes it unlikely that re-trimming could be done at power-up (which, if possible, would obviate the need to store the trim values in non-volatile cells). This is one of the issues that will need to be sorted out as this capability is rolled out.

The role and impact of ECC
In addition to the adjustment of reference resistances, the BiST block must also provide repair for faulty cells. Such cells may simply not be operating correctly. Or, if the trimming search wasn’t exhaustive, then they may be cells that weren’t sampled and have resistance ranges that are too close to the reference resistance. The use of error-correcting code (ECC) can affect the choices available for such repairs.

ECC exists to correct (or at least detect) errors encountered when reading the memory. As the reading of leading-edge memories can be noisy, the principal purpose of ECC has been to correct for noise that may randomly occur while reading. The strength — and hence the size and cost — of the ECC block will depend on the number of bits to be corrected and detected.
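The ECC used for the eMRAM is not named here. As a general illustration of how correction strength drives size, standard coding-theory results give the check bits needed to protect k data bits. A Hamming single-error-correcting (SEC) code needs the smallest r with 2^r ≥ k + r + 1, and SEC-DED adds one overall parity bit. A binary BCH code correcting t errors over GF(2^m), shortened to k data bits, needs at most m·t check bits. The BCH figure below is that upper bound, not an exact count for any particular code.

```python
def hamming_sec_check_bits(k: int) -> int:
    """Smallest r with 2**r >= k + r + 1 (single-error-correcting Hamming code)."""
    r = 1
    while 2 ** r < k + r + 1:
        r += 1
    return r


def secded_check_bits(k: int) -> int:
    """SEC-DED (extended Hamming) adds one overall parity bit."""
    return hamming_sec_check_bits(k) + 1


def bch_check_bits_upper_bound(k: int, t: int) -> int:
    """Upper bound m*t for a binary BCH code correcting t errors, shortened to k data bits.

    m is the smallest field degree whose primitive length 2**m - 1 holds k + m*t bits.
    """
    m = 2
    while 2 ** m - 1 < k + m * t:
        m += 1
    return m * t


print(secded_check_bits(64))  # 8, the familiar (72, 64) SEC-DED arrangement
for t in (1, 2, 3):
    print(f"64 data bits, t={t}: <= {bch_check_bits_upper_bound(64, t)} check bits")
```

For 64 data bits, the bound grows from 7 to 14 to 21 check bits as t goes from one to three, and the encoder and decoder logic grows with it.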

While noise can show up anywhere, ECC can also correct deterministic errors, such as those caused by faulty cells. Mentor notes this makes it possible to develop a design and test strategy that uses some of the ECC’s correction capacity to handle faulty cells rather than repairing them outright. With an ECC that corrects three bits per word, for example, one could elect to use two of those bits to cover faulty cells and reserve one for run-time read noise. This creates room for trade-offs among the amount of sampling done for trimming, the amount of repair capacity in place, the size and strength of the ECC, and the amount of noise to be corrected.
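That trade-off can be put in numbers with a simple model. Assume (hypothetically) a codeword of `n_bits`, an ECC that corrects `t_correctable` errors per word, `hard_faults` known-bad cells left to the ECC instead of being repaired, and independent read noise that flips each remaining bit with probability `p_noise`. A read then fails when the noise errors exceed the leftover budget. The independence assumption and all numbers are illustrative only.

```python
from math import comb


def word_failure_probability(n_bits: int, t_correctable: int,
                             hard_faults: int, p_noise: float) -> float:
    """P(uncorrectable read) when hard faults consume part of the ECC budget.

    Assumes hard-faulty bits always read wrong and that noise flips each of the
    other bits independently with probability p_noise.
    """
    budget = t_correctable - hard_faults
    if budget < 0:
        return 1.0  # more hard faults than the ECC can ever correct
    m = n_bits - hard_faults  # bits still exposed to random noise
    # Sum the binomial tail directly to avoid cancellation in 1 - P(ok).
    return sum(comb(m, e) * p_noise ** e * (1 - p_noise) ** (m - e)
               for e in range(budget + 1, m + 1))


# Hypothetical 85-bit codeword, 3-bit correction, per-bit noise probability of 1e-4.
for faults in range(4):
    print(faults, f"{word_failure_probability(85, 3, faults, 1e-4):.2e}")
```

Each hard fault handed to the ECC removes one bit of noise margin, and the failure probability rises by orders of magnitude per step, which is why the split between repair and ECC has to be chosen deliberately.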

The cost and effort required
DFT design flows have evolved to provide a highly automated way of inserting test circuitry. Said Schramm, “The Tessent Shell MBiST solution for SRAMs provides a fully automated flow from DFT planning, through DFT insertion, to pattern generation and validation.” MRAMs would follow that same basic flow. “The only difference is that there is no established Tessent library specification for the data points unique to MRAMs. However, our cooperation with Arm and others will eliminate this ease-of-use obstacle.”

As to the silicon area consumed by the solution, it can vary widely depending on implementation. If there are multiple memory blocks, for example, one could control them all with a single MBiST controller, saving silicon. But that would mean testing the blocks sequentially, which adds to test time. That test time could be reduced by using more silicon to give each block its own MBiST controller.

That additional test time may be manageable for the simple MBiST tests, but trimming complicates matters. “The MRAM requires the addition of the trimming circuit, which is not needed for SRAM. This circuitry contains a few registers and simple adders and comparators, which is not a lot of overhead. The trimming circuitry could even be shared between MRAMs at the same controller to limit the area impact,” said Schramm. But that sharing would have a bigger impact on test time due to the time the trim search takes — especially if done sequentially for multiple memory blocks. The decision, then, would have to balance the cost of extra MBiST controllers and trim circuits against the cost of testing, and it’s a choice each design team would make as appropriate for their SoC.
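The decision can be framed as a small cost model. This sketch compares only the two extreme configurations; the timings, area units, and cost weights are hypothetical placeholders, and a real design team would also weigh intermediate schemes, such as one trim circuit shared among several memories on the same controller.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DftOption:
    name: str
    test_time_ms: float
    area_units: float


def dft_options(n_blocks: int, t_mbist_ms: float, t_trim_ms: float,
                a_controller: float, a_trim: float) -> List[DftOption]:
    """Two extremes for n eMRAM blocks (all inputs hypothetical).

    shared:    one MBiST controller and one trim circuit; blocks tested one after another
    dedicated: a controller plus trim circuit per block; all blocks tested in parallel
    """
    per_block_ms = t_mbist_ms + t_trim_ms
    return [
        DftOption("shared", n_blocks * per_block_ms, a_controller + a_trim),
        DftOption("dedicated", per_block_ms, n_blocks * (a_controller + a_trim)),
    ]


def cheapest(options: List[DftOption], cost_per_ms: float, cost_per_area: float) -> DftOption:
    """Pick the option with the lowest weighted test-time-plus-area cost."""
    return min(options, key=lambda o: o.test_time_ms * cost_per_ms + o.area_units * cost_per_area)


# Trim search assumed to take 10x longer than the plain MBiST run (made-up numbers).
opts = dft_options(n_blocks=4, t_mbist_ms=2.0, t_trim_ms=20.0, a_controller=10.0, a_trim=2.0)
print(cheapest(opts, cost_per_ms=1.0, cost_per_area=1.0).name)  # dedicated
print(cheapest(opts, cost_per_ms=1.0, cost_per_area=5.0).name)  # shared
```

Because the trim search dominates per-block test time in this example, sequential sharing multiplies the expensive part; only when silicon area is weighted heavily does the shared option win.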

As these and any other remaining issues are solved for the eMRAM offering, we can likely expect to see some of the work accrue to newer memories as they appear in embedded form.
