But one size does not fit all, and fine-tuning is required.
Magnetoresistive RAM (MRAM) is one of several new non-volatile memory technologies targeting broad commercial availability, but designing MRAM into chips and systems isn’t as simple as adding other types of memory.
MRAM isn’t an all-things-for-all-applications technology. It needs to be tuned for its intended purpose. MRAMs targeting flash will not do as well targeting SRAMs, and vice versa. Nevertheless, MRAMs are starting to gain significant traction. MRAM devices are already on the market, with many more implementations coming.
“MRAM device physics allows application-specific tuning by trading off retention for endurance and speed,” said Cyrille Dray, senior principal engineer at Arm. “This leads to flash-like and SRAM-like forms.”
Both the design of the critical magnetic tunnel junction (MTJ) and the surrounding circuitry will have an impact on device characteristics. Designers have knobs they can turn to balance the need for read and write speed, data retention, and endurance. How those choices are made will determine success in their targeted markets.
MRAM basics — two flavors
MRAM uses magnetic polarization to store bits. The state of a bit cell depends on the relationship between two magnetic layers. One of the two magnetic layers — the so-called “pinned” or “reference” layer — has a fixed polarization. The other one is the “free” layer, and its polarization is changed when writing. One state is established when they are magnetized in the same direction (parallel). The other state has them magnetized in opposite directions (anti-parallel).
The magnetic layers are separated by a thin tunnel oxide, commonly made of MgO. Resistance is lower when the polarizations are parallel. So reading the state of the cell means measuring that resistance. “The MR [magnetoresistance] is the change in resistance of the bit between the zero and the one states,” said Daniel Worledge, distinguished research staff member and senior manager at IBM.
This material stack, which is far more complex than the simplistic picture we’re painting, forms the MTJ. Physics and material science determine how the MTJ performs.
Fig. 1: An idealized MTJ. The one on the left is in the low-resistance state, with the pinned and free layers having parallel magnetization. On the right, the layers have anti-parallel magnetization, which results in a higher resistance (and lower current). Source: Bryon Moyer/Semiconductor Engineering
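As a rough illustration of the resistance-based read described above, the sketch below computes a magnetoresistance ratio and classifies a cell by comparing its resistance to a reference. The formula (R_AP - R_P) / R_P is the commonly used definition of the TMR ratio. The resistance values, the midpoint reference, and the function names are assumptions chosen for illustration, not figures from any real device.

```python
def tmr_ratio(r_parallel, r_antiparallel):
    """Return (R_AP - R_P) / R_P, the commonly used TMR ratio."""
    if r_parallel <= 0:
        raise ValueError("parallel resistance must be positive")
    return (r_antiparallel - r_parallel) / r_parallel


def read_state(resistance, r_reference):
    """Classify a cell: lower resistance means parallel ("P"), else "AP"."""
    return "P" if resistance < r_reference else "AP"


# Hypothetical resistances in ohms, for illustration only.
R_P = 5_000.0
R_AP = 10_000.0
R_REF = (R_P + R_AP) / 2  # assumed reference halfway between the two states

print(tmr_ratio(R_P, R_AP))       # 1.0, i.e. 100% TMR
print(read_state(R_P, R_REF))     # P
print(read_state(R_AP, R_REF))    # AP
```

Which of the two states is called 0 and which is called 1 is a design convention, so the sketch reports the magnetic state rather than a bit value.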
Writing to the cell is accomplished using a higher current, but there are two ways this is done. Today’s best-developed version is called “spin-transfer torque,” or STT MRAM. (There’s an older stand-alone technology called “toggle MRAM,” which is not covered here.) The programming current goes through the MTJ in the same direction as the read current. A newer variant, which is still in early development, is “spin-orbit torque,” or SOT MRAM. In that case, the current runs “across” the MTJ instead of through it.
Fig. 2: STT MRAM (left) uses current through the MTJ for writing the state. SOT MRAM (right) uses current parallel to the layers in the stack to write the state. Source: Bryon Moyer/Semiconductor Engineering
Four MRAM target applications
MRAM can target four different application areas, the most difficult of which has not yet been achieved. Easiest to achieve is stand-alone memory, which has been around for a while. It has a niche application in replacing battery-backed SRAM and DRAM, as well as for buffering hard drives.
Fig. 3: Four different MRAM applications and their requirements. Note that the “mobile cache” application is for small IoT devices, not smartphones. Source: IBM
Next is embedded MRAM, for use in systems-on-chip (SoCs). It replaces embedded NOR flash memory, largely for code storage. (“MRAM is not going to replace NAND flash ever,” noted Worledge.) The challenge is economical integration with CMOS.
The next two applications both target SRAM, but in different ways. The first reflects a possible unification of SRAM and flash into a single MRAM block for use in small, portable devices that may run on batteries, which is typical of devices in the Internet of Things (IoT). The benefit here is low power, along with a simpler single-memory system. These tend to run more slowly, so access time is less critical, but power savings are far more important. Even so, a 10ns cell read time is possible.
The final application — as yet unachieved — is the ability to replace last-level caches with large quantities of inexpensive, non-volatile MRAM. The benefit is lower power, as well, but performance requirements are very high. Assuming it can be achieved, it’s probably several years away.
Some teams have been working on this for years. IBM considers this its holy grail. “This is our long-term vision at IBM, the reason why we’ve worked on MRAM for 25 years,” said Worledge. “But it’s really tough, and we’re not there yet.”
These applications establish the tradeoffs that must be made for MRAM to target them. A single MRAM will not be successful in all of these situations. Understanding the reasons behind the tradeoffs will help engineers make good choices when designing and when using MRAM.
Four critical characteristics
MRAM shares the same basic parameters that any non-volatile memory (NVM) will focus on — write speed, read speed, data retention, and endurance. The first two are the time it takes to write or read a bit, respectively. Data retention is the amount of time that a programmed bit can be reliably read, which typically is 10 or more years for flash memory. Endurance is the number of times that the bit can be rewritten before it is no longer able to hold its state for the specified retention period.
These four parameters are determined by the design of the MTJ and the surrounding circuits. There are two critical characteristics of the MTJ:
Both of these reflect an energy barrier between the two states. The higher the energy barrier, the more resistant it is to accidental bit flips. But that also means it takes more energy to intentionally flip the bits, requiring a higher writing current.
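The link between barrier height and accidental bit flips can be made concrete with the standard thermal-activation (Néel-Arrhenius) model, which the article does not name but which is widely used for magnetic bits: the mean time before a bit flips on its own grows exponentially with the barrier divided by the thermal energy. The sketch below is a simplified single-bit model. The 1ns attempt time and the barrier value are assumptions, and the barrier is treated as independent of temperature, which real materials do not guarantee.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
TAU_0 = 1e-9         # assumed attempt time in seconds (illustrative)


def thermal_stability(barrier_joules, temperature_kelvin):
    """Delta = E_b / (k_B * T): the barrier in units of thermal energy."""
    return barrier_joules / (K_B * temperature_kelvin)


def mean_retention_seconds(delta, tau_0=TAU_0):
    """Mean time before a single bit flips on its own: tau_0 * exp(delta)."""
    return tau_0 * math.exp(delta)


# Hypothetical barrier sized to give delta = 60 at room temperature (300 K).
barrier = 60 * K_B * 300.0

# 300 K is room temperature, 358 K is 85 C, 533 K is a 260 C solder reflow.
for temp_k in (300.0, 358.0, 533.0):
    delta = thermal_stability(barrier, temp_k)
    seconds = mean_retention_seconds(delta)
    print(f"{temp_k:.0f} K: delta = {delta:.1f}, retention ~ {seconds:.2e} s")
```

Because the dependence is exponential, a modest drop in the barrier (or a rise in temperature) shortens retention by orders of magnitude, which is why the same knob that eases writing also threatens stability.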
The tension between these two characteristics motivates the optimization choices. One can make the cell easier to program – less write current means lower power – but that can’t come at the expense of stability when trying to read it. In particular, temperature can cause bits to lose state depending on how the cells have been engineered.
MTJ design is a materials game. With a perfect material you could optimize all four parameters. “If you can keep improving the materials, in principle, you could eventually get everything,” said Worledge. Today there is no such material. That leaves designers with tradeoffs.
Write speed
Writing MRAM turns out to be a complicated process. The fundamental idea with STT MRAM is that electron spin torque can be transferred to the free layer, establishing the magnetic direction. As a process, however, it needs one (or a few) electrons to get started.
Initiation is literally stochastic, so how the process unfolds will be different each time it’s programmed. “You require some thermal fluctuations to give it a little bit of deviation, and then your torque can start to take over, and it starts to precess the MRAM element,” explained Steven Soss, distinguished member of technical staff, non-volatile memory technology solutions, at GlobalFoundries. “And as it starts to precess, you can get more torque involved, and eventually you can switch it. That overall initiation rate is what costs you your time to switch the device.”
The precise timing of this process is unpredictable. Given an array, some bits may program faster than others. Repeating the process will produce a completely different set of fast-programming bits. It’s not necessarily a function of the bit itself; it’s more about which ones randomly get a faster start.
Faster writing can be achieved by lowering the thermal energy barrier. “There are material things that you can do to make it less thermally stable so that it fluctuates more on its own,” added Soss. “You can also play with the magnetic damping of the material.”
As a further complication, it turns out that writing from parallel (P) to anti-parallel (AP) is harder than the reverse. “To do the P-to-AP transition, it’s actually back-scattered electrons that are causing the torque transfer,” said Soss. “The relative percentage of available electrons to do that is lower.”
In general, the higher the write current, the faster the process will complete. Above a certain critical threshold, it’s linear. “It’s a conservation of angular momentum argument,” said Worledge. “If you double the number of spins you’re dumping, you halve the amount of time.”
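The "double the spins, halve the time" argument can be illustrated with a toy model in which switching time is inversely proportional to how far the write current exceeds the critical threshold. This is only a sketch of the linear regime described above. The function name, the critical current, and the scale constant are hypothetical, and the model ignores the stochastic start-up discussed earlier.

```python
def switching_time_ns(current_ua, critical_ua, scale_ua_ns=100.0):
    """Toy model: time = scale / (current above the critical threshold)."""
    overdrive = current_ua - critical_ua
    if overdrive <= 0:
        raise ValueError("model only applies above the critical current")
    return scale_ua_ns / overdrive


# Hypothetical numbers: a 50 uA threshold and a 100 uA*ns scale constant.
print(switching_time_ns(150.0, 50.0))  # 1.0 (100 uA above threshold)
print(switching_time_ns(250.0, 50.0))  # 0.5 (200 uA above threshold)
```

Doubling the current above threshold halves the modeled write time. Well above the threshold the same holds approximately for the total current.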
But higher current can accelerate the wear-out of the MTJ – particularly for STT MRAM, where the write current goes through the MTJ. The main benefit of SOT MRAM technology is that the write current doesn’t go through the MTJ – it passes parallel to the layers. That means that the current can be set arbitrarily high (within reason) without worrying about wear-out. For this reason, SOT MRAM cells are expected to have much faster write times than STT cells.
Write current also impacts the cell size – and thus the cost. Each cell has a selector transistor, through which the write current must pass. Higher currents require larger transistors, increasing the cell size. “The cell is not really limited by the size of the MTJ itself,” said Worledge. “It’s really limited by the current needed to write the MTJ. That write current is limited by the size of the transistor in the cell.”
A newly proposed vertical selector transistor from Spin Memory may address this by placing the MTJ atop, rather than alongside, the selector transistor, bringing the cell closer to the size of DRAM.
Fig. 4: A vertical selector transistor that can reduce the size of an MRAM bit cell. Source: Spin Memory
While a last-level-cache replacement MRAM is still years away, IBM, working with Samsung, has achieved 2ns write times with high endurance. They’ve also produced a demonstration array, the first of its kind on 14nm, placed between metal 1 and metal 2 for tighter integration.
Read speed
The time it takes to read a bit is controlled by very different considerations than the write speed. The dominating effects are the magnetoresistance (how tight the polarization distributions are for a single cell) and the memory window (how far apart those distributions are for 0 and 1 states across the entire array). The tighter the distributions and wider the windows, the easier it is to read.
Ultimately, read speed is determined by how much time it takes the sense amplifier to integrate the current it detects. Low current takes longer, especially if there is noise and variation. Higher current makes reading faster. “The bigger your TMR, the bigger your memory window is, then the less integration your sensing has to do,” said Soss.
If the read current is too high, however, it starts to act like a write current and may change (or “disturb”) the state of the bit. So there’s a limit to how much current can be used for reading. This applies even to SOT MRAM, which isn’t designed to be written with a through-MTJ current: too much read current through the MTJ will still behave like a write current.
This makes read speed harder to improve. And, while SOT MRAM will have a write-speed advantage over STT MRAM, it won’t have a corresponding read-speed advantage.
Read speed can be enhanced with a differential cell using two MTJs programmed in opposite states, but at a cost. “The differential sense amp should be simpler than a single-ended sense amp because there is twice as much sense margin, and the ‘cell’ essentially becomes self-referencing,” explained Jeff Lewis, senior vice president of business development at Spin Memory. “So the variations in the reference for a single-ended cell don’t have to be dealt with. Therefore, you can read faster. But the 2X larger write power and 2X larger bit-cell area are major drawbacks.”
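The "twice as much sense margin" point can be checked with simple arithmetic. A single-ended read compares the cell current to a reference placed midway between the two states. A differential read compares the cell to a complementary cell holding the opposite state. The sketch below assumes ideal, noise-free cells and uses a hypothetical read voltage and resistances.

```python
def cell_current_ua(read_mv, resistance_kohm):
    """Ohm's law with convenient units: mV / kOhm = uA."""
    return read_mv / resistance_kohm


def single_ended_margin_ua(i_p, i_ap):
    """Cell current is compared to a reference midway between the states."""
    return (i_p - i_ap) / 2


def differential_margin_ua(i_p, i_ap):
    """Cell is compared to its complement, which holds the opposite state."""
    return i_p - i_ap


# Hypothetical values: 100 mV read voltage, 5 kOhm (P) and 10 kOhm (AP).
i_p = cell_current_ua(100.0, 5.0)    # 20 uA
i_ap = cell_current_ua(100.0, 10.0)  # 10 uA

print(single_ended_margin_ua(i_p, i_ap))   # 5.0
print(differential_margin_ua(i_p, i_ap))   # 10.0
```

The doubled margin comes at the price Lewis describes: two MTJs per bit must be written and laid out.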
Endurance
Each time an STT MTJ is written to, it contributes to the wear-out of the MgO tunnel layer, which is very thin. This is not a new phenomenon; flash memory has the same issue. The overriding effect is called time-dependent dielectric breakdown, or TDDB. The longer the tunnel oxide is exposed to a field, the greater the risk of it breaking down.
“TDDB is a voltage-dependent phenomenon,” said Soss. “It’s well modeled, and everybody knows that property in their dielectrics. The more you can reduce the voltage that you have to place across that MgO, the longer it will live.”
So, in general, using lower voltages preserves the material for longer. That pushes against using strong write currents (stimulated by higher voltages). In this way, for a given MTJ, faster read and write times lead to lower endurance.
One proposed idea for mitigating the write effect on endurance is to use a pulsed process. The entire array is hit with a pulse, which will program some of the bits, but maybe not others. Which bits are programmed is random.
“You write something into it, and you read what got written to make sure that it actually got written,” said Rahul Thukral, product marketing manager, embedded memory IP at Synopsys. “You rewrite only the things that didn’t get written.”
If the unsuccessfully written cells could then be determined, then they – and only they – could be hit again with another pulse. This could be repeated until all cells are programmed. The idea is that, on average, cells would be hit by fewer pulses, helping to preserve endurance.
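The pulse, verify, and re-pulse idea described above can be sketched as a small simulation. This is not any vendor's write algorithm. It assumes each pulse flips a cell that needs flipping with a fixed, independent probability (`p_switch`, a hypothetical parameter), that a pulse toward the value a cell already holds leaves it unchanged, and that the verify read is error-free.

```python
import random


def write_with_verify(cells, target, p_switch=0.6, max_rounds=50, rng=None):
    """Pulse, read back, and re-pulse only the cells that failed to switch.

    cells    -- list of 0/1 values currently stored (modified in place)
    target   -- list of 0/1 values to store
    p_switch -- assumed chance that one pulse flips a cell that must flip
    Returns (pulses_per_cell, indices_still_wrong).
    """
    rng = rng or random.Random()
    pulses = [0] * len(cells)
    pending = list(range(len(cells)))  # first pulse hits every addressed cell
    for _ in range(max_rounds):
        if not pending:
            break
        for i in pending:
            pulses[i] += 1
            # Switching is modeled as a random event per pulse.
            if cells[i] != target[i] and rng.random() < p_switch:
                cells[i] = target[i]
        # Verify: read back and keep only the cells that still mismatch.
        pending = [i for i in pending if cells[i] != target[i]]
    return pulses, pending


rng = random.Random(0)  # seeded so repeated runs give the same result
target = [rng.randint(0, 1) for _ in range(16)]
cells = [rng.randint(0, 1) for _ in range(16)]
pulses, failed = write_with_verify(cells, target, rng=rng)
print("pulses per cell:", pulses)
print("cells still wrong:", failed)
```

Cells that switch on the first pulse see exactly one pulse, while slower starters receive extra pulses. The alternative, sizing every write for the slowest cell, would stress all cells equally.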
There’s some significant complexity in this approach. One has to be able to detect which bits didn’t program – and do it quickly enough that you don’t lose the advantage of the higher currents for faster writing. And determining which bits did or didn’t write is more complex in an array that may be in an arbitrary state before writing, because there is no bulk erase required before the write operation in the way there is with flash.
Flash replacement is less affected by the notion of endurance. Code storage doesn’t need to be written often, so endurance is less of an issue there. And flash has its own endurance limit, so switching to MRAM doesn’t change that.
But endurance is particularly important when replacing SRAM. SRAM has no known endurance limit, so an MRAM in that application will introduce such a limit. The challenge is for the endurance to be high enough that it doesn’t matter – and to convince designers that it doesn’t matter.
Data retention
Retention is all about how hard it is to disturb a bit unintentionally. “Retention is dominated by the energy barrier,” said Barry Hoberman, Applied Materials’ designated board member at Antaios. “In a well-designed tunnel junction, the zero and the one have the same energy level, but then there’s an energy barrier to get from one state to the other.”
MTJs can be designed for data retention of years – or minutes or seconds. A harder write will increase retention, but that harder write can compromise endurance.
This is where temperature stability is particularly important. “If you want to be highly thermally robust, then you want a high magnetic volume, a very strong magnetic field, and strong anisotropy fields,” said Soss. “You’re basically saying, ‘I want this thing very strongly coupled to its reference magnet.’”
One of the challenges of early MRAM cells was their ability to survive solder reflow events. Those involve temperatures of 260°C for up to 90 seconds (specifics can vary). That heat could disturb the contents of pre-programmed MRAMs on boards, meaning that they had to be soldered onto the board before being programmed.
“One of the primary application demands is to be solder-reflow safe,” said Soss. “That requires you to be very [magnetically] stiff.”
Today, devices that have been designed for this typically specify that they can handle five such reflow cycles: an original cycle plus two remove-and-repair cycles (each of which involves heat to remove the device and heat to replace it). That means they can be pre-programmed prior to being soldered.
For applications that replace NOR flash for code storage, endurance is less important. And because writing to flash is very slow, one can be lenient with write speed. This allows retention to be strengthened for higher temperature stability. These MRAMs will have the usual 10-year retention.
Retention is also less of a question for applications targeting flash than it is for applications targeting SRAM. Designers tend to treat SRAM as if it has infinite data retention – as long as the power is on. If that SRAM is replaced by an MRAM that has a long, but finite, data retention, is that going to cause some worry for system designers?
There are various estimates as to how long SRAM data needs to live. It depends partly on the role of the SRAM. If it’s just buffering data on the way to some other storage, then it won’t have to last long. If it’s working memory, however, then it’s less clear how long it will be needed. Estimates of what is sufficient range from a few seconds to a few days. The MRAMs being developed today for SRAM replacement have retention of a couple of months.
That said, it turns out that MRAM retention times are becoming comparable to the soft error rate experienced by SRAMs – which belies the notion that SRAMs have infinite retention. For applications where this is a concern, ECC may help. But rather than simply correcting read data, it may also trigger a scrub or refresh to correct any errant bits so that errors don’t accumulate.
Such a refresh could be much simpler and less frequent than DRAM refresh – perhaps on the order of once per second. Those extra writes would be traded off against any reduction in endurance that they might cause.
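A scrub pass of the kind described above might look like the following sketch. To keep it short, a three-copy majority vote stands in for a real error-correcting code, which would be far more area-efficient. The storage layout and function names are hypothetical. The point is only that a word is rewritten when, and only when, a correction was needed, so the scrub adds few extra writes.

```python
def majority_decode(copies):
    """Bitwise majority vote across three stored copies of a word."""
    a, b, c = copies
    return (a & b) | (a & c) | (b & c)


def scrub(memory):
    """Rewrite only words in which a copy disagrees with the decoded value.

    memory -- list of three-element lists, one per word (modified in place)
    Returns the number of words rewritten.
    """
    rewritten = 0
    for word in memory:
        value = majority_decode(word)
        if any(copy != value for copy in word):
            word[:] = [value, value, value]  # corrective write
            rewritten += 1
    return rewritten


memory = [
    [0b1010, 0b1010, 0b1010],  # clean word: left untouched
    [0b1111, 0b1101, 0b1111],  # one copy has a flipped bit
]
print(scrub(memory))  # 1
print(memory[1])      # [15, 15, 15]
```

Running such a pass periodically keeps single errors from accumulating into uncorrectable ones, at the cost of the occasional corrective write.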
Flash or SRAM?
These four parameters set up the tradeoffs that both MTJ designers and memory designers must make. Those designers must decide whether to create a stiff, harder-to-disturb (and write) cell or a more responsive one that may also be more delicate.
For commercial viability, however, all of these interacting factors must be abstracted away. While it’s useful to understand the forces behind the different specs on different memories, when all is said and done, those specs should speak for themselves.
For applications targeting flash, some companies are even disguising the MRAM to look like flash, including bulk erase. “We fake them out. We create pages so that it actually interfaces just like e-flash,” said Thukral. “They’ve got their RTL in the older designs all done. They don’t want to change that interface.”
There are multiple ways to address that. “For a flash-like operation, you can write 1s and 0s as two separate operations,” said Soss. “This could be done by first erasing the entire macro, and then writing would be a single-pass operation where only the bits required were flipped. This leads to a simpler macro design, which leads to silicon area savings. But it does mean that if you want to ‘toggle’ the bits for a given address, you would potentially need to do this as a two-step write operation, masking to toggle only the bits required.”
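The two-step update Soss describes can be sketched with bit masks. The sketch assumes a write path that drives only one polarity per pass, so changing a word means one pass that sets bits and a second pass that clears bits. The 8-bit word width and function names are illustrative, not taken from any real macro.

```python
def two_pass_masks(old_word, new_word, width=8):
    """Return (set_mask, clear_mask) needed to turn old_word into new_word."""
    full = (1 << width) - 1
    set_mask = new_word & ~old_word & full    # bits that go 0 -> 1
    clear_mask = old_word & ~new_word & full  # bits that go 1 -> 0
    return set_mask, clear_mask


def apply_two_pass(old_word, new_word, width=8):
    """Apply the update as two masked passes, one per polarity."""
    full = (1 << width) - 1
    set_mask, clear_mask = two_pass_masks(old_word, new_word, width)
    word = old_word | set_mask          # pass 1: write the 1s
    word = word & ~clear_mask & full    # pass 2: write the 0s
    return word


old, new = 0b10110010, 0b01110001
set_mask, clear_mask = two_pass_masks(old, new)
print(f"{set_mask:08b} {clear_mask:08b}")  # 01000001 10000010
print(f"{apply_two_pass(old, new):08b}")   # 01110001
```

After a bulk erase the second pass is unnecessary, which is why the erase-then-write flow is a single pass.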
SRAM-oriented MRAM needs to be byte- (or word-) addressable. That may make for a larger, faster array. But it’s competing with a six-transistor SRAM cell, which relaxes the size requirement. “The SRAM cell is so large anyway, so you can hide a lot of stuff under the carpet,” said Andy Walker, vice president of product at Spin Memory.
Selective writing then can save power. “You read whatever is in there, you compare it to the data that’s coming into it, and you change only those things that need to change,” added Thukral.
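The read-compare-write sequence Thukral describes amounts to an XOR between the stored word and the incoming word. Only the bits that differ receive a write pulse. The sketch below is illustrative, and the function names are hypothetical.

```python
def selective_write(stored, incoming):
    """Return (new_stored, flip_mask); only bits in flip_mask are pulsed."""
    flip_mask = stored ^ incoming        # 1 wherever the values differ
    return stored ^ flip_mask, flip_mask


def cells_pulsed(flip_mask):
    """Count how many cells actually receive a write pulse."""
    return bin(flip_mask).count("1")


new_word, mask = selective_write(0b10110010, 0b10110110)
print(f"{new_word:08b}")    # 10110110
print(cells_pulsed(mask))   # 1 cell pulsed instead of 8
```

When incoming data mostly matches what is already stored, most cells are never pulsed, which saves write energy and also spares endurance.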
The physics of read and write speeds aside, SRAM-like applications tend to expect symmetric access times. “It really does you no good to have a write faster than a read,” observed Lewis.
“For a working-memory SRAM implementation, the flash-like architecture is probably going to be too slow,” said Soss. “You need to be able to control each bit’s bit line and source line independently from the other bits on the word line. This adds complexity to the architecture.”
It’s for these reasons that different MRAM technologies will be offered for different applications. One size will never fit all. While it’s always important to review critical specs for any memory, with MRAM, it’s particularly important to ensure that the version being chosen has characteristics that align with the application requirements.
Excellent comprehensive article.
This being said there are nuances not captured here.
For example much faster Read times can be achieved by optimum circuit design (see Numem’s Read times). The dual cell is not a good solution as it is high power and almost akin to pinging 2 separate cells which can be done on any memory.
The process trade-offs can be exercised as explained, but in reality that is fine in research and more difficult in production, as process qualification is an onerous, multi-step process. Numem is therefore optimizing circuit design for power/performance/area.