Spin-orbit torque memory adds endurance and faster write speeds, but displacing existing memories is still not easy.
In an era of new non-volatile memory (NVM) technologies, yet another variation is poised to join the competition — a new version of MRAM called spin-orbit torque, or SOT-MRAM. What makes this one particularly interesting is the possibility that someday it could supplant SRAM arrays in systems-on-chip (SoCs) and other integrated circuits.
The key advantages of SOT-MRAM technology are the promise of faster write speeds and much longer endurance. As with any new technology, there are challenges with respect to cost and complexity, and what looks attractive out of the gate still may not achieve mass adoption. But with research continuing, some of these drawbacks could have solutions by the time the technology becomes commercially available.
“With SOT-MRAM, because it uses a different mechanism, its write performance is 10 times higher than prior-generation STT-MRAM,” said Mingchi Liu, staff technical marketing manager, embedded memories at Synopsys. “And it will have an advantage on lifetime reliability.”
MRAM is one of the many non-volatile memory technologies moving into commercial production today. An earlier version, known as toggle MRAM, has been around for a while, and the latest commercially available option is spin-transfer torque, or STT-MRAM.
MRAM joins other technologies in a race to knock flash off its NVM pedestal. Flash still provides by far the lowest per-bit cost (a barrier to any newcomer), but it has drawbacks such as long programming times and the need to program and erase large chunks of data at a time. The latter makes flash memory management much more complex. Flash also uses a lot of energy when programming cells, and it’s difficult to embed with logic.
Compared with flash, STT-MRAM promises lower power and faster write times. But it requires tradeoffs between speed and endurance, which suggests different devices may be better suited to different markets, with some tuned toward performance and others toward applications that need many write cycles.
SOT-MRAM shows promise for removing that tradeoff. “With STT, you have to give up on retention or endurance,” said Jean-Pierre Nozières, founder and CEO at Antaios. “Here you can get endurance and retention and high speed.”
Replacing SRAM is, of course, a mammoth challenge. “SRAM may be implemented on any logic die without any additional processing steps, it’s one of the fastest memories, can be made power efficient (if leaky) in operation,” said Marc Greenberg, group director of product marketing for DDR, HBM, flash/storage and MIPI IP at Cadence. “But it’s also one of the least area-efficient, and it is volatile.”
SOT-MRAM should provide some area relief and add non-volatility. And while it can’t replace the fastest SRAM, it may help with larger arrays that today use slower SRAM.
Understanding how SOT-MRAM may address these challenges requires a discussion of how the basic bit cell works — and, more importantly, how it’s programmed.
STT-MRAM
Modern MRAM devices employ what’s called a magnetic tunnel junction (MTJ). It can be simplified as having three layers — a fixed or “pinned” magnetic layer, a tunneling dielectric, and a free magnetic layer.
When programming a device, one sets the free layer’s magnetic field to be oriented either parallel to, or anti-parallel to, the fixed layer’s field. The idea is that, effectively, the fixed layer filters the spin orientation of the current passing through it. When that current hits the other magnetic layer, if it’s oriented the same way, then more current can pass through. If oriented the opposite way, then less current makes it through.
Fig. 1: The amount of read current depends on the relative orientations of the two magnetic layers, establishing the state of the cell. Source: Bryon Moyer/Semiconductor Engineering
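For readers who want a feel for the numbers, here is a minimal Python sketch of the read mechanism. The resistances, TMR ratio, and read voltage are assumed illustrative values, not data from any specific device:

```python
# A minimal sketch of the MTJ read mechanism. The resistances, TMR ratio,
# and read voltage are assumed illustrative values, not data from any
# specific device.

R_P = 5_000.0            # ohms, parallel-state resistance (assumed)
TMR = 1.0                # 100% tunnel magnetoresistance ratio (assumed)
R_AP = R_P * (1 + TMR)   # anti-parallel resistance is higher

V_READ = 0.1             # volts applied across the MTJ during a read

def read_current(parallel: bool) -> float:
    """Read current for the stored state (simple Ohm's law)."""
    return V_READ / (R_P if parallel else R_AP)

def sense(current: float) -> int:
    """Compare against a midpoint reference to decide the bit."""
    i_ref = (V_READ / R_P + V_READ / R_AP) / 2
    return 0 if current > i_ref else 1    # higher current = parallel = '0' here

for state in (True, False):
    i = read_current(state)
    print(f"parallel={state}: I_read = {i * 1e6:.1f} uA -> bit {sense(i)}")
```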
A cell is programmed in an STT-MRAM MTJ by running a large current through the MTJ. The direction determines which way the free layer is programmed. More current programs it “harder,” enabling faster access times. But it also creates damage in the cell, which contributes to wear-out. If higher endurance is desired, the write current must be lower, slowing the process.
“The disadvantage currently is that its write speed is 10 times slower than the read speed,” said Liu. “Because the performance is slow, it will be difficult to replace SRAM.”
Fig. 2: STT-MRAM cells are programmed using a write current larger than the read current that runs through the MTJ, which causes long-term damage to the cell. Source: Bryon Moyer/Semiconductor Engineering
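The write tradeoff can be illustrated with a toy model: push more current and the free layer switches faster, but the tunnel barrier wears out sooner. The critical current, constants, and exponents below are assumptions chosen only to show the trend, not measured STT-MRAM behavior:

```python
# A toy model of the STT write tradeoff: more write current means faster
# switching but more stress on the tunnel barrier, so lower endurance.
# All constants and exponents are assumed.

def switching_time_ns(i_write_ua: float, i_c0_ua: float = 50.0) -> float:
    """Switching time falls as the overdrive above the critical current grows."""
    overdrive = max(i_write_ua / i_c0_ua - 1.0, 1e-6)
    return 10.0 / overdrive               # ns, assumed proportionality constant

def endurance_cycles(i_write_ua: float, i_c0_ua: float = 50.0) -> float:
    """Endurance drops steeply with overdrive (barrier stress)."""
    overdrive = max(i_write_ua / i_c0_ua, 1.0)
    return 1e12 / overdrive ** 6          # cycles, assumed exponent

for i in (60, 80, 100, 150):              # write currents in microamps
    print(f"I_write = {i} uA: t_switch ≈ {switching_time_ns(i):.1f} ns, "
          f"endurance ≈ {endurance_cycles(i):.1e} cycles")
```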
SOT-MRAM
SOT-MRAM has a different write path, with current flowing along the bottom of the MTJ (in-plane) rather than through it. That decouples the programming from the read path and eliminates the damage that can occur with STT-MRAM programming. This requires a new layer, commonly referred to as the strap, through which that programming current will flow.
Fig. 3: SOT-MRAM cells are programmed with a current that runs along the strap, eliminating MTJ damage. In this case, the free layer tends to be on the bottom for proximity to the strap. Source: Bryon Moyer/Semiconductor Engineering
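A short behavioral sketch, again with assumed resistances, captures the key difference: the write goes through the strap, so the tunnel barrier never sees the write current:

```python
# A behavioral sketch of the decoupled SOT cell: writes flow laterally
# through a low-resistance strap under the free layer, while reads still
# pass vertically through the MTJ. Resistance values are assumed.

R_MTJ_P, R_MTJ_AP = 5_000.0, 10_000.0    # ohms, read path (assumed)
R_STRAP = 200.0                           # ohms, write path (assumed)

class SotCell:
    def __init__(self) -> None:
        self.parallel = True              # free layer starts parallel

    def write(self, current_ma: float) -> float:
        """Program via the strap; current polarity sets the state. Returns the
        voltage dropped across the strap. No current crosses the tunnel
        barrier, so the write causes no barrier wear."""
        self.parallel = current_ma > 0
        return abs(current_ma) * 1e-3 * R_STRAP

    def read(self, v_read: float = 0.1) -> float:
        """Read current through the MTJ for a small sense voltage."""
        return v_read / (R_MTJ_P if self.parallel else R_MTJ_AP)

cell = SotCell()
v = cell.write(-0.3)                      # negative strap pulse -> anti-parallel
print(f"strap drop ≈ {v * 1e3:.0f} mV, I_read = {cell.read() * 1e6:.1f} uA (anti-parallel)")
cell.write(+0.3)                          # positive strap pulse -> parallel
print(f"I_read = {cell.read() * 1e6:.1f} uA (parallel)")
```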
There are two fundamental challenges to SOT technology that remain the subjects of much research and development. The first deals with the need for an external magnetic field while writing. The second relates to the size of the bit cell.
Field-free switching
The first challenge has to do with how a deterministic field direction can be established. In its purest form, the SOT programming mechanism is stochastic, and for a given magnetic domain it’s hard to control which way the field will go.
There are at least two ways to accomplish this control. One is to make an asymmetrically shaped strap, but that affects cell area. The other is to apply an external magnetic field, which has obvious drawbacks.
While commercial developments have been moving forward with the external field approach, researchers have identified potential ways to perform so-called “field-free” switching. The emerging solutions are challenging to understand for folks not steeped in the intricacies of materials, surface states, spin indices, the Dzyaloshinskii–Moriya effect, the Rashba effect, and other arcana that arise in many papers.
But researchers [1,2] have found that by sandwiching the right metals together with the right combination of ferromagnetic or ferrimagnetic materials, and with the right spin-index relationships, the magnetic symmetry can be broken to drive the desired orientation.
The general expectation appears to be that when commercial devices appear, they will be capable of field-free switching.
Reducing the bit-cell size
A more challenging issue is that while the STT cell has two terminals shared by the read and write currents, the SOT cell has three, because the write current has its own path. In the default configuration, that means another select transistor, and that extra transistor is the main reason the SOT bit cell is larger than its STT counterpart.
“SOT is better than STT everywhere save for one factor that is very important — area cost,” said Nozières. “Our target is to achieve half of the area of SRAM. You do this with zero leakage and a much simpler periphery, because there’s no need for sleep and wake up modes and so on.”
Fig. 4: STT uses a two-terminal cell, while SOT requires an additional terminal for the write current. Source: Bryon Moyer/Semiconductor Engineering
The difference between the STT and SOT bit cells may not be as big as one might think. In fact, it’s less than the 50% increase that one might intuitively assume, said Barry Hoberman, board member and advisor at Antaios.
With STT, the write path consists of the select transistor and the high-resistance MTJ. In SOT, it’s the select transistor and the low-resistance strap. So, with STT, there is pressure on the transistor to be lower resistance — meaning it needs to be wider.
The SOT transistor doesn’t have that same pressure, so it can be sized smaller — enough that it reduces the bit-cell size difference. The specific size and current numbers remain proprietary.
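The transistor-sizing argument can be made concrete with back-of-envelope numbers. Everything below (supply, write current, resistances, on-resistance per unit width) is assumed for illustration, and real cell area depends on much more than transistor width, but it shows why two narrower SOT transistors can add less area than one might expect:

```python
# Back-of-envelope sizing argument. All values are assumed, and real cell
# area depends on much more than transistor width, so treat this only as an
# illustration of the resistance reasoning above.

VDD = 1.0                      # volts (assumed)
I_WRITE = 100e-6               # amps, required write current (assumed)
R_MTJ = 4_000.0                # ohms, STT write-path load (assumed)
R_STRAP = 200.0                # ohms, SOT write-path load (assumed)
R_ON_PER_UM = 2_000.0          # ohm*um, transistor on-resistance per unit width (assumed)

def required_width_um(r_load: float) -> float:
    """Width such that the transistor's on-resistance fits the leftover voltage budget."""
    r_budget = VDD / I_WRITE - r_load      # allowed transistor on-resistance, ohms
    return R_ON_PER_UM / r_budget

w_stt = required_width_um(R_MTJ)           # one wide select transistor
w_sot = 2 * required_width_um(R_STRAP)     # two transistors, each much narrower

print(f"STT select transistor width ≈ {w_stt:.2f} um")
print(f"SOT total transistor width  ≈ {w_sot:.2f} um, "
      f"about {w_sot / w_stt:.1f}x the STT width")
```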
Nevertheless, the SOT cell is still bigger than the STT cell, so efforts are underway to reduce that impact. Papers have proposed circuit techniques and more sophisticated ideas for cutting the transistor count or sharing write terminals, a couple of which are discussed below. These appear to be only ideas at this point, and it’s unclear whether they will be commercialized.
A 2018 paper [3] from CEA Tech, Aix-Marseille University, University Grenoble Alpes, and CEA Leti shows two possible circuit approaches, one of which involves some leakage through the MTJ. These don’t eliminate the third terminal, but they reduce the number of select transistors to one and zero, respectively.
Fig. 5: Two proposed circuits that eliminate one or both select transistors to reduce the bit-cell size. On the left, the read and write paths are shared, and a small current goes through the MTJ when writing; most of the current goes through the select transistor since it has lower resistance. On the right, there are separate paths for read and write, and there is no select transistor. (Some details were unclear in the paper.) Source: Bryon Moyer/Semiconductor Engineering based on source paper. [3]
Work has also been done at the National University of Singapore, the Indian Institute of Technology, and Korea University [4] that suggests a way to share the write terminals for all the cells in a single row, eliminating the need for each cell to have its own write terminal.
The mechanism is subtle and involves a gate material over the magnetic stack that either accumulates or removes oxygen ions from the free layer. The presence or absence of those ions sets the polarity of the spin accumulation at the interface of the strap and the free layer. In that manner, using a current of one direction plus a gate voltage per cell — which doesn’t take the area required by an access transistor — each cell on the line can be programmed to its own state.
Fig. 6: A layer of gadolinium oxide contains oxygen atoms (yellow circles) that can migrate into the cobalt under the influence of a positive voltage, setting up one spin polarity at the cobalt/platinum interface. A negative voltage pulls the oxygen back, giving the opposite interface polarity. Source: Bryon Moyer/Semiconductor Engineering based on source paper. [4]
The challenge at this point is that the migration time of those ions is long, hurting the write access time, so this is conceptual for now. It remains to be seen whether it can evolve into a commercially viable mechanism that shrinks the bit cell.
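For clarity, the row-shared write scheme can be summarized as a small behavioral model. The function name and the mapping of bit values to gate polarities below are assumptions for illustration, not the paper’s own notation:

```python
# Behavioral sketch of the row-shared write: one shared current direction is
# applied along the row's strap, and a per-cell gate voltage (which moves
# oxygen ions into or out of the free layer, per Fig. 6) selects each cell's
# polarity. Names and the bit-to-gate-polarity mapping are assumptions.

def program_row(bits):
    """Return the gate voltage applied to each cell for one shared current pulse."""
    shared_current = "+I along the row strap"   # the same current for every cell
    gate_voltages = []
    for b in bits:
        # +Vg drives oxygen ions into the cobalt (one interface spin polarity);
        # -Vg pulls them back (the opposite polarity). Bit mapping is assumed.
        gate_voltages.append("+Vg" if b == 1 else "-Vg")
    print(f"Apply {shared_current}; per-cell gates: {gate_voltages}")
    return gate_voltages

program_row([1, 0, 0, 1, 1])
```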
There are other ideas over the longer term. “More radical approaches go even further, making the SOT size smaller than STT, but these will take time and are not really yet on the horizon,” said Nozières.
Access-time symmetry
Another question gets to read and write access times for a bit cell. The SOT technology changes the writing mechanism in a way that allows fast writing without hurting endurance. “We believe we can operate this thing down the road anywhere between 3 and 10 ns,” said Nozières.
But the read mechanism remains unchanged. So while write times now can be faster for SOT than for STT, read times don’t move.
While faster writes might sound good in principle, it’s not clear whether having a faster write time than read time has value. Memories will, on average, be read many more times than they will be written. Overall performance will therefore be impacted more by improving read speed than write speed.
Some argue that the best arrangement is symmetry, where read and write times are the same. Nozières agreed: “In practice, you want the read time to be symmetrical.”
That potentially gives up some of the write-speed gain SOT technology brings, but symmetry is intended to provide write times that are “good enough.”
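A quick weighted-average calculation shows why. With an assumed read latency and a read-dominated access mix, a write that is much faster than the read barely moves the average access time:

```python
# Weighted-average access time for a read-dominated workload. The read
# latency and read fraction are assumed for illustration.

def avg_access_ns(t_read: float, t_write: float, read_fraction: float) -> float:
    return read_fraction * t_read + (1.0 - read_fraction) * t_write

T_READ = 10.0            # ns (assumed)
READ_FRACTION = 0.9      # 90% of accesses are reads (assumed)

for t_write in (3.0, 10.0):    # fast asymmetric write vs. symmetric write
    avg = avg_access_ns(T_READ, t_write, READ_FRACTION)
    print(f"t_write = {t_write:4.1f} ns -> average access ≈ {avg:.1f} ns")
```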
Testing magnetic fields
There also have been some test equipment announcements in support of SOT-MRAM lately. The necessary developments have to do with external magnetic fields and with write times. The equipment available today is focused more on development than commercial high-volume production, so it needs to be able to do things that may or may not be necessary in a production tester.
One of those capabilities involves the external magnetic field. The ability to apply that field is necessary today. But even if field-free programming becomes a commercial reality, this field still will be needed for characterization and even full-on testing.
As with any other parameter, the magnetic characteristics of a given bit cell are subject to manufacturing variation. There are intrinsic magnetic characteristics of the bit cell that must be measured, and that is done with an external field. In fact, testing an array for stability in the presence of an external field means determining the coercive field — that is, how well the programmed state can hold up to some other nearby field.
“With STT, we vary a magnetic field on top of the MTJ, and then we can flip the state of the device,” said Siamak Salimy, CTO and co-founder at Hprobe. “From that we extract the coercive field, and then the anisotropic field. We do the same thing for SOT, but we then need a 2D magnetic field, where only 1D was required for STT.”
Nozières explained the notion of magnetic anisotropy. “It is the preferred direction in which magnetization wants to lie,” he said. “It is desired to be uniaxial, which enables two states, depending on the direction along this axis. It represents the ‘stiffness’ of the magnet. Coercivity is the magnetic field value it takes to overcome anisotropy and switch from one direction to the other. In an ideal world, it would be the same as anisotropy. However, materials have defects, and coercivity is always lower than anisotropy, sometimes by far.”
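In test terms, extracting the coercive field amounts to sweeping the applied field and recording where the state flips. The sketch below uses a toy hysteretic cell with an assumed 100 mT coercivity, purely to show the procedure:

```python
# Toy coercive-field extraction: sweep the applied field and record where the
# state flips. The cell model and its 100 mT coercivity are assumptions.

H_C = 100.0                          # mT, assumed coercive field

def cell_state(h_mt: float, previous: int) -> int:
    """Hysteretic cell: it flips only when |H| exceeds the coercive field."""
    if h_mt > H_C:
        return +1
    if h_mt < -H_C:
        return -1
    return previous                  # inside the loop, the state is retained

state = -1                           # start saturated in the negative direction
for step in range(-300, 301):        # sweep from -300 mT to +300 mT in 1 mT steps
    h = float(step)
    new_state = cell_state(h, state)
    if new_state != state:
        print(f"Switch detected at H ≈ {h:.0f} mT (extracted coercive field)")
    state = new_state
```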
So testers are likely to require the ability to apply and adjust a local field with precision and speed. Getting a known field with precision is difficult, because the field can vary by position — particularly when using a probe card.
“The probes do some scrubbing when they hit the pads, so the point of contact will change a little bit from location to location,” explained Henry Patland, president and CEO at ISI. “And the wafer itself is not perfectly flat. We integrate our magnet with the probe card so that, during the positioning of the wafer, when we contact the probe card, we’re working inside a uniform field.”
“We typically limit ourselves to about a 1mm uniform field,” said Wade Ogle, vice president and COO of ISI. “We can measure up to eight devices in parallel and still be within the uniform field.”
High-speed testing pulses
The analog circuitry on test heads also must be improved. Programming pulses as short as 200 ps must be generated, at least for the time being, as the bit cells are evaluated and improved prior to launch. That’s a shorter time than has been necessary in the past.
But there’s an accompanying capability that’s even harder. The focus of SOT devices is the ability to program using an in-plane current rather than a perpendicular current. When testing and characterizing a device, one needs to be assured that all of the programming is happening through that in-plane path. The vertical path, normally used for STT, must be suppressed to ensure that none of the programming is coming from any stray STT currents.
“We want to maximize the switching of the device through the SOT and try to avoid the current flowing through the STT channel,” said Salimy. “For that, we need to apply extremely synchronized pulses to make sure that, when the pulse is at the bottom of the MTJ pillar, we have exactly the same amplitude on the other side.”
Generating those kinds of pulses can be particularly difficult in a test environment. “We’re doing 500-ps pulses through a one-meter cable into unterminated high-impedance devices,” remarked Patland. The way they pulse the MTJ appears to be a little different, but the idea is the same — to eliminate any current through the MTJ.
So, on test chips, the tester has to be able to generate an accurate pulse on two signals and keep them synchronized to within tens of picoseconds. On commercial circuits, those pulses will be internally generated. In addition, some technologies use “STT-assist,” where a small STT current bolsters the SOT mechanism, lowering the overall write current. But test structures for wafer acceptance testing may still need this capability.
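A rough calculation, with assumed MTJ resistance and pulse amplitude and the 500-ps pulse width mentioned above, shows how amplitude mismatch or timing skew between the two pulses turns into stray current through the MTJ:

```python
# Rough estimate of the stray current through the MTJ when the two write
# pulses are not perfectly matched. MTJ resistance and pulse amplitude are
# assumed; the 500 ps pulse width comes from the text above.

R_MTJ = 5_000.0                  # ohms, assumed MTJ resistance
V_PULSE = 0.5                    # volts, assumed nominal pulse amplitude
PULSE_WIDTH_PS = 500.0           # ps, pulse width quoted in the text

def stray_stt_current_ua(mismatch_mv: float, skew_ps: float) -> float:
    """Residual MTJ current from amplitude mismatch plus timing skew (skew is
    modeled as the full pulse amplitude appearing for the skew duration,
    averaged over the pulse width)."""
    v_mismatch = mismatch_mv * 1e-3
    v_skew_avg = V_PULSE * (skew_ps / PULSE_WIDTH_PS)
    return (v_mismatch + v_skew_avg) / R_MTJ * 1e6

for mismatch, skew in [(0.0, 0.0), (10.0, 0.0), (0.0, 20.0), (10.0, 20.0)]:
    i = stray_stt_current_ua(mismatch, skew)
    print(f"mismatch = {mismatch:4.1f} mV, skew = {skew:4.1f} ps -> stray I ≈ {i:.2f} uA")
```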
SOT-MRAM vs. NAND flash and DRAM
The big question is where SOT-MRAM can gain traction. It will be extremely difficult for it — or for any of the new NVMs — to compete against either NAND flash or DRAM simply based on cost.
“Even though DRAM and NAND are highly optimized and may not be able to make new progress at the same rate of new memories, they benefit from decades of technology investment by some of the world’s largest semiconductor companies in a market that sells tens of billions of dollars of product per year in each technology,” cautioned Greenberg. “There is a lot of money that has been spent to optimize DRAM and NAND to the point that they are at today. So even if a new technology gets 20% better every year, it still has a long way to go to catch up with DRAM and NAND.”
Being better at one or two things may help, but, in general, the maturity of flash and DRAM and the fact that they continue to evolve will likely keep dedicated MRAM chips out of economic reach.
“It’s not good enough for a novel memory to beat the established memories on some metrics,” added Greenberg. “It almost has to beat DRAM or NAND on all the key metrics like bandwidth, latency, capacity, cost, power, and endurance.”
Antaios appears to have no illusions about this. “As the technology is today, dedicated chips seem out of reach because cost-wise it’s never going to compete with DRAM.”
MRAM vs. SRAM
As an embedded memory, however, things are different. Both embedded flash and DRAM are difficult and require many extra processing steps. MRAM requires fewer extra steps, and it’s largely compatible with CMOS logic processes.
“There are commercial chips that use MRAM for its unique value proposition, although features are bigger than DRAM or NAND,” said Subodh Kulkarni, president and CEO of CyberOptics.
From a processing standpoint, SOT doesn’t require anything that STT doesn’t already require, with one exception, so a foundry running STT should be able to run SOT, as well. That exception is the possible metal layer that will provide field-free programming. Some consider it to be close enough to known fab-friendly metals so that it won’t be an issue. But it is a new material, so there is the possibility of some extra qualification there.
As an embedded memory, SOT may or may not compete directly against STT. “I’m pretty sure that, to begin with, SOT will complement STT,” said Nozières. “The manufacturing environment, the materials, the processes, and the equipment are the same. Nothing will prevent the foundry offering flavor A and flavor B of the MRAM to all customers.”
In this scenario, it’s more likely to compete against SRAM. SOT-MRAM and SRAM are roughly equivalent in energy consumption. But SRAM still has an edge on speed, so the first levels of cache are likely to remain out of reach for SOT-MRAM.
“The first-layer cache puts them in the 300 ps cycle-time range,” said Hoberman. “And that’s strictly the domain of MOS — and probably will be for a very long time.”
Last-layer cache appears to be an opportunity, and density is a benefit for that application. STT-MRAM theoretically could play there if endurance can be achieved at the needed write speeds. But system designers are used to the infinite endurance of SRAM, so even having to think about endurance becomes a negative. “With SRAM, nobody will question its lifetime,” observed Liu.
There has been discussion of a simple refresh capability for STT in exchange for lower data retention. That could make it suitable for last-level cache, because data lifetimes there are relatively short.
“If you want to use embedded MRAM for SRAM replacement, do you care about data retention?” asked Liu. “If it can hold the data for one day, is it good enough? Actually, one day is too long. Maybe just one hour is good enough. And the system gets totally refreshed every minute.”
In this manner, STT and SOT might compete, with the added complexity of refresh trading off against the smaller bit-cell size. Then again, if the refresh is done in an easy-to-use manner that doesn’t increase the die cost too much, it could be tolerated much the way DRAM refresh is today.
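The refresh overhead is easy to bound with simple arithmetic. Taking the one-minute refresh interval from the quote above, and assuming an array size and per-bit write energy purely for illustration:

```python
# Bounding the refresh overhead. The one-minute refresh interval comes from
# the quote above; the array size, word width, and per-bit write energy are
# assumptions chosen only for illustration.

ARRAY_BITS = 8 * 1024 * 1024        # a 1 MB cache slice (assumed)
WORD_BITS = 64
REFRESH_PERIOD_S = 60.0             # refresh every minute (from the quote)
WRITE_ENERGY_PJ_PER_BIT = 0.5       # assumed per-bit write energy

words = ARRAY_BITS // WORD_BITS
refresh_energy_uj = ARRAY_BITS * WRITE_ENERGY_PJ_PER_BIT * 1e-12 * 1e6  # total, in microjoules
print(f"{words} words rewritten per refresh, "
      f"≈ {refresh_energy_uj:.1f} uJ every {REFRESH_PERIOD_S:.0f} s "
      f"(≈ {refresh_energy_uj / REFRESH_PERIOD_S:.3f} uW average)")
```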
IoT devices would be a different matter. “[STT with refresh] is definitely not suitable for the IoT, where you need data retention,” noted Liu.
They also need any power reductions they can get, since many of them run on small batteries. “The write energy per bit for SOT is about an order of magnitude lower than it is for STT,” said Hoberman. Non-volatility is also a plus, since system state can survive a power cycle. That has the potential to speed up boot times.
Still some years away
Whether or not these benefits will give it a leg up on SRAM in some applications won’t be known for a while. SOT-MRAM development continues, but it’s going to take some time. “I don’t think that technology will be ready for customer sampling before 2024,” said Nozières.
And by some estimates, that’s an optimistic timeframe.
Notes: