New Memories And Architectures Ahead

So far there is not widespread adoption, but many see change as inevitable.


Memory dominates many SoCs, and it is rare to hear that a design contains too much memory. However, memories consume a significant percentage of system power, and while this may not be a critical problem for many systems, it is a bigger issue for Internet of Things (IoT) edge devices where total energy consumption is very important.

Memory demands are changing in almost all systems. While new memories and memory architectures have been on the drawing board for a long time, adoption still is not widespread. However, many in the industry believe the tipping point is near.

SRAM and DRAM have been the workhorses of the memory hierarchy for the past 50 years, with flash having been added into the mix more recently. All of these memory structures have problems scaling at smaller geometries, partially resulting from the fact that they are all surface-level constructs. The newer memory technologies, based on resistance switching, are metal layer constructs, which eliminate many of the fabrication issues. So while there may be reluctance to adopt them today, they may be the only memory technologies suitable for future generations of products.

Fig. 1: Memory taxonomy. Source: CEA Leti

The list of new memories vying for attention includes Phase-Change memory (PCM), Ferroelectric RAM (FeRAM), Magneto-resistive RAM (MRAM), Resistive RAM (RRAM or ReRAM), spin-transfer torque RAM (STT-RAM), Conductive Bridging RAM (CBRAM) and Oxide-based resistive memory (OxRAM).

“Within the classic memory architecture there is a gap that needs to be filled,” said Steven Woo, distinguished inventor and vice president of marketing solutions at Rambus. “There will be one or two that survive, whether that’s 3D XPoint (PCM from Intel and Micron) or ReRAM or MRAM, or some other memory, we don’t know. The big question is which one takes off. But this is largely about moving data in and out rapidly across ranges of performance parameters.”

Directly related to that performance is power. “Performance at a given power consumption is a very important factor,” says Gideon Intrater, CTO for Adesto. “Memories are being used for very different things in different applications and the demands are changing. There is no one solution that solves all of the problems.”

So how much power is used by the memory subsystem? “Statistics for chips from specific applications have been gathered and they provide data for the compute system, as taking X percent of the energy,” says Drew Wingard, chief technology officer at Sonics. “Those figures tend to include the memory sub-system. As you peel back the layers, you see the memory power and for many architectures, memory power may be a third to a half of the total energy.”

Power and non-volatility were the factors that made flash attractive. “Many IoT devices rely on energy harvesting, so customers are concerned about the total power consumption,” says Faisal Goriawalla, senior staff product marketing manager within the IP group of Synopsys. “Consider an RFID tag. These passive tags are used in point of sale terminals and not powered by a battery source, instead getting their power from the RF field of the reader and thus have a very stringent power constraint.”

Another driving force for IoT devices is cost. “If you bring memory on-chip, you remove the pins and that reduces the cost of the system,” says Lou Ternullo, product marketing group director for memory and storage interface IP at Cadence. “The other side of that is that the silicon area increases. It is typically SRAM. But if you need embedded flash capability, that requires a specialty process, which is more costly. If you keep all of the memory on die, you lower the bill of materials costs. But if you need more density than you can fit on the die, then you have to go off chip.”

Computing total energy
The total energy consumed by memory has several components, all of which must be considered. These include:

• Memory cell maintenance power;
• Read, write and erase power;
• Interface power, and
• Architectural optimizations.

Different applications may balance these in different ways, along with other attributes such as persistence and performance.
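As a rough illustration of how these components add up, the sketch below totals them for a hypothetical memory. The class, field names and every figure in it are placeholder assumptions chosen for the example, not measurements of any memory discussed here. Architectural optimizations do not appear as a separate term. In this simplified view they act by changing how many reads and writes reach the memory.

```python
from dataclasses import dataclass


@dataclass
class MemoryEnergyProfile:
    """Hypothetical per-device figures; all values are illustrative assumptions."""
    maintenance_power_w: float  # leakage, refresh or retention power, in watts
    read_energy_j: float        # array energy per read access, in joules
    write_energy_j: float       # array energy per write (including any erase)
    interface_energy_j: float   # bus or I/O energy per access


def total_energy_j(profile, seconds, reads, writes):
    """Sum cell maintenance, access and interface energy over an interval."""
    maintenance = profile.maintenance_power_w * seconds
    access = reads * profile.read_energy_j + writes * profile.write_energy_j
    interface = (reads + writes) * profile.interface_energy_j
    return maintenance + access + interface


# Made-up numbers: 1 uW maintenance, 1 nJ reads, 5 nJ writes, 2 nJ interface.
example = MemoryEnergyProfile(1e-6, 1e-9, 5e-9, 2e-9)
print(total_energy_j(example, seconds=10, reads=1000, writes=100))
```

With these made-up numbers, maintenance energy dominates a lightly used memory, which is one reason a device that sleeps most of the time may weigh retention power more heavily than access energy.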

There is an energy cost associated with the memory cell itself. This may include leakage currents, refresh circuitry, or active current necessary to maintain state. The total energy associated with many memories also may depend on their size, because increasing the size of the bit-line increases the power consumed by the driver circuitry or the amount of data that has to be refreshed during each cycle.

Consider DRAM. “DRAM is cost competitive although not ideal,” says Ternullo. “It uses a capacitive cell that has to be refreshed. As you increase density, the cell capacitance goes down and—thanks to the laws of physics—you have to refresh more often. So they are trying to be more intelligent, including techniques such as partial array self-refresh where in certain modes, if the entire DRAM is not required, that part will not get refreshed.”

Other memory types, such as SRAM, can have a significant passive and dynamic power component just to maintain their state. And while non-volatile memory (NVM) may have zero retention current, you cannot forget the surrounding logic. “Leakage may come from the circuitry that surrounds the memory core,” says Goriawalla. “NVMs have a large analog component using traditional CMOS devices. These are larger devices so the gate leakage from these is fairly small, but there is a digital component and these do leak.”

Then there is the energy required to read, write and possibly erase the memory locations. Several of these costs will be memory technology related. “For many types of flash memory, write current tends to be higher than read current,” continues Goriawalla. “Multi-time programmable (MTP) NVM are quite power efficient in comparison. They are 50 times lower on program current and about 10 times lower in read current. The reason for this is the mechanism used for storing the charge. In MTP memory you are utilizing Fowler-Nordheim (FN) electron tunneling which is more energy efficient compared to hot carrier electron injection used by embedded flash.”

Another consideration is the access mechanism. For example, many flash technologies require serial access. “With some of the emerging memory technologies, you can access them randomly and do not need to have sequential access,” says Michel Harrand, senior system and integrated circuits architect at CEA-Leti. “You need some power to write a bit, which may be 10 microAmps, and this is slightly more than DRAM, but DRAM has destructive read. When you read a bit, you read the full wordline and then you have to rewrite it all. Emerging NVMs can save some power even if they need more power to write a bit. They do not require refreshing. It is a tradeoff between how many bits you have to write. So it is difficult to have exact numbers because it depends on how many bits you write versus how many you read.”

The data to and from memory has to be transferred over some kind of bus. “Most of the power for memory is related to the interface itself,” says Ternullo. “In some cases you are dealing with the laws of physics which are hard to change. If you follow the standards, there have been changes in I/O voltages to help alleviate that. DDR3 was 1.5V moving to DDR4 which was 1.2V and LPDDR4 which is 1.1V. From a dynamic power perspective the voltage is a key component and as you lower the voltage, you lower the dynamic power.”
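To put rough numbers on that voltage effect, the sketch below uses the common first-order approximation that dynamic power scales with capacitance times voltage squared times frequency. It assumes the same load capacitance and switching frequency across interfaces, which real DDR generations do not share, so the ratios only isolate the contribution of the I/O voltage. The function name is illustrative.

```python
def relative_dynamic_power(v_new, v_ref):
    """Dynamic power at v_new relative to v_ref, assuming equal C and f."""
    return (v_new / v_ref) ** 2


# I/O voltages quoted above.
IO_VOLTAGES = {"DDR3": 1.5, "DDR4": 1.2, "LPDDR4": 1.1}

for name, volts in IO_VOLTAGES.items():
    ratio = relative_dynamic_power(volts, IO_VOLTAGES["DDR3"])
    print(f"{name}: {ratio:.2f}x the DDR3 dynamic I/O power")
# DDR3: 1.00x, DDR4: 0.64x, LPDDR4: 0.54x
```

Under these assumptions, the move from 1.5V to 1.2V alone accounts for roughly a one-third reduction in dynamic I/O power.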

Another way to reduce interface cost is to integrate on chip, but DRAM cannot be embedded and scaling flash is becoming increasingly difficult. “System costs are affected by things such as the number of masks,” says Goriawalla. “Embedded flash needs 12-14 extra masks compared to CMOS technology and that can add 25% to the die costs.”

There are other factors that can affect total memory power, such as the architecture of the system. Cache is designed to reduce access times by placing a small amount of fast, power hungry memory on-chip and only having to access external memory when the information required is not in the cache. This reduces the time and energy related to accessing the external memory. If enough data can be stored in cache, then interface power can be minimized.
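The effect of hit rate on energy can be sketched with a simple average-energy model. It assumes every access probes the cache and that each miss additionally pays for one external access. The energy values are arbitrary units picked for illustration, not figures from any particular device.

```python
def average_access_energy(hit_rate, cache_energy, external_energy):
    """Mean energy per access: every access probes the cache, misses also go off-chip."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return cache_energy + (1.0 - hit_rate) * external_energy


# Assumed costs: a cache access costs 1 unit, an external access 20 units.
for hit_rate in (0.50, 0.90, 0.99):
    energy = average_access_energy(hit_rate, cache_energy=1.0, external_energy=20.0)
    print(f"hit rate {hit_rate:.2f}: {energy:.1f} units per access")
```

In this model the external term shrinks linearly with the hit rate, while the cache's own cost is paid on every access, so a power-hungry cache only pays off when the external access it avoids is considerably more expensive.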

But SRAM is power hungry and so IoT systems would like to reduce their dependence on it. However, many flash devices, especially when off-chip, require significant amounts of SRAM. “Whenever the processor needs a piece of program that is not in cache, the processor will get the necessary program from the external memory and then continue,” says Intrater. “You hope that most of the program will sit in the cache. Flash becomes part of the memory hierarchy.”

But that is changing. “There is a feature called execute in place (XIP) where you use a serial NOR flash just like off chip memory. You have the processor directly accessing data off the serial NOR flash like a memory device.”

Yet another factor that affects power is the size of the reads and writes. With DDR, data is not read a single byte or word at a time. Instead, possibly 512 or 1024 are read in a burst. Whether this is a good use of energy depends on the application: if the extra data read is not needed, it represents waste. This is an example where performance has been the driving issue, and it may be an expensive power tradeoff for many systems.
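A quick way to see the potential waste is to compare what an application needs with what a burst delivers. The helper below is a hypothetical sketch. The burst size in the example is an arbitrary assumption and does not correspond to any specific DDR configuration.

```python
def wasted_fraction(bytes_needed, burst_bytes):
    """Fraction of transferred bytes left unused when reads occur in whole bursts."""
    if bytes_needed <= 0 or burst_bytes <= 0:
        raise ValueError("sizes must be positive")
    bursts = -(-bytes_needed // burst_bytes)  # ceiling division
    transferred = bursts * burst_bytes
    return (transferred - bytes_needed) / transferred


# Assumed example: the application wants 4 bytes, the interface moves 64 per burst.
print(wasted_fraction(bytes_needed=4, burst_bytes=64))  # 0.9375
```

An application that streams through memory sequentially uses nearly every byte of each burst, while one that touches scattered words pays for transfers it never uses.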

A similar type of problem exists with NAND-type flash memories that are unable to write to a single location, instead requiring that a block of memory first be erased and then rewritten. This means there is a power penalty associated with small writes. An SRAM cache, even though it consumes more power, defers writes and may end up providing energy savings.
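The benefit of deferring writes can be sketched by counting block erase-and-rewrite operations with and without a buffer. This is a deliberately simplified model: it assumes each unbuffered small write forces one erase and rewrite of its block, and that a buffer coalesces all pending writes to the same block into a single operation at flush time. Real flash controllers are more involved, and the block size and addresses here are illustrative.

```python
def block_rewrites_direct(write_addresses):
    """Unbuffered: each small write triggers one erase and rewrite of its block."""
    return len(write_addresses)


def block_rewrites_buffered(write_addresses, block_size):
    """Buffered in SRAM: each dirty block is erased and rewritten once at flush."""
    dirty_blocks = {address // block_size for address in write_addresses}
    return len(dirty_blocks)


# Assumed workload: five small writes that land in two 4096-byte blocks.
writes = [0, 8, 16, 4096, 4104]
print(block_rewrites_direct(writes))          # 5
print(block_rewrites_buffered(writes, 4096))  # 2
```

Whether the buffered approach saves energy overall depends on how the erase energy it avoids compares with the power the SRAM consumes while holding the data.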

For many systems, existing memory technologies may be too power hungry. “For systems that use power scavenging, they often have a requirement for write current less than 5µA and read current of a few µA/MHz,” says Goriawalla. “This is coupled with a minimum Vdd of 0.7~0.9V. So we are talking about tiny system power requirements. These figures are an order of magnitude less than most embedded flash technologies.”

Other Considerations
Similarly, there are many aspects to cost, some of which have already been discussed. Power consumption has a direct impact on cost, but other factors include number of pins, packaging, on-chip costs or costs associated with integrating on a board.

“Smaller systems tend to have embedded flash that sits on the same die,” says Intrater. “But there are limitations with this. We are seeing systems that want to migrate to 28nm, but at that process node there is no embedded flash. Flash is lagging behind the standard CMOS process. If you can use embedded flash, it is invariably the best solution, but there are many cases where you cannot do that because of cost, or the rest of the system needs to be on a more aggressive process node. Then you need to revert to using external memory and you start running into issues of performance and power consumption. The link between the two needs to have the right characteristics to provide enough performance and it consumes a lot more power.”

This decision becomes a lot easier with the new memory types. “It is getting to be difficult to have a flash process together with logic on the same chip,” says Harrand. “The new memories are friendlier and everything is on the back-end of the production process. This means you can have exactly the same transistors in the logic process, and this makes it easier to embed with logic or processor. At 28nm it is becoming very difficult to make an embedded flash process, so there is an opportunity to replace embedded flash with these new memories.”

Many of the interconnect architectures were optimized for existing memory types. “The DDR interface is not well suited for NVM,” says Harrand. “With DDR you first declare what line or page you want to address and then you declare if you want to make a read or write. This is not good for the emerging NVMs where you have to know at the beginning if you want to do a read or write.” Harrand says there are ways to overcome this limitation but they reduce performance.

Another barrier is accepting new tradeoffs. “Today you can have the speed of a DRAM with sufficient endurance but not density, but you have speed and endurance at the same time,” continues Harrand. “There will be some that are optimized to be close to DRAM and have full retention or ones that have a few days of retention but not 10 years or you will have ones optimized for retention, but they will not have the speed.”

It is not yet clear which of the new memories will break down the barriers first, or when, but microcontrollers are appearing with these new memories embedded into them. As volume increases, costs will come down and this will accelerate adoption. Before long, many IoT systems will have no choice but to incorporate them, especially if they want to make use of newer fabrication technology nodes.


Paul says:

In the article it says DRAM cannot be embedded. But I’ve just been reading that Intel GT3e graphics chips will feature eDRAM cache for higher on-chip bandwidth and use 14nm process technology. What is the truth about eDRAM?

Brian Bailey says:

Hi Paul – yes, you are right that if you control the entire fabrication line, then embedded DRAM is still a possibility, but it is not offered by any standalone foundry that I am aware of. That means if you are not Intel, or IBM or a few other players that can do custom development with a fab, you are most likely not going to have that option. The last node that I am aware of that had a DRAM option was 90nm. The biggest problem is that DRAM is optimized in a very different way than logic and that makes it difficult to combine them.

Paul says:

Apparently IBM’s Power9 is fabbed at Globalfoundries (14nm FinFET) and has 120MB of L3 cache on board (using IBM’s embedded DRAM). The eDRAM is presumably IBM proprietary and therefore not a standard offering by the foundry, as you say.

Brian Bailey says:

A luxury available to only a few. But with the new memories, I think this will become a viable option for the masses. It just takes one large volume product to work out all of the kinks and bring the prices down, then the avalanche will probably start.
