Thanks For The Memories

As memory becomes more complicated and difficult to predict, system architects may be longing for previous generations of technology.


By Ed Sperling
The amount of real estate in a design now devoted to memories—SRAM on chip, DRAM off chip, and a few other more exotic options showing up occasionally—is a testament to the amount of data that must be accessed quickly in both mobile and fixed devices.

Memory is almost singlehandedly responsible for the routing congestion now plaguing complex SoCs. It is one of the main reasons why so much power is expended on devices. And it has become one of the hotbeds of design and architectural change that is now underway throughout the IC industry as design teams strive to boost performance, decrease power and push the barriers of what’s possible in electronics.

Put in perspective, the improvements from DDR2 to DDR3 and soon DDR4 have been momentous—but they’re clearly not enough.

“DDR2, 3 and 4 are not keeping up with algorithmic growth,” said Martin Lund, senior vice president of R&D for the Silicon Realization group at Cadence. “All the data has to get stored using packet buffers and DRAM, but the data is accelerating faster than the memory standards.”

That’s only part of the problem. On-chip SRAM, meanwhile, is being overwhelmed by the sheer volume of data. But just adding more memory creates other problems. As Jim Elliott, vice president of marketing at Samsung Electronics, noted in a recent speech at MemCon, “Memory is always on and always connected.” In fact, according to Samsung’s statistics, DRAM and storage account for 34% of the power used by servers today. And to put that in perspective, by the year 2030 data centers will draw 10% of the power consumed in the Pacific Northwest, more than the entire nation of Austria uses.

Latency and cache coherency
Adding more memory to deal with all of this data solves some problems and creates others. Latency is one of the big issues.

“The big bottleneck right now is bandwidth to the memory,” said Kurt Shuler, vice president of marketing at Arteris. “Even if you get the memory close enough to the processing units there is still a latency problem.”

That latency has become particularly troublesome with multi-core processors that need to be cache-coherent. Adding multiple processors on an SoC with dedicated or even shared memory is relatively straightforward as long as they aren’t required to share data. But if that data is required to be coherent, then snooping of caches is required, and that can bog down an entire system.
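The snooping overhead described above can be illustrated with a toy sketch. This is not any vendor's actual protocol; all class and variable names are hypothetical, and real designs use hardware state machines (MESI and variants) rather than write-through broadcasts. The point is simply that every write by one core can force invalidation traffic to every other core that caches the same address, and that traffic grows with core count:

```python
# Toy sketch of snoop-based invalidation (hypothetical names, not a real protocol).
# A write by one core broadcasts an invalidate; other cores drop stale copies.

class Core:
    def __init__(self, name):
        self.name = name
        self.cache = {}          # address -> value, this core's private cache

    def read(self, addr, memory):
        # On a miss, fetch from shared memory into the private cache.
        if addr not in self.cache:
            self.cache[addr] = memory[addr]
        return self.cache[addr]

    def write(self, addr, value, memory, bus):
        # Write through to memory, then broadcast an invalidate on the bus.
        self.cache[addr] = value
        memory[addr] = value
        bus.broadcast_invalidate(self, addr)

class SnoopBus:
    def __init__(self, cores):
        self.cores = cores
        self.snoop_traffic = 0   # count of invalidate messages (the overhead)

    def broadcast_invalidate(self, writer, addr):
        for core in self.cores:
            if core is not writer and addr in core.cache:
                del core.cache[addr]     # drop the stale copy
                self.snoop_traffic += 1

memory = {0x10: 1}
cores = [Core("c0"), Core("c1")]
bus = SnoopBus(cores)

cores[0].read(0x10, memory)              # both cores cache address 0x10
cores[1].read(0x10, memory)
cores[0].write(0x10, 99, memory, bus)    # c0's write invalidates c1's copy
print(cores[1].read(0x10, memory))       # c1 re-fetches the fresh value: 99
print(bus.snoop_traffic)                 # one invalidate message: 1
```

With two cores one write costs one invalidate message; with N cores all sharing a line, each write can cost N-1, which is why shared data "can bog down an entire system" as core counts rise.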

“We’re seeing a surge in interest in cache coherency,” said Shuler. “Whether it’s transactions or packets, you still need scheduling and protocol management.”

Stacked memory
Stacking memory solves at least some issues with coherency. With Wide I/O and shorter distances, it may be possible to eliminate some cache memory entirely. The first examples of this kind of approach are expected to begin hitting the market next year, with a 3D memory stack on top of a logic layer due in late 2013 in the form of the Hybrid Memory Cube.

“You have to get around the memory bottleneck somehow, and 3D memory gives you higher bandwidth and reliability and performance,” said Scott Graham, general manager of hybrid memory at Micron. “At this point we’re running high-speed SerDes to the outside edge of the HMC, with a DRAM stack in the middle. So the heat is on the outside, and there are more than 2,000 TSVs, which provide redundancy and repair capabilities.”

He noted that the initial devices did have signal integrity problems because of the noise and heat, but the HMC consortium has figured out solutions to those problems. “Right now we have three generations of active design under our belts,” Graham said. “It’s still the same DRAM cells with the same DRAM process, but it’s a radical change in DRAM performance. We’ve isolated the logic level on a separate layer, which solves a lot of challenges for performance, power and real estate.”

A number of companies are working on this approach. “It’s our belief this is the right direction,” said Joe Rash, RTP site manager at Open-Silicon. “DDR4 is beginning to sample now. The question is whether there will be another generation after DDR4. We may never see a DDR5.”

Future issues
That’s good and bad. Standards for DDR2, 3 and 4 have allowed makers of controller IP to support multiple DRAM generations. As new memory architectures roll out, more proprietary solutions will fill the gap—at least in the short term. Micron opened up the Hybrid Memory Cube to other companies, though initial implementations are aimed at the server market. Future generations will target mobile applications, but no one knows when those will begin showing up. That means there will be no standards to guide IP developers and system architects in the short term.

There are other problems that are beginning to surface, as well. Quantum effects have been theorized for a long time, but they are only now beginning to be understood—and experienced. Michael Miller, vice president of technology innovation and systems applications at MoSys, said that latency in memory may vary in the future because the charge applied to memory at advanced process nodes is sometimes released at unpredictable intervals.

“We’re going to see this bouncing around,” said Miller. “The charge is being pushed into the substrate and stored.”

STMicroelectronics has observed this at gate lengths below 10nm, where the quantum well between source and drain induces energy-level splitting, according to Laurent Le-Pailleur, the 32nm/28nm Technology Line Management director for front-end manufacturing and process R&D at ST. While this isn’t a problem at 28nm, it’s something to begin preparing for in the future.

Other types of memory bring their own challenges involving longevity, charge retention at low power, cost and yield. There has been a continuous search for a universal memory that can replace both DRAM and NAND, and possibly SRAM. That search has been underway for years, and so far the industry remains focused on DRAM and SRAM. Both of those have their own set of issues, but at least they’re well known at this point.