Memory Architectures Undergo Changes

New options emerge for breaking down barriers for throughput, capacity and cost.


By Ed Sperling
Memory architectures are taking some new twists. Fueled by multi-core and multiple processors, as well as some speed bumps using existing technology, SoC makers are beginning to rethink how to architect, model and assemble memory to improve speed, lower power and reduce cost.

What’s unusual about all of this is that it doesn’t rely on new technology, although there certainly is room for new developments within this scheme. The real focus, instead, is on using memory differently—carving it up in different ways that put an emphasis on integration, faster connectivity to other memories and processors, and in some cases using only slices of a particular memory for a very specific task rather than viewing it as a monolithic structure.

Among memory makers, this approach is aimed at minimizing the effects of what is known as the “memory wall.” Communications between chips, and sometimes within the same chip, are too slow. Multiple levels of cache have been added to hide that wall from users, but with multi-core chips that cache now has to be shared among cores and regularly updated. As a result, new approaches need to be considered.
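One way to see why caches hide the memory wall only partly is the standard average memory access time calculation, sketched below. The latencies and hit rates are hypothetical values chosen for illustration and do not come from any vendor cited here.

```python
def average_access_time(levels, memory_latency_ns):
    """Average memory access time for a simple cache hierarchy.

    levels: list of (hit_latency_ns, hit_rate) tuples, ordered from the
            cache closest to the core outward.
    memory_latency_ns: latency of main memory, which always supplies the data.
    """
    total = 0.0
    reach_probability = 1.0  # chance that an access gets as far as this level
    for hit_latency_ns, hit_rate in levels:
        total += reach_probability * hit_latency_ns
        reach_probability *= (1.0 - hit_rate)
    return total + reach_probability * memory_latency_ns


# Hypothetical numbers, for illustration only.
caches = [(1.0, 0.90), (10.0, 0.80)]  # (latency in ns, hit rate) for L1 and L2
fast_dram = average_access_time(caches, memory_latency_ns=50.0)
slow_dram = average_access_time(caches, memory_latency_ns=100.0)
print(round(fast_dram, 3), round(slow_dram, 3))
```

Under these assumed numbers only 2% of accesses reach main memory, yet doubling its latency still raises the average from about 3 ns to about 4 ns. Lower hit rates, which are plausible when cores share and invalidate each other's cache lines, make the gap larger.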

Re-architecting the memory
The Hybrid Memory Cube Consortium has created one such approach. By stacking memory on memory atop a logic platform, and using through-silicon vias to connect various layers, the group has created an all-in-one solution for many of these problems.

“If you have a custom chip with a memory interface that supports high bandwidth, low-latency and the right form factor, then the Hybrid Memory Cube is a very good solution,” said Shafy Eltoukhy, vice president of operations and technology development at Open-Silicon. “You don’t have to worry about manufacturing all of the pieces and you get Wide I/O on the stack. This could be a very good solution for some companies.”

He said the upside of a do-it-yourself approach is the ability to differentiate. The downside is increased risk from yield and manufacturability of custom memory configurations, which in turn can affect overall cost. And because multiple memories are an integral part of the overall SoC design, getting any piece of that wrong can have economic reverberations for the entire design—and with rising costs of SoC designs, possibly an entire company. Even if engineering teams do get it right, there are limits to using DRAM.

“The trouble with DDR3 and DDR4 is that it does not address how to go to 1 terabyte per second,” Eltoukhy added. “There is a lot of cost with the Hybrid Memory Cube, and even with high-bandwidth memory—at least up front—but you will never get there with DDR3 and DDR4.”
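A rough back-of-the-envelope calculation shows the scale of the gap Eltoukhy describes. The sketch below assumes standard 64-bit channels and uses the DDR3-1600 and DDR4-3200 speed grades as examples. Those grades are picked here for illustration, not named in his remarks, and the figures are theoretical peaks rather than sustained throughput.

```python
import math


def peak_bandwidth_gb_s(mega_transfers_per_sec, bus_width_bits):
    """Theoretical peak bandwidth of one channel in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer / 1e9


def channels_needed(target_gb_s, per_channel_gb_s):
    """Whole channels required to reach a target aggregate bandwidth."""
    return math.ceil(target_gb_s / per_channel_gb_s)


TARGET_GB_S = 1000.0  # 1 terabyte per second

ddr3_1600 = peak_bandwidth_gb_s(1600, 64)  # about 12.8 GB/s per channel
ddr4_3200 = peak_bandwidth_gb_s(3200, 64)  # about 25.6 GB/s per channel

print(channels_needed(TARGET_GB_S, ddr3_1600))  # 79 channels
print(channels_needed(TARGET_GB_S, ddr4_3200))  # 40 channels
```

Dozens of parallel 64-bit channels mean thousands of data pins on a package, which is the practical reason stacked approaches with much wider internal connections are being considered.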

Still, it remains to be seen just how economies of scale affect pricing of the HMC approach. The memory industry has proven very adept at commercializing technology and reducing prices over time, but that history is with commodity parts. TSVs and a logic layer add new costs into the equation, even though there are huge benefits in speed and market readiness.

Throughput issues
The HMC addresses the throughput challenge by using TSVs, but it doesn’t address the movement of data to and from the HMC. That’s the job of the interconnect, which used to be a simple wire. But as Moore’s Law has reduced the thickness of wires along with the transistors, rising wire resistance has become a big problem. The result is heat, more power required to drive signals, and electromigration and signal integrity issues. In addition, the wire lengths are now significantly longer than in the past, even though they are thinner.

Doubling the wire widths, rather than shrinking them at 20nm and beyond, is one way around this problem. In fact, some designs now use double-width wires rather than shrinking all of the components in the design at the same rate. In conjunction with that, Wide I/O or Wide I/O 2 can improve throughput to and from memories. Those approaches are gaining favor among chipmakers and standards bodies such as JEDEC.
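The reasoning behind wider wires follows from the basic resistance formula R = ρL / (width × thickness). The sketch below is a simplified model that assumes bulk copper resistivity and hypothetical wire dimensions. Real nanoscale wires behave worse than this because of surface and grain-boundary scattering, which the model ignores.

```python
COPPER_RESISTIVITY_OHM_M = 1.68e-8  # bulk copper at room temperature


def wire_resistance_ohms(length_m, width_m, thickness_m,
                         resistivity=COPPER_RESISTIVITY_OHM_M):
    """Resistance of a rectangular wire: R = rho * L / (width * thickness)."""
    return resistivity * length_m / (width_m * thickness_m)


LENGTH_M = 1e-3  # a 1 mm route; all dimensions here are hypothetical

base = wire_resistance_ohms(LENGTH_M, 40e-9, 80e-9)
shrunk = wire_resistance_ohms(LENGTH_M, 20e-9, 40e-9)        # both dimensions halved
double_width = wire_resistance_ohms(LENGTH_M, 40e-9, 40e-9)  # shrunk wire, width doubled

print(round(shrunk / base, 3))          # 4.0: halving both dimensions quadruples resistance
print(round(double_width / shrunk, 3))  # 0.5: doubling the width halves it again
```

The trade-off is routing area: a double-width wire uses twice the track space, which is why designers apply it selectively rather than across the whole design.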

“Wide I/O, and stacking a memory device on a controller is a logical future,” said Bill Gervasi, memory technology analyst at Discobolus Designs, during a panel discussion at MemCon earlier this month. “But DDR3 remains the mainstream. You would pay 50% to 100% more for a 10% to 15% power decrease in DDR4.”

Latency issues
Connectivity is just one piece of the puzzle, though. Latency is another, and latency is only partially dependent on the memory. Because it’s impossible to continue turning up the clock frequency without cooking a chip, multicore configurations are required for improved performance. But getting performance increases out of a single application requires it to be processed across more than one core. In the case of multiple cores, using a single memory is too slow. Still, using multiple memories requires that data be updated and checked across them to ensure it is coherent.
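The coherence requirement can be illustrated with a toy write-invalidate scheme. This is a minimal sketch with hypothetical class and method names, not a model of any real protocol or product: each core keeps a private copy of data, and a write by one core discards the copies held by the others so they cannot read a stale value.

```python
class CoherentMemory:
    """Toy write-invalidate scheme: one backing store, one private copy per core."""

    def __init__(self, num_cores):
        self.backing = {}                                # address -> value
        self.local = [dict() for _ in range(num_cores)]  # per-core private copies

    def read(self, core, addr):
        if addr not in self.local[core]:
            # Miss: fetch from the backing store (unwritten addresses read as 0).
            self.local[core][addr] = self.backing.get(addr, 0)
        return self.local[core][addr]

    def write(self, core, addr, value):
        # Invalidate every other core's copy so no stale value can be read.
        for other, copies in enumerate(self.local):
            if other != core:
                copies.pop(addr, None)
        self.local[core][addr] = value
        self.backing[addr] = value  # write-through, for simplicity


mem = CoherentMemory(num_cores=2)
mem.write(0, 0x10, 7)
first = mem.read(1, 0x10)   # core 1 sees core 0's write
mem.write(1, 0x10, 9)
second = mem.read(0, 0x10)  # core 0's old copy was invalidated, so it refetches
print(first, second)        # 7 9
```

Every write touches every other core's copy, so the bookkeeping traffic grows with the number of cores. That overhead is part of the latency cost of keeping multiple memories coherent.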

“We’ve got very heterogeneous machines that are trying to communicate with each other through external memories with increasing latencies,” said Drew Wingard, CTO at Sonics. “Much of the ASIC design model is tied with the highly synchronous design model, so my flip-flop will capture this thing on the rising edge of this clock, and I’ve got my fundamental synchronous digital abstraction of [how] chips should work.”

Memories themselves have different latencies, and some are better than others. As a result, mixing and matching memories for specialized uses is suddenly looking very promising.

“There is no universal memory,” said Marc Greenberg, director of product marketing for DDR Controller IP in Synopsys’ Solutions Group. “But there is an opportunity for new layers of memory that are optimal for cost and performance. That includes next-generation high-bandwidth memory, the Hybrid Memory Cube, and memories that appear in a hierarchy between memories that exist today. So you may see high-bandwidth memory or HMC plus DDR4.”

The challenge, he noted, is that unless the memories share the same bus, a new tier adds cost. “So you can get a new tier with a different interconnect, through-silicon vias or Wide I/O, but at what cost? The place we’re likely to see that first is in places where this can be built into the price, such as high-end networking.”

Capacity issues
Another challenge involves capacity. While DRAM can be stacked to increase capacity, the big challenge in the server world is capacity per core.

“This runs into the fundamental device physics of DRAM,” said Bob Brennan, senior vice president of the System Architecture Lab at Samsung Semiconductor. “You need Wide I/O 2 or high-bandwidth memory in there, but you also have to organize the same memory differently. Even if you go wide, you still access the DRAM with the same latency. This is a problem because of the massive capacity problem in CPU scaling. You scale the number of cores, and the memory capacity required goes up. We’ve gone from 4 cores to 8, 16, 32 and even 64. That’s 2 gigabytes per core.”
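The scaling Brennan describes is simple multiplication, sketched below. The only figure taken from his remarks is 2 gigabytes per core. Holding that ratio constant across every core count is an assumption made for illustration.

```python
GB_PER_CORE = 2  # figure quoted above; assumed constant across core counts


def required_capacity_gb(cores, gb_per_core=GB_PER_CORE):
    """Total memory capacity needed to keep a fixed amount of memory per core."""
    return cores * gb_per_core


capacities = {cores: required_capacity_gb(cores) for cores in (4, 8, 16, 32, 64)}
for cores, total_gb in capacities.items():
    print(f"{cores:>2} cores -> {total_gb:>3} GB")
```

At that ratio a 64-core part needs 128 GB, 16 times the 8 GB needed for four cores, while DRAM access latency stays roughly where it was.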

Brennan said there clearly is room for a middle layer of new memory, as well as a mix of multiple types of memory in devices ranging from smart phones to servers.

Memory architectures are evolving. DRAM is the default memory. It’s not the fastest type of memory, but it is cheap and relatively reliable. But talk about memory walls is increasing, and some new approaches will have to be taken.

The reality is that DRAM will continue to play a role for years to come—maybe decades. It will be a commodity part of most memory configurations. But in many instances, including 4K high-definition video processing and fast processing of multiple cores, it also won’t be the only type of memory in use. Cost will always play a role, but minimum throughput, reliability and capacity will require additional types of memories as well as different ways to configure them more effectively. After years of predictions about change, at least some changes are definitely appearing on the horizon. The only question is when and how they will be implemented.
