Memory Gets Smarter

…And so do the choices about what kind of memory gets used where, and how it will be used.


By Ed Sperling
Look inside any complex SoC these days and the wiring congestion around memory is almost astounding. While the number of features on a chip is increasing, they are all built around the same memory modules.

Logic needs memory, and in a densely packed semiconductor the wires that connect the myriad logic blocks run all around the memory. This is made worse by the fact that performance gains in the CPU now require multiple cores, whether that means actual processor cores, GPUs, or a combination of those plus hardware accelerators. The more cores, the greater the focus on cache coherency, so that the data in one core's cache stays in sync with the other caches.
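The coherency pressure described above can be illustrated with a toy model. This is a hypothetical sketch of a MESI-style snooping protocol, not any particular vendor's implementation: when one core writes a cache line, every other core holding a copy must invalidate it, so the bus traffic devoted to keeping caches in sync grows with the number of sharers.

```python
# Toy model of MESI-style invalidation (illustrative only).
# A write by one core forces every other core holding the line to
# invalidate its copy, generating snoop traffic on the shared bus.

class CacheLine:
    def __init__(self):
        self.state = "I"  # MESI: Modified, Exclusive, Shared, Invalid

class Core:
    def __init__(self, name):
        self.name = name
        self.line = CacheLine()

def read(core, cores, bus_msgs):
    """A read moves the line to Shared; holders in M/E are downgraded."""
    if core.line.state == "I":
        bus_msgs.append(f"{core.name}: BusRd")
        for other in cores:
            if other is not core and other.line.state in ("M", "E"):
                other.line.state = "S"
        core.line.state = "S"

def write(core, cores, bus_msgs):
    """A write invalidates every other cached copy of the line."""
    if core.line.state != "M":
        bus_msgs.append(f"{core.name}: BusRdX (invalidate others)")
        for other in cores:
            if other is not core:
                other.line.state = "I"
        core.line.state = "M"

cores = [Core(f"core{i}") for i in range(4)]
msgs = []
for c in cores:               # all four cores read the same line
    read(c, cores, msgs)
write(cores[0], cores, msgs)  # one write invalidates the other three copies
print(len(msgs))              # snoop messages grow with the number of sharers
```

With four sharers, a single write triggers an invalidation broadcast that reaches three other caches; at higher core counts this traffic is a significant share of the signal activity around memory.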

Add more functionality into the SoC, and the contention for memory modules goes up. Add more memory bits, and the need for more intelligent and extensive scheduling goes up. Put them all together, and the amount of signal traffic around and into a memory skyrockets.

Given this context, it’s not hard to understand why chipmakers are rethinking what kind of memory they use, how they utilize it, and how it should be connected to the logic. Memory has become the bottleneck in designs, and getting signals into and out of that memory—intact and quickly—is now a top priority.

Rethinking DRAM
One solution that has received much attention is the Hybrid Memory Cube (HMC), initially created by Samsung, Micron and IBM, and now backed by an industry consortium. HMC stacks DRAM dies connected by through-silicon vias atop a logic layer created by IBM.

HMC is arguably the most radical architectural redesign in the history of off-chip memory, and the first incarnation is aimed at networking and high-performance computing hardware to address what has become known as the “memory wall.” It likely will be part of a 2.5D offering in the next couple years, as well, with the memory connected to other logic, analog IP and I/O subsystems through an interposer.

“We entered the last phase of Moore’s Law in 2003 because of power,” said Rich Murphy, senior architect for advanced memory at Micron. “The clock rates on processing have gotten to the point where we now have to deal with concurrency and multicore architectures. The big thing now is the way we organize systems has changed.”

Memory hierarchies and ever-larger on-chip cache have been so predictable since the 1970s that they can be plotted almost in a straight line. In fact, one of the key drivers of Moore’s Law has been DRAM. But multiple cores accessing the same memory have made this task more difficult for on-chip SRAM, and keeping caches coherent across those cores requires extremely fast communication to and within the memory.

“What’s changed is that we’re now getting more creative about how we arrange memory,” said Murphy. “We’re moving more data around so we’ve had to change the memory and place that data in a way that makes sense, and schedule it or stage it in a way that’s most efficient. Big cache hierarchies can’t keep up.”

This approach may be only a first step, too. While DRAM manufacturing has a long history, other memory technologies under development could greatly improve power and performance.

“One of the advantages with this approach is that the underlying memory technology can be changed and it can still operate the same way,” said Manohar Ayyagiri, technical marketing manager at Open-Silicon. “One of the issues customers are facing is that a DDR solution does not scale. The traditional solution of memory and ASIC on a PCB isn’t fast enough. Memory and ASIC in a single package is a different way of looking at the problem.”

Revisiting SRAM
For on-chip memory, SRAM remains the standard in most SoC designs. Embedded DRAM, which was popular at older nodes, is not even supported by major foundries at 28nm and beyond because of its dynamic power requirements.

But SRAM has its own set of problems. It’s expensive, it takes up a lot of space, and it doesn’t shrink at Moore’s Law rates without high defect density. Moreover, there are limits to just how low the voltage can go. Prasad Saggurti, senior manager of product marketing for memory and memory test at Synopsys, said 0.8 volts is the nominal voltage at 16/14nm, while retention voltage is 0.5 volts. There is talk of pushing the voltage lower, however, because it can extend battery life and also increase the lifespan of parts.

“Memory design is changing,” said Saggurti. “There’s more focus on dynamic power than in the past. We’re also seeing more interest in multiple-port memories because ARM is the dominant processor in this space, but so far we haven’t seen too much real activity. And we’re seeing a lot of people thinking about NBTI (negative bias temperature instability) with finFET memories.”

New plumbing
In conjunction with smaller memories, there also is the challenge of moving data to and from those memories more quickly. In the networking market, speeds of 100 GBytes/sec are not uncommon, and at the high end they can exceed 150 GBytes/sec.

“The memory market is divided between high-end networking, the mobile space and the rest of the world,” said Gopal Raghavan, CTO of the SoC realization group at Cadence. “For the high-end networking market the big problem is raw bandwidth. You either need more DRAM or more I/Os. For the mobile space, you’re dealing with high-end graphics. And for the rest of the world, DDR4 will be sufficient because it has up to 50 GBytes/sec of bandwidth.”
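The roughly 50 GBytes/sec DDR4 figure quoted above can be sanity-checked with simple arithmetic. The configuration below, dual-channel DDR4-3200, is an assumption chosen for illustration; actual channel counts and speed bins vary by design.

```python
# Back-of-the-envelope peak bandwidth for an assumed dual-channel
# DDR4-3200 configuration (a standard JEDEC speed bin).

transfers_per_sec = 3_200_000_000   # DDR4-3200: 3200 mega-transfers/sec
bytes_per_transfer = 8              # one 64-bit channel = 8 bytes/transfer
channels = 2                        # assumed dual-channel configuration

peak_bw = transfers_per_sec * bytes_per_transfer * channels
print(peak_bw / 1e9)  # → 51.2 GB/s, in line with the quoted ~50 GBytes/sec
```

This is a theoretical peak; sustained bandwidth is lower once refresh, bank conflicts and scheduling overhead are accounted for, which is part of why the high-end networking market looks beyond DDR4.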

The high end of the market is focused on Wide I/O, which offers low power and high bandwidth. There also are hybrid schemes emerging, combining just enough memory for graphics with Wide I/O plus flash. But there is uncertainty at just about every level over what exactly will be the best formula for area, power and performance at the SoC level, which memory controllers will work with which architectures, and how all of this will be affected as more cores are added to processors and cache coherence becomes critical.

“The NoCs can relieve congestion on the high-end SoCs, and allow you to change frequency up and down and control it better,” said Raghavan. “And there are tools for memory access patterns and modeling traffic that is latency sensitive or latency insensitive. But what’s more difficult is how you prioritize traffic and optimize all the IP. The IP business is definitely getting more interesting.”

Even the NoC part of the IP business is getting more interesting. Kurt Shuler, vice president of marketing at Arteris, said one of the new approaches for network-on-chip technology is to use one large NoC and break it into smaller NoCs, or take smaller NoCs and combine them into one large NoC.

“This allows derivatives or the integration of IP into a chip, and then you can chop off the part and glue it onto another one,” Shuler said.

From a memory standpoint, this becomes interesting because it paves the way for more complete subsystems that include digital logic, PHY, I/O and memory—and a quick way to attach the subsystem to something else.

The history of semiconductors has been about reducing bottlenecks and solving problems, and memory has been both a bottleneck and a solution over the past few decades. New approaches in stacking memory, new memory architectures and materials, and new I/O solutions are all under development.

Which approaches win will vary by market. Highly cost-sensitive applications will use the least expensive solution. High-performance computing and high-end networking will use the best-performing technologies. What isn’t clear, yet, is what direction the market in between—the high-end smartphones and tablets and some of those just below that threshold—will take. But there will certainly be plenty of people paying attention, because any advances in this market will affect the rest of the SoC design, from architecture all the way to manufacturing.