Why compute architectures will start wrapping around the memory rather than the processor.
For decades, the starting point for compute architectures was the processor. In the future, it likely will be the DRAM architecture.
Dynamic random access memory always has played a big role in computing. Since IBM's Robert Dennard invented DRAM back in 1966, it has become the gold standard for off-chip memory. It's fast, cheap, and reliable, and at least until about 20nm it scaled quite nicely.
There is plenty of debate about what comes next on the actual DRAM roadmap, whether that is sub-20nm DRAM or 3D DRAM. DRAM makers are under constant pressure to shrink features and increase density, but there are limits. That helps explain why there is no DDR5 on the horizon, and why LPDDR5 is the last in line for mobile devices.
All of this ties directly into compute architectures, where the next shift may be less about the process used to create the memory than about where the memory is placed, how it is packaged, and whether a smaller form factor is useful.
There are several options on the table in this area. The first, the Hybrid Memory Cube (HMC), stacks up to eight DRAM chips on top of a logic layer, all connected with through-silicon vias and microbumps. This is an efficient packaging approach, and it has proven significantly faster than the dual in-line memory modules (DIMMs) found in most computers and mobile devices. But it's also proprietary, and it may never achieve the kinds of economies of scale that DRAM is known for.
HMC was introduced in 2011, but systems using these chips didn’t start rolling out commercially until last year. The problem for HMC is that the second generation of high-bandwidth memory, a rival approach, also began rolling out last year. HBM likewise packs up to eight DRAM chips and connects them to the processor using a silicon interposer. HBM has a couple of important advantages, though. First, it is a JEDEC standard. And second, there are currently two commercial sources for these chips—SK Hynix and Samsung.
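Some rough arithmetic shows why stacking is so attractive. The figures below are generic, commonly cited numbers for an HBM2-class stack and a DDR4-2400-class DIMM, used here as assumptions rather than any vendor's datasheet values:

```c
/* Back-of-the-envelope peak bandwidth: a wide, slow-per-pin stacked
 * interface versus a narrow DIMM interface. All figures are generic
 * assumptions for HBM2-class and DDR4-2400-class parts. */
#include <stdio.h>

int main(void) {
    double hbm_gbs  = 1024 * 2.0 / 8.0;  /* 1024 pins x 2 Gbit/s = 256 GB/s per stack  */
    double dimm_gbs = 64 * 2.4 / 8.0;    /* 64 pins x 2.4 Gbit/s = 19.2 GB/s per module */

    printf("HBM2-class stack: %.1f GB/s\n", hbm_gbs);
    printf("DDR4-2400 DIMM:   %.1f GB/s\n", dimm_gbs);
    printf("Ratio:            %.1fx\n", hbm_gbs / dimm_gbs);
    return 0;
}
```

The wide-but-slow interface is only practical because the interposer or TSV stack keeps those 1,024 traces short, which is the packaging argument in a nutshell.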
A third approach, which Rambus is exploring, is to put DRAM on a single card that can be shared by racks of servers in a data center. The goal, as with the other memory approaches, is to limit the distance that huge amounts of data have to travel back and forth to be processed. That has particular merit in the cloud world, where huge data centers need ways to minimize how far data travels.
The key in all of these approaches is understanding that it isn't the processor that is the bottleneck in compute performance anymore. It's the movement of data between one or more processor cores and memory. Processor cores, regardless of whether they are CPUs, GPUs, MPUs or even DSPs, generally run fast enough for most applications if there is an open path to memory. Just turning up the clock speed on processors doesn't necessarily improve performance, and the energy costs are significant. Those costs can be measured in data center operating costs and mobile device battery life.
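A minimal sketch in C makes the bottleneck visible. The sizes and repetition counts are illustrative assumptions, absolute results will vary widely by machine, and it should be compiled with optimization (e.g., gcc -O2). The same streaming sum typically runs several times faster when the working set fits in on-chip SRAM than when it spills into DRAM:

```c
/* Minimal sketch: identical arithmetic slows down sharply once the
 * working set spills out of on-chip SRAM (caches) into DRAM.
 * Sizes and repetition counts are illustrative assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double sum_pass(const double *a, size_t n, int reps) {
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int r = 0; r < reps; r++)
        for (size_t i = 0; i + 4 <= n; i += 4) {
            s0 += a[i];     /* four accumulators hide FP-add latency, */
            s1 += a[i + 1]; /* so memory, not the adder, is the limit */
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
    return s0 + s1 + s2 + s3;
}

static void bench(size_t bytes, int reps, const char *label) {
    size_t n = bytes / sizeof(double);
    double *a = malloc(bytes);
    if (!a) return;
    for (size_t i = 0; i < n; i++) a[i] = 1.0;

    clock_t t0 = clock();
    volatile double s = sum_pass(a, n, reps);  /* volatile: keep the work */
    double sec = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("%s: %.2f GB/s (checksum %.0f)\n",
           label, (double)bytes * reps / sec / 1e9, s);
    free(a);
}

int main(void) {
    bench(32 * 1024, 100000, "cache-resident, 32 KB ");  /* fits in L1/L2 */
    bench(256 * 1024 * 1024, 8, "DRAM-resident, 256 MB");  /* spills to DRAM */
    return 0;
}
```

The arithmetic is identical in both runs; only the distance to the data changes.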
The two big knobs for boosting performance are more efficient software (a subject for another story), and faster movement of data in and out of memory. While multiple levels of embedded SRAM help improve processor performance for some basic functionality, the real heavy lifting on the memory side will continue to involve DRAM for the foreseeable future. That requires a change in memory packaging and I/O, but in the future it also will become a driver for new packaging approaches for entire systems, from the SoC all the way up to the end system format.
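As a small illustration of the software knob, the sketch below walks the same 64MB array twice (the size is an arbitrary assumption chosen to overflow on-chip caches). The row-major walk uses every byte of each cache line and DRAM burst it fetches; the strided column-major walk uses one float per line and pays for the rest in wasted bandwidth:

```c
/* Minimal sketch of the software knob: identical arithmetic, different
 * access order, very different DRAM traffic. Sizes are illustrative
 * assumptions chosen to overflow on-chip caches. */
#include <stdio.h>
#include <time.h>

#define N 4096
static float m[N][N];   /* 64 MB: far larger than any on-chip cache */

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            m[i][j] = (float)(i + j);

    float s1 = 0.0f, s2 = 0.0f;

    /* Row-major walk: consecutive addresses, every byte of each
     * cache line and DRAM burst gets used. */
    clock_t t0 = clock();
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s1 += m[i][j];
    double row_sec = (double)(clock() - t0) / CLOCKS_PER_SEC;

    /* Column-major walk: 16 KB stride, one float used per line fetched. */
    t0 = clock();
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s2 += m[i][j];
    double col_sec = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("row-major %.3fs  column-major %.3fs  (sums %.0f %.0f)\n",
           row_sec, col_sec, s1, s2);
    return 0;
}
```

Timing each loop nest typically shows a severalfold gap, even though the work done per element is identical.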
New memory types will come along to fill in the spaces between SRAM and DRAM—notably MRAM, ReRAM and 3D XPoint—but there will always be a need for a more efficient DRAM configuration. What will change is that entire chip architectures will begin to wrap around memories rather than processors, softening the impact of what arguably is one of the biggest shifts in the history of computing.
Hi Ed,
You are 1000x right.
Stanford, Berkeley, and CMU jointly published on this recently: "Stanford-led skyscraper-style chip design boosts electronic performance by factor of a thousand," Stanford Report, December 9, 2015.
Best Regards, Zvi
P.S. The "skyscraper-style chip design" is a monolithic 3D IC.
Logic in memory has been around since the 1960s. Search on "logic in memory." See the PEPE system developed for ballistic missile defense. In the 1970s we toyed with the idea of fitting logic onto DRAM chips at the sense-amp outputs to do vector processing, or of stacking DRAM with logic chips. See Dave Patterson's 1990s IRAM project at Berkeley. The challenge in all of these is that commodity DRAM is very cheap, so there has to be a huge performance gain to justify non-commodity memory or eDRAM.
Actually, none of MRAM, ReRAM, or 3D XPoint fills the gap between SRAM and DRAM. DRAM will fill the gap between SRAM and those new (slower) memory technologies.
Basically, RAM comes in two types, DRAM and SRAM. One is dynamic and the other is static. Both hold data, but in different ways, and static is faster than dynamic. Also, the information given here is very useful and informative.