Why compute architectures will start wrapping around the memory rather than the processor.
For decades, the starting point for compute architectures was the processor. In the future, it likely will be the DRAM architecture.
Dynamic random access memory has always played a big role in computing. Since IBM’s Robert Dennard invented DRAM in 1966, it has become the gold standard for off-chip memory. It’s fast, cheap, and reliable, and at least until about 20nm it scaled quite nicely.
There is plenty of debate about what comes next on the actual DRAM roadmap, whether that is sub-20nm DRAM or 3D DRAM. DRAM makers are under constant pressure to shrink features and increase density, but there are limits. That helps explain why there is no DDR5 on the horizon, and why LPDDR5 is the last in line for mobile devices.
All of this ties directly into compute architectures, where the next shift may be less about the process used to create the memory than where the memory is placed, how it is packaged, and whether a smaller form factor is useful.
There are several options on the table in this area. The first, the Hybrid Memory Cube (HMC), packs up to eight DRAM chips on top of a logic layer, all connected with through-silicon vias and microbumps. This is an efficient packaging approach, and it has proven significantly faster than the dual in-line memory modules (DIMMs) found in most computers and mobile devices. But it’s also proprietary and may never achieve the kinds of economies of scale that DRAM is known for.
HMC was introduced in 2011, but systems using these chips didn’t start rolling out commercially until last year. The problem for HMC is that the second generation of high-bandwidth memory, a rival approach, also began rolling out last year. HBM likewise packs up to eight DRAM chips and connects them to the processor using a silicon interposer. HBM has a couple of important advantages, though. First, it is a JEDEC standard. And second, there are currently two commercial sources for these chips—SK Hynix and Samsung.
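As a rough illustration of why stacked memory matters, consider peak bandwidth. The figures below are commonly cited interface specs, not numbers from this article: a single 64-bit DDR4-3200 channel versus an HBM2 stack with a 1024-bit interface running at roughly 2 Gb/s per pin.

```python
# Back-of-envelope peak-bandwidth comparison. The interface widths and
# data rates here are commonly cited specs, assumed for illustration.

def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s = (bus width in bits / 8) * per-pin rate in Gb/s."""
    return bus_width_bits / 8 * data_rate_gbps

ddr4_channel = peak_bandwidth_gbs(64, 3.2)    # one 64-bit DDR4-3200 channel
hbm2_stack   = peak_bandwidth_gbs(1024, 2.0)  # one HBM2 stack, 1024-bit interface

print(f"DDR4-3200 channel: {ddr4_channel:6.1f} GB/s")
print(f"HBM2 stack:        {hbm2_stack:6.1f} GB/s")
```

The wide but relatively slow stacked interface is the whole point: thousands of short vertical connections through an interposer or TSVs replace a narrow, board-length bus, delivering roughly an order of magnitude more bandwidth per device.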
A third approach, which Rambus is exploring, is to put DRAM on a single card that can be shared by racks of servers in a data center. The goal, as with the other memory approaches, is to limit the distance that huge amounts of data must travel back and forth to be processed. That has particular merit in the cloud world, where massive data centers need to minimize how far data travels.
The key to all of these approaches is recognizing that the processor is no longer the bottleneck in compute performance. The bottleneck is moving data between one or more processor cores and memory. Processor cores, whether they are CPUs, GPUs, MPUs or even DSPs, generally run fast enough for most applications if there is an open path to memory. Simply turning up the processor clock speed doesn’t necessarily improve performance, and the energy costs are significant. Those costs can be measured in data center operating expenses and mobile device battery life.
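The energy argument can be made concrete with a back-of-envelope comparison. The picojoule figures below are rough 45nm-era numbers often cited in the computer-architecture literature; they are assumptions for illustration, not data from this article.

```python
# Rough per-operation energy costs (picojoules), illustrative figures
# often quoted for ~45nm silicon; assumed here, not from this article.
ENERGY_PJ = {
    "32-bit integer add":        1.0,
    "32-bit on-chip SRAM read":  5.0,
    "32-bit off-chip DRAM read": 640.0,
}

compute = ENERGY_PJ["32-bit integer add"]
for op, pj in ENERGY_PJ.items():
    # Show each operation's cost relative to a simple arithmetic op.
    print(f"{op:26s} {pj:7.1f} pJ  ({pj / compute:.0f}x a simple add)")
```

Even if the exact numbers shift from node to node, the ratio is the story: fetching an operand from off-chip DRAM can cost hundreds of times the energy of computing with it, which is why shortening the path to memory pays off more than raising clock speeds.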
The two big knobs for boosting performance are more efficient software (a subject for another story), and faster movement of data in and out of memory. While multiple levels of embedded SRAM help improve processor performance for some basic functionality, the real heavy lifting on the memory side will continue to involve DRAM for the foreseeable future. That requires a change in memory packaging and I/O, but in the future it also will become a driver for new packaging approaches for entire systems, from the SoC all the way up to the end system format.
New memory types will come along to fill in the spaces between SRAM and DRAM—notably MRAM, ReRAM and 3D XPoint—but there will always be a need for a more efficient DRAM configuration. What will change is that entire chip architectures will begin to wrap around memories rather than processors, softening the impact of what arguably is one of the biggest shifts in the history of computing.