Driving Memory Beyond DDR4

Total memory bandwidth will need to increase about 33% per year to keep pace with processor improvements.


While attending two recent technology trade shows, the Intel Developer Forum (IDF) in August and last week’s ARM TechCon, I took part in many interesting discussions about server performance, power consumption, and memory bandwidth and capacity. The race to introduce higher-performing servers that consume less power is fueled by growing demand for new applications in the enterprise, communications, storage, and cloud computing markets.

Around the time of IDF, a number of new servers were introduced using Intel’s Xeon E5-2600 v3 processors (the Grantley platform, with up to 18 cores). These servers (from HP, Dell, Lenovo and IBM) are the first to take advantage of DDR4 memory.

DDR4 memory improves server platform performance on memory-intensive workloads, with up to 1.4x higher bandwidth than previous generations. DDR4 raises the maximum data transfer rate to 3.2Gbps per pin and lowers the core voltage to 1.2V, which reduces power consumption compared with DDR3, the primary memory solution used in servers today.

In addition to the Intel-based server announcements in August, last week at ARM TechCon AppliedMicro and Hewlett-Packard debuted the first commercially available 64-bit ARMv8 server. The new server, targeted at Web caching workloads, is based on AppliedMicro’s X-Gene SoC. The X-Gene1 chip has up to eight cores running at 2.4 GHz, four DDR3 memory channels, and two 10 Gbps Ethernet NICs embedded on the SoC. Higher memory bandwidth is achieved by using four memory channels, each with two DDR3L-1600 SO-DIMMs, for a total of eight 8GB DIMMs per server cartridge. Although this platform uses DDR3, the X-Gene cores scale well (according to HP) when the platform is upgraded to DDR4, which allows for even higher memory bandwidth with lower power consumption.
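To put the four-channel configuration in perspective, here is a rough back-of-the-envelope sketch of its theoretical peak bandwidth and capacity. It assumes a 64-bit data bus per channel (ECC bits excluded), which the announcement does not state, and the DDR4-3200 line is a hypothetical upgrade at the maximum DDR4 rate rather than a configuration HP has described. The function name is illustrative only.

```python
# Assumption: each memory channel has a 64-bit (8-byte) data bus, ECC excluded.
BUS_WIDTH_BYTES = 8


def peak_bandwidth_gb_s(transfer_rate_mt_s: float, channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10^9 bytes)."""
    return transfer_rate_mt_s * 1e6 * BUS_WIDTH_BYTES * channels / 1e9


# Four channels of DDR3L-1600, as in the X-Gene server cartridge.
ddr3l_1600 = peak_bandwidth_gb_s(1600, channels=4)

# Hypothetical: the same four channels running at the DDR4 maximum of 3200 MT/s.
ddr4_3200 = peak_bandwidth_gb_s(3200, channels=4)

# Capacity: channels x DIMMs per channel x GB per DIMM.
capacity_gb = 4 * 2 * 8

print(f"DDR3L-1600 x4 channels: {ddr3l_1600:.1f} GB/s peak")
print(f"DDR4-3200  x4 channels: {ddr4_3200:.1f} GB/s peak")
print(f"Capacity per cartridge: {capacity_gb} GB")
```

These are ceiling figures; sustained bandwidth in a real system will be lower.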

Given this backdrop of server announcements, I wasn’t surprised that the Rambus ‘Beyond DDR4’ demonstration at ARM TechCon generated many good discussions and much speculation about the future of memory subsystems in servers. Our ‘Beyond DDR4’ silicon demonstration shows a multi-rank, multi-DIMM configuration with data transfers up to 6.4Gbps, targeted at next-generation server memory systems. The memory interface runs at about 3x the speed of current DIMMs, which top out at 2.133Gbps, and 2x the maximum speed specified for DDR4 (3.2Gbps). In addition to the significant performance improvements, the memory system provides a 25% improvement in power efficiency.
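The speed comparisons above are simple ratios of per-pin data rates. A quick sketch, using only the three figures quoted in the paragraph (the variable names are illustrative):

```python
# Per-pin data rates quoted above, in Gbps.
BEYOND_DDR4_GBPS = 6.4
CURRENT_DIMM_GBPS = 2.133
DDR4_MAX_GBPS = 3.2

ratio_vs_current = BEYOND_DDR4_GBPS / CURRENT_DIMM_GBPS
ratio_vs_ddr4_max = BEYOND_DDR4_GBPS / DDR4_MAX_GBPS

print(f"vs. current DIMMs: {ratio_vs_current:.2f}x")
print(f"vs. DDR4 maximum:  {ratio_vs_ddr4_max:.2f}x")
```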

How is this level of performance achieved? First, the system uses low-swing single-ended signaling that is compatible with current industry-standard DDRx I/O designs. Maintaining compatibility is important because it keeps server memory designs on the current industry roadmap, rather than requiring a change to other memory solutions such as Hybrid Memory Cube (HMC) or High Bandwidth Memory (HBM). The system also uses a dynamic point-to-point topology, which means all data signals between the CPU and the DRAMs are point-to-point. This allows the memory bus to run at maximum data rates even when the memory channel is fully loaded, which is not the case with a multi-drop topology.

The 25% power savings are attributable to several factors. The low-swing signaling reduces the I/O power required on the interface. In addition, the design is ‘asymmetric,’ meaning that the complex timing and equalization circuits are all implemented in the PHY, which greatly simplifies the DRAM interface and reduces cost. Removing complex timing circuits like PLLs and DLLs from the DRAM makes it extremely agile, allowing it to enter and exit power-down mode rapidly. Because the memory controller is the originator of all memory requests, it can implement a very aggressive and granular DRAM power management scheme.

As we look to server requirements over the next five years, it is estimated that total memory bandwidth will need to increase ~33% per year to keep pace with processor improvements. Given this projection, DRAM would have to achieve speeds of over 12Gbps in 2020! Although this is roughly a 4x speed increase over the current DDR4 standard, the Rambus ‘Beyond DDR4’ silicon shows that traditional DRAM signaling still has plenty of headroom for growth and that these speeds are possible.
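The projection is a compound-growth calculation. The sketch below assumes the starting point is the DDR4 maximum of 3.2Gbps per pin and that the full ~33% annual increase must come from per-pin data rate alone; neither assumption is stated in the estimate itself, and the function names are illustrative.

```python
def projected_rate_gbps(start_gbps: float, annual_growth: float, years: int) -> float:
    """Compound a per-pin data rate forward by a fixed annual growth rate."""
    return start_gbps * (1 + annual_growth) ** years


def years_to_exceed(start_gbps: float, annual_growth: float, target_gbps: float) -> int:
    """Number of whole years of compounding before the rate exceeds the target."""
    years = 0
    rate = start_gbps
    while rate <= target_gbps:
        rate *= 1 + annual_growth
        years += 1
    return years


# Assumptions: start from the DDR4 maximum (3.2 Gbps), grow 33% per year.
START_GBPS = 3.2
GROWTH = 0.33

rate_after_five = projected_rate_gbps(START_GBPS, GROWTH, 5)
years_needed = years_to_exceed(START_GBPS, GROWTH, 12.0)

print(f"After 5 years: {rate_after_five:.1f} Gbps")
print(f"Years to exceed 12 Gbps: {years_needed}")
```

Under these assumptions, five years of 33% growth takes the per-pin rate to roughly 13.3Gbps, consistent with the "over 12Gbps" figure. A different starting rate or time span would shift the result.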