How the scaling of memory bandwidth and capacity keeps system performance improving.
Moore’s Law, the observation that the number of transistors in an integrated circuit doubles roughly every two years, has driven the semiconductor and IT industries to unparalleled growth over the last 50+ years.
These transistors have been used in CPUs to increase the number of parallel execution units and instruction fetches, expand the levels and overall capacity of on-chip cache, and support speculative/predictive execution as well as out-of-order and multi-threading hardware. Moore’s Law has also enabled integrated memory controller support, the expansion of instruction sets for acceleration, the re-architecture of the CPU into a multi-core system, and the addition of IO and co-processor interfaces. Put simply, Moore’s Law has allowed the cost per unit of compute to decline at an astounding rate and has given rise to server virtualization and the use of these virtualized servers in the cloud.
But realizing actual system performance improvements proportional to the raw CPU computational increase requires a balanced system. In 1967, Gene Amdahl theorized that the speedup of parallel computers interconnected to accomplish a workload is ultimately limited by the portions of the workload that are serial in nature (e.g. data management housekeeping). He concluded that the effort spent achieving high parallel processing rates is wasted unless there is an equivalent speedup in the serial portion as well. Similarly, in today’s multi-core, multi-threaded processors running parallel tasks and parallel virtual servers, the external memory interface bandwidth and capacity are shared resources that can limit the overall speedup achieved unless they also follow the same growth curve as Moore’s Law.
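To make that ceiling concrete, here is a minimal sketch of Amdahl’s reasoning in Python, using an assumed 95% parallel fraction purely for illustration: no matter how many execution units are added, the serial fraction bounds the achievable speedup.

```python
# Illustrative sketch of Amdahl's Law: overall speedup is capped by the serial fraction.
def amdahl_speedup(parallel_fraction: float, n_units: int) -> float:
    """Speedup when only `parallel_fraction` of the work scales across n_units."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)

if __name__ == "__main__":
    # Even with 95% of the workload parallelized, adding cores approaches a hard 20x ceiling.
    for cores in (1, 4, 16, 64, 256):
        print(f"{cores:4d} cores -> speedup {amdahl_speedup(0.95, cores):6.2f}x")
```

The same shape applies when the shared resource is the memory interface rather than a serial code path: the portion of the work gated by memory bandwidth or capacity cannot be accelerated by adding cores alone.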
Memory for servers (as well as for desktops and workstations) is procured and deployed in the form of dual in-line memory modules (DIMMs). This form factor fits the application space well: it integrates 8-18 DRAM chips in parallel to achieve suitable system bandwidth and a minimum capacity, allows for serviceability if a module develops ECC errors or goes bad, and allows for expandability by populating previously unused slots or upgrading existing DIMMs. Traditionally, DIMMs have been composed only of DRAMs, but other memory types such as Flash and 3D XPoint are on the horizon. On the DIMM itself, multiple “ranks” of DRAM (additional sets of 8-18 DRAM chips that share the same data, control, and address signal lines as rank 0) can be used to increase the capacity of the DIMM. Furthermore, multiple DIMMs can be placed on a single CPU memory channel to increase overall capacity still further. With the rise of in-memory compute, database applications, and low-latency data analytics, maximizing the amount of server memory per CPU is highly desirable.
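As a rough illustration of how these multipliers compound, the sketch below works through the arithmetic with assumed values (16 Gbit DRAMs, dual-rank DIMMs, two DIMMs per channel, eight channels per socket); none of these figures describe a specific product.

```python
# Back-of-the-envelope sketch of how ranks, DIMMs, and channels multiply into
# total capacity per CPU socket. All parameter values are assumptions for
# illustration, not the specifications of any particular platform.
DRAM_DENSITY_GBIT   = 16  # capacity of one DRAM chip, in gigabits (assumed)
DATA_CHIPS_PER_RANK = 16  # x4 DRAMs covering a 64-bit data bus; ECC chips not counted (assumed)
RANKS_PER_DIMM      = 2   # dual-rank DIMM (assumed)
DIMMS_PER_CHANNEL   = 2   # two DIMM slots sharing one channel (assumed)
CHANNELS_PER_CPU    = 8   # memory channels per CPU socket (assumed)

rank_capacity_gb   = DRAM_DENSITY_GBIT * DATA_CHIPS_PER_RANK / 8   # 32 GB per rank
dimm_capacity_gb   = rank_capacity_gb * RANKS_PER_DIMM             # 64 GB per DIMM
socket_capacity_gb = dimm_capacity_gb * DIMMS_PER_CHANNEL * CHANNELS_PER_CPU

print(f"Per-rank capacity:   {rank_capacity_gb:.0f} GB")
print(f"Per-DIMM capacity:   {dimm_capacity_gb:.0f} GB")
print(f"Per-socket capacity: {socket_capacity_gb:.0f} GB")
```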
While this topology and form factor are very flexible and allow for a range of memory system capacities, the capacity increases come at the cost of loading down the fixed number of channels (composed of data, control, and address signals) per CPU and reducing the achievable bandwidth per channel. Adding memory buffer chips on the DIMM to isolate the full DIMM load from the CPU channel can break this fundamental tradeoff and enable high system capacity and bandwidth simultaneously. This technology was first introduced in standard DIMMs during the SDRAM era of the ’90s and started off as a simple clock repeater.
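The sketch below illustrates the shape of that tradeoff with hypothetical numbers: an unbuffered channel is assumed to derate its transfer rate as more DIMM loads are attached, while a buffered channel is assumed to hold its top speed. The specific rates are placeholders, not vendor specifications.

```python
# Illustrative sketch of the capacity-vs-bandwidth tradeoff on a shared memory
# channel. The derated transfer rates below are hypothetical placeholders; the
# point is the shape of the tradeoff, not the exact numbers.
BUS_WIDTH_BYTES = 8  # 64-bit data bus per channel

# Assumed achievable transfer rate (MT/s) versus the number of unbuffered DIMM
# loads sitting on the channel.
unbuffered_rate_mts = {1: 3200, 2: 2666, 3: 2133}

# With a buffer chip on each DIMM isolating the DRAM load from the channel,
# the channel is assumed (in this simplified sketch) to hold its top speed
# regardless of population.
buffered_rate_mts = {1: 3200, 2: 3200, 3: 3200}

def channel_bandwidth_gbs(rate_mts: int) -> float:
    """Peak bandwidth of one channel in GB/s for a given transfer rate."""
    return rate_mts * BUS_WIDTH_BYTES / 1000

for dimms in (1, 2, 3):
    unbuf = channel_bandwidth_gbs(unbuffered_rate_mts[dimms])
    buf = channel_bandwidth_gbs(buffered_rate_mts[dimms])
    print(f"{dimms} DIMM(s)/channel: unbuffered ~{unbuf:.1f} GB/s, buffered ~{buf:.1f} GB/s")
```

In this simplified model, populating more DIMMs per channel grows capacity linearly while the unbuffered channel gives up bandwidth, which is exactly the tradeoff the buffer chip is meant to break.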
In conclusion, the complexity, capability, and value of memory buffer chips have evolved over the last 20 years, and their importance will continue to grow as the industry tries to sustain Moore’s Law and the requisite scaling of memory bandwidth and capacity.