The different flavors of DRAM each fill a particular AI niche.
To meet the increasing demands of AI workloads, memory solutions must deliver ever-higher bandwidth, capacity, and power efficiency. From training massive large language models (LLMs) to running efficient inference on endpoint devices, choosing the right memory technology is critical for chip designers. This blog explores three leading memory solutions for AI accelerators: HBM, LPDDR, and GDDR.
Generative AI and LLMs have redefined computational requirements, with models exceeding a trillion parameters and demanding immense memory bandwidth for training. High Bandwidth Memory (HBM) has become the go-to solution for AI training, thanks to the bandwidth capabilities of its revolutionary 2.5D/3D architecture.
HBM4, the latest iteration nearing standardization by JEDEC, builds on the success of HBM3 and HBM3E. By doubling the data lines to 2,048 and supporting data rates up to 6.4 Gb/s (Gigabits per second), HBM4 can achieve a bandwidth of 1.6 TB/s (Terabytes per second) per device. An accelerator equipped with eight HBM4 devices can deliver over 13 TB/s of aggregate memory bandwidth; no other memory solution comes close.
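The arithmetic behind those headline numbers is simple enough to sketch in a few lines of Python, using only the interface width, data rate, and device count quoted above:

```python
# Peak HBM4 bandwidth from the figures quoted above.
DATA_LINES = 2048        # interface width in bits
DATA_RATE_GBPS = 6.4     # per-pin data rate in Gb/s
DEVICES = 8              # HBM4 stacks attached to the accelerator

per_device_tbs = DATA_LINES * DATA_RATE_GBPS / 8 / 1000  # Gb/s -> GB/s -> TB/s
aggregate_tbs = per_device_tbs * DEVICES

print(f"Per device: {per_device_tbs:.2f} TB/s")  # ~1.64 TB/s
print(f"Aggregate:  {aggregate_tbs:.1f} TB/s")   # ~13.1 TB/s
```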
The tradeoff is that HBM’s 2.5D/3D architecture introduces greater complexity and cost. The “2.5D” aspect refers to the use of a silicon interposer as the interconnect platform. The interposer is etched with the thousands of traces needed between the HBM devices and the accelerator, far more than can be implemented on a PCB. The HBM devices themselves are 3D stacks of DRAM chips, offering an extremely compact and power-efficient solution.
As generative AI capabilities extend beyond data centers to the edge, and ultimately to endpoint devices such as smartphones and laptops, Low-Power Double Data Rate (LPDDR) memory is a strong alternative for inference. LPDDR evolved from DDR technology with an emphasis on low power consumption without compromising bandwidth or capacity, making it a great fit for compact, power-constrained devices.
Like HBM, LPDDR places multiple DRAM die in a single package. In HBM, the DRAM stack is interconnected with Through-Silicon Vias (TSVs); LPDDR instead uses a stack of wirebonded DRAM devices, with configurations of up to 64 GB of memory in a multi-die package. For optimal inference performance, the entire model in use should be loaded into main memory, which makes LPDDR’s high-capacity packages very attractive.
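To see why that capacity matters, consider a rough weight-footprint estimate. This is a back-of-the-envelope sketch; the 13-billion-parameter model and the precisions chosen are illustrative assumptions, not figures for any specific product:

```python
def model_footprint_gb(params_billions: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights (excludes KV cache and activations)."""
    return params_billions * 1e9 * bytes_per_param / 2**30

# A hypothetical 13B-parameter model at two common precisions:
for dtype, nbytes in [("FP16", 2), ("INT8", 1)]:
    print(f"13B params @ {dtype}: {model_footprint_gb(13, nbytes):.1f} GB")
# FP16: ~24.2 GB, INT8: ~12.1 GB; both fit comfortably in a 64 GB LPDDR package
```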
From a bandwidth perspective, LPDDR5X delivers data rates of up to 8.533 Gb/s and, in a x64 configuration, can achieve an aggregate bandwidth of just over 68 GB/s (Gigabytes per second). The next evolution, LPDDR5T (“Turbo”), pushes data rates to 9.6 Gb/s, delivering 76.8 GB/s of aggregate bandwidth. With its compact form factor and energy efficiency, LPDDR5T enables endpoint AI solutions to process data-intensive tasks without sacrificing battery life.
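The same peak-bandwidth arithmetic used for HBM applies here; a quick sketch with the LPDDR figures above:

```python
def aggregate_gb_per_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: per-pin rate (Gb/s) times bus width, over 8 bits/byte."""
    return data_rate_gbps * bus_width_bits / 8

print(f"LPDDR5X x64: {aggregate_gb_per_s(8.533, 64):.1f} GB/s")  # ~68.3 GB/s
print(f"LPDDR5T x64: {aggregate_gb_per_s(9.6, 64):.1f} GB/s")    # 76.8 GB/s
```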
Graphics Double Data Rate (GDDR) memory is traditionally associated with GPUs, but its high bandwidth and low latency make it an excellent choice for AI inference workloads, particularly in edge servers and client PCs. GDDR7, the latest generation, published by JEDEC in 2024, sets a new benchmark in performance.
With data rates of 32 GT/s, and scalability to 48 GT/s in the future, GDDR7 delivers a bandwidth of 128 GB/s per device. By employing a novel PAM3 signaling scheme, which encodes three bits in every two unit intervals versus two bits for the NRZ signaling of its predecessor GDDR6, GDDR7 achieves a 50% improvement in data transmission efficiency. This makes it an ideal solution for real-time inference tasks requiring rapid processing of text, images, video, and more.
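A short sketch of where the 50% figure and the per-device bandwidth come from; the x32 interface width is the standard GDDR device configuration and is assumed here:

```python
# NRZ moves 2 bits per 2 unit intervals (1 bit/UI); PAM3 uses three
# voltage levels to move 3 bits across every 2 unit intervals.
nrz_bits_per_2ui = 2
pam3_bits_per_2ui = 3
gain_pct = (pam3_bits_per_2ui / nrz_bits_per_2ui - 1) * 100
print(f"PAM3 gain over NRZ: {gain_pct:.0f}%")  # 50%

# Per-device bandwidth at the quoted 32 GT/s on an assumed x32 interface:
print(f"GDDR7 per device: {32 * 32 / 8:.0f} GB/s")  # 128 GB/s
```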
In addition to higher speed, GDDR7 introduces advanced RAS (Reliability, Availability, Serviceability) features, including on-die ECC, error scrubbing, and command address parity. These enhancements ensure data integrity, a critical factor as AI accelerators push the limits of performance.
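To give a flavor of what one of these features does, here is a toy illustration of the idea behind command/address parity (a minimal sketch, not the actual GDDR7 encoding): a single parity bit lets the receiver detect any single-bit flip on the command/address bus.

```python
def parity_bit(word: int) -> int:
    """Even parity over the bits of word (toy model, not the real GDDR7 CA scheme)."""
    p = 0
    while word:
        p ^= word & 1
        word >>= 1
    return p

ca_word = 0b1011001110                # a made-up command/address pattern
sent_parity = parity_bit(ca_word)

received = ca_word ^ (1 << 4)         # simulate a single-bit flip in flight
ok = parity_bit(received) == sent_parity
print("CA parity check passed:", ok)  # False -> the error is detected
```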
For chip designers, the choice of memory depends on the target application and its performance requirements: HBM for bandwidth-hungry training in the data center, LPDDR for power- and space-constrained endpoint inference, and GDDR for inference in edge servers and client PCs.
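As a compact recap, here is a hypothetical lookup that simply restates the recommendations in this post:

```python
# Hypothetical mapping that restates this post's guidance; not an exhaustive rule.
MEMORY_FIT = {
    "data-center training":  "HBM4 (highest bandwidth; 2.5D/3D cost and complexity)",
    "endpoint inference":    "LPDDR5X/5T (low power, high capacity per package)",
    "edge/client inference": "GDDR7 (high bandwidth, low latency, standard PCB)",
}

for workload, memory in MEMORY_FIT.items():
    print(f"{workload:22s} -> {memory}")
```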
Whatever your choice, Rambus offers a memory controller that can be paired with your desired PHY, whether internally developed or sourced from a third party. Rambus can also provide full integration services to ensure a first-time-right implementation of the entire memory subsystem.