The Power of HBM3 Memory for AI Training Hardware

The third generation of HBM boosts bandwidth while improving power efficiency and memory access.


AI training data sets are constantly growing, driving the need for hardware accelerators capable of handling terabyte-scale bandwidth. Among the array of memory technologies available, High Bandwidth Memory (HBM) has emerged as the memory of choice for AI training hardware, with the most recent generation, HBM3, delivering unrivaled memory bandwidth. Let’s take a closer look at this important memory technology.

HBM reached new performance heights with the introduction of HBM3 in January 2022. Like previous generations, HBM3 is based on a high-performance 2.5D/3D memory architecture and uses a wide 1024-bit data path. These features enable HBM3 to operate at 6.4 Gigabits per second (Gb/s), resulting in an outstanding bandwidth of 819 Gigabytes per second (GB/s). The combination of high bandwidth, excellent capacity, and a compact footprint positions HBM3 ideally for demanding AI workloads.
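The headline bandwidth number follows directly from the interface width and per-pin data rate. As a quick sanity check (a plain arithmetic sketch, not vendor tooling):

```python
# Per-device HBM3 bandwidth: 1024-bit data path x 6.4 Gb/s per pin,
# converted from gigabits to gigabytes.
DATA_WIDTH_BITS = 1024
DATA_RATE_GBPS = 6.4  # gigabits per second, per pin

bandwidth_gbits = DATA_WIDTH_BITS * DATA_RATE_GBPS  # total Gb/s
bandwidth_gbytes = bandwidth_gbits / 8              # total GB/s

print(f"{bandwidth_gbytes:.1f} GB/s")  # 819.2 GB/s
```

Rounded down, that is the 819 GB/s figure quoted above.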

So, what exactly is a “2.5D/3D” architecture? The “3D” aspect is clear, as HBM memory is a stacked 3D structure of Dynamic Random-Access Memory (DRAM) within a packaged device. However, the “2.5D” dimension is related to how HBM memory devices interconnect with processing chips, whether they are Graphics Processing Units (GPUs) or AI accelerators. Herein lies the challenge: the data path between each HBM memory device and the processor requires 1024 “wires” or traces. With the addition of command and address, clocks, etc., the number of traces necessary grows to over 1,700. This exceeds what can be implemented on a standard Printed Circuit Board (PCB). To bridge this gap, a silicon interposer serves as the platform to make the essential connections between memory devices and processors. Like an intricate pattern etched on an integrated circuit, finely spaced traces are etched into the silicon interposer, allowing for the creation of the required network of wires in the HBM interface. The HBM device(s) and the processor are mounted on top of this interposer, creating a 2.5D architecture.

HBM3 is the “third generation” of the HBM standard (if we count HBM2E as an extension rather than a full generation update) continuing the upward trend in data rate, 3D stack height, and DRAM chip density. These advances translate to higher bandwidth and increased memory capacity. The HBM journey began with a 1 Gb/s data rate and a maximum of 8-high 3D stacks of 16 Gb devices. In contrast, HBM3 boasts a data rate of 6.4 Gb/s and the ability to support 16-high stacks of 32 Gb capacity DRAM. Furthermore, leading DRAM manufacturers have introduced “HBM3E” devices that push data rates even further, promising data rates of over 9 Gb/s. Another way to get more bandwidth is to increase the number of memory devices per accelerator, and chip architects are doing that too, pushing to higher attach rates in their designs. A configuration with six HBM3E devices each operating at 9.6 Gb/s, for example, would deliver a massive 7.4 terabytes per second (TB/s) of memory bandwidth.
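The six-device example above can be generalized: aggregate bandwidth scales linearly with both the per-pin data rate and the number of stacks attached to the accelerator. A small helper (illustrative only; the function name and defaults are this sketch's own) makes the arithmetic explicit:

```python
def hbm_bandwidth_tbps(num_devices: int, data_rate_gbps: float,
                       width_bits: int = 1024) -> float:
    """Aggregate memory bandwidth in TB/s for a set of HBM stacks.

    Each stack contributes (width_bits * data_rate_gbps / 8) GB/s.
    """
    per_device_gbytes = width_bits * data_rate_gbps / 8
    return num_devices * per_device_gbytes / 1000  # GB/s -> TB/s

# Six HBM3E devices, each running at 9.6 Gb/s:
print(f"{hbm_bandwidth_tbps(6, 9.6):.1f} TB/s")  # 7.4 TB/s
```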

However, it’s not only greater bandwidth that HBM3 delivers. It also introduces significant advancements in power efficiency, memory access, and Reliability, Availability, Serviceability (RAS) compared to its predecessor, HBM2E. HBM3 reduces core voltage from HBM2E’s 1.2V to 1.1V, and lowers IO signaling to 400mV. These voltage reductions translate into better power efficiency, helping to offset the increased power consumption inherent in moving to higher data rates. The channel architecture of HBM3 is another major change. It divides the 1024-bit wide data path into sixteen 64-bit channels, or thirty-two 32-bit pseudo-channels. This effectively doubles the number of memory channels relative to HBM2E’s structure of eight 128-bit channels and sixteen 64-bit pseudo-channels, improving performance through greater access parallelism. HBM3 also introduces important RAS features: it incorporates additional host-side and device-side Error Correcting Code (ECC) mechanisms and supports Refresh Management (RFM) and Adaptive Refresh Management (ARFM).
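Note that every channel layout described above carves up the same 1024-bit data path; only the granularity changes between generations. A short sketch verifying the arithmetic:

```python
# HBM2E vs. HBM3 channel organization: each layout spans the full
# 1024-bit data path, but HBM3 halves the channel width and so
# doubles the channel count.
layouts = {
    "HBM2E channels":        (8, 128),   # count, width in bits
    "HBM2E pseudo-channels": (16, 64),
    "HBM3 channels":         (16, 64),
    "HBM3 pseudo-channels":  (32, 32),
}

for name, (count, width) in layouts.items():
    assert count * width == 1024  # always the full data path
    print(f"{name}: {count} x {width}-bit")
```

More, narrower channels let the memory controller keep more independent accesses in flight, which is where the performance gain comes from.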

All of the above makes HBM3 an indispensable memory technology for state-of-the-art AI accelerators, catering to the demands of ever-expanding AI training and other data-intensive workloads. The Rambus HBM3 Memory Controller delivers a data rate of 9.6 Gb/s, supporting the continued evolution of HBM3 beyond the 6.4 Gb/s benchmark. The controller can be combined with a broad variety of third-party HBM3 PHYs, with full support provided for both the controller and the controller/PHY integration.
