It’s Official: HBM3 Dons The Crown Of Bandwidth King

How the latest version of the High Bandwidth Memory standard is keeping up with increasingly demanding applications.


With the publication of the HBM3 update to the High Bandwidth Memory (HBM) standard, a new king of bandwidth is crowned. The torrid performance demands of advanced workloads, with AI/ML training leading the pack, drive the need for ever faster delivery of bits. Memory bandwidth is a critical enabler of computing performance; hence the accelerated evolution of the standard, with HBM3 setting the new benchmark.

Here’s what HBM3 offers:

  • Delivers, first and foremost, a higher data rate. HBM3 raises the per-pin data rate to 6.4 Gigabits per second (Gb/s), a 78% increase over the 3.6 Gb/s data rate of HBM2E.
  • Keeps the 1024-bit wide interface of previous generations. Bandwidth is the product of data rate and interface width: 6.4 Gb/s × 1024 bits = 6,553.6 Gb/s. Dividing by 8 bits/byte yields the 819 Gigabytes per second (GB/s) of bandwidth possible between a host processor and a single HBM3 DRAM device.
  • Doubles the number of memory channels to 16 and supports 32 pseudo channels (two pseudo channels per memory channel). With more memory channels, HBM3 can support taller stacks of DRAM per device and finer access granularity.
  • Supports 3D DRAM devices of up to 12-high stacks (with provision for a future extension to as high as 16 devices per stack) with device densities of up to 32Gb. A 12-high stack of 32Gb devices translates to a single HBM3 DRAM device of 48GB capacity.
  • Keeps the 2.5D architecture of host processor and HBM3 DRAM devices mounted on an interposer to support the routing of thousands of signal traces. So as with previous generations, HBM3 is a 2.5D/3D architecture.
  • Improves energy efficiency by dropping the operating voltage to 1.1V and by using low-swing 0.4V signaling.

Let’s roll it all up in a potential use case. A future AI accelerator implementation has six (6) HBM3 DRAM devices. At 6.4 Gb/s, total aggregate memory bandwidth is 4.9 Terabytes per second (TB/s), or 6 × 819 GB/s. Each device, a 12-high stack of 32Gb DRAMs, has a 48GB capacity, so the AI accelerator can access 288 GB of direct-attached HBM3 memory.
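The arithmetic behind these figures can be checked with a quick back-of-the-envelope calculation. The device parameters are taken from the article; the six-device accelerator is the article's own hypothetical example.

```python
# Back-of-the-envelope check of the HBM3 numbers discussed above.
DATA_RATE_GBPS = 6.4      # per-pin data rate, Gb/s
INTERFACE_WIDTH = 1024    # interface width in bits
STACK_HEIGHT = 12         # DRAM dies per 3D stack
DIE_DENSITY_GBIT = 32     # density per die, Gigabits
NUM_DEVICES = 6           # devices in the hypothetical AI accelerator

# Per-device bandwidth: data rate x interface width, bits -> bytes
per_device_bw_gbs = DATA_RATE_GBPS * INTERFACE_WIDTH / 8

# Per-device capacity: 12 dies x 32 Gb = 384 Gb -> GB
per_device_capacity_gb = STACK_HEIGHT * DIE_DENSITY_GBIT / 8

# Aggregate across the six devices
total_bw_tbs = NUM_DEVICES * per_device_bw_gbs / 1000
total_capacity_gb = NUM_DEVICES * per_device_capacity_gb

print(f"Per-device bandwidth: {per_device_bw_gbs:.1f} GB/s")   # 819.2 GB/s
print(f"Per-device capacity:  {per_device_capacity_gb:.0f} GB")  # 48 GB
print(f"Aggregate bandwidth:  {total_bw_tbs:.2f} TB/s")          # 4.92 TB/s
print(f"Aggregate capacity:   {total_capacity_gb:.0f} GB")       # 288 GB
```

The per-device result of 819.2 GB/s is the 819 GB/s figure quoted above, rounded down; six devices then scale to roughly 4.9 TB/s and 288 GB.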

That’s tremendous capability. HBM3 extends the track record of bandwidth performance set by what was originally dubbed the “slow and wide” HBM memory architecture. The motivation for the wide interface (which necessitated the higher-complexity 2.5D architecture) was to deliver high bandwidth at low power by running at low data rates. While the interface is still wide, HBM3 operating at 6.4 Gb/s is now quite fast, and all things being equal, higher speeds mean higher power. To compensate, HBM3 drops the operating voltage (the last bullet in our list above) for higher power efficiency.

But there is no free lunch, and lower voltage means lower design margin for what is already a challenging 2.5D design. Fortunately, Rambus has your back with our 8.4 Gb/s HBM3 Memory Subsystem that provides plenty of design headroom plus room to scale. To help you successfully harness the full potential of HBM3 memory, Rambus provides both interposer and package reference designs.

The Rambus memory subsystem includes a modular and highly configurable memory controller. The controller is optimized to maximize throughput and minimize latency, and its memory parameters are run-time programmable. With a pedigree of over 50 HBM2 and HBM2E customer implementations, it has demonstrated efficiency over a wide variety of configurations and data traffic scenarios.

While the road to higher performance is a journey and not a destination, the latest generation of HBM promises to deliver some very extraordinary capabilities. All hail the new king of memory bandwidth, HBM3.
