GDDR6 Memory Enables High-Performance AI/ML Inference

Keeping inference processors fed with data requires extremely high bandwidth memory.

A rapid rise in the size and sophistication of inference models has necessitated increasingly powerful hardware deployed at the network edge and in endpoint devices. To keep these inference processors and accelerators fed with data requires a state-of-the-art memory that delivers extremely high bandwidth. This blog will explore how GDDR6 supports the memory and performance requirements of artificial intelligence and machine learning (AI/ML) inference workloads.

Graphics double data rate (GDDR) memory can be traced back to the rise of 3D gaming on PCs and consoles. The first graphics processing units (GPUs) used single data rate (SDR) and double data rate (DDR) DRAM – the same memory used for CPU main memory. As gaming evolved, the demand for higher frame rates at ever higher resolutions drove the need for a graphics-workload-specific memory solution.

GDDR6 is a state-of-the-art graphics memory solution with performance demonstrated to 18 gigabits per second (Gbps) per pin – and per-device bandwidth of 72 GB/s. GDDR6 DRAM employs a 32-bit wide interface composed of two fully independent 16-bit channels. For each channel, a write or read memory access is 256 bits, or 32 bytes. A parallel-to-serial converter translates each 256-bit data packet into sixteen 16-bit data words that are transmitted sequentially over the 16-bit channel data bus. Thanks to this 16n prefetch, an internal array cycle time of 1 ns supports a per-pin data rate of 16 Gbps.
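The arithmetic above can be sketched in a few lines of Python (the values are the illustrative figures from this post, not a device datasheet):

```python
# GDDR6 bandwidth arithmetic, using the figures quoted in the text.

DATA_RATE_GBPS = 18        # demonstrated per-pin data rate, in Gbps
INTERFACE_WIDTH_BITS = 32  # two fully independent 16-bit channels
PREFETCH = 16              # 16n prefetch per channel

# Per-device bandwidth: 18 Gbps/pin x 32 pins = 576 Gbps = 72 GB/s
device_bw_gbps = DATA_RATE_GBPS * INTERFACE_WIDTH_BITS
device_bw_gbytes = device_bw_gbps / 8
print(f"Per-device bandwidth: {device_bw_gbytes:.0f} GB/s")  # 72 GB/s

# Access granularity per channel: 16-bit channel x 16n prefetch = 256 bits
access_bits = 16 * PREFETCH
print(f"Access granularity: {access_bits} bits = {access_bits // 8} bytes")

# With a 16n prefetch, a 1 ns internal array cycle sustains 16 Gbps per pin
array_cycle_ns = 1.0
pin_rate_gbps = PREFETCH / array_cycle_ns
print(f"Per-pin rate at 1 ns array cycle: {pin_rate_gbps:.0f} Gbps")
```

Note how the prefetch decouples the slow internal DRAM array from the fast external interface: the array delivers 256 bits once per nanosecond, and the serializer streams them out 16 bits at a time at 16 Gbps.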

AI/ML has traditionally been deployed in the cloud due to the massive amounts of data and computing resources it requires. However, we are now seeing more and more AI/ML inference shifting to the network edge and to endpoint devices, leaving the computation-intensive training in the cloud. AI/ML at the edge comes with many advantages, including the ability to process data faster and more securely – especially important for applications requiring real-time action. Of course, this shift brings its own memory requirements.

For inference, memory throughput and low latency are critical, because an inference engine may need to handle a broad array of simultaneous inputs. For example, an autonomous vehicle must process visual, LIDAR, radar, ultrasonic, inertial, and satellite navigation data. As inference moves increasingly to AI-powered edge and endpoint devices, the need for a manufacturing-proven memory solution is paramount. With reliability demonstrated across millions of devices, efficient cost, and outstanding bandwidth and latency performance, GDDR6 memory is an excellent choice for AI/ML inference applications.

Designed for performance and power efficiency, the Rambus GDDR6 memory subsystem supports the high-bandwidth, low-latency requirements of AI/ML for both training and inference. It consists of a co-verified PHY and digital controller – providing a complete GDDR6 memory subsystem. The Rambus GDDR6 interface is fully compliant with the JEDEC GDDR6 JESD250 standard, supporting up to 16 Gbps per pin.

AI/ML applications continue to evolve at lightning speed, and memory is key to enabling these advances. The memory industry ecosystem, including memory IP providers like Rambus, is continuing to innovate to meet the future needs of these demanding systems.
