GDDR7 Memory Supercharges AI Inference

High bandwidth and low latency are paramount for AI-powered edge and endpoints.

GDDR7 is the state-of-the-art graphics memory solution with a performance roadmap of up to 48 Gigatransfers per second (GT/s) and memory throughput of 192 GB/s per GDDR7 memory device. The next generation of GPUs and accelerators for AI inference will use GDDR7 memory to provide the memory bandwidth needed for these demanding workloads.

AI comprises two applications: training and inference. For training, memory bandwidth and capacity are critical requirements, particularly given that the size and complexity of neural networks are growing at a pace of 10X per year.

Neural network accuracy depends on the quality and quantity of examples in the training data set, which translates into a need for enormous amounts of data. Application-specific silicon solutions have been developed to speed up training runs, given the substantial “time-to-market” value at stake.

The output of neural network training is an inference model that can be deployed broadly. With this model, an inference device can process and interpret inputs outside the bounds of the training data. For inference, high memory throughput and low latency are critical, especially when real-time action is needed. An inference engine may need to process a broad array of media including text, images, speech, music, video and more. As inference moves increasingly to AI-powered edge and endpoint devices, the need for a memory solution with outstanding bandwidth and latency performance is paramount.

On the critical parameter of bandwidth, GDDR7 memory really shines. At a data rate of 32 GT/s, and a 32-bit wide interface, a GDDR7 device can deliver 128 GB/s of memory bandwidth, more than double that of a memory such as LPDDR5T. GDDR7 memory offers a great combination of speed, bandwidth and latency performance for the most demanding AI inference workloads.
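The bandwidth figures quoted in this article all follow from the same arithmetic: data rate times interface width, divided by eight bits per byte. The short Python sketch below reproduces them for a 32-bit device; the function name and its defaults are illustrative assumptions, not part of any GDDR7 tooling.

```python
def device_bandwidth_gbs(data_rate_gtps, interface_width_bits=32):
    """Peak per-device bandwidth in GB/s.

    Each transfer moves `interface_width_bits` bits, so bandwidth is
    transfers per second times bytes per transfer.
    """
    return data_rate_gtps * interface_width_bits / 8


# Data rates mentioned in the article: first devices, controller maximum,
# and the top of the roadmap.
for rate in (32, 40, 48):
    print(f"{rate} GT/s x 32 bits = {device_bandwidth_gbs(rate):.0f} GB/s")
```

These are peak numbers for a single device; a GPU or accelerator with several GDDR7 devices would scale them by the device count.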

GDDR7 was released by JEDEC in March 2024 and offers several new features designed to meet the escalating demand for more memory bandwidth in AI applications. The biggest technical change from GDDR6 is GDDR7’s transition to three-level pulse amplitude modulation (PAM3) encoding. GDDR6 used NRZ (PAM2) signaling and had a practical limit of 24 Gbps. With PAM3, GDDR7 can transmit “3 bits of information” over two cycles, or 1.5 bits per cycle versus 1 bit per cycle for NRZ, resulting in a 50% increase in data transmission compared to GDDR6 at the same clock speed.
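To see where the 50% figure comes from, note that PAM3 uses three signal levels, so two consecutive symbols give 3 x 3 = 9 combinations, enough to carry 3 bits (8 values). NRZ carries 2 bits in the same two symbols. The sketch below illustrates this with a hypothetical bit-to-symbol mapping; the actual mapping table defined by the JEDEC specification is different and is not reproduced here.

```python
from itertools import product

LEVELS = (-1, 0, 1)  # the three PAM3 signal levels

# All 9 possible pairs of PAM3 symbols. Only 8 are needed for 3 bits,
# so this illustrative (non-JEDEC) mapping simply leaves the last pair unused.
SYMBOL_PAIRS = list(product(LEVELS, repeat=2))
ENCODE = dict(zip(range(8), SYMBOL_PAIRS))
DECODE = {pair: value for value, pair in ENCODE.items()}


def encode(value):
    """Map a 3-bit value (0..7) to a pair of PAM3 symbols."""
    return ENCODE[value]


def decode(pair):
    """Recover the 3-bit value from a pair of PAM3 symbols."""
    return DECODE[pair]


pam3_bits_per_symbol = 3 / 2  # 3 bits over 2 symbols
nrz_bits_per_symbol = 1       # 1 bit per symbol
gain = pam3_bits_per_symbol / nrz_bits_per_symbol - 1

print(f"PAM3 carries {gain:.0%} more data than NRZ at the same symbol rate")
```

A three-level symbol could in theory carry about 1.58 bits, so encoding 3 bits in 2 symbols leaves one of the nine combinations spare.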

GDDR7 can, therefore, accommodate much higher data rates than the previous generation. The first GDDR7 memory devices are expected to run at around 32 GT/s, but the JEDEC specification leaves room for future expansion of data rates up to 48 GT/s.

With the move to higher data rates, RAS (Reliability, Availability, Serviceability) has become an increasingly important consideration. GDDR7 addresses the need for greater reliability by incorporating additional data integrity features, including on-die ECC with real-time reporting, data poison, error check and scrub, and command address parity with command blocking (CAPARBLK).
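As a rough illustration of the idea behind command address parity with command blocking, the sketch below attaches an even-parity bit to a command word and has the receiver discard any command whose parity does not match. The word width, function names and blocking behavior here are simplified assumptions for illustration only; they are not the CAPARBLK definition from the GDDR7 specification.

```python
def even_parity(word):
    """Return the bit that makes the total number of 1s even."""
    return bin(word).count("1") % 2


def send(command):
    """Sender side: pair the command word with its parity bit."""
    return command, even_parity(command)


def receive(command, parity_bit):
    """Receiver side: block (return None) on a parity mismatch."""
    if even_parity(command) != parity_bit:
        return None  # corrupted command is blocked, not executed
    return command


# Hypothetical 7-bit command word.
cmd, parity = send(0b1011001)
assert receive(cmd, parity) == cmd             # clean link: accepted
assert receive(cmd ^ 0b0000100, parity) is None  # single-bit flip: blocked
```

A single parity bit catches any odd number of flipped bits but not an even number, which is one reason it is combined with the other integrity features listed above.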

Finally, another difference between GDDR6 and GDDR7 is the number of channels. GDDR6 used two 16-bit channels, but GDDR7 uses four 10-bit channels (8 bits data, 2 bits error reporting).

The Rambus GDDR7 Controller IP is designed for use in applications requiring high memory throughput, high clock rates and full programmability. The GDDR7 Controller accepts commands using AXI or a simple local interface and translates them to the command sequences required by GDDR7 SGRAM devices. The controller also supports all low-power modes. The Rambus GDDR7 Controller IP offers GDDR7 performance of up to 40 GT/s and 160 GB/s of available bandwidth per GDDR7 memory device. The controller supports all GDDR7 link features, including PAM3 and NRZ signaling, CRC with retry for reads and writes, data scramble, data poison, clamshell mode and DQ logical remap.

The rapid rise in the size and sophistication of AI inference models requires increasingly powerful AI accelerators and GPUs deployed in edge servers and client PCs. Keeping these inference processors and accelerators fed with data requires state-of-the-art GDDR7 memory that delivers an excellent combination of high bandwidth and low latency.


