Memory throughput and low latency are critical as inference shifts from the data center to the network edge.
AI/ML is evolving at a lightning pace. Hardly a week goes by without new and exciting developments in the field, and applications like ChatGPT have brought generative AI capabilities firmly to the forefront of public attention.
AI/ML is really two applications: training and inference. Each relies on memory performance, and each has a unique set of requirements that drive the choice for the best memory solution.
With training, memory bandwidth and capacity are critical requirements. This is particularly so given the size and complexity of neural network models, which have been growing at a rate of 10X per year. Neural network accuracy depends on the quality and quantity of examples in the training data set, which translates into a need for enormous amounts of data and, in turn, for memory bandwidth and capacity.
Given the value created through training, there is a powerful incentive to complete training runs as quickly as possible. As training applications run in data centers increasingly constrained for power and space, solutions that offer power efficiency and smaller size are favored. Given all these requirements, HBM3 is an ideal memory solution for AI training hardware, providing excellent bandwidth and capacity.
The output of neural network training is an inference model that can be deployed broadly. With this model, an inference device can process and interpret inputs outside the bounds of the training data. For inference, memory throughput speed and low latency are critical, especially when real-time action is needed. With more and more AI inference shifting from the heart of the data center to the network edge, these memory features are becoming even more critical.
Designers have a number of memory choices for AI/ML inference, but on the critical parameter of bandwidth, GDDR6 memory really shines. At a data rate of 24 Gigabits per second (Gb/s) and a 32-bit wide interface, a GDDR6 device can deliver 96 Gigabytes per second (GB/s) of memory bandwidth, more than double that of alternative DDR or LPDDR solutions. GDDR6 memory offers a great combination of speed, bandwidth and latency performance for AI/ML inference, in particular for inference at the edge.
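As a quick back-of-the-envelope check of that figure, the sketch below (illustrative Python, not part of any Rambus tooling) multiplies the per-pin data rate by the interface width and converts bits to bytes, reproducing the 96 GB/s per device cited above.

```python
# Peak per-device memory bandwidth: per-pin data rate x interface width,
# converted from gigabits to gigabytes (divide by 8).
def device_bandwidth_gbytes_per_s(data_rate_gbps: float, interface_width_bits: int) -> float:
    return data_rate_gbps * interface_width_bits / 8

# A 24 Gb/s GDDR6 device with a 32-bit interface (values from the text):
print(device_bandwidth_gbytes_per_s(24, 32))  # -> 96.0 GB/s
```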
The Rambus GDDR6 memory interface subsystem offers performance of 24 Gb/s and is built on a foundation of over 30 years of high-speed signal integrity and power integrity (SI/PI) expertise, critical to operating GDDR6 at high speeds. It pairs a PHY with a digital controller to provide a complete GDDR6 memory interface subsystem.
Join me at the Rambus webinar this month on “High-Performance AI/ML Inference with 24G GDDR6 Memory” to discover how GDDR6 supports the memory and performance requirements of AI/ML inference workloads and learn about some of the key design and implementation considerations of GDDR6 memory interface subsystems.