How AI 2.0 Will Shape The Memory Landscape

The need for more memory bandwidth and capacity spans from data center to endpoint.

AI is such a big part of our lives that we don’t even think about it as “AI”; it’s simply normal life these days. If you’ve asked your home assistant for the weather, used a search engine, or been recommended something to watch today, then that’s all been AI discreetly at work. While these AI-enabled applications represent notable advancements in incorporating intelligence into systems, they do have a limited set of inputs and outputs. For example, you can use speech to output speech or text to output text. Essentially, they are systems that analyze data and make predictions, without creating anything new in the process.

With the meteoric rise of applications like ChatGPT in the past year, we’ve now firmly transitioned to the next phase of AI where we have systems that can create something new from data. This evolution, or AI 2.0 as it’s being called by some, marks a new era characterized by generative AI capabilities, made possible by large language models (LLMs). These LLMs can understand and interpret complex inputs to deliver outputs ranging from traditional text-based responses to more advanced forms such as code, images, video, and even 3D models. This multi-modality, or breadth and combination of inputs and outputs, opens limitless possibilities for creativity and innovation.

Several notable trends are set to influence the technologies used across the AI 2.0 data pipeline. Firstly, large language model sizes have been increasing roughly 10x every year for the last few years and show no sign of slowing down any time soon. The state-of-the-art model size is currently over a trillion parameters, with GPT-4 reportedly being a prime example. In general, the larger the model, the greater its accuracy and its capability to tackle more complex tasks. At the same time, we’re also witnessing the democratization of AI, with more open-sourcing of models and frameworks, greatly expanding the universe of developers and professionals that can contribute to the advancement of generative AI. And finally, for reasons of scalability, flexibility, and confidentiality, there’s the desire to run generative AI applications at the edge or on endpoints, in addition to the data center cloud.
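To put the trillion-parameter figure in memory terms, here is a rough back-of-the-envelope sketch in Python. It assumes 2 bytes per parameter (a 16-bit format) and counts only the weights, not activations or optimizer state. The function names and the 2-byte default are illustrative assumptions, not figures from any particular model.

```python
# Back-of-the-envelope sizing. All names and defaults are illustrative assumptions.

def weight_footprint_bytes(num_parameters: int, bytes_per_parameter: int = 2) -> int:
    """Memory needed to hold the model weights alone (no activations, no optimizer state)."""
    return num_parameters * bytes_per_parameter


def projected_parameters(current_parameters: int, years: int, growth_per_year: int = 10) -> int:
    """Parameter count after `years` of compounding growth at `growth_per_year` per year."""
    return current_parameters * growth_per_year ** years


TRILLION = 10 ** 12

# Weights of a one-trillion-parameter model at 2 bytes each, expressed in terabytes.
footprint_tb = weight_footprint_bytes(TRILLION) / 10 ** 12
print(f"Weights for 1T parameters: {footprint_tb:.0f} TB")

# If the 10x-per-year trend held for two more years (an assumption, not a forecast).
print(f"Parameters after 2 more years: {projected_parameters(TRILLION, 2):.0e}")
```

Under these assumptions, the weights alone come to about 2 TB, which is why model growth translates so directly into demand for memory capacity.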

At the heart of these trends is the constant need for more memory bandwidth and capacity. And this is a need that spans the entire computing landscape from AI training in the heart of the data center to AI applications at the edge or on endpoints such as desktops, laptops and mobile phones. Delivering more memory bandwidth and capacity, as we all know, is not a simple task. In many cases, we’re pushing up against practical limits in signaling rates and package pin counts, limiting the growth of bandwidth. And at the same time, we’re increasingly up against the limits of memory cell shrinks and physical constraints, limiting the growth of memory capacity.
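The link between signaling rate, pin count, and bandwidth can be sketched with simple arithmetic: peak bandwidth is the per-pin data rate multiplied by the number of data pins. The Python snippet below is a minimal illustration. The interface width and data rates are hypothetical numbers chosen for the example, not figures from this article.

```python
# Peak bandwidth arithmetic. The pin counts and data rates are hypothetical examples.

def peak_bandwidth_gb_per_s(data_rate_mbps_per_pin: int, data_pins: int) -> float:
    """Peak bandwidth in GB/s: per-pin rate (Mb/s) x data pins / 8 bits per byte / 1000."""
    return data_rate_mbps_per_pin * data_pins / 8 / 1000


# A hypothetical 64-pin data interface running at 6400 Mb/s per pin.
baseline = peak_bandwidth_gb_per_s(6400, 64)

# Two ways to double it: signal twice as fast, or use twice as many pins.
faster_signaling = peak_bandwidth_gb_per_s(12800, 64)
more_pins = peak_bandwidth_gb_per_s(6400, 128)

print(f"baseline:         {baseline:.1f} GB/s")
print(f"faster signaling: {faster_signaling:.1f} GB/s")
print(f"more pins:        {more_pins:.1f} GB/s")
```

Both routes to doubling give the same result on paper, but each runs into one of the practical limits described above: signal integrity at higher rates, or the number of pins a package can carry.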

What we are likely to see in the future are many new innovations and novel architectures applied to the memory subsystem that will help us overcome the limits of scaling bandwidth and capacity for AI 2.0. These include innovations like multiplexing, with DDR5 MRDIMM being an example under discussion in the industry right now, serial attached memory options like CXL, new encoding schemes like the PAM3 signaling used in the recently announced GDDR7 specification, memory stacking, and much more.
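Multi-level encoding schemes raise the data rate per pin by carrying more than one bit in each transmitted symbol. The Python sketch below shows the general idea: it counts how many symbols are needed to carry a group of bits at a given number of signal levels, and packs a value into symbols using a plain base-N conversion. This mapping is a hypothetical illustration only. It is not the encoding table defined by any memory specification, and the function names are assumptions.

```python
# Illustrative multi-level signaling arithmetic. The base-N mapping below is a
# hypothetical example, NOT the encoding table from any memory standard.

def symbols_needed(num_bits: int, levels: int) -> int:
    """Smallest number of `levels`-level symbols that can represent `num_bits` bits."""
    count, capacity = 0, 1
    while capacity < 2 ** num_bits:
        capacity *= levels
        count += 1
    return count


def encode(value: int, num_bits: int, levels: int) -> list[int]:
    """Pack `value` into symbols, most significant first; each symbol is in 0..levels-1."""
    if not 0 <= value < 2 ** num_bits:
        raise ValueError("value does not fit in num_bits")
    symbols = []
    for _ in range(symbols_needed(num_bits, levels)):
        symbols.append(value % levels)
        value //= levels
    return symbols[::-1]


def decode(symbols: list[int], levels: int) -> int:
    """Inverse of encode: rebuild the integer from its symbols."""
    value = 0
    for symbol in symbols:
        value = value * levels + symbol
    return value


# Symbols required to move 3 bits with two, three, and four signal levels.
for levels in (2, 3, 4):
    print(f"{levels} levels: {symbols_needed(3, levels)} symbols for 3 bits")
```

With two levels, 3 bits take three symbols; with three or four levels, the same 3 bits fit in two symbols, so more data moves at the same symbol rate. The trade-off is that more levels leave less voltage margin between adjacent levels.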

Rambus state-of-the-art chip and silicon IP solutions deliver the performance and security needed at scale for demanding AI 2.0 workloads, from complex training in the heart of the data center to high-speed inference at the edge. Find out more at www.rambus.com.

