The Best DRAMs For Artificial Intelligence

The choice of DRAM depends on where the action is.


Artificial intelligence (AI) involves intense computing and tons of data. The computing may be performed by CPUs, GPUs, or dedicated accelerators, and while the data travels through DRAM on its way to the processor, the best DRAM type for this purpose depends on the type of system that is performing the training or inference.

The memory challenge facing engineering teams today is how to keep up with the rapidly increasing computational requirements for AI, and synchronous DRAM plays a critical role. All that data needs to be processed, stored, and accessed, and any mismatch in those steps can impact overall system performance.

“We’re getting all this compute power,” said Frank Ferro, product marketing group director at Cadence. “But how do you take advantage of it from a memory bandwidth standpoint?”

There is no simple answer, and no one-size-fits-all solution. Today, there are four classes of synchronous DRAM (SDRAM), each with its own targeted uses and tradeoffs:

  1. Double data rate (DDR) memory tends to accompany CPUs (and complex-instruction-set-computing, or CISC, processors in particular). Programs may have complex branching and a wide variety of operations, and DDR is optimized for such computing. DDR is the most general-purpose architecture and has the lowest latency (time to first data), with moderate bandwidth due to a 64-bit data bus. The “double data rate” name refers to the fact that data is clocked into and out of the memory on both rising and falling edges of the clock, unlike prior memories and most logic, which clock on only one edge.
  2. Low-Power DDR (LPDDR) is similar to DDR, but it includes a number of features that were gradually added over successive generations in order to reduce power while maintaining high performance. Its power-saving features include:
    • Lower supply voltage;
    • A temperature-compensated refresh rate that allows less frequent refresh when cold;
    • Deep and partial power-down modes;
    • Partial array-refresh options;
    • Write leveling, which compensates for skew between the data-strobe and clock signals;
    • Command/address training to optimize signal timing and integrity;
    • Lower I/O capacitance;
    • A six-bit single-data-rate (SDR) command and address bus on later generations rather than the prior 10-bit DDR bus;
    • Two half-width buses rather than one full width;
    • Differential clocks;
    • Data-copy and Write-X (write all 1s or all 0s) commands to reduce data transfers for those use cases, and
    • Dynamic voltage and frequency scaling (DVFS).

    Later generations contain a more complex clock structure that has a quarter-speed master clock running constantly, from which multiple full-speed clocks are derived that run only when necessary.

    LPDDR is not built into dual in-line memory modules (DIMMs). Instead, it is housed in BGA packages to be soldered directly to the board.

  3. Graphics DDR (GDDR) is a variant created to accompany GPUs for graphics processing. It has much higher bandwidth than DDR for feeding large volumes of graphics data to the processor, but it also has higher latency than DDR. “GDDR is much better for bandwidth, but capacity is a problem,” said Ferro.
  4. High-bandwidth memory (HBM) involves stacks of DRAM chips with very wide buses that can provide the very high bandwidth necessary to keep memory access from being a bottleneck in data-intensive computing such as AI training, AI inference, and high-performance computing (HPC).
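
These family-level tradeoffs come down to simple arithmetic: peak bandwidth is the per-pin data rate times the bus width. The sketch below runs that calculation for one representative speed grade of each family (DDR5-6400, LPDDR5X-8533, 16 Gb/s GDDR6, and a 6.4 Gb/s HBM3 stack); the specific parts are illustrative assumptions, not a survey.

```python
# Peak-bandwidth arithmetic for the four SDRAM families described above.
# Speed grades are representative examples only. The "double data rate"
# clocking is already folded into the MT/s (megatransfers-per-second) figure.

def peak_bandwidth_gbps(data_rate_mtps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: (transfers/s) x (bits/transfer) / 8 bits per byte."""
    return data_rate_mtps * 1e6 * bus_width_bits / 8 / 1e9

parts = {
    "DDR5-6400 DIMM":       (6400, 64),    # one 64-bit DIMM channel
    "LPDDR5X-8533 (x64)":   (8533, 64),    # four 16-bit channels
    "GDDR6 16 Gb/s (x32)":  (16000, 32),   # one 32-bit device
    "HBM3 6.4 Gb/s stack":  (6400, 1024),  # 16 channels x 64 bits, one stack
}

for name, (rate, width) in parts.items():
    print(f"{name:22s} {peak_bandwidth_gbps(rate, width):7.1f} GB/s")
```

The numbers make the positioning concrete: the 64-bit DDR5 DIMM (51.2 GB/s) and four 16-bit LPDDR5X channels (68.3 GB/s) land within about a third of each other, while the 1,024-bit HBM3 stack (819.2 GB/s) sits an order of magnitude higher.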

The primary differences in these four DRAM types are the access protocols, not the memory cells. “Whether you’re using GDDR, LPDDR, DDR, or HBM, it’s basically all the same memory technology underneath the hood,” noted Ferro. “You’ve still got the basic DRAM technology. It’s about how you’re accessing that DRAM.”

Those different access approaches can have a big impact on performance and power consumption.

Fig. 1: Comparison of different SDRAM families. Capacity is on a per-die basis (not per-stack for HBM). No single family excels at everything. DDR and LPDDR can have comparable throughput and capacity, with cost being a major differentiator. Source: Bryon Moyer/Semiconductor Engineering

King of the data center
The data center is HBM’s undisputed territory. “We believe HBM will remain primarily the domain of data centers for training and ultra-fast inference,” said Ramteja Tadishetti, principal software engineer at Expedera. “But the price tag of HBM likely keeps it in the cloud and away from cost-conscious edge devices.”

HBM consumes more energy and has a higher price, but so does everything else in the data center. “While HBM is the most expensive and most power-consuming choice for local memory, the cost and power are a rounding error compared to the cost and power of full-reticle-sized die used in training chips,” observed Steve Roddy, chief marketing officer of Quadric. “To use a real estate analogy, if you purchase a plot of land in Beverly Hills for $25 million, you don’t then scrimp and save on the build cost of the house. The same is true in the data center. Once you’ve committed to expensive die and packaging, the HBM increment is inconsequential. The corollary to that is we’ve not seen any planned uses of HBM outside of the data center — not even the high-end automotive market. Car companies building high-end SAE Level 4 automated driver-assistance systems (ADAS) want silicon solutions that are air-cooled and cost less than four figures. They cannot accommodate 1,000-watt modules that cost $10,000 or more.”

These cost considerations will force those who can’t afford HBM to make compromises. “Hyperscalers typically have deep enough pockets and enough resources that they can go for HBM,” explained Brett Murdock, product line director for memory interfaces at Synopsys. “Second‑tier players have to start making tradeoffs as they don’t necessarily have the volumes necessary to get the attention of the HBM vendors or the 2.5D assemblers to support an HBM solution.”

HBM is particularly important for training, which requires higher bandwidth than inference. It still is used for data-center inference, but LPDDR and GDDR are gaining traction there, as well. “HBM has become popular for near-memory usage for training these models,” said Ferro. “My guess is that GDDR and LPDDR are going to be the dominant memories on inference accelerator cards.”

Murdock agreed, pointing to more of a mix. “Training requires more memory than inference, so one might desire a combination of HBM4 and LPDDR6, where the LPDDR6 is along for the capacity ride — unless you’ve already taken the step down from HBM4 to LPDDR6 for other reasons.”

Samsung is witnessing a similar trend. “We’re seeing a lot more mixed memories,” said Kevin Yee, senior director of IP and ecosystem marketing at Samsung. “It’s not just going with DDR, LPDDR, GDDR, or HBM. We’re starting to see mixes to save on power, where you’ll mix DDR and LPDDR, or HBM and LPDDR.”

One emerging angle is custom HBM, where high-volume purchasers can work with memory makers to swap the standard logic base die at the bottom of the stack for a custom die with proprietary value-added functions or even optimized channels. “Going to a custom HBM where you can use some kind of proprietary die-to-die protocol will give you better bandwidth and better shoreline efficiency,” noted Yee.

While heat is a concern for all memories, it’s critical for HBM because it’s a stack, and any stack raises challenges for thermal dissipation, especially from dies in the middle of the stack. Multi-physics simulations are necessary, but those require more precise models. “There are a number of challenges in representing the thermal behavior of such a 3D stack,” said Roland Jancke, head of design-methodology department at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “This is obviously true for a memory stack, but also if you have a stack including processor, memory, sensors, or whatever.”

And finally, there’s geopolitics. “Another consideration beyond availability and cost is politics, and if I paint with a very broad brush, it’s fair to say HBM is essentially off-limits to China,” noted Murdock. “So Chinese companies are using LPDDR5X for their AI designs of today and are moving to LPDDR6 for their AI designs of tomorrow.”

CPUs’ constant companion
DDR has a role in the data center, but it usually serves the CPUs that coordinate operations. Accelerators, whether GPUs or neural processing units (NPUs), typically rely on HBM for higher bandwidth in the data center and/or on LPDDR for lower power.

Fig. 2: Memories employed with CPUs vs. G/NPUs. DDR excels at general-purpose sequential computing of the sort the CPU would typically execute, whereas HBM and LPDDR provide higher bandwidth or lower power for the masses of data necessary for training and inference algorithms that involve high levels of parallelism. GDDR also may appear there. Source: Bryon Moyer/Semiconductor Engineering

“Typically, servers and DDR go hand-in-hand,” said Murdock. “Today DDR5 RDIMMs are the gold standard, but some are moving toward DDR5 MRDIMMs [multiplexed RDIMM, which can double bandwidth by ping-ponging two RDIMMs] as they offer increased performance using DRAMs available today. The DDR5 MRDIMM comes at a premium both in terms of pricing and power consumption.”
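
The bracketed MRDIMM note is easy to picture as a multiplexer. The toy model below shows only the interleaving idea (not the real DDR5 MRCD/MDB buffer data path): alternating beats from two ranks onto a double-rate host bus doubles throughput without running either rank faster.

```python
# Toy model of the MRDIMM "ping-pong": the host-side bus runs at twice the
# per-rank data rate, and a mux alternates beats between two ranks, so each
# rank still operates at its own native speed. Conceptual sketch only.

def mux_ranks(rank_a: list, rank_b: list) -> list:
    """Interleave two equal-length beat streams onto one double-rate bus."""
    bus = []
    for beat_a, beat_b in zip(rank_a, rank_b):
        bus.append(beat_a)  # even host-bus cycle: rank A's beat
        bus.append(beat_b)  # odd host-bus cycle:  rank B's beat
    return bus

rank_a = ["A0", "A1", "A2", "A3"]
rank_b = ["B0", "B1", "B2", "B3"]
print(mux_ranks(rank_a, rank_b))
# Each rank supplied 4 beats; the host bus carried 8 in the same time window.
```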

But DDR isn’t optimized for handling AI data patterns. “DDR remains a reliable secondary storage method,” said Expedera’s Tadishetti. “However, unless there is a significant improvement in latency and performance, it cannot compete with LPDDR in terms of efficiency, and GDDR/HBM in terms of raw performance.”

That said, it’s everywhere and it’s cheap. “For volume inference-optimized devices that aren’t power-constrained, DDR is the local memory of choice,” said Roddy. “For any device that is line-powered (home, office, factory) or comes with its own power plant (car), the speed and cost combo of DDR is the undefeated champion. Inference applications running on NPU subsystems that smartly manage external memory are able to batch and prefetch DDR accesses to maximize performance while taking advantage of the enormous economies of scale of DDR availability.”

The new belle of the ball
Still, LPDDR is starting to permeate a wide range of systems, and if it doesn’t displace an alternative, it might be added to the mix to lower power.

“For battery and power-constrained devices, LPDDR offers a superior bandwidth versus power tradeoff,” said Roddy. “The massive volumes of LPDDR produced for the phone market make it the cost-sensitive choice for most emerging AI-centric consumer and portable devices.”

Others agree. “LPDDR is the jack-of-all-trades of memory — and, in fact, master of some,” said Murdock. “It owns both the mobile and automotive application spaces.”

One can even try to create a poor-person’s HBM with it. “You can stack LPDDR to get capacity,” noted John Eble, vice president of product marketing for memory interface chips at Rambus.

LPDDR also is making its way into the data center as one way of reducing power, although it fails to offer everything hyperscalers want. “The main drawback with LPDDR is its lack of RAS [reliability, availability, serviceability] features, and it doesn’t have the same degree of ECC,” said Eble. “Or there’s this ‘chip kill’ notion, where you can recover from a DRAM chip dying. LPDDR wasn’t designed for that level of RAS capability.”

And even though DDR tends to accompany CPUs, LPDDR has a toe in the door. “NVIDIA introduced their Grace Arm-based processor, and they chose to tie it to LPDDR memory,” Eble added.

LPDDR also may replace DDR in edge systems where performance matters. “Many edge devices don’t have memory, and those that do often need very little, so they tend to look for whatever is cheapest,” said Murdock. “Those that actually require some level of performance from the memory use LPDDR due to its power and performance profile.”

Always a bridesmaid?
The one family less often seen in AI systems is GDDR, which has characteristics that should appeal to AI systems, except that it tends to be second-best across key parameters. Its throughput is higher than that of LPDDR, but lower than that of HBM. It costs less than HBM or LPDDR, but not less than DDR. There’s no obvious parameter that would mandate GDDR for some class of systems. As a result, it’s often passed over for AI.

“GDDR seems to be the anti-Goldilocks for AI applications,” Roddy said. “It’s always ‘just wrong.’ GDDR is too expensive for inference-oriented consumer devices, and a well-designed NPU with offline compilation can intelligently pre-fetch weights and activations such that the faster random-access speed of GDDR is never needed. And in the data center, the raw speed advantage of HBM has displaced GDDR.”

However, it is promising for graphics-related generative algorithms, as long as its limited capacity isn’t a barrier. “It’s primarily used for graphics and certain aspects of generative AI,” said Tadishetti. “As we observe an increasing trend in image and video generation models, some demand might be shifted to GDDR — but to be clear, we have not seen OEMs doing this yet.”

Four roadmaps
All of the DRAM standards originate in JEDEC, but each type has its own committee. DDR is owned by the JC-42.3 subcommittee (typical JEDEC nomenclature, with JC-42 covering all solid-state memories), JC-42.1 standardizing GDDR, JC-42.2 handling HBM, and JC-42.6 working on LPDDR. All four continue to push their DRAMs, but the LPDDR and HBM versions are getting more attention.

“LPDDR5X is readily available, reasonably affordable, and can fit the power and performance needs for many applications,” said Murdock. “We are already seeing demand for LPDDR6 for design starts given the performance boost it will bring compared to LPDDR5X.”

While the specific changes in LPDDR6 aren’t yet public, they’re largely expected to affect clock speeds, banking, bus widths, and burst accesses. In addition, it’s getting built-in error-correction codes (ECC), which is a testament to the delicate nature of the cells and signals at these high speeds. It’s expected to be available by the end of the year.
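
What built-in ECC buys can be illustrated with the smallest classic single-error-correcting code, Hamming(7,4). Real on-die ECC uses much wider codewords, but the mechanism is the same: recompute parity on read, and a nonzero syndrome points at the bit to flip back.

```python
# Minimal single-error correction with Hamming(7,4): 4 data bits protected
# by 3 parity bits. Illustrative only -- production on-die ECC codes are
# far wider, but follow the same encode/recheck/correct pattern.

def hamming74_encode(d: list) -> list:
    """4 data bits -> 7-bit codeword laid out as [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2,3,6,7
    p4 = d2 ^ d3 ^ d4  # parity over codeword positions 4,5,6,7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_correct(c: list):
    """Recompute parity; a nonzero syndrome is the 1-based error position."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return c, syndrome

word = hamming74_encode([1, 0, 1, 1])
corrupted = list(word)
corrupted[5] ^= 1                # single bit flip, as from a weak cell
fixed, pos = hamming74_correct(corrupted)
print(fixed == word, pos)        # prints: True 6
```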

HBM4 is the next eagerly awaited version of high-bandwidth memory. It doubles bandwidth, channel count, and the data-bus width compared with HBM3. It’s expected to ship in 2026. “HBM3E is what is readily available today, but the HBM4 JEDEC standard has just been published, so design starts will swiftly be moving to HBM4 for the increased performance it will offer,” said Murdock.
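
The width doubling alone accounts for the bandwidth claim. The arithmetic below holds the per-pin rate at an HBM3-class 6.4 Gb/s purely to isolate the effect of going from a 1,024-bit to a 2,048-bit stack interface; shipping HBM4 per-pin rates will differ.

```python
# HBM generational arithmetic: HBM4 doubles the stack interface from 1024
# to 2048 bits (16 channels -> 32), so even at an unchanged per-pin rate
# the per-stack bandwidth doubles. The 6.4 Gb/s figure is a stand-in.

def stack_bandwidth_gbps(pin_rate_gbps: float, bus_width_bits: int) -> float:
    """Per-stack bandwidth in GB/s."""
    return pin_rate_gbps * bus_width_bits / 8

hbm3 = stack_bandwidth_gbps(6.4, 1024)  # 819.2 GB/s per stack
hbm4 = stack_bandwidth_gbps(6.4, 2048)  # 1638.4 GB/s at the same pin rate
print(hbm3, hbm4, hbm4 / hbm3)
```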

A dynamic environment
Even as memory speed improves and relative power drops, processors are undergoing similar development. Ideally, processors and memories move forward together, with neither one being the bottleneck. But they’re developed separately, so there will always be some leapfrogging going forward.

Although dedicated NPUs have had a hard time gaining traction in high volumes, some of them promise much lower-power execution. If they catch on, that will put yet more pressure on memories used in power-constrained systems. Likewise, as data-center processors achieve higher performance, HBM will need to keep up.

And simply picking the right memory isn’t enough. Ensuring high-quality access signals is critical to operating at high speeds. “Those channels are really the most important thing from a system-performance standpoint, and we’ve got to look at signal integrity,” said Cadence’s Ferro. “If I’m a system designer, I should be able to take a GDDR6 DRAM from one vendor and drop in a GDDR6 from another vendor. But one may run at 16 Gb/s, and another DRAM might have an 18-Gb/s version. You can drop that one in, and it’ll work, but can that channel handle 18 Gb/s?”
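
Ferro’s drop-in question is ultimately a timing-margin question. Each step up in per-pin rate shrinks the unit interval (the window in which the receiver must sample one bit), and a channel whose loss, reflections, and crosstalk left just enough margin at the lower rate may have none at the higher one. A quick conversion, using 16 and 18 Gb/s per-pin rates:

```python
# The unit interval (UI) is the time one bit occupies on the wire. Raising
# the per-pin rate from 16 to 18 Gb/s shrinks the UI by about 7 ps -- margin
# the channel's impairments must not consume.

def unit_interval_ps(pin_rate_gbps: float) -> float:
    """Duration of one bit on the wire, in picoseconds."""
    return 1e12 / (pin_rate_gbps * 1e9)

for rate in (16, 18):
    print(f"{rate} Gb/s per pin -> UI = {unit_interval_ps(rate):.1f} ps")
```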

Although clear trends are in play, system designers still need to do their homework to identify the most appropriate specific memory for a given system — and to ensure that the system can keep up.

— Ed Sperling contributed to this report.

Related Reading
Memory Fundamentals For Engineers
eBook: Nearly everything you need to know about memory
Choosing The Correct High-Bandwidth Memory
New applications require a deep understanding of the tradeoffs for different types of DRAM.
Choosing The Right Memory Solution For AI Accelerators
The different flavors of DRAM each fill a particular AI niche.
HBM Options Increase As AI Demand Soars
But manufacturing reliable 3D DRAM stacks with good yield is complex and costly.


