LLM training is driving the need for memory subsystems that support high data rates.
The rapid rise in size and sophistication of AI/ML training models requires increasingly powerful hardware deployed in the data center and at the network edge. This growth in complexity and data stresses the existing infrastructure, driving the need for new and innovative processor architectures and associated memory subsystems. For example, even GPT-3 at 175 billion parameters is stressing the bandwidth, capacity, training time, and power of the most advanced GPUs on the market.
To this end, Cadence has shown our HBM3E memory subsystem running at 12.4Gbps at nominal voltages, demonstrating the PHY’s robustness and performance margin. The production version of our latest HBM3E PHY supports DRAM speeds of up to 10.4Gbps or 1.33TB/s per DRAM device. This speed represents a >1.6X bandwidth increase over the previous generation, making it ideal for LLM training.
Recall that HBM3E is a 3D stacked DRAM with 1024-bit wide data (16 64-bit channels). While this wide data bus enables high data transfer, routing these signals requires interposer technology (2.5D) capable of routing close to 2000 signals (data and control), including silicon, RDL, and silicon bridges.
The interposer design is critical for the system to operate at these data rates. Cadence provides 2.5D reference designs, including the interposer and package, as part of our standard IP package. As demonstrated in our test silicon, these designs give customers confidence they will meet their memory bandwidth requirements. The reference design is also a good starting point, helping to reduce development time and risk. Our expert SI/PI and system engineers work closely with customers to analyze their channels to ensure the best system performance.
Even as HBM3E delivers the highest memory bandwidth today, the industry keeps pushing forward. JEDEC recently announced that HBM4, the next version of the HBM DRAM standard, is nearing completion. JEDEC calls HBM4 an “evolutionary step beyond the currently published HBM3 standard.” They also claim HBM4 “enhancements are vital for applications that require efficient handling of large datasets and complex calculations.” HBM4 will support AI training applications, high-performance computing (HPC), and high-end graphics cards.
Leave a Reply