
Designing AI Hardware To Deal With Increasingly Challenging Memory Wall (UC Berkeley)


A new technical paper titled “AI and Memory Wall” was published by researchers at UC Berkeley, ICSI, and LBNL.

Abstract
“The availability of unprecedented unsupervised training data, along with neural scaling laws, has resulted in an unprecedented surge in model size and compute requirements for serving/training LLMs. However, the main performance bottleneck is increasingly shifting to memory bandwidth. Over the past 20 years, peak server hardware FLOPS has been scaling at 3.0x/2yrs, outpacing the growth of DRAM and interconnect bandwidth, which have only scaled at 1.6 and 1.4 times every 2 years, respectively. This disparity has made memory, rather than compute, the primary bottleneck in AI applications, particularly in serving. Here, we analyze encoder and decoder Transformer models and show how memory bandwidth can become the dominant bottleneck for decoder models. We argue for a redesign in model architecture, training, and deployment strategies to overcome this memory limitation.”
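The scaling rates quoted in the abstract can be made concrete with a quick back-of-the-envelope calculation. The sketch below is not from the paper; it simply compounds the quoted growth rates (peak FLOPS at 3.0x per 2 years, DRAM bandwidth at 1.6x per 2 years) to show how the arithmetic intensity a workload would need in order to remain compute-bound grows over time.

# Back-of-the-envelope sketch (not from the paper) of the memory wall:
# compute grows ~3.0x every 2 years, DRAM bandwidth only ~1.6x, so the
# FLOPs-per-byte needed to stay compute-bound grows as their ratio.

def scaling_gap(years, flops_per_2yr=3.0, dram_bw_per_2yr=1.6):
    """Factor by which required arithmetic intensity grows over `years`."""
    periods = years / 2
    compute_growth = flops_per_2yr ** periods
    bandwidth_growth = dram_bw_per_2yr ** periods
    return compute_growth / bandwidth_growth

for years in (2, 10, 20):
    print(f"After {years:2d} years, compute has outgrown DRAM bandwidth "
          f"by ~{scaling_gap(years):.0f}x")

Over the 20-year window the abstract cites, the ratio compounds to roughly 500x, which is why bandwidth-bound workloads such as decoder inference stop benefiting from additional peak FLOPS.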

Find the technical paper at arXiv:2403.14123. Published March 2024.

Gholami, Amir, Zhewei Yao, Sehoon Kim, Coleman Hooper, Michael W. Mahoney, and Kurt Keutzer. “AI and Memory Wall.” arXiv preprint arXiv:2403.14123 (2024).

Fig. 1: Source: UC Berkeley et al. AI and Memory Wall.  “The evolution of the number of parameters of state-of-the-art (SOTA) models over the years, along with the AI accelerator memory capacity (green dots). The number of parameters in large Transformer models has been exponentially increasing with a factor of 410× every two years, while the single GPU memory has only been scaled at a rate of 2× every 2 years. The growth rate for the Transformer models is calculated by only considering the non-recommendation system models (red circles), and the GPU memory is plotted by dividing the corresponding memory size by 6 as an approximate upper bound for the largest model that can be trained with the corresponding capacity.”
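The caption's divide-by-6 rule of thumb can be illustrated with a short sketch. The memory capacities below are hypothetical examples rather than figures from the paper, and the interpretation of the factor of 6 as roughly covering weights, gradients, and optimizer state during training is an assumption; the caption only states it as an approximate upper bound.

# Minimal sketch (hypothetical values) of the caption's heuristic: the largest
# model trainable on one accelerator is approximated by dividing its memory
# capacity by 6 (assumed to cover weights, gradients, and optimizer state).

def max_trainable_params(gpu_memory_gb: float, bytes_per_param: int = 6) -> float:
    """Approximate upper bound on trainable parameters, in billions."""
    return gpu_memory_gb / bytes_per_param  # GB / (bytes/param) -> billions of params

# Hypothetical device memory capacities in GB, for illustration only.
for mem_gb in (16, 40, 80):
    print(f"{mem_gb} GB of device memory -> roughly "
          f"{max_trainable_params(mem_gb):.1f}B parameters trainable")

With model parameter counts growing around 410x every two years and per-device memory only 2x, this bound falls further behind state-of-the-art model sizes each hardware generation, which is the gap the figure visualizes.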


