
Wafer-Scale Computing for LLMs (U. of Edinburgh, Microsoft)


A new technical paper titled “WaferLLM: A Wafer-Scale LLM Inference System” was published by researchers at the University of Edinburgh and Microsoft Research.

Abstract
“Emerging AI accelerators increasingly adopt wafer-scale manufacturing technologies, integrating hundreds of thousands of AI cores in a mesh-based architecture with large distributed on-chip memory (tens of GB in total) and ultra-high on-chip memory bandwidth (tens of PB/s). However, current LLM inference systems, optimized for shared memory architectures like GPUs, fail to fully exploit these accelerators. We introduce WaferLLM, the first wafer-scale LLM inference system. WaferLLM is guided by a novel PLMR device model that captures the unique hardware characteristics of wafer-scale architectures. Leveraging this model, WaferLLM pioneers wafer-scale LLM parallelism, optimizing the utilization of hundreds of thousands of on-chip cores. It also introduces MeshGEMM and MeshGEMV, the first GEMM and GEMV implementations designed to scale effectively on wafer-scale accelerators. Evaluations show that WaferLLM achieves 200× better wafer-scale accelerator utilization than state-of-the-art systems. On a commodity wafer-scale accelerator, WaferLLM delivers 606× faster and 22× more energy-efficient GEMV compared to an advanced GPU. For LLMs, WaferLLM enables 39× faster decoding with 1.7× better energy efficiency. We anticipate these numbers will grow significantly as wafer-scale AI models, software, and hardware continue to mature.”
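The abstract describes distributing GEMV across a 2D mesh of cores, with partial products reduced across the mesh. The snippet below is a minimal toy sketch of that idea, not the paper's MeshGEMV implementation: it tiles a matrix over a hypothetical p × p core mesh and accumulates each mesh row's partial products, simulated sequentially in NumPy.

```python
import numpy as np

def mesh_gemv(A, x, p):
    """Toy GEMV y = A @ x partitioned over a p x p mesh of "cores".

    Core (i, j) owns the tile A[i*tb:(i+1)*tb, j*tb:(j+1)*tb] plus the
    matching slice of x; partial products are summed along each mesh
    row, loosely mimicking an on-chip row reduction. Names and the
    square-mesh layout are illustrative assumptions, not the paper's API.
    """
    n = A.shape[0]
    assert A.shape[1] == n and n % p == 0, "square matrix, divisible by p"
    tb = n // p  # tile size per core
    y = np.zeros(n)
    for i in range(p):          # each mesh row produces one output block
        partial = np.zeros(tb)
        for j in range(p):      # reduction across the cores in that row
            tile = A[i * tb:(i + 1) * tb, j * tb:(j + 1) * tb]
            partial += tile @ x[j * tb:(j + 1) * tb]
        y[i * tb:(i + 1) * tb] = partial
    return y
```

The real system must also route tiles and partial sums over the physical mesh interconnect; this sketch only captures the data partitioning and the row-wise reduction pattern.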

Find the technical paper via the DOI below. Published February 2025.

Authors: Congjie He, Yeqi Huang, Pei Mu, Ziming Miao, Jilong Xue, Lingxiao Ma, Fan Yang, Luo Mai
https://doi.org/10.48550/arXiv.2502.04563
