Scheduling Multi-Model AI Workloads On Heterogeneous MCM Accelerators (UC Irvine)


A technical paper titled “SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators” was published by researchers at University of California Irvine.


“Emerging multi-model workloads with heavy models like recent large language models significantly increased the compute and memory demands on hardware. To address such increasing demands, designing a scalable hardware architecture became a key problem. Among recent solutions, the 2.5D silicon interposer multi-chip module (MCM)-based AI accelerator has been actively explored as a promising scalable solution due to their significant benefits in the low engineering cost and composability. However, previous MCM accelerators are based on homogeneous architectures with fixed dataflow, which encounter major challenges from highly heterogeneous multi-model workloads due to their limited workload adaptivity. Therefore, in this work, we explore the opportunity in the heterogeneous dataflow MCM AI accelerators. We identify the scheduling of multi-model workload on heterogeneous dataflow MCM AI accelerator is an important and challenging problem due to its significance and scale, which reaches O(10^18) scale even for a single model case on 6×6 chiplets. We develop a set of heuristics to navigate the huge scheduling space and codify them into a scheduler with advanced techniques such as inter-chiplet pipelining. Our evaluation on ten multi-model workload scenarios for datacenter multitenancy and AR/VR use-cases has shown the efficacy of our approach, achieving on average 35.3% and 31.4% less energy-delay product (EDP) for the respective applications settings compared to homogeneous baselines.”

Find the technical paper here. Published May 2024.

Odema, Mohanad, Luke Chen, Hyoukjun Kwon, and Mohammad Abdullah Al Faruque. “SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators.” arXiv preprint arXiv:2405.00790 (2024).

Related Reading
AI Accelerator Architectures Poised For Big Changes
Design teams are racing to boost speed and energy efficiency of AI as it begins shifting toward the edge.
Multi-Die Design Pushes Complexity To The Max
Continued scaling using advanced packaging will require changes across the entire semiconductor ecosystem.
Chiplets: 2023 (EBook)
What chiplets are, what they are being used for today, and what they will be used for in the future.

Leave a Reply

(Note: This name will be displayed publicly)