Four Architectural Opportunities for LLM Inference Hardware (Google)


A new technical paper titled "Challenges and Research Directions for Large Language Model Inference Hardware" was published by Google. Abstract "Large Language Model (LLM) inference is hard. The autoregressive Decode phase of the underlying Transformer model makes LLM inference fundamentally different from training. Exacerbated by recent AI trends, the primary challenges are memory and in... » read more

System-HW Co-Design Approach Combines Mono3D DRAM, NMP, and GPU Acceleration (UCSD, Georgia Tech, UIUC, Illinois Tech)


A new technical paper titled "Stratum: System-Hardware Co-Design with Tiered Monolithic 3D-Stackable DRAM for Efficient MoE Serving" was published by researchers at UC San Diego, Georgia Tech, University of Illinois Urbana-Champaign and Illinois Institute of Technology. Abstract "As Large Language Models (LLMs) continue to evolve, Mixture of Experts (MoE) architecture has emerged as a preva... » read more

Tools, Models and System Support for PIM Architectures, With DRAM-Focus (ETH Zurich)


A new technical paper titled "New Tools, Programming Models, and System Support for Processing-in-Memory Architectures" was published by researchers at ETH Zurich. Abstract "Our goal in this dissertation is to provide tools, programming models, and system support for PIM architectures (with a focus on DRAM-based solutions), to ease the adoption of PIM in current and future systems. To this ... » read more