Replacing GPU Compute Dies With PNM-Enabled HBM Cubes For Long-Context Decode Attention (UCSD, Columbia, Yonsei U., NVIDIA, Samsung)


A new technical paper, "AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving," was published by researchers at UC San Diego, Columbia University, Yonsei University, NVIDIA, and Samsung. Abstract "All current LLM serving systems place the GPU at the center, from production-level attention-FFN disaggregation to NVIDIA's Rubin GPU-LPU heterogeneous p... » read more

10-Year Roadmap for AI + Hardware (UIUC, UCLA, Stanford et al.)


Researchers from University of Illinois Urbana-Champaign, UCLA, Stanford University, Nvidia, Google, et al. have released “AI+HW 2035: Shaping the Next Decade”. Abstract “Artificial intelligence (AI) and hardware (HW) are advancing at unprecedented rates, yet their trajectories have become inseparably intertwined. The global research community lacks a cohesive, long-term vision t... » read more