Replacing GPU Compute Dies With PNM-Enabled HBM Cubes For Long-Context Decode Attention (UCSD, Columbia, Yonsei U., NVIDIA, Samsung)


A new technical paper, "AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving," was published by researchers at UC San Diego, Columbia University, Yonsei University, NVIDIA, and Samsung. Abstract: "All current LLM serving systems place the GPU at the center, from production-level attention-FFN disaggregation to NVIDIA's Rubin GPU-LPU heterogeneous p... » read more

Thermal Modeling For 2.5D And 3D Integrated Chiplets


A new technical paper titled "MFIT: Multi-Fidelity Thermal Modeling for 2.5D and 3D Multi-Chiplet Architectures" was published by researchers at the University of Wisconsin–Madison, Washington State University, and the University of Ulsan. Abstract: "Rapidly evolving artificial intelligence and machine learning applications require ever-increasing computational capabilities, while monolithic 2D d... » read more