Agentic AI Is Changing Data Center Architectures


Key Takeaways: The rise of agentic AI is shifting data centers from GPU-centric number crunching to CPU-driven orchestration, where managing long-running reasoning loops and context is just as important as raw compute. Integrating CPUs, GPUs, and stacked memory into tightly coupled multi-die architectures with varying workloads makes it much harder to ensure they will be reliable and ef... » read more

HW-Native GPU Compiler for Large-Scale ML Production Systems (UC San Diego, Meta)


A new technical paper, "TLX: Hardware-Native, Evolvable MIMW GPU Compiler for Large-scale Production Environments," was published by researchers at UC San Diego and Meta. Abstract: "Modern GPUs increasingly rely on specialized hardware units and asynchronous coordination mechanisms, so performance depends on orchestrating data movement, tensor-core computation, and synchronization rather t... » read more

Replacing GPU Compute Dies With PNM-Enabled HBM Cubes For Long-Context Decode Attention (UCSD, Columbia, Yonsei U., NVIDIA, Samsung)


A new technical paper, "AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving," was published by researchers at UC San Diego, Columbia University, Yonsei University, NVIDIA, and Samsung. Abstract: "All current LLM serving systems place the GPU at the center, from production-level attention-FFN disaggregation to NVIDIA's Rubin GPU-LPU heterogeneous p... » read more

SSD Emulator For Massively Parallel, GPU-Centric Storage (KAIST)


A new technical paper, "SwarmIO: Towards 100 Million IOPS SSD Emulation for Next-generation GPU-centric Storage Systems," was published by KAIST. Abstract: "GPU-initiated I/O has emerged as a key mechanism for achieving high-throughput storage access by leveraging massive GPU thread-level parallelism, while recent industry trends point toward SSDs optimized for ultra-high random-read IOPS.... » read more

Systematic Analysis of CPU-Induced Slowdowns in Multi-GPU LLM Inference (Georgia Tech)


A new technical paper, "Characterizing CPU-Induced Slowdowns in Multi-GPU LLM Inference," was published by the Georgia Institute of Technology. Abstract: "Large-scale machine learning workloads increasingly rely on multi-GPU systems, yet their performance is often limited by an overlooked component: the CPU. Through a detailed study of modern large language model (LLM) inference and servin... » read more

Enabling the Industry’s First GPU-Accelerated Manufacturing Platform


Discover how modern chip designs are revolutionizing the lithographic process, driving the need for innovative solutions to meet the industry's demand for shorter design cycles. This whitepaper explores the significant role of GPUs in accelerating computational lithography, offering unprecedented speed-ups for EDA tools in chip development. Learn about the collaborative efforts of Synopsys, NVI... » read more

HW-Triggered Backdoors Across Common GPU Accelerators (BIFOLD, TU Berlin, CISPA)


A new technical paper titled "Hardware-Triggered Backdoors" was published by researchers at the Berlin Institute for the Foundations of Learning and Data (BIFOLD), TU Berlin, and the CISPA Helmholtz Center for Information Security. Abstract: "Machine learning models are routinely deployed on a wide range of computing hardware. Although such hardware is typically expected to produce identical result... » read more

Case Study: Autonomous Driving AI Domain Controller


Ambarella’s CV3-AD655 autonomous driving AI domain controller combines energy-efficient compute with Imagination’s IMG BXM GPU to deliver real-time surround-view visualisation for L2++/L3 vehicles. This case study explores the shift to centralized domain controllers, why Ambarella selected IMG BXM, and how this enables greater driver awareness and system trust. » read more

Utilizing Chiplet-Locality For Efficient Memory Mapping In MCM GPUs (ETRI, Sungkyunkwan Univ.)


A new technical paper titled "Leveraging Chiplet-Locality for Efficient Memory Mapping in Multi-Chip Module GPUs" was published by researchers at the Electronics and Telecommunications Research Institute (ETRI) and Sungkyunkwan University. Abstract: "While the multi-chip module (MCM) design allows GPUs to scale compute and memory capabilities through multi-chip integration, it introduces memory ... » read more

Beyond BPD: Backside Clock and Signal Routing for Sub-3nm (UT Austin, Intel)


A new technical paper titled "Beyond Backside Power: Backside Signal Routing as Technology Booster for Standard Cell Scaling" was published by researchers from the University of Texas at Austin and Intel. Abstract: "Advances in process technology enabling backside metals and contacts offer new Design-Technology Co-Optimization (DTCO) opportunities to further enhance power, performance, and area ... » read more
