Google Details Five Generations Of TPU Training Supercomputers


Researchers from Google and University of California, Berkeley published a technical paper titled “Google’s Training Supercomputers from TPU v2 to Ironwood: Architectural Stability, Scale, Resilience, Power Efficiency, and Sustainability Across Five Generations.” The paper summarizes five generations of Google TPUs, from TPU v2 through Ironwood, and examines how the systems evolved int... » read more

Inside the AI Accelerator: Essential IP Design Solutions: eBook


This eBook explores how next‑gen AI accelerators break past single‑chip limits using advanced IP, high‑speed interconnects, memory interfaces, and multi‑die architectures. You’ll see how optical links push bandwidth further and how built‑in security IP keeps AI data protected without slowing performance. What you'll learn: How UALink, PCIe, CXL, and Ultra Ethernet enable sca... » read more

AI Accelerators Usher In New Era For IC Test


Key Takeaways The parallelism in AI accelerators enables low latency but complicates failure isolation. HBM can account for 50% of package cost, so known-good stack assurance is critical. DFT and test cooperate to solve final test, singulated die test, SLT, and in-system test for data centers. AI accelerators are used for everything from training large language models to mak... » read more

10-Year Roadmap for AI + Hardware (UIUC, UCLA, Stanford et al.)


Researchers from University of Illinois Urbana-Champaign, UCLA, Stanford University, Nvidia, Google, et al. have released “AI+HW 2035: Shaping the Next Decade”. Abstract “Artificial intelligence (AI) and hardware (HW) are advancing at unprecedented rates, yet their trajectories have become inseparably intertwined. The global research community lacks a cohesive, long-term vision t... » read more

Optimal Heterogeneous Memory Configs for AI Tasks Under Specified Performance Metrics (Stanford, UCSC)


Researchers from Stanford University and University of California, Santa Cruz have released “Heterogeneous Memory Design Exploration for AI Accelerators with a Gain Cell Memory Compiler”. Abstract “As memory increasingly dominates system cost and energy, heterogeneous on-chip memory systems that combine technologies with complementary characteristics are becoming essential. Gain ... » read more

Co-optimization Approaches For Reliable and Efficient AI Acceleration (Peking University et al.)


A new technical paper titled "The Quest for Reliable AI Accelerators: Cross-Layer Evaluation and Design Optimization" was published by researchers at Peking University and Beijing Advanced Innovation Center for Integrated Circuits. Abstract "As the CMOS technology pushes to the nanoscale, aging effects and process variations have become increasingly pronounced, posing significant reliabilit... » read more

MIT’s Survey On Accelerators and Processors for Inference, With Peak Performance And Power Comparisons


A new technical paper titled "Lincoln AI Computing Survey (LAICS) and Trends" was published by researchers at MIT Lincoln Laboratory Supercomputing Center. Abstract "In the past year, generative AI (GenAI) models have received a tremendous amount of attention, which in turn has increased attention to computing systems for training and inference for GenAI. Hence, an update to this survey is ... » read more

LLM Inference: Core Bottlenecks Imposed By Memory, Compute Capacity, Synchronization Overheads (NVIDIA)


A new technical paper titled "Efficient LLM Inference: Bandwidth, Compute, Synchronization, and Capacity are all you need" was published by NVIDIA. Abstract "This paper presents a limit study of transformer-based large language model (LLM) inference, focusing on the fundamental performance bottlenecks imposed by memory bandwidth, memory capacity, and synchronization overhead in distributed ... » read more

Review Paper: Wafer-Scale Accelerators Versus GPUs (UC Riverside)


A new technical paper titled "Performance, efficiency, and cost analysis of wafer-scale AI accelerators vs. single-chip GPUs" was published by researchers at UC Riverside. "This review compares wafer-scale AI accelerators and single-chip GPUs, examining performance, energy efficiency, and cost in high-performance AI applications. It highlights enabling technologies like TSMC’s chip-on-wafe... » read more

Hardware-Oriented Analysis of Multi-Head Latent Attention (MLA) in DeepSeek-V3 (KU Leuven)


A new technical paper titled "Hardware-Centric Analysis of DeepSeek's Multi-Head Latent Attention" was published by researchers at KU Leuven. Abstract "Multi-Head Latent Attention (MLA), introduced in DeepSeek-V2, improves the efficiency of large language models by projecting query, key, and value tensors into a compact latent space. This architectural change reduces the KV-cache size and s... » read more

← Older posts