Hardware-Oriented Analysis of Multi-Head Latent Attention (MLA) in DeepSeek-V3 (KU Leuven)


A new technical paper titled "Hardware-Centric Analysis of DeepSeek's Multi-Head Latent Attention" was published by researchers at KU Leuven. Abstract "Multi-Head Latent Attention (MLA), introduced in DeepSeek-V2, improves the efficiency of large language models by projecting query, key, and value tensors into a compact latent space. This architectural change reduces the KV-cache size and s... » read more

Connecting AI Accelerators


Experts At The Table: Semiconductor Engineering sat down to discuss the various ways that AI accelerators are being applied today with Marc Meunier, director of ecosystem development at Arm; Jason Lawley, director of product marketing for AI IP at Cadence; Paul Karazuba, vice president of marketing at Expedera; Alexander Petr, senior director at Keysight; Steve Roddy, chief marketing office...

Future-proofing AI Models


Experts At The Table: Making sure AI accelerators can be updated for future requirements is becoming essential due to the rapid introduction of new models. Semiconductor Engineering sat down to discuss the challenges of future-proofing these designs with Marc Meunier, director of ecosystem development at Arm; Jason Lawley, director of product marketing for AI IP at Cadence; Paul Karazuba, vic...

AI Accelerators Moving Out From Data Centers


Experts At The Table: The explosion in AI data is driving chipmakers to look beyond a single planar SoC. Semiconductor Engineering sat down to discuss the need for more computing and the expanding role of chiplets with Marc Meunier, director of ecosystem development at Arm; Jason Lawley, director of product marketing for AI IP at Cadence; Paul Karazuba, vice president of marketing at Expedera; ...

Energy-Efficient Scalable Silicon Photonic Platform For AI Accelerator HW


A new technical paper titled "Large-Scale Integrated Photonic Device Platform for Energy-Efficient AI/ML Accelerators" was published by researchers at HP Labs, IIT Madras, Microsoft Research and University of Michigan. Abstract "The convergence of deep learning and Big Data has spurred significant interest in developing novel hardware that can run large artificial intelligence (AI) workload... » read more

Wafer-Scale Computing for LLMs (U. of Edinburgh, Microsoft)


A new technical paper titled "WaferLLM: A Wafer-Scale LLM Inference System" was published by researchers at University of Edinburgh and Microsoft Research. Abstract "Emerging AI accelerators increasingly adopt wafer-scale manufacturing technologies, integrating hundreds of thousands of AI cores in a mesh-based architecture with large distributed on-chip memory (tens of GB in total) and ultr... » read more

Potential of Wireless Interconnects For Improving Performance And Flexibility Of Multi-Chip AI Accelerators


A new technical paper titled "Exploring the Potential of Wireless-enabled Multi-Chip AI Accelerators" was published by researchers at Universitat Politecnica de Catalunya. Abstract "The insatiable appetite of Artificial Intelligence (AI) workloads for computing power is pushing the industry to develop faster and more efficient accelerators. The rigidity of custom hardware, however, conflict... » read more

Choosing The Right Memory Solution For AI Accelerators


To meet the increasing demands of AI workloads, memory solutions must deliver ever-increasing performance in bandwidth, capacity, and efficiency. From the training of massive large language models (LLMs) to efficient inference on endpoint devices, choosing the right memory technology is critical for chip designers. This blog explores three leading memory solutions—HBM, LPDDR, and GDDR—and t...
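
One way to frame the choice is the decode-time bandwidth bound: generating each token streams essentially all model weights from memory, so a throughput target implies a minimum bandwidth. The sketch below works that bound for an assumed model and compares it against rough, illustrative per-device bandwidth figures (not vendor specifications):

```python
# Back-of-envelope: autoregressive decode is typically memory-bound, so
# required bandwidth ~= model bytes * tokens/sec. All numbers are assumed.
import math

def min_bandwidth_gb_s(params_billions, bytes_per_param, tokens_per_sec):
    """Lower bound on bandwidth needed to stream all weights once per token."""
    return params_billions * 1e9 * bytes_per_param * tokens_per_sec / 1e9

need = min_bandwidth_gb_s(7, 1, 20)   # 7B params, INT8 weights, 20 tok/s
print(f"required: ~{need:.0f} GB/s")  # ~140 GB/s

# Rough per-stack / per-device bandwidths, order-of-magnitude only.
approx_bw = {"HBM3 stack": 800, "GDDR6 device": 64, "LPDDR5X x64 channel": 68}
for mem, bw in approx_bw.items():
    print(f"{mem} (~{bw} GB/s): need {math.ceil(need / bw)} to hit the target")
```

The point is only that the same throughput target maps to very different device counts, which is where the capacity, power, and cost trade-offs between the three technologies enter.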

MACs Are Not Enough: Why “Offload” Fails


For the past half-decade, countless chip designers have approached on-device machine learning inference with the same simple idea: build a "MAC accelerator" – an array of high-performance multiply-accumulate circuits – paired with a legacy programmable core. There are literally dozens of lookalike architectures in the market t...
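
The arithmetic behind the complaint is just Amdahl's law: if only the MAC-heavy operators are offloaded, everything that falls back to the legacy core caps the end-to-end speedup, and the handoff itself adds overhead. A small sketch with assumed, illustrative fractions:

```python
# Amdahl's-law view of a bare MAC offload engine: the array may run matrix
# math 100x faster, but non-MAC operators (softmax, normalization, data
# shuffling) still execute on the slow host core.
def end_to_end_speedup(mac_fraction, mac_speedup, offload_overhead=0.0):
    """Overall speedup when only the MAC fraction of cycles is accelerated."""
    accelerated = mac_fraction / mac_speedup
    unaccelerated = (1.0 - mac_fraction) + offload_overhead
    return 1.0 / (accelerated + unaccelerated)

# Even with 80% of cycles in MACs and a 100x-faster array, the remaining
# 20% pins the ceiling near 5x; per-layer handoff overhead erodes it further.
print(f"{end_to_end_speedup(0.80, 100.0):.1f}x")        # ~4.8x
print(f"{end_to_end_speedup(0.80, 100.0, 0.05):.1f}x")  # ~3.9x
```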

Designing Heterogeneous AI Acceleration SoCs


A new technical paper titled "Open-Source Heterogeneous SoCs for AI: The PULP Platform Experience" was published by researchers at University of Bologna. Abstract "Since 2013, the PULP (Parallel Ultra-Low Power) Platform project has been one of the most active and successful initiatives in designing research IPs and releasing them as open-source. Its portfolio now ranges from processor co... » read more
