Machine Intelligence on Wireless Edge Networks with RF Analog Architecture (MIT, Duke)


A new technical paper titled "Machine Intelligence on Wireless Edge Networks" was published by researchers at MIT and Duke University. Abstract: "Deep neural network (DNN) inference on power-constrained edge devices is bottlenecked by costly weight storage and data movement. We introduce MIWEN, a radio-frequency (RF) analog architecture that "disaggregates" memory by streaming weights wirele...
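
The weight-streaming idea in the excerpt can be pictured with a small numerical sketch: each layer's matrix is treated as if it arrives over a noisy wireless link and is multiplied against the activations locally. The AWGN channel model, the SNR, and the layer sizes below are illustrative assumptions, not details taken from the MIWEN paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def stream_weights(w, snr_db):
    """Model wireless weight delivery as an AWGN channel (illustrative assumption)."""
    noise_power = np.mean(w ** 2) / (10 ** (snr_db / 10))
    return w + rng.normal(0.0, np.sqrt(noise_power), size=w.shape)

def forward(x, weights, snr_db=None):
    """Tiny MLP forward pass; if snr_db is set, every layer uses streamed (noisy) weights."""
    h = x
    for w in weights:
        w_used = stream_weights(w, snr_db) if snr_db is not None else w
        h = np.maximum(w_used @ h, 0.0)     # multiply-accumulate done on-device
    return h

x = rng.normal(size=64)
weights = [rng.normal(size=(32, 64)), rng.normal(size=(10, 32))]
ideal = forward(x, weights)
streamed = forward(x, weights, snr_db=20.0)
print("relative error from streaming weights:",
      np.linalg.norm(streamed - ideal) / (np.linalg.norm(ideal) + 1e-12))
```

The trade-off such an architecture raises is the one the sketch exposes: how much accuracy survives at a given link SNR once on-device weight storage is removed.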

All-In-One Analog AI Accelerator With CMO/HfOx ReRAM Integrated Into The BEOL (IBM Research-Europe)


A new technical paper titled "All-in-One Analog AI Hardware: On-Chip Training and Inference with Conductive-Metal-Oxide/HfOx ReRAM Devices" was published by researchers at IBM Research-Europe. Abstract: "Analog in-memory computing is an emerging paradigm designed to efficiently accelerate deep neural network workloads. Recent advancements have focused on either inference or training accelera...
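
For readers new to analog in-memory computing, the toy sketch below shows the generic crossbar pattern the abstract refers to: signed weights mapped onto differential pairs of quantized conductances, with the matrix-vector product read out as noisy column currents. The conductance range, level count, and noise figure are placeholders, not measured CMO/HfOx device parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
G_MAX, LEVELS = 1.0, 256     # assumed conductance range and number of programmable levels

def program(w):
    """Map signed weights onto a differential pair of quantized conductances (G+, G-)."""
    g = np.abs(w) / np.max(np.abs(w)) * G_MAX
    g = np.round(g / G_MAX * (LEVELS - 1)) / (LEVELS - 1) * G_MAX   # finite device levels
    return np.where(w >= 0, g, 0.0), np.where(w < 0, g, 0.0)

def crossbar_mvm(g_pos, g_neg, v, read_noise=0.01):
    """Analog matrix-vector multiply: column currents ~ (G+ - G-) @ v, plus read noise."""
    i = (g_pos - g_neg) @ v
    return i + read_noise * rng.normal(size=i.shape)

w = rng.normal(size=(8, 16))
x = rng.normal(size=16)
g_pos, g_neg = program(w)
digital = (w / np.max(np.abs(w))) @ x       # same scaling as the programmed conductances
analog = crossbar_mvm(g_pos, g_neg, x)
print("max deviation of analog MVM:", np.max(np.abs(analog - digital)))
```

On-chip training, the other half of the "all-in-one" claim, would additionally update those conductances in place rather than reprogramming them from a digital copy.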

Review Paper: Wafer-Scale Accelerators Versus GPUs (UC Riverside)


A new technical paper titled "Performance, efficiency, and cost analysis of wafer-scale AI accelerators vs. single-chip GPUs" was published by researchers at UC Riverside. "This review compares wafer-scale AI accelerators and single-chip GPUs, examining performance, energy efficiency, and cost in high-performance AI applications. It highlights enabling technologies like TSMC’s chip-on-wafe...
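
The comparison axes named in the review (performance, energy efficiency, cost) reduce to a couple of ratios that are easy to keep at hand. The sketch below only demonstrates the bookkeeping; the numbers are placeholders and must be replaced with datasheet or measured values before drawing any conclusion.

```python
from dataclasses import dataclass

@dataclass
class Accelerator:
    name: str
    peak_tflops: float   # peak throughput at a chosen precision
    power_w: float       # sustained system power
    price_usd: float     # acquisition cost

    def tflops_per_watt(self) -> float:
        return self.peak_tflops / self.power_w

    def tflops_per_dollar(self) -> float:
        return self.peak_tflops / self.price_usd

# PLACEHOLDER numbers, used only to exercise the metrics -- substitute real datasheet values.
systems = [
    Accelerator("single-chip GPU (placeholder)", 1_000, 700, 30_000),
    Accelerator("wafer-scale accelerator (placeholder)", 100_000, 20_000, 2_000_000),
]
for s in systems:
    print(f"{s.name}: {s.tflops_per_watt():.2f} TFLOPS/W, "
          f"{s.tflops_per_dollar() * 1000:.1f} TFLOPS per k$")
```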

Mixed-Criticality SW Architectures for Centralized HPC Platforms in Software-Defined Vehicles (Daimler, TU Munich)


A new technical paper titled "Towards Mixed-Criticality Software Architectures for Centralized HPC Platforms in Software-Defined Vehicles: A Systematic Literature Review" was published by researchers at Daimler Truck AG and Technical University of Munich. Abstract: "Centralized electrical/electronic architectures and High-Performance Computers (HPCs) are redefining automotive software develo...
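
Mixed criticality, the property the review centers on, means that safety-rated and convenience functions share one compute platform but are not treated equally under contention. A minimal sketch of that idea, using ISO 26262-style criticality labels and a made-up CPU-budget admission rule (not a scheme from the reviewed literature):

```python
from dataclasses import dataclass

# Criticality ranks loosely following ISO 26262 levels (QM lowest, ASIL D highest).
RANK = {"QM": 0, "ASIL-A": 1, "ASIL-B": 2, "ASIL-C": 3, "ASIL-D": 4}

@dataclass
class Task:
    name: str
    criticality: str
    cpu_budget: float   # fraction of one core this task needs

def admit(tasks, capacity=1.0):
    """Admit tasks highest-criticality-first; shed lower-criticality work under overload."""
    admitted, used = [], 0.0
    for t in sorted(tasks, key=lambda t: RANK[t.criticality], reverse=True):
        if used + t.cpu_budget <= capacity:
            admitted.append(t.name)
            used += t.cpu_budget
    return admitted

workload = [
    Task("brake-monitor", "ASIL-D", 0.30),
    Task("lane-keeping", "ASIL-B", 0.40),
    Task("infotainment-ui", "QM", 0.50),
]
print(admit(workload))   # under overload, the QM task is the one that gets dropped
```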

Towards Base-Station-On-Chip: Wireless Communication Kernels On A RISC-V Vector Processor (TU Dresden, CeTI)


A new technical paper titled "Towards a Base-Station-on-Chip: RISC-V Hardware Acceleration for wireless communication" was published by researchers at TU Dresden and Centre for Tactile Internet with Human-in-the-Loop (CeTI). Abstract: "The evolution of 5G and the emergence of 6G wireless communication systems impose higher demands for computing capabilities and lower power consumption in the...
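
The excerpt does not say which kernels were ported, so as a stand-in, here is one representative baseband kernel that maps naturally onto a vector unit: per-subcarrier zero-forcing MIMO detection, written in NumPy purely to show the data-parallel structure a RISC-V vector implementation would exploit.

```python
import numpy as np

rng = np.random.default_rng(2)

def zf_equalize(H, y):
    """Zero-forcing MIMO detection for one subcarrier: x_hat = pinv(H) @ y."""
    return np.linalg.pinv(H) @ y

# Toy 4x4 MIMO link with QPSK symbols on 64 OFDM subcarriers.
n_sc, n_tx, n_rx = 64, 4, 4
H = (rng.normal(size=(n_sc, n_rx, n_tx)) + 1j * rng.normal(size=(n_sc, n_rx, n_tx))) / np.sqrt(2)
x = (rng.choice([-1, 1], size=(n_sc, n_tx)) + 1j * rng.choice([-1, 1], size=(n_sc, n_tx))) / np.sqrt(2)
y = np.einsum("srt,st->sr", H, x)
y += 0.05 * (rng.normal(size=y.shape) + 1j * rng.normal(size=y.shape))
x_hat = np.stack([zf_equalize(H[k], y[k]) for k in range(n_sc)])
wrong = (np.sign(x_hat.real) != np.sign(x.real)) | (np.sign(x_hat.imag) != np.sign(x.imag))
print("QPSK symbol error rate:", wrong.mean())
```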

PCM-Based IMC Technology: Overview Of Materials, Device Physics, Design and Fabrication (IBM Research-Europe)


A new technical paper titled "Phase-Change Memory for In-Memory Computing" was published by researchers at IBM Research-Europe. "We review the current state of phase-change materials, PCM device physics, and the design and fabrication of PCM-based IMC chips. We also provide an overview of the application landscape and offer insights into future developments," states the paper.
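
One device-physics effect that shapes PCM-based IMC design is conductance drift, commonly modeled as G(t) = G0·(t/t0)^(−ν). The sketch below applies that empirical law with an assumed spread in ν to show how a programmed matrix-vector product degrades over time; the numbers are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def drift(g0, t, nu, t0=1.0):
    """Empirical PCM conductance drift: G(t) = G0 * (t / t0) ** (-nu)."""
    return g0 * (t / t0) ** (-nu)

w = rng.normal(size=(8, 8))
g0 = np.abs(w)                                                      # programmed conductance magnitudes
nu = np.clip(0.05 + 0.01 * rng.normal(size=g0.shape), 0.0, None)    # assumed device-to-device spread
x = rng.normal(size=8)
ideal = (np.sign(w) * g0) @ x
for t in (1.0, 1e3, 1e6):                                           # seconds after programming
    out = (np.sign(w) * drift(g0, t, nu)) @ x
    err = np.linalg.norm(out - ideal) / np.linalg.norm(ideal)
    print(f"t = {t:.0e} s: relative MVM error from drift = {err:.3f}")
```

A uniform conductance shrink could largely be calibrated out with a reference read; it is the device-to-device spread in ν that is harder to compensate.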

Open-Source RISC-V Cores: Analysis Of Scalar and Superscalar Architectures And Out-Of-Order Machines


A new technical paper titled "Ramping Up Open-Source RISC-V Cores: Assessing the Energy Efficiency of Superscalar, Out-of-Order Execution" was published by researchers at ETH Zurich, Università di Bologna and Univ. Grenoble Alpes, Inria. Abstract: "Open-source RISC-V cores are increasingly demanded in domains like automotive and space, where achieving high instructions per cycle (IPC) throu...
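
The efficiency question in the title comes down to whether the IPC gained by superscalar, out-of-order execution outweighs the extra power it draws. A small scoring sketch, with placeholder operating points rather than the paper's measurements:

```python
from dataclasses import dataclass

@dataclass
class Core:
    name: str
    ipc: float        # instructions per cycle on the workload of interest
    freq_ghz: float   # operating frequency
    power_mw: float   # core power at that operating point

    def mips(self) -> float:
        return self.ipc * self.freq_ghz * 1e3       # millions of instructions per second

    def pj_per_instr(self) -> float:
        return 1e3 * self.power_mw / self.mips()    # mW / MIPS, rescaled to pJ per instruction

# PLACEHOLDER operating points, only to show how the IPC-versus-power trade-off is scored.
cores = [
    Core("in-order single-issue (placeholder)", ipc=0.8, freq_ghz=1.0, power_mw=20),
    Core("superscalar out-of-order (placeholder)", ipc=2.0, freq_ghz=1.0, power_mw=80),
]
for c in cores:
    print(f"{c.name}: {c.mips():,.0f} MIPS at {c.pj_per_instr():.1f} pJ/instruction")
```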

Hardware-Oriented Analysis of Multi-Head Latent Attention (MLA) in DeepSeek-V3 (KU Leuven)


A new technical paper titled "Hardware-Centric Analysis of DeepSeek's Multi-Head Latent Attention" was published by researchers at KU Leuven. Abstract: "Multi-Head Latent Attention (MLA), introduced in DeepSeek-V2, improves the efficiency of large language models by projecting query, key, and value tensors into a compact latent space. This architectural change reduces the KV-cache size and s...
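
A stripped-down version of the MLA mechanism described in the abstract: each token is down-projected once into a shared latent vector, only that latent is cached, and per-head keys and values are rebuilt from it at attention time. Dimensions and weights below are arbitrary, and RoPE handling and other DeepSeek-specific details are omitted.

```python
import numpy as np

rng = np.random.default_rng(4)
d_model, d_latent, d_head, n_heads = 256, 64, 32, 8

W_dkv = rng.normal(size=(d_model, d_latent)) / np.sqrt(d_model)          # shared down-projection
W_uk = rng.normal(size=(n_heads, d_latent, d_head)) / np.sqrt(d_latent)  # per-head key up-projection
W_uv = rng.normal(size=(n_heads, d_latent, d_head)) / np.sqrt(d_latent)  # per-head value up-projection
W_q = rng.normal(size=(n_heads, d_model, d_head)) / np.sqrt(d_model)     # per-head query projection

def decode_step(x_t, latent_cache):
    """One decoding step: cache only the token's latent, rebuild K/V per head on the fly."""
    latent_cache.append(x_t @ W_dkv)              # (d_latent,) stored instead of full K and V
    C = np.stack(latent_cache)                    # (T, d_latent)
    heads = []
    for h in range(n_heads):
        q = x_t @ W_q[h]                          # (d_head,)
        K = C @ W_uk[h]                           # (T, d_head), reconstructed from latents
        V = C @ W_uv[h]
        a = np.exp(K @ q / np.sqrt(d_head))
        heads.append((a / a.sum()) @ V)
    return np.concatenate(heads)

cache = []
for _ in range(5):                                # a few decoding steps
    out = decode_step(rng.normal(size=d_model), cache)
print("floats cached per token:", d_latent, "| standard MHA:", 2 * n_heads * d_head)
```

The cache cost per token drops from 2·n_heads·d_head values under standard multi-head attention to d_latent values here, which is the kind of reduction the paper examines from a hardware perspective.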

Arithmetic Intensity In Decoding: A Hardware-Efficient Perspective (Princeton University)


A new technical paper titled "Hardware-Efficient Attention for Fast Decoding" was published by researchers at Princeton University. Abstract: "LLM decoding is bottlenecked for large batches and long contexts by loading the key-value (KV) cache from high-bandwidth memory, which inflates per-token latency, while the sequential nature of decoding limits parallelism. We analyze the interplay amo...
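
The memory-bound nature of decoding can be made concrete with a back-of-the-envelope arithmetic-intensity count for one decode step: FLOPs spent on the attention score and weighted-sum matmuls divided by the bytes of KV cache that must be fetched. The counting below is simplified (weight loading, softmax, and activations ignored) and is not reproduced from the paper.

```python
def decode_attention_intensity(n_q_heads, n_kv_heads, d_head, seq_len, bytes_per_elem=2):
    """FLOPs per byte of KV cache read for one decode step (scores + weighted sum only)."""
    flops = n_q_heads * 4 * seq_len * d_head                         # 2·T·d for Q·K^T, 2·T·d for A·V
    kv_bytes = n_kv_heads * 2 * seq_len * d_head * bytes_per_elem    # K and V in fp16
    return flops / kv_bytes

for name, n_q, n_kv in [("MHA", 32, 32), ("GQA (8 KV heads)", 32, 8), ("MQA", 32, 1)]:
    ai = decode_attention_intensity(n_q, n_kv, d_head=128, seq_len=8192)
    print(f"{name}: {ai:.0f} FLOPs per byte of KV cache")
```

Note that the ratio is independent of sequence length and grows only with the number of query heads sharing a KV head, which is why KV-sharing schemes are the usual lever for raising decode-time arithmetic intensity.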

Roadmap for AI HW Development, With The Role of Photonic Chips In Supporting Future LLMs (CUHK, NUS, UIUC, Berkeley)


A new technical paper titled "What Is Next for LLMs? Next-Generation AI Computing Hardware Using Photonic Chips" was published by researchers at The Chinese University of Hong Kong, National University of Singapore, University of Illinois Urbana-Champaign and UC Berkeley. Abstract: "Large language models (LLMs) are rapidly pushing the limits of contemporary computing hardware. For example, t...
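
Photonic accelerators compute matrix-vector products in the analog optical domain, so precision is set by noise and readout conversion rather than by digital word length. The sketch below models that generically (exact product plus noise, then ADC quantization); the noise level and bit widths are assumptions, not figures from the roadmap paper.

```python
import numpy as np

rng = np.random.default_rng(5)

def photonic_mvm(w, x, adc_bits=6, noise=0.01):
    """Idealized analog optical MVM: exact product, additive noise, then ADC quantization."""
    y = w @ x
    y = y + noise * np.std(y) * rng.normal(size=y.shape)     # detector/shot noise (assumed level)
    scale = max(np.max(np.abs(y)), 1e-12)
    levels = 2 ** (adc_bits - 1) - 1
    return np.round(y / scale * levels) / levels * scale      # finite-resolution readout

w = rng.normal(size=(64, 64)) / 8
x = rng.normal(size=64)
exact = w @ x
for bits in (4, 6, 8):
    approx = photonic_mvm(w, x, adc_bits=bits)
    rel = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
    print(f"{bits}-bit readout: relative error {rel:.3f}")
```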
