Wafer-Scale Computing for LLMs (U. of Edinburgh, Microsoft)


A new technical paper titled "WaferLLM: A Wafer-Scale LLM Inference System" was published by researchers at University of Edinburgh and Microsoft Research. Abstract "Emerging AI accelerators increasingly adopt wafer-scale manufacturing technologies, integrating hundreds of thousands of AI cores in a mesh-based architecture with large distributed on-chip memory (tens of GB in total) and ultr... » read more

Potential of Wireless Interconnects For Improving Performance And Flexibility Of Multi-Chip AI Accelerators


A new technical paper titled "Exploring the Potential of Wireless-enabled Multi-Chip AI Accelerators" was published by researchers at Universitat Politecnica de Catalunya. Abstract "The insatiable appetite of Artificial Intelligence (AI) workloads for computing power is pushing the industry to develop faster and more efficient accelerators. The rigidity of custom hardware, however, conflict... » read more

Power Delivery Challenges in 3D HI CIM Architectures for AI Accelerators (Georgia Tech)


A new technical paper titled "Co-Optimization of Power Delivery Network Design for 3D Heterogeneous Integration of RRAM-based Compute In-Memory Accelerators" was published by researchers at Georgia Tech. Abstract: "3D heterogeneous integration (3D HI) offers promising solutions for incorporating substantial embedded memory into cutting-edge analog compute-in-memory (CIM) AI accelerators, ad... » read more

Mixed-Precision DL Inference, Co-Designed With HW Accelerator DPU (Intel)


A new technical paper titled "StruM: Structured Mixed Precision for Efficient Deep Learning Hardware Codesign" was published by Intel. Abstract "In this paper, we propose StruM, a novel structured mixed-precision-based deep learning inference method, co-designed with its associated hardware accelerator (DPU), to address the escalating computational and memory demands of deep learning worklo... » read more

DeepSeek: Improving Language Model Reasoning Capabilities Using Pure Reinforcement Learning


A new technical paper titled "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" was published by DeepSeek. Abstract: "We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates rema... » read more

Analog Accelerator For AI/ML Training Workloads Using Stochastic Gradient Descent (Imperial College London)


A new technical paper titled "Learning in Log-Domain: Subthreshold Analog AI Accelerator Based on Stochastic Gradient Descent" was published by researchers at Imperial College London. Abstract "The rapid proliferation of AI models, coupled with growing demand for edge deployment, necessitates the development of AI hardware that is both high-performance and energy-efficient. In this paper, w... » read more

New Class Of Memory: Managed-Retention Memory or MRM (Microsoft Research)


A new technical paper titled "Managed-Retention Memory: A New Class of Memory for the AI Era" was published by researchers at Microsoft. Abstract "AI clusters today are one of the major uses of High Bandwidth Memory (HBM). However, HBM is suboptimal for AI workloads for several reasons. Analysis shows HBM is overprovisioned on write performance, but underprovisioned on density and read band... » read more

Design-Space Analysis of M3D FPGA With BEOL Configuration Memories (Georgia Tech, UCLA)


A new technical paper titled "Monolithic 3D FPGAs Utilizing Back-End-of-Line Configuration Memories" was published by researchers at Georgia Tech and UCLA. Abstract "This work presents a novel monolithic 3D (M3D) FPGA architecture that leverages stackable back-end-of-line (BEOL) transistors to implement configuration memory and pass gates, significantly improving area, latency, and power ef... » read more

Design Space for the Device-Circuit Codesign of NVM-Based CIM Accelerators (TSMC, National Tsing Hua University)


A new technical paper/mini-review titled "Assessing Design Space for the Device-Circuit Codesign of Nonvolatile Memory-Based Compute-in-Memory Accelerators" was published by researchers at TSMC and National Tsing Hua University. Abstract: "Unprecedented penetration of artificial intelligence (AI) algorithms has brought about rapid innovations in electronic hardware, including new memory devi...

Geometric-Aware Model Merging Approach To Enhance Instruction Alignment in Chip LLMs (NVIDIA)


A new technical paper titled "ChipAlign: Instruction Alignment in Large Language Models for Chip Design via Geodesic Interpolation" was published by researchers at NVIDIA Research. Abstract: "Recent advancements in large language models (LLMs) have expanded their application across various domains, including chip design, where domain-adapted chip models like ChipNeMo have emerged. However, ... » read more
