Report: The AI Efficiency Boom


Artificial Intelligence (AI) is undergoing a fundamental transformation. While early AI models were large, compute-heavy, and dependent on cloud processing, a new wave of efficiency-driven innovations is moving AI inference, the generation of model results, to the edge. Smaller models, improved memory and compute performance, and the need for privacy, low latency, and energy efficiency are driving... » read more
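As a rough illustration of why smaller numeric formats matter at the edge (our sketch, not taken from the report), symmetric int8 post-training quantization cuts a model's weight footprint by 4x relative to float32 at a modest accuracy cost. The function names below are our own.

```python
# Minimal sketch of symmetric int8 post-training quantization,
# one common way to shrink models for edge inference.
# Illustrative only; not taken from the report above.
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float32 weights to int8 plus a per-tensor scale."""
    scale = np.abs(w).max() / 127.0          # largest value maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(1024, 1024).astype(np.float32)
q, scale = quantize_int8(w)

print(f"fp32 size: {w.nbytes / 1e6:.1f} MB")   # ~4.2 MB
print(f"int8 size: {q.nbytes / 1e6:.1f} MB")   # ~1.0 MB (4x smaller)
print(f"max abs error: {np.abs(w - dequantize(q, scale)).max():.4f}")
```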

Scaling GenAI Training And Inference Chips With Runtime Monitoring


GenAI’s rapid growth is pushing the limits of semiconductor technology, demanding breakthroughs in performance, power efficiency, and reliability. Training and inference workloads for models like GPT-4 and GPT-5 require massive computational resources, driving up costs and energy consumption and increasing hardware failures. Traditional optimization methods, such as static guard bands and per... » read more
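To see what runtime monitoring buys over a static guard band, here is a toy model (ours, not the paper's method): a fixed band must always cover worst-case voltage droop, while a monitor-driven supply tracks the droop actually observed, and dynamic power scales roughly with the square of voltage.

```python
# Toy model contrasting a static guard band with monitor-driven
# adaptation. Entirely illustrative; not the paper's method.
import random

NOMINAL_V = 0.80          # volts
WORST_CASE_DROOP = 0.08   # a static design must always cover this

def measured_droop() -> float:
    """Stand-in for an on-die droop monitor reading."""
    return random.uniform(0.0, 0.05)  # typical droop is far below worst case

static_v = NOMINAL_V + WORST_CASE_DROOP

adaptive_vs = []
for _ in range(10_000):
    droop = measured_droop()
    adaptive_vs.append(NOMINAL_V + droop + 0.005)  # small safety margin

avg_adaptive_v = sum(adaptive_vs) / len(adaptive_vs)
# Dynamic power scales roughly with V^2, so even small voltage
# reductions compound into meaningful energy savings.
savings = 1 - (avg_adaptive_v / static_v) ** 2
print(f"static supply:   {static_v:.3f} V")
print(f"adaptive supply: {avg_adaptive_v:.3f} V (mean)")
print(f"approx. dynamic power saved: {savings:.1%}")
```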

System-Level Approach To Reducing HBM Cost For AI Inference (RPI, IBM)


A new technical paper titled "Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure" was published by researchers at Rensselaer Polytechnic Institute and IBM. Abstract "High-Bandwidth Memory (HBM) delivers exceptional bandwidth and energy efficiency for AI workloads, but its high cost per bit, driven in part by stringent on-die reliability requirements, pose... » read more

Review Paper: Wafer-Scale Accelerators Versus GPUs (UC Riverside)


A new technical paper titled "Performance, efficiency, and cost analysis of wafer-scale AI accelerators vs. single-chip GPUs" was published by researchers at UC Riverside. "This review compares wafer-scale AI accelerators and single-chip GPUs, examining performance, energy efficiency, and cost in high-performance AI applications. It highlights enabling technologies like TSMC’s chip-on-wafe... » read more

Fully Automated Hardware And Software Design Of Processor Chips (Chinese Academy Of Sciences)


A new technical paper titled "QiMeng: Fully Automated Hardware and Software Design for Processor Chip" was published by researchers at Chinese Academy of Sciences. Abstract "Processor chip design technology serves as a key frontier driving breakthroughs in computer science and related fields. With the rapid advancement of information technology, conventional design paradigms face three majo... » read more

Inference Framework For Deployment Challenges Of Large Generative Models On GPUs (Google, Meta)


A new technical paper titled "Scaling On-Device GPU Inference for Large Generative Models" was published by researchers at Google and Meta Platforms. Abstract "Driven by the advancements in generative AI, large machine learning models have revolutionized domains such as image processing, audio synthesis, and speech recognition. While server-based deployments remain the locus of peak perform... » read more

Wafer-Scale Computing For LLMs (U. of Edinburgh, Microsoft)


A new technical paper titled "WaferLLM: A Wafer-Scale LLM Inference System" was published by researchers at University of Edinburgh and Microsoft Research. Abstract "Emerging AI accelerators increasingly adopt wafer-scale manufacturing technologies, integrating hundreds of thousands of AI cores in a mesh-based architecture with large distributed on-chip memory (tens of GB in total) and ultr... » read more

Mixed-Precision DL Inference, Co-Designed With HW Accelerator DPU (Intel)


A new technical paper titled "StruM: Structured Mixed Precision for Efficient Deep Learning Hardware Codesign" was published by Intel. Abstract "In this paper, we propose StruM, a novel structured mixed-precision-based deep learning inference method, co-designed with its associated hardware accelerator (DPU), to address the escalating computational and memory demands of deep learning worklo... » read more

Pooling CPU Memory For LLM Inference With Lower Latency And Higher Throughput (UC Berkeley)


A new technical paper titled "Pie: Pooling CPU Memory for LLM Inference" was published by researchers at UC Berkeley. Abstract "The rapid growth of LLMs has revolutionized natural language processing and AI analysis, but their increasing size and memory demands present significant challenges. A common solution is to spill over to CPU memory; however, traditional GPU-CPU memory swapping ofte... » read more

Survey: HW/SW Co-Design Approaches Tailored To LLMs (Duke, Johns Hopkins)


A new technical paper titled "A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models" was published by researchers at Duke University and Johns Hopkins University. Abstract "The rapid development of large language models (LLMs) has significantly transformed the field of artificial intelligence, demonstrating remarkable capabilities in natural language proce... » read more
