Four-Tier Memory Hierarchy for LLM Reasoning (USC, UW)


A new technical paper, "Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning," was published by researchers at USC and University of Wisconsin-Madison. Abstract "Reasoning LLMs produce thousands of chain-of-thought tokens whose KV cache must reside in scarce GPU HBM. The dominant response -- permanently evicting low-importance tokens -- is catastrophic for reasoni... » read more

Workflow-Level Design For Trustworthy GenAI Integration in Vehicles (UOL, Denso)


A new technical paper, "Workflow-Level Design Principles for Trustworthy GenAI in Automotive System Engineering," was published by researchers at University of Oldenburg and Denso Automotive. Abstract "The adoption of large language models in safety-critical system engineering is constrained by trustworthiness, traceability, and alignment with established verification practices. We propos... » read more

Replacing GPU Compute Dies With PNM-Enabled HBM Cubes For Long-Context Decode Attention (UCSD, Columbia, Yonsei U., NVIDIA, Samsung)


A new technical paper, "AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving," was published by researchers at UC San Diego, Columbia University, Yonsei University, NVIDIA, and Samsung. Abstract "All current LLM serving systems place the GPU at the center, from production-level attention-FFN disaggregation to NVIDIA's Rubin GPU-LPU heterogeneous p... » read more

Leveraging Agentic AI Techniques to Improve Formal Verification (Infineon, et al.)


A new technical paper, "Agentic AI-based Coverage Closure for Formal Verification," was published by researchers at Infineon and the NIT Jalandhar. Abstract "Coverage closure is a critical requirement in Integrated Chip (IC) development process and key metric for verification sign-off. However, traditional exhaustive approaches often fail to achieve full coverage within project timelines.... » read more

Systematic Analysis of CPU-Induced Slowdowns in Multi-GPU LLM Inference (Georgia Tech)


A new technical paper, "Characterizing CPU-Induced Slowdowns in Multi-GPU LLM Inference," was published by the Georgia Institute of Technology. Abstract "Large-scale machine learning workloads increasingly rely on multi-GPU systems, yet their performance is often limited by an overlooked component: the CPU. Through a detailed study of modern large language model (LLM) inference and servin... » read more

RPU: A Chiplet-Based Architecture To Address The Challenges of the Modern Memory Wall (Harvard University)


Researchers from Harvard University have released “RPU -- A Reasoning Processing Unit”. Abstract “Large language model (LLM) inference performance is increasingly bottlenecked by the memory wall. While GPUs continue to scale raw compute throughput, they struggle to deliver scalable performance for memory bandwidth bound workloads. This challenge is amplified by emerging reasonin... » read more

Four Architectural Opportunities for LLM Inference Hardware (Google)


A new technical paper titled "Challenges and Research Directions for Large Language Model Inference Hardware" was published by Google. Abstract "Large Language Model (LLM) inference is hard. The autoregressive Decode phase of the underlying Transformer model makes LLM inference fundamentally different from training. Exacerbated by recent AI trends, the primary challenges are memory and in... » read more

LLM-Based AI Agent That Automates The Transistor Sizing Process (Univ. of Edinburgh)


A new technical paper titled "EEsizer: LLM-Based AI Agent for Sizing of Analog and Mixed Signal Circuit" was published by researchers at The University of Edinburgh. Abstract "The design of Analog and Mixed-Signal (AMS) integrated circuits (ICs) often involves significant manual effort, especially during the transistor sizing process. While Machine Learning techniques in Electronic Design A... » read more

Overview of Incorporating LLMs into EDA, With 3 Case Studies (TU Munich et al.)


A new technical paper titled "Large Language Models (LLMs) for Electronic Design Automation (EDA)" was published by researchers at the Technical University of Munich, University of Stuttgart, New York University, and University of Siegen. Abstract "With the growing complexity of modern integrated circuits, hardware engineers are required to devote more effort to the full design-to-manufactu... » read more

Re-Architecting AI For Power


The industry is becoming increasingly concerned about the amount of power being consumed by AI, but there is no simple solution to the problem. It requires a deep understanding of the application, the software and hardware architectures at both the semiconductor and system levels, and how all of this is designed and implemented. Each piece plays a role in the total power consumed and the utilit... » read more

← Older posts