Characterization of GPU-based Inference for Reasoning-Centric LLMs (Micron, Argonne)

By Technical Paper Link - 26 May, 2026 - Comments: 0

Researchers from Micron Technology and Argonne National Laboratory have released “Understanding Inference Scaling for LLMs: Bottlenecks, Trade-offs, and Performance Principles”. Abstract “The transition from standard generative AI to reasoning-centric architectures, exemplified by models capable of extensive Chain-of-Thought (CoT) processing, marks a fundamental paradigm shift i... » read more

Four-Tier Memory Hierarchy for LLM Reasoning (USC, UW)

By Technical Paper Link - 20 May, 2026 - Comments: 0

A new technical paper, "Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning," was published by researchers at USC and University of Wisconsin-Madison. Abstract "Reasoning LLMs produce thousands of chain-of-thought tokens whose KV cache must reside in scarce GPU HBM. The dominant response -- permanently evicting low-importance tokens -- is catastrophic for reasoni... » read more

What Do LLMs Want from Hardware

By Geoff Tate - 08 Sep, 2025 - Comments: 0

Figure 1: Noam Shazeer, Google Gemini vice president, presented this in his Hot Chips 2025 talk. Noam Shazeer is Google’s vice president of engineering for Gemini, their LLM competitor to ChatGPT. He talked recently at Hot Chips: “Predictions for the Next Phase of AI." He has worked on LLMs for a decade since inventing the transformer model in 2017. As his slide says, LLMs can take adv... » read more

Dynamic KV Cache Scheduling in Heterogeneous Memory Systems for LLM Inference (Rensselaer Polytechnic Institute, IBM)

By Technical Paper Link - 28 Aug, 2025 - Comments: 0

A new technical paper titled "Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System" was published by researchers at Rensselaer Polytechnic Institute and IBM. Abstract "Large Language Model (LLM) inference is increasingly constrained by memory bandwidth, with frequent access to the key-value (KV) cache dominating data movement. While attention sparsity red... » read more

Cache Side-Channel Attacks On LLMs (MITRE, WPI)

By Technical Paper Link - 18 May, 2025 - Comments: 0

A new technical paper titled "Spill The Beans: Exploiting CPU Cache Side-Channels to Leak Tokens from Large Language Models" was published by researchers at MITRE and Worcester Polytechnic Institute. Abstract "Side-channel attacks on shared hardware resources increasingly threaten confidentiality, especially with the rise of Large Language Models (LLMs). In this work, we introduce Spill The... » read more

tag: KV cache

Characterization of GPU-based Inference for Reasoning-Centric LLMs (Micron, Argonne)

Four-Tier Memory Hierarchy for LLM Reasoning (USC, UW)

What Do LLMs Want from Hardware

Dynamic KV Cache Scheduling in Heterogeneous Memory Systems for LLM Inference (Rensselaer Polytechnic Institute, IBM)

Cache Side-Channel Attacks On LLMs (MITRE, WPI)

Trending Articles

Chip Industry Week In Review

Executive Outlook: Agentic AI’s Impact On Chip Design

Chip Industry Week In Review

I/O Design Challenges Grow In AI Data Centers And HPC Clusters

Chip Industry Week In Review

Knowledge Centers
Entities, people and technologies explored

Related Articles

Startup Funding: Q1 2026

Advanced Packaging Limits Come Into Focus

All AI Data Center Interconnects Will Be Optical Within 5 Years

The Sub-2nm Paradox

When Semiconductor Materials Misbehave

TSMC Tech Symposium 2026, By The Numbers

Silicon Photonics Lights The Way To More Efficient Data Centers

Memory Wall Gets Higher

Sponsors

Recent Comments

About

Navigation

Connect With Us

tag: KV cache

Characterization of GPU-based Inference for Reasoning-Centric LLMs (Micron, Argonne)

Four-Tier Memory Hierarchy for LLM Reasoning (USC, UW)

What Do LLMs Want from Hardware

Dynamic KV Cache Scheduling in Heterogeneous Memory Systems for LLM Inference (Rensselaer Polytechnic Institute, IBM)

Cache Side-Channel Attacks On LLMs (MITRE, WPI)

Trending Articles

Chip Industry Week In Review

Executive Outlook: Agentic AI’s Impact On Chip Design

Chip Industry Week In Review

I/O Design Challenges Grow In AI Data Centers And HPC Clusters

Chip Industry Week In Review

Knowledge Centers Entities, people and technologies explored

Related Articles

Startup Funding: Q1 2026

Advanced Packaging Limits Come Into Focus

All AI Data Center Interconnects Will Be Optical Within 5 Years

The Sub-2nm Paradox

When Semiconductor Materials Misbehave

TSMC Tech Symposium 2026, By The Numbers

Silicon Photonics Lights The Way To More Efficient Data Centers

Memory Wall Gets Higher

Sponsors

Newsletter Signup

Popular Tags

Recent Comments

About

Navigation

Connect With Us

Knowledge Centers
Entities, people and technologies explored