Characterization of GPU-based Inference for Reasoning-Centric LLMs (Micron, Argonne)

By Technical Paper Link - 26 May, 2026

Researchers from Micron Technology and Argonne National Laboratory have released “Understanding Inference Scaling for LLMs: Bottlenecks, Trade-offs, and Performance Principles.” Abstract: “The transition from standard generative AI to reasoning-centric architectures, exemplified by models capable of extensive Chain-of-Thought (CoT) processing, marks a fundamental paradigm shift i...

Large-scale, SRAM-based LLM Inference Deployment (Groq)

By Technical Paper Link - 21 May, 2026

A new technical paper, "SHIP: SRAM-Based Huge Inference Pipelines for Fast LLM Serving," was published by researchers at Nvidia, with work done while at Groq. Abstract: "The proliferation of large language models (LLMs) demands inference systems with both low latency and high efficiency at scale. GPU-based serving relies on HBM for model weights and KV caches, creating a memory bandwidth b...

A Detailed Evaluation of A Production Server With High-End MRDIMM Main Memory (BSC, Micron, Intel, UPC)

By Technical Paper Link - 05 May, 2026

A new technical paper, "Performance and Energy Benefits of MRDIMMs," was published by researchers at Barcelona Supercomputing Center, Universitat Politecnica de Catalunya, Micron and Intel Corporation. Abstract: "Multiplexed Rank DIMMs (MRDIMMs) have recently emerged as memory devices that enable higher bandwidth without increasing DRAM chip frequencies. This paper presents a detailed perf...

Microarchitecture Tailored to 3D-Stacked Near-Memory Processing LLM Decoding (U. of Edinburgh, Peking U., Cambridge et al.)

By Technical Paper Link - 28 Apr, 2026

A new technical paper, "Rethinking Compute Substrates for 3D-Stacked Near-Memory LLM Decoding: Microarchitecture-Scheduling Co-Design," was published by researchers at University of Edinburgh, Peking University, University of Cambridge, University of Chinese Academy of Sciences, and the Hong Kong University of Science and Technology. Abstract: "Large language model (LLM) decoding is a majo...

New Automotive Architectures Are Shaking Up Processor And Memory Choices

By Ann Mutschler - 05 Mar, 2026

Key Takeaways: Assisted and autonomous driving require more data from more sensors, and much faster processing of some of that data. The shift to software-defined vehicles and centralized intelligence makes it easier to identify where the most advanced processors and memories are required, and where older and less expensive technologies can be deployed. Technologies that were largely ...

An FPGA-based Accelerator Addressing Bottlenecks in GNN Preprocessing (KAIST et al.)

By Technical Paper Link - 26 Feb, 2026

A new technical paper, "AutoGNN: End-to-End Hardware-Driven Graph Preprocessing for Enhanced GNN Performance," was published by researchers at KAIST, Panmnesia, Peking University, Hanyang University, and Pennsylvania State University. Abstract: "Graph neural network (GNN) inference faces significant bottlenecks in preprocessing, which often dominate overall inference latency. We introduce Au...

Impact Of On-Chip SRAM Size And Frequency On Energy Efficiency And Performance of LLM Inference (Uppsala Univ.)

By Technical Paper Link - 03 Jan, 2026

A new technical paper titled "Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling" was published by researchers at Uppsala University. Abstract: "Energy consumption dictates the cost and environmental impact of deploying Large Language Models. This paper investigates the impact of on-chip SRAM size and operating frequency on the energy efficiency and per...

AI Workloads at the Edge: Ensuring Performance, Privacy, and Security

By Ann Mutschler - 17 Dec, 2025

Experts At The Table: Semiconductor Engineering gathered a group of experts to discuss why some AI workloads are better suited for on-device processing to achieve consistent performance, avoid network connectivity issues, reduce cloud computing costs, and ensure privacy. The panel included Frank Ferro, group director in the Silicon Solutions Group at Cadence; Eduardo Montanez, vice president a...

Boosting Memory Bandwidth Availability By Salvaging Idle I/O Bandwidth Resources (Georgia Tech)

By Technical Paper Link - 03 Dec, 2025

A new technical paper titled "Pushing the Memory Bandwidth Wall with CXL-enabled Idle I/O Bandwidth Harvesting" was published by researchers at Georgia Institute of Technology. Abstract: "The continual increase of cores on server-grade CPUs raises demands on memory systems, which are constrained by limited off-chip pin and data transfer rate scalability. As a result, high-end processors ty...

Optimizing AI Workloads For Edge Computing

By Ann Mutschler - 03 Dec, 2025

Experts At The Table: Semiconductor Engineering gathered a group of experts to discuss how some AI workloads are better suited for on-device processing to achieve consistent performance, avoid network connectivity issues, reduce cloud computing costs, and ensure privacy. The panel included Frank Ferro, group director in the Silicon Solutions Group at Cadence; Eduardo Montanez, vice president an...


tag: memory bandwidth
