Moving AI Workloads To The Edge

By Ann Mutschler - 06 Nov, 2025 - Comments: 0

Experts At The Table: Semiconductor Engineering gathered a group of experts to discuss how some AI workloads are better suited for on-device processing to achieve consistent performance, avoid network connectivity issues, reduce cloud computing costs, and ensure privacy. The panel included Frank Ferro, group director in the Silicon Solutions Group at Cadence; Eduardo Montanez, vice president an... » read more

Heterogeneous System With Specialized HW For Disaggregated LLM Inference (Princeton Univ., Univ. of Washington)

By Technical Paper Link - 14 Oct, 2025 - Comments: 0

A new technical paper titled "SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference" was published by researchers at Princeton University and University of Washington. Abstract "Large Language Models (LLMs) have gained popularity in recent years, driving up the demand for inference. LLM inference is composed of two phases with distinct characteristics: a compute-boun... » read more

A Fundamental Rethinking Of Memory Hierarchy Design (Stanford University)

By Technical Paper Link - 05 Sep, 2025 - Comments: 0

A new technical paper titled "The Future of Memory: Limits and Opportunities" was published by researchers at Stanford University and an independent researcher. Abstract "Memory latency, bandwidth, capacity, and energy increasingly limit performance. In this paper, we reconsider proposed system architectures that consist of huge (many-terabyte to petabyte scale) memories shared among large ... » read more

Scaling DRAM Technology To Meet Future Demands: Challenges And Opportunities

By Rambus - 13 Aug, 2025 - Comments: 0

Since the invention of the 1T1C bit cell more than 50 years ago, DRAMs have become the main memory of choice for processors in computer systems and many consumer electronics devices. As new computing paradigms have been created, including 3D graphics, cloud computing, smartphones, and AI processing, specialized processors and DRAM memories have been developed that are optimized for these u... » read more

What’s Different About HBM4

By Ed Sperling - 11 Aug, 2025 - Comments: 0

Memory bandwidth is limiting the flow of huge datasets that are needed to train AI models. There is much more data to process, store, and retrieve, but the speed at which that data moves through high-bandwidth memory (HBM) stacks is significantly lower than the speed at which data can be processed. Frank Ferro, group director for product management at Cadence, talks about the new HBM4 standard,... » read more

Expanding Server Memory Capabilities With Multiplexed Rank DIMM (MRDIMM) Technology

By Rambus - 16 Jul, 2025 - Comments: 0

Computational power within a single, packaged semiconductor component continues to rise along a Moore’s law-type curve, enabling new and more capable applications including machine learning (ML), generative artificial intelligence (AI), and the training and deployment of large language models (LLMs). On-demand lifestyle applications like language translation, direction finding, a... » read more

Detailed Study of Performance Modeling For LLM Implementations At Scale (imec)

By Technical Paper Link - 07 Jul, 2025 - Comments: 0

A new technical paper titled "System-performance and cost modeling of Large Language Model training and inference" was published by researchers at imec. Abstract "Large language models (LLMs), based on transformer architectures, have revolutionized numerous domains within artificial intelligence, science, and engineering due to their exceptional scalability and adaptability. However, the ex... » read more

Hardware-Oriented Analysis of Multi-Head Latent Attention (MLA) in DeepSeek-V3 (KU Leuven)

By Technical Paper Link - 04 Jun, 2025 - Comments: 0

A new technical paper titled "Hardware-Centric Analysis of DeepSeek's Multi-Head Latent Attention" was published by researchers at KU Leuven. Abstract "Multi-Head Latent Attention (MLA), introduced in DeepSeek-V2, improves the efficiency of large language models by projecting query, key, and value tensors into a compact latent space. This architectural change reduces the KV-cache size and s... » read more

Arithmetic Intensity In Decoding: A Hardware-Efficient Perspective (Princeton University)

By Technical Paper Link - 03 Jun, 2025 - Comments: 0

A new technical paper titled "Hardware-Efficient Attention for Fast Decoding" was published by researchers at Princeton University. Abstract "LLM decoding is bottlenecked for large batches and long contexts by loading the key-value (KV) cache from high-bandwidth memory, which inflates per-token latency, while the sequential nature of decoding limits parallelism. We analyze the interplay amo... » read more

Lines Blurring Between Supercomputing And HPC

By Ann Mutschler - 27 Feb, 2025 - Comments: 1

Supercomputers and high-performance computers are becoming increasingly difficult to differentiate due to the proliferation of AI, which is driving huge performance increases in commercial and scientific applications and raising similar challenges for both. While the goals of supercomputing and high-performance computing (HPC) have always been similar — blazing fast processing — the mark... » read more

tag: memory bandwidth
