SPONSOR BLOG

Reducing Avoidable Memory Trips In HBM Systems

Last-level cache helps manage data movement and reduces pressure on the external memory subsystem.

June 25th, 2026 - By: André Bonnardot

Picture a highway during rush hour. When a road has limited capacity, traffic backs up quickly because only so many cars can move through at once. Adding more lanes increases capacity, but it does not always guarantee a smoother commute. If cars keep flooding onto the highway, if exits are poorly placed, or if drivers have to stay on the road for long distances, congestion can still build. More lanes help, but the system still depends on how efficiently traffic moves.

Memory systems face many of the same challenges. High-bandwidth memory (HBM) enables advanced AI accelerators and high-performance systems-on-chip (SoCs) to move large data sets quickly.

When bandwidth is not enough

This is where memory hierarchy becomes important. Bandwidth determines how much data can move, while latency determines how quickly the system can respond, and increased memory bandwidth does not eliminate delays. Even when total throughput is high, each round trip to external memory adds time before the compute engine can continue, creating idle cycles that can become a performance bottleneck. When data is fetched suboptimally, HBM systems can hide inefficiencies in bandwidth headroom while still suffering from poor data reuse, unpredictable access patterns, and repeated trips outside the compute die.
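As a rough illustration of the difference between bandwidth and latency, the sketch below models a single dependent request as a fixed round-trip latency plus a streaming time set by bandwidth. The function name and every numeric value are illustrative assumptions, not measurements of any HBM device.

```python
def transfer_time_ns(size_bytes: float, latency_ns: float,
                     bandwidth_gb_per_s: float) -> float:
    """Time for one dependent request: round-trip latency plus streaming time.

    1 GB/s equals 1 byte/ns, so bytes divided by GB/s yields nanoseconds.
    """
    return latency_ns + size_bytes / bandwidth_gb_per_s


# Placeholder values chosen only to show the shape of the tradeoff.
LATENCY_NS = 100.0          # assumed round trip to external memory
SLOW_GBPS, FAST_GBPS = 500.0, 1000.0

# A 64-byte request is dominated by latency: doubling bandwidth barely helps.
small_slow = transfer_time_ns(64, LATENCY_NS, SLOW_GBPS)
small_fast = transfer_time_ns(64, LATENCY_NS, FAST_GBPS)

# A 1 MiB streaming transfer is dominated by bandwidth instead.
large_slow = transfer_time_ns(1 << 20, LATENCY_NS, SLOW_GBPS)
large_fast = transfer_time_ns(1 << 20, LATENCY_NS, FAST_GBPS)

print(f"64 B : {small_slow:.3f} ns -> {small_fast:.3f} ns")
print(f"1 MiB: {large_slow:.3f} ns -> {large_fast:.3f} ns")
```

Under these assumed numbers, the small request gains almost nothing from the extra bandwidth, which is the case an on-chip cache is meant to address.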

A practical answer is to keep more reusable data on chip. A last-level cache (LLC) does this by sitting between compute engines and external memory, as shown in Figure 1. CPUs, GPUs, NPUs, and other accelerators typically include their own local caches to reduce access latency for frequently used data. However, when data must be shared across engines or exceeds the capacity of the smaller caches, the LLC provides a common cache layer that can satisfy those requests before they reach external memory.

Fig. 1: An LLC keeps reusable data closer to compute. (Source: Arteris)

When the requested data is found in on-chip cache, the compute engine avoids the longer trip to external memory, reducing wait cycles and off-chip traffic. HBM provides the high-throughput movement required by large data sets. In these systems, an LLC improves locality by reducing how often requests have to reach external memory.
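The hit and miss behavior described above can be sketched with a toy cache model. This is a minimal illustration that assumes a fully associative cache with least-recently-used (LRU) replacement and 64-byte lines; the class name `TinyLLC` is hypothetical, and none of this is meant to describe how any particular LLC product is implemented.

```python
from collections import OrderedDict


class TinyLLC:
    """Toy fully associative cache with LRU replacement (an assumed policy)."""

    def __init__(self, capacity_lines: int, line_bytes: int = 64):
        self.capacity = capacity_lines
        self.line_bytes = line_bytes
        self.lines = OrderedDict()   # line tag -> True, ordered oldest first
        self.hits = 0
        self.misses = 0              # each miss is one trip to external memory

    def access(self, address: int) -> bool:
        """Return True on a hit, False when the request must go off chip."""
        tag = address // self.line_bytes
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.lines[tag] = True               # fill the line after the miss
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used
        return False


# A working set of four lines, read three times, fits in a four-line cache:
# only the first pass goes to external memory.
llc = TinyLLC(capacity_lines=4)
for _ in range(3):
    for addr in (0, 64, 128, 192):
        llc.access(addr)

print(llc.hits, llc.misses)  # 8 4
```

When the working set is reused and fits in the cache, only the first pass generates external traffic; an access pattern with no reuse would miss every time, regardless of cache size.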

Using HBM more efficiently

Table 1 below shows three representative tiers in the memory hierarchy. GDDR5, HBM3E, and on-chip SRAM used as an LLC each play a different role in moving data through a modern SoC. Comparing them side by side helps illustrate the tradeoffs involved.

Memory Type	Where It Sits in the System	Primary Role	Engineering Takeaway
GDDR5	External memory subsystem	Provides external memory bandwidth for graphics processors and accelerators	Delivers significantly more bandwidth than traditional DRAM, but data still travels over relatively long paths to reach the compute engines
HBM3E	In the same package as the SoC, connected through a high-bandwidth interface	Provides extremely high-throughput memory access for AI, HPC, and data-intensive workloads	Dramatically increases available bandwidth and reduces latency compared to traditional external memory, but data must still leave the compute die and return
On-chip SRAM used as LLC	On the compute die, close to CPUs, GPUs, NPUs, and other accelerators	Stores frequently accessed or latency-sensitive data	Fastest access path in the hierarchy; reduces trips to external memory and helps convert available bandwidth into usable system performance

Table 1: How memory tiers fit into the SoC data path. (Source: Arteris)

CodaCache by Arteris is a configurable LLC IP solution designed for complex SoCs. It helps designers place a high-performance cache layer between processing elements and external memory resources. CodaCache sits in the LLC path between upstream interconnect traffic and downstream memory access, using on-chip SRAM for cache storage. Its role is to help keep high-value data closer to the initiators that request it.

This approach is useful in complex SoCs where data reuse, irregular access patterns, latency sensitivity, or contention among multiple compute engines can affect performance. In these situations, keeping more accesses local can reduce pressure on the external memory subsystem.

By handling more requests on chip, this cache layer helps manage data movement efficiently and maintain overall performance. Figure 2 illustrates the difference between a cache hit and a cache miss, where a hit follows a shorter on-chip path while a miss must travel to external memory.

The Arteris analysis in the figure below shows that adding an LLC can reduce average memory latency from 83 ns to 67 ns.

Fig. 2: LLC hits avoid longer external memory access. (Source: Arteris)

The diagram above highlights the performance impact of adding an LLC to the memory subsystem. An LLC with a 25% hit rate can reduce average memory latency by roughly 19%, from 83 ns to 67 ns. This demonstrates how even a modest cache hit rate can improve system responsiveness and memory access efficiency.
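One common way to reason about figures like these is a weighted average of hit and miss latency. The sketch below applies that textbook model to the 83 ns, 67 ns, and 25% values quoted in the text. It assumes that a miss still costs the 83 ns baseline and that no lookup overhead is added, which is a simplification and not necessarily the model behind the Arteris analysis.

```python
def average_latency_ns(hit_rate: float, hit_latency_ns: float,
                       miss_latency_ns: float) -> float:
    """Weighted-average memory latency under a simple hit/miss model."""
    return hit_rate * hit_latency_ns + (1.0 - hit_rate) * miss_latency_ns


def implied_hit_latency_ns(hit_rate: float, avg_latency_ns: float,
                           miss_latency_ns: float) -> float:
    """Solve the weighted average for the hit latency."""
    return (avg_latency_ns - (1.0 - hit_rate) * miss_latency_ns) / hit_rate


BASELINE_NS = 83.0   # average latency without an LLC (from the text)
WITH_LLC_NS = 67.0   # average latency with an LLC (from the text)
HIT_RATE = 0.25      # hit rate (from the text)

# Relative reduction in average latency: 16 / 83, about 0.19.
reduction = (BASELINE_NS - WITH_LLC_NS) / BASELINE_NS

# Hit latency implied by the simplified model (an inference, not a quoted spec).
hit_ns = implied_hit_latency_ns(HIT_RATE, WITH_LLC_NS, BASELINE_NS)

print(f"reduction: {reduction:.1%}, implied hit latency: {hit_ns:.1f} ns")
```

Under these assumptions the model implies a hit latency of 19 ns. That value is an artifact of the simplified model and should not be read as a product specification.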

Reducing power beyond latency

The shorter hit path also matters for power. The advantage comes from reducing the number of accesses that reach external memory. Every HBM transaction requires activity across the memory subsystem, and sustained HBM use can consume significant power.

If the requested data is already present in CodaCache, the access can be satisfied on chip.
Cache hits avoid unnecessary HBM accesses.
Fewer external memory transactions reduce PHY and memory subsystem activity.
Lower demand on the memory path can improve total system efficiency.

In HBM-based AI SoCs, a high CodaCache hit rate can reduce HBM traffic and limit the cycles compute engines spend waiting for data to return from external memory. Even with wider data paths, a read request still has to leave the compute die, pass through the interface, reach the stack, retrieve the data, and send it back.
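The relationship between hit rate, external traffic, and memory energy can be sketched as follows. The per-access energy values are placeholder relative units chosen only to show the shape of the calculation; they are not measurements of HBM, SRAM, or any PHY, and the function names are hypothetical.

```python
def external_accesses(total_requests: int, hit_rate: float) -> float:
    """Requests that still reach external memory after the LLC."""
    return total_requests * (1.0 - hit_rate)


def memory_energy(total_requests: int, hit_rate: float,
                  llc_energy_per_access: float,
                  hbm_energy_per_access: float) -> float:
    """Relative energy: every request probes the LLC, misses also go to HBM."""
    misses = external_accesses(total_requests, hit_rate)
    return (total_requests * llc_energy_per_access
            + misses * hbm_energy_per_access)


REQUESTS = 1_000_000
LLC_UNITS = 1.0    # placeholder relative cost of an on-chip access
HBM_UNITS = 10.0   # placeholder relative cost of an external access

off_chip = external_accesses(REQUESTS, hit_rate=0.25)

# Without an LLC, every request pays the external cost.
no_llc = REQUESTS * HBM_UNITS
with_llc = memory_energy(REQUESTS, 0.25, LLC_UNITS, HBM_UNITS)

print(off_chip, no_llc, with_llc)
```

The model also shows the limit of the approach: if the on-chip access were not substantially cheaper than the external one, or if the hit rate were very low, the extra lookup would cost more than it saves.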

CodaCache last-level cache IP from Arteris supports SoC performance and power efficiency by reducing effective memory latency, memory bandwidth demand, and memory subsystem activity. The standalone LLC solution is designed to work in conjunction with FlexGen smart NoC IP or Ncore cache-coherent interconnect IP.

The HBM era is not just about building bigger memory paths. It is about making sure compute engines are not left waiting for data. The systems that perform best will not simply be the ones with the most bandwidth. They will use that bandwidth wisely.

André Bonnardot

André Bonnardot is Senior Manager of Product Management at Arteris, where he leads Cache Controller solutions and next-generation Network-on-Chip products. Formerly CEO of a semiconductor startup specializing in GaN epitaxy, he brings deep expertise in SoCs and microelectronics. His career includes design and leadership roles at Alcatel, Siemens, Infineon, and Intel. Bonnardot holds a master’s degree in Electronics from ENSERG Engineering School in Grenoble and an Executive MBA from KEDGE Business School.
