Home

TECHNICAL PAPERS

Making high-capacity data caches more efficient

Cache-management scheme that they say improves the data rate of in-package DRAM

October 24th, 2017 - By: Technical Paper Link

Source: Researchers from MIT, Intel, and ETH Zurich
Xiangyao Yu (MIT), Christopher J. Hughes (Intel), Nadathur Satish (Intel) Onur Mutlu (ETH Zurich), Srinivas Devadas (MIT)

Technical Paper link

MIT News article

As the transistor counts in processors have gone up, the relatively slow connection between the processor and main memory has become the chief impediment to improving computing performance but now, researchers from MIT, Intel, and ETH Zurich have created a cache-management scheme that they say improves the data rate of in-package DRAM caches by 33 to 50 percent.

4 comments

Kev says:

November 21, 2017 at 11:40 pm

Sticking a bigger/better cache on a CPU doesn’t fix the fundamental problems with the architecture. Really you want to do processor-in-memory (PiM). Most of the NVMe cards for the storage have ARM (or similar) CPUs managing the storage, and it’s a lot quicker to ask them to do the work than shoveling the data back and forth to a central X86 CPU.

This gets worse as you start die-stacking memory for density, but don’t get any more I/O pins (as with pins on a PCIe card), so you need to move the memory controller and compute into the die stacks.

Surfaces and edges are a dimension down from volumes and areas.

tom andreson says:

August 8, 2018 at 10:47 am

I have checked Apple iPhone Support and read something about the way of making the high capacity data caches. It is very important and the efficient as well for the user. We must know about this in detail.

Jacob Monberg says:

April 8, 2019 at 1:12 pm

Cache management schemes become more important as core counts go up, and when chiplet CPUs create more complex overall caches.

Chiplet CPUs could really benefit from L4 cache. For example if we have 64 core (128 thread) CPU that 4 chiplets, each having two 4 core clusters that both have their own L3 eviction caches, things become complex.

It would mean 64 x L1 data, 64 x L1 instruction, 64 x L2, 8 or 16 L3 eviction caches. We could add 512mb L4 cache and maybe create hybrid L3 that is not normal cache or eviction cache, but hybrid. L4 could act as “separate entity” that could have some logic to predict which data to preload.

I would like to see more research on caching. In the next decade things are going to change pretty drastically. APUs are becoming more popular and powerful, which complicates and/or adds possibilities on L4 caches.

Stacked chips could change things and allow much bigger L3 caches, which could benefit from very different schemes. And also on GPU front, machine learning/AI and GPGPU might benefit from changing caches.

Caches have been my hobby for a long time. I have used countless hours on wondering possible improvements. Larger shared caches with higher latency have almost certainly possible optimizations available. Just don’t have quite enough “transistor level” knowledge to be sure if those are feasible in real life.

Anyway, this article made me happy.

Leonard Anderson says:

December 18, 2019 at 2:58 am

I know AMD is changing the cache structure for Epyc Milan, But , they are also working with Micron and samsung to make HBM2e memory stacks on nvme pcie drives that are direct parallel to the cpu 8 or 16 pcie lanes bypassing the pcie buss .so on the server rack these memory drives would be set between each NVME m.2 storage port, so the NVME pcie m.2 memory drive port could be 3 or 4 stacked drive connectors with standoffs between each module allowing for heat dissipation. this would do away with the much slower Dram slots, this new approach is expected to give a 10x speed performance to push massive bandwidth at blazing speeds. as forrest Norrod recently stated that nvme is the future for new hardware applications, i see no reason that this is not absolutely doable.

Knowledge Centers
Entities, people and technologies explored

Shift Left Is The Tip Of The Iceberg

A transformative change is underway for semiconductor design and EDA. New languages, models, and abstractions will need to be created.

by Brian Bailey

Partitioning In The Chiplet Era

Understanding how chiplets interact under different workloads is critical to ensuring signal integrity and optimal performance in heterogeneous designs.

by Ann Mutschler

NAND Flash Targets 1,000 Layers

New techniques go beyond improved deposition and etching, but challenges stack up, too.

by Bryon Moyer

3.5D: The Great Compromise

Pros and cons of a middle-ground chiplet assembly that combines 2.5D and 3D-IC.

by Ed Sperling

AI’s Role In Chip Design Widens, Drawing In New Startups

Focus is on letting engineers do much more with the same or fewer resources — and less drudgery.

by Karen Heyman

What Comes After HBM For Chiplets

The standard for high-bandwidth memory limits design freedom at many levels, but that is required for interoperability. What freedoms can be taken from other functions to make chiplets possible?

by Brian Bailey

Memory Fundamentals For Engineers

eBook: Nearly everything you need to know about memory, including detailed explanations of the different types of memory; how and where these are used today; what's changing, which memories are successful and which ones might be in the future; and the limitations of each memory type.

by The SE Staff

Why Small Fab And Assembly Houses Are Thriving

Megafabs churning out the most advanced processors are not the only game in town.

by Bryon Moyer