Will Wide I/O Reduce Cache?

Stacking promises to improve the performance and bandwidth of memory subsystems, but it’s not clear what impact it will have on cache.


By Ann Steffora Mutschler
In an ideal world, all new SoC technologies would make the lives of design engineers easier. While this may be true of some techniques, it is not the case with one advanced memory interface technology on the horizon, Wide I/O.

There are claims that Wide I/O could reduce cache, but so far this is not widely understood. In fact, exactly how Wide I/O will be used, what the benefits will be, and when it will become a mainstream technology are hazy at best.

Marc Greenberg, director of marketing for Cadence’s SoC realization group, believes Wide I/O will reduce cache, but cautions that there is no single answer because every system will do it a little differently. “In some cases you might say some of the L2 or L3 cache could move into a Wide I/O device. That’s certainly a possibility. Or maybe not all of it, but perhaps some of it—maybe none of the L2 but all of the L3. It’s also possible that the Wide I/O becomes sort of an L4 cache to some other, even more distant memory; it becomes a new layer in the memory hierarchy,” he said.

Greenberg believes all of these options will likely be seen in different chips.

Cadence's Greenberg: No simple answers.

“The real thing about cache is that you want to keep small, fast memory close by, and then slower, larger memories farther away. Unless you have a super-fast memory off-chip, Wide I/O will not remove cache from on-chip. The fastest off-chip memory today is still much, much slower than on-chip SRAM, so you’ll always have cache on-chip as far as possible,” said Prasad Saggurti, product marketing manager and senior staff for embedded memory in Synopsys’ test and repair group. “As you go to larger sizes—even L2 cache tends to be on-chip. You could have a situation wherein, instead of DRAM alone serving as the main memory, you have an intermediate tier that replaces or complements DRAM, with a Wide I/O connection out to regular memory behind it.”

Synopsys' Saggurti: Cache will always be on-chip.

In general, Wide I/O is seen as a way to address I/O speed: instead of going to DRAM through a high-speed serial interface, its wide parallel interface can be used to reduce latency.
Early adopters of Wide I/O have been in the mobile space, in cell phones and tablets. In that case, Wide I/O replaces the main memory, observed Cadence’s Greenberg.

“There have been people hinting at not being able to stack enough DRAM on top of, perhaps, a tablet processor, so you might want to have another tier of RAM further out in memory. In that case, the Wide I/O becomes either an L3 or L4 to some even more distant memory.”
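Greenberg’s tiered-hierarchy point can be sketched with a back-of-the-envelope average memory access time (AMAT) calculation. All latencies and miss rates below are assumed illustrative numbers, not figures from anyone quoted here.

```python
# Multi-level AMAT: each level's miss falls through to the next tier.
# AMAT = t1 + m1*(t2 + m2*(t3 + ...)) for (hit_time, miss_rate) pairs.

def hierarchy_amat(levels):
    """levels: list of (hit_time_ns, miss_rate) tuples, with the
    outermost backing memory last at miss_rate 0.0 (it always 'hits')."""
    time = 0.0
    reach_prob = 1.0  # fraction of accesses that reach this level
    for hit_time_ns, miss_rate in levels:
        time += reach_prob * hit_time_ns
        reach_prob *= miss_rate
    return time

# Assumed numbers: on-chip L1 (1 ns, 10% miss), L2 (8 ns, 20% miss),
# distant DRAM (120 ns). A stacked Wide I/O DRAM tier (25 ns, 30% miss)
# inserted as an "L3"/"L4" in front of the distant memory.
without_tier = hierarchy_amat([(1.0, 0.10), (8.0, 0.20), (120.0, 0.0)])
with_tier = hierarchy_amat([(1.0, 0.10), (8.0, 0.20),
                            (25.0, 0.30), (120.0, 0.0)])

print(f"without stacked tier: {without_tier:.2f} ns")  # 4.20 ns
print(f"with stacked tier:    {with_tier:.2f} ns")     # 3.02 ns
```

Under these assumed numbers, the stacked Wide I/O tier helps exactly as a new hierarchy layer would: it absorbs most of the traffic that would otherwise pay the full latency of the more distant memory.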

Steve Hamilton, applications architect at Sonics, stressed that stacking theoretically could enable much larger L2 and L3 caches, even though that is not likely any time soon. He said some people are looking at using through-silicon vias (TSVs) to place a denser memory close to the processors. A bigger cache can be placed in the same space, but there are a number of reasons, both physical and economic, why that does not yet make sense.

“This would require a custom memory chip to perfectly match the floorplan of the SoC,” said Hamilton. “Economics don’t support that. Managerially, you would then need to coordinate two custom chip developments to intercept at some point. That adds risk. Then there are restrictions on where the TSV columns could be placed on a die that we don’t fully understand yet. The dies expand and contract in operation due to heating. That may stress the connections or crack the die if it’s not engineered correctly. We don’t have enough experience yet to know those rules.”

Sonics' Hamilton: Unlikely to reduce cache.

It makes a lot more sense to start with a single common interface point, as this allows for mechanical expansion. By defining it as a physical standard, just as other interfaces have done, it allows independent manufacturers such as the DRAM and SoC vendors to do their own thing. The common standard also amortizes the development costs over a larger set of applications. So something like Wide I/O is a perfect starting point for TSV technology, he said.

But when it comes specifically to Wide I/O, Hamilton doesn’t believe the technology will reduce cache at all. “Wide I/O provides a wider (four-channel) interface to DRAM, but operates at a lower frequency than DDR3. Wide I/O also has some painful restrictions on page access rates. So the total bandwidth is only slightly improved. Worse, this is just I/O bandwidth. The actual access time to DRAM (latency) is not changing at all. Caches are used to minimize read latency and increase memory bandwidth for an access stream with locality. So as long as there is external DRAM of any type, with its high latency, there will be caches.”
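Hamilton’s latency argument can be illustrated with the classic single-level AMAT formula. The numbers below are assumed for illustration only; the point is that a wider interface changes the bandwidth term, not the miss penalty.

```python
# Single-level AMAT: hit_time + miss_rate * miss_penalty. A wider
# DRAM interface raises bandwidth but leaves the miss penalty
# (DRAM access latency) unchanged, so the cache's benefit persists.

def amat(hit_time_ns, miss_rate, miss_penalty_ns):
    return hit_time_ns + miss_rate * miss_penalty_ns

DRAM_LATENCY_NS = 60.0  # assumed external DRAM latency, any interface
CACHE_HIT_NS = 2.0      # assumed on-chip SRAM hit time
MISS_RATE = 0.05        # assumed miss rate for a workload with locality

print(f"no cache:   {DRAM_LATENCY_NS:.1f} ns per access")
print(f"with cache: {amat(CACHE_HIT_NS, MISS_RATE, DRAM_LATENCY_NS):.1f} ns per access")
# The gap (60.0 vs 5.0 ns here) closes only if latency drops, not if
# bandwidth rises -- which is why external DRAM implies on-chip caches.
```

In this sketch, removing the cache would multiply the average access time twelvefold regardless of how wide the DRAM interface is.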

This is the most frustrating thing for SoC developers, who need low latency, high bandwidth, and low power from DRAM, not high density. Meanwhile, DRAM vendors keep marching down the path they understand: more density with each generation. They have moved only reluctantly to newer specs that increase I/O bandwidth, and those specs are increasingly hard to use. They rely, for example, on access in larger chunks than processors may need, while doing nothing to address latency.

“As long as the server guys—who need density—are the majority of demand there is not sufficient motivation for the DRAM vendors to optimize for what the mobile folks need,” Hamilton believes.

Interestingly, eDRAM does have the potential to reduce caching. “Processors generally use private L1 caches, and share L2 caches across small clusters of two to four cores. eDRAM radically improves latency (and bandwidth, and power). So it becomes possible to consider eliminating or reducing L2 caching when eDRAM is used,” he noted.