Solving Memory Subsystem Bottlenecks In 3D Stacks

But questions emerge about just how to get there and whether TSVs will be a long-term solution or a research project.


In today’s do-or-die market environment, many SoC makers strive to differentiate their products by processing performance. Closely coupled are power concerns, which have led to the dominance of the multi-core approach, while economic considerations have resulted in the dominance of the Unified Memory Architecture, in which all the processors share access to external DRAM. Stacking techniques promise to solve the bottleneck issues that plague today’s memory subsystems.

Memory access performance is the key design constraint system architects grapple with as it is the service rate of memory access—specifically, the external bulk DRAM target—that determines the rate of processing in the system, explained Steve Hamilton, applications architect at Sonics. “A typical design today will re-use most of the processors from a previous design. Perhaps one or two of them will be slightly modified. And maybe one or two newly developed processors will be included in the design. But the bulk of the design effort on the new chip will be directed at re-targeting the memory subsystem to service the performance goals of the collection of processors.”

With performance, and specifically bandwidth, as the primary consideration, designers have already moved to faster DRAM speeds and to multiple DRAMs operating in parallel as separate channels.

“But there is a limit to multi-channel because of the number of pins used, and the additional cost of each DRAM,” Hamilton noted. “So to go further, you need a denser die-to-die interconnection capability.”

Many believe that denser die-to-die interconnect will be through-silicon via (TSV) technology, combined with Wide I/O, which is four channels of what looks very similar to LPDDR2 DRAM in parallel. Wide I/O therefore uses a lot of signals to connect the DRAM and the controller, but it does not actually require a TSV. Hamilton said there are also the more mature micro-bumps that some companies are leveraging to implement Wide I/O. Still, he said, TSVs have the potential to provide very effective high-density die-to-die connectivity, making them a natural choice for something like Wide I/O.
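To make the pin-count trade-off concrete, peak bandwidth can be estimated as channels × bus width × transfer rate. The sketch below is illustrative only: the function names and the configurations (32-bit LPDDR2 channels at 1066 MT/s, and a Wide I/O arrangement of four 128-bit channels at 200 MT/s) are assumptions chosen for the example, not figures quoted by Hamilton.

```python
def peak_bandwidth_gbytes(channels: int, bus_width_bits: int,
                          transfers_per_sec: float) -> float:
    """Theoretical peak bandwidth in GB/s (10**9 bytes per second)."""
    return channels * bus_width_bits / 8 * transfers_per_sec / 1e9


def data_pins(channels: int, bus_width_bits: int) -> int:
    """Data signals only; command, address, clock and power pins are ignored."""
    return channels * bus_width_bits


# Assumed, illustrative configurations (not taken from the article).
lpddr2_dual = peak_bandwidth_gbytes(channels=2, bus_width_bits=32,
                                    transfers_per_sec=1066e6)
wide_io = peak_bandwidth_gbytes(channels=4, bus_width_bits=128,
                                transfers_per_sec=200e6)

print(f"dual-channel LPDDR2: {lpddr2_dual:.2f} GB/s on {data_pins(2, 32)} data pins")
print(f"Wide I/O:            {wide_io:.2f} GB/s on {data_pins(4, 128)} data pins")
```

Under these assumed figures the wide interface reaches a higher peak bandwidth at a much lower per-pin rate, but it needs eight times as many data signals as two 32-bit channels, which is why a dense die-to-die interconnect matters.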

“Right now the architect’s concern with TSVs (and Wide I/O as well) is cost. It is not yet proven that TSVs can be done with high-enough yield and low-enough cost to become mainstream. It requires the wafers to be thinned down so the vias can go through. This makes the wafers fragile. Sometimes when you are adding a die to the stack it breaks and you have to start over. That yield loss drives up cost. So they are still tuning the process to find the right equipment and the right steps and the right vendor-to-vendor relationships to make it work repeatedly,” he pointed out.
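The way stacking losses compound into cost can be sketched with a simple model. It assumes known-good dies, an independent success probability for each die-attach step, and that a failed step scraps the whole stack. The function names, the 95% step yield, and the per-die cost are hypothetical values for illustration, not data from the article.

```python
def stack_yield(num_dies: int, bond_step_yield: float) -> float:
    """Probability that every die-attach step in the stack succeeds."""
    if num_dies < 1:
        raise ValueError("a stack needs at least one die")
    if not 0.0 < bond_step_yield <= 1.0:
        raise ValueError("bond_step_yield must be in (0, 1]")
    # A stack of N dies needs N - 1 attach steps.
    return bond_step_yield ** (num_dies - 1)


def cost_per_good_stack(die_cost: float, num_dies: int,
                        bond_step_yield: float) -> float:
    """Die cost spent per good stack when a failed stack is scrapped."""
    return die_cost * num_dies / stack_yield(num_dies, bond_step_yield)


# Hypothetical numbers: each die costs 2.0 units, each attach step yields 95%.
two_high = cost_per_good_stack(2.0, 2, 0.95)
five_high = cost_per_good_stack(2.0, 5, 0.95)

print(f"2-die stack: yield {stack_yield(2, 0.95):.3f}, cost {two_high:.2f}")
print(f"5-die stack: yield {stack_yield(5, 0.95):.3f}, cost {five_high:.2f}")
```

In this model each extra die multiplies the yield loss, so taller stacks are penalized more heavily than the per-step yield alone suggests.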

Stacking also gives the architect another degree of freedom in how the memory subsystem is built, and in which mix of memory types is going to work for a specific application.

“It gives a greater degree of flexibility where they can ask, ‘What is our application, price point, do we want to go for stacking, do we want to use a wide or narrow memory bus,’” said Neil Hand, group director of marketing for the SoC realization group at Cadence. “From an architecture exploration perspective it gives them a fairly significant new degree of freedom that they can play with.”

Source: Cadence

Challenges as well as benefits

Along with the benefits of new techniques like stacking come challenges, particularly since the technology is still in the early development phases for many companies. Joe Rash, senior director of business development at Open-Silicon, said there are projects under way today where it is more cost-effective to take a known good die from a company like Micron and stack it onto a logic chip using standard techniques such as wire bonding and a redistribution layer rather than a TSV. The alternative is to use a 6T SRAM, embedded SRAM, or embedded DRAM.

He explained that embedded SRAM makes the single-chip solution a very large die, which means it is not as cost effective from a yield standpoint. “You could have a very large single die with a big SRAM that accomplishes that task. You could have a smaller single die with embedded DRAM. But eDRAM is only offered from a few limited suppliers, and even where it is offered there is a minimum volume run rate and certain other constraints to using eDRAM. Sometimes we are seeing that the most cost-effective solution is to use the smallest logic die and stack a DRAM chip.”
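The yield argument against a very large die can be illustrated with the Poisson yield model, a common first-order simplification in which yield falls exponentially with die area. The model choice, the defect density, and the die areas below are assumptions for illustration and do not come from Rash or Open-Silicon. The cost of the stacked DRAM die is not modeled.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under the Poisson yield model."""
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_die(area_cm2: float, defects_per_cm2: float) -> float:
    """Wafer area consumed per good die, used here as a rough cost proxy."""
    return area_cm2 / poisson_yield(area_cm2, defects_per_cm2)


DEFECT_DENSITY = 0.5  # defects per cm^2 (hypothetical)

large_die = silicon_per_good_die(1.0, DEFECT_DENSITY)  # logic plus big SRAM
small_die = silicon_per_good_die(0.5, DEFECT_DENSITY)  # logic only

print(f"large die: {large_die:.3f} cm^2 of silicon per good die")
print(f"small die: {small_die:.3f} cm^2 of silicon per good die")
```

With these assumed values, halving the die area cuts the silicon consumed per good die by more than half, because the smaller die is also more likely to be defect-free.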

In addition, LPDDR2, which is the standard for DRAM today, is basically running out of bandwidth compared to what the devices need.

“If we want devices that are capable of 4K video and 3D gaming, there just isn’t enough bandwidth in an LPDDR2 device, so something has to change,” said Marc Greenberg, director of marketing for Cadence’s SoC realization group. “The industry has had a few different starts on this. The first start was to look at a serial connection to DRAM, which may still happen in some marketplaces and not in others. It’s still an open discussion. Wide I/O is certainly a discussion. The other discussion is simply extending the bandwidth of the traditional parallel interface. Right now, all of the options are on the table.”

Other issues that come up often deal with latency and throughput/bandwidth, noted Navraj Nandra, senior director of analog/MSIP at Synopsys. “We have some customers that are showing off their products where form factor—for a camera or some kind of mobile device—is important. It needs to be very small, but the memory requirements are significant because you’re moving lots of pixels around if it is a high-performance digital camera. It all comes down to how much memory bandwidth can you have in your stacked system in a small-enough form factor but still maintain some low latency with some throughput, because throughput and latency you can trade one against the other.”

Interestingly, Nandra said some stopgap technologies are being used on the way to Wide I/O and TSV. One of these is LPDDR3, a low-power double data rate memory that is twice the speed of LPDDR2, with the goal of achieving the needed bandwidth and speed without having to move to the expensive package technology that Wide I/O would require.

In terms of conventional memories, high-speed memory interface DDR4 looks to be useful when it’s in the 3.5 to 4 Gbps region, he continued. “Our simulations that we’ve done at the system level, which includes a lot of the package artifacts at those high speeds, show us that you’ve got to add quite a bit of redundancy in the memory controller in order to handle all the skews that you’re getting at the high speeds in the package. When you add extra redundancy you reduce the available bandwidth that you have to actually use the information that you are interested in, so you’ve got to increase the speed.”
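The trade-off Nandra describes is simple arithmetic: if a fraction of the transferred bits is redundancy, the useful rate shrinks by that fraction, and the raw rate must rise to compensate. The sketch below assumes a 20% overhead purely for illustration; the function names and the overhead figure are hypothetical, not Synopsys data.

```python
def effective_rate(raw_gbps: float, overhead_fraction: float) -> float:
    """Useful per-pin data rate once redundancy overhead is subtracted."""
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must be in [0, 1)")
    return raw_gbps * (1.0 - overhead_fraction)


def raw_rate_needed(target_gbps: float, overhead_fraction: float) -> float:
    """Raw per-pin rate required to deliver a target useful rate."""
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must be in [0, 1)")
    return target_gbps / (1.0 - overhead_fraction)


OVERHEAD = 0.20  # assumed share of bits spent on redundancy (hypothetical)

useful = effective_rate(4.0, OVERHEAD)    # what a 4 Gbps pin actually delivers
required = raw_rate_needed(4.0, OVERHEAD)  # raw rate needed for 4 Gbps useful

print(f"4.0 Gbps raw    -> {useful:.2f} Gbps useful")
print(f"4.0 Gbps useful -> {required:.2f} Gbps raw")
```

Because the required raw rate grows as 1 / (1 - overhead), each additional increment of redundancy costs more speed than the previous one.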

This isn’t just talk. There is activity in JEDEC to define some of these things, especially high-speed memory interfaces in both LPDDR3 and LPDDR4, which may somewhat delay the introduction of more sophisticated technologies like Wide I/O.

Market drivers

Strong economic incentives are driving Wide I/O and TSV technologies.

“The volume driver for DRAMs today is server blades. The server farm operators want to pack a bunch of servers in a small space. They don’t want to spend much on electricity or air conditioning, and each server needs a lot of DRAM. Wide I/O has the lower voltages and clock rates to address the power-density issues, and TSVs should allow multiple DRAM die to be stacked into a very small form factor,” Sonics’ Hamilton said.

Mobile devices are the second biggest volume driver for DRAMs. While they may use fewer DRAMs per device, there are a lot more devices. “Mobile devices have the same requirements of their main memory: packaging density and low power. Micro-bumps can support a single DRAM die in mobile devices. But increasingly, handheld devices have become small computers and need more DRAM. TSVs allow a stack to include multiple DRAMs, flash dies, and the SoC. So there are big markets driving the manufacturers to work out the kinks in TSVs. They are not there yet. But the progress is looking quite promising,” he concluded.