Experts at the table, part 1: DDR5 spec being defined; new SRAM under development.
Semiconductor Engineering sat down to discuss future memory with Frank Ferro, senior director of product management for memory and interface IP at Rambus; Marc Greenberg, director of product marketing at Synopsys; and Lisa Minwell, eSilicon's senior director of IP marketing. What follows are excerpts of that conversation. To view part two, click here.
SE: We’re seeing a number of new entrants in the memory market. What are the problems they’re trying to address, and is this good for chip design?
Greenberg: The memory market is fracturing into High-Bandwidth Memory (HBM), the Hybrid Memory Cube (HMC), and even flash on the memory bus. DRAM has been around for many years. The others will be less predictable because they're new.
Minwell: The challenge is bandwidth. The existing memory interface technologies don't give us the bandwidth that we need. Along with that, power is forcing us into stacking, which is being driven by high-bandwidth memory. But there's also a need to have enough embedded SRAM on chip to keep latency low. We are assisting other IP providers with some new SRAM technologies, as well, which would bring more density and lower power on the embedded front. So there is stacking, there is external memory and the communication to it, and there are different types of memories being brought on in the embedded space. And further out, there are larger SRAMs for L3 caches, where possible.
Ferro: What it boils down to is better efficiency from a latency standpoint in the memory hierarchy. You have more bandwidth needs, but how do you get that bandwidth more efficiently? Everyone has been using DDR, and maybe getting HBM as another layer in the hierarchy. Right now there’s a big gap with flash. There is a lot of activity trying to fill the gap between DDR and flash with RRAM or XPoint. I don’t know if anyone has been particularly successful yet. But people also are looking at different server architectures to fill that gap. That gets more into the system challenges. There are all these multiple processors, and the question is how do we utilize memory more efficiently. That’s the big bottleneck right now.
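The gap Ferro describes shows up clearly in rough access latencies. The numbers below are order-of-magnitude illustrations, not figures from the panel:

```python
# Rough order-of-magnitude access latencies (illustrative assumptions),
# showing the gap between DRAM and flash that RRAM/XPoint-class
# storage-class memories aim to fill.
latencies_ns = {
    "SRAM (on-chip cache)": 1,
    "DRAM (DDR)": 100,
    "storage-class memory (RRAM/XPoint class)": 1_000,
    "NAND flash": 100_000,
}
for tier, ns in latencies_ns.items():
    print(f"{tier}: ~{ns:,} ns")
```

The roughly three-orders-of-magnitude jump from DRAM to flash is the hole in the hierarchy that these new memory types target.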
SE: Let’s start inside the chip and work out. SRAM has been the workhorse for years. Will that continue?
Minwell: Yes, it will continue. We're seeing very good qualities in finFET technology. Even as we look at 10nm and below, it's scaling nicely. There are also some new bitcells and architectures. A typical SRAM bitcell uses six transistors, and there are alternative architectures that use fewer, which gives you more density and lower power consumption. It's about trying to fit more SRAM into the same real estate.
Ferro: You’re going to need some fast local memory. At the extreme level, for an MCU you have a very small amount of ROM and RAM that you have to fit everything into. The ability to expand that and not go off-chip will require SRAM. As you get bigger CPUs, that’s more about caches than SRAM.
Greenberg: The SRAM and local caches will always be there. The other part of this picture is how much, how big, and where do I go? You need to do architectural analysis to determine how much memory should I have at each level in the memory hierarchy, and also where it should be connected in the system.
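The architectural analysis Greenberg describes often starts from a first-order average memory access time (AMAT) model. A minimal sketch, where every hit rate and latency is an illustrative assumption rather than a figure from the panel:

```python
def amat(levels, backing_latency_ns):
    """Average memory access time for a serial-lookup hierarchy.
    levels: list of (hit_rate, latency_ns) pairs, from L1 outward."""
    total, reach = 0.0, 1.0
    for hit_rate, latency in levels:
        total += reach * latency   # every access reaching this level pays its latency
        reach *= 1.0 - hit_rate    # fraction that misses on to the next level
    return total + reach * backing_latency_ns

# Illustrative numbers only: (hit rate, latency in ns) for L1/L2/L3,
# with DRAM assumed at ~80 ns behind L3.
hierarchy = [(0.95, 1.0), (0.80, 4.0), (0.60, 15.0)]
print(round(amat(hierarchy, 80.0), 2))
```

Re-running the model with different capacities (and therefore hit rates) at each level is one simple way to decide how much memory belongs where.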
SE: How many layers of cache do we need? Most companies have stopped at L3.
Greenberg: Sometimes people will talk about HBM as an L4 cache. I haven't seen an L4 cache on chip.
Ferro: That’s a big L4.
SE: Is there a way to use level 1 through level 3 cache more effectively?
Minwell: The standard IP that is available off the shelf builds SRAM to about a 1 megabit size. We’re seeing customers that want an L3 cache that is quite large—up to 16 megabits. When you’re building that on chip, you have to provide enough routing features to be able to efficiently route over this big monster. Once you do that, are you really saving enough area?
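Minwell's point about composing a large L3 from roughly 1-megabit compiler macros is easy to quantify. A back-of-the-envelope sketch, where the macro area and the routing/periphery overhead are purely assumed numbers:

```python
# Sketch: area cost of tiling a large L3 from standard 1 Mb SRAM macros.
# MACRO_AREA_MM2 and OVERHEAD are illustrative assumptions, not process data.
MACRO_BITS = 1 * 1024 * 1024    # 1 Mb compiler macro
MACRO_AREA_MM2 = 0.05           # assumed silicon area of one macro
OVERHEAD = 0.20                 # assumed per-macro overhead: periphery, routing channels

target_bits = 16 * 1024 * 1024  # 16 Mb L3 target
macros = target_bits // MACRO_BITS
total_area = macros * MACRO_AREA_MM2 * (1 + OVERHEAD)
print(f"{macros} macros, {total_area:.2f} mm^2 including overhead")
```

The overhead term is the crux of her question: once the wiring needed to route over the tiled block is added, the area saving versus going off-chip may be smaller than it first appears.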
SE: Is there anything that ultimately will replace SRAM?
Ferro: Embedded DRAM has been kicking around for a long time. There are technical advantages to embedded DRAM, but the economics don't seem to work. The die is too big and the cost is too high, and if I sell a chip that's bigger than a competitor's chip, I'm going to lose. But if it's vertically integrated, you don't necessarily care about die size, so maybe you can take advantage of the power and performance savings of embedded DRAM. We don't see it, though.
Minwell: We’re seeing a couple of different ways people are managing this with 2.5D. There are some who want to put their toe in the water with HBM. They want the same PHY that supports interacting with the HBM stack, but also does chip-to-chip connectivity. We’re looking at some interesting architectures now that provide flexibility and almost application-specific packaging. Maybe that would lean toward partitioning the cache. The problem is that with L1 cache, there is still too much latency if it’s off-chip.
Greenberg: It’s just adding more bandwidth into the system somewhere else. It’s not happening instead of cache.
SE: So you’re adding granularity in terms of power and performance. You’re either moving it into cache or through another I/O channel and prioritizing what is more important, right?
Greenberg: You want to move the data the shortest distance that you can. The more local the data, the better it’s going to be. The challenge is predicting what that will be. Sometimes if you’re processing random data, you don’t know. You can’t keep it all close to the CPU. Figuring out what goes where and then allocating it appropriately is important.
Minwell: From a GPU perspective, it’s how to best partition the memory hierarchy and be able to feed those GPUs.
SE: Let’s move outside the chip. Everyone knows the price curve for DRAM. Will something replace it?
Ferro: DDR has a lot of life in it. Right now DDR5 is a work in progress.
SE: Just to be clear, that’s DDR5, not LP-DDR5, right?
Ferro: The spec is in progress in JEDEC right now. In general, we've demonstrated traditional DRAM interfaces running at 6.4 Gbps per pin with today's standards, and JEDEC is looking at those kinds of proposals. You probably can squeeze 10 Gbps or 12 Gbps out of a single-ended signaling interface with some care around how the DIMMs are made. So we have a long way to go.
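As a side note on the arithmetic: per-pin signaling rate times bus width gives the peak interface bandwidth. A minimal sketch, assuming the standard x64 DIMM data bus and reading the rates above as per-pin Gb/s:

```python
def peak_bandwidth_gb_s(per_pin_gbps, bus_width_bits):
    """Peak interface bandwidth in GB/s: per-pin rate (Gb/s) x width / 8."""
    return per_pin_gbps * bus_width_bits / 8.0

# Rates from the discussion, applied to a standard x64 DIMM data bus:
for rate in (6.4, 10.0, 12.0):
    print(f"{rate} Gb/s per pin on x64 -> {peak_bandwidth_gb_s(rate, 64)} GB/s")
```

This is the raw pin bandwidth; real systems deliver less once refresh, turnaround, and protocol overhead are taken into account.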
Greenberg: You can look at a graphics chip as an example of that. They’re already up in that speed range in commercial products.
Minwell: And in networking and communications, you wouldn't think of this, but LP-DDR is a resource for us because it provides wider buses (x64 and x128), so that's another way to fill that gap with a DDR-class solution before going to high-bandwidth memory.
SE: If we move to DDR5, does that require moving to 16/14nm?
Greenberg: Memory manufacturers will go wherever it’s most efficient. They make up their own nodes.
Minwell: They usually don’t communicate that to us.
Greenberg: Density is part of the problem. As you try to scale, you run into silicon effects in DRAM that create system-level problems. There are issues with being able to hold the data in the cells long enough, which affects write recovery time. Those affect all of the memory manufacturers, so there are things that will need to be done in the next generation of DRAM to support larger devices.
SE: When can we expect to see the DDR5 specification?
Ferro: By 2020 you will probably see DDR5 memory.
Minwell: Yes, somewhere in that time frame. And if you look at the Gen 2 of high-bandwidth memory, it will be in mass production by the end of this year. That’s finally getting somewhere.