New materials, new architectures and higher density have limited what can be done with DRAM, but it’s still king (Experts At The Table Part 3)
Semiconductor Engineering sat down to talk about DRAM’s future with Frank Ferro, senior director of product management at Rambus; Marc Greenberg, group director for product marketing at Cadence; Graham Allan, senior product marketing manager for DDR PHYs at Synopsys; and Tien Shiah, senior manager for memory marketing at Samsung Electronics. What follows are excerpts of that conversation. Part one of this discussion is here. Part two is here.
(L-R): Frank Ferro, Graham Allan, Tien Shiah, Marc Greenberg. Photo credit: Susan Rambo/Semiconductor Engineering
SE: Is density still a big challenge for DRAM? And is it about shrinking, or is it more about layering and packaging, which is what’s happening with HBM?
Allan: The need for increases in capacity outpaces the DRAM die’s ability to keep up. The DRAM die itself probably goes through a three- or four-year cycle to go, for example, from 16 gigabits to 32 gigabits. That’s a giant step.
Shiah: We’re looking at additional stacks and layers of HBM, in addition to bit density per die. We’re looking at both options. The challenge we have is in terms of the overall thickness of additional layers. We’re at a point where we’re going to be at the height of a standard wafer. It becomes problematic if you need a stack that’s thicker than a standard wafer.
Allan: And 3D stacking for DDR4, and eventually DDR5, as well. For increased capacity beyond that, you go to the DIMM and add the register clock driver (RCD) and the data buffers. Registered DIMMs of 3D-stacked devices are probably where you’re going to see the sweet spot for DDR5 for very high capacity requirements. You can get 128 to 256 gigabytes, and maybe 512 gigabytes in the not-too-distant future, in one DIMM card. And that’s just DRAM.
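As a rough illustration of that capacity arithmetic, here is a minimal sketch. The die density, stack height and device counts are assumptions chosen for the example, not figures quoted by the panelists.

```python
# Illustrative capacity math for a registered DIMM built from 3D-stacked (3DS) packages.
# All numbers below are assumptions for the sake of the example.

def dimm_capacity_gb(die_gbit, stack_height, data_devices_per_rank, ranks):
    """Usable data capacity (GB) of a DIMM built from 3D-stacked DRAM packages."""
    die_gb = die_gbit / 8                       # gigabits -> gigabytes per die
    return die_gb * stack_height * data_devices_per_rank * ranks

# Example: 16 Gb dies, 4-high stacks, x4 devices (64 data bits / 4 = 16 per rank), 2 ranks
print(dimm_capacity_gb(16, 4, 16, 2))           # 256.0 GB
# A 32 Gb die or an 8-high stack pushes the same DIMM toward the 512 GB mark.
```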
SE: What does that do to the PHY? Does it make it more difficult to develop?
Allan: No, because when you’re talking to those, you’re just talking to one point load. You’re talking to the RCD chip and the data buffers. All of the ranks for the data channels are hidden behind the buffers, and the address fan-out is handled by the RCD chip. For us, it makes our job easier.
Greenberg: It’s all about capacity, cost, speed and power.
SE: Is off-chip DRAM always going to be the go-to solution, or will more of this be done on-chip, maybe with in-memory computing?
Greenberg: There needs to be a fundamental shift in what a computer looks like for compute in memory to really take off. There are companies using analog properties of memory cells to do interesting things. Those technologies are still very early in their development, but they’re really interesting. If those companies are successful, then we may see that shift. There is one company that used the analog property of a bit cell to do computation in the analog domain. You can fit a whole lot of flash bit cells on a die, and if every one of them can be a multiplier then you can start doing interesting things with that. But we’re still early in the technology to do things like that. It would be a fundamental shift in the architecture of the computer. People don’t like change. If you can prove the benefit of that, people will do it. But where we are today with the classical architecture of a CPU, cache on the die, memory off the die and storage beyond that, that won’t go away anytime soon.
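A minimal numerical sketch of the idea Greenberg describes: treating each cell’s stored analog value as a multiplier, with currents summing on a shared line to perform a multiply-accumulate. This is a toy model for illustration, not any vendor’s actual design.

```python
import numpy as np

# Toy model of analog in-memory multiply-accumulate:
# each cell multiplies (Ohm's law, I = G * V), and currents on a shared line add
# (Kirchhoff's current law), so a whole dot product happens in the analog domain.

rng = np.random.default_rng(0)

weights = rng.uniform(0.0, 1.0, size=(4, 8))   # cell conductances (the stored "weights")
inputs = rng.uniform(0.0, 1.0, size=8)         # input voltages driven onto the columns

row_currents = weights @ inputs                # summed current per row = one MAC per row

# Same result computed digitally, for comparison:
assert np.allclose(row_currents, [np.dot(w, inputs) for w in weights])
print(row_currents)
```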
Ferro: You can put in embedded DRAM. It’s been around for a long time. But if you look at the knee of the cost curve for a silicon die, putting that memory on-die is going to break the cost curve. If you’re vertically integrated and don’t care as much about selling that die, you can use embedded memory. But, in general, the problems are the same.
Allan: There have been many attempts to do that over the years. In the end, there is just not enough volume to drive it. That’s why those process offerings evaporate.
Ferro: If you look at the metrics on embedded DRAM, though, in terms of performance and power, it’s extremely efficient.
Allan: The one area where we’re starting to see some of the limits of embedded SRAM is leakage power. Every SRAM cell has two little paths that can leak current between power and ground. If you put too many of those on your die, then all of a sudden your die is dominated by leakage power. So there are theoretical limits to how much memory you can put on an SoC, and not just limits related to area.
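A back-of-the-envelope sketch of why that leakage scales with on-die SRAM capacity. The per-cell leakage, supply voltage and SRAM size below are assumptions for the example, not figures from the discussion.

```python
# Illustrative only: static leakage grows linearly with the number of SRAM cells.
leakage_per_cell_A = 0.5e-9          # assumed average leakage per SRAM cell, in amps
supply_V = 0.8                       # assumed core supply voltage
sram_megabytes = 256                 # hypothetical amount of on-die SRAM

cells = sram_megabytes * 1024 * 1024 * 8
leakage_W = cells * leakage_per_cell_A * supply_V
print(f"{leakage_W:.2f} W of static leakage")   # ~0.86 W here, doubling with every doubling of capacity
```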
SE: It’s the memory that’s leaking?
Allan: Yes, the SRAM cell itself.
SE: Is there any way to fix that? We’re certainly seeing transistors moving from finFETs to nanosheets.
Greenberg: There are people talking about putting carbon nanotubes down on dies. Some people are very bullish about that.
Allan: I’ve heard about what’s going to replace DRAM and SRAM for my whole career. It hasn’t happened, and I don’t see it happening.
SE: Getting down to 1x, 1y, 1z through scaling has been a challenge, though. Will that change the dynamics of this?
Shiah: If you look at computing performance with Moore’s Law, in the past it was about increasing clock speed. We’ve seen clock speeds flatten out for the past 10 years. It’s more about multi-core architectures now, and you need a lot of memory to feed those cores. That’s helping to drive the momentum toward increasing speed through more memory bandwidth. And for high-end compute architectures for AI, you see two prevailing approaches now. One is near-memory compute, which is HBM-based. The other is in-memory compute, which is primarily SRAM-based. The companies looking at SRAM-based solutions for AI are finding they don’t have enough SRAM capacity, so they’re looking at ways to incorporate HBM to get the capacity.
SE: We hear more about new materials coming into new devices. What impact does higher mobility in the materials have on memory?
Greenberg: I haven’t seen anyone wanting to make DRAM out of anything but silicon. But we do have these novel non-volatile memory architectures. Those often use interesting elements from the periodic table. There was an effort a few years ago to basically try out every element in the periodic table and every combination. DRAM has been dialed in for 30 years. The process of making them is so well understood, and we have gone so far down that track, that any other technology has to almost get as far to have a chance. But those other technologies are coming. If you talk to the MRAM vendors, and some of the other non-volatile memory vendors, they have been at it for 20 years. People say it takes about 20 years to really develop a memory technology and bring it to market. They’re getting there. But they have a very tough challenge going against DRAM. That’s been used by a very large number of people in very large markets for many years. It’s hard to improve upon something that already has been improved upon so much.
Allan: It’s a very resilient technology. The biggest change in the DRAM market is the amount of non-DRAM logic for other functions that is now on the die. Another recent example is DRAM incorporating on-die ECC. That’s something people thought about for a while, but the cost/benefit never quite tipped the scale in favor of actually doing it. Now, when you put ECC on the DRAM, you’re adding roughly 12.5% more storage space, so you’re increasing the die size by quite a bit and getting fewer die out of the wafer. But you’re able to put those diced die into a package and sell them, because if a bit fails refresh testing, now all of a sudden you can correct for that failure. So you’re recovering extra yield, and overall the cost starts to work in your favor.
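A short sketch of where the "roughly 12.5%" comes from, assuming a classic SECDED (single-error-correct, double-error-detect) code over a 64-bit word; the actual on-die ECC word size is an implementation choice and may differ.

```python
# SECDED overhead: a Hamming code over 64 data bits needs 7 check bits,
# plus one extra overall parity bit for double-error detection.

def secded_check_bits(data_bits):
    # Hamming bound: 2**r >= data_bits + r + 1, then +1 parity bit for DED
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1

data_bits = 64
check_bits = secded_check_bits(data_bits)
print(check_bits, f"{check_bits / data_bits:.1%}")   # 8 check bits -> 12.5% extra storage
# The tradeoff described above: ~12.5% fewer die per wafer, but die with a
# single failing bit become correctable and sellable instead of scrap.
```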
SE: What’s the driver for that?
Allan: You’re getting a higher margin on every die you produce. You’re able to rescue silicon that you otherwise would have scrapped.
Ferro: That’s what they used to use for audio-grade DRAM.
Allan: Yes. If the DRAM had a failing row, you’d sell it as half a DRAM for use in an answering machine. That’s one of the interesting things about HBM, too. Every time you build a new SoC that uses HBM, you’re typically going to go through this pipe-cleaning process. You go through a mechanical evaluation of your whole assembly process. You put your SoC on there and your DRAMs on there, and you see whether it’s reliable after you’ve manufactured it. You’d really like to use HBM scrap for that.
Ferro: Yes, yields may be good on TSVs, but it’s multiplicative.
Shiah: There’s a lot of repair capability, but it can only be stretched so far.
SE: So backing up a step, where is the bottleneck? Is the DRAM running faster than the pipeline to that DRAM?
Allan: There’s a relationship there that’s hidden from view. There’s a cycle time inside the DRAM core. For DDR3, we got to a burst length of 8, and that could mask the internal cycle time for a read or write operation. When we got to DDR4, we really didn’t want to go to a burst length of 16 to maintain that. If you want to double the speed, you’re not cutting that core access time in half. That core access time is pretty much a constant. In fact, as you shrink the DRAM, that access time gets slightly longer, so you’ve got one thing getting faster and one thing getting slower. So with DDR4 they came up with the bank group idea. As long as you were going between bank groups, two read accesses didn’t interfere with each other. That’s how DDR4 managed to stay at a burst length of 8. Then along came DDR5, and they made the DIMMs dual-channel. Instead of a single 72-bit channel with 64 bits of data and 8 bits of ECC, the DIMM became two 32-bit data channels, each with its own ECC. That allows a burst length of 16 and hides that internal access time in the DRAM. Each access is now a burst of 16 across a 40-bit subchannel, 32 of those bits being data, so you still get your 64 bytes of data.
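The burst arithmetic Allan walks through can be checked directly; this small sketch just restates the channel-width and burst-length numbers from his explanation.

```python
# Data delivered per access = channel data width (bits) x burst length.
def bytes_per_access(data_bits, burst_length):
    return data_bits * burst_length // 8

# DDR3/DDR4: one 64-bit data channel (72 bits with ECC), burst length 8
print(bytes_per_access(64, 8))    # 64 bytes

# DDR5: two independent 32-bit data subchannels (40 bits each with ECC), burst length 16
print(bytes_per_access(32, 16))   # still 64 bytes per access on each subchannel
```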
Greenberg: As you start getting down to the single-digit nanometer chips, those chips are all about computation. Then there’s an edge on the outside where you have to get all of that data in or out of the chip. I haven’t seen much optimization of materials in the single-digit processes.
Ferro: Ultimately it boils down to the application. You want to optimize the curve of processing power versus memory bandwidth. In a roofline model, some processors don’t look good for a particular application, so they tune that balance to maximize the throughput of, say, a TPU against the memory bandwidth. They don’t care about the materials. You want to optimize around what’s available.
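A minimal roofline sketch of the tradeoff Ferro describes: attainable throughput is capped either by peak compute or by memory bandwidth times arithmetic intensity. The TPU-like peak-compute and bandwidth numbers are assumptions for illustration only.

```python
# Roofline model: attainable FLOP/s = min(peak compute, bandwidth * arithmetic intensity).
def roofline(peak_flops, bandwidth_bytes_per_s, arithmetic_intensity):
    """Attainable FLOP/s for a kernel with the given FLOPs-per-byte ratio."""
    return min(peak_flops, bandwidth_bytes_per_s * arithmetic_intensity)

peak = 90e12          # assumed peak compute, FLOP/s
bw = 900e9            # assumed HBM bandwidth, bytes/s

for ai in (1, 10, 100, 1000):                  # FLOPs performed per byte moved
    print(ai, f"{roofline(peak, bw, ai) / 1e12:.1f} TFLOP/s")
# Low-intensity kernels sit on the bandwidth slope; only high-intensity ones reach
# the compute "roof," which is why memory bandwidth is tuned to the processor.
```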
Related Articles
DRAM Knowledge Center
Top stories, videos, special reports, technical papers and white papers on DRAM
HBM2 Vs. GDDR6: Tradeoffs In DRAM
Part 1: Choices vary depending upon application, cost and the need for capacity and bandwidth, but the number of options is confusing.
DRAM Tradeoffs: Speed Vs. Energy
Part 2: Which type of DRAM is best for different applications, and why performance and power can vary so much.
HBM2E: The E Stands For Evolutionary
The new version of the high bandwidth memory standard promises greater speeds and feeds and that’s about it.
Using Memory Differently To Boost Speed
Getting data in and out of memory faster is adding some unexpected challenges.
Latency Under Load: HBM2 Vs. GDDR6
Why choosing memory depends upon data traffic.