DRAM Tradeoffs: Speed Vs. Energy

Experts at the Table: Which type of DRAM is best for different applications, and why performance and power can vary so much.


Semiconductor Engineering sat down to talk about new DRAM options and considerations with Frank Ferro, senior director of product management at Rambus; Marc Greenberg, group director for product marketing at Cadence; Graham Allan, senior product marketing manager for DDR PHYs at Synopsys; and Tien Shiah, senior manager for memory marketing at Samsung Electronics. What follows are excerpts of that conversation. Part one of this discussion is here.


(L-R): Frank Ferro, Graham Allan, Tien Shiah, Marc Greenberg. Photo credit: Susan Rambo/Semiconductor Engineering

SE: How does on-chip memory compare with off-chip memory?

Ferro: That’s the most efficient. There are some accelerators that use all SRAM. That’s really expensive, but what they can do is cache it carefully and try to minimize the data movement.

Allan: There’s another bottleneck that SoC designers are starting to struggle with, and it’s not just about bandwidth. It’s bandwidth per millimeter of die edge. So if you have a bandwidth budget that you need for your SoC, a very easy exercise is to look at all the major technologies you can find. If you have HBM2E, you can get on the order of 60+ gigabytes per second per millimeter of die edge. You can only get about a sixth of that with GDDR6, and only about a tenth of that with LPDDR5. So if you have a very high bandwidth requirement and you don’t have a huge chip, you don’t have a choice. You’re going to run out of beachfront on your SoC if you put down anything but an HBM interface. Numbers like these can help the customer make these decisions, or you can re-evaluate how much bandwidth you really need. You can find chips with 12 GDDR6 interfaces, where three sides of the die are memory interfaces. You can actually put two DDR5 interfaces between the HBM PHYs—there’s a space because of how you have to pattern the DRAM. The HBM can be considered almost a cache, so you can put two DDR interfaces between them and you have HBM in the package, and on the other side of the package, on the PCB, you have four DIMM slots—all for what it would take to put down four to six GDDR6 interfaces.
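As a rough illustration of the beachfront arithmetic Allan describes, here is a minimal sketch using his approximate per-millimeter figures; the bandwidth target and die size below are hypothetical, not values from the discussion.

```python
# Rough beachfront check using the approximate bandwidth-per-mm figures above:
# ~60 GB/s per mm of die edge for HBM2E, ~1/6 of that for GDDR6, ~1/10 for LPDDR5.
# The SoC bandwidth target and die size are made-up examples.

BW_PER_MM = {          # GB/s per millimeter of die edge (approximate)
    "HBM2E":  60.0,
    "GDDR6":  60.0 / 6,
    "LPDDR5": 60.0 / 10,
}

def edge_needed(bandwidth_gbps: float, tech: str) -> float:
    """Die edge (mm) needed to hit a bandwidth target with a given DRAM type."""
    return bandwidth_gbps / BW_PER_MM[tech]

target = 900.0          # hypothetical SoC bandwidth budget, GB/s
die_edge = 4 * 20.0     # hypothetical 20 mm x 20 mm die: 80 mm of total edge

for tech in BW_PER_MM:
    need = edge_needed(target, tech)
    fits = "fits" if need <= die_edge else "does not fit"
    print(f"{tech:7s}: {need:6.1f} mm of edge needed ({fits} on {die_edge:.0f} mm)")
```

With these assumptions, only the HBM2E interface fits on the hypothetical die, which is the "run out of beachfront" situation Allan describes.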

SE: Can a lot of this be planned out ahead of time? One of the challenges is there is so much complexity with loads and use cases. The loads change depending on who’s using it, when they’re using it, and what device is involved. How do you assess what gets used where, and how does that affect which memories should be used?

Greenberg: There are a few things you always want to look at, such as how much energy it’s going to take. That translates into power. HBM is the clear winner on energy per bit. And then you have to look at the cost of the memory and how difficult it is to manufacture. And you need to look at how much capacity you need. Not every memory technology can implement the same amount of memory capacity on a channel. It becomes very personal to every chip. The architect of that chip has to know what they’re shooting for. We can help, but they need to know what they’re shooting for.

Allan: A great example of that is the SSD market. You start off with a relatively small SSD, because they don’t have huge capacity. The amount of DRAM you need is a buffer proportional to the size of the SSD and the controller. So if you have a relatively small one, you can get away with LPDDR4 today. But you can’t get a lot of capacity from LPDDR4. They’ll try to stuff in two ranks of LPDDR4, and some of these guys will try to stuff in four ranks of LPDDR4. That has no target rank termination at all. That’s extra stuff that just kills the performance. If you move beyond that, you start to see DIMMs being used in enterprise-class solutions. So they’re all storage, but depending on which side of the application class you’re on, that will determine which memory is most appropriate for you. Another issue with DDR is all of the reliability and serviceability. Those are the kinds of features servers require, which is why they’re all built into DDR5. But they’re starting to trickle into some of these other components, as well. LPDDR5 introducing link ECC is a huge move. We’ve never seen that in a mobile memory before. For the previous generation, we were all developing products where you had to dedicate some of the memory and some of the bandwidth to the checking of it, but with LPDDR5 you don’t have to do that anymore.
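To put numbers on the buffer-scaling point, a common rule of thumb (not something the panel states) is roughly 4 bytes of flash-translation-layer mapping entry per 4 KiB flash page, or about 1 GB of DRAM per 1 TB of NAND. A minimal sketch, with purely illustrative capacities:

```python
# Sketch of why SSD DRAM scales with capacity: assume ~4 bytes of mapping-table
# entry per 4 KiB flash page, i.e. roughly 1 GB of DRAM per 1 TB of NAND for
# the FTL alone. These are rule-of-thumb values, not from the panel.

def ftl_dram_bytes(ssd_bytes: int, page_bytes: int = 4096, entry_bytes: int = 4) -> int:
    """Approximate DRAM needed for a flat logical-to-physical mapping table."""
    return (ssd_bytes // page_bytes) * entry_bytes

for tb in (0.5, 2, 8, 32):
    ssd = int(tb * 10**12)
    print(f"{tb:5.1f} TB SSD -> ~{ftl_dram_bytes(ssd) / 10**9:5.1f} GB of mapping DRAM")
```

Under these assumptions a half-terabyte client drive needs well under a gigabyte of DRAM, while a 32 TB enterprise drive needs tens of gigabytes, which is where DIMM-class memory starts to make sense.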

Ferro: With reliability, that narrows the choices, because you also have to bring in the process node. So how many choices do you have? We’re talking about networking having high reliability requirements, and then other applications like DARPA work, and then automotive. So now you’ve narrowed your choices down. Which process nodes are automotive-grade qualified? LP led the way there. It was a good way for the DRAM vendors to break away from the mobile market by qualifying LPDDR for automotive.

SE: What happens when you start dropping the voltage down on designs? There is talk of significantly reducing the voltage at 5nm and 3nm.

Allan: The DRAMs for a long time had one voltage for the core and the I/O. And then along comes something like LPDDR4X, and everyone questioned why they didn’t do that a long time ago. Now that the break is finally made, where the memory core and the supporting circuitry are operating at one voltage and the I/O has the ability to change to something different, there are a lot of options. With LPDDR5, you can have different power supplies, depending on the speed you’re trying to hit. You can vary that in real time in the end application to save power. You can target the power based on what you’re actually doing, rather than just dialing down the frequency. You can play with the power, with termination settings, and with some of the timing settings. All of these things can optimize the power for every specific state of operation of that interface.
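A minimal sketch of why scaling the I/O supply alongside frequency pays off: CMOS switching power scales roughly as C·V²·f, so dropping to a lower rail at a lower speed bin compounds the savings. The capacitance and voltages below are made-up illustrations, not LPDDR5 spec values.

```python
# Illustrative only: switching power scales roughly as C * V^2 * f, so running
# an I/O on a lower supply when the speed bin allows saves more than reducing
# frequency alone. Capacitance and voltages are hypothetical, not spec values.

def switching_power(c_farads: float, v_volts: float, f_hertz: float) -> float:
    """Approximate CMOS switching power in watts: C * V^2 * f."""
    return c_farads * v_volts**2 * f_hertz

C_IO = 2e-12                                        # hypothetical switched capacitance (F)

full_speed    = switching_power(C_IO, 0.5, 3.2e9)   # full rate on the higher rail
half_speed    = switching_power(C_IO, 0.5, 1.6e9)   # frequency scaling only
half_speed_lv = switching_power(C_IO, 0.3, 1.6e9)   # frequency plus voltage scaling

print(f"full rate           : {full_speed * 1e3:.2f} mW")
print(f"half rate, same V   : {half_speed * 1e3:.2f} mW ({half_speed / full_speed:.0%})")
print(f"half rate, lower V  : {half_speed_lv * 1e3:.2f} mW ({half_speed_lv / full_speed:.0%})")
```

Halving the frequency alone halves the switching power, but halving the frequency and lowering the rail in this example cuts it to under a fifth, which is the kind of per-operating-state optimization Allan describes.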

Shiah: We tend to be going down in terms of voltage and power requirements with every new generation. If you look at GDDR5, typically that was 1.55 volts. With GDDR6, the I/O is going down to 1.35 volts. With LPDDR4, it’s typically down to 1.1 volts, and HBM is typically 1.2 volts. But in terms of edge space and performance, if you look at the highest-performing graphics card today, the layout is about 12 GDDR chips around the GPU. The last GDDR5 generation was the GTX 1080 Ti; now the RTX 2080 uses GDDR6. For the longest time, the highest-performing card was 12 gigabytes, which is 12 GDDR5 chips laid out around the GPU, giving you 384 gigabytes per second in that very high-power-consuming, very real-estate-expensive configuration. If you look at HBM, with the new generation of GPUs, two stacks of HBM2 on that GPU will give you 0.5 terabytes per second. Typically, the higher-end GPUs have four stacks, which is over a terabyte per second, versus the fully laid-out GDDR configuration giving you 384 gigabytes per second. So if you want the highest bandwidth, highest performance, lowest power, and least amount of real estate, HBM is the way to go.
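The arithmetic behind those figures, as a quick sanity check (per-pin rates are typical generational values, not tied to any specific part):

```python
# Back-of-the-envelope check on the bandwidth figures above. Per-pin rates are
# typical values for each generation, not guaranteed for a specific device.

def bandwidth_gbps(devices: int, bus_bits: int, gbps_per_pin: float) -> float:
    """Aggregate bandwidth in GB/s for a number of DRAM devices or stacks."""
    return devices * bus_bits * gbps_per_pin / 8

print(f"12 x GDDR5 (32-bit @ 8 Gb/s/pin):  {bandwidth_gbps(12, 32, 8.0):6.0f} GB/s")
print(f" 2 x HBM2  (1024-bit @ 2 Gb/s/pin): {bandwidth_gbps(2, 1024, 2.0):6.0f} GB/s")
print(f" 4 x HBM2  (1024-bit @ 2 Gb/s/pin): {bandwidth_gbps(4, 1024, 2.0):6.0f} GB/s")
```

That works out to 384 GB/s for the fully populated GDDR5 board, roughly 0.5 TB/s for two HBM2 stacks, and about 1 TB/s for four, matching the numbers Shiah quotes.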

SE: If you drop the power on that even further, does it affect reliability?

Shiah: Yes, it starts affecting the signals. If we look at the next generation, HBM3, the spec for the I/Os is even lower power.

Ferro: It affects the signal integrity on the I/O. We’ve done very low-swing I/Os for LPDDR, and non-standard ones going down to 0.5-volt swings. As you lower the I/O voltage, clearly the challenge moves to the physical design side of the PCB, or even the interposer. You’d think interposer channels should work perfectly, but they’re silicon and very resistive, and that changes the characteristics of what you’re doing. It makes the chip design easier, but it makes the system harder as you lower the I/O voltage. And that doesn’t include the core voltage.

Allan: We get very little of the timing budget as the IP suppliers to the SoC. We all have to help our customers define the timing budget.

SE: When did that start?

Allan: It’s not that new. It started at about 800 megabits per second, because that’s when you could really get into trouble if you just continued doing what you always did. You couldn’t get to your target frequency of 800 megabits per second. Now, it’s absolutely critical. You’re talking about voltages and timing numbers, but they can never be an accurate representation of what happens in the system because they always have to make all of the worst-case assumptions happen at the same time. You have to temper that, and there are techniques you can use to statistically apply these things to make sure you aren’t going overboard. Otherwise, you’re just adding more cost into the system to correct for it. You might put more capacitors in the package, or have more decoupling on your SoC. You might need more power and ground balls to properly isolate against crosstalk on the package balls. The voltage and the power consumption issues come into play there, as well, because these signals are transitioning, and those signal transitions are taking a big chunk of that timing budget. We get about 30% of the UI for the PHY in the SoC. The DRAMs get the lion’s share, because they design their own standards. And then the channel gets the rest.
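To make the 30% figure concrete, here is what that share of the unit interval (UI) amounts to at a few data rates; only the roughly 30% PHY share comes from Allan, the rest is simple arithmetic.

```python
# What "about 30% of the UI" means in absolute time at a few data rates.
# The ~30% PHY share is the figure quoted above; the data rates are examples.

def ui_ps(data_rate_mtps: float) -> float:
    """Unit interval in picoseconds for a given data rate in MT/s."""
    return 1e6 / data_rate_mtps

PHY_SHARE = 0.30

for rate in (800, 3200, 6400):
    ui = ui_ps(rate)
    print(f"{rate:5d} MT/s: UI = {ui:6.1f} ps, PHY budget = {PHY_SHARE * ui:5.1f} ps")
```

At 800 MT/s the PHY has roughly 375 ps to work with; at 6400 MT/s that same 30% share is under 50 ps, which is why the budgeting has become so critical.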

Greenberg: It’s always a question of how much you want to pay for the memory. Anything can be done, but the industry comes together in JEDEC and makes a collective decision about what’s best for everybody. Hopefully, we minimize the cost per function for everyone, and we do okay on that. On the timing budget issues, we have gone from a place 10 years ago, at 800 megabits per second, where we gave customers a one-page spreadsheet. We now have a whole team of people responsible for the timing budget, and they use tools like timing solvers and 3D electrical field modeling to figure out how you get a signal from one place to another within an appropriate timing budget. The level of complexity has gone from one person with a spreadsheet to a full team of people.

Allan: It does allow us to differentiate, too. Memory interfaces were all interfacing to standard products, and your products have to interoperate with other products. But we don’t have any standards. We’re on the other end of the channel. We’re not compliant with JEDEC standards. We’re compliant interfacing to DRAMs that are compliant with JEDEC standards. For what we do there are no standards, so we have to make decisions about what is the optimal area and power consumption for a variety of customers, because ideally we want to be able to build something once and sell it to a number of customers.

SE: There are different ways of adding memory into a device. One is to put pillars directly onto a chip, where you don’t have to run through an interposer. How is that working? Can we get to the point where, if we place memory differently, we can reduce latency and performance bottlenecks?

Ferro: There are big, vertically integrated companies attacking that problem with DRAM. Even in the specs we’re doing on HBM, people are asking to sort through 2.5D and 3D designs. We haven’t seen any 3D designs yet. But the PHYs are rated for much higher temperatures than the DRAM, so you’re limited by the lowest common denominator.

SE: What happens if the DRAM gets hot?

Allan: The charge leaks out of the cell, so you have to refresh. This is why the refresh requirements for DRAMs go up as the DRAMs heat up. You see a lot of DRAMs with thermal sensors on them to keep track of that.
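As a hedged illustration of what that refresh penalty looks like, the sketch below uses typical DDR4-style numbers (an average refresh interval around 7.8 µs at or below 85°C and half that above it, with a refresh cycle time around 350 ns for an 8 Gb device); the exact values vary by device and are used here only to show the trend.

```python
# Why hot DRAM costs bandwidth: refresh commands come more often as the die
# heats up. Values are typical DDR4-style numbers (tREFI ~7.8 us at or below
# 85 C, halved above it; tRFC ~350 ns for an 8 Gb device), used only to
# illustrate the trend, not tied to any specific part.

def refresh_overhead(trefi_us: float, trfc_ns: float) -> float:
    """Fraction of time the DRAM is busy refreshing instead of serving data."""
    return (trfc_ns * 1e-9) / (trefi_us * 1e-6)

normal = refresh_overhead(7.8, 350)   # at or below 85 C
hot    = refresh_overhead(3.9, 350)   # above 85 C: refresh interval halves

print(f"<= 85 C: {normal:.1%} of time spent refreshing")
print(f">  85 C: {hot:.1%} of time spent refreshing")
```

Under these assumptions the refresh overhead roughly doubles, from about 4.5% to about 9% of the DRAM's time, once the device crosses the high-temperature threshold, which is why the thermal sensors matter.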

Ferro: This is why Wide I/O looked like a great solution. But you need a company to lead the charge there, and you need the volume to drive it. No one has been able to drive this to a standard. You need volume to drive that cost down.

Allan: But the good news is TSVs are becoming mature. They’re used in HBM, they’re used in 3DS DDR4.

Related Stories
HBM2 Vs. GDDR6: Tradeoffs In DRAM
Part 1: Choices vary depending upon application, cost and the need for capacity and bandwidth, but the number of options is confusing.
Memory Options And Tradeoffs
What kinds of memories work best where and why.
Memory Tradeoffs Intensify In AI, Automotive Applications
Why choosing memories and architecting them into systems is becoming much more difficult.
GDDR6 – HBM2 Tradeoffs
What type of DRAM works best where.
Latency Under Load: HBM2 Vs. GDDR6
Why choosing memory depends upon data traffic.
Target: 50% Reduction In Memory Power
Is it possible to reduce the power consumed by memory by 50%? Yes, but it requires work in the memory and at the architecture level.
Hybrid Memory
Tech Talk: How long can DRAM scaling continue?



1 comment

Kevin Cameron says:

If one calculates the silicon area used, maybe it’s cheaper to skip the DRAM and just die-stack CPUs with NVM. Given that CPU performance is mostly about not missing cache, maybe having more CPUs/caches is more important than DRAM performance.

But why would folks selling DRAM suggest that solution 😉
