DDR5, NVDIMMs, SGRAM, 3D XPoint add more options, but solutions may be a mix and much more complex.
DRAM extensions and alternatives are starting to show up inside of data centers as the volume of data being processed, stored and accessed continues to skyrocket. This is having a big impact on the architecture of data centers, where the goal now is to move processing much closer to the data and to reduce latency everywhere.
Memory has always been a key piece of the Von Neumann compute architecture. What’s typically deployed inside data centers today are DDR4-based DIMMs, as well as some legacy DDR3, which are directly attached to the CPU. But those are far too slow for many of the new applications inside of data centers, including artificial intelligence, machine learning and deep learning, so work is underway to improve the speed at which that data can be accessed without a huge impact on the power budget.
DDR5 will offer some significant improvement over previous DRAM versions. For years, memory makers insisted there would be no follow-on to DDR4. That changed once streaming video and almost ubiquitous image processing came into view with the introduction of smartphones and other mobile devices. According to JEDEC, DDR5 will provide double the bandwidth and density over DDR4, and allow better channel efficiency.
But DDR5 alone will likely not be sufficient. The amount of data being generated by sensors in a variety of connected devices across multiple markets is exploding, and so is the demand for compute power to make sense of all of that data. Machine learning/AI/deep learning, separately and as extensions of markets filled with connected devices, are only adding to the need for additional types of memories.
One option is GDDR5, which is a type of synchronous graphics random-access memory (SGRAM). GDDR5 was developed for use in graphics cards, game consoles, and high-performance computation. Nvidia uses this technology in its graphics cards, for example, to help with deep learning and machine learning applications.
“If you look at machine learning and deep learning it’s basically two things – a data set that must be trained, and the resulting model, which is able to then be deployed and make inferences,” said Sarvagya Kochak, senior product marketing manager at Rambus. “The training of the data is extremely intensive in terms of the compute resources that it needs, which is why heavy-duty GPUs make a good match for them. These are low-precision, floating-point operations that GPUs are better suited for versus CPUs. FPGAs also can be used, and some companies do that. They offer better power consumption figures. A lot depends on the type of model that is being worked on.”
All of those architectures have one thing in common. There is a huge amount of data that needs to be processed and stored, and it needs to be accessed extremely quickly. So far, no memory type serves all needs, and inside of data centers that has prompted some unusual combinations of technologies.
“We are starting to see these applications with people trying to do AI-type things, deep learning, and super computing, and that is where we are seeing a lot of interest in HBM2 (high-bandwidth memory 2),” said Marc Greenberg, product marketing group director for Cadence’s IP Group. “One stack of HBM2 devices gives you terabits per second of memory bandwidth, which is just huge. But with the technology that we have today, you only get to store 4 gigabytes in a single stack of HBM2. In the grand scheme of things, that’s not a lot of data, so you won’t be able to store a very large database in the HBM. But you might store an index into the large database in the HBM. That kind of architecture could include, for example, some small number of gigabytes of HBM, which could be accessed very quickly, and then a larger amount of memory could be stored out on the DRAM bus, either as DRAM or NVM. So you’re really using both solutions there.”
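The split Greenberg describes, a small fast index in front of a much larger and slower store, can be sketched roughly as follows. This is only an illustration: the class name, the dictionary standing in for the HBM-resident index, the list standing in for records on the DRAM bus, and the capacity limit are all hypothetical and do not model real hardware.

```python
class TieredStore:
    """Toy model: a small 'fast' index that points into a large 'slow' store."""

    def __init__(self, fast_capacity_entries):
        # Hypothetical limit on how many index entries fit in the fast tier.
        self.fast_capacity_entries = fast_capacity_entries
        self.fast_index = {}   # stands in for an index held in HBM: key -> slot
        self.slow_data = []    # stands in for records on the DRAM bus (DRAM or NVM)

    def put(self, key, record):
        if key in self.fast_index:
            # Existing key: overwrite the record in the slow tier.
            self.slow_data[self.fast_index[key]] = record
            return
        if len(self.fast_index) >= self.fast_capacity_entries:
            raise MemoryError("index no longer fits in the fast tier")
        self.slow_data.append(record)
        self.fast_index[key] = len(self.slow_data) - 1

    def get(self, key):
        slot = self.fast_index[key]   # one access to the fast tier
        return self.slow_data[slot]   # one access to the slow tier


store = TieredStore(fast_capacity_entries=2)
store.put("order-1", {"amount": 10})
store.put("order-2", {"amount": 25})
print(store.get("order-2"))
```

The point of the arrangement is that every lookup touches the small tier first, so only the final record fetch pays the slower access cost.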
Fig. 1: HBM2 configurations. Source: SK Hynix
While the general memory trend for more bandwidth and more capacity still holds, how to get there is changing.
“We have bandwidth being provided at very very high rates by the HBM2 technology, which at present is being used for graphics applications, for networking switch-type applications, and supercomputing/high-performance computing types of applications,” Greenberg said. “You’re not going to find that yet in a mainstream server. The mainstream server is much more about capacity. If you want to run a large number of virtual machines or some giant database application or something like that on those servers, then you need lots of memories stored close to the CPU. Storing close to the CPU is important. You can put terabytes and terabytes of data on an SSD or a disk, but it is kind of far away in terms of the amount of time it takes to access that data. As a result, one trend now is to try to always put more memory on the DRAM bus, which can be done with DDR5, or by potentially putting some flash memory on the DRAM bus. If you want to put some flash on the DRAM bus, there are a few different ways, including the NVDIMM-N protocol.”
Limits and solutions
There are technical hurdles to overcome at every turn, however. For example, how much DRAM can be directly attached to a CPU is limited, and that remains a big issue for throughput and overall processing performance.
“That’s problem number one,” Kochak said. “That fundamentally has been the biggest challenge because the rate of scaling of DRAM devices hasn’t really kept up. It’s been difficult for large memory manufacturers like Samsung, SK Hynix and Micron to get the densities higher and higher. Right now, 4- and 8-gigabit devices are shipping in mass production in high volumes. 16 gigabit should be on the market in a year or two, but this poses a pretty significant challenge because if you look at it from a CPU perspective, CPU speeds haven’t gone up over the last decade. They have more or less leveled off because transistor speeds cannot be bumped up more than they are right now.”
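To see why device density caps the memory that can be attached to a CPU, a back-of-the-envelope calculation helps. The sketch below assumes a standard 64-bit data bus per channel and ignores the extra ECC devices. The number of ranks per DIMM, DIMMs per channel, and channels per CPU are hypothetical values chosen only for illustration.

```python
def rank_capacity_gb(device_density_gbit, device_width_bits, data_bus_bits=64):
    """Capacity of one rank in gigabytes, ignoring ECC devices."""
    devices_per_rank = data_bus_bits // device_width_bits
    return devices_per_rank * device_density_gbit / 8  # 8 bits per byte


def cpu_capacity_gb(device_density_gbit, device_width_bits,
                    ranks_per_dimm, dimms_per_channel, channels):
    """Total directly attached capacity for one CPU, in gigabytes."""
    per_rank = rank_capacity_gb(device_density_gbit, device_width_bits)
    return per_rank * ranks_per_dimm * dimms_per_channel * channels


# Hypothetical platform: x4 devices, 2 ranks per DIMM, 2 DIMMs per channel, 4 channels.
for density in (4, 8, 16):
    total = cpu_capacity_gb(density, 4, ranks_per_dimm=2,
                            dimms_per_channel=2, channels=4)
    print(f"{density} Gb devices: {total:.0f} GB per CPU")
```

With everything else held fixed, capacity per CPU scales only with device density, which is why slow density scaling becomes a system-level limit.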
To compensate, CPU vendors are adding more cores inside of processors. But that adds its own set of issues.
“The problem with this approach is that within the CPU is an integrated memory controller, which interfaces with the DRAM DIMMs,” Kochak said. “With older CPUs, there would have been two memory controllers. Now there are four memory controllers. Soon there will be six memory controllers, and the projection is that there may even be eight memory controllers per CPU. The reason why there are more memory controllers is because there needs to be more data to feed all of the cores, so if there are fewer memory controllers, the data can’t be fed from the DRAM to the cores. If the cores are hungry, there’s no point in having them. As the number of cores goes up, there is a requirement for more bandwidth from the DRAM into the system.”
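Kochak's point about keeping cores fed can be put in rough numbers. The sketch below assumes each memory controller drives one 64-bit DDR4-2400 channel, whose peak rate is 2,400 million transfers per second times 8 bytes, or 19.2 GB/s. The core counts paired with each controller count are hypothetical, and real sustained bandwidth is lower than peak.

```python
def channel_bandwidth_gbs(mega_transfers_per_sec, bus_width_bits=64):
    """Peak bandwidth of one DDR channel in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer / 1e9


def bandwidth_per_core_gbs(channels, cores, mega_transfers_per_sec):
    """Peak memory bandwidth available to each core if shared evenly."""
    return channels * channel_bandwidth_gbs(mega_transfers_per_sec) / cores


# Hypothetical pairings of memory controllers (one channel each) and cores.
for channels, cores in [(2, 4), (4, 16), (6, 28), (8, 64)]:
    per_core = bandwidth_per_core_gbs(channels, cores, 2400)
    print(f"{channels} channels, {cores} cores: {per_core:.2f} GB/s per core")
```

Under these assumed pairings, the per-core share falls even as controllers are added, which is the pressure behind the growing controller count.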
So a lot of memory is needed, because fundamentally the bandwidth is going up. But as the bandwidth continues to go up, even more memory is needed. “If you look at the rate at which memory has been increasing, as well as the memory attach rate per CPU, it’s not a pretty picture,” he said.
Intel has tried to solve some of these capacity-related issues with 3D XPoint, a phase-change memory that is supposed to provide flash-like capacity with DRAM-like performance.
“From a use-case perspective, just putting a nonvolatile memory (NVM) device on the memory channel doesn’t mean anything in the system,” Kochak said. “You have to be able to have that infrastructure in place to use it and make the host system, the platform, and the applications aware that there is this nonvolatile memory that you can do cool stuff with.”
Persistent memory
This is where NVDIMM enters the picture. A non-volatile DIMM (NVDIMM) operates as standard DRAM while also having the persistence of flash. A typical NVDIMM module incorporates DRAM, flash, control logic and an independent power source to retain critical in-memory data through unexpected power loss events, system crashes or planned shutdowns. That makes it part of a class of devices known as persistent memory.
Fig. 2: NVDIMM module. Source: HPE
NVDIMM-N puts flash on the DRAM bus as a back-up mechanism in case of power loss, Greenberg said. “A lot of these servers need to be able to restore their state if the power goes out. If you imagine it’s some sort of machine that has some financial capacity associated with it and the transaction starts right as the power is going down, it has to be able to come back up and remember what it was doing and complete the transaction. There are other, similar technologies for that also, such as battery-backed DIMMs, whereby a battery keeps data in DRAM on a disk.”
Kochak said while multiple companies such as Intel, Micron, Sony, Viking Technology, Samsung and Netlist already are shipping NVDIMMs, there’s also work going on at SNIA (the Storage Networking Industry Association) to promote the standard, and a separate group has formed to create a programming model for using persistent memories.
Over time, he believes applications developers are going to become aware of how to use persistent memory. “A lot of our software today is driven with, and developed with one key thing in mind—that DRAM is volatile, you can lose data that is in DRAM, so always design your system in a way that you can expect failures. To do that, there are a lot of things done such as checkpointing, logging, and journaling. And if you don’t have to do a lot of those operations, you can figure out ways to do things a lot faster and effectively with the same infrastructure and increase the application performance. Then you don’t need to do all of these other operations for dealing with volatile system memory because the system memory would be nonvolatile.”
This is hardly a standard plug-and-play approach, though. When it comes to designing in NVDIMMs, and connecting to the CPU or SoC, each one is different.
“Everybody wants something different, and it’s sort of a reflection a little bit on how each SoC is somewhat different,” said Greenberg. “The memory needs of each SoC are somewhat different, so they vary in all kinds of ways from the very high level. Do they support DIMMs or is the memory soldered down onto the board? How many channels of DIMM are supported? How fast? What types? What features need to be supported? Everybody has a little bit of a different need, and everybody is trying to optimize for performance for area and for power. The ability to give people that flexibility is really important.”
One of the main considerations for making the memories fit with the CPU is the continuing demand for increased memory bandwidth, he said. “We constantly have people asking us to help them to push the boundaries of frequency, and trying to increase that memory bus frequency as much as they can. There also is a significant demand to increase the amount of DRAM that can be attached onto the bus. There are two new ways of making the DRAM bus faster in the data center enterprise storage space. One is to use DDR5 instead of DDR4. That’s the mainstream way of getting there. Shift your memory pipe from DDR4 to DDR5. And then if you really want to juice it in the capacity direction, that’s the point where you might start looking at NVDIMM-P.”
Will it work?
Designing systems with advanced memories isn’t simple. Ben Whitehead, storage product specialist in the emulation division at Mentor, a Siemens business, said he spends a lot of time talking to companies about their methodologies and their tools in order to create advanced memories.
“An NVDIMM has both technologies—it has DRAM technology, so it looks like a DRAM with the form factor of a DIMM, but it also has a non-volatile portion, whether it’s flash or something else. If you just take a flash drive like an SSD drive, the design and the challenges of verifying that design are far greater than the actual design itself because the characteristics of flash now are so complicated that it’s causing the design challenges there. You can come up with an architecture and design that meets that architecture requirement but to really flesh it out, to verify that it is doing what it needs to be doing is extremely complex.”
Problems include garbage collection, write amplification, and deciding where to write. Depending on the state of the NAND flash at the time of a write, that single write could trigger many more.
“If you do a write to a certain address, if that address is wearing out, then you need to move it,” said Whitehead. “So you erase that block and then write that data to a different location. That causes another disturb, which moves that block. So you can actually have 8, 10, 16 writes of a block just from the host writing one block. That kind of amplification of what the host sees and the traffic that the host is doing causes an amazing number of bottlenecks and performance issues on the backend that you really can’t predict.”
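Whitehead's figures of 8, 10, or 16 block writes for a single host block correspond to what is commonly called the write amplification factor: NAND writes divided by host writes. The sketch below is a first-order illustration only. The function names and the 800 MB/s raw NAND write rate are assumptions, and dividing raw bandwidth by the factor ignores caching, parallelism, and the burstiness he describes.

```python
def write_amplification(host_blocks_written, nand_blocks_written):
    """Ratio of blocks physically written to NAND per block the host wrote."""
    if host_blocks_written <= 0:
        raise ValueError("host_blocks_written must be positive")
    return nand_blocks_written / host_blocks_written


def effective_host_write_mbs(raw_nand_write_mbs, waf):
    """Rough host-visible write rate once amplification consumes NAND bandwidth."""
    return raw_nand_write_mbs / waf


# One host block turning into 8, 10, or 16 NAND block writes, as in the quote.
for nand_writes in (8, 10, 16):
    waf = write_amplification(1, nand_writes)
    rate = effective_host_write_mbs(800, waf)  # 800 MB/s is an assumed raw rate
    print(f"WAF {waf:.0f}: about {rate:.0f} MB/s visible to the host")
```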
This requires a verification methodology to create those situations, Whitehead explained. “You don’t necessarily have to create them — usually these situations happen after six or eight hours of a drive being in operation. Once the drive fills up a few times, then you really start seeing these performance characteristics, these bottlenecks that start happening. The design itself isn’t straightforward, but you can create blocks that manage these functions, whether it’s in firmware or hardware. But to test out these blocks, and I’m just talking about an SSD right now, has become an extreme challenge in the industry.”
One approach is to gather everything it takes to write a flash memory and combine that with a DRAM so that the system looks like a DRAM. The downside is that this doubles the complexity.
“You really need the ability to see the whole system in an emulator, or just some ability to capture that system as it is and not partition it into sub-blocks, where this sub-block or subsystem we think works really well,” he said. “You have to put it in the system to really know. We don’t have time to create a sub-system that is so solid and so IP-centric and so perfect that it will work inside of any system, because there’s just not enough time. We could create that, but the time, money and expense required would be too much. Companies have to get these blocks into a system level as fast as possible so that they can find the real bugs that only relate to that configuration and ship the product. That’s the challenge. There’s an enormous amount of analog physics that are going on in the NAND flash that you have to deal with at the system level.”
Conclusion
As new applications such as deep learning, machine learning and artificial intelligence push data centers to improve processing performance to handle greater amounts of data more quickly, the industry is devising interesting new ways to approach memory in server architecture. In this quickly evolving space, ecosystems are firming up where data should reside, how systems should be architected, and where the bottlenecks will be in the future.
Memory is an important component in this world, and at this point there is no single solution that addresses all concerns.