Cost makes it difficult to supplant DRAM and NAND, but the number of options around those memories continues to grow.
System architects increasingly are developing custom memory architectures based upon specific use cases, adding to the complexity of the design process even though the basic memory building blocks have been around for more than half a century.
The number of tradeoffs has skyrocketed along with the volume of data. Memory bandwidth is now a gating factor for applications, and traditional memory types cannot keep pace with improvements in raw compute capability. In fact, processors and memory are, in many ways, a study in contrasts. But there are ways around that problem, and the onus is on the system architect to figure out the best solution.
“Each new generation of processor wants to compute more quickly, and to be more power-efficient,” said Steven Woo, fellow and distinguished inventor at Rambus. “Two ways to do that are to have more processing engines, and to run them at higher speeds. Processors need transistors to switch very quickly, and they need lots of them to implement many processing engines. Moore’s Law has given us reliable increases in the number of transistors, which enables lots of compute logic.”
In the highest-performing processors — those used in servers and AI/ML applications — the volume of data moving through those chips requires transistors to switch quickly and data to move back and forth to memory fast enough to keep those transistors busy. But the fast-switching transistors those systems depend on also are leaky, which in turn generates heat and impacts performance.
Memory is a key piece of this equation, and each new generation of DRAM needs to provide higher capacity and more bandwidth to feed the processing engines. Memories need to become more power-efficient, as well.
“DRAMs are less expensive than high-performance processors, which impacts design and manufacturing methods that can be used to make them,” said Woo. “The building blocks in a DRAM are designed to retain charge, which represents the 1s and 0s of the data being stored. Unlike processors, which use fast-switching transistors that also leak, DRAMs can’t use the same design approaches because they can’t afford to have charge leakage or data will be lost. DRAM cores also have limitations in how much bandwidth they can economically provide. The demand for higher bandwidths drives a desire for higher DRAM core frequencies, but it’s become impractical to hit the desired core frequencies in the highest-bandwidth DRAMs today. This has caused changes in how DRAM cores are organized (bank grouping is one example), and in the addition of new timing constraints that dictate the conditions under which full bandwidth can be achieved.”
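To make the bank-grouping point concrete, the following is a minimal, illustrative sketch rather than any vendor’s actual controller logic, and the timing values are placeholders. In bank-grouped DRAMs such as DDR4 and DDR5, back-to-back column accesses to the same bank group must be spaced farther apart (tCCD_L) than accesses that alternate between bank groups (tCCD_S), so a request stream that interleaves across bank groups sustains more of the peak bandwidth.

# Illustrative only: why interleaving across bank groups sustains more bandwidth.
# tCCD_S / tCCD_L are placeholder cycle counts, not taken from any datasheet.
TCCD_S = 4  # minimum spacing between column commands to different bank groups
TCCD_L = 8  # minimum spacing between column commands to the same bank group

def total_cycles(bank_group_sequence):
    """Cycles needed to issue one column command per request in this order."""
    cycles, prev = 0, None
    for bg in bank_group_sequence:
        if prev is None:
            cycles += 1
        else:
            cycles += TCCD_L if bg == prev else TCCD_S
        prev = bg
    return cycles

same_group = [0] * 8                    # all eight requests hit one bank group
interleaved = [0, 1, 2, 3, 0, 1, 2, 3]  # same requests spread across four groups

print(total_cycles(same_group))   # 57 cycles: every gap pays tCCD_L
print(total_cycles(interleaved))  # 29 cycles: every gap pays only tCCD_S

The same eight requests take nearly twice as long when they all land in one bank group, which is exactly the kind of constraint that now has to be reflected in how accesses are scheduled.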
Fig. 1: HBM physical layer showing high-speed interconnect over multiple channels. Source: Rambus
It’s not just about hardware
That’s part of the picture. Software has a big impact on memory choices and power efficiency as well.
“Take a chip that has 4 megabits of on-chip memory,” said Paul Hill, director of product marketing at Adesto. “Software gets bigger. It rarely gets smaller, so the software engineer is now struggling to get his application, driven by his customer requirements, into that 4 megabits of on-chip memory. Now you think about adding in over-the-air updates. What happens when the custom application code requires 8 megabits or 16 megabits? This is why they still have to go out and use external memory to supplement the on-chip memory.”
Typically code that is downloaded in over-the-air updates is used to patch existing software, but rarely is any of it deleted. This can create software bloat, which is why systems tend to slow over time, and it has a direct impact on how much memory is required.
“Any MCU using over-the-air updates needs memory for the primary application, but to do an over-the-air update, you now need to download another image, which doubles the amount of memory needed,” Hill said. “They probably also want to retain a factory image, which is the recovery image in case something goes wrong. That’s three times the original size. So now your image needs 12 megabits or 16 megabits of code space. External memory is still very much part of those edge devices that we are proliferating.”

Power is another consideration. “The more amperage the system draws, the lower the voltage drops, and that starts to cause problems,” Hill said. “This is why all of our systems now operate over a wider voltage range than in the past. Irrespective of what voltage your system starts at, it still works consistently over a 1.65 to 3.6 volt range. We’re now looking at 1.2 volt capabilities because the adoption of 1.2 volts is starting to make sense, since you consume 40% less power than you would at, say, 1.8 volts. And for those applications that are really energy-conscious, 1.2 volts does make sense, even if the number of host applications that can work at 1.2 volts is still building.”
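A back-of-the-envelope sketch of that sizing arithmetic, using hypothetical numbers rather than Adesto’s, looks like this: the device holds the running application, a staged over-the-air image and, optionally, a factory recovery image, and the total is then rounded up to the next standard flash density.

# Hypothetical sizing example for an MCU with over-the-air updates (densities in megabits).
STANDARD_DENSITIES_MBIT = [4, 8, 16, 32, 64]

def required_flash_mbit(image_mbit, keep_factory_image=True):
    copies = 3 if keep_factory_image else 2  # running app + staged update (+ recovery)
    needed = image_mbit * copies
    # Round up to the next standard flash density that fits.
    return next(d for d in STANDARD_DENSITIES_MBIT if d >= needed)

print(required_flash_mbit(4))   # 4 Mb image -> 12 Mb needed -> 16 Mb part
print(required_flash_mbit(8))   # 8 Mb image -> 24 Mb needed -> 32 Mb part

With a 4-megabit application, three copies already call for 12 megabits, which in practice means stepping up to a 16-megabit external flash device, the kind of jump that keeps external memory in these edge designs.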
These challenges only grow in intensity when leading-edge applications such as AI, machine learning and deep learning enter the picture. Each of these relies on moving massive amounts of data through a chip, and on moving data rapidly back and forth between processing elements such as CPUs, GPUs, DSPs and FPGAs/eFPGAs, and both SRAM and DRAM, particularly in the training phase.
“There is an increased need to analyze data in real time for applications such as self-driving cars in order to, for instance, update route information based on traffic, or make real-time decisions in case of sudden obstacles,” said Munish Goyal, senior product engineering manager at Mentor, a Siemens Business. “Most of the upcoming applications that are based on AI require computers to process stored and incoming data at much higher speeds, and HBM (high bandwidth memory) has emerged as one of the technologies that promises to address the demands of these data-intensive applications.”
HBM enables a boost in DRAM bandwidth by fundamentally changing the way the DRAMs are configured. HBM is a high-speed system-in-package (SiP) technology that incorporates stacks of vertically interconnected DRAM chips and a wide interface, enabling more storage capacity and data bandwidth than memory products using conventional wire bond-based packages.
That certainly helps with performance by opening more channels for data. But it doesn’t address a fundamental problem in memory, which is that it cannot keep up with raw compute capability due to semiconductor physics.
“The guts of the DRAM bit-cell — a capacitor controlled by a switch and feeding into an amplifier — is fundamentally an analog device and not a digital one,” said Marc Greenberg, product marketing group director for Cadence’s IP Group. “As such, a shrinking process geometry allows the circuit to potentially get smaller, but that doesn’t make it significantly faster. DRAM has been in production for almost 50 years at this point, and the fundamental architecture of the core of the DRAM device has remained unchanged during all that time. Similarly, NAND Flash has been around for more than 30 years and has grown greatly in capacity, but not nearly as much in speed.”
Together, the market for DRAM and NAND is more than $100 billion, and there are plenty of companies, universities and research institutions looking at alternative devices to replace either or both of these technologies. The challenge for anyone looking to replace DRAM or NAND is that there are decades and billions of dollars of investment in each of these technologies, not to mention an entire global semiconductor industry that has grown up using them.
“Anything non-DDR, LPDDR, GDDR or HBM is not mainstream,” said Graham Allan, senior product marketing manager for DDR PHYs at Synopsys. “It’s very difficult to produce anything that can compete with DRAM. This is why you’re starting to see DDR-like interfaces on new memories. If it’s not a large customer driving it, you want some way to help offset the cost.”
The usefulness of a memory device is measured in the areas of capacity, speed/bandwidth, latency, ease of use, reliability, cost, power, retention/lifetime and manufacturability. Any new device needs to be significantly better in at least one area than what exists today to gain traction, without giving up much in any of the other areas. So despite the massive marketing push for the Hybrid Memory Cube (HMC) — memory stacked on logic and connected by through-silicon vias — the technology achieved success only in narrow market slices.
“HMC reduced the latency of DRAM, which made it popular with networking companies,” said Allan. “But it has been hard to find opportunities beyond very niche markets because there is no way to compete on cost.”
New twists ahead
Conventional DRAM isn’t standing still, though. “The speed increases that we’ve seen in DRAM have been achieved through changing the physical interface (PHY) between the SoC or CPU and the DRAM,” said Cadence’s Greenberg. “The increases come from changing the architecture of the DRAM device, as opposed to changing the core of it, and parallelizing access into the DRAM core. Whether we increase the speed of the interface from DDR4 to DDR5, or LPDDR4 to LPDDR5, or adopt new signaling with an interface like GDDR6, or take a 2.5D approach with a technology like HBM2E, these speed increases come from more parallel access into the DRAM array, not from changing the DRAM array itself.”
Changing the PHY basically involves changing the memory class, such as moving from DDR to LPDDR to GDDR6 to HBM. “Each time you make one of those changes, you get a higher bandwidth interface,” he said. “But the fundamental guts of the memory don’t change that much. The guts of what’s happening in 80% or 90% of the die is not really fundamentally that different between those different memory classes. It’s just how you interface to it that changes.”
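A rough calculation illustrates why changing the interface, rather than the core, is what moves the bandwidth needle between these classes. The widths and per-pin rates below are representative, publicly quoted figures, not the specifications of any particular device.

# Peak bandwidth is roughly interface width (bits) x per-pin data rate (Gb/s) / 8.
# Representative, approximate figures; real devices and configurations vary.
def peak_bandwidth_gb_per_s(width_bits, gbps_per_pin):
    return width_bits * gbps_per_pin / 8.0

print(peak_bandwidth_gb_per_s(64, 3.2))    # DDR4-3200, 64-bit channel:   ~25.6 GB/s
print(peak_bandwidth_gb_per_s(32, 6.4))    # LPDDR5 x32 at 6.4 Gb/s:      ~25.6 GB/s
print(peak_bandwidth_gb_per_s(32, 16.0))   # one GDDR6 device at 16 Gb/s: ~64 GB/s
print(peak_bandwidth_gb_per_s(1024, 3.2))  # one HBM2E stack, 1,024 bits: ~410 GB/s

In each case it is the per-pin data rate or the interface width that changes; the DRAM arrays behind those interfaces are built from essentially the same bit cells.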
Changing the DRAM array itself, which is one of the approaches people are now discussing, is a problem with many dimensions, starting with capacity and cost.
“You can make faster DRAM arrays,” Greenberg said. “They just tend to cost more than what people are really willing to pay for them. You also can change out the memory completely. If you want a memory that’s really fast, SRAM is still there. It’s very fast, but nobody wants to pay for that, and you can’t put much capacity on an SRAM. And looking at novel memory architectures, which seem to be coming out relatively frequently now, we’ve seen what are often called novel, non-volatile memory products. While these are interesting concepts, so far none has been able to push DRAM or NAND flash off of their respective perches. That could potentially change. There’s promise in carbon nanotubes and in some of these new technologies. It just hasn’t been realized on a global industrial manufacturing scale yet.”
Much of this depends upon the end application, because there are multiple facets that need to be optimized, including bandwidth, capacity, power and cost. Those, in turn, affect latency, the number of bits that can be accessed on a bus, and the overall system power budget.
“You’re obviously trying to optimize all of those things simultaneously,” Greenberg said. “The choices are going to be different, depending on which one of those things is most important to you. There are further things that differentiate DRAM and flash, for example, whether the memory is volatile and the amount of latency incurred. Those also play into the equation when comparing DRAM, flash, and the emerging novel non-volatile memory architectures.”
The answers likely will be very different for a server plugged into a wall and a mobile device that runs on a battery. And the choices differ depending on whether the chip is built for AI training, or for inferencing in a data center, in a car, or in some other edge application.
One of the rules of thumb about new memory architectures is that it takes 20 years to bring a new memory architecture to market. “That’s been borne out quite a few times in the past, and really stems from the fact that making something in a lab is very different from making it in a production environment where you’ve got to satisfy a global need of billions of units a year, and being able to do it reliably, manufacturably, and at the appropriate cost point,” Greenberg noted. “This is where some of the non-traditional memory technologies stumble. And that’s really the point where, if we look at the 30 or 50 years of investment in some of these other memory technologies, a lot of that investment has been in the manufacturability of those devices. We are clearly standing on the shoulders of giants, but are we standing behind the giant where we can’t really see forward because of all the investment that’s been made in the existing technologies? It’s an interesting dilemma. If you could entirely break the mold and go to new memory technology, the market might accept it. But that new memory technology would have to be substantially better than the existing technologies in at least one or two of the dimensions, and then at least not worse in any of the other dimensions. So it’s a tall order.”
Conclusion
Memory tradeoffs are becoming more complex, but so far traditional memory types are proving difficult to topple. At the same time, optimizing memory for some of these new architectures is becoming much more complicated.
The challenge created by timing and access constraints in DRAM is that processors seeking the highest performance now must be aware of their own data access patterns and of the DRAM’s architecture in order to take full advantage of its potential and get the most bandwidth out of it. Understanding and adhering to the growing set of constraints is critical for processor architects going forward, Rambus’ Woo said.
—Ed Sperling contributed to this report.
Related Stories
Tricky Tradeoffs For LPDDR5
New memory offers better performance, but that still doesn’t make the choice simple.
Solving The Memory Bottleneck
Moving large amounts of data around a system is no longer the path to success. It is too slow and consumes too much power. It is time to flip the equation.
Will In-Memory Processing Work?
Changes that sidestep von Neumann architecture could be key to low-power ML hardware.
Using Memory Differently To Boost Speed
Getting data in and out of memory faster is adding some unexpected challenges.
In-Memory Computing Challenges Come Into Focus
Researchers digging into ways around the von Neumann bottleneck.
GDDR6 Drilldown: Applications, Tradeoffs And Specs
How GDDR6 compares to other memory types and where it works best.
Machine Learning Inferencing At The Edge
How designing ML chips differs from other types of processors.