Plumbing problems of the past continue to haunt chipmakers as gap grows between processor and memory speed.
If humans ever do create a genuinely self-aware artificial intelligence, it may well exhibit the frustration of waiting for data to arrive.
The access bandwidth of DRAM-based computer memory has improved 20x over the past two decades, and capacity has increased 128x over the same period. But latency has improved only 1.3x, according to Kevin Chang, a researcher at Carnegie Mellon University, who has proposed a new data pathway to address the problem.
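One way to see how lopsided those three figures are is to convert each into a compound annual rate. The sketch below assumes "two decades" means exactly 20 years; the rates it prints are derived only from the 20x, 128x and 1.3x figures quoted above, not from any independent data.

```python
# Compound annual improvement implied by the DRAM figures quoted above.
# Assumption: "two decades" is treated as exactly 20 years.
YEARS = 20

improvement = {
    "bandwidth": 20.0,  # 20x over two decades
    "capacity": 128.0,  # 128x over the same period
    "latency": 1.3,     # only 1.3x
}


def annual_rate(total_factor: float, years: int) -> float:
    """Constant yearly improvement that compounds to total_factor over `years` years."""
    return total_factor ** (1.0 / years) - 1.0


rates = {name: annual_rate(factor, YEARS) for name, factor in improvement.items()}

for name, rate in rates.items():
    print(f"{name:>9}: {rate:.1%} per year")

# How far bandwidth has pulled ahead of latency over the whole period.
gap = improvement["bandwidth"] / improvement["latency"]
print(f"bandwidth vs. latency: {gap:.1f}x divergence")
```

Roughly 16% a year for bandwidth and about 27% for capacity, against about 1.3% for latency, is the gap Chang's proposal is aimed at.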
Modern computers, especially data-center servers that skew heavily toward in-memory databases, data-intensive analytics and, increasingly, machine-learning and deep-neural-network training, depend on large amounts of high-speed, high-capacity memory to keep the wheels turning. Despite years of effort by researchers looking for better, faster alternatives, DRAM remains the near-universal choice when performance is the priority.
That helps explain the surge in DRAM sales this year, even though limited supply helped drive average selling prices up 74%, according to IC Insights. Skyrocketing prices pushed DRAM revenue to a record $72 billion and drove total IC market revenue up 22%. Without the extra boost from DRAM prices, which rose 111% in 12 months, the overall IC market would have grown only 9% in 2017, compared with 4% in 2016, the IC Insights report said.
Those are impressive numbers for DRAM, a well-worn technology that many people want to replace because it can’t keep pace with the processor. There is a long list of current and pending alternatives, but experts seem to consider them half measures that cannot match DRAM’s price/performance. DRAM also leaves a performance gap with CPUs, even with planned improvements in DRAM performance and new DRAM architectures such as HBM2 and the Hybrid Memory Cube.
DDR5, the next-gen DRAM specification from JEDEC, will have twice the density and twice the bandwidth of DDR4, which may speed things up a bit, according to Steven Woo, vice president of systems and solutions and distinguished inventor in Rambus’ Office of the CTO.
That will matter for performance-intensive, time-sensitive FinTech applications and for other high-end analytics, HPC and supercomputing workloads, especially when combined with specialized accelerators.
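Peak DRAM bandwidth is roughly the data rate multiplied by the width of the data bus. The sketch below assumes DDR4-3200 and DDR5-6400 as representative speed grades and a 64-bit data path per DIMM, which DDR5 splits into two 32-bit subchannels. Real sustained bandwidth is lower once refresh, bank conflicts and command overhead are counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimmConfig:
    name: str
    mega_transfers_per_s: int  # data rate in MT/s
    data_bits: int             # data width per DIMM, ECC bits excluded

    def peak_gb_per_s(self) -> float:
        # Peak bandwidth = transfers per second x bytes moved per transfer.
        bytes_per_transfer = self.data_bits / 8
        return self.mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


# Assumed representative speed grades. DDR5 splits the 64 data bits into two
# independent 32-bit subchannels, so total width per DIMM stays the same.
ddr4 = DimmConfig("DDR4-3200", 3200, 64)
ddr5 = DimmConfig("DDR5-6400", 6400, 64)

for dimm in (ddr4, ddr5):
    print(f"{dimm.name}: {dimm.peak_gb_per_s():.1f} GB/s peak per DIMM")
print(f"DDR5/DDR4 ratio: {ddr5.peak_gb_per_s() / ddr4.peak_gb_per_s():.1f}x")
```

In this sketch the doubling comes entirely from the data rate, which matches the "twice the bandwidth" figure Woo cites; it says nothing about latency, the number that has barely moved.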
“There is clearly a need for more memory bandwidth and more memory capacity, but DDR5 won’t be enough by itself and it’s not clear which of several other approaches may take off,” Woo said. “We’re already seeing a lot of processing cycles moving away from traditional x86 processors – more mining of cryptocurrencies and training neural networks, moving toward GPU and specialized silicon, or even morphing of architectures to shift some of the processing closer to the storage in the datacenter or as edge or fog computing.”
Fig. 1: Introduction of new standards. Source: Cadence
GPUs are the clear favorite for training neural networks in machine-learning applications, but chip- and systems makers also are experimenting with slightly out-of-the-box options such as GDDR5, a synchronous graphics RAM developed for game consoles, graphics cards and HPC, according to Chris Rowen, CEO of BabbleLabs and a strategic advisor for Stanford University’s System X. Nvidia is using it for HPC, he said.
Fig. 2: Where memory fits in the hierarchy of chips. Source: Rambus
HBM2, manufactured by SK Hynix and Samsung, puts memory closer to the processor than GDDR5 does by stacking several high-speed DRAM dies on top of a base logic layer, all mounted on an interposer that provides high-speed data links to the processor. HBM2 is a key element of 2.5D packaging, where raw speed is essential. It is a JEDEC-standard rival to the Hybrid Memory Cube, developed by IBM and Micron, which uses through-silicon vias to connect its memory layers to a base logic layer.
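Putting memory on an interposer next to the processor is what makes a very wide interface practical, and that width is where HBM2's bandwidth comes from. The sketch below assumes a 1,024-bit HBM2 stack at 2.0 Gb/s per pin and a 32-bit GDDR5 chip at 8 Gb/s per pin, and counts how many devices a hypothetical 1,000 GB/s accelerator would need. These are peak figures only.

```python
import math

# Peak per-device bandwidth = interface width (bits) x per-pin rate (Gb/s) / 8.
# Assumed figures: HBM2 stack at 1024 bits and 2.0 Gb/s per pin;
# GDDR5 chip at 32 bits and 8 Gb/s per pin.
DEVICES = {
    "HBM2 stack": (1024, 2.0),
    "GDDR5 chip": (32, 8.0),
}


def device_gb_per_s(width_bits: int, gbps_per_pin: float) -> float:
    return width_bits * gbps_per_pin / 8


def devices_needed(target_gb_per_s: float, width_bits: int, gbps_per_pin: float) -> int:
    # Round up: a fraction of a device still means buying a whole one.
    return math.ceil(target_gb_per_s / device_gb_per_s(width_bits, gbps_per_pin))


TARGET = 1000.0  # hypothetical accelerator bandwidth budget, GB/s

for name, (width, rate) in DEVICES.items():
    bw = device_gb_per_s(width, rate)
    n = devices_needed(TARGET, width, rate)
    print(f"{name}: {bw:.0f} GB/s each, {n} needed for {TARGET:.0f} GB/s")
```

One HBM2 stack matches about eight GDDR5 chips here while running each pin four times slower; a 1,024-bit link is impractical to route across a circuit board, which is why the stack sits on the interposer.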
Optical connections using silicon photonics also could speed things along. So far, silicon photonics has been used mostly between server racks and storage inside data centers, and inside high-speed networking devices. Industry experts expect it to migrate closer to the processor over the next few years, particularly as the packaging is fully vetted and design flows expand to include this technology. Optical links generate very little heat and are extremely fast, but the light still has to be converted into electrical signals before data can be stored or processed.
There also are new interconnect standards such as Gen-Z, CCIX and OpenCAPI, as well as new memory types such as ReRAM, Intel’s phase-change 3D XPoint and magnetoresistive MRAM.
NVDIMM is slower than ordinary DRAM but offers much higher capacity, and it is made non-volatile by adding a battery or super-capacitor. That lets NVDIMMs cache far more data than ordinary DRAM while using less power, and guarantees they won’t lose transaction data if the power goes out. Micron and Rambus are among the chipmakers backing NVDIMM, whose sales could grow from $72.6 million in 2017 to $184 million by 2025, according to an August report from Transparency Market Research.
The range of options may be confusing, but it will make it easier to tailor memory performance to machine learning, large in-memory databases or video streaming, each of which has a different set of bottlenecks, Rowen said. “There are mainstream choices you can make to push the bandwidth curve – DDR3, DDR4, DDR5, but you can try others as well to match memory bandwidth with what you’re trying to do.”
The whole problem may be easier than it looks for people willing to learn to write code that directly controls NAND memory, rather than going through the layers of protocol and interface that make NAND look like a hard drive and conceal how hard it is to write data to it, Rowen said. “With the low cost, capacity and availability, I see a lot of opportunity to make flash storage subsume more and more of the storage hierarchy.”
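To make "how hard it is to write data" concrete: a NAND page can be programmed only once, and only a whole block can be erased, so a flash controller normally hides a translation layer that writes every update to a fresh page and periodically reclaims stale ones. The following is a toy, page-mapped sketch of that idea. `NandBlock`, `ToyFTL` and the tiny `PAGES_PER_BLOCK` value are hypothetical illustrations, not any vendor's interface, and real translation layers add wear leveling, spare blocks and power-loss protection.

```python
from typing import Dict, List, Optional, Tuple

PAGES_PER_BLOCK = 4  # hypothetical; real NAND blocks hold hundreds of pages


class NandBlock:
    """Toy NAND block: a page is programmed once; only a whole-block erase frees it."""

    def __init__(self) -> None:
        self.pages: List[Optional[bytes]] = [None] * PAGES_PER_BLOCK
        self.erase_count = 0

    def program(self, page: int, data: bytes) -> None:
        if self.pages[page] is not None:
            raise RuntimeError("page already programmed; erase the whole block first")
        self.pages[page] = data

    def erase(self) -> None:
        self.pages = [None] * PAGES_PER_BLOCK
        self.erase_count += 1


class ToyFTL:
    """Hypothetical page-mapped, log-structured translation layer."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [NandBlock() for _ in range(num_blocks)]
        self.mapping: Dict[int, Tuple[int, int]] = {}  # logical page -> (block, page)
        self.active = 0                                # block currently being filled
        self.next_page = 0                             # next free page in that block
        self.fresh = list(range(1, num_blocks))        # blocks never written yet

    def write(self, lpn: int, data: bytes) -> None:
        # Never overwrite in place: program a fresh page, then remap.
        if self.next_page == PAGES_PER_BLOCK:
            self._open_new_block()
        self.blocks[self.active].program(self.next_page, data)
        self.mapping[lpn] = (self.active, self.next_page)  # old copy is now stale
        self.next_page += 1

    def read(self, lpn: int) -> bytes:
        block, page = self.mapping[lpn]
        return self.blocks[block].pages[page]

    def _live_pages(self, block: int) -> Dict[int, bytes]:
        return {lpn: self.blocks[b].pages[p]
                for lpn, (b, p) in self.mapping.items() if b == block}

    def _open_new_block(self) -> None:
        if self.fresh:
            self.active, self.next_page = self.fresh.pop(), 0
            return
        # Garbage collection: pick the block with the fewest live pages, save
        # its live data, erase it, then write the live data back. A real FTL
        # copies into a spare block instead, so a power loss cannot lose data.
        victim = min(range(len(self.blocks)), key=lambda b: len(self._live_pages(b)))
        live = self._live_pages(victim)
        if len(live) == PAGES_PER_BLOCK:
            raise RuntimeError("device full: no stale pages to reclaim")
        self.blocks[victim].erase()
        self.active, self.next_page = victim, 0
        for lpn, data in live.items():
            self.blocks[victim].program(self.next_page, data)
            self.mapping[lpn] = (victim, self.next_page)
            self.next_page += 1


# Usage: rewrite one logical page 20 times on a two-block toy device.
ftl = ToyFTL(num_blocks=2)
for i in range(20):
    ftl.write(0, f"v{i}".encode())
print(ftl.read(0))                          # b'v19'
print([b.erase_count for b in ftl.blocks])  # [2, 1]
```

Twenty rewrites of a single logical page cost three block erases even in this tiny model. Code that manages NAND directly can plan around that cost instead of discovering it behind a disk-style interface.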
DRAM is chill
Every memory architecture has its own advantages, but all of them share one disadvantage with every other integrated circuit – they generate heat. If you could reliably pull out the heat, you could pack memory, processors and graphics co-processors much more tightly, saving space that could be used for more servers and improving performance by reducing the lag between memory and the other components of the system, according to Craig Hampel, chief scientist for Rambus’ Memory and Interface Division.
Liquid cooling – bathing components in dielectric mineral oil – cut cooling costs for the HPC cluster at Hong Kong Bitcoin miner Asicminer by 97% and its space requirements by 90%, according to a 2014 article in IEEE Spectrum.
Rambus has been working with Microsoft since 2015 on memory for quantum computing as part of Microsoft’s effort to build a topological quantum computer. Because the quantum processor has to operate at cryogenic temperatures – below -292°F/-180°C, or 93.15K – the DRAM Rambus was testing for the project had to operate there as well. By the time Rambus expanded the program in April, Hampel said, the company was already convinced that serious performance benefits could come out of the cold.
Fig. 3: Cryogenic computing and storage. Source: Rambus
When CMOS gets cold enough, for example, data leakage from a CMOS chip all but stops, making it almost non-volatile. Performance increases to the point that memory can catch up to the speed of processors, eliminating one of the most stubborn bottlenecks in the IC industry. At very cold temperatures, between 4K and 7K, wires effectively superconduct, allowing the chip to communicate over long distances using very little energy. (See related video.)
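A common first-order way to reason about why leakage collapses in the cold is to treat DRAM retention time as thermally activated, scaling with exp(Ea/kT). The sketch below assumes a single leakage mechanism with an activation energy of 0.6 eV and an 85°C reference point. Both values are assumptions for illustration, and at cryogenic temperatures other leakage mechanisms take over, so the enormous factors it produces show only the direction of the effect, not a measured result.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K
ACTIVATION_EV = 0.6                  # assumed activation energy for cell leakage
REFERENCE_K = 358.15                 # assumed reference temperature: 85 degrees C


def retention_factor(temp_k: float) -> float:
    """How many times longer a cell holds its charge at temp_k than at the
    reference temperature, under a single-mechanism Arrhenius model (an
    illustrative assumption, not a device measurement)."""
    exponent = (ACTIVATION_EV / BOLTZMANN_EV_PER_K) * (1 / temp_k - 1 / REFERENCE_K)
    return math.exp(exponent)


for label, temp in [("room temperature (300 K)", 300.0),
                    ("-180°C (93.15 K)", 93.15),
                    ("liquid nitrogen (77 K)", 77.0)]:
    print(f"{label:>25}: {retention_factor(temp):.3g}x longer retention")
```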
Cryogenic systems have an added benefit. They can pull heat out of a stack of memory chips much faster than air conditioning can, allowing stacked, cubed or otherwise assembled processors to be packed far more densely and still cooperate efficiently, Hampel said. “Taking out the heat lets you compress the size of the server rack as much as 70%, which means the density per cubic foot of data center improves. That makes them easier to maintain and easier to place in areas you couldn’t reach before.”
More importantly, if the efficiency gains seen at the processor level carry over to other areas of the data center, cryogenic systems could reduce the need for new data centers by making existing ones more efficient in terms of both cost and compute power.
And memory doesn’t have to be extremely cold to get most of the benefit; cooling it to about 77 Kelvin (-321°F/-196°C), the temperature of liquid nitrogen, is enough.
“Liquid nitrogen is cheap – tens of cents per gallon – and the cost curve doesn’t get really steep until you approach supercooling, around 4 Kelvin,” Hampel said. “Down to about 50 Kelvin, it’s not that expensive.”
Processor proximity
Supercooling may extend the life of DRAM in the datacenter, but none of the chips or standards on offer will be able to stand up to the flow of data as the industry moves from hyperscale to zettascale, according to Jeroen Dorgelo, director of strategy for Marvell’s Storage Group. DRAM is fast but power-hungry, he said. NAND isn’t fast enough to scale, and most of the cutting-edge memories – 3D XPoint, MRAM, ReRAM – also are unable to scale sufficiently.
What most datacenters haven’t dealt with, however, is the need to become far more distributed than they are now. Spreading out would reduce the amount of data that has to be sent long distances to be processed, while leaving most of the heavy lifting in the datacenter, according to Yaniv Kopelman, networking CTO of the Connectivity, Storage and Infrastructure Business Unit at Marvell.
The flood of data from social media, the IoT and nearly everywhere else is pushing datacenters to spread out – to build two or three super-scale datacenters across the country rather than one hyperscale monster in the center of it, said Shane Rau, datacenter hardware analyst at IDC.
“It’s a different scale, but the question is still about latency,” Rau said. “If I have a datacenter nearer to me on the edge, so to speak, I don’t have to move all my data as far, and I can have some of the processing done a little on my laptop, more in the local datacenter, so it’s a little pre-digested by the time it gets where it’s going. A lot of people are talking about putting processing power where the storage is to balance out the bottlenecks on a device-by-device basis. Right now the issue at scale seems more about having datacenters at the edge doing some of the work, part way between where the data originates and the place it is ultimately going.”
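Rau’s point that distance is still a latency question can be put in rough numbers. Light in optical fiber travels at roughly 200,000 km/s, so propagation alone sets a floor on round-trip time before any routing, queuing or processing is added. The distances below are hypothetical, chosen only to compare a nearby edge site with a distant central facility.

```python
FIBER_SPEED_M_PER_S = 2.0e8  # assumed: light in glass fiber, roughly c / 1.5


def min_round_trip_ms(distance_km: float) -> float:
    """Propagation-only round trip over fiber, in milliseconds.
    Ignores routing detours, queuing and processing, so real latency is higher."""
    one_way_s = distance_km * 1000 / FIBER_SPEED_M_PER_S
    return 2 * one_way_s * 1000


# Hypothetical distances from a user to the nearest facility.
SITES = [("edge datacenter", 50), ("regional datacenter", 500), ("single national hub", 2000)]

for label, km in SITES:
    print(f"{label:>20}: {min_round_trip_ms(km):5.1f} ms minimum round trip")
```

No amount of faster memory inside the hub recovers the roughly 20 ms the 2,000 km trip costs in this sketch, which is the argument for pre-digesting data closer to where it originates.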
Related Stories
Data Centers Turn To New Memories
DDR5, NVDIMMs, SGRAM, 3D XPoint add more options, but solutions may be a mix and much more complex.
Tech Talk: Cryogenic DRAM
What happens when you use DRAM at extremely low temperatures?
Memory Test Challenges, Opportunities
Business is booming in advanced memory chips, but it’s getting tougher to test them.
HBM Upstages DDR In Bandwidth, Power
Design challenges and tool flow gaps emerge, but so do real-world PPA metrics.
What’s Next For DRAM? (Feb 2016)
As DRAM scaling runs out of steam, vendors begin looking at alternative packaging and new memory types and architectures.
I do not know why so many act like there is nothing out there but server.
Server is a niche, it’s not representative, it’s not very demanding, it’s insane to focus on it so much. Has high margins and that’s why so many pretend that it must define the roadmap for the entire industry.
Server is low efficiency, very high cost, has minimal thermal and volumetric constraints and this kind of article should never focus on server.
Server is what, maybe 30% of the DRAM market in revenue and less in bits – and remember that the bulk of the server units are 1 and 2P lower end machines
And ofc server is in a bubble right now as new areas of compute are just starting to migrate towards the edge.
The real challenges are not server, you got glasses as the next form factor for personal computing, then there is IoT, robots and so on. Server will need to follow as the solutions adopted in these segments must be 1000x more efficient.
As for DRAM, perf and power metrics aside, cost doesn’t scale well anymore, DRAM has to be replaced as it’s not going to be viable anymore.
Long term, processing and memory/storage must be integrated and for that, everything has to go 3D