Memory Tradeoffs Intensify in AI, Automotive Applications

Why choosing memories and architecting them into systems is becoming much more difficult.


The push to do more processing at the edge is putting a strain on memory design, use models and configurations, leading to some complex tradeoffs in designs across a variety of markets.

The problem is that these architectures are evolving alongside the new markets they serve, and it isn’t always clear how data will move across these chips, between devices, and between systems. Chip architectures are becoming more complex to handle a flood of data in automotive and AI applications, but it isn’t always clear how that data will be prioritized, forcing design teams to weigh merging and sharing memories to reduce cost against adding more and different kinds of memories to improve performance and lower power.

And underlying all of this are safety and security requirements, which vary by design as well as by market. In some cases, such as the various sensor types in cars (LiDAR, radar, and cameras), there is so much data that it has to be processed locally. The same is true in AI chips, where a 100X improvement in performance is essential.

That has led to a variety of different approaches, from near-memory computing, where small memories are scattered around a die or package next to a variety of processors, to in-memory computing where the movement of data is minimized. The goal of these approaches is to eliminate memory bottlenecks by reducing the number of loads and stores, and also to do so using less energy.
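A back-of-the-envelope model shows why cutting loads and stores matters so much. The per-operation energies below are order-of-magnitude placeholders chosen for illustration, not measured data:

```python
# Illustrative sketch: why reducing loads/stores saves energy.
# The per-operation energies are rough, order-of-magnitude
# placeholders (assumptions, not vendor data).

PJ_PER_MAC = 1.0            # assumed energy of one multiply-accumulate, in picojoules
PJ_PER_SRAM_ACCESS = 5.0    # assumed energy of one on-chip SRAM access
PJ_PER_DRAM_ACCESS = 500.0  # assumed energy of one off-chip DRAM access

def workload_energy_pj(macs: int, dram_accesses: int, sram_accesses: int) -> float:
    """Total energy for a workload, given its operation counts."""
    return (macs * PJ_PER_MAC
            + sram_accesses * PJ_PER_SRAM_ACCESS
            + dram_accesses * PJ_PER_DRAM_ACCESS)

# Same compute, different data placement: keeping operands in
# nearby SRAM instead of fetching from DRAM dominates the total.
far = workload_energy_pj(macs=1_000_000, dram_accesses=1_000_000, sram_accesses=0)
near = workload_energy_pj(macs=1_000_000, dram_accesses=10_000, sram_accesses=990_000)
print(f"DRAM-bound: {far/1e6:.1f} uJ, near-memory: {near/1e6:.1f} uJ")
```

Under these assumptions the compute itself is almost a rounding error; nearly all the energy goes to moving data, which is exactly the bottleneck near-memory and in-memory approaches attack.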

“In-memory computing may be analog, digital, or both,” said Dave Pursley, senior principal product manager in Cadence’s Digital & Signoff Group. “So while the idea of doing computation in memory may be a growing trend, what actually happens in that computation seems to vary widely.”

New memories
Despite all of this churn across markets, on-chip SRAM and off-chip DRAM remain the de facto leaders. Experts have predicted the imminent death of DRAM for years, but it remains the most cost-effective and reliable choice. DRAM offers high density, low latency, and high performance from a relatively simple architecture that uses a capacitor as the storage element. It also has nearly unlimited access endurance and relatively low power consumption.

And while DRAM density increases are slowing, newer architectures such as HBM2 add density vertically by stacking DRAM dies rather than spreading devices across DIMMs. That approach also allows DRAM to be located closer to processing elements that are scattered around a die or around a package.

SRAM, meanwhile, is expensive and limited in density, but extremely fast and proven over the years. The challenge with on-chip memories is either distributing them or sharing them, and in some cases, adding redundancy where safety is involved.

“All of these requirements have an impact on the type of memories, number of memories, and the tradeoffs between on-chip and off-chip memory, as well as the complexity of the interconnects to access each memory,” said Ryan Lim, senior principal IoT architect at Arm.

Low-power memory
One of the key issues with memory is power, and there are multiple factors that can play a role in how much power the various memory types and configurations consume. For example, moving data in and out of memory at 7nm could require more power because of the RC delay in wires. That, in turn, also can generate heat, which could disrupt the integrity of signals moving in and out of memory.
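The underlying relationship is the standard CMOS dynamic power model, P ≈ activity × C × V² × f, where longer, more resistive wires at advanced nodes show up as a larger effective switched capacitance that must be driven. A sketch with assumed values:

```python
# Back-of-the-envelope dynamic power for toggling a memory interface:
# P_dynamic ~ activity * capacitance * voltage^2 * frequency (standard CMOS model).
# All values below are assumed for illustration, not process data.

def dynamic_power_watts(activity: float, cap_farads: float,
                        volts: float, freq_hz: float) -> float:
    return activity * cap_farads * volts**2 * freq_hz

# Longer, more resistive wires are typically buffered and driven harder,
# which shows up here as higher effective switched capacitance.
short_wire = dynamic_power_watts(0.2, 50e-15, 0.8, 2e9)   # 50 fF effective load
long_wire  = dynamic_power_watts(0.2, 500e-15, 0.8, 2e9)  # 500 fF effective load
print(f"short: {short_wire*1e6:.1f} uW/bit, long: {long_wire*1e6:.1f} uW/bit")
```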

At the same time, a slower data path off-chip using high-bandwidth memory could save power and be just as fast as higher-speed GDDR6. How those decisions are made depends on a variety of factors, from average selling price of the end device to the way that memory ultimately will be used.

There also are extremely low-power versions of memories that are targeted at handheld devices, which increasingly include all edge devices with a battery.

“These memories are extremely power-efficient, operating in a range that optimizes both power usage and data rate for battery-operated devices,” said Steven Woo, fellow and distinguished inventor at Rambus. “They also can operate in multiple modes, allowing them to consume very little power to match the needs of products like phones and tablets when they are in standby, and quickly transitioning to higher-performance/higher-power modes when active processing is needed.”

Low-power memories also support multiple packaging options, allowing them to be stacked with cell phone processors to achieve tight form factors typical of mobile phones, as well as being placed onto PCBs to support higher-capacity configurations typical of tablets and other consumer devices.

Not surprisingly, developing these memories is a challenge. “When low-power memories are being designed, there’s a range of data rates that they support, and those tend to be pretty high data rates, relatively speaking, for a low-power memory,” said Woo. “This is usually driven by one or two main uses, so it’s got to be some industry that’s big and it’s got to be an industry that has enough money to enable a new DRAM. Historically, of course, that’s been the cell phone market. If you talk to the various cell phone manufacturers, they want more performance and they want better power efficiency because they want to be able to extend battery life. For other companies that want to adopt low-power memories, they say, ‘Thank goodness there’s somebody else out there that’s helping enable this so I’ll count on the supply being there and I’ll use it in the same way that they’re guaranteeing it can be used.’ These companies will tend to try and run the memories at about the same data rates.”

Usually these memories are qualified to run at a few different data rates, which are usually relatively similar. “There might be a data rate of 4.2 gigabits per second, another one that’s 3.2 and it’s the same part,” he explained. “What that does is allow the memory manufacturer, as they are manufacturing all these parts, to do what’s called binning. This happens sometimes when some of the parts don’t yield, and/or won’t run at full speed, but it gives them a place to sell them because some people will buy the lower-performing parts at a cheaper price. Binning allows that market dynamic to happen. And it tends to be the case that people will run these parts how they’re qualified, and they tend to all be in that range of performance.”
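A minimal sketch of that binning flow, using the qualified rates Woo mentions and hypothetical test results, might look like this:

```python
# Toy illustration of speed binning: each part is sorted into the
# fastest qualified data rate it passes. The rates echo the example
# above; the measured values and pass criterion are hypothetical.

QUALIFIED_RATES_GBPS = [4.2, 3.7, 3.2]  # highest bin first

def bin_part(max_stable_rate_gbps: float) -> float | None:
    """Return the fastest qualified rate this part can be sold at."""
    for rate in QUALIFIED_RATES_GBPS:
        if max_stable_rate_gbps >= rate:
            return rate
    return None  # fails every bin

for measured in (4.5, 3.9, 3.3, 2.8):
    print(measured, "->", bin_part(measured))
# 4.5 -> 4.2, 3.9 -> 3.7, 3.3 -> 3.2, 2.8 -> None
```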

AI and memory
Memory plays a big role in AI, and AI is playing a big role in almost all new technology. But there are AI chips, and there are applications of AI inside of chips, and that helps account for some of the different ways that memory is being used. For blazing-fast speed and the lowest power, the best approach is to put everything on the same die. This doesn’t always work, because space is limited, but it helps explain why AI chips for the data center and training applications are larger than many other kinds of chips that are going to be deployed in end-point devices for inference applications. The other approach is to move some of the memory off-chip and to either improve the throughput and reduce the distance to memory through design, or to limit the off-chip data flow.

In either case, the off-chip memory race largely boils down to two flavors of DRAM—GDDR and HBM.

“GDDR, from an engineering and manufacturing standpoint, looks a lot like other kinds of DRAMs like DDR and LPDDR,” said Woo. “You can put it on a standard PCB, can route to it and use similar manufacturing processes. HBM is newer and it involves stacking and silicon interposers, because HBM has lots of connections that run at a slower speed—wide and slow. Each HBM stack will have a thousand connections, so high densities of interconnect are needed. This is much more than what a PCB can handle. That’s why some companies are using silicon interposers, because you can etch these lines really close to each other. It’s a lot like on-chip connections, so you can definitely get more connections in there.”

HBM typically is adopted for the highest performance and best power efficiency, but it costs more and requires more engineering time and skill. With GDDR, there are not nearly as many interconnections between the DRAM and the processor, but they do run a lot faster, which can impact signal integrity.
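Rough arithmetic makes the wide-and-slow versus narrow-and-fast tradeoff concrete. Treating bandwidth as interface width times per-pin rate, and using typical published figures rather than any specific product’s datasheet:

```python
# Rough arithmetic behind "wide and slow" vs. "narrow and fast".
# Interface widths and per-pin rates are typical published figures;
# treat them as illustrative rather than a particular device's spec.

def bandwidth_gb_per_s(width_bits: int, gbps_per_pin: float) -> float:
    return width_bits * gbps_per_pin / 8  # bits/s -> bytes/s

hbm2_stack = bandwidth_gb_per_s(1024, 2.0)  # ~1,000 connections, slow per pin
gddr6_chip = bandwidth_gb_per_s(32, 16.0)   # few connections, fast per pin
print(f"HBM2 stack: ~{hbm2_stack:.0f} GB/s, GDDR6 device: ~{gddr6_chip:.0f} GB/s")
# HBM2 stack: ~256 GB/s, GDDR6 device: ~64 GB/s
```

The thousand-plus slow connections of an HBM stack deliver several times the bandwidth of a single fast GDDR6 device, which is why the interposer cost can be worth paying at the high end.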


Fig. 1: Tradeoffs in various types of DRAM. Source: Rambus

PPA
Power, performance and area are still the key drivers, despite the swirl of architectural changes and new technologies.

“All three are extremely important, but a lot depends on the application,” noted Farzad Zarrinfar, managing director of the IP division at Mentor, a Siemens Business. “For example, if you have a portable application, power is very important. But even power itself is split into two pieces—dynamic and static. If the application is for wireless communication and there is a lot of computation happening, dynamic power is very important. But if it is some kind of wearable design that by nature goes to sleep, wakes up and runs, and goes back to sleep, the static/leakage power is very important.”
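That split can be sketched as a duty-cycle-weighted average of active (dynamic-dominated) and sleep (leakage-dominated) power. The numbers below are illustrative assumptions, not figures from Mentor:

```python
# Sketch of why the dynamic/static split depends on the application.
# Power numbers and duty cycles are assumptions chosen for illustration.

def avg_power_mw(active_mw: float, sleep_mw: float, duty_cycle: float) -> float:
    """Duty-cycle-weighted average power."""
    return duty_cycle * active_mw + (1 - duty_cycle) * sleep_mw

# Wireless/compute-heavy: the memory is busy, so dynamic power dominates.
comms = avg_power_mw(active_mw=50.0, sleep_mw=0.5, duty_cycle=0.8)
# Wearable: mostly asleep, so static/leakage power dominates the average.
wearable = avg_power_mw(active_mw=50.0, sleep_mw=0.5, duty_cycle=0.01)
print(f"comms: {comms:.1f} mW, wearable: {wearable:.2f} mW "
      f"(leakage share: {0.99 * 0.5 / wearable:.0%})")
```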

Features such as transparent light sleep allow designers to dramatically reduce leakage. Here, memory banks that are not being used drop into a source-bias mode to cut leakage, while the banks being actively accessed keep working. In deep-sleep portions of a design, data can be retained through power-management techniques that control Vdd and minimize leakage. If the data doesn’t need to be retained, a shutdown mode reduces leakage even further.
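A toy model of those modes, with assumed relative leakage factors rather than figures from any particular memory compiler, shows how parking idle banks pays off:

```python
# Toy model of the retention/leakage modes described above.
# The relative leakage factors are assumptions, not library data.

LEAKAGE_FACTOR = {
    "active": 1.00,       # bank currently being accessed
    "light_sleep": 0.30,  # source-biased idle bank, data retained
    "deep_sleep": 0.10,   # retained at managed Vdd, periphery mostly off
    "shutdown": 0.01,     # power gated, data lost
}

def array_leakage_mw(base_mw_per_bank: float, bank_modes: list[str]) -> float:
    """Total leakage when only some banks are fully awake."""
    return sum(base_mw_per_bank * LEAKAGE_FACTOR[m] for m in bank_modes)

# 8-bank macro, one bank active, the rest in light sleep:
print(array_leakage_mw(1.0, ["active"] + ["light_sleep"] * 7))  # 3.1 vs 8.0 all-active
```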

Everything relating to power efficiency is also of utmost importance in automotive. “In electric cars the life of the battery is very important, so power consumption has been critical,” Zarrinfar said. “People want to have a really uniform characteristic going from -40°C all the way to 125°C, or in some cases even 150°C. They don’t want explosive growth in leakage at high temperature, and they want to keep it in the linear range as much as they can. Again, we must pay a lot of attention to power consumption and leakage across the full temperature range. That’s very important.”
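To see why that temperature range is so punishing, apply the common rule of thumb that subthreshold leakage roughly doubles every 10°C. The doubling interval and baseline current below are assumptions for illustration:

```python
# Sketch of leakage growth across the automotive temperature range,
# assuming leakage doubles every ~10 degrees C (a rule of thumb,
# not data for any specific process).

def leakage_ma(leak_at_25c_ma: float, temp_c: float, doubling_c: float = 10.0) -> float:
    return leak_at_25c_ma * 2 ** ((temp_c - 25.0) / doubling_c)

for t in (-40, 25, 125, 150):
    print(f"{t:>4} C: {leakage_ma(1.0, t):8.2f} mA")
# -40 C: ~0.01 mA, 25 C: 1.00 mA, 125 C: ~1024 mA, 150 C: ~5793 mA
```

Three orders of magnitude between room temperature and 150°C is exactly the “explosive growth” designers are trying to engineer out.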

No matter the application area, power remains a primary consideration. “We are seeing that with SoC designs moving to smaller geometries,” he said. “The power consumption of memories is increasing, as is the embedded memory content. Frequently we are seeing that over 50% of the die is memory these days. So people absolutely have to pay attention to the power consumption of the memory.”

 

Fig. 2: Total average die area partitioning, 1999-2023. Source: Semico Research

 

Conclusion
Despite a slew of revolutionary technologies and innovative architectures, memory remains a core piece of any design. And while there are new memory types on the horizon, such as phase-change and spin-torque, the bulk of the market remains firmly planted in what has been proven under a variety of conditions and over years of use and sometimes abuse. The biggest changes are in how existing memories are prioritized, shared, positioned within a design, and ultimately how they are used. And while that may sound like a straightforward problem to solve, it isn’t.

“Selecting the right memory solution is often the most critical decision for obtaining the optimal system performance,” said Vadhiraj Sankaranarayanan, senior technical marketing manager at Synopsys, in a recently published white paper. That’s easier said than done.



