How To Choose The Right Memory

Different types and approaches can have a big impact on cost, power, bandwidth and latency.

When it comes to designing memory, there is no such thing as one size fits all. And given the long list of memory types and usage scenarios, system architects must be absolutely clear on the system requirements for their application.

A first decision is whether or not to put the memory on the logic die as part of the SoC, or keep it as off-chip memory.

“The tradeoff between latency and throughput is critical, and the cost of power is monstrous,” said Patrick Soheili, vice president of business and corporate development at eSilicon. “Every time you move from one plane to another, it’s a factor of 100X. That applies to on-chip versus off-chip memory, as well. If you can connect it all together on one chip, that’s always the best.”

For that reason and others, the first choice of chipmakers is to put as much RAM or flash as possible on the logic die. But in most cases that isn’t enough. Even microcontrollers, which in the past were defined as processing elements with on-chip memory, have begun adding off-chip supplemental memory for higher-end applications.

“When the size of memory on the logic die exceeds what can be produced economically, then off-chip memory is the obvious choice,” observed Marc Greenberg, group director of product marketing for DDR, HBM, flash/storage and MIPI IP at Cadence. “There’s a vibrant array of low-cost, low-power memories based on the SPI (Serial Peripheral Interface) bus of several types from several manufacturers, including memories with automotive speed grades. The SPI bus is speeding up and adding width.”

In fact, Cadence is seeing a lot of demand for 200MHz Octal-SPI IP interfaces — both controllers and PHYs.

To understand how designers approach memory power in automotive and other applications, it helps to step back and frame the problem in terms of overall bandwidth, speed and power consumption, noted Navraj Nandra, senior director of marketing for the DesignWare Analog and MSIP Solutions Group at Synopsys. “What’s happening in terms of the application requirements is that people are pushing microprocessor/CPU performance, and that is requiring memory capacity and memory bandwidth. But you can’t have both at the same time, even though that’s what the applications demand.”

The critical tradeoffs in memory are bandwidth, latency, power consumption, capacity and cost. Engineers sometimes forget about the cost part, but it drives a lot of decision points of implementation, Nandra said.

The capacity and the speed of the memories must be weighed against each other, and each type of memory has different tradeoffs. For example, if the application is driven by speed, or by bandwidth in gigabits per second, then HBM may be the way to go because it has much higher bandwidth per pin than a DDR memory. If the application is dominated by capacity, such as how many gigabytes of storage can be accommodated on the memory interface, then DDR may be a better option.

“DDR gives the capacity and HBM gives the bandwidth,” Nandra said. “If the question is about power consumption, it’s better with something like a low power DDR compared to say HBM or GDDR. With GDDR or HBM you get performance. With DDR and LPDDR you really get power savings.”
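To put rough numbers on that tradeoff, the sketch below compares peak interface bandwidth for a few common options. The per-pin data rates and widths are representative nominal values chosen for illustration, not datasheet figures, and capacity and cost are deliberately left out.

```python
# Illustrative only: peak interface bandwidth = per-pin data rate x interface width.
# The rates and widths below are representative nominal values, not datasheet figures.

MEMORIES = {
    # name: (data rate per pin in Gbps, interface width in bits)
    "DDR4-3200, x64 channel":   (3.2,   64),
    "LPDDR4-4266, x32 channel": (4.266, 32),
    "GDDR6, x32 device":        (14.0,  32),
    "HBM2, x1024 stack":        (2.0,   1024),
}

def peak_bandwidth_gb_s(rate_gbps_per_pin, width_bits):
    """Peak bandwidth in GB/s: per-pin rate times width, divided by 8 bits per byte."""
    return rate_gbps_per_pin * width_bits / 8

for name, (rate, width) in MEMORIES.items():
    print(f"{name:26s} ~{peak_bandwidth_gb_s(rate, width):6.1f} GB/s peak")
```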

Embedded memory
If the memory is embedded as part of an SoC, there are a number of architectural considerations.

“If you look at leakage, for example, the leakage of SRAM is predominantly bit cells,” said Arm Fellow Rob Aitken. “The periphery contributes, but you can fool around with it in the design process. If you have a certain number of bit cells, you’re going to have a certain leakage, so you have to start at that point and work around it. Depending on how ambitious you get, there are circuit design tricks that will let you get rid of some of that, usually at the expense of performance. Some of these include well biasing or power gating of various descriptions, and combinations of these help to save on leakage.”

To this point, it’s important to understand that if a certain number of bits are needed, and there is a certain amount of leakage, the system architect has to figure out what configuration works best for the various memories they have. “Considerations include things like the bit line length,” said Aitken. “The shorter the bit line, in general, the faster the memory. This is because of the way SRAM sensing works. Essentially, the individual bit cell has to discharge the bit line, and when it has discharged it by enough, the sense amplifier fires and says, ‘Oh, there’s a signal there.’ So the more bit line it has to discharge, the longer it takes.”
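As a back-of-the-envelope illustration of that scaling, total array leakage grows roughly linearly with the bit-cell count. The per-cell leakage number below is an assumed placeholder, not a foundry figure.

```python
# Rough SRAM leakage estimate: array leakage scales with the number of bit cells.
# The per-cell leakage value is a hypothetical placeholder, not a foundry number.

LEAKAGE_PER_CELL_PA = 10.0   # assumed bit-cell leakage in picoamps (illustrative only)

def array_leakage_ua(words, bits_per_word, leakage_per_cell_pa=LEAKAGE_PER_CELL_PA):
    """Estimate bit-cell array leakage in microamps for a words x bits SRAM instance."""
    cells = words * bits_per_word
    return cells * leakage_per_cell_pa * 1e-6  # pA -> uA

# Example: a 64K-word x 32-bit instance (2Mbit)
print(f"{array_leakage_ua(64 * 1024, 32):.1f} uA of bit-cell leakage, before periphery")
```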

Fortunately, for any given memory configuration, choices can be made within a band of possibilities.

“I can have a memory with a lot of short bit lines, or a small number of longer bit lines, and that’s still the exact same number of words and bits,” he explained. “It’s just the way the columns are arranged in the memory. So there’s this architecture level playing around that can be done, and memory generators let you do that. You can play around and say, ‘In this case I’d like to have column mux 8 for this,’ which is 8 bit lines going to each output bit because that gives a nice balance of speed and power. Or you might say that’s actually faster than you need, so you can go with a column mux of 4 because it gives better power and okay speed. You wind up going through that exercise as an SoC architect to see what’s the best way to implement things, and they also fit into the floorplan differently because some of them are more square, and some of them are more rectangular.”
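A minimal sketch of that exercise is shown below, using the standard relationship between the column-mux factor, bit-line length (physical rows) and array width (physical columns) for a fixed logical size. The speed and power effects remain qualitative, as Aitken describes.

```python
# For a fixed logical size (words x bits), the column-mux factor trades bit-line length
# (physical rows) against array width (physical columns). Shorter bit lines generally
# mean faster sensing; fewer, longer bit lines generally favor power and a narrower layout.

def physical_organization(words, bits, column_mux):
    """Return (rows, columns) of the bit-cell array for a given column-mux factor."""
    rows = words // column_mux   # each physical row stores column_mux words
    cols = bits * column_mux     # each output bit is muxed from column_mux bit lines
    return rows, cols

for cm in (4, 8, 16):
    rows, cols = physical_organization(words=8192, bits=32, column_mux=cm)
    print(f"column mux {cm:2d}: {rows:4d} rows (bit-line length) x {cols:4d} columns")
```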

Cost matters
While power has dominated many design decisions for some time, cost is another critical element in this equation.

“Just from the memory perspective, you’re looking at whether you want to have more area, which equates to a higher cost,” said Aitken. “There are some second-order tradeoffs that become important with large amounts of memory, such as the ratio of bit cell area to periphery area. The bit cell area is essentially fixed once you pick one. But the periphery area around it can get bigger or smaller, which generally makes it faster or lower power. When you do that, it adds cost to the SoC, which may or may not show up depending on how many of a given instance you have. Often when you look at an SoC, there are a few very large instances that dominate the area, so small changes in periphery area or performance have a huge impact on those. There are a bunch of instances that really don’t matter because if they doubled in size nobody would notice. And then there’s another often small set of architecturally significant instances where these things have some sort of ultimate performance requirements or ultra-low voltage or some aspect of your chip that is important. In those cases, the area cost argument is less important typically than if it is meeting the speed criteria or leakage criteria or whatever else is dominant.”

Automotive priorities
This is particularly relevant in the automotive market, where cost is a critical element in deciding which components to use.

“Power is important but it hasn’t shown up as a super critical factor in some other markets,” said Frank Ferro, senior director of product management at Rambus. “Power is always important to everyone, but the tradeoff in automotive systems is really cost versus bandwidth. If I had to rank them, I would say performance and price are neck and neck. Power would be a distant third.”

Automotive is one of the hot markets for chip design today. In self-driving cars, the number of sensors deployed to capture real-time information is exploding, and each level of driver assistance requires multiple sensors feeding into advanced logic. The amount of data that needs to be processed is enormous because, in the case of vision and radar, that data is streamed continuously through the sensor network.

“Chipmakers are looking at memory systems that can handle bandwidths of 100Gbps and higher as you get into the different levels of driver-assisted cars, and ultimately self-driving cars,” said Ferro. “In order to do that, the number of memory choices starts to narrow down quite a bit in terms of what can provide you with the necessary bandwidth to process all of that incoming data.”

He said that some of the early ADAS system designs included both DDR4 and LPDDR4 because that was what was available at the time. Both have advantages and disadvantages. “DDR4 is obviously the cheapest option available, and it is in the highest-volume production,” he said. “It is certainly very cost-effective and very well understood. Doing error correction on DDR4 is simpler and well understood. LPDDR4 was an option that was used, as well.”

Going forward, Ferro expects a variety of memory types to coexist in different systems. “If they are heavily cost-driven, then they are going to be looking at something like DDR or maybe even LPDDR4. But if they are heavily bandwidth-driven, then they will be looking at something like HBM or GDDR. It’s really a function of where you are in your architecture stage. There are different ADAS levels and what’s required for the system and timeframe, too, because when you are shipping is important. If you are getting a system shipping this year, it would have a different solution than systems being developed for next year or the year after that. Those are all the things that we are seeing on the continuum of time-to-market versus cost.”

On the high-performance side, the bandwidth/power tradeoff is the key challenge from a system-design standpoint. “How do you get more bandwidth to fit in a reasonable area on your chip with reasonable power? For example, if you have an HBM, it is very efficient from a power and area standpoint because it uses 3D stacking technology, so from a power-efficiency point of view HBM is fantastic. And from an area standpoint, one HBM stack takes up a relatively small amount of space, so it’s a really nice-looking solution from a power/performance perspective. You get great density and low power within a small area,” Ferro said.

Others agree. “GDDR is faster than DRAM for GPUs, but with HBM there is no comparison,” said Tien Shiah, HBM product marketing manager at Samsung. “HBM is the fastest form of memory with a micropillar grid array. You can have 4- or 8-high stacks, which gives you 1,024 I/Os over 8 channels, with 128 bits per channel. That’s four times the I/O bus width of standard graphics cards. You can hit 2Gbps per pin, and it will be 2.4Gbps at 1.2 volts.”
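Those figures translate directly into per-stack bandwidth. A simple worked calculation, assuming peak rates on all 1,024 I/Os:

```python
# Per-stack bandwidth implied by the HBM2 figures quoted above:
# 1,024 I/Os, at 2Gbps per pin today and 2.4Gbps per pin at 1.2V.

IO_COUNT = 1024
for rate_gbps in (2.0, 2.4):
    print(f"{rate_gbps} Gbps/pin -> {IO_COUNT * rate_gbps / 8:.1f} GB/s per stack")
```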


Fig. 1: Samsung’s HBM2 DRAM. Source: Samsung

That’s enormous throughput for external memory. But here, too, the tradeoff is cost.

“You are going to pay a little bit more for HBM, so if you can absorb the cost it is a great solution,” said Ferro. “If you can’t absorb the cost, then what other companies are looking at is how many DDRs or LPDDRs can they squeeze on a board and putting them side-by-side to try to mimic some of the HBM performance with a more traditional solution.”
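To get a rough feel for what that means in practice, the sketch below estimates how many conventional DRAM channels it would take to approach one HBM2 stack’s peak bandwidth. The channel figures are representative nominal peaks, and the estimate ignores board area, signal integrity and power, which are exactly where the side-by-side approach pays its price.

```python
import math

# How many conventional DRAM channels does it take to approach one HBM2 stack's
# peak bandwidth? Channel bandwidths are representative nominal peaks, for illustration.

HBM2_STACK_GB_S = 256.0          # GB/s: 1,024 I/Os at 2 Gbps per pin
CHANNELS = {
    "DDR4-3200 x64":   25.6,     # GB/s per channel
    "LPDDR4-4266 x32": 17.1,     # GB/s per channel
}

for name, bw in CHANNELS.items():
    needed = math.ceil(HBM2_STACK_GB_S / bw)
    print(f"{name}: ~{needed} channels to match one HBM2 stack ({bw} GB/s each)")
```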

Making sense of memory
Because the memory market serves many different applications, getting a clear picture of how to approach memory in a design can be tricky. Getting a sense of the various options can help.

“You can basically take your silicon and break it into several key vertical markets,” said Farzad Zarrinfar, managing director of the IP Division at Mentor, a Siemens Business. “Some of those vertical markets are smartphones, high-performance computing, automotive, IoT, and virtual and mixed reality, among others. What you will find is that the silicon technology varies. There isn’t one silicon technology that addresses everything. For example, IoT is very power-sensitive and cost-sensitive, and people take advantage of the most advanced technology in ultra-low-power 40nm or 28nm flavors like ULP or HPC+. Those are a fantastic fit for IoT. In automotive, there is a lot of demand at 28nm and moving downward.”

The choice of memory compiler is another piece of the puzzle, and most memory IP providers offer compilers for their memory products. “An intelligent compiler can be instrumental for providing a solution because it can optimize based on different requirements. For example, some applications may need ultra-low dynamic power, whereas automotive has its own requirements, and the only constant we have there is change. Things are evolving. There are very clear requirements for safety and temperature grades, among a number of other considerations,” Zarrinfar said.
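As a purely hypothetical illustration (the parameter names below are invented for this example and do not correspond to any vendor’s actual compiler interface), the kinds of requirements Zarrinfar describes typically boil down to a small set of inputs to a compiler run:

```python
# Hypothetical illustration only: these parameter names are invented and do not
# correspond to any specific vendor's memory compiler interface.

sram_request = {
    "words": 16384,
    "bits": 64,
    "column_mux": 8,
    "process": "28nm_ULP",                               # assumed process flavor
    "temperature_grade": "-40C to +125C ambient",        # automotive-style grade
    "optimization_target": "ultra_low_dynamic_power",    # vs. "speed" or "density"
    "redundancy": True,                                   # repairability for safety/yield
}

# A compiler run would sweep implementations meeting these constraints and report
# area, leakage, dynamic power and access time for each candidate instance.
print(sram_request)
```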

All of these requirements impact the memory design, which the engineering team needs to take into consideration.

“When we design memory we have certain targets, which could be +125 degrees C ambient or 150 degrees C ambient, which translates to some junction temperature,” he said. “We have a marketing requirements document that is based on the target market. Then we know what kind of design we need to have. And then you need to have models from the semiconductor foundry that say this is the range of models that we have. Automotive temperatures are forcing the semiconductor foundries to increase the traditional operating temperature ranges. And while not every permutation of every model is supported in every process node and type, adequate verification must be done to make sure the various combinations will achieve the desired result.”
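The step from ambient temperature to junction temperature follows the standard thermal relationship Tj = Ta + P x thetaJA. Below is a small sketch with assumed power and thermal-resistance values, purely for illustration.

```python
# Junction temperature from ambient temperature, power dissipation, and the
# junction-to-ambient thermal resistance (theta_JA). Values below are illustrative.

def junction_temp_c(ambient_c, power_w, theta_ja_c_per_w):
    """T_j = T_a + P * theta_JA"""
    return ambient_c + power_w * theta_ja_c_per_w

# Assumed example: 125C ambient, 2W dissipated, theta_JA of 15 C/W
print(f"Tj = {junction_temp_c(125.0, 2.0, 15.0):.0f} C")   # -> 155 C
```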

At the end of the day, the insatiable demand for more bandwidth, lower latency, lower power consumption and more capacity at lower cost is only expected to grow. With the number of memory types available today, both off-chip and embedded, system architects must continually weigh the options to find the memory approach best suited to their specific application.
