Defining Edge Memory Requirements

Edge compute covers a wide range of applications. Understanding bandwidth and capacity needs is critical.

Defining edge computing memory requirements is a growing problem for chipmakers vying for a piece of this market, because those requirements vary by platform, by application, and even by use case.

Edge computing plays a role in artificial intelligence, automotive, IoT, data centers, and wearables, and each has significantly different memory requirements. So it’s important to nail down memory requirements early in the design process, along with the processing units and the power, performance, and area tradeoffs.

“In the IoT space, ‘edge’ to companies like Cisco is much different than ‘edge’ to companies like NXP,” observed Ron Lowman, strategic marketing manager for IoT at Synopsys. “They have completely different definitions and the scale of the type of processing required looks much different. There are definitely different thoughts out there on what edge is. The hottest trend right now is AI and everything that’s not data center is considered edge because they’re doing edge inference, where optimizations will take place for that.”

Memory IP providers understand users come at this from different angles, so discussions begin at the application level. “You can start at the very low end, where all the memory is embedded on chip—including the non-volatile memory (NVM)—and as you move to more advanced processors in more advanced applications like digital, home, industrial, and mobile, you start leveraging external DRAM, external SPI (Serial Peripheral Interface) NOR Flash, or external wider-bus NOR Flash for NVM,” Lowman said. “In the data center, there is a lot of adoption around HBM2 because of the bandwidth and the pico-joules per bit that are important when you’re doing large applications like AI. It really is about taking into context the perspective of the application as to how to define what edge is.”
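To see why pico-joules per bit matters at data center bandwidths, here is a rough sketch of the arithmetic: interface power is simply bits moved per second multiplied by energy per bit. The bandwidth and energy figures below are hypothetical placeholders chosen for easy math, not measured values for HBM2 or any other memory.

```python
def interface_power_watts(bandwidth_gbytes_per_sec, picojoules_per_bit):
    """Power spent moving data across a memory interface.

    power (W) = bits per second * joules per bit
    """
    bits_per_sec = bandwidth_gbytes_per_sec * 1e9 * 8
    return bits_per_sec * picojoules_per_bit * 1e-12

# Hypothetical comparison at the same 100 GB/s of sustained traffic.
low_energy = interface_power_watts(100, 5)    # assumed 5 pJ/bit
high_energy = interface_power_watts(100, 20)  # assumed 20 pJ/bit
print(f"{low_energy:.1f} W vs. {high_energy:.1f} W")
```

At a fixed bandwidth, power scales linearly with energy per bit, which is why the metric gets so much attention in bandwidth-heavy AI designs.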

Marc Greenberg, group director of product marketing for DDR, HBM, flash/storage and MIPI IP at Cadence, agreed. “There’s a real diversity of what is at the edge. It could be a really lowly thing, a thermostat maybe, or an IoT button of some kind. It could be anything from that up to an edge server that can collect and aggregate all of that data before loading it further into the cloud. The edge of the cloud is like the edge of a real cloud: not very well defined.”

While there may be a wide range of devices at the edge, these devices can have a relatively powerful CPU, Greenberg pointed out. “Inside they may have some vision or display capability and can be thought of as similar to a 2G cell phone. If you think about what a 2G cell phone was, it had a little camera that wasn’t very good, but it had a camera that could look at things and take pictures. It had a little display. Obviously it had computing capability. And it had network connectivity, either to the cellular network or sometimes to WiFi. Those chips that we had 10 to 20 years ago for 2G cell phones are re-emerging as IoT-type devices. Everything is updated inside of them, but the architectures still look fairly similar. The memory for those IoT type devices has been not so much the very high speed DRAM that we use for an application processor, but more memory over the SPI bus and devices using either four channels or eight channels of SPI — quad-SPI or octal-SPI — and using that kind of memory as the external memory when an external memory is needed.”

And for simple digital applications, those approaches still work. “The initial definition of digital home versus mobile is reflected in the technologies they’ve chosen in the past from a volatile memory perspective,” said Synopsys’ Lowman. “Many of those devices are plugged into the wall, so they’ll leverage the cheapest memory possible such as DDR3, DDR4, and in the future, DDR5.”

AI’s impact
Add AI into applications and the memory needs change. AI workloads require more memory, additional bandwidth, and frequently the latest process nodes.

“AI requires high-performance memory,” Lowman said. “It’s really pushing the envelope with high performance. It’s also pushing the envelope with densities of memory, and with trying to get low leakage. AI algorithms actually require heterogeneous compute, so sometimes you’re doing scalar, sometimes you’re doing standard vector DSP, and sometimes you’re doing massively parallel matrix multiplication like a CNN engine. Each of those heterogeneous compute capabilities may require different memory technologies.”

AI applications also push the limit for the next-generation memory technologies, and users are now looking to integrate DDR and LPDDR not at 28nm, but at 22, 12, or 7nm in order to increase the memory density, and lower the leakage as they adopt finFET process technologies. So while AI doesn’t necessarily change the types of memories users want, it does require the next generations of them.

However, one big shift happening is in the kinds of overall memory architectures being used for AI applications, Lowman asserted. “People are having to create the types of chips that don’t just use standard buses and standard technologies. They want to use the right size, high density SRAM, the right size caches, the right size low leakage SRAM, the right size high performance SRAM. For that, they are adopting new technologies in addition to existing technologies.”

In addition, especially when moving data from the edge devices to the cloud, instead of just using the latest in DDR, there has been an uptick in HBM 2.0 adoption for higher bandwidth, he pointed out. “What’s interesting about those chips is that they’re also adopting DDR because AI has a huge capacity requirement so you have to have both DDR and HBM 2.0. Given these dynamics, users are having to become very innovative on the architecture as a whole and are adopting lots of different technologies and configuring them in differentiated ways.”

In many respects, a machine learning chip looks like other types of high-performance computing chips that have been put out in the past, with the exception of how they use memory. “The AI/machine learning applications definitely use a lot more memory bandwidth than other types of computing applications, but they don’t use a lot of gigabytes of memory. They use a lot of gigabits per second. So it’s a lot of bandwidth but just not so much capacity in terms of megabytes,” Greenberg said.

This comes down to the nature of the algorithms. “The neural network is often kept in an external DRAM,” he explained, “and the things that the neural network needs to know are stored out there in DRAM. You need to go out and touch those things a lot in the memory, read them, update them, read them, update them. There are a lot of transactions that happen between a machine learning SoC and the memory, but there aren’t so many nodes in the neural network that there needs to be a giant amount of capacity to do that neural network function.”
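The bandwidth-versus-capacity distinction can be made concrete with a back-of-the-envelope sketch. The model size and inference rate below are illustrative assumptions, not figures from Greenberg or from any product, and the sketch pessimistically assumes every inference re-reads all of the weights from external DRAM with no on-chip caching.

```python
def dram_traffic(weight_bytes, inferences_per_sec, passes_per_inference=1):
    """Return (capacity in MB, bandwidth in Gbit/s) for streaming weights from DRAM.

    Assumes all weights are fetched from external DRAM on every pass,
    which is a deliberate simplification.
    """
    capacity_mb = weight_bytes / 1e6
    bytes_per_sec = weight_bytes * passes_per_inference * inferences_per_sec
    bandwidth_gbps = bytes_per_sec * 8 / 1e9
    return capacity_mb, bandwidth_gbps

# Hypothetical network: 50 MB of weights, 30 inferences per second.
capacity_mb, bandwidth_gbps = dram_traffic(50e6, 30)
print(f"capacity: {capacity_mb:.0f} MB, bandwidth: {bandwidth_gbps:.1f} Gbit/s")
# prints "capacity: 50 MB, bandwidth: 12.0 Gbit/s"
```

Under these assumptions, a model that occupies only tens of megabytes still generates gigabits per second of memory traffic, because the same data is touched over and over.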

Today, the memory subsystem for an AI or ML chip likely has the very highest bandwidth, with heavy use of graphics memories such as GDDR6, as well as the latest generations of DDR technologies like LPDDR5, DDR5, and HBM2 and future generations of the HBM standard.

Fig. 1: Edge devices with very different memory needs—iPhone 8 vs. Fitbit.

For every one of these applications, there is a sweet spot for an amount of memory that’s needed to perform the majority of the tasks that need to be performed, noted Magdy Abadir, vice president of marketing at Helic. “For embedded applications, which run the same tasks over and over again, like in automotive [other than autonomous applications] where you are doing things like engine control, these types of applications can be characterized as to how long they take, how much memory they need. Then you can figure out the best memory to include in these kinds of devices to achieve decent performance, a decent level of security, and accuracy for the task you’re doing.”

Mobile is another example. “The tasks a phone is doing can be pre-characterized until you get into things like streaming 10 videos at the same time and the phone would say, ‘Tough luck, I can’t do it.’ So there could be limits imposed in certain areas,” Abadir said. “Still, there is always a demand for more memory, no matter what the application is, but you can’t afford to overdo it. You don’t want the memory that’s sitting idle, and you don’t want to be below it because it just causes a lot of paging and going out to memories that are not close by the computing. There are a lot of solutions that technologically ease the problem by bringing memories closer, and reducing the access time to these memories opened up doors for more applications to take advantage of some of these memories.”

How much memory is allocated for a wireless application or for an automotive application depends on the benchmark that is being run and where the greatest market opportunities are. “There are a lot of performance modeling people who spend all their life doing nothing but creating these tables, where they add more memory and/or use different processors or a different architecture,” he said. “Some numbers go up, some numbers go down. It’s a balancing act.”
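A toy version of such a table can be sketched in a few lines. This is not how any vendor's performance models work. It is a minimal illustration that assumes a fixed working set, a fast near memory, and a slow far memory, with all sizes and latencies invented for the example.

```python
def avg_access_ns(mem_kb, working_set_kb, near_ns=2.0, far_ns=100.0):
    """Average access time in a toy model where the part of the working
    set that does not fit in near memory is served from far memory."""
    fit = min(1.0, mem_kb / working_set_kb)
    return fit * near_ns + (1.0 - fit) * far_ns

def sweep(working_set_kb, sizes_kb):
    """Return one (size, average access time, idle KB) row per candidate size."""
    rows = []
    for size in sizes_kb:
        idle = max(0, size - working_set_kb)  # memory that would sit unused
        rows.append((size, avg_access_ns(size, working_set_kb), idle))
    return rows

# Hypothetical 512 KB working set, four candidate near-memory sizes.
table = sweep(working_set_kb=512, sizes_kb=[128, 256, 512, 1024])
for size, t, idle in table:
    print(f"{size:5d} KB  avg access {t:6.1f} ns  idle {idle:4d} KB")
```

In this model, access time falls as memory grows until the working set fits. Beyond that point the extra memory only adds idle capacity, which is the sweet spot Abadir describes.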

System tradeoffs are always tricky, especially when it comes to memory because more than half of the die area may be taken up by memory.

“When it comes to performance (speed), power and area, architects are constantly making tradeoffs,” said Farzad Zarrinfar, managing director of the IP Division at Mentor, a Siemens Business. “For example, if the power consumption is exceeding the limit, then it has other adverse ramifications like thermal management packaging, and the cost of packaging is going to go up. And if the system cost exceeds the specifications, that product may not be able to go to market. So the power consumption may also have a cost implication.”

These concerns are being seen across the board, even in areas like automotive due to the thermal operating conditions. As such, all automotive-grade IP, including the memories, must be designed to meet these conditions, Zarrinfar said.

Advanced nodes on the edge
Edge chips also are being designed in the most advanced process nodes when there is a need for a lot of computing in the hardware. “Moore’s law is still around, but it’s changing its character, so we still can put more transistors on a die by going down in technology nodes,” Greenberg pointed out. “But in the most recent technology nodes, the transistors have not been getting the cost advantage that we saw from previous process technologies. If we look at the transition from 130nm to 90 to 40 to 28nm, each time we went down a process node, we got more transistors for the die, and we also went down in cost on a dollars-per-transistor or microcents-per-transistor basis. Now, as we’re moving toward very advanced finFET nodes, that cost function has flattened out a bit. We definitely still get more transistors per die, which is great, but it’s not always costing us less.”

What kind of edge device really needs a 7nm die and needs to get all of that computing done that couldn’t necessarily be done at 16 or 28nm? “Some of those things do exist, and what we’re seeing that might be considered to be an edge device requiring 7nm processing are some artificial intelligence applications that are trying to have not only vision associated with them but making sense of what they’re seeing,” Greenberg said.

Think of an IoT camera attached to a smart doorbell, or a smart security camera. “It’s not just taking in a signal, processing it, and sending that signal up to the cloud,” he said. “It’s trying to figure out if there has been movement in the frame and if what it is seeing makes sense. So it’s not only, ‘I saw something move.’ It’s now, ‘Was it a piece of trash or a dog running across my field of vision, or was it a threat to the thing that I’m supposed to be protecting?’ Not only that, but maybe it should do some recognition. Somebody is approaching the house. Who is it? Is it one of the residents? Is it somebody who’s delivering something? Is it somebody who’s really not supposed to be there? Having that amount of processing happening at the edge is one of the directions that things are going, and that’s the sort of case where a 7nm SoC might start to make sense even at the edge of the cloud.”

This is already beginning to happen. Lowman noted that the market for AI chips is already bigger than people think. “It has been estimated that by 2022 or 2023, 50% of chipsets will have AI capabilities, but we’re already seeing that happening today. It’s across the board, and they’re doing different things to make it work. Some of them are just doing small things, some of them are doing complete re-architectures. It’s really interesting to see the different adoption of the IP.”

Looking ahead, the demand for memory is increasing, the percentage of die area devoted to memory is increasing, and requirements are becoming more stringent depending on the end application, Mentor’s Zarrinfar said. “If you’re looking at automotive, qualification is slightly more rigid. Foundries that offer a process for these types of chips have to worry about extensive qualification, and there’s a cost implication there.”

This may impact the choices the design team has, because a foundry is unlikely to offer qualified automotive IP for every single process node. “This is a parameter the design team has to play with, because when designers have to make a decision that certain process nodes could do the job for them, if it’s not automotive-qualified they have to go with processes that are qualified. Because of this, there is a lot of demand in automotive for 28nm HPC+ and beyond,” Zarrinfar said.
