Tradeoffs Between Edge Vs. Cloud

As localized processors become more powerful, what works best where?


Increasing amounts of processing are being done on the edge, but how the balance will change between what’s computed in the cloud versus the edge remains unclear. The answer may depend as much on the value of data and other commercial reasons as on technical limitations.

The pendulum has been swinging between doing all processing in the cloud to doing increasing amounts of processing at the edge. There are good reasons for this. Processing locally reduces latency, increases reliability and security, and it helps to resolve data privacy concerns. And with increasingly powerful edge processor architectures, as well as a reduction in the resolution required for many applications, more computing at the edge is rapidly gaining traction.

But how far can this transition go? Do we really need the cloud anymore, or will that become a distributed function, following the lead of several other technologies?

As of today, edge computing is on the upswing. “The way algorithm development is going, and the direction of computer architectures, means that from an architecture point of view more and more compute is likely to get closer to the edge,” says Simon Davidmann, CEO for Imperas Software. “If your device runs from a battery, that will limit the amount of computation you can do and it may be one of the limiting factors, but batteries are getting better. If you don’t do it on the edge, you have to communicate back to base, which also takes a lot of power.”

Edge computing itself is getting faster, too. “Because of the process technology, because of the innovations in the AI hardware design, and because we are doing processing in fixed point, like 8-bit, you can put a lot of processing power on devices like mobile platforms,” says Pulin Desai, group director for product marketing, management and business development at Cadence. “Obviously, there is some limit, but you cannot think that you only have a very small amount of processing. These devices have a lot of processing power.”

But computing at the edge also has limitations. “When you cannot do something on the edge, because devices are not powerful enough, you have to do it in the cloud,” says Suhas Mitra, product marketing director for Tensilica AI products at Cadence. “Today, we are seeing more go to the edge, but some remain in the cloud. That is basically how it will be – some on the edge and some in the cloud. The question is, where does the balance lie? That depends upon the kind of networks we are talking about and the use cases.”

It also depends upon the workload. “The reason things are migrating to the edge is because every time you move data through the network, it’s expensive,” says Frank Ferro, senior director of product management for IP cores at Rambus. “It costs power. It is about efficiency. Customers don’t want to wait anymore. So if you have applications that are trying to run quickly, you want to move them as close to the edge as possible. The more you can do locally, the less you have to move that data through the network. Or with AI, you train these big models in the cloud and then you push them out to the endpoints for inference.”

For mission-critical applications, there is little choice but to move compute closer to the data source. “You don’t want to rely on interconnect for mission-critical things,” says Joe Rodriguez, product marketing manager at Rambus. “It’s just a natural evolution to move them to the edge. They will probably both coexist. The cloud will be used where there are large datasets that you process in the background, and smaller things can evolve and be on the edge. The balance will maneuver with the problem and technology availability.”

While the edge is seeing increasing amounts of compute, it also has limitations. One of those is storage.

“I have appreciation for the cloud’s big data storage capability and how it enables management and visibility from multiple distributed deployments,” says Flavio Bonomi, board technology advisor at Lynx Software Technologies. “However, storing data locally is more private. People and organizations don’t trust that big cloud companies will use their data in a well-meaning way. Storing data on the edge will support a safer environment. The cloud cannot effectively be used to control moving data that’s needed for real-time decisions. Cloud computing is great for predictive data analytics, but ultra-quick decisions require rich data to be processed locally. So if we’re going to see an evolution from data-driven to AI-driven workflows, the future lies in moving a lot of compute power to the edge.”

This is particularly evident when it comes to moving data using cellular technology. 5G will help to some degree, particularly with millimeter-wave, but it is still not the most efficient approach. “The wireless link is always the most expensive link in the network,” says Sam Fuller, senior director of marketing for AI Inferencing at Flex Logix. “You will always be looking at the tradeoff between where I do my processing and what I need to communicate to minimize the cost of that link. In high-bandwidth applications like vision or video processing, it’s crazy to try to process that data across the 5G link. If you’re talking about a text message, or you’re talking about something like audio data, maybe it makes sense to do that. But you always have to look at the bandwidth price tradeoffs.”
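The bandwidth-versus-processing tradeoff can be made concrete with a rough back-of-the-envelope comparison. The numbers below are illustrative assumptions, not measurements: uncompressed 1080p video at 30 frames per second versus a modest per-frame metadata payload after local inference.

```python
# Illustrative, assumed numbers: 1080p @ 30 fps at 12 bits per pixel raw,
# versus ~100 bytes of detection metadata per frame after local inference.
frame_pixels = 1920 * 1080
raw_bps = frame_pixels * 12 * 30        # uncompressed video, bits per second
metadata_bps = 100 * 8 * 30             # detection results, bits per second

# Processing locally and sending only results shrinks wireless traffic
# by more than four orders of magnitude in this sketch.
reduction = raw_bps / metadata_bps
print(f"raw: {raw_bps / 1e6:.0f} Mb/s, metadata: {metadata_bps / 1e3:.0f} kb/s, "
      f"reduction: {reduction:,.0f}x")
```

Even with aggressive video compression on top of the raw figure, the gap remains large enough that shipping full video over the wireless link rarely pays.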

Data as the new oil
One such tradeoff involves shrinking the volume of data close to the source. At that point, it can either be processed and stored locally, or sent to the cloud for further processing. “Data transfer to the cloud needs a lot of energy,” says Andy Heinig, group leader for advanced system integration and department head for efficient electronics at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “Pre-processing on the edge makes a lot of sense because the data often contain little information of value in the AI context. In many cases, nothing extraordinary will be happening for hours, and then there is a small event that should be detected.”
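A minimal sketch of that kind of pre-processing, with made-up readings and threshold: the device keeps hours of uneventful samples to itself and uploads only the readings that jump.

```python
def detect_events(readings, threshold):
    """Return (index, value) pairs where a reading jumps by more than
    `threshold` from the previous one -- the only samples worth uploading."""
    events = []
    prev = readings[0]
    for i, r in enumerate(readings[1:], start=1):
        if abs(r - prev) > threshold:
            events.append((i, r))
        prev = r
    return events

# Long stretches of boring sensor data, then one short event.
readings = [20.0, 20.1, 19.9, 20.0, 20.1, 80.0, 79.5, 20.0]
events = detect_events(readings, threshold=5.0)
print(events)  # only the jumps into and out of the event are flagged
```

In this toy run, two of eight samples are deemed worth transmitting; a real deployment would use a more robust detector, but the data reduction principle is the same.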

The real value of the cloud, in this case, is a broader analysis of processed data that is less time-sensitive. “An edge system may need to do its own analytics for its primary task,” says Tim Vehling, senior vice president of product and business development at Mythic. “An example is a drone that is examining an electrical tower. It may detect something and immediately has to act on that to figure out what the issue is. When it returns to its home base, it may transfer a lot more data that you want to analyze in the cloud. There are different layers of processing that could be done.”

That pre-processed data can be refined further. “You don’t need all of the bytes,” says Imperas’ Davidmann. “You want quality data and not mindless data. A lot of computation associated with AI is mindless. You’re doing a lot of processing to decide what is in front of me. You don’t need to transfer all of the pixels. For diagnostics, that’s a different issue. You might want to find why something made a bad decision. When you collect quality data, you will often find there are several ways in which you can use it. It is a way of abstracting information.”

Balancing these very different use cases for data is essential for increasingly smart automobiles. “This is an example where the edge is doing two things,” says Cadence’s Mitra. “One is to do inference. That is the ability to take the network and run it. Every now and then, it is maintaining local data and statistics about what is running, the use case. This is metadata. Then, every night, or every week, it sends that information upstream. When you think about millions of people sending this over time, the networks become more useful. They will see situations that could not have been covered in simulation or even in drive testing. They need to collaborate, but how they do that is decided by the use case.”

Data transfer also triggers privacy and security concerns, which is another consideration for where data is processed, and from there, which data is sent to the cloud. “You may not be allowed or may not want to be transmitting things like face identification, or images of people in private or public,” says Mythic’s Vehling. “That may force you to do your analytics locally and only transmit the metadata results. It may be on a needs basis that you get the raw data. That’s another scenario where you still need high-speed connection for connectivity to the cloud, but you may want to do model processing locally because of those privacy concerns.”

Architectural migration
The migration of AI to the edge has triggered profound changes in computer architectures. “In the past, people were trying to build hardware with lots of parallel processes, but they weren’t focused on how it could be programmed,” says Davidmann. “We had so much silicon that everyone was trying to build parallel processes, but there weren’t the applications for it. If you fast forward to today, we have large software suites and frameworks for machine learning and AI, like TensorFlow and Caffe. People are now building hardware to try and accelerate that software. That’s driving the demand for better hardware architectures to fulfill the needs of those software algorithms. AI works really well on parallel processors. And so now we’ve got all these hardware parallel architectures coming out to fulfill that demand for faster AI algorithm execution.”

That also has caused a shift in attitudes about data formats. “Several years ago, there were definitely questions about whether you could actually quantize a model and get the necessary accuracy,” says Vehling. “That seems to not be an issue anymore, and that argument has pretty much gone away. You will not hear many people claim they need floating point or 16-bit for accuracy. For most applications, especially in analytics on the edge side, 8-bit, maybe even 4-bit might be okay from an accuracy point of view.”

Accuracy has to be balanced against other factors. “People spend a lot more time looking at the accuracy requirement, and what is required to meet that,” says Cadence’s Desai. “There will be certain workloads, or certain use cases, where people may be okay with 4-bit. This enables them to get higher performance, higher throughput, lower energy, or reduced memory demand. This is the same argument as when we went from floating point to 8-bit. Sometimes people will need to have mixed mode. It may be acceptable to run parts of the network in 8-bit, but other parts require 16-bit, and 4-bit is all that is needed for others.”
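The quantization step being discussed can be sketched in a few lines. This is a toy affine quantizer, not a production flow (real frameworks add per-channel scales, calibration data, and saturation handling), but it shows why fewer bits means a coarser step size and more reconstruction error.

```python
def quantize(values, bits=8):
    """Affine-quantize floats to signed integers of the given bit width."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0   # guard against constant input
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
for bits in (8, 4):
    q, s, z = quantize(weights, bits)
    err = max(abs(w - r) for w, r in zip(weights, dequantize(q, s, z)))
    print(f"{bits}-bit: max error {err:.4f} (step size {s:.4f})")
```

The 4-bit version has a step size roughly 17x coarser than 8-bit for the same value range, which is exactly the accuracy headroom a mixed-mode design spends where the network can tolerate it.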

That has brought about another architectural change. “In the RISC-V world, the new vector engine has just been ratified,” says Davidmann. “You can configure the engine to be 32-bit floating point, bfloat16, or fixed-point. They built the vector engines, so if you’ve got a 32-bit word but are processing 8-bit data, you can do the computations in parallel. It is a single instruction, multiple data (SIMD) engine. You get better throughput because you use fewer cycles to do more operations. The architecture is a little more complex, but you get more computation per cycle without increasing the silicon much.”
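The packed-arithmetic idea Davidmann describes can be mimicked in plain software, a technique sometimes called SWAR ("SIMD within a register"). This toy sketch, which is an illustration of the concept rather than how a vector unit is implemented, adds four 8-bit lanes packed into one 32-bit word in a single pass, masking the lane tops so carries never spill into the neighboring lane:

```python
H = 0x80808080          # top bit of each 8-bit lane
L = 0x7F7F7F7F          # low 7 bits of each lane

def swar_add8(a, b):
    """Add four packed 8-bit lanes (each mod 256) without lane crosstalk."""
    return ((a & L) + (b & L)) ^ ((a ^ b) & H)

def pack(lanes):
    word = 0
    for lane in reversed(lanes):   # lanes[0] becomes the lowest byte
        word = (word << 8) | (lane & 0xFF)
    return word

def unpack(word):
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

a = pack([250, 1, 2, 3])
b = pack([10, 1, 2, 3])
print(unpack(swar_add8(a, b)))   # four additions in one word-wide operation
```

One word-wide add and a couple of masks replace four separate byte additions, which is the same cycles-per-operation win a hardware SIMD lane delivers, only without the masking overhead.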

The edge dichotomy
That flexibility creates opportunity. “As the performance levels of edge hardware are going up, they’re actually looking at deploying models that can take advantage of that performance to get better throughput and better accuracy,” says Dana McCarty, vice president for inference sales, marketing, and applications at Flex Logix. “They were stripping down models to work on the edge, but some customers are now moving away from that and moving more to richer frameworks, because they’re seeing that they can get the better performance from it within the power envelopes that they want.”

And that creates new demands on that hardware. “Typically, when you build an edge system, you just have a little bit of DDR memory,” says Rambus’ Ferro. “But now they are trying to do more and more in-line processing, where the DDR requirement is exploding to the point where it just can’t manage the amount of processing. So HBM is starting to fill in the gaps on the edge.”

This, in turn, is creating a divide in edge processors. “We see the TinyML initiative, which is running AI on tiny microcontrollers,” says Vehling. “On the flip side, you have these massive models being trained and deployed in the cloud. I guess the question is, ‘What are the edge devices going to look like?’ Are they going to be more like the super tiny models, or will they increasingly want to run bigger and more complex models?”

Software is driving hardware. “People looking at algorithms are somewhat agnostic about how far the hardware and software can be pushed in their roadmap,” says Mitra. “Their job is to convince people that new designs and new technology are required to improve the solution. There is a gap. It comes primarily not because of the innate ability of being unable to move forward fast enough, but the fact that the number of degrees of freedom in the cloud is slightly larger. They can move much faster.”

Use case limits
Hardware has limitations. Some of those limitations are created by the state of technology, others by things like power, cost, or form factor. “Despite that, they’re looking for relatively high-performance analytical models, and that’s the hard thing to do,” says Vehling. “That is what we call the high-performance edge. How do you solve all those parameters effectively? How do you satisfy high performance, big model, low latency, but yet have low power, low cost, and small form factor? That is the magical combination, and if a company can solve it for the intersection of those items, that can be a home-run product.”

Accomplishing that may require new design tools. “As people build better architectures, we have to build better tools,” says Davidmann. “A lot of the smarts in a big architecture is the network on chip, and how things communicate. We can model some of that at a low level, or we can abstract it away. From a software development point of view, you don’t really care how the data flows. You just assume you have access to the data you need. So there are two types of problems. One is functional accuracy, and this is what the industry has concentrated on. But the big challenge we see is performance analysis – how can you model the cost of moving data around? That is a very challenging set of tools. Companies need to do architecture performance analysis. The size of modern chips makes that quite a challenge in terms of how you accurately model it and the amount of analysis you can do.”

Learning on the edge
While some compute will certainly move to the edge, other compute operations appear to be firmly ensconced in the cloud. This is especially true for training of AI/ML/DL algorithms. “One of the things about training, especially initial training, is that it takes a long time,” says Flex Logix’s Fuller. “It is very computationally intense. We’re talking about it being a million to a billion times more intense to train a model than to do an inference. When people talk about learning on the edge, they’re not talking about that level of training. What they’re talking about is essentially touching up, tuning up, or adapting an existing trained model to a particular use case.”

Vehling agrees. “There could be some retraining, or minor training, that could move to the edge. This is especially the case where you may want to introduce a slightly different data set. I don’t think you’ll see brand new, from scratch, model training. That will remain more of a cloud, or server-based application. But you could see some tuning up or retraining of your model in an edge device.”

Training may be local. “If there is a voice activation workload that runs on your phone, the network will be trained for each person’s voice,” says Mitra. “There are subtle differences in the way we speak, or intonation, and those models will be refined as we go along. That is fine tuning and will help everyone in the long run.”

It depends upon the use case. “AI training produces vast volumes of data that are almost exclusively implemented and stored in the cloud,” says Lynx’s Bonomi. “As we see more compute at the edge, organizations are able to change that process and instead look for patterns locally. When you run digital twins in parallel at the edge, you enable one model to learn and apply something in real-time from the other, improving the control of real-world systems.”

It may require the function to be split. “Most machine learning models have a feature extraction component, called the backbone,” says Fuller. “Then there is the decision-making piece, which takes those features and processes them to provide something that can be acted upon. This is often called the head. You may have a big feature extraction component (backbone), and then multiple heads, each looking at that data in different ways. It is the heads that are much more likely to get trained or retrained on the edge.”
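A stripped-down sketch of that backbone/head split, with an invented feature map and data purely for illustration (in a real deployment the frozen backbone would be a pretrained deep network shipped from the cloud): the backbone never changes on the device, and only the small linear head is updated against locally collected examples.

```python
def extract_features(x):
    """Frozen 'backbone': maps a raw 2-dim input to 3 features.
    Stands in for a pretrained network that is never retrained on-device."""
    return [x[0], x[1], x[0] * x[1]]

head = [0.0, 0.0, 0.0]   # trainable 'head': a single linear unit

def predict(x):
    return sum(w * f for w, f in zip(head, extract_features(x)))

def retrain_head(samples, lr=0.1, epochs=100):
    """Plain SGD on squared error; only the head's weights change."""
    global head
    for _ in range(epochs):
        for x, y in samples:
            f = extract_features(x)
            err = predict(x) - y
            head = [w - lr * err * fi for w, fi in zip(head, f)]

# A handful of examples collected on the device after deployment.
local_data = [([1, 0], 1.0), ([0, 1], -1.0)]
retrain_head(local_data)
print(predict([1, 0]), predict([0, 1]))
```

Because the expensive feature extractor is fixed, this on-device adaptation costs a tiny fraction of the original cloud training run, which is what makes "tuning up" at the edge plausible.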

There is a concept of transfer learning, where systems already deployed in the field can be updated based on what they’re seeing and what they’re learning. That makes them better adapted for that particular use case. “The edge needs to be adaptable, and you need to be able to adapt the models over time to make them better suited for the application,” adds Fuller. “It is going to be a critical component of this sort of work. It will be a combination of a centralized development, training, and then deployment and field update services.”

The situation is fluid. “The curve will keep bending in one direction or the other,” says Mitra. “The reality is they will collaborate, which is something they don’t do a lot of today. This will enable them to make the networks run better or faster, personalizing the network.”

Will the curve bend back toward the cloud? “The architecture of ‘systems of systems’ has oscillated from centralized processing to distributed processing,” says Bonomi. “The significant benefits of cloud include a critical mass of developers and ease of development. This means that a shift of power back toward the cloud is very possible.”

