Designing For The Edge

Growth in data is fueling many more options, but so far it’s not clear which of them will win.


Chip and system architectures are beginning to change as the tech industry comes to grips with the need to process more data locally for latency, safety, and privacy/security reasons.

The emergence of the intelligent edge is an effort to take raw data from endpoints, extract the data that requires immediate action, and forward other data to various local, regional or commercial clouds. The basic idea is to prioritize what data goes where, what data should be discarded altogether, and what data needs to be analyzed on a broader scale using more sophisticated tools than are available locally.

“Solutions will range from ‘lighter-weight’ designs that look like souped-up endpoints to much more powerful ones that look like those found in cloud data centers,” said Steven Woo, fellow and distinguished inventor at Rambus.

Key to this compute model is a more granular approach to assessing the value of data at any point in time. So for self-driving cars, immediacy is critical for accident avoidance and for analyzing location and traffic flows to determine better routes. On the other hand, a preliminary analysis of how patients respond to certain medical treatments could be passed along to a cloud data center to analyze how patients around the world respond to these treatments and to determine the efficacy of various treatment regimens.

Across this range of applications, there will be a broad range of requirements for processors and memories. That makes defining edge devices—let alone designing them—much more difficult. Much of that will depend on the type of task being performed, Woo said.

Many of these operations will require some level of AI to determine what gets processed where, and that will vary greatly. The amount of AI computing required at the edge ranges from 0.5 TMACs (10¹² multiply-accumulate operations) on smart IoT products, to 3 TMACs on smartphones, 5 TMACs for augmented/virtual reality products, 10 TMACs for surveillance, and tens to hundreds of TMACs for automotive, said Lazaar Louis, senior product marketing group director at Cadence. Processor architects employ several techniques to lower power here, with compute and external memory access being the two main contributors.

“Adding internal SRAM to the SoC allows the processor to store data in SRAM and minimize data access to external memory,” Louis said. “This helps reduce the memory power consumption. Novel techniques like sparse computing help increase performance and reduce compute power by taking advantage of inherent sparsity in neural networks. In addition, network pruning helps increase sparsity in neural networks with minimal loss in accuracy. This allows processors to trade off compute power for acceptable loss in inference accuracy.”
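The pruning technique Louis describes can be illustrated with a short sketch: zero out the smallest-magnitude weights and measure the resulting sparsity. This is a generic magnitude-pruning example in NumPy, not Cadence's implementation; the function name, the random weight matrix, and the 80% sparsity target are all illustrative assumptions.

```python
import numpy as np

def magnitude_prune(weights, sparsity_target):
    """Zero the smallest-magnitude weights until the target sparsity is reached."""
    flat = np.abs(weights).flatten()
    k = int(len(flat) * sparsity_target)          # number of weights to zero
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))                     # stand-in for a layer's weights
pruned = magnitude_prune(w, 0.8)
sparsity = np.mean(pruned == 0)
print(f"sparsity after pruning: {sparsity:.2f}")
```

In a real flow the pruned network would then be fine-tuned to recover accuracy, which is the "minimal loss in accuracy" tradeoff Louis refers to.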

Other key considerations in the design of intelligent edge compute devices include how much processing will be done on the device itself – such as a connected sensor – with the goal of transmitting less data to the cloud and giving the edge more autonomy, versus keeping decision-making in the cloud.

“Let’s say you have a connected sensor actuator,” said Jeff Miller, product marketing manager at Mentor, a Siemens Business. “How autonomous is it? How autonomous can it be? Here, there’s usually a cost balance going on where you say, ‘If I consume more power doing compute, I can spend less power or less money transmitting data back up to the cloud. But then I’m going to have to service and replace batteries more frequently.’ That’s the first axis—just trying to figure out how much processing needs to be done locally in terms of the overall power budget.”
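The cost balance Miller describes can be put in back-of-envelope numbers: compare the energy of shipping raw sensor data to the cloud against processing it locally and transmitting only a summary. Every figure in the sketch below is an illustrative assumption, not a measured value from any particular radio or processor.

```python
# Energy tradeoff: transmit raw data vs. process locally, send a summary.
# All constants are illustrative assumptions for a small battery-powered node.
RAW_BYTES_PER_DAY = 5_000_000       # raw sensor samples
SUMMARY_BYTES_PER_DAY = 5_000       # after local preprocessing
RADIO_J_PER_BYTE = 2e-6             # assumed radio transmit cost
COMPUTE_J_PER_BYTE = 1e-7           # assumed local processing cost

send_raw = RAW_BYTES_PER_DAY * RADIO_J_PER_BYTE
process_then_send = (RAW_BYTES_PER_DAY * COMPUTE_J_PER_BYTE
                     + SUMMARY_BYTES_PER_DAY * RADIO_J_PER_BYTE)

print(f"send raw data:    {send_raw:.2f} J/day")
print(f"process locally:  {process_then_send:.2f} J/day")
```

Under these assumed numbers local processing wins comfortably, but the balance flips if the radio is cheap (e.g. mains-powered Wi-Fi) or the local algorithm is heavy, which is exactly the per-design tradeoff Miller is pointing at.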

The second big question is how will that processing be done. “Am I going to use dedicated compute architectures—full custom logic that implements the specific algorithm that I have in mind for that device, or some kind of structured DSP, GPU or dedicated machine learning processors that are beginning to emerge—or just a general compute or embedded processor that can run anything and use the software to define that behavior? Obviously, you can get much better performance per watt with full custom logic, but there’s a huge amount of design effort that can go into that versus having a CPU. It lets you make all of those processing decisions in firmware that you can publish later,” Miller said.

Some of the machine learning processors that have been developed can handle a range of applications. “If you’ve got an ML algorithm, you’ve trained it in your compute farm, you can download that model and you can update that model in some way, giving you the same kind of flexibility that you have with firmware, but applying something that does a lot better on a performance per watt metric if you’re following that sort of ML type algorithms,” he said.

Many intelligent edge compute designs today are highly specific. The advantage there is highly optimized performance and power. The downside is that the device can do only one thing extremely well. But if a general-purpose design doesn’t work well enough, a more targeted solution might be the only choice.

“It varies a lot,” Miller noted. “Everything is happening out there all at once, but when there’s enough volume or enough constraints, that really pushes people toward making something that really fits that use case.”

The very newness of this application area makes it interesting, but it also makes it confusing because what constitutes the edge for one person may be different than the edge for someone else.

“In server equipment, or networking equipment, there’s a lot of innovation, but it’s a fairly well-defined area,” said Marc Greenberg, product marketing group director at Cadence. “For the edge, it’s still undefined, and that’s why we have trouble finding the definition of where the edge is. We have a lot of high-speed data links and we could choose to take every 4k video from every camera all the way to the central server—but maybe not every one. You have to decide what the tradeoff is. How much of this do I want to do locally? How much of it do I want to send to a central server? How much bandwidth do I have, and how do I do that? How do I do that trade off? What’s the right solution for power, as well? Locally processing the data is probably going to require less energy than shipping that data long distance. But those are still open questions in the industry.”

Geoff Tate, CEO of Flex Logix, defines the edge differently. “In the edge, you may have one camera, which could be a surveillance camera, a camera in your robot or your set-top box, and it’s processing one image. Any architecture that uses large batch sizes to get high throughput is disqualified in the edge. You should be able to do a good job processing one image at a time, which is also known as batch size equals one. To be in an edge device, you’ve got to be single-digit watts. You’re not going to put an Nvidia Tesla T4 card at $2,000 and 75 watts into your surveillance camera because it’s too much power. But the people at the edge want to do real-time. Lots of detection and recognition, and processing bigger images on the tougher models, is what gives them better prediction accuracy. They want to get as much throughput as they can for their watt and dollar budget.”

All of the chips at the edge will have some of the same elements, Tate noted. “Everybody’s going to have multipliers and accumulators. Everyone’s going to have thousands of them and they’re going to all look about the same. There’s no secret way to do multiplier accumulators. The challenge is to keep the multiply accumulate utilized as much as possible because you’re paying for the transistors. If you don’t have them doing useful work, your throughput per dollar is going down. And transistors burn power even when they’re not doing useful work so your throughput per watt goes down.”
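Tate's utilization argument is easy to put into numbers: peak MAC capacity is fixed by the silicon, and power is largely paid whether or not the MACs do useful work, so effective throughput per watt scales directly with utilization. The MAC count, clock, and power figures below are illustrative assumptions, not data for any particular chip.

```python
# Throughput per watt as a function of MAC utilization.
# All constants are illustrative assumptions.
NUM_MACS = 4096        # multiply-accumulate units on the chip
CLOCK_HZ = 1e9         # 1 GHz clock
POWER_W = 5.0          # assumed power draw, roughly constant with utilization

def throughput_per_watt(utilization):
    """Effective MAC operations per second, per watt, at a given utilization."""
    effective_ops = NUM_MACS * CLOCK_HZ * utilization
    return effective_ops / POWER_W

for util in (0.25, 0.5, 0.9):
    gmacs = throughput_per_watt(util) / 1e9
    print(f"utilization {util:.0%}: {gmacs:.0f} GMAC/s per watt")
```

Doubling utilization doubles throughput per watt and per dollar in this simple model, which is why keeping the MAC array fed, rather than the MAC design itself, is where Tate says the architectural differentiation lies.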

Fig. 1: Inferencing at the edge. Source: Flex Logix

This is where programmability becomes important. “Programmable logic has specific applications to the edge,” said Chris Shore, director of embedded solutions, automotive and IoT line of business at Arm. “Especially where the edge is sometimes inaccessible, sometimes inhospitable, the ability to replace or modify functionality without having to physically access the device is key in a ton of applications and in a number of places where we’re going to use this stuff. Whether that’s programmable logic, a programmable learning model, or just the ability to replace the firmware on the device, remote upgradeability is absolutely key to a ton of applications.”

But there is a tradeoff there, as well, particularly around cost and raw performance. “One of the things that people have used FPGAs for in the past is applications where the standards may change over time,” noted Greenberg. “There may be a new standard, for example, for the way a particular piece of data is transmitted, or perhaps encryption. That might be a good use of programmable logic in that application.”

Leveraging existing compute resources
Regardless of how people define the edge, most agree there is a tremendous amount of untapped potential and an ability to utilize existing compute resources more effectively.

“People seem to be instantly rushing to achieve as much as possible at the edge,” Shore observed. “Everyone is now talking about ML at the edge and AI at the edge. I’m not quite convinced that we’re ready for that yet. We don’t have the hardware in place to do it, and I’m not convinced that people have the right solutions to do that kind of thing in what we’re looking at, which is a sort of microcontroller envelope—in terms of power, in terms of size, in terms of cost. I don’t think anyone’s really got that solution in place yet. People are neglecting that there’s an awful lot you can do with what we have in place right now that moves us a long way toward where we want to be. Further, those platforms of the future—all of the AI-at-the-edge that we’re all talking about—will come, because there are huge numbers of companies and people working on that. It will happen, and there are really exciting developments in it. But there’s a lot that we can do right now with devices that we already have at the edge.”

Specifically, a lot of the sensors at the edge already have been digitized, Shore said. “If you look at something like health monitors and these sorts of things, they already have a significant amount of compute capability built into the device. Even something like a water monitor that a farmer would carpet bomb his fields with is going to have a usable amount of compute built into the sensor, and you can use that to make the system more efficient and commercially more useful in terms of the preprocessing it can provide.”

The same is true for smart watches and thermostats, he said. “A Nest thermostat is capable of doing far more than just running your central heating, but the rest of that compute power is woefully under-used. It could be used to control the lighting and all kinds of other stuff around the house. For want of interoperability standards and the ability to share that compute with other devices and other use cases, that’s just not happening. That’s a shame because that device sits there most of the time on my wall and it’s not even battery powered, so there’s no excuse. This so often happens at the beginning of a rollout of a new way of doing things. Everybody’s trying to do it different ways. Everybody uses different standards, different ways of communicating, different communication protocols, different data standards. That means it’s very difficult to make them cooperate.”

The edge as a concept has been around for decades. What’s new is the idea that more intelligent computing can be done in a range of devices, including figuring out which data to process locally, what to process somewhere else and when and where to move it.

This has set off a giant scramble to both define and control various pieces of the edge, or at least to carve out a lucrative niche. For the foreseeable future, it appears there will be plenty of choices and lots of opinions about how best to apply technology for a particular application, but at this point it’s too early to tell how or when all of this will shake out. This is a brand new opportunity, and so far no one owns it.

