Accelerating Endpoint Inferencing

Machine learning, with the correct hardware infrastructure, may soon reach endpoints.


Chipmakers are getting ready to debut inference chips for endpoint devices, even though the rest of the machine-learning ecosystem has yet to be established.

Whatever infrastructure does exist today is mostly in the cloud, on edge-computing gateways, or in company-specific data centers, which most companies continue to use. For example, Tesla has its own data center. So do most major carmakers, banks, and virtually every Fortune 1000 company. And while some processes have been moved into public clouds, the majority of data is staying put for privacy reasons.

Still, something has to be done to handle the mountain of data heading their way. Endpoint sensors in cameras, medical devices and cars are starting to generate so much data that it’s impossible to process everything remotely. Shipping that data to a centralized processing facility costs too much money and bandwidth, and it takes too long. It also is too power-intensive, and it can introduce security risks.

This is why a half-dozen new endpoint chips and chipsets are being introduced this year. But endpoint device designers are struggling to understand—and invent—ways to achieve the promised boost in inferencing performance while staying within the device’s power budget. It may be slow going at first.

An endpoint device is the machine equivalent of an end user. In simple terms, it’s a node connected at the outermost edge of the network whose primary function is unrelated to data networking. An endpoint device produces data to be communicated upstream for other devices to store, manage or process.

A classic example is a smart security camera that is set to constantly monitor a door, yard or other vulnerable area. Most of these devices can identify movement or a significant change between images taken a fraction of a second apart. An inference-enabled version might have enough onboard processing power to identify whether a door is open or closed, and to use authorization badges or facial recognition to determine whether the person opening it has the right to do so. In contrast, sending video to the cloud for analysis takes significantly more time.

Chips aimed at each tier of that hierarchy, from the data center to the endpoint, are distinctly different, according to Kurt Shuler, vice president of marketing at Arteris IP. Chips designed for the data center can run at much higher power levels than chips in the networking infrastructure, which often are labeled fog or edge networks. SoCs designed for automotive assisted-driving systems are complex, multi-chip designs “with a significant power budget, but that still can’t be burning more than about 300 watts without causing cooling problems,” Shuler said.

Not to be confused with the training side of AI/machine learning, inferencing applies an already-trained model to real-world data. “Neural networking traditionally has been considered a high-performance computing platform because of the amount of resources required, especially for training,” said Tom Hackenberg, principal analyst for embedded processor technology at IHS Markit. “But AI training is still largely an MPU-based application—in the data center, but housed mostly in server farms where it is still pretty rare for servers to be fitted with GPUs unless they’re doing graphics. The bulk of inference is being done in the data center, but less than 5% of servers in the market have either a graphics accelerator or another form of accelerator.”

Chipmakers will have to start slowly to adapt the technologies developed in the billion-dollar data center inference market to an endless variety of endpoint devices, according to Geoff Tate, CEO of Flex Logix.

“The very first devices are starting to come up where you need single-digit watts and maybe double-digit-dollar chips, and they’re trying to do things like object detection and recognition,” Tate told Semiconductor Engineering in a recent video. “We don’t even know how big this market is going to be. This is where all the data will be produced that will eventually be sent up to the Internet data centers. Some of this has to be processed locally because there’s too much to move.”

That means an endpoint video camera someday will have to be able to handle 2-megapixel images at 30 frames per second—a flow multiplied by a population of cameras Tate and others estimate in the hundreds of millions worldwide.
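To put that figure in perspective, the raw data rate can be estimated with simple arithmetic. The sketch below is only a back-of-the-envelope calculation: it assumes uncompressed 24-bit RGB pixels and uses 100 million cameras as a stand-in for "hundreds of millions." Both numbers are illustrative assumptions, not figures supplied by Tate.

```python
# Back-of-the-envelope estimate of uncompressed video data rates.
# Assumptions (not from the article): 3 bytes per pixel (24-bit RGB)
# and a hypothetical fleet of 100 million cameras.

def raw_bytes_per_second(megapixels, fps, bytes_per_pixel=3):
    """Return the uncompressed data rate of one camera in bytes per second."""
    return megapixels * 1_000_000 * fps * bytes_per_pixel

per_camera = raw_bytes_per_second(megapixels=2, fps=30)  # one 2 MP, 30 fps camera
fleet = per_camera * 100_000_000                         # hypothetical 100M cameras

print(f"Per camera: {per_camera / 1e6:.0f} MB/s ({per_camera * 8 / 1e9:.2f} Gbit/s)")
print(f"Fleet of 100M cameras: {fleet / 1e15:.0f} PB/s")
```

Video compression would cut these numbers substantially, but even a compressed stream multiplied across a fleet of that size illustrates why much of the processing is expected to happen locally.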

“And remember, data centers can be the size of multiple football fields, so it’s possible to aggregate workloads and make the architecture more efficient,” Tate said. “If you’re at the edge, you’ve got one camera. You’ve got to process images one at a time, and a lot of current architectures don’t do that well.”

Much of this will depend upon the speed at which sensors are added into electronics. “Everyone we talked to says if adoption [of endpoint devices] is broad, that there’s just not enough bandwidth to get it into the data center. Processing has to be where the data is generated,” Tate said. “If you’re doing inferencing, there is no reason to go outside of wherever a device is. If you look at companies like WalMart or Wells Fargo, they already have cameras. What they’re looking for now is more detailed analytics.”

That requires more computation inside or close to the cameras. Robust support for multiply/accumulate operations is critical, as is good support for visual and audio recognition, which are the functions in highest demand for inference clients.
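For readers unfamiliar with the term, a multiply/accumulate (MAC) operation multiplies two numbers and adds the product to a running total. The following is a minimal sketch of why MAC throughput matters for vision workloads. The layer dimensions are hypothetical and chosen only for illustration; they do not describe any chip or model mentioned in this article.

```python
# A multiply/accumulate (MAC) computes acc = acc + a * b.
# Neural-network layers reduce to long chains of these operations.

def dot_mac(weights, inputs):
    """Dot product written as an explicit chain of MAC operations."""
    acc = 0.0
    for w, x in zip(weights, inputs):
        acc += w * x  # one MAC per weight/input pair
    return acc

def conv_layer_macs(out_h, out_w, out_ch, in_ch, k):
    """MAC count for one k x k convolution layer producing out_h x out_w outputs."""
    return out_h * out_w * out_ch * in_ch * k * k

# Hypothetical layer: 3x3 kernel, 3 input channels, 32 output channels,
# 224x224 output feature map.
macs = conv_layer_macs(out_h=224, out_w=224, out_ch=32, in_ch=3, k=3)

print(dot_mac([0.5, -1.0, 2.0], [4.0, 3.0, 1.0]))
print(f"{macs:,} MACs per frame, {macs * 30:,} per second at 30 fps")
```

Even this single small layer needs tens of millions of MACs per frame, which is why dedicated MAC hardware is a recurring theme in inference accelerators.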

“There is no roadmap here,” Tate said. “This has never been done at this level. The edge has never really existed as a concept other than as just sort of a client-server type approach.”

Defining terms
Still, there is plenty of confusion about what this space is, what architectures will work best, and even how to delineate the various pieces within it. While endpoint devices and cloud computing are in widespread use these days, there is a vast space between them that so far no one owns. It has been called the edge, the fog, and the midrange, and at various times both endpoint devices and corporate data centers have been lumped in with it.

Some people define the edge as anything that is not the cloud. But there also are edge servers and edge clouds, at least in concept.

“There is a whole range of things that people are calling the edge, from a mini-data center to a microcontroller-based object,” said Rob Aitken, an Arm fellow. “There will continue to be more differentiation among those, but the challenge in the definition is which models have legs going forward.”

Ty Garibay, vice president of engineering at Mythic, looks at it from the standpoint of the power source, because a battery-powered edge device is different from a device with a plug or an internal combustion engine. “Everyone agrees it’s not possible for all of these devices to stream video to the cloud, and processing at the edge has to be capable of handling streaming video and do that in a thermal and power envelope. And if you’re a system-level customer, you’ll probably craft a solution that is part of a private network.”

Charlie Janac, CEO of Arteris IP, distinguishes endpoint and cloud technology based on storage. “If you think about a car, which is end-point-like, there is a minimal amount of storage. That means you need to do all or most of the processing on board. All of the big storage is in the cloud.”

And Raik Brinkmann, CEO of OneSpin Solutions, views the edge as starting with the sensor and ending with a communications link, which probably is wireless. “The trend for people doing edge devices is to include multiple levels of AI. So a simple AI algorithm may detect movement, which powers up the next stage, which may switch to recognition. And if that’s interesting, then it will power up the real computation engine that does something.”
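The multi-level approach Brinkmann describes can be sketched as a cascade in which each cheap stage decides whether to wake the next, more expensive one. The code below is a toy illustration only. The stage functions, thresholds and the flat lists of pixel values are hypothetical placeholders, not a description of any vendor's design.

```python
# Staged ("cascaded") inference: cheap stages gate expensive ones.
# All thresholds and stage functions are hypothetical placeholders.

def motion_detected(prev_frame, frame, threshold=10.0):
    """Stage 1: mean absolute pixel difference between consecutive frames."""
    diff = sum(abs(a - b) for a, b in zip(prev_frame, frame)) / len(frame)
    return diff > threshold

def looks_interesting(frame, min_brightness=128):
    """Stage 2: stand-in for a small recognition model."""
    return max(frame) >= min_brightness

def full_analysis(frame):
    """Stage 3: stand-in for the large, power-hungry computation engine."""
    return {"peak": max(frame), "mean": sum(frame) / len(frame)}

def process(prev_frame, frame):
    """Run each stage only if the previous one fires; report the stage reached."""
    if not motion_detected(prev_frame, frame):
        return "idle", None
    if not looks_interesting(frame):
        return "motion_only", None
    return "analyzed", full_analysis(frame)

still = [20, 20, 20, 20]
dim_motion = [40, 45, 35, 40]
bright_motion = [200, 180, 30, 90]
for frame in (still, dim_motion, bright_motion):
    print(process(still, frame)[0])
```

In a design along these lines, the expensive stage runs only on the small fraction of frames that pass the earlier checks, which is what keeps average power low.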

There are so many possibilities and permutations, though, that confusion is likely to persist for the foreseeable future, with terminology evolving alongside the technology.

Demand starts to get real supply
None of that diminishes the vibrancy of this technology shift, however. Demand for machine-learning apps has skyrocketed since Apple announced in 2017 that it would put a “neural engine” in the iPhone X to improve the performance of Face ID. Every other phone maker had to follow suit, which has helped drive demand for sophisticated machine-learning-driven capabilities since then. By 2023, just under half of all servers and 1.9 billion client devices will contain deep-learning accelerator chips, according to the Linley Group’s 2019 Guide to Processors for Deep Learning.

Most new high-end smartphones have an AI accelerator, including Apple’s A11 and A12 processors, Samsung’s Exynos 9810, Huawei’s Kirin 970 and 980, Qualcomm’s Snapdragon 845 and 855 and MediaTek’s Helio P90 with the Cadence P6 neural engine, according to the Linley report.

Inference accelerators have also shown up in the voice assistants used in smart speakers and other smart-home devices, as well as home security cameras, the Ring smart doorbell, consumer drones and industrial IoT devices including smart parking meters, parking sensors, smart vending machines and building-environment monitors.

The promise of a machine-learning-enabled future has inspired a raft of inference-specific product development from existing vendors. It also has drawn so many startups and spinoffs into the inference end of the machine-learning market that it is hard to keep a list of more than 90 startups up to date, according to Arteris IP’s Shuler.

Nvidia announced a comparatively low-priced line of inference engines designed for servers and gateways on the periphery of the network. Intel announced a series of down-powered Xeons and a line of 10nm FPGA chips designed for small computers or endpoint devices.

The eFPGA and IP provider Flex Logix announced InferX X1, a power-efficient, image-optimized inference co-processor designed to add machine-learning power to edge gateways, low-end servers and other high-performance edge devices.

Some day a big deal
It is easy to overestimate how big a deal machine learning really is, however.

“Sales growth for semiconductors optimized for AI applications is in triple-digit percentages compared to most non-specialized semiconductor products that are in the low single digits,” said IHS Markit’s Hackenberg. “That is growth from a small base, though—less than 1% of the market in many applications. It’s higher in the automotive market for ADAS, but the majority of control systems in automotive rely on plain old MCUs, not inference accelerators, for things other than the ADAS.”

“There is a lot of hype and a lot of confusion because we are dealing with a real revolutionary technology that has enabled a lot of new use cases and allowed us to do things in ways that are more simple, but which had been more complicated,” according to Noam Mizrahi, vice president of technology in the CTO office at Marvell. “This is an exciting technology that will really change us and the way we do things very efficiently. We are all very excited, but no matter what it is, one technology is not the answer to everything.”

What exactly AI/machine learning is good for, and what it is not good for, remain open research questions. “Everyone is trying to do everything with machine learning and assuming it will solve all the problems, but I’m not sure that everything you need to deploy really needs a machine learning algorithm,” Mizrahi said.

Data heavy
One potential advantage of running inference on an endpoint device is that the devices themselves can help solve problems such as the flood of data that comes from connecting too many devices and asking them to communicate.

“Security cameras collect a lot of data that needs to be processed down and matched with a specific set of actions,” said Jeff Miller, product marketing manager at Mentor, a Siemens Business. “Is the door open? Is the door closed? Is there a person there? Security cameras are actually one of the areas I have a lot of privacy concerns in sending everything to the cloud. An edge-processing device can mitigate those concerns because then the camera can process the data locally, run whatever processes you need to safeguard it, and send relevant information to an application in the cloud.”

It’s not clear, however, how mature the ability to identify incoming data and decide what to do with it really is. “If you drive all day, you arrive home and have to do something with tons of information that has been collected by your car at your house,” Mizrahi said. “This information needs to be uploaded so you can extract the value from it, but directly uploading hundreds of gigabytes of data is probably not the most efficient way. If you can replace the speed of analysis with cost, you can use a background processing system. You don’t need a very big engine. You can run the process in the background or on an IoT device, or in the car or other engine component.”
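A rough calculation suggests why a direct upload is unattractive. The sketch below assumes 200 GB collected in a day, a 20 Mbit/s home uplink, and local processing that keeps 1% of the data as relevant. All three values are hypothetical, picked only to make the arithmetic concrete.

```python
# Rough upload-time estimate for a day's worth of vehicle data.
# Assumptions (hypothetical): 200 GB collected, 20 Mbit/s home uplink,
# and local inference that keeps 1% of the data as relevant.

def upload_hours(gigabytes, uplink_mbit_per_s):
    """Hours needed to upload `gigabytes` over a link of the given speed."""
    bits = gigabytes * 1e9 * 8
    return bits / (uplink_mbit_per_s * 1e6) / 3600

raw_hours = upload_hours(200, 20)              # everything, unfiltered
filtered_hours = upload_hours(200 * 0.01, 20)  # only the retained 1%

print(f"Raw upload: {raw_hours:.1f} h; after local filtering: {filtered_hours * 60:.0f} min")
```

Under these assumptions the unfiltered upload takes most of a day, so a slow, low-cost background process that reduces the data first could still come out ahead.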

Machine learning is spreading slowly but relentlessly through the computing ecosystem, yet it doesn’t create the same advantages for every person with every device.

“AI is a fast-growing market and it’s very likely we will see most inferencing and training handed off to some kind of co-processor or edge device, in the near future,” IHS Markit’s Hackenberg said. “Every processor vendor will have some form of inference accelerator. But requirements for AI applications are not one size fits all. Many are in applications that work well on standard scalar applications.”

The greatest potential of any machine-learning effort is not that the computer will automatically learn how to imitate human decisions or behavior, Mizrahi said. The real value would come from having an ML system devise a uniquely beneficial solution that no human would ever have thought of.

Machine learning has so much potential that even a clunky, brute-force application of its capabilities can be remarkable, Mizrahi said. “There is a lot of value in the process; in being able to connect all the dots to create a clear picture. Humans can do that to a certain level; I expect we will get a lot more out of a machine that can do the same thing at very large scale.”

—Ed Sperling contributed to this report.
