AI Inference: Pools Vs. Streams

Whether data needs to be processed immediately makes a big difference to inference implementation.


Deep Learning and AI Inference originated in the data center and were first deployed in practical, volume applications there. Only recently has Inference begun to spread to Edge applications (anywhere outside of the data center).

In the data center, much of the data to be processed is a “pool” of data. For example, when you see your photo album tagged with all of the pictures of your dog or your family members, that was done by an inference program that ran “in the background” processing your photos. The data is available in large chunks and nobody is waiting for the results, so it can be processed in large batches to maximize throughput/$.

At the edge, the data to be processed typically comes from a sensor (most often a camera, but also LIDAR, radar, medical imaging devices, and others).

The most common sensor is a camera, which typically captures megapixel-sized images at 30 frames/second.

Data at the edge therefore arrives in streams and typically needs to be processed in real time, so latency becomes very important.

Streaming inference example

Let’s consider an example: a 2 Megapixel camera capturing 30 frames/second.

So a new frame is available every 33 milliseconds.

In a typical application there are 3 steps in the processing pipeline:

  1. Image processing to clean up the image (adjust contrast, remove glare, etc.)
  2. AI Inference processing
  3. Action on the results of the inference

For example, if the application is autonomous driving, all 3 steps must be completed in a very short time in order to detect and avoid hitting objects like pedestrians or cars.

Different applications will have different needs based on what they are doing and their power/cost/size constraints.

Let’s consider an application that is running a YOLOv3 neural network model which detects and recognizes objects.

Consider the Nvidia Xavier NX, which has 3 processing units that can run the model:

  • The GPU which can process a 2 Megapixel (1440×1440) frame in 95 milliseconds
  • Two DLAs (Deep Learning Accelerators) which can each process a 2 Megapixel frame in 290 milliseconds

What is the throughput of the Nvidia Xavier NX for YOLOv3?

For a “Pool” application it is as follows:

  • The GPU can process 10.5 frames/second (1000 milliseconds divided by the 95 ms latency)
  • Each of the 2 DLAs can process 3.4 frames/second (1000 milliseconds divided by 290 ms latency)
  • The sum is 17.3 frames/second: this is the Pool or Large Batch throughput
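
As a minimal sketch (not a benchmark), here is the same large-batch arithmetic in Python, using the GPU and DLA latencies quoted above:

```python
# Pool (large-batch) throughput: every execution unit can be kept busy,
# so the aggregate throughput is the sum of 1/latency over all units.
# Latencies (ms) are the YOLOv3 figures quoted above for the Xavier NX.
LATENCIES_MS = {"GPU": 95, "DLA1": 290, "DLA2": 290}

pool_fps = sum(1000 / latency for latency in LATENCIES_MS.values())
print(f"Pool throughput: {pool_fps:.1f} frames/second")
# prints ~17.4; the 17.3 above comes from rounding each term before summing
```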

What about for a Streaming application? A new image arrives every 33 milliseconds. The images must be processed in order so the results can be acted on sequentially. For example, if tracking a pedestrian, you must first detect the pedestrian when they come into sight, then track them frame by frame as they move. The Xavier NX cannot keep up with 30 frames/second. It cannot even keep up with 15 frames/second, despite the “Pool” throughput of 17 frames/second above.

Let’s break it down, assuming we process every 2nd image, so a frame arrives for processing every 67 ms.

Image 0: arrives at 0 msec, dispatched to the GPU, processed by 95 msec

Image 1: arrives at 67 msec, GPU is busy, so dispatched to DLA1, processed by 67 + 290 = 357 msec

Image 2: arrives at 133 msec, GPU is available, processed by 133 + 95 = 228 msec

Image 3: arrives at 200 msec, GPU is busy, DLA1 is busy, dispatched to DLA2, processed by 200 + 290 = 490 msec

Image 4: arrives at 267 msec, GPU is available, processed by 267 + 95 = 362 msec

So you can see the images are being processed out of sequence (image 2 finishes before image 1, and image 4 before image 3), which is not acceptable for the application.
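
A minimal simulation sketch of this greedy dispatch (the dispatch-to-first-idle-unit policy is an assumption for illustration, not Nvidia's actual scheduler) reproduces the timeline above and makes the out-of-order completions visible:

```python
# Sketch: greedily dispatch every 2nd frame of a 30 fps stream to the first
# idle execution unit, using the per-frame latencies quoted in the article.
UNITS = {"GPU": 95, "DLA1": 290, "DLA2": 290}   # latency in ms
CAMERA_FPS = 30
STRIDE = 2                                      # process every 2nd frame

def simulate(num_frames):
    free_at = {name: 0.0 for name in UNITS}     # time each unit becomes free
    for i in range(num_frames):
        arrival = i * STRIDE * 1000 / CAMERA_FPS
        for name, latency in UNITS.items():     # GPU is checked first (it is fastest)
            if free_at[name] <= arrival:
                free_at[name] = arrival + latency
                print(f"Image {i}: arrives {arrival:.0f} ms -> {name}, done at {free_at[name]:.0f} ms")
                break
        else:
            print(f"Image {i}: arrives {arrival:.0f} ms -> all units busy")

simulate(5)   # completion times: 95, 357, 228, 490, 362 ms -> out of order
```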

If we instead choose to process 10 frames/second (every 3rd frame), images arrive every 100 msec, so the GPU with its 95 msec latency can keep up. So the Streaming Throughput of the Xavier NX for this application is 10 frames/second.
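
Here is a small sketch of that check, assuming the only options are to process every Nth frame of the 30 fps camera stream:

```python
# Sketch: find the highest frame rate (processing every Nth frame of a 30 fps
# camera) that a single execution unit can keep up with in order, i.e. where
# the frame interval is at least the unit's per-frame latency.
CAMERA_FPS = 30

def max_streaming_fps(latency_ms):
    for stride in range(1, CAMERA_FPS + 1):        # process every Nth frame
        interval_ms = stride * 1000 / CAMERA_FPS
        if interval_ms >= latency_ms:
            return CAMERA_FPS / stride
    return 0.0

print(max_streaming_fps(95))    # GPU -> 10.0 frames/second (every 3rd frame)
print(max_streaming_fps(290))   # DLA -> ~3.3 frames/second (every 9th frame)
```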

Conclusion

Streaming throughput = the inverse of the latency of the execution unit that can “keep up.”

AI Inference is very new to most of us, so it is easy to get confused. Large batch-size throughputs sound very impressive, but for edge applications processing is done on Streams with a batch size of 1, and execution latency is what matters. Streaming throughput is the inverse of execution latency.


