Why the size of images being processed changes how accelerators should be assessed.
Customers considering applications for AI inference want to evaluate multiple inference accelerators.
As we discussed last month, TOPS do NOT correlate with inference throughput, and you should use real neural network models to benchmark accelerators.
So is ResNet-50 a good benchmark for evaluating the relative performance of inference accelerators?
If your application is going to process small images (300×300 pixels or smaller), then ResNet-50 batch=1 throughput may be a good way to evaluate your options, even though no one actually deploys ResNet-50 in a real-world application. It exercises a range of operations that stress compute, and while the small image size doesn't stress the memory subsystem, that's fine if your model will only see small images.
But if you plan to process large images (608×608 up to megapixels), ResNet-50 batch=1 throughput is not necessarily predictive of performance on neural network models that process much larger images. (Remember, a 608×608 image has about four times as many pixels as a 300×300 image; a 1440×1440 = 2-megapixel image has about 23 times as many.)
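Those ratios follow directly from pixel counts, which is what early-layer activation sizes scale with. A quick sketch of the arithmetic (nothing here is measured; it is just the pixel math for the image dimensions quoted above):

```python
# Pixel counts grow with the square of the image dimension, which is why
# large-image models carry much bigger activations than ResNet-50 at 224x224
# or 300x300.
base = 300 * 300
for dim in (300, 608, 1440):
    pixels = dim * dim
    print(f"{dim}x{dim}: {pixels:>9,} pixels "
          f"({pixels / base:.1f}x a 300x300 image, ~{pixels / 1e6:.1f} MP)")
```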
Why?
Neural network models consist of code (the layers and operators to execute), weights, and intermediate activations, all of which have to be stored somewhere.
So larger, tougher models are harder to process not just because they require more compute, but because they have much larger memory requirements.
Weights take a lot of room: 22.7MB for ResNet-50 INT8 and 62MB for YOLOv3 INT8. The expectation is that future, better models will be even bigger, with more weights.
Intermediate activation storage: the largest activation for ResNet-50 is 0.8MB and the next largest is 0.4MB, for 224×224 images. So a buffer memory of 1.5-2MB is sufficient at batch size = 1.
Intermediate activation storage for YOLOv3 can be much larger, depending on the size of the image: roughly 18MB for 608×608 images and more than 100MB for 2-megapixel images.
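To see why the activation footprint balloons with resolution, consider a single early convolution layer: its output is roughly height × width × channels bytes at INT8, so it grows quadratically with image dimension. The sketch below is only an illustration; the 32-channel count (as in YOLOv3's first convolution block) and INT8 storage are assumptions, not a full YOLOv3 memory model.

```python
# Illustrative estimate of one layer's activation storage: H x W x C bytes
# at INT8. The channel count is an assumed example value.
def activation_mb(dim, channels, bytes_per_elem=1):
    return dim * dim * channels * bytes_per_elem / 1e6

for dim in (224, 608, 1440):
    print(f"{dim}x{dim}, 32 channels: ~{activation_mb(dim, 32):.1f} MB "
          f"for a single activation tensor")
```

At 608×608 this is already on the order of 12MB for a single tensor, roughly consistent with the ~18MB working set quoted above once more than one tensor has to be live at a time.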
Some edge inference accelerators have only on-chip SRAM and will at some point run out of room to store everything. ResNet-50 may run fast because it fits in SRAM, but YOLOv3 likely will not unless the on-chip SRAM is huge, which makes the chip very expensive.
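A rough way to see the SRAM problem is to add the weight and activation figures quoted above and compare them against a few on-chip SRAM capacities. The capacities in the sketch below are assumed example values, not figures for any particular chip:

```python
# Rough fit check using the figures quoted in the text (INT8 weights plus a
# peak activation working set). SRAM capacities are assumed example values.
footprints_mb = {
    "ResNet-50 @ 224x224": 22.7 + 2.0,    # weights + ~2 MB activation buffer
    "YOLOv3 @ 608x608":    62.0 + 18.0,   # weights + ~18 MB activations
    "YOLOv3 @ 2 MP":       62.0 + 100.0,  # weights + >100 MB activations
}

for sram_mb in (16, 32, 64, 128):
    print(f"\nOn-chip SRAM = {sram_mb} MB")
    for model, need_mb in footprints_mb.items():
        verdict = "fits on-chip" if need_mb <= sram_mb else "spills to DRAM"
        print(f"  {model}: ~{need_mb:.0f} MB -> {verdict}")
```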
Code size is significant as well, but no one is disclosing this information for their chips yet.
The chart below shows the total megabytes required in an inference chip for weights and activations for ResNet-50 and YOLOv3 at various image sizes.
There are three main choices for implementing the memory system of an AI inference chip, and most chips combine two or three of them in different ratios.
When the memory requirements exceed on-chip SRAM, benchmarks will reveal how well the chip's architecture and software handle DRAM traffic: does it bog down performance, or is it pipelined and "hidden" behind other transactions so the impact on compute is minimized?
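One way to reason about whether DRAM traffic can be hidden is to compare, per layer, the time to move the data against the time to compute the layer; if the transfer for the next layer finishes before the current layer's compute does, it can be overlapped. The sketch below is back-of-the-envelope only; the TOPS rating, utilization, bandwidth, and per-layer numbers are all assumed example values.

```python
# Back-of-the-envelope overlap check: can the DRAM transfer for one layer be
# prefetched (hidden) while another layer computes? All numbers below are
# assumed example values, not measurements of any particular chip.
def layer_times_ms(macs, dram_bytes, tops=16, utilization=0.5, dram_gb_s=25):
    compute_ms = (macs * 2) / (tops * 1e12 * utilization) * 1e3  # 2 ops/MAC
    transfer_ms = dram_bytes / (dram_gb_s * 1e9) * 1e3
    return compute_ms, transfer_ms

# Hypothetical large conv layer: 3 GMACs of work, 30 MB fetched from DRAM.
compute_ms, transfer_ms = layer_times_ms(macs=3e9, dram_bytes=30e6)
print(f"compute {compute_ms:.2f} ms vs DRAM transfer {transfer_ms:.2f} ms")
print("transfer can be hidden behind compute" if transfer_ms <= compute_ms
      else "DRAM is the bottleneck: compute stalls waiting for data")
```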
ResNet-50 is a popular benchmark, and it is fine for comparing inference accelerators if you plan to process small images.
But it won't stress the memory subsystem the way a model like YOLOv3 running on near-megapixel images will.
So don't use ResNet-50 to compare accelerators if you want to process near-megapixel and megapixel images; YOLOv3 is a good alternative to consider.
We need more empirical measurements to map out the behavior of raw models as well as integrated application performance; there are too many models, parameters, and compute engines for individual research and product teams to evaluate on their own.