ResNet-50 Does Not Predict Inference Throughput For MegaPixel Neural Network Models

Why the size of images being processed changes how accelerators should be assessed.


Customers evaluating applications for AI inference want to compare multiple inference accelerators.

As we discussed last month, TOPS do NOT correlate with inference throughput, and you should use real neural network models to benchmark accelerators.

So is ResNet-50 a good benchmark for evaluating relative performance of inference accelerators?

If your application is going to process small images (300×300 pixels or smaller), then ResNet-50 batch=1 throughput may be a good way to evaluate your options, even though no one actually uses ResNet-50 in a real-world application. It exercises a range of operations that stress compute, and while the small image size doesn’t stress the memory subsystem, that’s fine if your model will only ever see small images.

But if you plan to process large images (608×608 up to megapixels), then ResNet-50 batch=1 throughput is not necessarily going to be predictive of performance on neural network models processing much larger images. (Remember, 608×608 has about FOUR times as many pixels as 300×300; 1440×1440 ≈ 2-megapixel images have about 23 times as many.)
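The pixel-count arithmetic is easy to check for yourself (exact ratios, assuming square images):

```python
def pixels(side):
    """Total pixels in a square side x side image."""
    return side * side

small = pixels(300)   # 90,000 pixels
yolo = pixels(608)    # 369,664 pixels
mpix = pixels(1440)   # 2,073,600 pixels (~2 megapixels)

print(f"608x608   vs 300x300: {yolo / small:.1f}x the pixels")  # ~4.1x
print(f"1440x1440 vs 300x300: {mpix / small:.1f}x the pixels")  # ~23.0x
```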


Neural network models consist of:

  • An input image size and depth (larger images give higher-accuracy predictions, just as we humans can recognize large, crisp images better than small, fuzzy ones)
  • Dozens to hundreds of layers, each of which takes the output of the previous layer and generates the input to the next layer (activations)
  • Weights for all of the layers
  • The code to run them on the accelerator

So larger, tougher models are harder to process not just because of higher compute but because of much larger memory requirements.

How much memory capacity is required?

Weights take a lot of room: 22.7MB for ResNet-50 INT8 and 62MB for YOLOv3 INT8. The expectation is that future, better models will be bigger, with more weights.

Intermediate activation storage: the largest activation for ResNet-50 is 0.8MB and the next largest is 0.4MB; this is for 224×224 images. So a buffer memory of 1.5–2MB is sufficient for batch size = 1.

Intermediate activation storage for YOLOv3 can be much larger depending on the size of the image. YOLOv3 requires intermediate activation storage of ~18MB for 608×608 images and >100MB for 2Megapixel images.
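A back-of-the-envelope way to see why: an INT8 activation map is just height × width × channels bytes. The ResNet-50 shape below is its real stem-conv output; the full-resolution 32-channel layer is a YOLOv3-style early layer used here as an illustration:

```python
def activation_mb(h, w, channels, bytes_per_elem=1):
    """Memory for one feature map: H x W x C elements (INT8 = 1 byte each)."""
    return h * w * channels * bytes_per_elem / 1e6

# ResNet-50's largest activation at 224x224 input: 112x112x64 after the stem conv.
print(activation_mb(112, 112, 64))    # ~0.8 MB, matching the figure above

# A YOLOv3-style early layer keeps full resolution with 32 channels:
print(activation_mb(608, 608, 32))    # ~11.8 MB for a single feature map
print(activation_mb(1440, 1440, 32))  # ~66 MB at 2 megapixels
```

A single such buffer already dwarfs ResNet-50's entire activation footprint, and a real network keeps more than one alive at a time.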

Some edge inference accelerators that only have SRAM will at some point run out of room to store everything. ResNet-50 might run fast because it fits in SRAM – but YOLOv3 likely will not unless the on-chip SRAM is huge, making the chip very expensive.

Code size is significant, but no one is disclosing this information yet for their chips.

The chart below shows the total megabytes required in an inference chip for weights and activations for ResNet-50 and YOLOv3 at various image sizes.
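As a rough model (an assumption on my part: weights stay fixed while activation storage scales linearly with pixel count, anchored to the ~18MB-at-608×608 figure above), the totals can be tabulated like this:

```python
# Rough YOLOv3 memory totals, assuming activations scale with pixel count.
YOLO_WEIGHTS_MB = 62.0   # INT8 weights, from the text above
ACT_MB_AT_608 = 18.0     # intermediate activations at 608x608, from the text above

for side in (608, 1024, 1440):
    act = ACT_MB_AT_608 * (side * side) / (608 * 608)
    print(f"{side}x{side}: weights {YOLO_WEIGHTS_MB:.0f} MB + "
          f"activations ~{act:.0f} MB = ~{YOLO_WEIGHTS_MB + act:.0f} MB")
```

The 1440×1440 row lands at roughly 100MB of activations, consistent with the ">100MB" figure quoted above.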

There are 3 choices for memory system implementation for AI inference chips. Most chips will have a combination of 2 or 3 of these choices in different ratios:

  1. Distributed local SRAM – a little less area efficient since overhead is shared across fewer bits, but keeping SRAM close to compute cuts latency, cuts power, and increases bandwidth.
  2. Single bulk SRAM – a little more area efficient, but moving data across the chip increases power, increases latency, and makes the single SRAM the performance bottleneck.
  3. DRAM – much cheaper cost per bit but the number of bits is likely way more than is needed; the power is significantly higher than SRAM access; and the cost on the controller to access the DRAM with high bandwidth is very significant.

When the memory requirements exceed on-chip SRAM, the benchmarks will reveal how well the chip architecture and software handle DRAM traffic: does it bog down performance, or is it pipelined and “hidden” behind other transactions to minimize the impact on compute?


ResNet-50 is a popular benchmark, and it is fine for comparing inference accelerators if you plan to process small images.

But it won’t stress the memory subsystem the way a megapixel model like YOLOv3 will.

So don’t use ResNet-50 to compare accelerators if you want to process near-megapixel and megapixel images. YOLOv3 is a good alternative to consider.


