Why the size of images being processed changes how accelerators should be assessed.
Customers considering applications for AI inference want to evaluate multiple inference accelerators.
As we discussed last month, TOPS do NOT correlate with inference throughput, and you should use real neural network models to benchmark accelerators.
So is ResNet-50 a good benchmark for evaluating the relative performance of inference accelerators?
If your application is going to process small images (300×300 pixels or smaller), then ResNet-50 batch=1 throughput may be a good way to evaluate your options, even though no one actually deploys ResNet-50 in a real-world application. It exercises a range of operations that stress compute, and while the small image size doesn't stress the memory subsystem, that's fine if your model will only see small images.
But if you plan to process large images (608×608 up to megapixels), ResNet-50 batch=1 throughput is not necessarily predictive of performance on neural network models that process much larger images. (Remember, a 608×608 image has about four times as many pixels as a 300×300 image; a 1440×1440 = 2-megapixel image has about 23 times as many.)
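Those ratios follow directly from pixel counts, which is what early-layer activation sizes scale with. A quick sketch of the arithmetic (nothing here is measured; it is just the pixel math for the image dimensions quoted above):

```python
# Pixel counts grow with the square of the image dimension, which is why
# large-image models carry much bigger activations than ResNet-50 at 224x224
# or 300x300.
base = 300 * 300
for dim in (300, 608, 1440):
    pixels = dim * dim
    print(f"{dim}x{dim}: {pixels:>9,} pixels "
          f"({pixels / base:.1f}x a 300x300 image, ~{pixels / 1e6:.1f} MP)")
```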
Why?
Neural network models consist of code (the layers and operators to execute), weights, and intermediate activations, all of which have to be stored somewhere.
So larger, tougher models are harder to process not just because they require more compute, but because they have much larger memory requirements.
Weights take a lot of room: 22.7MB for ResNet-50 INT8 and 62MB for YOLOv3 INT8. The expectation is that future, better models will be even bigger, with more weights.
Intermediate activation storage: the largest activation for ResNet-50 is 0.8MB and the next largest is 0.4MB, for 224×224 images. So a buffer memory of 1.5-2MB is sufficient at batch size = 1.
Intermediate activation storage for YOLOv3 can be much larger, depending on the size of the image: roughly 18MB for 608×608 images and more than 100MB for 2-megapixel images.
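To see why the activation footprint balloons with resolution, consider a single early convolution layer: its output is roughly height × width × channels bytes at INT8, so it grows quadratically with image dimension. The sketch below is only an illustration; the 32-channel count (as in YOLOv3's first convolution block) and INT8 storage are assumptions, not a full YOLOv3 memory model.

```python
# Illustrative estimate of one layer's activation storage: H x W x C bytes
# at INT8. The channel count is an assumed example value.
def activation_mb(dim, channels, bytes_per_elem=1):
    return dim * dim * channels * bytes_per_elem / 1e6

for dim in (224, 608, 1440):
    print(f"{dim}x{dim}, 32 channels: ~{activation_mb(dim, 32):.1f} MB "
          f"for a single activation tensor")
```

At 608×608 this is already on the order of 12MB for a single tensor, roughly consistent with the ~18MB working set quoted above once more than one tensor has to be live at a time.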
Some edge inference accelerators have only on-chip SRAM and will at some point run out of room to store everything. ResNet-50 may run fast because it fits in SRAM, but YOLOv3 likely will not unless the on-chip SRAM is huge, which makes the chip very expensive.
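A rough way to see the SRAM problem is to add the weight and activation figures quoted above and compare them against a few on-chip SRAM capacities. The capacities in the sketch below are assumed example values, not figures for any particular chip:

```python
# Rough fit check using the figures quoted in the text (INT8 weights plus a
# peak activation working set). SRAM capacities are assumed example values.
footprints_mb = {
    "ResNet-50 @ 224x224": 22.7 + 2.0,    # weights + ~2 MB activation buffer
    "YOLOv3 @ 608x608":    62.0 + 18.0,   # weights + ~18 MB activations
    "YOLOv3 @ 2 MP":       62.0 + 100.0,  # weights + >100 MB activations
}

for sram_mb in (16, 32, 64, 128):
    print(f"\nOn-chip SRAM = {sram_mb} MB")
    for model, need_mb in footprints_mb.items():
        verdict = "fits on-chip" if need_mb <= sram_mb else "spills to DRAM"
        print(f"  {model}: ~{need_mb:.0f} MB -> {verdict}")
```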
Code size is significant as well, but no one is disclosing this information for their chips yet.
The chart below shows the total megabytes required in an inference chip for weights and activations for ResNet-50 and YOLOv3 at various image sizes.
There are three main choices for implementing the memory system of an AI inference chip, and most chips combine two or three of them in different ratios.
When the memory requirements exceed on-chip SRAM, benchmarks will reveal how well the chip's architecture and software handle DRAM traffic: does it bog down performance, or is it pipelined and "hidden" behind other transactions so the impact on compute is minimized?
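One way to reason about whether DRAM traffic can be hidden is to compare, per layer, the time to move the data against the time to compute the layer; if the transfer for the next layer finishes before the current layer's compute does, it can be overlapped. The sketch below is back-of-the-envelope only; the TOPS rating, utilization, bandwidth, and per-layer numbers are all assumed example values.

```python
# Back-of-the-envelope overlap check: can the DRAM transfer for one layer be
# prefetched (hidden) while another layer computes? All numbers below are
# assumed example values, not measurements of any particular chip.
def layer_times_ms(macs, dram_bytes, tops=16, utilization=0.5, dram_gb_s=25):
    compute_ms = (macs * 2) / (tops * 1e12 * utilization) * 1e3  # 2 ops/MAC
    transfer_ms = dram_bytes / (dram_gb_s * 1e9) * 1e3
    return compute_ms, transfer_ms

# Hypothetical large conv layer: 3 GMACs of work, 30 MB fetched from DRAM.
compute_ms, transfer_ms = layer_times_ms(macs=3e9, dram_bytes=30e6)
print(f"compute {compute_ms:.2f} ms vs DRAM transfer {transfer_ms:.2f} ms")
print("transfer can be hidden behind compute" if transfer_ms <= compute_ms
      else "DRAM is the bottleneck: compute stalls waiting for data")
```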
ResNet-50 is a popular benchmark, and it is fine for comparing inference accelerators if you plan to process small images.
But it won't stress the memory subsystem the way a model like YOLOv3 running on near-megapixel images will.
So don't use ResNet-50 to compare accelerators if you want to process near-megapixel and megapixel images; YOLOv3 is a good alternative to consider.
We need more empirical measurements to map out the behavior of raw models as well as integrated application performance; there are too many models, parameters, and compute engines for individual research and product teams to evaluate on their own.