Common benchmarks like ResNet-50 generally achieve much higher throughput with large batch sizes than with batch size = 1.
For example, the Nvidia Tesla T4 has 4x the throughput at batch=32 that it has at batch=1.
Of course, larger batch sizes come with a tradeoff: latency increases, which may be undesirable in real-time applications.
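To make the tradeoff concrete, here is a minimal sketch of the arithmetic, assuming batches are processed one at a time with no pipelining and ignoring any time spent waiting for a batch to fill. Under that assumption, throughput is batch size divided by per-batch latency, so the 4x figure quoted above for batch=32 implies roughly 8x the per-batch latency of batch=1. The function name `batch_latency_ratio` is hypothetical and only used for illustration.

```python
def batch_latency_ratio(batch_size: int, throughput_ratio: float) -> float:
    """Per-batch latency relative to batch=1.

    Assumes throughput = batch_size / batch_latency (one batch in flight,
    no queueing), so latency(B) / latency(1) = B / throughput_ratio.
    """
    return batch_size / throughput_ratio


# Using the figure from the text: 4x throughput at batch=32 versus batch=1.
ratio = batch_latency_ratio(batch_size=32, throughput_ratio=4.0)
print(ratio)  # 8.0
```

In other words, under these simplifying assumptions each image waits about eight times longer for its result, even though the device finishes four times as many images per second.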
Why do larger batches increase throughput...