Do Large Batches Always Improve Neural Network Throughput?


Common benchmarks like ResNet-50 generally show much higher throughput at large batch sizes than at batch size = 1. For example, the Nvidia Tesla T4 delivers 4x the throughput at batch = 32 that it does at batch = 1. Of course, larger batch sizes come with a tradeoff: latency increases, which may be undesirable in real-time applications. Why do larger batches increase throughput... » read more
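The batching tradeoff can be sketched with a simple cost model: each inference call pays a fixed overhead (weight loading, kernel launch) plus a per-image compute cost, so larger batches amortize the overhead across more images at the price of longer latency. The numbers below are hypothetical for illustration, not measured T4 data.

```python
def throughput_and_latency(batch_size, fixed_overhead_ms=20.0, per_image_ms=2.0):
    """Return (images/sec, latency in ms) under a simple cost model where each
    inference call pays a fixed overhead plus a per-image compute cost.
    The default overhead and per-image costs are hypothetical."""
    batch_time_ms = fixed_overhead_ms + per_image_ms * batch_size
    throughput = batch_size / (batch_time_ms / 1000.0)
    return throughput, batch_time_ms

# Batch = 1: the fixed overhead dominates, so throughput is low but latency short.
# Batch = 32: overhead is amortized over 32 images, so throughput rises severalfold,
# but every image in the batch waits for the whole batch to finish.
t1, lat1 = throughput_and_latency(1)
t32, lat32 = throughput_and_latency(32)
```

Under this toy model, throughput grows with batch size but saturates once the per-image cost dominates the fixed overhead, which is the shape of the curve real accelerators exhibit.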

Neural Network Performance Modeling Software


nnMAX Inference IP is nearing design completion. The nnMAX 1K tile will be available this summer for design integration in SoCs, and it can be arrayed to provide whatever inference throughput is desired. The InferX X1 chip will tape out late Q3 this year using 2x2 nnMAX tiles, for 4K MACs, with 8MB SRAM. The nnMAX Compiler is in development in parallel, and the first release is available now... » read more

Multi-Layer Processing Boosts Inference Throughput/Watt


The focus in discussions of inference throughput is often on the computations required. For example, YOLOv3, a powerful real-time object detection and recognition model, requires 227 BILLION MACs (multiply-accumulates) to process a single 2-megapixel image! This is with the Winograd Transformation; it’s more than 300 billion without it. And there is a lot of discussion of the large size ... » read more
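As a rough sketch of where the Winograd saving comes from: the F(2x2, 3x3) transform produces a 2x2 output tile with 16 multiplies where direct convolution needs 36, a 2.25x reduction on eligible 3x3 convolution layers. The quoted 300B-to-227B drop is smaller than 2.25x because not all of a model's MACs sit in Winograd-eligible layers; the eligible fraction below is a hypothetical value for illustration.

```python
def winograd_macs(direct_macs, eligible_fraction, speedup=36 / 16):
    """MACs remaining after applying Winograd F(2x2, 3x3) to the eligible
    fraction of the work; the rest is computed directly."""
    eligible = direct_macs * eligible_fraction
    return direct_macs - eligible + eligible / speedup

# Hypothetical: if ~45% of 300B direct MACs were in Winograd-eligible
# 3x3 convolutions, the total would fall to roughly 225B.
total = winograd_macs(300e9, 0.45)
```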

Inference Acceleration: Follow The Memory


Much has been written about the computational complexity of inference acceleration: very large matrix multiplies for fully-connected layers and huge numbers of 3x3 convolutions across megapixel images, both of which require many thousands of MACs (multiplier-accumulators) to achieve high throughput for models like ResNet-50 and YOLOv3. The other side of the coin is managing the movement of d... » read more

Use Inference Benchmarks Similar To Your Application


If an Inference IP supplier or Inference Accelerator Chip supplier offers a benchmark, it is probably ResNet-50. As a result, it might seem logical to use ResNet-50 to compare inference offerings. If you plan to use ResNet-50, it would be; but if your target application model is significantly different from ResNet-50, it could lead you to pick an inference offering that is not best for you. ... » read more

Lies, Damn Lies, And TOPS/Watt


There are almost a dozen vendors promoting inferencing IP, but none of them gives even a ResNet-50 benchmark. Typically, the only information they state is TOPS (Tera-Operations/Second) and TOPS/Watt. These two indicators of performance and power efficiency are almost useless by themselves. So what, exactly, does X TOPS really tell you about performance for your application? When a vendor ... » read more
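One way to see why a raw TOPS number says so little: peak TOPS is just the MAC count times the clock (each MAC is two operations, a multiply and an add), and says nothing about the utilization your model will actually achieve. The MAC count, clock, and utilization figures below are hypothetical.

```python
def peak_tops(num_macs, clock_ghz):
    """Peak Tera-Operations/Second: each MAC performs 2 ops per cycle
    (one multiply plus one accumulate)."""
    return num_macs * 2 * clock_ghz / 1000.0  # Giga-ops -> Tera-ops

def effective_tops(num_macs, clock_ghz, utilization):
    """What a model actually sees depends on utilization, which varies
    widely with layer shapes, batch size, and memory bandwidth."""
    return peak_tops(num_macs, clock_ghz) * utilization

# Hypothetical: 4096 MACs at 1 GHz give ~8.2 peak TOPS on the datasheet,
# but at 30% utilization the model only sees ~2.5 TOPS.
datasheet = peak_tops(4096, 1.0)
delivered = effective_tops(4096, 1.0, 0.3)
```

Two chips with identical peak TOPS can thus deliver very different throughput on the same model, which is why a benchmark on a model like yours beats any TOPS figure.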

High Neural Inferencing Throughput At Batch=1


Microsoft presented the following slide as part of their Brainwave presentation at Hot Chips this summer: In existing inferencing solutions, high throughput (and high % utilization of the hardware) is possible for large batch sizes: this means that instead of processing, say, one image at a time, the inferencing engine processes, say, 10 or 50 images in parallel. This minimizes the number of... » read more

Real-Time Object Recognition At Low Cost/Power/Latency


Most neural network chips and IP talk about ResNet-50 benchmarks (image classification at 224x224 pixels). But we find that the number one neural network of interest for most customers is real-time object recognition, such as YOLOv3. It's not possible to do comparisons here because nobody shows a YOLOv3 benchmark for their inferencing. But it's very possible to improve on the inferencing per... » read more

Reconfigurable eFPGA For Aerospace Applications


Market research reports indicate that about 10% of all dollar revenue from FPGA chips comes from aerospace applications, and DARPA/DoD reports indicate that about one-third of all dollar volume of ICs purchased by U.S. aerospace is FPGAs. FPGAs are clearly very important for aerospace because they combine short development time with the long mission life of many aerospace applica... » read more

Flexible, Energy-Efficient Neural Network Processing At 16nm


At Hot Chips 30, held in August in Silicon Valley, Harvard University (Paul Whatmough, SK Lee, S Xi, U Gupta, L Pentecost, M Donato, HC Hseuh, Professor Brooks and Professor Gu) made a presentation on “SMIV: A 16nm SoC with Efficient and Flexible DNN Acceleration for Intelligent IoT Devices.” (Their complete presentation is available now on the Hot Chips website for attendees and will be p... » read more
