Memory Subsystems In Edge Inferencing Chips


Geoff Tate, CEO of Flex Logix, talks about key issues in a memory subsystem in an inferencing chip, how factors like heat can affect performance, and where these kinds of chips will be used. » read more

AI Inference Memory System Tradeoffs


When companies describe their AI inference chip they typically give TOPS but don’t talk about their memory system, which is equally important. What is TOPS? It means Trillions or Tera Operations per Second. It is primarily a measure of the maximum achievable throughput but not a measure of actual throughput. Most operations are MACs (multiply/accumulates), so TOPS = (number of MAC units) x... » read more

Do Large Batches Always Improve Neural Network Throughput?


Common benchmarks like ResNet-50 generally have much higher throughput with large batch sizes than with batch size =1. For example, the Nvidia Tesla T4 has 4x the throughput at batch=32 than when it is processing in batch=1 mode. Of course, larger batch sizes have a tradeoff: latency increases which may be undesirable in real-time applications. Why do larger batches increase throughput... » read more

Inference Acceleration: Follow The Memory


Much has been written about the computational complexity of inference acceleration: very large matrix multiplies for fully-connected layers and huge numbers of 3x3 convolutions across megapixel images, both of which require many thousands of MACs (multiplier-accumulators) to achieve high throughput for models like ResNet-50 and YOLOv3. The other side of the coin is managing the movement of d... » read more