Customized Micro-Benchmarks For HW/SW Performance


Raw performance used to be the main focus of benchmarks, but those benchmarks may have outlived their usefulness for many applications. Dana McCarty, vice president of sales and marketing for AI Inference Products at Flex Logix, talks about why companies need to develop and utilize their own specific models to accurately gauge hardware and software performance, which can be slowed by bottlenecks in I/O and...

ResNet-50 Does Not Predict Inference Throughput For MegaPixel Neural Network Models


Customers are considering applications for AI inference and want to evaluate multiple inference accelerators. As we discussed last month, TOPS do NOT correlate with inference throughput, and you should use real neural network models to benchmark accelerators. So is ResNet-50 a good benchmark for evaluating the relative performance of inference accelerators? If your application is going to p...
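
One quick way to see why ResNet-50 numbers may not transfer to megapixel workloads is to time the same backbone at both resolutions. Below is a minimal sketch using PyTorch and torchvision; the batch size and iteration count are illustrative assumptions, not values from the article:

```python
# Sketch: compare throughput of the same backbone at ResNet-50's
# standard 224x224 input versus a megapixel-class input.
# Assumes PyTorch and torchvision are installed; results are illustrative.
import time
import torch
import torchvision

model = torchvision.models.resnet50(weights=None).eval()

def images_per_second(height, width, batch=1, iters=20):
    x = torch.randn(batch, 3, height, width)
    with torch.no_grad():
        model(x)  # warm-up pass so timing excludes one-time setup
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
    return batch * iters / (time.perf_counter() - start)

print(f"224x224:   {images_per_second(224, 224):.1f} img/s")
print(f"1024x1024: {images_per_second(1024, 1024):.1f} img/s")
```

The point of the exercise: the ratio between two accelerators measured at 224x224 often does not hold at megapixel sizes, because the much larger intermediate activations may no longer fit in on-chip memory.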

New Ways To Optimize Machine Learning


As more designers employ machine learning (ML) in their systems, they’re moving from simply getting the application to work to optimizing the power and performance of their implementations. Some techniques are available today. Others will take time to percolate through the design flow and tools before they become readily available to mainstream designers. Any new technology follows a basic...
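
As one concrete example of an optimization technique that is available today, post-training dynamic quantization trades a small amount of accuracy for lower power and higher throughput. A minimal sketch using PyTorch's quantize_dynamic API follows; the two-layer model is a placeholder, not a real network:

```python
# Sketch: post-training dynamic quantization of Linear layers to int8.
# The tiny model below is a stand-in for a real trained network.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # same interface; int8 weights under the hood
```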

Software In Inference Accelerators


Geoff Tate, CEO of Flex Logix, talks about the importance of hardware-software co-design for inference accelerators, how that affects performance and power, and what new approaches chipmakers are taking to bring AI chips to market.

Memory Subsystems In Edge Inferencing Chips


Geoff Tate, CEO of Flex Logix, talks about key issues in a memory subsystem in an inferencing chip, how factors like heat can affect performance, and where these kinds of chips will be used.

AI Inference Memory System Tradeoffs


When companies describe their AI inference chips, they typically give TOPS but don’t talk about their memory system, which is equally important. What is TOPS? It means Trillions or Tera Operations per Second. It is primarily a measure of the maximum achievable throughput, not a measure of actual throughput. Most operations are MACs (multiply/accumulates), so TOPS = (number of MAC units) x...
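
The formula in the teaser is cut off; the standard arithmetic, stated for completeness, is that each MAC counts as two operations (one multiply plus one accumulate), so peak TOPS = (number of MAC units) x (clock frequency) x 2 / 10^12. A worked example in Python, with illustrative unit counts and frequency rather than any particular chip's specs:

```python
# Sketch: peak TOPS from MAC count and clock frequency.
# Each MAC is counted as 2 operations (one multiply + one accumulate).
# The figures below are illustrative, not a real chip's specifications.

def peak_tops(mac_units: int, freq_hz: float) -> float:
    return mac_units * freq_hz * 2 / 1e12

print(peak_tops(mac_units=4096, freq_hz=1.0e9))  # ~8.2 TOPS peak
# Actual throughput = peak x utilization, and utilization depends on
# how well the memory system keeps the MACs fed -- which is exactly
# what a TOPS figure alone does not tell you.
```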

Inference Acceleration: Follow The Memory


Much has been written about the computational complexity of inference acceleration: very large matrix multiplies for fully-connected layers and huge numbers of 3x3 convolutions across megapixel images, both of which require many thousands of MACs (multiplier-accumulators) to achieve high throughput for models like ResNet-50 and YOLOv3. The other side of the coin is managing the movement of data...
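
To make the scale concrete, one can count the MACs and the activation bytes for a single 3x3 convolution over a roughly two-megapixel frame. The sketch below uses illustrative layer shapes, not shapes taken from ResNet-50 or YOLOv3:

```python
# Sketch: MAC count and memory traffic for one 3x3 conv layer
# applied to a ~2 megapixel frame. Shapes are illustrative.

H, W = 1080, 1920           # output spatial size (stride 1, padded)
C_IN, C_OUT, K = 64, 64, 3  # channel counts and kernel size

macs = H * W * C_IN * C_OUT * K * K   # multiply-accumulates per frame
act_bytes = H * W * (C_IN + C_OUT)    # int8 input + output activations
weight_bytes = C_IN * C_OUT * K * K   # int8 weights for this layer

print(f"MACs per frame:      {macs / 1e9:.1f} billion")
print(f"Activation traffic:  {act_bytes / 1e6:.0f} MB per frame (int8)")
print(f"Layer weights:       {weight_bytes / 1e3:.0f} KB (int8)")
```

Under these assumptions the weights come to tens of kilobytes while the activations run to hundreds of megabytes per frame, which is why, for megapixel models, managing data movement rather than raw compute tends to dominate the design.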