Author's Latest Posts

Multi-Layer Processing Boosts Inference Throughput/Watt

The focus in discussion of inference throughput is often on the computations required. For example, YOLOv3, a popular real-time object detection and recognition model, requires 227 billion MACs (multiply-accumulates) to process a single 2-megapixel image! That is with the Winograd transformation; it’s more than 300 billion without it. And there is a lot of discussion of the large size ... » read more
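MAC counts like the ones quoted above can be sanity-checked with a back-of-envelope calculation. The sketch below shows how the MACs of a single convolution layer are counted; the layer shapes are hypothetical, chosen for illustration, and are not YOLOv3's actual layers.

```python
# Illustrative MAC count for one standard KxK convolution layer:
# one multiply-accumulate per output pixel, per output channel,
# per input channel, per filter tap.
def conv_macs(out_h, out_w, in_ch, out_ch, k):
    return out_h * out_w * in_ch * out_ch * k * k

# Hypothetical example: a 3x3 conv over a 1920x1080 feature map,
# expanding 3 input channels to 32 output channels.
macs = conv_macs(1080, 1920, 3, 32, 3)
print(f"{macs:,} MACs")  # ~1.8 billion MACs for this one layer
```

Summing this over every layer of a deep detection network is how totals in the hundreds of billions of MACs per frame arise.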

Inference Acceleration: Follow The Memory

Much has been written about the computational complexity of inference acceleration: very large matrix multiplies for fully-connected layers and huge numbers of 3x3 convolutions across megapixel images, both of which require many thousands of MACs (multiplier-accumulators) to achieve high throughput for models like ResNet-50 and YOLOv3. The other side of the coin is managing the movement of d... » read more

Use Inference Benchmarks Similar To Your Application

If an inference IP supplier or inference accelerator chip supplier offers a benchmark, it is probably ResNet-50. As a result, it might seem logical to use ResNet-50 to compare inference offerings. If you plan to run ResNet-50, it would be; but if your target application model is significantly different from ResNet-50, it could lead you to pick an inference offering that is not best for you. ... » read more

Lies, Damn Lies, And TOPS/Watt

There are almost a dozen vendors promoting inferencing IP, but none of them gives even a ResNet-50 benchmark. Typically, the only information they state is TOPS (Tera-Operations/Second) and TOPS/Watt. These two indicators of performance and power efficiency are almost useless by themselves. So what, exactly, does X TOPS really tell you about performance for your application? When a vendor ... » read more
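To see why a TOPS headline number alone says little, here is a minimal sketch of converting peak TOPS into model-level throughput. The utilization figure and model size below are invented for illustration, not vendor data; the sketch assumes the common convention that 1 MAC = 2 operations (one multiply plus one add).

```python
# Hedged sketch: what a TOPS rating implies for real model throughput.
# Assumes 1 MAC = 2 ops, a convention many vendors use when quoting TOPS.
def frames_per_second(tops, utilization, macs_per_frame):
    ops_per_second = tops * 1e12 * utilization
    return ops_per_second / (2 * macs_per_frame)

# A hypothetical "100 TOPS" chip achieving 25% real utilization,
# running a 227-billion-MAC-per-frame model:
fps = frames_per_second(100, 0.25, 227e9)
print(f"{fps:.1f} frames/sec")  # ~55 fps, far below what raw TOPS suggests
```

The point is that actual utilization on a real model, not peak TOPS, is what determines delivered throughput.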

High Neural Inferencing Throughput At Batch=1

Microsoft presented a slide as part of its Brainwave presentation at Hot Chips this summer making the following point: in existing inferencing solutions, high throughput (and high % utilization of the hardware) is possible for large batch sizes. This means that instead of processing, say, one image at a time, the inferencing engine processes, say, 10 or 50 images in parallel. This minimizes the number of... » read more
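The batch-size trade-off described above can be illustrated with a toy cost model. The timing numbers below are invented for illustration, not measurements from any real accelerator: batching amortizes the one-time cost (e.g., loading weights) across many images, but every image in the batch waits for the whole batch to finish.

```python
# Toy cost model of batched inference: one fixed weight-load cost
# per batch, plus per-image compute. The last image in the batch
# experiences the full batch latency.
def latency_ms(batch, compute_ms_per_image, weight_load_ms):
    return weight_load_ms + batch * compute_ms_per_image

# Hypothetical costs: 5 ms compute per image, 20 ms weight load.
for b in (1, 10, 50):
    total = latency_ms(b, 5.0, 20.0)
    print(f"batch={b:2d}: {total:6.1f} ms total, "
          f"{total / b:5.1f} ms/image amortized")
```

Amortized cost per image falls as the batch grows, which is why large batches raise throughput, while end-to-end latency per image rises, which is why batch=1 matters for real-time applications.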

Real-Time Object Recognition At Low Cost/Power/Latency

Most neural network chips and IP talk about ResNet-50 benchmarks (image classification at 224x224 pixels). But we find that the number one neural network of interest for most customers is real-time object recognition, such as YOLOv3. It's not possible to do comparisons here because nobody shows a YOLOv3 benchmark for their inferencing. But it's very possible to improve on the inferencing per... » read more

Reconfigurable eFPGA For Aerospace Applications

Market research reports indicate about 10% of all dollar revenue of FPGA chips is for use in aerospace applications, and DARPA/DoD reports indicate about one-third of all dollar volume of ICs purchased by U.S. aerospace is FPGAs. FPGAs are clearly very important for aerospace applications because of the combination of short development time and the long mission life of many aerospace applica... » read more

Flexible, Energy-Efficient Neural Network Processing At 16nm

At Hot Chips 30, held in August in Silicon Valley, Harvard University (Paul Whatmough, SK Lee, S Xi, U Gupta, L Pentecost, M Donato, HC Hseuh, Professor Brooks and Professor Gu) made a presentation on “SMIV: A 16nm SoC with Efficient and Flexible DNN Acceleration for Intelligent IoT Devices.” (Their complete presentation is available now on the Hot Chips website for attendees and will be p... » read more

Sandia Labs’ New Configurable SoC

At DAC 2018, held in June in San Francisco, Sandia Labs gave its first public presentation describing Dragonfly, its first SoC using eFPGA. This is the first public disclosure by any organization describing its requirements, architecture and use cases for the new technology option of embedded FPGA. John Teifel led the project for Sandia National Laboratories. Sandia has ... » read more

Reconfigurable AI Building Blocks For SoCs And MCUs

FPGA chips are in use in many AI applications today, including cloud datacenters. Embedded FPGA (eFPGA) is now being used for AI applications as well. Our first public customer doing AI with EFLX eFPGA is Harvard University, which will present a paper at Hot Chips on August 20th on Edge AI processing using EFLX: "A 16nm SoC with Efficient and Flexible DNN Acceleration for Intelligent IoT Devi... » read more
