Advantages Of BFloat16 For AI Inference


Essentially all AI training is done with 32-bit floating point. But doing AI inference with 32-bit floating point is expensive, power-hungry and slow. And quantizing models to 8-bit integer, which is very fast and lowest in power, is a major investment of money, scarce resources and time. Now BFloat16 (BF16) offers an attractive balance for many users. BFloat16 offers essentially t... » read more
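For context on why BF16 is a convenient middle ground: it keeps FP32's 8-bit exponent (so the dynamic range of trained weights is preserved) and simply truncates the mantissa to 7 bits. The sketch below is illustrative, not from the post; it shows the conversion as nothing more than dropping the low 16 bits of the FP32 encoding.

```python
# Illustrative sketch: BF16 keeps FP32's sign (1 bit) and exponent (8 bits)
# and truncates the mantissa from 23 bits to 7, so round-toward-zero
# conversion is just discarding the low 16 bits of the FP32 bit pattern.
import struct

def fp32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern for an FP32 value (truncation rounding)."""
    fp32_bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return fp32_bits >> 16

def bf16_bits_to_fp32(bits: int) -> float:
    """Expand a BF16 bit pattern back to FP32 by zero-filling the low mantissa bits."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

x = 3.14159265
print(x, "->", bf16_bits_to_fp32(fp32_to_bf16_bits(x)))
# Same dynamic range as FP32, but only ~3 decimal digits of precision.
```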

VC Perspectives On An AI Summer


It’s been a busy summer for Applied Ventures. Our team has had many interactions in the startup and investing space, and added some new companies to our portfolio. I’ll be sharing highlights of these activities in a series of upcoming blogs, but first I’d like to reflect on current market developments in machine learning and how they are affecting VC investment patterns. Strategic inve... » read more

AI Inference Memory System Tradeoffs


When companies describe their AI inference chips, they typically give TOPS but don't talk about their memory system, which is equally important. What is TOPS? It means Trillions or Tera Operations per Second. It is primarily a measure of the maximum achievable throughput, not a measure of actual throughput. Most operations are MACs (multiply/accumulates), so TOPS = (number of MAC units) x... » read more
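A minimal sketch of the peak-TOPS arithmetic, with a hypothetical MAC count and clock rate (the specific numbers are assumptions, not from the post); each MAC counts as two operations, one multiply and one accumulate.

```python
# Illustrative sketch: peak TOPS from MAC count and clock frequency.
def peak_tops(num_macs: int, clock_ghz: float) -> float:
    ops_per_second = num_macs * clock_ghz * 1e9 * 2  # 1 MAC = 2 ops
    return ops_per_second / 1e12

# Hypothetical accelerator: 4,096 MAC units at 1 GHz
print(peak_tops(4096, 1.0))  # ~8.2 peak TOPS
```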

TOPS, Memory, Throughput And Inference Efficiency


Dozens of companies have developed or are developing IP and chips for neural network inference. Almost every AI company gives TOPS but little other information. What is TOPS? It means Trillions or Tera Operations per Second. It is primarily a measure of the maximum achievable throughput, not a measure of actual throughput. Most operations are MACs (multiply/accumulates), so TOPS = (number of MAC... » read more
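The gap between peak and delivered performance is the point here: what matters is how busy the memory system can keep the MACs. The sketch below uses assumed numbers (utilization, model size) purely to show the shape of the calculation; none of the figures come from the post.

```python
# Illustrative sketch: delivered throughput = peak TOPS x MAC utilization,
# and images/sec follows from the MACs a model needs per image.
def effective_tops(peak_tops: float, mac_utilization: float) -> float:
    return peak_tops * mac_utilization

def images_per_second(macs_per_image: float, eff_tops: float) -> float:
    return eff_tops * 1e12 / (macs_per_image * 2)  # 1 MAC = 2 ops

# Hypothetical: 8 peak TOPS, 30% MAC utilization, ~4 GMACs/image model
eff = effective_tops(8.0, 0.30)
print(eff, images_per_second(4e9, eff))  # 2.4 effective TOPS, ~300 images/sec
```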

Deep Learning Models With MATLAB And Cortex-A


Today, I’ve teamed up with Ram Cherukuri of MathWorks to provide an overview of the MathWorks toolchain for machine learning (ML) and the deployment of embedded ML inference on Arm Cortex-A using the Arm Compute Library. MathWorks enables engineers to get started quickly and makes machine learning possible without requiring them to become experts. If you’re an algorithm engineer interested ... » read more

Do Large Batches Always Improve Neural Network Throughput?


Common benchmarks like ResNet-50 generally have much higher throughput with large batch sizes than with batch size = 1. For example, the Nvidia Tesla T4 has 4x the throughput at batch=32 compared to batch=1. Of course, larger batch sizes have a tradeoff: latency increases, which may be undesirable in real-time applications. Why do larger batches increase throughput... » read more
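The throughput/latency tension is easy to see with back-of-the-envelope arithmetic. The batch-1 rate below is a hypothetical number; only the "4x at batch=32" ratio comes from the excerpt above.

```python
# Illustrative arithmetic: larger batches raise throughput, but a result is
# not available until the whole batch has been processed, so latency grows.
batch1_ips = 400.0            # hypothetical images/sec at batch = 1
batch32_ips = 4 * batch1_ips  # the ~4x gain cited for the Tesla T4 at batch = 32

latency_b1_ms = 1000.0 / batch1_ips         # one image's result: ~2.5 ms
latency_b32_ms = 32 * 1000.0 / batch32_ips  # full batch must finish: ~20 ms
print(latency_b1_ms, latency_b32_ms)
```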

Machine Learning Drives High-Level Synthesis Boom


High-level synthesis (HLS) is experiencing a new wave of popularity, driven by its ability to handle machine-learning matrices and iterative design efforts. The obvious advantage of HLS is the boost in productivity designers get from working in C, C++ and other high-level languages rather than RTL. The ability to design a layout that should work, and then easily modify it to test other confi... » read more

Multi-Layer Processing Boosts Inference Throughput/Watt


The focus in discussions of inference throughput is often on the computations required. For example, YOLOv3, a powerful real-time object detection and recognition model, requires 227 BILLION MACs (multiply-accumulates) to process a single 2-megapixel image! This is with the Winograd Transformation; it’s more than 300 billion without it. And there is a lot of discussion of the large size ... » read more
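To see how the MAC count climbs into the hundreds of billions, the standard per-layer formula is output pixels × input channels × output channels × kernel area. The layer shape below is hypothetical, chosen only to show the scale at megapixel resolution; it is not a layer from the post.

```python
# Illustrative sketch: MAC count for one convolution layer.
def conv_macs(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    return h_out * w_out * c_in * c_out * k * k

# Hypothetical early-stage layer on a large frame: 1024x1024 output,
# 64 -> 128 channels, 3x3 kernel
print(conv_macs(1024, 1024, 64, 128, 3) / 1e9, "billion MACs for one layer")
```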

AI: Where’s The Money?


A one-time technology outcast, Artificial Intelligence (AI) has come a long way. Now there’s a groundswell of interest and investment in products and technologies that deliver high-performance visual recognition, matching or besting human skills. Likewise, speech and audio recognition are becoming more common, and we’re even starting to see more specialized applications such as finding optimized... » read more

Inference Acceleration: Follow The Memory


Much has been written about the computational complexity of inference acceleration: very large matrix multiplies for fully-connected layers and huge numbers of 3x3 convolutions across megapixel images, both of which require many thousands of MACs (multiplier-accumulators) to achieve high throughput for models like ResNet-50 and YOLOv3. The other side of the coin is managing the movement of d... » read more
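"Following the memory" can also be made concrete with rough arithmetic: for each layer, weights plus input and output activations must move through the memory hierarchy. The shapes and byte sizes below are assumptions for illustration, not figures from the post.

```python
# Illustrative sketch: bytes that must move for one 3x3 convolution layer --
# weights in, activations in, activations out -- which is why the memory
# system matters as much as the MAC array.
def layer_traffic_bytes(h: int, w: int, c_in: int, c_out: int, k: int,
                        bytes_per_value: int = 1) -> dict:
    return {
        "weights": c_in * c_out * k * k * bytes_per_value,
        "act_in": h * w * c_in * bytes_per_value,
        "act_out": h * w * c_out * bytes_per_value,
    }

# Hypothetical 3x3 layer on a megapixel feature map, 64 -> 64 channels, INT8
t = layer_traffic_bytes(1024, 1024, 64, 64, 3)
print({name: f"{v / 1e6:.2f} MB" for name, v in t.items()})
```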
