Software Is At Least As Important As Hardware For Inference Accelerators

In articles and conference presentations on Inference Accelerators, the focus is primarily on TOPS (frequency times number of MACs), a little bit on memory (DRAM interfaces and on chip SRAM), very little on interconnect (also very important, but that’s another story) and almost nothing on the software! Without software, the inference accelerator is a rock that does nothing. Software is wha... » read more

Where Is The Edge AI Market And Ecosystem Headed?

Until recently, most AI was in datacenters and most of it was training. Things are changing quickly. Projections are that AI sales will grow rapidly to tens of billions of dollars by the mid-2020s, with most of the growth in Edge AI Inference. Edge inference applications: Where is the Edge Inference market today? Let’s look at the markets from highest throughput to lowest. Edge Servers: Recently Nvidia annou... » read more

AI’s Impact On Power And Performance

AI/ML is creeping into everything these days. There are AI chips, and there are chips that include elements of AI, particularly for inferencing. The big question is how much they will affect performance and power, and the answer isn't obvious. There are two main phases of AI: training and inferencing. Almost all training is done in the cloud using extremely large data sets. In fact, ... » read more

Implementing Low-Power Machine Learning In Smart IoT Applications

By Pieter van der Wolf and Dmitry Zakharov
Increasingly, machine learning (ML) is being used to build devices with advanced functionalities. These devices apply machine learning technology that has been trained to recognize certain complex patterns from data captured by one or more sensors, such as voice commands captured by a microphone, and then perform an appropriate action. For example,... » read more

Modeling AI Inference Performance

The metric in AI Inference that matters to customers is throughput/$ and/or throughput/watt for their model. One might assume throughput will correlate with TOPS, but that would be wrong. Examine the table below: The Nvidia Tesla T4 gets 7.4 inferences/TOP, Xavier AGX gets 15 and InferX X1 gets 34.5. And InferX X1 does it with 1/10th to 1/20th of the DRAM bandwidth of the ... » read more
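
The inferences/TOP figures quoted in this excerpt are a ratio of measured throughput to peak TOPS. A minimal Python sketch of that ratio follows. The function name and the example inputs are hypothetical, chosen only for illustration; they are not values from the article's table.

```python
def inferences_per_tops(throughput_inf_per_s: float, peak_tops: float) -> float:
    """Return measured throughput (inferences/second) divided by peak TOPS."""
    if peak_tops <= 0:
        raise ValueError("peak_tops must be positive")
    return throughput_inf_per_s / peak_tops


# Hypothetical accelerator (not from the article): 200 inferences/s
# on some model, with a peak rating of 8 TOPS.
example = inferences_per_tops(200.0, 8.0)
print(example)  # 25.0
```

Two chips with the same peak TOPS can score very differently on this ratio, which is why the excerpt argues that TOPS alone does not predict throughput.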

Making Sense Of ML Metrics

Steve Roddy, vice president of products for Arm’s Machine Learning Group, talks with Semiconductor Engineering about what different metrics actually mean, and why they can vary by individual applications and use cases. » read more

Advantages Of BFloat16 For AI Inference

Essentially all AI training is done with 32-bit floating point. But doing AI inference with 32-bit floating point is expensive, power-hungry and slow. And quantizing models for 8-bit integer, which is very fast and the lowest power, is a major investment of money, scarce resources and time. Now BFloat16 (BF16) offers an attractive balance for many users. BFloat16 offers essentially t... » read more

VC Perspectives On An AI Summer

It’s been a busy summer for Applied Ventures. Our team has had many interactions in the startup and investing space, and added some new companies to our portfolio. I’ll be sharing highlights of these activities in a series of upcoming blogs, but first I’d like to reflect on current market developments in machine learning and how they are affecting VC investment patterns. Strategic inve... » read more

AI Inference Memory System Tradeoffs

When companies describe their AI inference chip they typically give TOPS but don’t talk about their memory system, which is equally important. What is TOPS? It means Trillions or Tera Operations per Second. It is primarily a measure of the maximum achievable throughput but not a measure of actual throughput. Most operations are MACs (multiply/accumulates), so TOPS = (number of MAC units) x... » read more
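
As a rough illustration of why TOPS is a peak figure, the Python sketch below computes it from the number of MAC units and the clock frequency. The excerpt's formula is cut off, so the `ops_per_mac` factor is an assumption here: it follows the common convention of counting one multiply/accumulate as two operations. The function name and example inputs are hypothetical.

```python
def peak_tops(num_macs: int, freq_hz: float, ops_per_mac: int = 2) -> float:
    """Peak tera-operations per second for a MAC array.

    ops_per_mac=2 is an assumed convention (one multiply plus one
    accumulate per MAC per cycle); it is not taken from the excerpt.
    """
    if num_macs < 0 or freq_hz < 0 or ops_per_mac <= 0:
        raise ValueError("inputs must be non-negative, ops_per_mac positive")
    return num_macs * freq_hz * ops_per_mac / 1e12


# Hypothetical chip: 4096 MAC units clocked at 1 GHz.
example_tops = peak_tops(4096, 1.0e9)
print(example_tops)
```

The result assumes every MAC unit does useful work on every cycle, so it is an upper bound rather than a measure of actual throughput.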

TOPS, Memory, Throughput And Inference Efficiency

Dozens of companies have developed or are developing IP and chips for Neural Network Inference. Almost every AI company gives TOPS but little other information. What is TOPS? It means Trillions or Tera Operations per Second. It is primarily a measure of the maximum achievable throughput but not a measure of actual throughput. Most operations are MACs (multiply/accumulates), so TOPS = (number of MAC... » read more
