Apples, Oranges & The Optimal AI Inference Accelerator

Pay attention to these key areas to determine the right accelerator for your needs.


There is a wide range of AI inference accelerators available and an equally wide range of applications for them.

No AI inference accelerator will be optimal for every application. For example, a data center class accelerator will almost certainly be too big, burn too much power, and cost too much for most edge applications. And an accelerator optimized for keyword recognition won’t have the capability to handle more computationally intensive image CNNs.

Like Goldilocks, the optimal AI inference accelerator for your application needs to be “just right”… for you.

In our experience, many customers already have a neural network model running, and they are looking for more throughput on that model within their cost, power, and size constraints.

Apples and oranges: Get the full picture

A recent product announcement touted a frame rate for a popular neural network model ten times faster than Nvidia Xavier at a tiny fraction of the power.

But you need to get the full picture.

If a frame rate is given, you can’t judge how impressive it is without knowing:

  • The image size.
  • Is it batch=1 for minimum latency or batch>1 for maximum throughput?
  • What numerics are being used (INT4, INT8, BF/FP16, …)?
  • Has the neural network model or the computation algorithm been altered, and if so, how, and with what effect on accuracy?
  • What post-training optimizations, such as pruning, have been applied, and how does the accelerator deal with sparsity at both compile time and runtime?
  • What prediction accuracy is achieved, especially if the model or weights have been altered?
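
One way to keep these questions honest is to write each vendor claim down as a record and refuse to compare two numbers until every field is filled in. The sketch below is only an illustration: the `FrameRateClaim` class, its field names, and the two helper functions are hypothetical and not part of any vendor tool, and the rule that claims must match on image size, batch size, and numerics is an assumption drawn from the list above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FrameRateClaim:
    """A vendor frame-rate claim plus the context needed to judge it.

    All names here are hypothetical; None means "not disclosed".
    """
    vendor: str
    fps: float
    image_size: Optional[Tuple[int, int]] = None  # (width, height) in pixels
    batch_size: Optional[int] = None              # 1 = minimum latency
    numerics: Optional[str] = None                # e.g. "INT8", "BF16"
    model_altered: Optional[bool] = None          # pruning, changed layers, ...
    accuracy: Optional[float] = None              # prediction accuracy achieved


CONTEXT_FIELDS = ("image_size", "batch_size", "numerics",
                  "model_altered", "accuracy")


def missing_context(claim: FrameRateClaim) -> list:
    """Return the names of the fields the vendor has not disclosed."""
    return [name for name in CONTEXT_FIELDS if getattr(claim, name) is None]


def comparable(a: FrameRateClaim, b: FrameRateClaim) -> bool:
    """Assumed rule: two claims can be compared only if both are fully
    specified and share image size, batch size, and numerics."""
    if missing_context(a) or missing_context(b):
        return False
    return ((a.image_size, a.batch_size, a.numerics)
            == (b.image_size, b.batch_size, b.numerics))
```

A claim that arrives as a bare frame rate, such as `FrameRateClaim("Vendor A", 300.0)`, reports all five context fields as missing, which is a useful list of follow-up questions for the vendor.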

If power is given, you need to know:

  • What are the measurement conditions: temperature, voltage, and process?
  • What model is running?
  • Is it the power for the inference accelerator core? For the whole chip? Or the whole chip plus DRAM?

The right way to compare two accelerators

Get both vendors to benchmark your neural network model at your image size with your preferred numerics; if they in any way alter the model or the weights, get them to give you the impact on prediction accuracy.

Ask all of the questions above to determine the throughput and power for the operating conditions that matter for your application.

Also, get a demo: if the vendor can’t demo what they claim, something isn’t right! When you get a demo, ask to see live streams, not videos. If you are there in person, put your hand in front of the camera to verify it’s real time; if you are on Zoom, ask them to put their hand in front of the camera. The point is to verify that inference is actually happening in real time and that you’re not watching something pre-canned and in some way modified to look better. Fresh fruit is better than canned fruit.

And get price information from each vendor at the same volumes. If you can’t get that, ask what the die size is and what process it is built in to get some sense of relative cost.

The optimum accelerator that is “just right” for you will have the best throughput/$ and the best throughput/watt for your model at your image size at your target prediction accuracy.
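
As a rough illustration of that selection rule, the sketch below filters candidates by a target prediction accuracy and then keeps only those that no other candidate beats on both throughput/$ and throughput/watt. The `BenchmarkResult` class, its field names, and the `shortlist` function are hypothetical. It assumes every number was measured on your model, at your image size and numerics, with the same power boundary and the same price volume for every vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    """One vendor's result on YOUR model (hypothetical record layout)."""
    name: str
    fps: float        # frames per second at your image size and numerics
    power_w: float    # watts, same boundary (core, chip, or chip + DRAM) for all
    price_usd: float  # unit price quoted at the same volume for all
    accuracy: float   # prediction accuracy actually achieved

    @property
    def fps_per_watt(self) -> float:
        return self.fps / self.power_w

    @property
    def fps_per_dollar(self) -> float:
        return self.fps / self.price_usd


def shortlist(results, min_accuracy):
    """Drop candidates below the accuracy target, then keep every candidate
    that is not beaten on both throughput/watt and throughput/$."""
    ok = [r for r in results if r.accuracy >= min_accuracy]

    def dominated(r):
        # r is dominated if some other candidate is at least as good on both
        # metrics and strictly better on at least one of them.
        return any(
            o.fps_per_watt >= r.fps_per_watt
            and o.fps_per_dollar >= r.fps_per_dollar
            and (o.fps_per_watt > r.fps_per_watt
                 or o.fps_per_dollar > r.fps_per_dollar)
            for o in ok
        )

    return [r for r in ok if not dominated(r)]
```

If a single candidate survives, it is best on both metrics. If several survive, one is better per dollar and another per watt, and the choice comes down to which constraint is tighter in your application.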


Be careful not to jump to conclusions when you hear an impressive performance number without all of the data needed to judge it. The less information a vendor gives, the more they are probably hiding.

Get all the data so you can be sure to pick the right fruit for you.
