Making AI Run Faster

Why inferencing is the next battleground.


The semiconductor industry has woken up to the fact that heterogeneous computing is the way forward and that inferencing will require more than a GPU or a CPU. The numbers being bandied about by the 30 or so companies working on this problem are 100X improvements in performance.

But how to get there isn’t so simple. It requires four major changes, as well as some other architectural shifts.

The first piece of this puzzle is the processor. This is an obvious one, and it’s what most people think about when it comes to improved performance. The problem on the processor side is that while it’s possible to get more than 100X performance, the algorithms are changing too quickly to build a blazing fast ASIC. AI chips require programmability, which has an impact on performance, and if the algorithms change substantially it puts greater reliance on the programmable portion and less on the hard-wired logic.
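One way to see why a shift toward the programmable portion hurts is Amdahl's law, which gives the overall speedup when only part of a workload is accelerated. The sketch below is illustrative rather than a model of any particular chip. It assumes the programmable portion runs at baseline speed and that the 100X figure applies only to the hard-wired logic. The function name and the fractions are hypothetical.

```python
def overall_speedup(hardwired_fraction: float, hardwired_speedup: float) -> float:
    """Amdahl's law: overall speedup when only part of the work is accelerated.

    hardwired_fraction: share of the original run time handled by fixed logic (0..1).
    hardwired_speedup:  how much faster the fixed logic runs that share.
    Assumption: the remaining (programmable) share runs at baseline speed.
    """
    if not 0.0 <= hardwired_fraction <= 1.0:
        raise ValueError("hardwired_fraction must be between 0 and 1")
    if hardwired_speedup <= 0.0:
        raise ValueError("hardwired_speedup must be positive")
    programmable_fraction = 1.0 - hardwired_fraction
    return 1.0 / (programmable_fraction + hardwired_fraction / hardwired_speedup)


if __name__ == "__main__":
    # Hypothetical splits: as algorithms change, less work fits the fixed logic.
    for fraction in (0.99, 0.90, 0.50):
        speedup = overall_speedup(fraction, 100.0)
        print(f"{fraction:.0%} on hard-wired logic -> {speedup:.1f}x overall")
```

Under these assumptions, a 100X accelerator that covers 99% of the work delivers roughly 50X overall, about 9X at 90% coverage, and about 2X at 50% coverage.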

That leads to the second piece, which is the software—or in this case, the inferencing data that is weighted in the training phase of AI. There was some debate in the industry about whether the algorithms today will mature and stabilize, or whether algorithms will be constantly in flux. The current thinking is they will be updated forever, just as software is patched today.

What will change, though, is visibility into those algorithms. Today, these are basically black boxes. That creates a problem for companies selling AI in their products, because if something goes wrong they have no way to see what caused the error. AI inferencing adapts to usage models, which is one of the big attractions of this technology, but companies need to understand how it has adapted and why. In addition, they need a way to ensure the adaptation falls within acceptable parameters of behavior for whatever they’re selling.

That requires a change in the data structure, and this leads to the third puzzle piece—how that data interacts with memories. In an AI chip, processors and localized memories are scattered around a die or a package. But the data also can be stored as patterns rather than individual bits, and it can be read or written in four directions—up, down, left and right. Condensing data into patterns also can help on the processing side, where more data can be processed per cycle.

The fourth piece is throughput, which involves the movement of data between processors and memories, both on a chip and between chips. Most chipmakers are trying to limit on-chip data movement, which is the idea behind distributing multiple processors and localized memories. Between chips is another matter, and that is where advanced packaging can make a big difference because it improves the bandwidth and shortens the distance that signals need to travel.

Neural networks can help significantly in applications where there are multiple sensors, such as semi-autonomous or fully autonomous vehicles. Convolutional neural networks (CNNs) are particularly well-suited to computer vision. Recurrent neural networks are essential where the time dimension is a critical factor, such as security or mil/aero applications. But in all cases, the data being collected needs to be scrubbed down to what is useful as quickly as possible, and that’s where performance really tends to get bogged down. At this point, there is no obvious solution for that.

The other place where performance gets bogged down is trying to figure out what is accurate enough for an accepted behavioral distribution, and how to limit that accuracy when it is not necessary. Sparser data with 80% accuracy runs a lot faster than a more complete data set with 99.9% accuracy, and it requires far less energy to do the processing.
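One way to picture the trade-off is magnitude pruning, a common sparsification technique that the article does not name and that is used here only as an illustration. The sketch below zeroes the smallest weights in a tiny matrix and counts the multiply-accumulate operations that remain. The helper names and values are hypothetical, and the code does not measure accuracy. It only shows that sparser data means less work, assuming the hardware can skip zeros.

```python
def prune_by_magnitude(weights, keep_fraction):
    """Zero out all but the largest-magnitude weights (hypothetical helper)."""
    magnitudes = sorted((abs(w) for row in weights for w in row), reverse=True)
    keep = max(1, int(len(magnitudes) * keep_fraction))
    threshold = magnitudes[keep - 1]
    # Weights tied with the threshold are kept, so slightly more than
    # keep_fraction may survive when magnitudes repeat.
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


def sparse_matvec(weights, x):
    """Matrix-vector product that skips zero weights.

    Returns (result, number of multiply-accumulate operations performed).
    """
    macs = 0
    result = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w != 0.0:
                acc += w * xi
                macs += 1
        result.append(acc)
    return result, macs


if __name__ == "__main__":
    weights = [[1.0, -0.1, 0.5, 0.01],
               [0.2, -2.0, 0.05, 0.3]]
    x = [1.0, 1.0, 1.0, 1.0]
    _, dense_macs = sparse_matvec(weights, x)
    _, pruned_macs = sparse_matvec(prune_by_magnitude(weights, 0.5), x)
    print(f"dense: {dense_macs} MACs, pruned: {pruned_macs} MACs")
```

Keeping half the weights halves the operation count in this toy case. Whether the resulting accuracy is acceptable depends on the application, and the speed and energy savings in practice depend on how well the processor and memory handle sparse data.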

All of these factors play a role in speeding up inferencing chips, and no single change will make AI systems run significantly faster. But put all of the pieces together correctly and 100X improvements in performance at the same or less power could well prove to be conservative estimates. That could have a big impact on where and how AI is used, and which companies will be the next rising stars.

