AI ASICs Will Become Increasingly Application-Specific

How to optimize performance and maximize ROI in new designs.


Back in 2017, I blogged about AI ASICs not exactly being ASICs. One of the primary reasons for not calling AI acceleration chips ASICs is that, historically, an ASIC, or application-specific integrated circuit, has referred to a fixed hardware block with limited programmability. AI ASICs, on the other hand, offer significant programmability via frameworks such as TensorFlow, and the point was that they are not exactly ASICs in that context.

Fast forward five years, and it does seem as though AI ASICs are becoming more application-specific, albeit with more programmability than the historical ASIC definition implies. What’s changed? Well, several things have happened since. Nvidia and Intel have raked in billions of dollars in chip sales. Both GPUs and CPUs have provided a stable, general-purpose architecture for AI acceleration, with a solid software base and a large number of developers. Everybody in the industry has started using GPUs and CPUs because they are the only chip solutions stable enough to be used for development and production. Nvidia and Intel also have poured billions of dollars into enhancing their software stacks to make their solutions more amenable to AI acceleration. The result is that the majority of AI production systems today run on either CPUs or GPUs. With the exception of a few solutions from Google and Amazon, they are the only architectures that can run almost all AI software and neural networks (NNs) today.

AI ASICs from start-ups, in the meantime, got delayed. Chips that were supposed to be in the market by 2018 took until well into 2020. The start-ups continued to raise funds but had to focus on tape-out and getting the chip to market rather than on the software stack. Rather than investing resources in supporting every AI application and neural network, they picked one or two that best demonstrated their chip and presented those to customers and investors.

Neural networks also continued to evolve. CNNs, which were a hot commodity in the early days of the AI resurgence, were displaced by the transformer architecture. LSTMs and RNNs started fading. Vision transformers are now showing promise of replacing CNNs even in the computer vision world. The architectures of these AI ASICs were designed to accelerate whatever NN was hot at the time of design start. With a two-year delay in hitting production, it was not practical to change the hardware architecture for a new type of NN. So for these start-ups, it came down to modifying and adapting compilers so that the new NNs could run on the chip architecture as designed.

Taking a multi-billion-parameter NN and optimizing it for a given architecture is an even more challenging problem than designing the AI ASIC itself. The intricacies of compilers and optimized libraries turned out to be more difficult and capital-intensive than the ASIC start-ups had estimated. So rather than trying to accelerate every NN class, start-ups took a set of applications and use cases and optimized their chips for them. This helped them focus and create an optimized software chain without having to spend billions of dollars.

This is quite visible on the MLPerf results page. No start-up has submitted results for all classes of models, either because the results don’t look good or because, in some cases, the NNs might not even run on the architecture.
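To make that coverage problem concrete, here is a minimal Python sketch. The operator names and the supported-op set are hypothetical, not any vendor’s real compiler: the idea is simply that a fixed accelerator exposes only a subset of NN operators, and anything outside that set must fall back to the host, or the model won’t run on the chip at all.

```python
# Hypothetical operator set for an accelerator; real chips and
# compilers differ. Ops outside this set cannot run on the device.
SUPPORTED_OPS = {"conv2d", "matmul", "relu", "softmax", "layernorm"}

def coverage(model_ops):
    """Return the fraction of a model's ops the accelerator can run,
    plus the ops that would have to fall back to the host CPU."""
    unsupported = [op for op in model_ops if op not in SUPPORTED_OPS]
    return 1 - len(unsupported) / len(model_ops), unsupported

# Simplified op lists for illustration only
transformer = ["matmul", "softmax", "layernorm", "matmul", "relu"]
lstm = ["matmul", "sigmoid", "tanh", "elementwise_mul"]

for name, ops in [("transformer", transformer), ("lstm", lstm)]:
    frac, missing = coverage(ops)
    print(f"{name}: {frac:.0%} of ops supported, missing {missing}")
```

In this toy example, the transformer maps entirely onto the supported ops while the LSTM mostly does not, which is the kind of gap that keeps some model classes off a start-up’s MLPerf submission.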

Here are some examples of chips that are optimized to solve one particular problem or another:

  • IBM is going after fraud detection with its Telum chip.
  • NeuChips is targeting recommendation engines, with Facebook as a primary use case.
  • Hailo is focused on vision, as are many edge companies going after the automotive or surveillance markets.
  • d-Matrix is betting on the transformer architecture being dominant and is optimizing its architecture accordingly.

In other words, rather than becoming general-purpose ASICs that accelerate every NN type, these ASICs are evolving into something more application-specific in nature. It would take several billion dollars of investment in software to bring their stacks up to the quality of the current market leaders’. Given the amount of funding available, it makes sense for them to focus on the NNs where they can provide the best performance and maximize their chances of monetization. After all, the race is about performance, about being able to run as many NNs as possible in the shortest amount of time.


