AI: Where’s The Money?

What the market for AI hardware might look like in 2025.


A one-time technology outcast, Artificial Intelligence (AI) has come a long way. Now there's a groundswell of interest and investment in products and technologies delivering high-performance visual recognition that matches or bests human skills. Equally, speech and audio recognition are becoming more common, and we're even starting to see more specialized applications such as finding optimized physical design options in semiconductor layout. We're overwhelmed by possibilities, but what is often less clear is where the money is really going. What is aspiration, what is hype and what is reality?


Source: McKinsey and Co., with Arteris IP market segment overlay

There are multiple ways to slice this question, such as dividing by application or by implementation choice. At Arteris IP, we have a unique view because our interconnect technology is used in many custom AI designs which, as we'll see, are likely to dominate the space. Combining this view with recent McKinsey analyses provides some interesting and, in some cases, surprising insights.

Start with a striking McKinsey estimate: growth in the semiconductor market from 2017 to 2025 will be dominated by AI semiconductors, at a CAGR roughly 5X higher than all other semiconductor types combined. Whatever you may think of the role of AI in our future, not playing in this segment is a hard sell. A Tractica survey breaks this growth down further by implementation platform: CPU, GPU, FPGA and ASIC. CPU-based platforms start at about $3B in 2019, growing to around $12B in 2025. GPU-based systems start near $6B in 2019 and grow to around $20B in 2025. The FPGA contribution is fairly small, maybe around $1B in 2025. But the ASIC segment grows from ~$2B in 2019 to around $30B in 2025, and ASIC implementations of AI will overtake even GPU-based AI in dollar volume by around 2022.
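To put those figures in perspective, here is a minimal sketch that computes the compound annual growth rate (CAGR) implied by the approximate chart readings above. The dollar values are rough, illustrative readings of the Tractica data, not exact numbers.

```python
# Illustrative CAGR calculation using the approximate market sizes cited above.
# The dollar figures are rough readings of the Tractica chart, not exact data.

def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

platforms = {
    # platform: (2019 market $B, 2025 market $B)
    "CPU":  (3.0, 12.0),
    "GPU":  (6.0, 20.0),
    "ASIC": (2.0, 30.0),
}

for name, (start, end) in platforms.items():
    rate = cagr(start, end, years=6)  # 2019 -> 2025
    print(f"{name}: ~{rate:.0%} CAGR (${start:.0f}B -> ${end:.0f}B)")
```

With these readings, the ASIC segment implies a CAGR of roughly 57%, versus roughly 22–26% for the GPU- and CPU-based platforms, which is why ASIC overtakes GPU-based AI in dollar volume well before 2025.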


Source: Tractica, with Arteris IP overlay

The implementation breakdown shouldn't be too surprising. CPU-based platforms will work well for low-cost, low-performance applications (a smart microwave, say) where system designers don't want to deal with non-standard processing. GPUs made the AI revolution real and will continue to be important in relatively high-performance datacenter training, where power and cost are not a concern, as well as in prototypes for emerging applications like robotics and augmented reality headsets. But for anyone looking for high performance and low cost at volume in battery-powered devices, or for the ultimate in differentiated performance and capability in mega-datacenters where cost is not a concern, ASIC is (and always has been) the best solution.

Common wisdom assumes that datacenter AI is predominantly about training machine learning to recognize patterns using large training sets, and AI at the edge is predominantly about inference, using those trained networks in a target application. The reality is more complicated. If you break down training and inference against datacenter and edge, training in the datacenter is certainly a big market, growing from $1B to $5B between 2017 and 2025 per McKinsey, but dominated by just a few very big players. Training on the edge is a very small (but non-zero) market, maybe ~$1B in 2025, to support, say, voice training in a car when out of communication range.

Inference on the edge is, of course, a big market with many players, growing from essentially zero to ~$5B in 2025; this is where we routinely expect most of the action. The real surprise is inference in the datacenter, in 2017 already at ~$5B and expected to grow to ~$10B in 2025; this market also has many players. What drives this? We tend to think of newer applications like public surveillance and facial recognition, but the most common uses are in the financial industry. In fact, credit card companies were some of the first organizations to make commercial use of machine learning. Ever get junk mail from your credit card company offering you a higher credit limit right after you just made a big purchase? Or have they ever turned off your card after you just bought a pair of expensive sneakers and $5 worth of gas? You have AI to thank for that. These kinds of inference in the datacenter may well be the dominant $$ driver for AI.

Now let's look at chip architecture. On the edge, each application is tuned to just a few use cases, often with tight latency requirements, and the SoC architecture is tightly optimized to execute them. This requires custom processing elements (often many types, and many of each) and highly customized on-chip data flows. As the number and variety of processing elements in these chips has grown, so has demand for cache coherence within the AI core to connect them all (Arteris IP-based implementations use the Ncore interconnect for this purpose). There is also an increased need for tight integration between the accelerator core and the rest of the SoC. The hardware architecture for these designs can become complex, but that complexity can greatly simplify the software by handing more of these tricky AI algorithms off to the hardware. Since vehicles have become the most important AI edge devices for innovating and proving new technologies, it shouldn't be surprising that AI demands are commonly combined with functional safety. In fact, we see this not only in cars, trucks and other vehicles but increasingly also in robots and drones.
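As an illustration only (this is not an Arteris IP configuration format or any real SoC), here is a sketch of the kind of heterogeneous edge-AI design described above: several types of processing elements, with the latency-critical ones grouped into a cache-coherent AI core.

```python
# Hypothetical description of an edge-AI SoC of the kind discussed above:
# heterogeneous processing elements, some of them kept cache-coherent.
# This is an illustrative data model, not a real Arteris IP configuration.
from dataclasses import dataclass

@dataclass
class ProcessingElement:
    name: str            # e.g. "cpu_cluster", "npu", "vision_dsp", "isp"
    count: int           # how many instances of this element type
    coherent: bool       # participates in the cache-coherent AI core?
    max_latency_us: int  # latency budget for its tightest use case

edge_soc = [
    ProcessingElement("cpu_cluster", count=1, coherent=True,  max_latency_us=100),
    ProcessingElement("npu",         count=4, coherent=True,  max_latency_us=50),
    ProcessingElement("vision_dsp",  count=2, coherent=True,  max_latency_us=50),
    ProcessingElement("isp",         count=1, coherent=False, max_latency_us=500),
]

coherent_core = [pe.name for pe in edge_soc if pe.coherent]
print("Cache-coherent AI core:", coherent_core)
```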

The implementation needs in the datacenter are quite different and also differ somewhat between training and inference. Datacenter service providers want high throughput through multiple lanes of neural-net engines and don't want to tune applications to a specific job. They want ultra-high-performance, general-purpose AI solutions built on a common set of hardware, so they are trending more and more toward spatially distributed mesh architectures using homogeneous processing elements organized in regular topologies like grids, rings and tori.

We tend to see homogeneous mesh approaches for training, supporting the general-purpose style mentioned earlier. In datacenter inference, we more commonly see heterogeneous meshes with tactically embedded cache memories, again, we believe, in support of somewhat more targeted applications.
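To make "regular topology" concrete, here is a small, hypothetical sketch (not any vendor's interconnect generator) that enumerates the links of each processing element in a 2D torus, i.e. a grid whose rows and columns wrap around so that every node has exactly four neighbors.

```python
# Illustrative sketch of a regular mesh topology of the kind used for
# spatially distributed AI accelerators: a 2D torus, i.e. a grid whose
# rows and columns wrap around so every node has exactly four neighbors.

def torus_neighbors(x, y, width, height):
    """Return the four neighbor coordinates of node (x, y) in a width x height torus."""
    return [
        ((x - 1) % width, y),   # west
        ((x + 1) % width, y),   # east
        (x, (y - 1) % height),  # north
        (x, (y + 1) % height),  # south
    ]

WIDTH, HEIGHT = 4, 4  # 16 identical processing elements in a 4x4 torus

for y in range(HEIGHT):
    for x in range(WIDTH):
        print(f"PE({x},{y}) -> {torus_neighbors(x, y, WIDTH, HEIGHT)}")
```

A plain grid is the same structure without the wraparound (edge nodes then have fewer links), and a ring is the one-dimensional special case.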

Also regarding architecture, bandwidth to off-chip/off-die memory remains a big limiter. We see HBM2 being adopted quickly for this reason, although GDDR6 is also getting a lot of attention; if it works for your needs, it can be much cheaper than HBM2. The logic designs themselves can be huge in these mesh architectures, pushing to and beyond full reticle size. This is generating growing interest in open communication interfaces between chips or die, such as CCIX, OpenCAPI and Gen-Z.
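For a rough sense of why these memories matter, here is a back-of-the-envelope comparison of peak theoretical bandwidth using commonly quoted per-pin data rates; sustained bandwidth, cost and packaging trade-offs vary by implementation.

```python
# Back-of-the-envelope peak bandwidth comparison, using commonly quoted figures:
#   HBM2:  1024-bit interface per stack at ~2 Gb/s per pin
#   GDDR6: 32-bit interface per device at ~16 Gb/s per pin
# Real sustained bandwidth, cost and packaging trade-offs differ by design.

def peak_gb_per_s(bus_width_bits, pin_rate_gbit_s):
    """Peak theoretical bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_width_bits * pin_rate_gbit_s / 8.0

hbm2_stack   = peak_gb_per_s(bus_width_bits=1024, pin_rate_gbit_s=2.0)   # ~256 GB/s
gddr6_device = peak_gb_per_s(bus_width_bits=32,   pin_rate_gbit_s=16.0)  # ~64 GB/s

print(f"HBM2, one stack:   ~{hbm2_stack:.0f} GB/s")
print(f"GDDR6, one device: ~{gddr6_device:.0f} GB/s")
print(f"GDDR6 devices needed to match one HBM2 stack: {hbm2_stack / gddr6_device:.0f}")
```

The gap comes from interface width: HBM2's very wide, in-package interface drives both its bandwidth and much of its cost premium over GDDR6 routed on the board.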

My takeaways:

  • Custom AI will dominate all other platforms, not just at the edge but also in datacenters.
  • Inference will, no surprise, be the biggest $$ contributor in AI but, big surprise, datacenter-based inference will generate more revenue than inference at the edge.
  • Architectures at the edge will demand cache coherence within tightly integrated AI SoC designs, whereas architectures in the cloud will lean more heavily toward spatially distributed configurations.
  • Also in the cloud, memory bandwidth is driving more adoption of HBM2 and GDDR6 (again, not a surprise), but multi-die architectures will also be pushed by the sheer size of spatially distributed structures.

Bottom line: AI is big, but there is no such thing as a “standard AI chip.” Optimal chip architectures differ according to the types of functions that must be executed, where they must be performed, and within what time and power budgets.


