The Emergence Of Hardware As A Key Enabler For The Age Of Artificial Intelligence

The design-by-optimization paradigm calls for a new way of looking at how hardware and software interact.


Over the past few decades, software has been the engine of innovation for countless applications. From PCs to mobile phones, well-defined hardware platforms and instruction set architectures (ISAs) have enabled many important advancements across vertical markets.

The emergence of abundant-data computing is changing the software-hardware balance in a dramatic way. Diverse AI applications in facial recognition, virtual assistance, autonomous vehicles and more are sharing a common feature: They rely on hardware as the core enabler of innovation. Since 2017, the AI hardware market has grown 60-70% annually, and is projected to reach $65 billion by 2025.

Source: Tractica

What are the key drivers of growth in AI hardware, and how will they shape markets in the coming years?

To answer this question, it is important to start with a key observation: software is changing. In traditional software design, computer scientists focused on devising algorithmic approaches to match specific problems and implemented them in a high-level procedural language. Some algorithms could be threaded to leverage available hardware, but massive parallelism remained elusive due to the implications of Amdahl’s Law.

Effect of Amdahl’s Law on speedup as a fraction of clock cycle time in serial mode
John L. Hennessy, David A. Patterson, 10.1145/3282307
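Amdahl's Law can be made concrete with a few lines of arithmetic. The sketch below (an illustrative calculation, not from the cited lecture) shows how even a small serial fraction caps the speedup from parallel hardware:

```python
# Amdahl's Law: overall speedup is limited by the serial fraction of a
# program, no matter how many parallel processors are added.
def amdahl_speedup(serial_fraction: float, n_processors: int) -> float:
    """Speedup = 1 / (s + (1 - s) / N)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

# Even with only 5% serial code, 1024 processors yield ~20x, not 1024x.
for n in (8, 64, 1024):
    print(f"{n:>5} processors -> {amdahl_speedup(0.05, n):.1f}x speedup")
```

This is why threading traditional algorithms rarely delivers massive parallelism: the residual serial portion dominates as processor counts grow.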

Today’s advent of big data and connected everything is enabling design by optimization. In this new paradigm, data scientists use inherently parallel computing systems, such as neural networks (NNs), to ingest massive amounts of data and train themselves through iterative optimization. By adjusting an NN architecture and varying its parameter values, data scientists can create empirical (i.e., statistical) software solutions for problems that have no practical algorithmic (or polynomial-time) implementations. Importantly, these solutions are inherently parallelizable and can break free of the constraints of traditional algorithmic design.
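The optimization loop at the heart of this paradigm can be sketched in a few lines. The toy below fits a linear model with plain gradient descent; it stands in for a real neural network and optimizer, but the shape of the process is the same: pick parameterized structure, iterate on data until the parameters fit.

```python
# Toy illustration of design by optimization: instead of hand-coding an
# algorithm, we choose a parameterized model and let iterative optimization
# (here, plain gradient descent on mean squared error) fit its parameters.
def train(xs, ys, lr=0.01, steps=2000):
    w, b = 0.0, 0.0                      # model: y = w*x + b
    n = len(xs)
    for _ in range(steps):
        # gradients of mean squared error w.r.t. w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Data generated by y = 3x + 1; the optimizer recovers the parameters
# empirically, without being told the underlying rule.
xs = [0, 1, 2, 3, 4]
ys = [1, 4, 7, 10, 13]
w, b = train(xs, ys)
print(f"w = {w:.2f}, b = {b:.2f}")   # close to w=3, b=1
```

Note that each gradient term is independent per data point, which is exactly the property that makes this workload embarrassingly parallel on suitable hardware.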

Here is where hardware becomes a critical component of the solution.

Cloud AI chips are enabling abundant-data computing in data centers

Needless to say, our traditional workhorses for executing software – the standardized ISAs that have dominated the computing world for several decades – were not designed for the new design-by-optimization paradigm. At the time of this writing, Megatron, the world’s largest transformer-based NN for natural language processing (NLP), is an 8.3-billion-parameter language model trained with 8-way model parallelism and 64-way data parallelism (source: NVIDIA). The model is pre-trained on a dataset of 3.3 billion words. To execute such models, the recently launched NVIDIA A100 GPU was designed to deliver 312 teraFLOPS of FP16 compute power. As another example, Google’s tensor processing units (TPUs) can be combined in Pod configurations that exceed 100 petaFLOPS of processing power for training NN models (source: Google).
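A back-of-the-envelope calculation shows why a model of this size forces model parallelism across devices. The byte-per-parameter figures below are common rules of thumb for illustration, not numbers published by NVIDIA:

```python
# Rough sizing for an 8.3-billion-parameter model.
# Assumption: weights stored as FP16 (2 bytes per parameter).
params = 8.3e9
bytes_per_param = 2
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: {weights_gb:.1f} GB")        # 16.6 GB

# Training also keeps gradients and optimizer state; with Adam in mixed
# precision, a common rough estimate is ~16 bytes per parameter.
training_gb = params * 16 / 1e9
print(f"training footprint: ~{training_gb:.0f} GB")  # ~133 GB
```

Even the weights alone approach the memory of a single GPU, and the full training footprint far exceeds it, hence the 8-way split of the model across accelerators.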

AlexNet to AlphaGo Zero: A 300,000x Increase in Compute. Source: Open AI

Cloud AI designs are characterized by massive dimensions, multiple levels of physical hierarchy, globally-asynchronous locally-synchronous (GALS) architectures, and very fragmented floorplans.

New edge AI devices are driving an explosion of real-time abundant-data computing

These are circuits embedded in a myriad of applications across automotive, consumer and enterprise robotics, drones, smart speakers, mobile phones, tablets, and more. Many of these applications impose real-time requirements with safety-critical implications. Autonomous navigation, for example, imposes a computational response latency limit of 20ms. Voice and video assistants must understand spoken keywords in less than 10ms and hand gestures within a few hundred milliseconds.
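These latency limits translate directly into throughput requirements for the silicon. The check below uses hypothetical round numbers for the model cost and accelerator throughput, chosen only to illustrate the arithmetic:

```python
# Rough latency-budget check for a real-time edge workload.
# Assumed numbers (illustrative, not from any specific chip):
#   - a vision model costing 10 GFLOPs per inference
#   - an edge NPU sustaining 2 TFLOPS
def inference_latency_ms(flops_per_inference: float, sustained_flops: float) -> float:
    """Best-case latency if the accelerator runs at its sustained rate."""
    return flops_per_inference / sustained_flops * 1e3

latency = inference_latency_ms(10e9, 2e12)
print(f"{latency:.0f} ms per inference")  # 5 ms, within a 20 ms budget
assert latency <= 20, "model too slow for the real-time budget"
```

In practice, memory bandwidth and utilization push real latencies well above this best case, which is why edge designs are squeezed so hard on performance per watt.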

By 2021, more than 8 billion smartphones with AI processing engines, some as small as 1 mm², will have shipped. The edge will soon become the world’s largest computing environment, with devices already capable of pushing teraFLOPS through their trained neural networks to meet the needs of interactive applications. By 2025, an estimated 70% of the world’s AI software will run at the edge (source: Tractica). This estimate does not include “haze” devices such as micro servers or edge routers.

Breadth of Edge AI applications.

Edge AI designs are generally small embedded engines but need to handle hundreds of design corners, extreme variability, ultra-low power requirements and heterogeneous integration (e.g. sensors).

Hardware design has become a core enabler of innovation for the age of AI. At the same time, it is presenting a unique set of challenges to its pioneers, with both cloud and edge segments pushing the limits of existing silicon technologies for performance, power, and area. This blog series, Designing AI Machines, will explore key design challenges and solutions for building AI processing hardware, from today’s AI accelerators to tomorrow’s cognitive systems.
