Bandwidth, not compute power, is emerging as the major bottleneck in many AI applications.
A deep neural network (DNN) is a system modeled loosely on our current understanding of biological neural networks in the brain. DNNs are finding use in many applications, advancing at a fast pace, pushing the limits of existing silicon, and influencing the design of new computing architectures. Figure 1 shows a very basic form of neural network in which every node in a layer is connected to every node in the adjacent layers. Each node performs a simple computation on the weighted sum of its inputs. Because of this dense connectivity between layers, scaling such a network to much larger sizes means more storage is needed to hold its parameters. It also requires higher memory and interface bandwidth to load, store, and update those model parameters, as well as the large number of examples used to train the network.
Figure 1: Simple neural network example.
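To make the structure in Figure 1 concrete, here is a minimal NumPy sketch of such a fully connected network. The layer widths are illustrative assumptions, not values taken from the figure; the point is that each pair of adjacent layers needs its own weight matrix, so the parameter count, and with it the storage and bandwidth demand, grows with the product of layer widths.

import numpy as np

# Illustrative layer widths (assumed): input, two hidden layers, output.
layer_sizes = [784, 1024, 1024, 10]

# Full connectivity means each pair of adjacent layers needs an
# (inputs x outputs) weight matrix plus a bias vector.
weights = [np.random.randn(n_in, n_out) * 0.01
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    # Each node applies a simple nonlinearity to the weighted sum of its inputs.
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(0.0, x @ w + b)        # ReLU on the weighted sum
    return x @ weights[-1] + biases[-1]       # final layer left linear

n_params = sum(w.size for w in weights) + sum(b.size for b in biases)
print(f"parameters to store and stream: {n_params:,}")   # ~1.9 million here

Even at these modest sizes, every training step has to move millions of parameters between memory and the compute units, which is where the bandwidth pressure discussed below comes from.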
The sprint towards custom Si solutions
As deep learning tasks become more complicated and decisions require higher accuracy, these neural networks can grow dramatically in size. Each layer can be multi-dimensional in order to capture and analyze many aspects of the input data. Figure 2 shows an example of how a deep neural network for facial recognition might look.
Figure 2: Schematic representation of a deep neural network. Source: NVIDIA Dev Blog
Many companies currently use hardware acceleration through FPGAs and GPUs to build neural networks. These accelerators are more efficient than general-purpose CPUs at computing the mathematical functions required throughout the network. Custom solutions like the Google TPU, which is tailored for TensorFlow, provide a double-digit multiple improvement in performance per watt over CPUs and GPUs.
Combine the gains from custom silicon with the growing size of neural networks, and it is clear that there is a lot of silicon business to capture. As a result, a wave of companies is rapidly innovating and developing solutions for both inference and learning.
The emerging bottleneck: Bandwidth and interconnects
Custom silicon allows designers to optimize performance and power consumption and to match silicon designs with the most effective interface and memory solutions for the task at hand. In many AI applications, raw compute capability is not the current bottleneck; rather, it is the bandwidth available to deliver data to that compute. As previously detailed by Steve Woo, memory systems play a key role in neural network chip design, and there are multiple options that cater to specific needs on both the learning and inference sides of deep learning applications. Between on-chip memory, HBM (which uses an interposer), and more traditional PCB-based memories like GDDR and DDR, choosing the right solution for a given system depends on evaluating the intended use cases and the various design tradeoffs.
When training a neural network, a few factors drive the need for high-bandwidth memory solutions. The first is the quantity and quality of input data. In facial recognition, for example, the volume of high-quality images that must be presented to teach a neural network to identify different individuals' faces is extremely large. The second is the need to continuously retrain and retest, and to verify results against new datasets, in order to keep error rates low. Depending on the application, new data can arrive very frequently and require constant learning. The third driver is the size of the model itself: when models include many layers and nodes, high memory bandwidth and interface speeds are needed to keep the neural network learning and inferencing at peak speed.
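A rough back-of-envelope sketch shows how quickly model size alone translates into memory traffic during training. Every number below is an assumption chosen for illustration, not a figure from any particular system.

# Back-of-envelope estimate of weight traffic during training.
# All values below are illustrative assumptions.
params = 100e6            # 100M-parameter model
bytes_per_param = 4       # FP32 weights
steps_per_second = 100    # mini-batch updates per second

# Each step roughly reads the weights for the forward pass, reads them again
# for the backward pass, and reads + writes them for the update: ~4 passes.
passes_per_step = 4

weight_traffic_gbs = passes_per_step * params * bytes_per_param * steps_per_second / 1e9
print(f"~{weight_traffic_gbs:.0f} GB/s of weight traffic")   # ~160 GB/s here

Even this simplified estimate, which ignores activations and input data entirely, lands in a range where the choice between on-chip memory, HBM, GDDR, and DDR starts to matter.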
On the inference side of the equation, the inputs to the system can themselves be immensely large. For example, ADAS and fully autonomous vehicles have many high-resolution sensors and cameras whose output must be analyzed continuously for real-time decision making.
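A similar sketch, again with assumed camera counts and formats rather than real platform specifications, gives a feel for how quickly raw sensor input adds up on the inference side.

# Raw input data rate for a multi-camera platform; all numbers are assumptions.
cameras = 8
width, height = 1920, 1080
bytes_per_pixel = 2       # e.g. 16-bit raw samples
frames_per_second = 30

raw_input_gbs = cameras * width * height * bytes_per_pixel * frames_per_second / 1e9
print(f"~{raw_input_gbs:.1f} GB/s of raw camera data")   # ~1.0 GB/s here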
Conclusion
With applications like autonomous vehicles, gene decoding, voice-recognizing virtual assistants, and many others in between, deep learning has proven to be useful, and it is here to stay. This broad and seemingly endless opportunity is why deep learning is a key driver of custom silicon and new system architectures that utilize hardware acceleration and advanced neural networks. A key driver of success for these applications, today and in the future, is providing enough memory and interconnect performance to keep accelerators and neural networks operating at top speed. Choosing the right memory and interface solutions is critical and involves understanding both the use cases and the design tradeoffs associated with each solution.