Impact Of IP On AI SoCs

Deep learning applications will call for specialized IP (intellectual property) in the form of new processing and memory architectures.


The combination of mathematics and processing capability has set in motion a new generation of technology advancements and an entirely new world of possibilities related to artificial intelligence (AI). AI mimics human behavior using deep learning algorithms. Deep learning refers to neural networks; it is a subset of machine learning, which is in turn a subset of AI, as shown in Figure 1. This classification matters because it is not AI broadly, or even machine learning, that is changing system-on-chip (SoC) architecture designs; it is the subset known as deep learning.


Figure 1: AI mimics human behavior using deep learning algorithms

Deep learning is not only changing the makeup of SoCs but also spawning a new generation of investment in the semiconductor market. Deep learning algorithmic models, such as convolutional neural networks (CNNs), are heavily used in both the R&D community and commercial deployments. CNNs have been the primary focus for machine vision. Recurrent neural networks (RNNs) have found applicability in natural language understanding because of their ability to model sequences over time.
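To illustrate what "modeling sequences over time" means in practice, here is a minimal sketch of a single recurrent step in plain NumPy. All dimensions, weights, and inputs are illustrative assumptions, not taken from any real model; the point is only that the hidden state carries context forward from one time step to the next.

```python
import numpy as np

# Illustrative dimensions and randomly initialized weights (not a real model).
rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 16
W_x = rng.standard_normal((hidden_dim, input_dim)) * 0.1   # input weights
W_h = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1  # recurrent weights

def rnn_step(x_t, h_prev):
    """One time step: the new state depends on the current input AND the prior state."""
    return np.tanh(W_x @ x_t + W_h @ h_prev)

h = np.zeros(hidden_dim)
for t in range(5):                       # a 5-step input sequence
    x_t = rng.standard_normal(input_dim)
    h = rnn_step(x_t, h)                 # h now summarizes steps 0..t
print(h.shape)  # (16,)
```

The recurrent weight matrix `W_h` is what distinguishes this from a feed-forward network: it feeds the previous state back in, which is why RNNs suit time-dependent data such as speech and text.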

AI applications
Deep learning neural networks are used in many different applications, giving powerful new tools to those who leverage them. For example, they enable advanced security threat analysis that predicts and prevents breaches, and they help advertisers streamline the sales process by predicting the path potential buyers will follow. These are two examples of data center applications that run on server farms featuring the latest GPU and AI accelerator semiconductor technologies.

But AI designs are not confined to the data center. Many new functions, such as vision systems for object and facial detection, natural language understanding for improved human-machine interfaces, and context awareness, infer what activities are taking place from a combination of sensor inputs. These deep learning capabilities are being added to SoCs in all markets, including automotive, mobile, digital home, data center, and Internet of Things (IoT), as shown in Figure 2.


Figure 2: AI capabilities have been added to a wide range of applications

The mobile phone uses neural networks for many of the AI functions described above: it runs facial recognition apps, object identification apps, and natural language understanding apps. In addition, it uses neural networks internally for 5G self-organization as wireless signals become denser, span more mediums and spectrum bands, and carry data with differing priorities.

The human brain
Deep learning has only recently become feasible, thanks to advancements in both mathematics and semiconductor hardware. Several efforts aim to better replicate the human brain in next-generation math models and semiconductor architectures, an approach often referred to as neuromorphic computing. The human brain is incredibly efficient, and technology is only beginning to scratch the surface of replicating it: the brain incorporates over a petabyte of memory storage and is roughly equivalent to 540 trillion transistors, all at a power footprint of less than 12 watts. At this point, replicating the brain is a stretch goal. However, the ImageNet challenge progressed from the breakthrough backpropagation-trained CNN of 2012 to the more advanced ResNet-152 model in 2015, which achieved an error rate lower than that of humans. The market is moving quickly, with new algorithms published often and semiconductor vendors rapidly integrating the needed features to outpace their competitors.

AI design challenges
Incorporating deep learning capabilities drives several critical changes to SoC architectures. These design modifications affect both highly specialized solutions and more general-purpose AI SoC designs, and they include specialized processing needs, innovative memory architectures, and real-time data connectivity.

Specialized processing
SoCs adding neural network capability must accommodate both heterogeneous processing and massively parallel matrix multiplication. The heterogeneous component requires scalar, vector DSP, and neural network processing capabilities. Machine vision, for example, requires a pipeline of stages, each of which requires a different type of processing, as shown in Figure 3.


Figure 3: Neural network capabilities require unique processing

The pre-processing stage requires simpler data-level parallelism. The precise processing of selected regions requires more complex data-level parallelism, which can be tackled efficiently by dedicated CNN accelerators with strong matrix multiplication capabilities. The decision-making stages can commonly be handled with scalar processing. Each application is unique, but what is clear is that heterogeneous processing solutions, including acceleration of neural network algorithms, are required to handle AI models efficiently.
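A minimal sketch of these three stages in plain NumPy may make the division of labor concrete. The stage boundaries, sizes, kernel, and threshold are all illustrative assumptions; the key point is that the middle stage reduces to matrix multiplication (the im2col lowering), which is exactly the operation a dedicated CNN accelerator is built around.

```python
import numpy as np

rng = np.random.default_rng(1)
frame = rng.random((32, 32))            # toy grayscale "camera frame"

# Stage 1: pre-processing -- simple data-level parallelism (vector DSP work).
norm = (frame - frame.mean()) / (frame.std() + 1e-8)

# Stage 2: convolution lowered to matrix multiplication (im2col),
# the massively parallel workload a CNN accelerator targets.
k = 3
kernel = rng.standard_normal((k, k))
patches = np.lib.stride_tricks.sliding_window_view(norm, (k, k))
cols = patches.reshape(-1, k * k)       # each row is one 3x3 patch
feature_map = (cols @ kernel.reshape(-1)).reshape(30, 30)

# Stage 3: decision-making -- scalar processing on the reduced result.
score = feature_map.max()
print("object present" if score > 2.0 else "no object")  # illustrative threshold
```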

Memory performance
AI models use a significant amount of memory, adding cost to the silicon. Training neural networks can require gigabytes to tens of gigabytes of data, driving demand for the largest capacities offered by the latest DDR memories. For example, VGG-16, an image-classification neural network, requires about 9 GB of memory to train; a more accurate variant, VGG-512, requires 89 GB. To improve the accuracy of an AI model, data scientists use larger datasets, which either increases the time it takes to train the model or increases the memory requirements of the solution. Because of the massively parallel matrix multiplication involved and the size of the models and their coefficients, external memories with high-bandwidth access are required. New semiconductor interface IP, such as High Bandwidth Memory (HBM2) and future derivatives (HBM2E), is seeing rapid adoption to accommodate these needs. Advanced FinFET technologies, enabling larger on-chip SRAM arrays and unique configurations with custom memory-to-processor and memory-to-memory interfaces, are being developed to better replicate the human brain and ease the memory constraints.
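As a back-of-the-envelope illustration of where those gigabytes come from, the sketch below estimates training memory as weights plus gradients plus optimizer state, plus the activations kept for backpropagation. The multipliers, per-image activation count, and batch size are rough assumptions, not framework-accurate accounting; VGG-16's roughly 138 million parameters is the only figure taken from the published model.

```python
# Rough training-memory estimate in plain Python (32-bit floats throughout).
# The "3 copies" multiplier (weights + gradients + one optimizer moment) and
# the activation figures are simplifying assumptions for illustration only.
def training_memory_gb(num_params, activations_per_image, batch_size,
                       bytes_per_value=4):
    weights = num_params * bytes_per_value * 3            # weights + grads + momentum
    acts = activations_per_image * batch_size * bytes_per_value
    return (weights + acts) / 1e9

# VGG-16 has ~138 million parameters; the per-image activation count and
# batch size below are hypothetical, chosen only to show the scale involved.
print(f"{training_memory_gb(138e6, 15e6, 128):.1f} GB")   # ~9.3 GB
```

Note that the activations, which scale with batch size, quickly dominate the weights themselves, which is one reason training demands so much more memory bandwidth and capacity than inference.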

AI models can be compressed, a necessary technique to ensure the models can operate within the constrained memory architectures of SoCs at the edge, in mobile phones, automobiles, and IoT applications. Compression uses techniques called pruning and quantization, ideally without reducing the accuracy of the results. It enables traditional SoC architectures featuring LPDDR, or in some cases no external memory at all, to support neural networks. However, there are power consumption and other tradeoffs: as models are compressed, irregular memory accesses and irregular compute intensities increase, prolonging execution time and latency. System designers are therefore developing innovative, heterogeneous memory architectures.
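A minimal sketch of the two compression techniques, applied to a toy weight matrix, is shown below. The pruning threshold and 8-bit target are illustrative choices; real deployments tune both against an accuracy budget.

```python
import numpy as np

rng = np.random.default_rng(2)
weights = rng.standard_normal((4, 4)).astype(np.float32)

# Pruning: zero out small-magnitude weights so they can be skipped
# or stored sparsely. The 0.5 threshold is illustrative.
pruned = np.where(np.abs(weights) < 0.5, 0.0, weights)

# Quantization: map the remaining float32 weights to 8-bit integers
# plus a single scale factor -- a 4x reduction in storage.
scale = np.abs(pruned).max() / 127.0
quantized = np.round(pruned / scale).astype(np.int8)

# At inference time the accelerator computes on int8 and rescales results.
dequantized = quantized.astype(np.float32) * scale
print(f"max rounding error: {np.abs(dequantized - pruned).max():.4f}")
```

The irregularity mentioned above falls out of exactly this transformation: a pruned matrix has scattered zeros, so the dense, regular memory access pattern of the original matrix multiplication becomes sparse and data-dependent.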

Real-time data connectivity
Once an AI model is trained and possibly compressed, it is ready to execute on real-time data delivered through many different interface IP solutions. For example, vision applications are supported by CMOS image sensors connected via MIPI Camera Serial Interface (CSI-2) and MIPI D-PHY IP. LiDAR and radar can be supported via several technologies, including PCI Express and MIPI. Microphones transmit voice data over connections such as USB, pulse density modulation (PDM), and I2S. Digital televisions support HDMI and DisplayPort connections for video content that can be enhanced after transmission by neural networks performing super-resolution, producing higher quality pictures from less data. Many, if not most, TV manufacturers are looking at deploying this technology.

Hybrid AI systems are another concept expected to see wider adoption. For instance, a heart-rate algorithm on a fitness band uses AI to flag anomalies, including possible false positives, and sends the information to the cloud, where a more accurate, in-depth neural network analyzes the anomaly and determines the proper action. Similar technology has already been deployed successfully to balance loads on electrical grids, especially in cases of downed power lines or unexpectedly heavy loads. To support a fast, reliable network to the cloud, the aggregators in these examples require Ethernet connectivity.
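The sketch below shows the shape of that edge-to-cloud split: a cheap statistical screen runs on the device, and only flagged samples are shipped upstream for the heavier neural network analysis. The endpoint URL, thresholds, payload format, and data are all hypothetical.

```python
import json
import statistics
import urllib.request

CLOUD_URL = "https://example.com/anomaly"   # hypothetical endpoint

def edge_check(heart_rates, window=10, z_limit=2.0):
    """Cheap on-device screen: flag samples far from the recent mean."""
    recent = heart_rates[-window:]
    mean, stdev = statistics.mean(recent), statistics.stdev(recent)
    return [hr for hr in recent if abs(hr - mean) > z_limit * stdev]

def send_to_cloud(samples):
    """Ship flagged samples upstream for deeper neural-network analysis."""
    body = json.dumps({"samples": samples}).encode()
    req = urllib.request.Request(CLOUD_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    return urllib.request.urlopen(req)   # response would carry the verdict

readings = [62, 63, 61, 64, 62, 63, 140, 62, 61, 63]  # toy data with a spike
flagged = edge_check(readings)
print("flagged for cloud analysis:", flagged)
# In a real deployment, send_to_cloud(flagged) would run here.
```

The design choice is the one the article describes: the edge model can tolerate false positives because the cloud model makes the final call, while the device stays within its power and memory budget.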

Addressing bottlenecks
Although there is a long way to go before the human brain can be replicated, the brain has been an effective model for building AI systems and continues to be studied by leading research institutions worldwide. The newest neural networks attempt to copy its efficiency and computing capability, and SoC architectures are beginning to follow suit by tightly coupling processors and memory. Synopsys ARC subsystems, for example, provide the processing capabilities needed for AI through their APEX extensions and pervasive RISC architecture, tightly coupling both peripherals and memories to the processor to address the critical memory bottlenecks.

DesignWare IP for AI
AI, and specifically deep learning neural networks, is a once-in-a-lifetime technology development, fast-tracked by a combination of innovations in neural network algorithms and in high-bandwidth, high-performance semiconductor design.

Synopsys works with many of the leading providers of AI SoCs worldwide, across every market segment. This experience has proven valuable in the adoption of proven, reliable IP solutions that lower risk, expedite time-to-market, and enable critical differentiation for AI designers.

Synopsys provides many specialized processing solutions; memory options ranging from external memory interface IP to on-chip SRAM compilers with TCAMs and multi-port memories to address memory bottlenecks; and a full portfolio of connectivity options for real-time data. These IP solutions are critical components of next-generation AI designs.


