Providing An AI Accelerator Ecosystem

Using high-level synthesis along with IP libraries and toolkits to speed hardware accelerator development.

popularity

A key design area for AI systems is the creation of Machine Learning (ML) algorithms that can be accelerated in hardware to meet power and performance goals. Teams designing these algorithms find out quickly that a traditional RTL design flow will no longer work if they want to meet their delivery schedules. The algorithms are often subject to frequent changes, the performance requirements may evolve, and aspects of integrating into the target platform may change late in the design cycle. To complicate matters, the design team needs to recode the RTL to explore power, performance, and area trade-offs. Whether it is changing requirements or design space exploration, every change to the source design requires restarting the entire design and verification process, causing either unacceptable delays in the production schedule or missed opportunities to explore and create better hardware (Figure 1).


Figure 1: Changes cause delays in the RTL flow.

Instead, these teams turn to a high-level synthesis flow, such as that offered by the Catapult HLS Platform, for designing and verifying ML accelerators and connecting them to systems. The platform provides a complete design and verification flow from C++ to generated power and process technology optimized RTL.

Introducing the AI accelerator ecosystem
The Catapult HLS Platform provides a proven tool flow for IC designers. But, Mentor has taken a big step farther and offers an AI accelerator ecosystem (Figure 2) that provides AI designers with an environment to jumpstart projects.


Figure 2: The AI Accelerator Ecosystem.

IP library built on specialized data types
The ecosystem includes a rich IP library that enables a faster path to hardware acceleration by providing easy-to-understand, high-quality, fundamental building blocks that can be synthesized into either FPGA, eFPGA, or ASIC technologies. This library is built on a foundation of specialized data types for arbitrary length integer, fixed-point, floating-point, and complex numbers that allows code to synthesize into hardware with the same bit-for-bit behavior as the C++ model. The IP library includes:

  • Algorithmic C Math library: defines synthesizable C++ functions for math operators typically found in the standard C++ math.h header as well as a C++ matrix class and linear algebra functions.
  • The Algorithmic C DSP library: defines synthesizable C++ functions that DSP designers typically require, such as filters and Fast Fourier Transforms (FFT).
  • The Algorithmic C Image Processing Library: starts with a definition of some common pixel format type definitions. These types utilize C++ template parameters to configure the color depth and format (RGB, YUV). The library then provides various function blocks useful for image processing including color conversion, boundary processing, and windowing classes for use in 2-D convolution.

Toolkits
The AI accelerator ecosystem provides toolkits that are real-world, tested examples of accelerator-based reference designs that teams can study, modify, and copy to jumpstart projects. These kits include configurable C++/SystemC IP source code, documentation, testbenches, and scripts to move the design through the HLS synthesis and verification flow. The toolkits demonstrate various approaches and coding techniques for experimenting with tradeoffs for performance (latency), frame rates, area, or power. The toolkits currently available include:

  • Pixel-pipe video processing toolkit: demonstrates a live image processing application using a pixel-pipe accelerator. The block down-scales the image, converts it from color to monochrome in order to perform edge detection, and then up-scales the image.
  • 2-Convolution toolkit: demonstrates how to code an Eyeriss processing element (PE) array in C++ that implements a 2-D convolution to perform image enhancement (sharpen, blur, and edge-detect). The processing element can perform a 3×1 multiply accumulate (convolution). Stacking the processing elements vertically yields a 3×3 convolution where the kernel weights determine the image enhancement performed.
  • tinyYOLO object classification toolkit: demonstrates an object classification application using a convolution accelerator engine implemented with the PE array from the 2-D Eyeriss toolkit. It is based on the tinyYOLO (“You Only Look Once”) neural network architecture and it includes a video preprocessing block to scale the image before executing object classification. The toolkit shows how to obtain high-speed data routing through AXI4 interconnect and demonstrates how to define a high-performance memory architecture. The toolkit provides an integration to TensorFlow for inference testing of the network layers implemented in C++.

System integration
An accelerator block does not live in isolation; it needs to be connected to a system. Catapult HLS offers Interface Synthesis to add a timed protocol to untimed C++ function interface variables. Designers simply need to set architectural constraints for the protocol in the Catapult GUI. The tool supports typical protocols such as AXI4 video stream, request/acknowledge handshaking, and memory interfaces. This allows designers to explore interface protocols without changing the C++ source.

To support system integration, the AI accelerator ecosystem provides a set of fully-functional design examples:

  • AXI examples: show how to instantiate one or more accelerator components within an AXI SoC subsystem using the AXI interface IP that Catapult HLS generates. Master, slave, and streaming examples are available.
  • Base processor example: shows how to connect a ML accelerator into a complete processor-based system and it employs the AXI examples. The ML accelerator in this example employs a simple multiply/accumulate architecture with 2-D convolution and max pooling. Several 3rd-party processor IP models are supported and a software flow (with associated data) is included for bare metal programing.

Getting help
Design teams might find that they need help with their projects and the ecosystem provides assistance in several ways. Some examples include:

  • On-demand training: delivers a structured, self-paced, online training experience to learn how to efficiently use the tools.
  • Seminars: are offered live around the world and then captured on video for access online anytime.
  • Consulting: provides onsite help for getting the tool flow integrated into the team’s environment as well as help for designing C++ algorithms for optimal synthesis.

Combining the AI accelerator ecosystem with high-level synthesis allows teams to launch their AI projects. The ecosystem’s wide range of offerings inspires a team at any level of experience to select what they need to get their projects to market faster.

To learn more about the AI accelerator ecosystem, see this whitepaper.



Leave a Reply


(Note: This name will be displayed publicly)