Processing data from multiple cameras in real time is a challenge for surround-view systems.
Video applications such as surveillance, object detection and motion analysis rely on 360° embedded vision and high-resolution fish-eye camera lenses with a wide-angle field of view (FOV). These systems process up to six real-time camera streams together, frame by frame. Each frame is corrected for lens distortion and other image artifacts, adjusted for exposure and white balance, and then stitched dynamically into a single 360° panoramic view. The output, delivered at 4K resolution and 60 fps, is projected onto a spherical coordinate space.
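To make that per-frame flow concrete, the sketch below lays out the stages as plain C++ stubs. The stage names and the six-camera count follow the description above; the `Frame` type and the function bodies are placeholders for illustration, not a real implementation.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical frame type: interleaved RGB pixels from one camera.
struct Frame {
    int width = 0, height = 0;
    std::vector<uint8_t> rgb;  // width * height * 3 bytes
};

// Placeholder stages matching the pipeline described above.
Frame correct_distortion(const Frame& raw)       { return raw; }       // lens dewarp
Frame adjust_exposure_wb(const Frame& in)        { return in; }        // exposure + white balance
Frame stitch(const std::array<Frame, 6>& views)  { return views[0]; }  // blend into one panorama
Frame project_spherical(const Frame& panorama)   { return panorama; }  // map onto a sphere

int main() {
    std::array<Frame, 6> cameras{};  // one frame per camera, captured in lockstep
    for (auto& f : cameras) {
        f = adjust_exposure_wb(correct_distortion(f));  // per-camera corrections
    }
    Frame out = project_spherical(stitch(cameras));     // single 360-degree result
    (void)out;  // would be emitted at 4K, 60 fps
}
```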
Storing multiple-camera input data to external memory, reading it back in real time and then processing it as a single frame is a major bottleneck for surround-view camera systems. The hardware must keep total latency, from incoming raw sensor data to the stitched output video, within a single frame.
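To put numbers on that budget, here is a minimal back-of-the-envelope sketch. The 4K output resolution and 60 fps rate come from the figures above; everything else is simple arithmetic.

```cpp
#include <cstdio>

int main() {
    const double fps = 60.0;
    const double frame_budget_ms = 1000.0 / fps;  // ~16.7 ms end to end
    const double out_pixels = 3840.0 * 2160.0;    // one 4K UHD output frame
    const double pixel_rate = out_pixels * fps;   // sustained output rate

    std::printf("frame budget: %.2f ms\n", frame_budget_ms);        // 16.67 ms
    std::printf("output rate : %.1f Mpixels/s\n", pixel_rate / 1e6); // ~497.7
    // All six camera streams must be read, corrected and stitched inside
    // that same window, so every external-memory round trip eats into it.
}
```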
High-performance computing platforms are moving toward FPGAs combined with CPUs to provide specialized hardware acceleration for real-time image processing tasks. In this configuration, CPUs handle complex algorithms that benefit from rapid thread and context switching, while repetitive, data-parallel tasks are offloaded to an FPGA acting as a configurable hardware accelerator, coprocessor or offload engine.
Pairing FPGAs and CPUs as discrete devices improves overall system efficiency because the two technologies complement each other. For example, because images from fish-eye lenses suffer from distortion, stitching multiple camera views is a compute-intensive, per-pixel task that demands significant real-time image processing and a highly parallelized architecture. Yet this application strains a standalone FPGA's ability to fill the role, primarily because of delays in moving data on and off chip, which hurt the overall latency, throughput and performance of the system.
Increasingly, embedded vision designers are turning to eFPGAs as a practical alternative. eFPGA IP can be integrated alongside a CPU in the same SoC, and its fabric offers distinct advantages, including higher performance than a standalone-FPGA-plus-CPU solution.
Low latency is essential for complex real-time image processing, such as correcting fish-eye distortion. Because an eFPGA connects directly to the rest of the ASIC through a wide parallel interface, with no I/O buffers in the path, it delivers higher throughput with latency measured in single-digit clock cycles.
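As an illustration of the per-pixel math behind dewarping, the sketch below builds a remap table for an equidistant fish-eye model (r = f·θ), one common lens model. The focal lengths and image sizes are made-up parameters; a real pipeline would use a calibrated model for the actual lens.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Build a lookup table mapping each rectilinear output pixel to a source
// pixel in the fish-eye image, using the equidistant model r = f * theta.
// All parameters here are illustrative, not calibrated values.
int main() {
    const int out_w = 640, out_h = 480;  // rectilinear output size
    const int src_w = 800, src_h = 800;  // fish-eye sensor size
    const double f_out  = 300.0;         // output pinhole focal length (pixels)
    const double f_fish = 250.0;         // fish-eye focal length (pixels)

    std::vector<float> map_x(out_w * out_h), map_y(out_w * out_h);
    for (int y = 0; y < out_h; ++y) {
        for (int x = 0; x < out_w; ++x) {
            // Ray through the output pixel, relative to the optical center.
            const double dx = x - out_w / 2.0;
            const double dy = y - out_h / 2.0;
            const double r_out = std::sqrt(dx * dx + dy * dy);
            const double theta = std::atan2(r_out, f_out);  // angle off axis
            const double r_fish = f_fish * theta;           // equidistant model
            const double scale = (r_out > 0.0) ? r_fish / r_out : 0.0;
            map_x[y * out_w + x] = float(src_w / 2.0 + dx * scale);
            map_y[y * out_w + x] = float(src_h / 2.0 + dy * scale);
        }
    }
    // Each output pixel is then a bilinear sample of the fish-eye image at
    // (map_x, map_y): independent per-pixel work that parallelizes well in
    // programmable logic.
    std::printf("center maps to (%.1f, %.1f)\n",
                map_x[(out_h / 2) * out_w + out_w / 2],
                map_y[(out_h / 2) * out_w + out_w / 2]);
}
```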
Another advantage of eFPGAs is that they can be sized to match the target application. For instance, users of Speedcore eFPGA IP specify their logic, memory and DSP resource needs, and the IP is configured to meet those requirements. Look-up tables (LUTs), RAM blocks and DSP64 blocks can be assembled like building blocks to create the optimal programmable fabric for any given application.
Additionally, users can define their own custom functions to be included in the eFPGA fabric. These custom blocks are integrated into the logic fabric alongside the traditional building blocks, extending the eFPGA's capability with functions optimized to reduce area and/or increase performance for targeted applications, particularly embedded vision and image-processing algorithms.
A good example of how custom blocks enable high-performance image processing is the implementation of “you only look once” (YOLO), a state-of-the-art, real-time object-detection algorithm based on neural networks that outperforms earlier methods. The algorithm relies on a large number of matrix multipliers. When implemented in an FPGA, these matrix multipliers are built from DSP and RAM blocks.
The problem is a mismatch between the DSP and RAM block configurations YOLO needs and what a typical FPGA fabric provides. A typical fabric might offer DSP blocks with 18 × 27 multiply-accumulate units and 32 × 128 RAMs, whereas the optimal solution for YOLO is a fabric with 16 × 8 DSP blocks and 48 × 1024 RAMs. By creating custom blocks that implement the optimal DSP and RAM configurations, the Speedcore fabric implements the same functionality in 40% less die area while achieving higher system performance.
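To see why a 16 × 8 multiplier is a natural fit, consider the inner loop of the matrix multiplies that dominate YOLO's convolution layers. Assuming 16-bit activations and 8-bit weights (a common quantization scheme, used here purely for illustration), each step is exactly one 16 × 8 multiply feeding a wide accumulator, so an 18 × 27 DSP block leaves most of its multiplier width idle.

```cpp
#include <cstdint>
#include <cstdio>

// One output element of a quantized matrix multiply: 16-bit activations
// times 8-bit weights, accumulated at 32 bits. Each loop iteration is one
// 16 x 8 multiply-accumulate, so a custom 16 x 8 DSP block wastes no
// multiplier width on this workload.
int32_t dot_q16x8(const int16_t* act, const int8_t* wgt, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; ++i) {
        acc += int32_t(act[i]) * int32_t(wgt[i]);  // maps to one DSP MAC
    }
    return acc;
}

int main() {
    const int16_t act[4] = {1000, -2000, 3000, 400};
    const int8_t  wgt[4] = {3, -1, 2, 5};
    // 3000 + 2000 + 6000 + 2000 = 13000
    std::printf("dot = %d\n", (int)dot_q16x8(act, wgt, 4));
}
```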
Embedding FPGA fabrics in SoCs provides two additional system-level benefits: eliminating the large programmable I/O buffers between discrete devices cuts power consumption, and integrating the two devices into a single chip reduces board area and overall system cost.
The demand for ultra-low latency, real-time processing is driving the need for efficient implementations of 360°-view vision-based systems. eFPGAs with custom blocks, working alongside a CPU in the same host SoC, are well suited to implementing dedicated functionality such as object detection and image recognition, warping and distortion correction, and stitching the final images together.