Leveraging The Power Of VDMA Engines For Computer Vision Apps

Part 1: An in-depth overview of FPGA-based use cases for reference designs.


It’s pretty hard to overestimate the role of heterogeneous embedded systems based on Xilinx Zynq-7000 All-Programmable devices in tasks like computer vision. Many consumer electronics and specialized devices are emerging to facilitate and improve industries such as medical, automotive, security, and IoT.

The combination of high-performance ARM application processing and Xilinx programmable FPGA fabric turns the whole development process into a series of easy and efficient design steps. Computer vision engineers are able to choose the design flow that suits their needs: pure CPU application using popular computer vision libraries, pure RTL design in FPGA, or a mixed flow using the best from both worlds.

But what is behind the scenes? Designers require an image source (input) as well as an image visualization device (output) to check the results of the image processing algorithm. Typically, these are digital cameras and display panels. In general, an image processing algorithm implies the pixel data is stored into frame buffers allocated in main system DDR memory. But how should that data get to memory? How does the input from a camera ultimately reach the output on a display as a processed image? Introducing the Xilinx VDMA IP core.

VDMA stands for Video Direct Memory Access and is nothing but a modification of a general-use DMA engine for video/imaging applications. VDMA is used to get the pixel data from image source devices to the memory (write channel) or move that data from memory to the image displays (read channel). So let’s see how it’s actually done.


Video input subsystems
A video input subsystem is a set of IP cores used to grab, convert and store the image data into the memory buffers. The write channel of the VDMA engine takes the pixel data represented in AXI4-Stream form and translates it into a series of the incoming burst DDR transfers using AXI4-MM (Memory Mapped) protocol. As it follows from the description above, the initial pixel data must use an AXI4-Stream format which is a non-addressable modification of the standard AXI4 protocol for high-speed data transfers between image source/destination devices and the data processors. The VDMA engine recognizes the AXI4-Stream TUSER signal as SOF (Start Of Frame) as well as the TLAST signal as EOL (End Of Line), both of which are required for frame synchronization purposes. The pixel data comes through TDATA lines strobed by the TKEEP signal. A simple TVALID/TREADY handshake is used to synchronize the slave and master devices similarly to the MM version of AXI4 protocol conventions.

Video frame transfer over AXI4-Stream protocol. Source: Xilinx

Sometimes it can be a little bit tricky to convert the input data from camera sensor into the streaming format, which is shown in the next 2 cases of frame grabbing used for TySOM reference designs reviewed below.

Parallel camera interface: Digital Video Port (DVP). Target HW platform: TySOM-2-7Z045 & FMC-ADAS + Blue Eagle HDR camera.
Digital CMOS camera with a parallel DVP interface is a well-known image source for an advanced Zynq designer and is well suited for low-resolution solutions (up to 1 Mpx). It’s easy to use with Zynq because the raw pixel data is present on the camera interface data lines along with the line (HREF) and frame (VSYNC) synchronization signals. Serial_camera_interface1Each portion of pixel data is sampled by the pixel clock (PCLK) present on the interface. Actually that is the main limiting factor: growing image sensor resolution will result in a linearly growing pixel clock frequency making the data on lines unstable and hard to use. Zynq designers may meet a need to convert HREF signal into HSYNC form as well as a need for an additional clocking scheme. Such a subsystem is widely used for Aldec’s Automotive Solutions reference designs and the VDMA engine as the heart of the video input subsystem makes it possible.

Serial camera interface: MIPI CSI-2: Target HW platform: TySOM-1-7Z030 + RaspberryPi camera v1.3
When image sensor performance is increasing into the values above 1-2 megapixels, MIPI CSI-2 serial camera interface is there to help. In this case, pixel data and clock are distributed via differential lanes in a serial manner. The CSI-2 protocol can work in 2 modes: High Speed (HS) for an image payload transmission and Low Power (LP) for a backside data payload transmission through the same data lines. Parallel_camera_interface Yes, this makes the receiver device logic more complicated. But, on the bright side, there is no strict limitation for camera sensor performance allowing expansion of an image size up to 10-16 megapixels and more. Although the Zynq-7000 devices are not fully electrically compatible with the MIPI D-PHY physical protocol used for CSI-2, those data lines can be accessed by Zynq using the LVDS I/O standard running in HS mode. The RaspberryPi camera is a typical device which utilizes CSI-2 equipped image sensors and is used for IoT and embedded vision reference designs available for Aldec’s TySOM EDK. VDMA helps to bring the data received and converted into the DDR allocated frame buffers.