Maximize SoC Compatibility With Flexible Pre- And Post-Processing

Expand market applicability and increase security for modern SoCs featuring high compute acceleration engines, like AI and GPU ICs.


Building ASICs and custom ICs (integrated circuits) is becoming increasingly challenging. To create successful products with long-lasting market impact, the critical IP must be differentiated by performance, power, and features. It is difficult to predict and design for every potential application, especially since each application has its own interfaces and processing requirements. Adding embedded programmable logic lets designers adapt their IC to support new interfaces and the associated data processing. Embedded FPGA (eFPGA) technology like Flex Logix’s IP enables greater market applicability and product differentiation.

Fig. 1: Typical applications for eFPGA in AI and signal processing ICs.

Consider Figure 1 above: each of the interfaces requires different pre-processing. Let’s first examine data coming from network interfaces. This data often requires deterministic packet processing to parse high-speed packets. These functions include, but are not limited to:

  • Packet forwarding and redirection (switching)
  • Detection and correction of data errors
  • Decryption and security policy enforcement
  • Transformation and data reduction
  • UDP/IP encapsulation

While many of these functions can be performed in a lookaside implementation, EFLX eFPGA IP can efficiently execute them inline. This increases effectiveness and performance, which can be essential in real-time applications like industrial networking. Figure 2 shows a hardware protocol stack engine from CAST, which uses only a few EFLX eFPGA tiles. Because it is reconfigurable, this IP can scale from 1G up to 100G networks, increasing marketability.

Fig. 2: 40G/50G UDP/IP Hardware Protocol Stack Engine from CAST.
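To make the UDP/IP encapsulation item concrete, here is a minimal software sketch of the UDP layer only. It is not the CAST engine or any Flex Logix code; the function names `build_udp` and `parse_udp` are hypothetical, and the only thing taken as given is the standard 8-byte UDP header layout (source port, destination port, length, checksum). A hardware engine would perform the same field extraction on a streaming datapath rather than on a byte buffer.

```python
import struct

# The UDP header is four 16-bit big-endian fields (RFC 768).
UDP_HEADER = struct.Struct("!HHHH")


def build_udp(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Encapsulate a payload in a UDP datagram.

    The checksum is left at 0, which over IPv4 means "not computed".
    """
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


def parse_udp(datagram: bytes) -> dict:
    """Split a raw UDP datagram into its header fields and payload."""
    if len(datagram) < UDP_HEADER.size:
        raise ValueError("datagram shorter than the 8-byte UDP header")
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    if length < UDP_HEADER.size or length > len(datagram):
        raise ValueError("length field inconsistent with datagram size")
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
        "payload": datagram[UDP_HEADER.size:length],
    }


datagram = build_udp(5000, 6000, b"hello")
fields = parse_udp(datagram)
```

Because every field sits at a fixed offset, this kind of parsing maps naturally onto deterministic, fixed-latency logic.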

In addition to packet parsing, real-time security policies can be implemented to prevent external threats from corrupting the device and incoming data. Figure 3 shows an implementation of a dynamic packet security engine from Dynanic. It uses the programmable logic to efficiently capture, filter, divert, or tag all traffic of interest at very high speeds in order to detect network anomalies or malicious traffic. Moreover, the entire solution can be continuously adapted to the target network and to evolving threats.

Fig. 3: Dynanic SmartNIC solution.

This is only a small subset of the packet processing applications for network data. Another common application of these ICs is inference on video streams. Video sources arrive over a variety of sensor interfaces, including MIPI, USB, LVDS, and Ethernet, so the input stage must be flexible. Each source can vary in resolution, frame rate, and color depth, which affects both the width and the data rate of the data path. Processing performance can be further accelerated by processing not only multiple pixels in parallel, but also multiple video channels in parallel. Figure 4 shows common IP used in image signal processing pipelines, which runs efficiently in eFPGA IP.

Fig. 4: Components of a typical image signal processing pipeline.
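The effect of resolution, frame rate, and color depth on the data path can be estimated with simple arithmetic. The sketch below is illustrative only: the function names and the 300 MHz fabric clock are assumptions, not Flex Logix figures, and blanking intervals and protocol overhead are ignored.

```python
def raw_video_bandwidth_bps(width, height, fps, bits_per_pixel, channels=1):
    """Active-pixel bandwidth in bits per second (no blanking, no overhead)."""
    return width * height * fps * bits_per_pixel * channels


def pixels_per_clock(width, height, fps, fabric_clock_hz):
    """Minimum number of pixels to process in parallel per clock cycle."""
    pixel_rate = width * height * fps
    return -(-pixel_rate // fabric_clock_hz)  # ceiling division


ASSUMED_FABRIC_CLOCK_HZ = 300_000_000  # hypothetical clock, for illustration

bw_1080p60 = raw_video_bandwidth_bps(1920, 1080, 60, 24)
ppc_1080p60 = pixels_per_clock(1920, 1080, 60, ASSUMED_FABRIC_CLOCK_HZ)
ppc_4k60 = pixels_per_clock(3840, 2160, 60, ASSUMED_FABRIC_CLOCK_HZ)
```

Under these assumptions, a 1080p60 stream at 24 bits per pixel carries about 2.99 Gbit/s and fits a one-pixel-per-clock pipeline, while a 4K60 stream needs at least two pixels per clock. This is the kind of width and rate change a reconfigurable data path can absorb.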

Finally, the third application we examine is generic signal processing of sampled data from data converters. This data requires a completely different type of signal processing, which typically includes adaptive filters, transforms like FFTs and IFFTs, and generic matrix multiplication and inversion algorithms. Many of these signal processing applications can be implemented in embedded programmable logic and further accelerated with digital signal processing IP and TPU cores, both available from Flex Logix. In addition, these IP blocks remain dynamic and flexible, so they can adapt to evolving application demands.

Shown below is a simplified block diagram of Flex Logix’s DSP IP core. The 22×22-bit signed real multiplier is also configurable as an 11×11-bit signed complex multiplier without additional resources. Similarly, the pre-adder and the post-adder can perform 11-bit and 24-bit complex signed additions and subtractions, respectively. Other DSP features include rounding operations that use the built-in sign-detection logic and the local carry-in signals.

Fig. 5: Simplified DSP diagram.
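One way to see why a wide real multiplier and a half-width complex multiplier can plausibly share resources is to count partial products. This is an arithmetic illustration only, not a description of the actual Flex Logix circuit: a complex multiplication needs four real half-width products, and a full-width real product can also be assembled from four half-width partial products. The function names are hypothetical.

```python
def complex_mul(ar, ai, br, bi):
    """(ar + j*ai) * (br + j*bi) using four real multiplications."""
    real = ar * br - ai * bi
    imag = ar * bi + ai * br
    return real, imag


def split_mul(a, b, half_bits=11):
    """Real product a*b assembled from four half-width partial products."""
    mask = (1 << half_bits) - 1
    ah, al = a >> half_bits, a & mask  # signed high half, unsigned low half
    bh, bl = b >> half_bits, b & mask
    high = (ah * bh) << (2 * half_bits)
    middle = (ah * bl + al * bh) << half_bits
    low = al * bl
    return high + middle + low


product = complex_mul(3, -4, 5, 6)
wide = split_mul(-1234567, 2000001)
```

Both routines issue four narrow multiplications followed by additions; only the way the results are combined differs.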

Multiple DSP blocks can be efficiently concatenated to realize larger multipliers and adders, in both real and complex modes. Figure 6 shows a 10-tap symmetric FIR filter using only five DSP blocks.

Fig. 6: An example of a 22-bit 10-tap FIR filter utilizing only 5 DSP blocks.
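The reason a 10-tap symmetric filter needs only five multipliers is that mirrored taps share a coefficient, so the two samples can be added first (the role of a pre-adder) and multiplied once. The following is a minimal software sketch of that folding, with hypothetical function names and made-up coefficients; it models the arithmetic, not the timing or bit widths of the hardware.

```python
def fir_direct(coeffs, window):
    """One FIR output sample: one multiplication per tap."""
    return sum(c * x for c, x in zip(coeffs, window))


def fir_symmetric(half_coeffs, window):
    """One symmetric-FIR output sample: pre-add mirrored samples,
    then use one multiplication per coefficient pair."""
    n = len(window)
    if n != 2 * len(half_coeffs):
        raise ValueError("window must hold two samples per coefficient")
    return sum(h * (window[k] + window[n - 1 - k])
               for k, h in enumerate(half_coeffs))


half = [1, 2, 3, 4, 5]            # illustrative coefficients
coeffs = half + half[::-1]        # full symmetric 10-tap response
window = list(range(10))          # ten most recent input samples

y_direct = fir_direct(coeffs, window)      # 10 multiplications
y_folded = fir_symmetric(half, window)     # 5 multiplications
```

Both functions return the same value; the folded form simply halves the multiplier count at the cost of one extra addition per pair.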

Because they are dynamic and reconfigurable in real time, the DSP blocks enable adaptive filters such as the one shown in Figure 7.

Fig. 7: Common adaptive noise filter design.
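The article does not name the adaptation algorithm, so the sketch below assumes the widely used least-mean-squares (LMS) update as an example of an adaptive noise filter. The function name, tap count, step size, and the synthetic signals are all illustrative assumptions. The filter learns how a reference noise source leaks into the primary input and subtracts its estimate.

```python
import math
import random


def lms_noise_canceller(primary, reference, taps=4, mu=0.01):
    """Adaptive noise cancellation using the LMS update (assumed algorithm)."""
    weights = [0.0] * taps
    history = [0.0] * taps            # most recent reference samples
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate    # error doubles as the cleaned output
        weights = [w + mu * error * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned, weights


# Synthetic demo: a sine wave corrupted by filtered white noise.
random.seed(0)
N = 5000
signal = [0.5 * math.sin(2 * math.pi * n / 50) for n in range(N)]
reference = [random.gauss(0.0, 1.0) for _ in range(N)]
noise = [0.8 * reference[n] - 0.3 * (reference[n - 1] if n else 0.0)
         for n in range(N)]
primary = [s + v for s, v in zip(signal, noise)]

cleaned, weights = lms_noise_canceller(primary, reference)
```

Each iteration is a short chain of multiply-accumulate operations plus a coefficient update, which is why this class of filter maps well onto DSP blocks whose coefficients can change at run time.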

As mentioned above, Flex Logix also offers a TPU core, InferX, which is well suited to vector and matrix computation. InferX is effectively a scalable one-dimensional tensor processor (vector and matrix) controlled by the eFPGA fabric, which allows the IP to adapt to any signal processing algorithm implementation, including AI models. InferX delivers roughly 10 times the DSP performance of the DSP IP described above while using only one-quarter of the area.

Fig. 8: InferX IP scalable from 1/8th of a tile to > 8 tiles.

InferX achieves up to dozens of TeraMACs/second on the TSMC 5nm node. It is well suited to applications including FFT, FIR, IIR, beamforming, matrix/vector operations, matrix inversions, Kalman functions, and more. It handles real or complex INT16×INT16 data and accumulates at INT40 for accuracy. Multiple DSP operations can be pipelined in streaming mode or packet mode. See below for more benchmarks for common algorithms running on TSMC’s 5nm node.
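As a side note on the INT16×INT16 with INT40 accumulation format, the headroom a 40-bit accumulator provides can be worked out directly. The sketch below is plain arithmetic, not InferX code; the function name is hypothetical, and raising an error on overflow is an illustrative choice, since the article does not say how the hardware handles that case.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1
INT40_MIN, INT40_MAX = -(1 << 39), (1 << 39) - 1

# Largest-magnitude product of two INT16 values: (-32768) * (-32768) = 2**30.
WORST_PRODUCT = INT16_MIN * INT16_MIN

# Worst-case products that are guaranteed to fit in a signed 40-bit sum.
SAFE_TERMS_INT40 = INT40_MAX // WORST_PRODUCT
SAFE_TERMS_INT32 = ((1 << 31) - 1) // WORST_PRODUCT


def mac_int40(a_vec, b_vec):
    """Dot product of INT16 vectors, checked against the INT40 range."""
    acc = 0
    for a, b in zip(a_vec, b_vec):
        if not (INT16_MIN <= a <= INT16_MAX and INT16_MIN <= b <= INT16_MAX):
            raise ValueError("operand outside INT16 range")
        acc += a * b
        if not INT40_MIN <= acc <= INT40_MAX:
            raise OverflowError("accumulator outside INT40 range")
    return acc


full_scale = mac_int40([INT16_MIN] * SAFE_TERMS_INT40,
                       [INT16_MIN] * SAFE_TERMS_INT40)
```

A 40-bit accumulator can absorb 511 worst-case products without overflow, whereas a 32-bit accumulator can hold only one. Typical data rarely sits at full scale, so practical accumulation lengths are longer.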

InferX DSP solutions are easily programmed with common tools like MATLAB Simulink. Flex Logix built a ready-to-use standard Simulink block set that provides simplified configuration and bit-accurate modeling with flexible precision.

This illustrates how Flex Logix IP can tackle a wide range of pre-processing algorithms while maintaining the flexibility to adapt to emerging market demands. Beyond pre-processing, whether of network data, video streams, or sampled data, it is also important to manage the flow of data into the central processor. Flex Logix IP can buffer data into the CPU to maximize efficiency and prevent starvation. Once computation has completed, Flex Logix IP can also help get the data off chip by adapting the output data to any protocol and physical layer. By using Flex Logix IP in your design, you can not only increase your market applicability but also adjust to novel protocols, evolving security threats, and, most importantly, emerging market demands.

Want to learn more about Flex Logix IP for adaptable, high-performance pre- and post-processing? Contact us at [email protected] or visit our website.
