SPONSOR BLOG

Maximize SoC Compatibility With Flexible Pre- And Post-Processing

Expand market applicability and increase security for modern SoCs featuring high compute acceleration engines, like AI and GPU ICs.

April 4th, 2024 - By: Jayson Bethurem

Building ASICs and custom ICs (integrated circuits) is becoming increasingly challenging. To create successful products with long-lasting market impact, it’s essential for the critical IP to be differentiated by performance, power, and features. It is difficult to predict and design for every potential application, especially considering that each application has unique interfaces and processing requirements. Adding embedded programmable logic enables designers to adapt their IC to support any interface and its associated data processing. Embedded FPGA (eFPGA) technology like Flex Logix’s IP enables greater market applicability and product differentiation.

Fig. 1: Typical applications for eFPGA in AI and signal processing ICs.

Consider Figure 1 above: each of the interfaces requires different pre-processing. Let’s first examine data coming from network interfaces. These often require deterministic packet processing to parse high-speed data packets. These functions include, but are not limited to:

Packet forwarding and redirection (switching)
Detection and correction of data errors
Decryption and security policy enforcement
Transformation and data reduction
UDP/IP encapsulation

While many of these functions can be performed in a lookaside implementation, EFLX eFPGA IP can efficiently execute these functions in line. This increases effectiveness and performance, which can be essential in real-time applications like industrial networking. Figure 2 shows a hardware protocol stack engine from CAST, which only uses a few EFLX eFPGA tiles. Because it’s reconfigurable, this IP can scale from 1G up to 100G networks, increasing marketability.

Fig. 2: 40G/50G UDP/IP Hardware Protocol Stack Engine from CAST.
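
To make the UDP/IP encapsulation step concrete, here is a minimal software sketch of what such an engine does to each payload. It is only an illustration of the standard 8-byte UDP header layout, not the CAST implementation; the function names `udp_encapsulate` and `udp_parse` are hypothetical, and the checksum is left at zero (which UDP over IPv4 permits) to keep the example short.

```python
import struct

# Standard UDP header: source port, destination port, length, checksum
# (four 16-bit big-endian fields, 8 bytes in total).
UDP_HEADER = struct.Struct("!HHHH")


def udp_encapsulate(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prepend a UDP header to a payload (hypothetical helper)."""
    length = UDP_HEADER.size + len(payload)
    # Assumption: checksum 0 means "not computed", allowed for UDP over IPv4.
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


def udp_parse(datagram: bytes):
    """Split a UDP datagram into (source port, destination port, payload)."""
    src_port, dst_port, length, _checksum = UDP_HEADER.unpack_from(datagram)
    if length < UDP_HEADER.size or length > len(datagram):
        raise ValueError("UDP length field does not match the datagram")
    return src_port, dst_port, datagram[UDP_HEADER.size:length]


datagram = udp_encapsulate(5000, 6000, b"sensor-frame")
print(udp_parse(datagram))
```

In hardware the same field extraction happens on the streaming data path at a fixed latency per packet, which is what makes the processing deterministic.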

In addition to packet parsing, real-time security policies can be implemented to prevent external threats from corrupting the device and incoming data. Figure 3 shows an implementation of a dynamic packet security engine from Dynanic. It uses the programmable logic to efficiently capture, filter, divert, or tag all traffic of interest at very high speeds in order to detect network anomalies or malicious traffic. Moreover, the entire solution can continuously adapt to the target network and evolving threats.

Fig. 3: Dynanic SmartNIC solution.
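
The filter, divert, and tag behavior described above can be pictured as a rule table that is matched against each packet and updated while traffic is flowing. The sketch below is a simplified software model under our own assumptions: the `Rule` fields, the action names, and the `classify` function are hypothetical and do not describe the Dynanic product.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """One hypothetical policy entry; a field left as None matches anything."""
    action: str                      # e.g. "drop", "divert" or "tag"
    src_net: Optional[str] = None    # source network in CIDR notation
    dst_port: Optional[int] = None   # destination port


def classify(rules: List[Rule], src_ip: str, dst_port: int) -> str:
    """Return the action of the first matching rule, or "pass" if none match."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        net_ok = rule.src_net is None or addr in ipaddress.ip_network(rule.src_net)
        port_ok = rule.dst_port is None or rule.dst_port == dst_port
        if net_ok and port_ok:
            return rule.action
    return "pass"


rules = [Rule("drop", src_net="203.0.113.0/24"), Rule("tag", dst_port=23)]
print(classify(rules, "203.0.113.7", 80))
print(classify(rules, "198.51.100.1", 23))

# Adapting to a new threat means inserting a rule, not redesigning the device.
rules.insert(0, Rule("divert", dst_port=443))
print(classify(rules, "198.51.100.1", 443))
```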

This is only a small subset of the packet processing applications for network data. Another common application of these ICs is inferencing on data from video streams. Video sources arrive over a variety of sensor interfaces, including MIPI, USB, LVDS, and Ethernet, so the input stage must be flexible. Each source can vary in resolution, frame rate, and color depth, which affects both the width and the rate of the data path. Processing performance can be further accelerated by processing not only multiple pixels in parallel, but also multiple video channels in parallel. Figure 4 shows common IP used in image signal processing pipelines, which runs efficiently in eFPGA IP.

Fig. 4: Components of a typical image signal processing pipeline.
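
A quick back-of-the-envelope calculation shows how resolution, frame rate, and color depth set the data rate, and why wider (multi-pixel) data paths become necessary. This is a rough sketch: it counts active pixels only and ignores blanking intervals and protocol overhead, and the 300 MHz fabric clock in the example is an assumed figure, not a Flex Logix specification.

```python
import math


def pixel_rate(width: int, height: int, fps: int) -> int:
    """Active pixels per second for one video stream."""
    return width * height * fps


def stream_bandwidth_bps(width: int, height: int, fps: int, bits_per_pixel: int) -> int:
    """Raw payload bandwidth in bits per second (no blanking or overhead)."""
    return pixel_rate(width, height, fps) * bits_per_pixel


def pixels_per_clock(width: int, height: int, fps: int, clock_hz: int) -> int:
    """Minimum number of pixels processed in parallel per clock cycle."""
    return math.ceil(pixel_rate(width, height, fps) / clock_hz)


CLOCK_HZ = 300_000_000  # assumed fabric clock, for illustration only

for name, (w, h) in {"1080p60": (1920, 1080), "2160p60": (3840, 2160)}.items():
    gbps = stream_bandwidth_bps(w, h, 60, 24) / 1e9
    lanes = pixels_per_clock(w, h, 60, CLOCK_HZ)
    print(f"{name}: {gbps:.2f} Gb/s, {lanes} pixel(s) per clock")
```

Under these assumptions, moving from 1080p to 2160p at the same frame rate quadruples the pixel rate, so the data path must either run faster or become wider.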

Finally, the third application we examine is generic signal processing of sampled data from data converters. This data requires a completely different type of signal processing, which typically includes adaptive filters, transforms like FFTs and IFFTs, as well as generic matrix multiplication and inversion algorithms. Many of these signal processing applications can be implemented in embedded programmable logic and further accelerated with digital signal processing IP and TPU cores, both available from Flex Logix. In addition, these IP blocks remain dynamic and flexible, enabling adaptability to evolving application demands.

Shown below is a simplified block diagram of Flex Logix’s DSP IP core. The 22×22-bit signed real multiplier is also configurable as an 11×11-bit signed complex multiplier without additional resources. Similarly, the pre-adder and the post-adder can do 11- and 24-bit complex signed additions/subtractions, respectively. Other DSP features include the rounding operations using the built-in sign-detection logic and the local carry-in signals.

Fig. 5: Simplified DSP diagram.
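
For readers less familiar with complex arithmetic, the sketch below shows what an 11×11-bit signed complex multiply computes: four real products combined into a real part and an imaginary part. It models the arithmetic only. How the DSP core maps these partial products onto its 22×22-bit multiplier is not described here, and the function name `complex_multiply` is hypothetical.

```python
def complex_multiply(a_re: int, a_im: int, b_re: int, b_im: int, bits: int = 11):
    """Multiply (a_re + j*a_im) by (b_re + j*b_im) for signed `bits`-wide parts."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    for value in (a_re, a_im, b_re, b_im):
        if not lo <= value <= hi:
            raise ValueError(f"{value} does not fit in {bits} signed bits")
    # Four real products, one subtraction and one addition.
    re = a_re * b_re - a_im * b_im
    im = a_re * b_im + a_im * b_re
    return re, im


print(complex_multiply(3, 4, 1, -2))
# Worst-case operands show the growth of the result beyond the input width.
print(complex_multiply(-1024, -1024, -1024, -1024))
```

Each 11×11-bit product needs up to 22 bits, and adding two such products needs one more, so the real and imaginary results fit in 23 signed bits.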

Multiple DSP blocks can be efficiently concatenated to realize larger multipliers and adders, in both real and complex modes. Figure 6 shows a 10-tap symmetric FIR filter using only five DSP blocks.

Fig. 6: An example of a 22-bit 10-tap FIR filter utilizing only 5 DSP blocks.
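
The reason a 10-tap symmetric filter needs only five multipliers is that taps k and 9-k share the same coefficient, so a pre-adder can sum the two samples before a single multiply. The following sketch models that folding in software and compares it against a direct 10-multiply filter; the function names and the coefficient values are illustrative assumptions.

```python
def symmetric_fir(samples, half_coeffs):
    """Even-length symmetric FIR using one multiply per coefficient pair."""
    taps = 2 * len(half_coeffs)
    history = [0] * taps          # history[0] is the newest sample
    out = []
    for x in samples:
        history = [x] + history[:-1]
        acc = 0
        for k, c in enumerate(half_coeffs):
            # Pre-adder: the two samples that share coefficient c.
            acc += c * (history[k] + history[taps - 1 - k])
        out.append(acc)
    return out


def direct_fir(samples, coeffs):
    """Reference implementation using one multiply per tap."""
    history = [0] * len(coeffs)
    out = []
    for x in samples:
        history = [x] + history[:-1]
        out.append(sum(c * h for c, h in zip(coeffs, history)))
    return out


half = [1, 2, 3, 4, 5]            # illustrative coefficients
full = half + half[::-1]          # the equivalent 10-tap symmetric filter
data = list(range(1, 21))
print(symmetric_fir(data, half) == direct_fir(data, full))
```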

Because they are dynamic and reconfigurable in real time, the DSP blocks enable adaptive filters, as shown in Figure 7.

Fig. 7: Common adaptive noise filter design.
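
As one possible illustration of an adaptive noise filter, the sketch below uses the well-known LMS (least mean squares) update. This is our choice for the example; the article does not state which adaptation algorithm the figure uses. The function name, tap count, and step size `mu` are assumptions.

```python
import math
import random


def lms_noise_canceller(primary, reference, taps=4, mu=0.01):
    """Subtract an adaptive estimate of the noise from the primary input.

    primary:   desired signal plus noise
    reference: a noise measurement correlated with the noise in `primary`
    Returns the cleaned samples and the final filter weights.
    """
    weights = [0.0] * taps
    history = [0.0] * taps        # history[0] is the newest reference sample
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate            # the error is the cleaned output
        # LMS update: nudge each weight along the error gradient.
        weights = [w + 2 * mu * error * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned, weights


random.seed(0)
noise = [random.uniform(-1.0, 1.0) for _ in range(5000)]
signal = [0.2 * math.sin(2 * math.pi * n / 50) for n in range(5000)]
primary = [s + 0.5 * v for s, v in zip(signal, noise)]  # noise path gain 0.5
cleaned, weights = lms_noise_canceller(primary, noise)
print([round(w, 2) for w in weights])
```

The first weight should settle near the noise path gain used to build `primary`, while the remaining weights stay close to zero. In hardware, the same update loop rewrites the filter coefficients while data is flowing.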

As mentioned above, Flex Logix also offers a TPU core, InferX, which is suited to any vector/matrix computation. InferX is effectively a scalable one-dimensional tensor processor (vector and matrix) controlled by the eFPGA fabric, which lets the IP adapt to any signal processing algorithm implementation, including AI models. InferX has roughly 10 times the DSP performance of the aforementioned DSP IP and uses only one-quarter of the area.

Fig. 8: InferX IP scalable from 1/8th of a tile to > 8 tiles.

InferX achieves up to dozens of TeraMACs/second on TSMC’s 5nm node. It is well suited to applications including FFT, FIR, IIR, beamforming, matrix/vector operations, matrix inversions, Kalman functions, and more. It can handle real or complex INT16x16 operations, with accumulation at INT40 for accuracy. Multiple DSP operations can be pipelined in streaming mode or packet mode. See below for more benchmarks for common algorithms running on TSMC’s 5nm node.
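
To see why a 40-bit accumulator matters for INT16x16 products, consider how many worst-case products can be summed before a signed accumulator could overflow. The sketch below is plain arithmetic for the real-valued case and says nothing about the InferX internals; the function name is hypothetical.

```python
def max_worst_case_accumulations(operand_bits: int = 16, accumulator_bits: int = 40) -> int:
    """Number of largest-magnitude signed products guaranteed to fit."""
    # Largest product magnitude: (-2^15) * (-2^15) = 2^30 for 16-bit operands.
    largest_product = (1 << (operand_bits - 1)) ** 2
    accumulator_max = (1 << (accumulator_bits - 1)) - 1
    return accumulator_max // largest_product


print(max_worst_case_accumulations(16, 40))
print(max_worst_case_accumulations(16, 32))
```

A 32-bit accumulator leaves almost no headroom for such sums, whereas 40 bits allows hundreds of worst-case products to be accumulated, and far more with typical data.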

InferX DSP solutions are easily programmed via common tools like MATLAB Simulink. Flex Logix built a ready-to-use standard Simulink block set that provides simplified configuration and bit-accurate modeling with flexible precision.

This illustrates how Flex Logix IP can tackle any pre-processing algorithm while maintaining the flexibility to adapt to emerging market demands. Beyond pre-processing data, whether it be network data, video streams, or sampled data, it’s also important to manage the flow of data into the central processor. Flex Logix IP can buffer data into the CPU to maximize efficiency and prevent starvation. And once computation has completed, Flex Logix IP can also assist in getting the data off chip by adapting the output data to any protocol and physical layer. By utilizing Flex Logix IP in your design, you can not only increase your market applicability but also adjust to novel protocols, evolving security threats, and most importantly, emerging market demands!

Want to learn more about Flex Logix IP for adaptable and high performance pre- and post-processing? Contact us at [email protected] to learn more or visit our website https://flex-logix.com.

Jayson Bethurem

Jayson Bethurem is vice president of marketing and business development at Flex Logix.
