New Ways To Optimize GEMM-Based Applications Targeting Two Leading AI-Optimized FPGA Architectures


A technical paper titled “Efficient Approaches for GEMM Acceleration on Leading AI-Optimized FPGAs” was published by researchers at The University of Texas at Austin and Arizona State University.


“FPGAs are a promising platform for accelerating Deep Learning (DL) applications, due to their high performance, low power consumption, and reconfigurability. Recently, the leading FPGA vendors have enhanced their architectures to more efficiently support the computational demands of DL workloads. However, the two most prominent AI-optimized FPGAs, i.e., AMD/Xilinx Versal ACAP and Intel Stratix 10 NX, employ significantly different architectural approaches. This paper presents novel systematic frameworks to optimize the performance of General Matrix Multiplication (GEMM), a fundamental operation in DL workloads, by exploiting the unique and distinct architectural characteristics of each FPGA. Our evaluation on GEMM workloads for int8 precision shows up to 77 and 68 TOPs (int8) throughput, with up to 0.94 and 1.35 TOPs/W energy efficiency for Versal VC1902 and Stratix 10 NX, respectively. This work provides insights and guidelines for optimizing GEMM-based applications on both platforms, while also delving into their programmability trade-offs and associated challenges.”
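For readers unfamiliar with the operation, GEMM computes a matrix product C = A·B. A minimal NumPy sketch of an int8 GEMM with int32 accumulation (the widening scheme commonly used on AI accelerators to avoid overflow of 8-bit products; this is illustrative only, not the paper's implementation):

```python
import numpy as np

def gemm_int8(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """int8 GEMM with int32 accumulation.

    Widening before the multiply prevents the 8-bit partial
    products from overflowing, mirroring how int8 datapaths on
    AI-optimized hardware typically accumulate in wider registers.
    """
    assert A.dtype == np.int8 and B.dtype == np.int8
    return A.astype(np.int32) @ B.astype(np.int32)

# Example: multiply a 4x8 by an 8x4 int8 matrix.
rng = np.random.default_rng(0)
A = rng.integers(-128, 128, size=(4, 8), dtype=np.int8)
B = rng.integers(-128, 128, size=(8, 4), dtype=np.int8)
C = gemm_int8(A, B)
print(C.shape, C.dtype)  # (4, 4) int32
```

The throughput figures quoted above count each int8 multiply-accumulate as two operations, the usual convention for TOPs ratings.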

Find the technical paper here. Published April 2024 (preprint).

Taka, Endri, Dimitrios Gourounas, Andreas Gerstlauer, Diana Marculescu, and Aman Arora. “Efficient Approaches for GEMM Acceleration on Leading AI-Optimized FPGAs.” arXiv preprint arXiv:2404.11066 (2024).

Related Reading
AI Accelerator Architectures Poised For Big Changes
Design teams are racing to boost speed and energy efficiency of AI as it begins shifting toward the edge.

