Challenges In Using HLS For FPGA Design

How to improve productivity in FPGAs while opening them to a broader skill set.

High-level synthesis (HLS) tools, which transform C/C++ source code into Verilog/VHDL, have been commercially available for over 15 years. HLS tools from FPGA vendors and EDA companies promise improved productivity through a higher level of abstraction, faster verification and quicker design iterations. For example, simulating a design in C/C++ can be 10 to 100x faster than simulating it at the register-transfer level (RTL). In addition, many applications in image processing and computer vision require visual verification. Such validation would be difficult to set up in RTL simulation, but can be easily implemented at the C level. The gains in simulation speed from an HLS-based methodology lead to faster design iterations, significantly improving productivity.

Using HLS potentially opens the use of FPGAs to a much broader base of engineers who specialize in embedded software programming. Traditional methods for designing FPGAs require a very specialized skill set and proficiency in describing the design in a hardware description language (HDL) such as Verilog or VHDL. This skill set is far less common than embedded software expertise, which limits the use of FPGAs to those experienced in coding in an HDL. HLS allows embedded software developers (and hardware engineers) to implement their algorithms in hardware using a higher-level language, such as C/C++.

If the benefits of using HLS for FPGA design are so substantial, why hasn’t designing FPGAs in C/C++ become the standard design entry method?

The simple answer is that adopting an HLS design methodology in the real world does present unique challenges that must be considered and overcome during the design process. These challenges can lead to more work by the designer thus requiring more development time, which begins to negate the productivity gains of HLS. Let’s look at five of these challenges:

1. C/C++ code which is non-synthesizable by the HLS compiler
The C/C++ coding guidelines for HLS compilers are extensive. The documentation can run to more than 1,000 pages, all of which must be understood when writing or refactoring C code for synthesis. As an example, HLS does not support memory access on a variable within a dynamically sized array. Because the amount of memory inside a given FPGA device is fixed, code that dynamically allocates objects of variable size with calls such as malloc, calloc and new is not supported. The HLS tool must know the memory resources an algorithm requires at compilation time in order to produce an efficient hardware implementation.
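
As a rough illustration of the kind of refactoring this implies, the sketch below replaces a heap-allocated scratch buffer with one whose size is known at compile time. The function name, the `MAX_SAMPLES` bound and the buffer itself are hypothetical, chosen only for this example; a real design would pick the bound from its own requirements.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical upper bound on the input length (an assumption for this sketch).
constexpr std::size_t MAX_SAMPLES = 1024;

// Software-style version, shown only as a comment because HLS tools
// generally reject dynamic allocation:
//   int32_t *scratch = (int32_t *)malloc(n * sizeof(int32_t));

// HLS-friendly version: the scratch buffer has a compile-time size, so the
// tool can map it to a fixed block of on-chip memory.
int64_t sum_of_squares(const int32_t samples[MAX_SAMPLES], std::size_t n) {
    int32_t scratch[MAX_SAMPLES];
    if (n > MAX_SAMPLES) {
        n = MAX_SAMPLES;  // clamp to the statically known capacity
    }

    // Copy the valid samples into the fixed-size buffer.
    for (std::size_t i = 0; i < n; ++i) {
        scratch[i] = samples[i];
    }

    // Accumulate in 64 bits so the squared values cannot overflow the sum.
    int64_t acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += static_cast<int64_t>(scratch[i]) * scratch[i];
    }
    return acc;
}
```

The trade-off is that the buffer always occupies `MAX_SAMPLES` entries, even when fewer are used, which is the price of giving the compiler a fixed memory footprint.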

2. “Non-Hardware aware” C/C++ code
Creating C/C++ code with memory constructs, data types and other elements that do not factor in the hardware implementation can have unintended consequences, including bloated device resource usage and slow performance. Care must be taken to avoid data types that are larger than needed. For example, using a 32-bit integer in software when only a 10-bit integer is required is inconsequential when targeting a standard processor, because the registers and memory locations already have a fixed size. When implemented in hardware, however, those unused bits become costly because they consume valuable FPGA resources.
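
To make the 10-bit case concrete, here is a minimal sketch in portable C++. Vendor HLS tools typically ship their own arbitrary-precision integer types (for example `ap_uint<10>` in Xilinx Vivado HLS). The `UIntN` template below is only a hypothetical stand-in that mimics the wrap-around behavior, so the example compiles without any vendor headers.

```cpp
#include <cstdint>

// Hypothetical stand-in for a vendor arbitrary-precision type: stores an
// unsigned value and wraps it to WIDTH bits after every operation.
template <unsigned WIDTH>
struct UIntN {
    static_assert(WIDTH >= 1 && WIDTH <= 31, "sketch supports 1 to 31 bits");
    static constexpr uint32_t MASK = (1u << WIDTH) - 1u;
    uint32_t value;

    constexpr UIntN(uint32_t v = 0) : value(v & MASK) {}
    constexpr UIntN operator+(UIntN other) const {
        return UIntN(value + other.value);
    }
};

// A column index on a line of up to 1024 pixels needs only 10 bits.
using Coord10 = UIntN<10>;

constexpr Coord10 next_column(Coord10 x) {
    return x + Coord10(1);  // wraps from 1023 back to 0, like a 10-bit counter
}
```

With a real arbitrary-precision type, the HLS tool would be expected to build a 10-bit adder and register instead of 32-bit ones. The stand-in still uses a 32-bit field internally, so it only demonstrates the numeric behavior.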

3. Identifying parallelism
C/C++ code is typically executed sequentially on standard processors, but implementing functions in logic gates allows operations to be executed in parallel, accelerating the execution of the code in hardware. Determining where potential parallelism exists in the design can be daunting and time-consuming, especially as the complexity of the algorithm, the function, or the code base increases.
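
A small, hypothetical pair of loops shows why this analysis matters. Both functions below look similar in C++, but only the first has fully independent iterations; the second carries a value from one iteration to the next. The names and the fixed size `N` are assumptions made for this sketch.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 8;  // hypothetical fixed problem size

// Each iteration reads and writes only index i, so no iteration depends on
// another. In hardware, all N additions could in principle run side by side.
void vector_add(const int32_t a[N], const int32_t b[N], int32_t out[N]) {
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Each iteration needs out[i - 1] from the previous one (a loop-carried
// dependency), so the iterations cannot simply be executed at the same time.
void prefix_sum(const int32_t a[N], int32_t out[N]) {
    out[0] = a[0];
    for (std::size_t i = 1; i < N; ++i) {
        out[i] = out[i - 1] + a[i];
    }
}
```

In a loop this short the dependency is easy to spot. In a large code base, with pointers, function calls and nested loops, finding such dependencies by hand is much harder.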

4. Software and hardware partitioning
For heterogeneous designs (FPGAs with embedded processors in this case), identifying what to run on the processor and what to move to hardware to exploit the parallel nature of the FPGA fabric via HLS can take significant time and many iterations, even when conducting pre-synthesis simulations.

5. Inserting HLS compiler pragmas or directives into the C/C++ code
For the HLS compiler to implement the software effectively in hardware, the user must provide guidance in the form of pragmas or directives. Determining when to use pragmas, how to set their parameters, and where to insert them into the code, while also optimizing them at the system level across an application, is challenging and time-consuming.
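
For readers who have not seen one, the sketch below shows what a directive can look like in source code. It assumes the pragma syntax used by Xilinx Vivado HLS; other tools use different spellings, and the article does not name a specific compiler. The function, the `TAPS` constant and the choice of directive are hypothetical. A standard C++ compiler ignores the unknown pragma (possibly with a warning), so the code still runs as ordinary software.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t TAPS = 16;  // hypothetical filter length

int32_t dot_product(const int16_t coeff[TAPS], const int16_t sample[TAPS]) {
    int32_t acc = 0;
    for (std::size_t i = 0; i < TAPS; ++i) {
// Assumed Vivado HLS-style directive: request a pipelined loop that starts
// a new iteration every clock cycle (initiation interval of 1).
#pragma HLS PIPELINE II=1
        acc += static_cast<int32_t>(coeff[i]) * sample[i];
    }
    return acc;
}
```

Even in this small case there are choices to make: pipeline or unroll, which initiation interval to request, and whether the input arrays can supply data fast enough to sustain it. Those choices interact across functions, which is what makes system-level tuning difficult.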

Collectively, these challenges present a significant barrier for those who want to take advantage of HLS design benefits. HLS vendors provide thorough documentation and training to educate customers on how to address these challenges, but it remains a manual process that takes time to master. Until now…

Silexica’s SLX FPGA tool, based on over 10 years of compiler technology research, provides practical solutions to the challenges discussed in this article, addressing them at each step of the HLS design process.

First, SLX FPGA analyzes the C/C++ source code for synthesizability and provides automatic and guided code refactoring of the non-synthesizable code. SLX walks the user through each section of code that is non-synthesizable and automatically converts code or provides guidance on how to refactor the code to be synthesizable.

In a future release, SLX FPGA will identify data constructs that are not hardware-friendly and would result in an inefficient hardware implementation. The SLX tool will also provide a method for conducting fast “what if” analysis to see which data type options are feasible for a more efficient implementation.

The next challenge that SLX FPGA addresses is analyzing the algorithm or application for parallelism that can be converted from sequential execution to parallel or pipelined execution. By identifying parallelism, SLX then provides the most efficient hardware implementation of the C/C++ code. If using an FPGA with an embedded processor system, SLX FPGA can also provide guidance on the most efficient distribution of the code between the SW and HW domains.

Finally, after the SW and HW partitioning has been defined, SLX FPGA inserts the pragmas into the code so that the HLS compiler can implement the optimizations in HW when compiling the C/C++.

SLX FPGA is the first tool in the industry that directly addresses the challenges of using HLS design flows, reducing the learning curve by providing actionable insights into converting C/C++ code into an optimized HW implementation.


