How To Integrate An Embedded FPGA

Adding an eFPGA into an SoC is more complex than just adding an accelerator.


Choosing to add programmable logic into an SoC with an eFPGA is just the beginning. Other choices follow: how many lookup tables (LUTs) to include, how much routing to provide and in what topology, how data will be transferred in and out of the fabric, whether that data needs to be coherent with system memory, how the fabric will be programmed and tested, and which RTL functions need to be embedded into the programmable fabric itself.

Some of these decisions will be influenced by the application. Is it going to be a compute accelerator? Is it doing sensor processing, or perhaps part of a pipelined datapath? Providers of eFPGA technology are attempting to make their offerings as flexible as possible. But while the content going into the eFPGA can be deferred, decisions about the eFPGA cannot.

The impact does not stop there. The verification flow has to change, as well. Does the team continue to use an ASIC verification flow for the whole design, including the content expected to go into the fabric, or does the fabric content look more like an FPGA flow? There is potentially a new team member, one that begins to look a lot more like a software programmer, but at a level that is more akin to firmware and may need more hardware and architecture knowledge.

“Incorporating an FPGA fabric into a likely already complex ASIC design is no small feat,” says Brian Mathewson, verification technologist for Mentor, a Siemens Business. “Developers must ensure that the eFPGA fabric is implemented correctly and any logic intended for the programmable fabric is sized appropriately and can interface with the other parts of the ASIC.”

There are significant differences between embedding an eFPGA and having a two-chip solution where a generic FPGA is utilized. “When integrating an eFPGA, you need to consider the application,” advises Yoan Dupret, managing director and vice president of business development for Menta. “You have to explore the design corners, and that should be evaluated up front. We do not need to consider the case of a generic FPGA where you have one design today and, in the future, you want one that is totally different.”

Inside the fabric
The usage of an eFPGA is different from a standalone FPGA. “The mindset of an ASIC design engineer is a little different from that of an FPGA design engineer,” says Himanshu Sanghavi, senior director of engineering for programmable IP at QuickLogic. “The FPGA designer is used to working within the constraints of the FPGA in terms of the number of LUTs, size of RAMs available, capability of DSP/math blocks, etc., which are all limited resources. The ASIC designer, while obviously constrained by the overall SoC area/power constraints, typically has more flexibility to design the logic they need rather than always having to fit it on the pre-designed FPGA fabric.”

Fig 1: Mixing functions inside the fabric. Source: Menta.

Most eFPGA vendors allow intermixing of programmable blocks with memories and fixed function blocks, as shown in Figure 1. Choices can be made about the numbers of each and how they are distributed in the fabric. Some vendors allow you to supply custom blocks, while others have a list of the blocks that can be incorporated.
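One way to act on Dupret's advice to evaluate the design corners up front is to collect rough resource estimates for every application the fabric might host and size each block type to cover the largest. The sketch below assumes three hypothetical application profiles and a 20% headroom margin; real sizing would also have to account for routing congestion and timing, which this ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    luts: int   # lookup tables
    rams: int   # embedded memory blocks
    dsps: int   # DSP/math blocks


def pad(count: int, headroom_pct: int) -> int:
    """Add headroom_pct percent, rounding up with integer arithmetic."""
    return (count * (100 + headroom_pct) + 99) // 100


def size_fabric(profiles: list[Resources], headroom_pct: int = 20) -> Resources:
    """Cover the largest demand of any candidate application, plus headroom."""
    if not profiles:
        raise ValueError("need at least one application profile")
    return Resources(
        luts=pad(max(p.luts for p in profiles), headroom_pct),
        rams=pad(max(p.rams for p in profiles), headroom_pct),
        dsps=pad(max(p.dsps for p in profiles), headroom_pct),
    )


# Hypothetical resource estimates for three candidate applications.
candidates = [
    Resources(luts=8_000, rams=16, dsps=4),    # e.g. sensor pre-processing
    Resources(luts=5_000, rams=32, dsps=12),   # e.g. filtering accelerator
    Resources(luts=10_000, rams=8, dsps=0),    # e.g. protocol/packet logic
]
print(size_fabric(candidates))  # Resources(luts=12000, rams=39, dsps=15)
```

Taking per-type maxima can over-provision when no single application needs every maximum at once; deciding how much of that slack to keep is part of the corner exploration.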

It also may be possible to define the geometry of the block. “People generally want things that are square or close to it,” says Geoff Tate, CEO of Flex Logix. “Being square will tend to minimize the longest potential critical paths. There are some cases where narrow and skinny make sense, such as for I/O functions.”
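Tate's point about square blocks can be checked with a rough proxy. Assuming the fabric is a uniform grid of tiles and that the longest potential route spans opposite corners, the worst-case Manhattan distance for a w x h grid is (w - 1) + (h - 1), which is smallest when the grid is square. This ignores any vendor-specific routing architecture and is only an illustration.

```python
def worst_case_hops(width: int, height: int) -> int:
    """Manhattan distance between opposite corners of a width x height tile grid."""
    return (width - 1) + (height - 1)


def shapes(tiles: int) -> list[tuple[int, int]]:
    """Every exact width x height arrangement of the given number of tiles."""
    return [(w, tiles // w) for w in range(1, tiles + 1) if tiles % w == 0]


for w, h in shapes(64):
    print(f"{w:>2} x {h:<2} -> {worst_case_hops(w, h)} hops")
# The 8 x 8 arrangement gives 14 hops; 1 x 64 gives 63.
```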

There are several ways to interconnect the blocks. Some use a routing fabric similar to that of traditional FPGAs, as shown on the left in Figure 2. Others, like Efinix, treat the routing resources in a similar manner to programmable blocks so they can be placed exactly where they are needed. Getting the right balance between logic functions and interconnect can significantly affect how much of the fabric's resources can actually be used.

Fig 2: Connecting the logic functions. Source: Efinix.

Yet another option, which may become available for eFPGAs, is a network-on-chip (NoC), as shown in Figure 3.

Fig 3: Integrating a Network-on-Chip into an FPGA fabric. Source: Achronix.

Achronix has just announced this capability in its standalone FPGA products. “The NoC distributes data throughout the FPGA fabric using a series of high-speed row and column network conduits, distributing data traffic horizontally and vertically throughout the FPGA fabric,” says Kent Orthner, systems architect at Achronix.
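Achronix's routing algorithm is not described here, so the following is only a generic illustration of traffic moving first along a row and then along a column. It implements textbook dimension-order (XY) routing on a grid of hypothetical NoC access points addressed by (column, row) coordinates.

```python
Coord = tuple[int, int]  # (x, y): column and row of a hypothetical NoC access point


def xy_route(src: Coord, dst: Coord) -> list[Coord]:
    """Dimension-order route: travel horizontally to the destination column,
    then vertically to the destination row. Returns every node visited."""
    x, y = src
    path = [(x, y)]
    step = 1 if dst[0] > x else -1
    while x != dst[0]:  # horizontal leg (row conduit)
        x += step
        path.append((x, y))
    step = 1 if dst[1] > y else -1
    while y != dst[1]:  # vertical leg (column conduit)
        y += step
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Fixing the order of dimensions keeps routes deterministic and, on a mesh, avoids cyclic dependencies between links, which is one reason this scheme is common in NoCs.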

Utilization of a NoC also may provide the connectivity to the rest of the SoC.

Connecting to the fabric
At some point, the programmable fabric has to be integrated into the SoC. In Figure 4, QuickLogic illustrates a variety of ways to perform that integration. While the picture is a conceptual block diagram, each interface shown is of a type that has been used in actual SoC designs.

Fig 4: Integrating the eFPGA fabric. Source: QuickLogic.

The most likely path to be taken by many applications is to connect the fabric to the system bus. “Most solutions have the fabric sitting on a bus,” says Menta’s Dupret. “We have also seen solutions that are more deeply embedded. Some people want memories that are shared with CPUs, maybe even some cache. However, most solutions will have memory within the datapath.”

An interface such as AXI provides a high-bandwidth channel into the fabric. “Most people want a faster, cheaper version of what they have today,” says Flex Logix’s Tate. “Most do not think about radical architectural changes. They are used to an FPGA working with SRAM memory in the chip form, and so that is how they think about it when it becomes a block within a chip. The FPGA will be some form of accelerator or I/O block hanging off an AXI or AHB bus.”

Different interfaces also can be used. “The eFPGA would need an AXI slave interface at a minimum, and if the potential use cases require bus mastering capability, then a master interface is also required,” adds QuickLogic’s Sanghavi. “The use of AXI is for illustration, but it could be a different bus protocol based on the overall SoC data transfer requirements and IP availability.”

How the memory is connected can create a lot of complications. “Assume you have a structure where the embedded FPGA has a closely associated memory and that is a sub-system within a larger context,” says Frank Schirrmeister, senior group director for product management and marketing at Cadence. “You now have a hierarchy of memories. There is a cache for the embedded FPGA, so how do you deal with coherency? This is why they are creating cache coherent interconnects such as CCIX.”
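Schirrmeister's coherency question can be made concrete with a toy model. The sketch assumes a non-coherent cache in front of the eFPGA: once a line is cached, a CPU write to shared memory is invisible to the fabric until software invalidates the line. A coherent interconnect such as CCIX is intended to handle that kind of invalidation in hardware; nothing here models CCIX itself.

```python
class NonCoherentCache:
    """Toy cache in front of shared memory with no snooping or invalidation
    messages: it returns whatever copy it holds, even if memory has changed."""

    def __init__(self, backing: dict[int, int]):
        self.backing = backing           # shared system memory (addr -> value)
        self.lines: dict[int, int] = {}

    def read(self, addr: int) -> int:
        if addr not in self.lines:       # miss: fill from shared memory
            self.lines[addr] = self.backing.get(addr, 0)
        return self.lines[addr]          # hit: possibly stale

    def invalidate(self, addr: int) -> None:
        self.lines.pop(addr, None)       # software-managed coherency


shared = {0x100: 1}
efpga_cache = NonCoherentCache(shared)
print(efpga_cache.read(0x100))  # 1 (line is now cached)
shared[0x100] = 2               # CPU updates the shared buffer
print(efpga_cache.read(0x100))  # 1 (stale copy)
efpga_cache.invalidate(0x100)
print(efpga_cache.read(0x100))  # 2 (refetched after invalidation)
```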

Another complication is related to the speed of the interface and the rate at which the eFPGA can operate. “The system AXI interface is likely to clock at a relatively high frequency, and the logic on the eFPGA may or may not run at the same frequency,” says Sanghavi. “Thus, the figure shows a block that is used for clock rate conversion and synchronization between the FPGA fabric and the AXI master/slave module. This is an optional block, only necessary if such clock rate conversion is required.”
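The clock-rate conversion block needs enough buffering to absorb bursts from the faster side. The behavioral sketch below estimates peak FIFO occupancy when one side writes a word on every fast-clock tick and the other reads a word every `ratio` ticks, assuming an integer clock ratio and a single burst. It is a sizing aid only: actual clock-domain crossing logic is typically an asynchronous FIFO with synchronized, Gray-coded pointers, which this does not model.

```python
def peak_fifo_occupancy(burst_len: int, ratio: int) -> int:
    """Peak words buffered when the fast side writes one word per tick for
    burst_len ticks and the slow side (ratio x slower) reads one word on
    every ratio-th tick."""
    occupancy = peak = written = tick = 0
    while written < burst_len or occupancy > 0:
        if written < burst_len:                  # fast side writes every tick
            occupancy += 1
            written += 1
        peak = max(peak, occupancy)
        if tick % ratio == ratio - 1 and occupancy > 0:
            occupancy -= 1                       # slow-side read edge
        tick += 1
    return peak


print(peak_fifo_occupancy(burst_len=16, ratio=2))  # 9
```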

How fast can the eFPGA run? “After they write the RTL, they run it through the tools and they get a maximum frequency number for the PVT conditions they have selected,” says Tate. “If they want higher throughput, they would need to do some optimization of the RTL.”
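The maximum frequency the tools report follows from the slowest register-to-register path: the clock period must cover the launching flop's clock-to-output delay, the logic and routing delay, the capturing flop's setup time, and any clock-uncertainty margin. The delay values below are made-up placeholders, not figures for any vendor's fabric.

```python
def fmax_mhz(clk_to_q_ps: int, logic_routing_ps: int, setup_ps: int,
             uncertainty_ps: int = 0) -> float:
    """Maximum clock frequency (MHz) for one register-to-register path."""
    period_ps = clk_to_q_ps + logic_routing_ps + setup_ps + uncertainty_ps
    return 1_000_000 / period_ps   # 1e6 ps per microsecond -> MHz


# Hypothetical worst path after place and route: 1,600 ps in total.
print(fmax_mhz(clk_to_q_ps=150, logic_routing_ps=1_300, setup_ps=100,
               uncertainty_ps=50))  # 625.0
```

Shortening that worst path, for example by restructuring or pipelining the RTL, is what raises the reported number.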

While that sounds like a problem, the amount of variation in operating frequency is likely to be limited. “They are likely to be using the same kind of application, and while they may update it in the future, it remains fairly predictable,” adds Dupret. “The application will never change completely. So, we will not see a future application that only runs at half the speed.”

That may require defining a static relationship. “If the SoC clock is a certain frequency, and you could run the FPGA at another frequency, you should look for some integer multiple of the two,” suggests Tate. “If the system clock is 1GHz and the FPGA is capable of running at 600MHz, you would most likely run the FPGA at 500MHz so that it can be synchronous with the system clock. If the customer wants to have asynchronous interfaces between block – they can do that. It just increases the complexity.”

There are lots of options, but it doesn’t have to be that complicated.

“People usually think about this as a processing block, and they have to get the data in and out and they need it to run at a particular frequency,” says Joe Mallett, senior product marketing manager for FPGA-based synthesis software tools at Synopsys. “From a synthesis perspective you may have to push on it harder to get the required number. If you have a synchronous interface, it makes it easier because you no longer have to worry as much about domain crossings and synchronizing data between domains.”

Sanghavi discusses some of the other interface options. “In addition to the high bandwidth data channel, the eFPGA may interface to slow serial peripherals such as sensors. SPI interfaces typically run at low clock frequencies that the eFPGA can likely match, so clock rate conversion is typically not needed. It can be very useful to have the eFPGA raise interrupts to CPU/DSP type processing blocks on the SoC. This would be used, for example, to inform the CPU that the eFPGA has completed the data processing task assigned to it and is idle.”
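The completion interrupt Sanghavi describes is commonly implemented as a status register that the fabric sets and software clears. The sketch below assumes a write-1-to-clear status register, an enable mask, and a single DONE flag at bit 0; the layout is hypothetical, not a QuickLogic register map.

```python
class EfpgaIrqBlock:
    """Hypothetical interrupt block: the fabric sets STATUS bits, software
    clears them by writing 1s (write-1-to-clear), and IRQ = STATUS & ENABLE."""

    DONE = 1 << 0  # assumed bit: accelerator finished its task and is idle

    def __init__(self) -> None:
        self.status = 0
        self.enable = 0

    def fabric_event(self, bits: int) -> None:
        self.status |= bits            # hardware side raises a condition

    def write_status(self, value: int) -> None:
        self.status &= ~value          # software acknowledges (W1C)

    @property
    def irq(self) -> bool:
        return bool(self.status & self.enable)


blk = EfpgaIrqBlock()
blk.enable = EfpgaIrqBlock.DONE
blk.fabric_event(EfpgaIrqBlock.DONE)   # task complete -> interrupt asserted
print(blk.irq)                         # True
blk.write_status(EfpgaIrqBlock.DONE)   # CPU interrupt handler clears DONE
print(blk.irq)                         # False
```

Write-1-to-clear lets a handler acknowledge one condition without disturbing other status bits that may be set at the same time.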

And this is where programmability becomes especially important.

“Most SoCs include several fixed-function blocks, which are custom RTL blocks, designed to perform a certain computation efficiently,” Sanghavi says. “While the data processing done by this fixed-function block is known upfront, it is likely that a variety of pre- or post-processing tasks may be needed, some of which are not known at the SoC design time. An eFPGA could be an excellent option to perform such tasks, and in this scenario it may be advantageous to have a direct connection between the eFPGA and the fixed-function block. This can be a custom point-to-point interface that does not use up the AXI bus bandwidth for this dedicated data transfer task.”
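A back-of-the-envelope bandwidth budget shows why a dedicated point-to-point link can matter. All figures below are hypothetical, and the model treats the bus as a single shared pipe, which simplifies AXI's separate read and write channels.

```python
def bus_capacity_gbps(width_bits: int, clock_mhz: int) -> float:
    """Peak data rate of one bus direction: width x clock."""
    return width_bits * clock_mhz / 1000


def utilization(flows_gbps: dict[str, float], capacity_gbps: float) -> float:
    """Fraction of bus capacity consumed by the listed traffic flows."""
    return sum(flows_gbps.values()) / capacity_gbps


capacity = bus_capacity_gbps(128, 500)           # 64.0 Gb/s
flows = {
    "cpu_and_dma": 30.0,
    "fixed_function_to_memory": 16.0,
    "fixed_function_to_efpga": 20.0,             # candidate for a direct link
}
print(utilization(flows, capacity))              # 1.03125 -> oversubscribed

flows.pop("fixed_function_to_efpga")             # move it to a point-to-point link
print(utilization(flows, capacity))              # 0.71875
```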

In some situations, the task assigned to the eFPGA may require it to access one or more primary I/Os or pins in an SoC. This could be, for example, communicating status information through a general-purpose I/O pin, or implementing an additional serial interface in the eFPGA that communicates with logic outside of the SoC. It is very likely this would use pins that are shared with other functions, through some I/O muxing structure configured at the SoC level.
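Shared pins are usually handled by a per-pin function select configured at the SoC level. The sketch models that with a hypothetical table of the functions each pin supports and rejects assignments the pad cannot carry; pin numbers and function names are illustrative only.

```python
from enum import Enum


class PinFunc(Enum):
    GPIO = "gpio"
    UART_TX = "uart_tx"     # hypothetical fixed-function peripheral
    EFPGA_IO = "efpga_io"   # pin handed to the eFPGA fabric


# Hypothetical SoC-level mux: which functions each pad can be routed to.
ALLOWED = {
    0: {PinFunc.GPIO, PinFunc.UART_TX, PinFunc.EFPGA_IO},
    1: {PinFunc.GPIO, PinFunc.EFPGA_IO},
    2: {PinFunc.GPIO, PinFunc.UART_TX},
}


class PinMux:
    def __init__(self) -> None:
        self.select = {pin: PinFunc.GPIO for pin in ALLOWED}  # reset default

    def assign(self, pin: int, func: PinFunc) -> None:
        if func not in ALLOWED.get(pin, set()):
            raise ValueError(f"pin {pin} cannot carry {func.name}")
        self.select[pin] = func


mux = PinMux()
mux.assign(1, PinFunc.EFPGA_IO)   # e.g. a status output driven by the fabric
print(mux.select)
```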

“An eFPGA on an SoC requires a programming interface to load the design to be implemented on it,” Sanghavi explains. “Similarly, the functionality implemented on the eFPGA may require some additional runtime configuration bits, such as setting various mode and other control register bits for the logic implemented on the FPGA. This task can be accomplished through a peripheral interface such as APB. An APB slave interface within the eFPGA sub-system connects to the system APB bus, and the system APB master would program the eFPGA through this interface. Once again, the use of APB is illustrative, and similar functionality can be achieved through a serial bus such as SPI or various other means.”
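From the firmware side, loading the fabric over APB is a sequence of register writes. The driver below is a sketch against a hypothetical register map (CFG_CTRL, CFG_DATA, CFG_STATUS and MODE_REG are invented names and offsets), and it runs against a small fake device so the flow can be exercised without hardware. A real eFPGA's configuration interface will differ.

```python
# Hypothetical register offsets for an eFPGA configuration block on APB.
CFG_CTRL = 0x00    # write bit 0 = 1 to start a new configuration
CFG_DATA = 0x04    # one 32-bit bitstream word per write
CFG_STATUS = 0x08  # bit 0: DONE, bit 1: CRC error
MODE_REG = 0x10    # example runtime control register for the mapped logic
DONE, CRC_ERR = 0x1, 0x2


class FakeEfpgaCfg:
    """Stand-in for the APB slave so the driver can run without hardware."""

    def __init__(self, expected_words: int) -> None:
        self.expected = expected_words
        self.received: list[int] = []
        self.regs = {CFG_STATUS: 0, MODE_REG: 0}

    def write(self, addr: int, value: int) -> None:
        if addr == CFG_CTRL and value & 1:       # restart configuration
            self.received.clear()
            self.regs[CFG_STATUS] = 0
        elif addr == CFG_DATA:                   # accept a bitstream word
            self.received.append(value & 0xFFFF_FFFF)
            if len(self.received) == self.expected:
                self.regs[CFG_STATUS] = DONE
        else:
            self.regs[addr] = value

    def read(self, addr: int) -> int:
        return self.regs.get(addr, 0)


def load_bitstream(bus, words: list[int], mode: int, max_polls: int = 1000) -> None:
    """Stream a bitstream, wait for DONE, then set runtime mode bits."""
    bus.write(CFG_CTRL, 0x1)
    for word in words:
        bus.write(CFG_DATA, word)
    for _ in range(max_polls):
        status = bus.read(CFG_STATUS)
        if status & CRC_ERR:
            raise RuntimeError("bitstream rejected: CRC error")
        if status & DONE:
            bus.write(MODE_REG, mode)            # runtime configuration
            return
    raise TimeoutError("eFPGA configuration did not complete")


dev = FakeEfpgaCfg(expected_words=3)
load_bitstream(dev, [0xDEADBEEF, 0x0000_0001, 0x1234_5678], mode=0x5)
print(hex(dev.read(MODE_REG)))  # 0x5
```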

Programming the system
The addition of an eFPGA adds a new dimension to the programming task. “With embedded FPGAs, it becomes a little more challenging, but in other senses easier,” says Synopsys’ Mallett. “An eFPGA block is already connected to a bus along with the interrupts, signals and dataflows to memory. All they have to think about is what do I put in there. The portion that gets harder is that a lot of these chips are more software-centric, meaning that the programmable pieces of an SoC are software. You will have drivers and operating systems and an application and now you have a programmable piece that is configurable hardware.”

Who will be coding the eFPGA content? “You are adding an additional audience that is traditionally software developers and they want to use this hardware accelerator,” adds Mallett. “They may just program it like firmware and expect a certain level of operation and be able to write to APIs. New users may not be as hardware savvy. The pressure has been building to support this class of user.”

That may require a change in programming model. “One way to simplify the programming model is to design at a higher level of abstraction so that there is a common software-based design entry method,” says Max Odendahl, CEO for Silexica. “Then use high-level synthesis (HLS) to convert the software code into an FPGA implementation. A tool is required that helps engineers define how the software and hardware are partitioned, and it should provide actionable insights into the design dependencies and guide the HLS compiler to implement the hardware efficiently.”

For some design teams, new skills will be required. “Some teams have been using FPGAs at the system level connected to their ASICs,” Odendahl says. “They have a lot of knowledge of FPGA in chip form and they understand FPGA programming. Other teams, who do not use FPGAs today, are used to using RTL for hardwired logic and they have a steeper learning curve. With FPGAs, to get high performance, you need to do more pipelining. FPGAs have flip-flops on the outputs of all of the LUTs, and so pipelining basically comes for free. But you can’t take existing hardwired ASIC RTL and throw it into an FPGA and expect to get good results. They have a longer learning curve.”
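The pipelining trade-off can be quantified with a simple model. Assuming a combinational path can be split into equal stages and each stage pays a fixed register overhead (clock-to-output plus setup), more stages raise Fmax and throughput while adding latency. Real retiming rarely splits a path perfectly evenly, and the delay figures are placeholders.

```python
def pipeline(path_ps: int, stages: int, reg_overhead_ps: int) -> tuple[float, int]:
    """Split a combinational path into `stages` equal stages (rounded up).
    Returns (Fmax in MHz, total latency in ps)."""
    if stages < 1:
        raise ValueError("need at least one stage")
    stage_ps = -(-path_ps // stages) + reg_overhead_ps   # ceiling division
    return 1_000_000 / stage_ps, stages * stage_ps


for n in (1, 2, 3, 4):
    fmax, latency = pipeline(path_ps=4_800, stages=n, reg_overhead_ps=200)
    print(f"{n} stage(s): {fmax:6.1f} MHz, latency {latency} ps")
```

With these numbers Fmax rises from 200MHz to about 714MHz at four stages while latency grows from 5,000ps to 5,600ps. Because the registers already sit at the LUT outputs, the extra stages cost little area in the fabric; the price is the added latency.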

This is where FPGA expertise comes in. “It helps to have an engineer on the design team who is familiar with the use of FPGA place and route tools,” advises Sanghavi. “Many SoC design teams outsource physical implementation of their ASIC to other teams, and thus may not have any in-house expertise in this area.”

Verifying the System
The verification flow also has to change. The fabric, the interface between the fabric and the SoC, and the mapped functions all have to be verified. “EDA flows have expanded to incorporate the drastic increase in configuration space that eFPGAs demand,” says Mentor’s Mathewson. “Not only must the instantiation of the eFPGA be validated, but potential uses of the eFPGA must be explored.”

Teams have to decide whether they want to verify the unprogrammed fabric or verify it with several possible mappings. “From a basic hardware perspective, I am going to write the accelerator that will go into the fabric and I will test and verify those pieces,” says Mallett. “This is similar to the existing standalone FPGA view of the world. I will simulate it, debug it, and I will use the tools that exist today. But if you have several processing blocks and accelerators in your chip, and the blocks are interacting with each other, you have to run it long enough to find out if it is functionally doing what it is supposed to.”

FPGA verification typically has lagged SoC methodologies. “The FPGA attitude of code, map, turn on, debug, repeat, has not changed much,” says Cadence’s Schirrmeister. “Designs have become more complex for the FPGA portion and could benefit from a more top-down, orderly approach, but it has not improved much.”

This is partly due to the fact that eFPGAs can be used across a variety of applications. “While the supplier of the embedded FPGA resource will stand behind the IP it provides, it cannot foresee how it will be integrated and connected with other functions in the target SoC,” says Deepak Kumar Tala, chairman of SmartDV. “VIP plays an important role in verifying that the standard interconnects and communications protocols between the embedded FPGA and the other blocks in the SoC behave as expected or to spec.”
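As a small taste of what protocol VIP checks, the sketch below enforces one well-known AXI handshake rule on a recorded single-channel trace: once a source asserts VALID, it must keep VALID asserted and its payload stable until the cycle in which READY is also high. It is an illustrative monitor, not SmartDV's VIP; real VIP covers far more of the protocol.

```python
from typing import NamedTuple


class Beat(NamedTuple):
    valid: bool
    ready: bool
    data: int


def check_valid_ready(trace: list[Beat]) -> list[str]:
    """Flag cycles where a source drops VALID or changes DATA before the handshake."""
    errors = []
    offered = None  # payload offered on the previous cycle but not yet accepted
    for cycle, b in enumerate(trace):
        if offered is not None:
            if not b.valid:
                errors.append(f"cycle {cycle}: VALID dropped before handshake")
            elif b.data != offered:
                errors.append(f"cycle {cycle}: DATA changed before handshake")
        # Still waiting only if VALID is high and READY is low this cycle.
        offered = b.data if (b.valid and not b.ready) else None
    return errors


bad = [Beat(True, False, 0xA), Beat(False, False, 0),
       Beat(True, False, 0xB), Beat(True, False, 0xC)]
print(check_valid_ready(bad))
# ['cycle 1: VALID dropped before handshake', 'cycle 3: DATA changed before handshake']
```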

Formal verification can help. “The FPGA fabric must be verified twice, first by the vendor and then by the user programming it,” explains Tobias Welp, engineering manager for OneSpin Solutions. “Formal equivalence checking, a key verification step, is even more important when fabric is involved because FPGA synthesis tools offer advanced optimizations to meet power, performance, and area (PPA) goals. Some of these optimizations change the state space of the design and move logic across register boundaries, so sequential equivalence checking is required. This should be performed in multiple stages to ensure that the input RTL, the post-synthesis netlist, the placed-and-routed netlist, and the programming bitstream are all functionally equivalent.”
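To see why register-moving optimizations call for sequential rather than combinational equivalence checking, consider a retimed AND gate. The original RTL registers both inputs and ANDs the flop outputs; the optimized netlist ANDs first and keeps a single flop. The two designs have different state bits, so there is no one-to-one register mapping for a combinational check, yet from reset their outputs match on every cycle. The sketch explores the reachable states of the product machine, a textbook formulation that only works for small Moore machines; it makes no claim about how OneSpin's or any other commercial tool works internally.

```python
from collections import deque
from collections.abc import Callable, Hashable
from itertools import product

Inp = tuple[int, int]
# A Moore machine: (reset state, next-state function, output function).
Design = tuple[Hashable, Callable[[Hashable, Inp], Hashable], Callable[[Hashable], int]]


def sequentially_equivalent(d1: Design, d2: Design, inputs: list[Inp]) -> bool:
    """Breadth-first search over reachable state pairs of the product machine;
    the designs are equivalent if outputs agree in every reachable pair."""
    (r1, next1, out1), (r2, next2, out2) = d1, d2
    seen = {(r1, r2)}
    frontier = deque(seen)
    while frontier:
        s1, s2 = frontier.popleft()
        if out1(s1) != out2(s2):
            return False
        for inp in inputs:
            pair = (next1(s1, inp), next2(s2, inp))
            if pair not in seen:
                seen.add(pair)
                frontier.append(pair)
    return True


INPUTS = list(product((0, 1), repeat=2))   # every (a, b) combination

# Original RTL: flops on both inputs, AND after the flops (two state bits).
original = ((0, 0), lambda s, i: i, lambda s: s[0] & s[1])
# Retimed netlist: AND moved before a single flop (one state bit).
retimed = (0, lambda s, i: i[0] & i[1], lambda s: s)
# A faulty "optimization" for contrast: OR instead of AND.
faulty = (0, lambda s, i: i[0] | i[1], lambda s: s)

print(sequentially_equivalent(original, retimed, INPUTS))  # True
print(sequentially_equivalent(original, faulty, INPUTS))   # False
```

Explicit state enumeration grows exponentially with the number of flops, which is why production equivalence checkers rely on symbolic and SAT-based formal engines instead.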

Incorporating an FPGA fabric into an SoC requires significant changes in the design and verification flows, and should only be undertaken when all the risks are understood. It requires assembling a team with all of the required skills, and it may require different architectural thinking to get to an optimized solution. But those who manage it will gain flexibility and the ability to future-proof their products, and that will provide a significant cost-of-ownership advantage.
