Why interconnects are important to eFPGAs.
Intel builds processor chips and Arm provides processor cores to integrate into chips.
Xilinx and Intel (née Altera) build FPGAs, and a range of new startups provide embedded FPGA (eFPGA) to integrate into chips: Achronix, Flex Logix, Menta and QuickLogic.
As the diagram above shows, an FPGA chip is a core (the “fabric”) which is surrounded by various kinds of I/O including SERDES, DDR PHYs, USB, PCI-Express and GPIO, among others.
An eFPGA is just the core, without the analog I/O, which can be integrated into a chip using high speed CMOS inputs and outputs. As shown below, an eFPGA can be connected directly to processor buses of any kind, in the control path, in the data path and/or to the digital inputs of I/O blocks like SERDES, DDR PHYs, etc. as well as directly to RAM.
Like FPGA cores, eFPGAs are constructed from a few building blocks. There is always programmable logic and programmable interconnect, surrounded by CMOS I/O blocks.
The very small eFPGA above has about a hundred LUT6s (6-input LUTs) arranged in groups of four, with an optional carry chain and two optional flip-flops on the output of each LUT. The inputs of each LUT come from the programmable interconnect network and the outputs go back to it. The eFPGA has about a hundred CMOS inputs and a hundred CMOS outputs surrounding the fabric to connect to the rest of the chip.
So far, every supplier does things similarly. LUTs are LUTs, the inputs/outputs are CMOS and there is always an interconnect network.
What varies is LUT size: some suppliers use 4-input LUTs and some use 6-input LUTs. The large FPGA chip companies have all shifted to 6-input LUTs to get higher density and higher performance (even though they all talk about capacity in LUT4 equivalents). As the diagram below shows, using LUT6 will always result in higher speeds for wide logic cones.
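A rough way to see the effect of LUT size on wide logic cones is to count the LUT levels on the critical path. The sketch below assumes the simplest case: a wide function, such as an N-input AND, that decomposes into a balanced tree of k-input LUTs. The function name `lut_levels` is illustrative, and real results also depend on the logic being mapped and on interconnect delay.

```python
def lut_levels(num_inputs: int, lut_size: int) -> int:
    """Levels of lut_size-input LUTs needed to reduce num_inputs signals
    to one output, assuming a balanced tree (e.g. a wide AND or OR)."""
    if num_inputs < 1 or lut_size < 2:
        raise ValueError("need at least 1 input and a LUT size of 2 or more")
    levels = 0
    signals = num_inputs
    while signals > 1:
        # Each level merges up to lut_size signals into one (ceiling division).
        signals = -(-signals // lut_size)
        levels += 1
    return levels


if __name__ == "__main__":
    for n in (16, 36, 64):
        print(f"{n} inputs: LUT4 levels = {lut_levels(n, 4)}, "
              f"LUT6 levels = {lut_levels(n, 6)}")
```

In this simplified model a 36-input cone needs three levels of LUT4 but only two levels of LUT6. At some widths the level count is the same for both sizes, but LUT6 never needs more levels than LUT4.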
The programmable interconnect network for almost all suppliers is a mesh grid, such as has been used in FPGAs since the first Xilinx arrays in the 1980s.
Flex Logix uses a new interconnect network, described in a 2014 ISSCC Paper (which won the Outstanding Paper award) by Cheng, et al:
The advantage of this interconnect is that as the array size grows (N LUTs), the interconnect area grows much more slowly than that of a traditional mesh. Flex Logix estimates its interconnect is about 2x as dense as mesh. This is very significant because in a traditional FPGA fabric 80% of the area is interconnect and only 20% is programmable logic.
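The arithmetic behind that observation can be sketched as follows. This is a back-of-the-envelope model, not a Flex Logix figure: it assumes that "2x the density" means the interconnect needs half the area for the same amount of logic, and the function name is hypothetical.

```python
def fabric_area_ratio(interconnect_fraction: float, density_gain: float) -> float:
    """Total fabric area relative to a mesh-based fabric with the same logic.

    interconnect_fraction: share of the mesh fabric's area that is interconnect.
    density_gain: assumed factor by which the interconnect area shrinks.
    """
    logic_area = 1.0 - interconnect_fraction          # unchanged
    new_interconnect_area = interconnect_fraction / density_gain
    return logic_area + new_interconnect_area


if __name__ == "__main__":
    # 80% interconnect, assumed to shrink by 2x: 0.20 + 0.40
    print(f"{fabric_area_ratio(0.80, 2.0):.2f}")
```

Under these assumptions the same logic would fit in roughly 60% of the area of the mesh-based fabric, which is why a change that touches only the interconnect has such a large effect.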
In some applications there is intensive math processing like DSP: FFT, FIR, IIR, etc. For these applications there are MACs (multiplier-accumulators, also with a pre-adder in some cases) which can replace some of the programmable logic. Here is a simplified block diagram:
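To illustrate what such a MAC computes, here is a minimal software model (not any vendor's hardware): a multiply-accumulate step with an optional pre-adder, applied to a symmetric FIR filter, where the pre-adder lets two samples share one multiply. All function names are illustrative.

```python
def mac(acc, a, b, coeff, use_pre_adder=True):
    """One MAC step: acc + (a + b) * coeff with the pre-adder,
    or acc + a * coeff without it."""
    operand = a + b if use_pre_adder else a
    return acc + operand * coeff


def symmetric_fir(samples, half_coeffs):
    """FIR filter with an even number of symmetric taps, h[i] == h[N-1-i].

    half_coeffs holds the first N/2 taps. Outputs are produced only where
    the full N-sample window is available.
    """
    n_taps = 2 * len(half_coeffs)
    out = []
    for n in range(n_taps - 1, len(samples)):
        acc = 0
        for i, coeff in enumerate(half_coeffs):
            # x[n-i] and x[n-(N-1-i)] share the same coefficient,
            # so the pre-adder sums them before the single multiply.
            acc = mac(acc, samples[n - i], samples[n - (n_taps - 1 - i)], coeff)
        out.append(acc)
    return out


if __name__ == "__main__":
    # Taps [1, 2, 2, 1] expressed as their first half.
    print(symmetric_fir([1, 2, 3, 4, 5], [1, 2]))
```

With the pre-adder, a 4-tap symmetric filter needs two multiplies per output sample instead of four.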
The MACs connect both to the interconnect network and directly to the adjacent MACs on either side for pipelining. Below is a very small eFPGA with two MACs (called DSPs here) taking the place of some of the LUT logic.
In an FPGA chip there is usually a fixed ratio of LUTs and MACs, but as you see in eFPGA, it is possible to have all-LUTs or a mix of LUTs/MACs as the application requires. Below is an example of a 3×3 array of small (EFLX-100, i.e. 100 LUT6) eFPGAs showing that each “tile” or “core” can be either all-logic or DSP (i.e. logic with some MACs) to give a wide range of ratios as needed.
Similarly, in an FPGA chip there is always some amount of Block-RAM (BRAM), a dual-port RAM that can be configured to look wide/shallow or narrow/deep.
In the eFPGA examples shown so far there is no BRAM. Again, eFPGA gives more freedom to tailor the hardware resources to match the application.
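As an illustration of wide/shallow versus narrow/deep, the sketch below lists the depth and width combinations for a block of fixed capacity. The 16 Kbit capacity, the power-of-two widths and the function name are assumptions chosen for the example; the modes a real BRAM supports are vendor-specific.

```python
def bram_configs(total_bits: int, max_width: int):
    """Return (depth, width) pairs with power-of-two widths up to max_width
    that use exactly total_bits of storage."""
    configs = []
    width = 1
    while width <= max_width:
        if total_bits % width == 0:
            configs.append((total_bits // width, width))
        width *= 2
    return configs


if __name__ == "__main__":
    # Hypothetical 16 Kbit block, widths from 1 to 32 bits.
    for depth, width in bram_configs(16 * 1024, 32):
        print(f"{depth:>6} words x {width} bits")
```

The same storage appears as 16384 x 1 at the narrow/deep end and 512 x 32 at the wide/shallow end.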
Of course, RAM can be attached to the edge of the eFPGA:
But RAM can also be integrated within the eFPGA between blocks of programmable logic:
Some vendors of eFPGA can integrate any kind of RAM as specified: dual-port, single-port, ECC, parity, even specialized memory like TCAM.
eFPGA is available in sizes from hundreds of LUTs to hundreds of thousands of LUTs.
Standalone eFPGA cores can be abutted, using a top-level interconnect that extends the network across the larger array, to form more than 50 array sizes up to 200K LUT4s. This approach has the advantage that the larger arrays are built from 100% silicon-proven GDS blocks, which minimizes risk.
Flex Logix has fabricated the 16nm EFLX200K array above (using a mix of logic and DSP cores) and has fully validated the electrical specs using the chip:
Some vendors provide evaluation boards so that customers can test their RTL on the actual silicon they will then integrate into their chips. This evaluation board, for example, carries the 16nm EFLX200K array, with a PC/USB interface for loading bitstreams and debugging, and various daughter card options for GPIO interfaces into the array.
Next month, we will discuss eFPGA software tools.