Embedded FPGAs Going Mainstream?

Programmable devices are being adopted in more market segments, but they still haven’t been included in major SoCs. That could change.


Systems on chip have been made with many processing variants ranging from general-purpose CPUs to DSPs, GPUs, and custom processors that are highly optimized for certain tasks. When none of these options provides the necessary performance, or when they consume too much power, custom hardware takes over. But there is one type of processing element that has rarely been used in a major SoC: the FPGA.

Solutions implemented in FPGAs are often faster than any of the instruction-set processors, and in most cases they complete a computation with lower total energy consumption. However, they consume more power and run slower than custom hardware. In addition, they use a lot more silicon area because the FPGA is a fixed resource, so enough of it must be put onto a chip for what is believed to be the worst-case usage scenario.

The standalone FPGA market currently is dominated by two companies, Xilinx and Altera (Intel). Part of the reason for this is they do not just produce chips. FPGAs require a complex ecosystem to make them useable. This ecosystem is very similar to those required for supporting processors. The transformation from FPGA to eFPGA adds even more complexity to this ecosystem because it requires a customized toolchain for each IP core that is licensed.

In the past, several companies have attempted to pioneer the embedded FPGA space, but none have been successful. So what has changed, and is this likely to become a new processor type considered in a growing number of SoCs?

Changing landscape
Understanding why eFPGAs may succeed this time around requires looking at changes both across the industry at large and within specific markets. Many markets have relied on the progress of Moore’s Law, which provided smaller, cheaper, faster copies of what had been done in the past and enabled increasing levels of integration as well as lower power. But much of that has ended.

At the top end of the market, product cycles are slowing down. “Networking and communications chips have long design cycles and are typically fabricated in advanced process nodes with $2M to $5M mask costs,” says Geoffrey Tate, CEO of Flex Logix. “The problem with this is that standards such as protocols and packets are changing rapidly. It used to be that these chips would be redesigned every couple of years to keep up, which is an increasingly expensive proposition. In addition, data centers are pushing to make chips programmable so they can be upgraded in-system automatically, thereby improving the economics of data centers and enabling them to do their own customization and optimization for a competitive edge.”

The success of eFPGAs also may allow other markets to expand. “The customers are predominantly in the networking and wireline infrastructure, including Ethernet switching and routing applications,” says Robert Blake, president and CEO of Achronix. “The fundamental reprogrammable nature of the FPGA will be used in computation, and thus will cause a substantial increase in market size. This includes encryption and decryption, compression and decompression, unstructured search, machine learning and artificial intelligence.”

FPGA applications and growth phases: courtesy of Achronix

At the other end of the spectrum are small microcontrollers. “There are decreasing volumes for each product, particularly in the IoT domain,” says Yoan Dupret, business development manager for Menta. “Embedded FPGAs are getting traction because people are trying to expand the margins by increasing the overall volume of their chips. This is done by using the FPGA fabric to create multiple variants.”

Adds Tate: “Microcontrollers often have dozens of variations, with different serial I/O protocols (I2C, SPI, UART, etc.) in older process nodes. Now that advanced microcontrollers are moving into 40nm, where mask costs are about $1M, customization can be done effectively using embedded FPGAs.”

Hugh Durdan, vice president of design IP marketing at Cadence, agrees. “The traditional microcontroller market created lots of variants of devices, each one tailored by the number and types of I/O interfaces. They did that because the applications were cost sensitive, and more general-purpose parts couldn’t bear the cost. That worked fine for .25 microns, where a mask set cost $50,000 and you could crank out a device with a month of engineering. But that has all changed. Even simple IoT devices are 55nm going to 40nm, so mask costs are rising. An FPGA could be one way to address that. Even though it would add fabrication costs, it would decrease NRE: one device which is re-purposable for many applications, thus saving the upfront costs.”
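The trade-off Durdan describes, a higher per-unit fabrication cost in exchange for fewer mask sets, can be sketched as a break-even calculation. This is only an illustration: the $1M mask cost is Tate's approximate 40nm figure, while the number of variants, the base unit cost, and the extra per-unit cost of the FPGA fabric are hypothetical values chosen for the example. Engineering effort, which is also part of NRE, is left out.

```python
def total_cost(mask_sets, mask_cost, unit_cost, volume):
    """Mask spend plus per-unit fabrication cost over the whole volume."""
    return mask_sets * mask_cost + unit_cost * volume


def break_even_volume(variants, mask_cost, extra_unit_cost):
    """Total volume at which one eFPGA-based device costs the same as
    `variants` dedicated devices, each needing its own mask set."""
    return (variants - 1) * mask_cost / extra_unit_cost


MASK_COST = 1_000_000    # approximate 40nm mask cost quoted by Tate
VARIANTS = 12            # hypothetical: one of the "dozens of variations"
BASE_UNIT_COST = 1.00    # hypothetical per-unit cost of a dedicated part
EXTRA_UNIT_COST = 0.25   # hypothetical added cost of the eFPGA area

volume = break_even_volume(VARIANTS, MASK_COST, EXTRA_UNIT_COST)
dedicated = total_cost(VARIANTS, MASK_COST, BASE_UNIT_COST, volume)
shared = total_cost(1, MASK_COST, BASE_UNIT_COST + EXTRA_UNIT_COST, volume)

print(f"break-even total volume: {volume:,.0f} units")
print(f"dedicated variants: ${dedicated:,.0f}, single eFPGA part: ${shared:,.0f}")
```

Below the break-even volume the single re-purposable device is cheaper under these assumptions; above it, dedicated variants win. Changing the hypothetical inputs moves the crossover point considerably.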

Independent of the intended market, the makers of embedded FPGAs are finding that they are no longer being targeted only at low- and moderate-volume applications. “Embedded FPGAs are being integrated into high-volume SoC and MCU chips and will enable new applications and architectures not previously feasible,” says Tate.

Bryan Ramirez, emerging markets group manager at Mentor Graphics, sees many potential areas where they may be successful. “FPGAs excel at data processing, applications where massive parallelism can be used, and anywhere that re-configurability is important. Applications that will benefit from embedded FPGAs are those with heterogeneous processing applications. Data center acceleration is an obvious choice because of the ability to accelerate various search algorithms on the FPGA by running alongside the main processor. eFPGAs could also be used for video processing alongside an MCU used in autonomous driving.”

Machine learning, big data
One area where FPGAs could find a role is in machine learning, which is one of the hot buttons for technology these days.

“We are just learning how to build machines that do not need to be programmed,” says Drew Wingard, chief technology officer at Sonics. “There is a huge amount of discussion about the most effective machine topologies and the necessary precision of the arithmetic that is needed, and everything else in between. Using FPGAs while network topologies are still being learned seems to make a fair amount of sense. So they do require the flexibility of an FPGA because aspects of the hardware are not fully understood yet.”

They also could find a role in data centers, where an explosion in data is driving a big effort to reduce the cost of powering and cooling servers.

“With FPGAs, you are taking what would have been implemented in software on a host processor and executing that same function in a more power-efficient manner in the hardware programmed into the FPGA,” says Durdan. “But the algorithm still gets updated regularly. It still needs to be a programmable software model. Even in the data center it’s not just the algorithms for what they want to do. There are multiple algorithms that they may want to use at different times.”

All of this is currently being achieved with a high-end CPU, and an FPGA connected to that CPU on a board. “Microsoft (Burger FPL2016) and others have shown that FPGAs have significant benefit as co-processors, but the FPGA-to-CPU connection is a performance bottleneck,” says Tate. “By building a Xeon-plus-FPGA chip, first with chip-to-chip interconnect, and then later with a single die, the bottleneck is broken. Performance goes up, and cost/power goes down as packaging and high-power SerDes disappear.”

One of the big advantages of embedding the FPGA into the SoC fabric is the number of pins that become available. “Consider a typical mid-level FPGA device that may have around 300 pins,” says Blake. “When embedded it is possible to have a lot more pins—around 16,000. And even though they may operate at a lower frequency, they still provide over an order of magnitude more bandwidth and much lower latency.”
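Blake's bandwidth claim can be sanity-checked with simple arithmetic. The sketch below assumes aggregate bandwidth scales as pin count times per-pin data rate, which is a simplification. The pin counts come from the quote, while the per-pin rates are hypothetical, normalized values.

```python
def aggregate_bandwidth(pins, rate_per_pin):
    """Simplified model: total bandwidth = number of pins x rate per pin."""
    return pins * rate_per_pin


STANDALONE_PINS = 300      # "typical mid-level FPGA device" in the quote
EMBEDDED_PINS = 16_000     # embedded figure in the quote

pin_ratio = EMBEDDED_PINS / STANDALONE_PINS

# How much slower each embedded pin could run while the embedded
# interface still delivers 10x the aggregate bandwidth.
max_slowdown_for_10x = pin_ratio / 10

# Hypothetical normalized rates: embedded pins run at a quarter of the speed.
standalone = aggregate_bandwidth(STANDALONE_PINS, 1.0)
embedded = aggregate_bandwidth(EMBEDDED_PINS, 0.25)
advantage = embedded / standalone

print(f"pin ratio: {pin_ratio:.1f}x")
print(f"max per-pin slowdown for a 10x gain: {max_slowdown_for_10x:.2f}x")
print(f"advantage at a 4x slower per-pin rate: {advantage:.1f}x")
```

Under this model, the embedded interface keeps an order-of-magnitude advantage as long as each pin runs no more than about five times slower than a standalone pin.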

But of course, there is a price to pay. “An FPGA is roughly 25X less efficient in area than custom logic,” says Wingard. “A processor is not 25X worse than an FPGA. It could be 5X worse, but that depends upon what you are trying to do. A small embedded FPGA is going to have how many equivalent gates? If it is 25X less dense and you want 10,000 ASIC gates, then 10,000 ASIC gates in an FPGA is 250,000 gates. I could build 10 small 32-bit processors in that area.”
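Wingard's arithmetic can be written out as a short sketch. The 25X penalty, the 10,000-gate example, and the count of 10 processors all come from his quote. The function name is illustrative, and the per-processor gate budget is simply what those numbers imply, not a figure he states.

```python
def fpga_area_in_asic_gates(asic_gates, area_penalty=25):
    """Area, in ASIC-gate equivalents, needed to host `asic_gates`
    of logic in FPGA fabric that is `area_penalty` times less dense."""
    return asic_gates * area_penalty


area = fpga_area_in_asic_gates(10_000)

# "I could build 10 small 32-bit processors in that area" implies
# this many gates per processor (derived, not stated in the quote).
implied_gates_per_processor = area // 10

print(f"FPGA area for 10,000 ASIC gates: {area:,} gate equivalents")
print(f"implied budget per small 32-bit processor: {implied_gates_per_processor:,} gates")
```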

Wingard adds that FPGAs tend to want the largest number of metal layers a foundry can provide. “Embedded FPGAs will not escape this problem.”

Menta’s Dupret is not fazed by that. “With the newer technologies, the metal stack is quite high for most nodes, with at least eight or nine levels when talking about 28nm or 14nm. This is an advantage for embedded FPGA because this means that you can decrease the area taken by the FPGA by having easier routing.”

The ecosystem
The success of an embedded FPGA will depend on many factors, and the ecosystem that surrounds the product is complex. Consider first that an FPGA IP core is a lot more complex than a memory compiler. You have to specify not only the number of logic blocks you want, but also the amount of memory, the form factor, the number and type of embedded hard cores, such as DSPs, and factors such as the technology node in which the IP is to be delivered. That requires a complex tool suite, including all of the models and views necessary for the integration.

“It is not quite an FPGA compiler in the same way as you would think of a memory compiler that may complete in minutes,” says Blake. “But it can be done in weeks. What we deliver is a GDSII full custom IP building block and the necessary models for signal integrity, timing, test models and programs and documentation.”

Vendors also are trying to add the maximum amount of flexibility into that process. “While we have a DSP block, and are working on others that are more efficient for certain types of applications, we can also use the customer DSP and integrate that as a black box inside the eFPGA,” says Dupret. “This means they can keep their competitive advantage because they often have the knowledge and differentiation with their competition.”

Then the integrator of the core has to be able to program the FPGA. The standalone FPGA vendors invest just as much in that toolchain as they do in the hardware itself. It requires logic synthesis engines, place and route, Design for Test (DFT) methodologies, timing engines and a lot more. Each of these tool chains is custom for the core that has been created.

Consider just one tool that may be required. “Effective synthesis often provides as much design optimization for power and performance as even the physical core itself, and FPGA synthesis is often more advanced than ASIC,” points out David Kelf, vice president of marketing for OneSpin Solutions. “Given the level of synthesis optimizations provided, a sanity check on the design logic versus the RTL code is key. This requires the use of specialized Equivalency Checking (EC) that can cope with the sequential nature of these aggressive optimizations. Indeed, as these cores find their way into ASIC designs, it may be harder to prototype the design on the FPGA. That’s because the ASIC that contains the core may still be in production, making EC for the FPGA design as essential as EC for the ASIC itself.”

An extra layer of complication is added if the end customer of the chip that contains the embedded FPGA is also to be allowed to reprogram it.

The entire ecosystem is one area in which many of the eFPGA providers will find themselves competing aggressively. Initially, their focus appears to be the case in which the SoC developer does the programming, because of the tool chain provided. “The ability to unroll things, as is often done in a programming language, and build a parallel implementation for those functions could offer substantial acceleration,” says Achronix’s Blake. “Today, we program them using hardware description languages (Verilog and VHDL), and we offer libraries of functions. This is an evolving space, and as languages improve for writing and translating software to hardware, those things will improve.”
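The unrolling Blake mentions can be illustrated in software, although the real flow he describes uses Verilog or VHDL. The following is a conceptual sketch only: a dot product written as a rolled loop, and the same computation unrolled by four into independent accumulation lanes. In an FPGA, each lane could map to its own multiply-accumulate unit operating in parallel; in Python the lanes still execute one after another. The function names and the unroll factor are illustrative assumptions.

```python
def dot_rolled(a, b):
    """One multiply-accumulate per iteration, one accumulator."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def dot_unrolled4(a, b):
    """Same result, unrolled by 4 into independent lanes.

    Assumes equal-length inputs whose length is a multiple of 4.
    """
    if len(a) != len(b) or len(a) % 4 != 0:
        raise ValueError("inputs must have equal length divisible by 4")
    lanes = [0, 0, 0, 0]
    for i in range(0, len(a), 4):
        # The four updates below do not depend on each other, so
        # hardware could perform them in the same clock cycle.
        lanes[0] += a[i] * b[i]
        lanes[1] += a[i + 1] * b[i + 1]
        lanes[2] += a[i + 2] * b[i + 2]
        lanes[3] += a[i + 3] * b[i + 3]
    # Final reduction as a small adder tree.
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3])


a = list(range(8))
b = list(range(8, 16))
print(dot_rolled(a, b), dot_unrolled4(a, b))
```

The rolled version needs one step per element, while the unrolled version needs a quarter as many loop iterations plus a short reduction, which is the source of the acceleration when the lanes really do run in parallel.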

Menta’s Dupret agrees. “We are based on standard HDL, so it is a standard FPGA flow. For datacenters there will be a need to go from a higher-level language, and to be able to separate what will run on the CPU and what will be in the FPGA.”

Many of the existing High Level Synthesis (HLS) flows are not yet targeting these devices. “With the advent of HLS, the ability to write in a software-like language and turn it into compiled hardware has made a lot of strides,” explains Cadence’s Durdan. “With the advent of machine learning and neural networks, the application space has become hot. Today, most of the deployments are not with an embedded solution but with the FPGA sitting on a board next to the processor.”

The market is still very young and changing rapidly. “We are now at the tipping point for the cost of the technology,” says Blake. “At 16nm you can integrate enough functionality to cost effectively add FPGA acceleration to an SoC.”

But this is not for everyone. “If you don’t need the flexibility or don’t need to change the function of the algorithms, then eFPGA does not make sense,” says Dupret. “It will always be more efficient with a standard design. But if you are working with cryptography or neural networks, they are pretty inefficient to run on a CPU.”

But perhaps more skepticism remains with the necessary tools. “I’m not convinced that it is possible today,” says Mentor’s Ramirez. “This is because of the difficulties associated with creating effective and usable implementation software. I struggle to see how embedded FPGAs can succeed unless one of the established FPGA vendors provides implementation software along with their embedded FPGA fabric.”

And that is always another possibility. With the rapid cost declines in 2.5D integration, and with the FPGA vendors having proven to be technology leaders in this area, they could come out with their own solutions without some of the headaches.
