Embedded FPGAs Come Of Age

These devices are gaining in popularity for more critical functions as chip and system designs become more heterogeneous.


FPGAs increasingly are being viewed as a critical component in heterogeneous designs, ratcheting up their stature and the amount of attention being given to programmable devices.

Once relegated to test chips that ultimately would be replaced by lower-power and higher-performance ASICs if volumes were sufficient, FPGAs have come a long way. Over the last 20 years programmable devices have moved steadily up the food chain from glue logic to co-processors, and they have been utilized in a variety of high-performance, mission-critical applications from data centers to supercomputers.

Now they are being embedded into devices alongside a cluster of CPUs, utilizing the same bus structure for pre- or post-processing, as a way of reducing the main processor cluster’s load. Embedded FPGAs also are being used for network acceleration, performing packet processing, deep packet inspection, encryption/compression or other types of packet processing before the switch or CPU structure has to decide what to do with that information. And in the wireless segment, embedded FPGAs are being used as a digital front end, performing linearization, pre-distortion, and other tasks between the power amp and the radio card, or in the communication link.

Because these devices are programmable, they can be used to optimize systems while those systems are in use. Mobile base stations have relied on FPGAs for years, primarily because volumes are low, power is not a major issue, and pricing is not nearly as critical as in consumer or mobile devices. “But what they’d like to do is reconfigure the FPGA as the thing is actually running,” said Dave Kelf, vice president of marketing at OneSpin Solutions.

For some time, base-station providers have been interested in embedded FPGAs because processors are needed for handling the protocols, and they also use a lot of fixed functions for modulation schemes and error correction, among other tasks. “This is a fixed function that has to work very efficiently and very quickly, so that belongs on an ASIC to get the benefit of the ASIC process to make that work as fast as possible,” said Kelf. “But there may be times when different modulation schemes are used, and it would be nice to be able to change that when needed.”

Embedding chips inside other chips is nothing new. Starting in the 1990s, Xilinx embedded PowerPC processors inside of FPGAs, and both Altera and Xilinx embedded ARM CPUs inside of their FPGAs. What has changed is the dedicated SoC functionality of FPGAs, which now include multiple processors plus caches, along with hardened interfaces, so there is a little bit of an SoC function instead of just embedded processing, said Joe Mallett, senior product marketing manager at Synopsys.

“When you embed SoC functionality into an FPGA, you simplify the board design,” Mallett said. “You eliminate parts off the BOM (bill of materials) so you don’t need a microprocessor sitting there anymore booting the FPGA, and getting everything up and running.”

Embedded FPGAs also bring power and cost benefits, said Steve Mensor, vice president of marketing at Achronix. “This approach does increase the ASIC size, but at the same time you’re getting rid of a very expensive component. You’re reducing the board area. You’re getting rid of a lot of periphery components, and you’re pulling out a massive amount of cost. Also, the power is cut in half.”

Whether or not to embed the FPGA comes down to how the system will be enhanced.

“When you do chip-to-chip communications, regardless of whether it is FPGA-to-ASIC or any chip-to-chip, particularly in high-bandwidth applications, you’re going through some type of high-speed pipe,” said Mensor. “That latency doesn’t slow down the maximum processing speed, but because there are interactions back and forth, it slows down the performance of the overall system. If you can eliminate that latency, there are huge increases in system performance. As such, with an embedded FPGA, it’s something that’s much more akin to infinite bandwidth between the ASIC and the FPGA. Here, the FPGA functionality, because it is a wire-to-wire connection inside the ASIC, has a maximum latency of one clock cycle if you want to use the periphery register. It is zero clock cycles if you want to go straight into the logic section.”
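Mensor's point about back-and-forth interactions can be illustrated with a rough back-of-the-envelope model. The sketch below is not from Achronix. It assumes each exchange must finish before the next begins, so link latency is paid on both the request and the response. The clock rate, compute time, and off-chip latency are hypothetical figures chosen only for illustration, while the one-cycle on-chip latency is the maximum quoted above.

```python
def exchanges_per_second(clock_hz, compute_cycles, link_latency_cycles):
    """Dependent request/response exchanges completed per second.

    Each exchange waits for the previous one, so the link latency is
    paid twice per exchange (request and response) and cannot be hidden.
    """
    cycles_per_exchange = compute_cycles + 2 * link_latency_cycles
    return clock_hz / cycles_per_exchange


# Hypothetical figures, for illustration only.
CLOCK_HZ = 500_000_000          # assumed 500 MHz fabric clock
COMPUTE_CYCLES = 20             # assumed work per exchange
OFF_CHIP_LATENCY_CYCLES = 100   # assumed chip-to-chip link latency
ON_CHIP_LATENCY_CYCLES = 1      # maximum latency quoted for an embedded FPGA

off_chip_rate = exchanges_per_second(CLOCK_HZ, COMPUTE_CYCLES, OFF_CHIP_LATENCY_CYCLES)
on_chip_rate = exchanges_per_second(CLOCK_HZ, COMPUTE_CYCLES, ON_CHIP_LATENCY_CYCLES)
speedup = on_chip_rate / off_chip_rate

print(f"off-chip: {off_chip_rate:,.0f} exchanges/s")
print(f"on-chip:  {on_chip_rate:,.0f} exchanges/s")
print(f"speedup:  {speedup:.1f}x")
```

Under these assumed numbers the processing speed per exchange is identical in both cases. Only the waiting changes, which is why the gain appears at the system level rather than in peak compute.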

There are also gains to be had as far as bandwidth is concerned, Mensor noted. “If you look at FPGAs in particular, the more total throughput on the chip, the more expensive that chip is. This is thanks to a plumbing problem. If you have to put on more pins, the bigger the die area gets, and the bigger the package.”

Mallett agreed. “The first thing you see in an embedded FPGA is that it’s simplifying the BOM, and reducing cost. Embedded FPGAs are also being implemented in advanced technologies such as 28nm and beyond, so you’re getting the performance capabilities and low power advantages of the lower process nodes. There’s a lot of general functionality that people traditionally had been putting in an FPGA, and it just makes sense to make it part of the SoC and provide that value as the hardened subsystem. Then the FPGA fabric is recouped for other functionality that you may be able to utilize in your system.”

Another unique feature of embedded FPGAs is that they are tightly coupled with the processor. “They are not coupled through a high-speed interface, so the performance is better if you are doing things in software and hardware as a combination. Then the scalability of the FPGA means it can be fit to a particular application,” Mallett said.

Interconnect is key
All of this may sound straightforward enough, but the mindset for using FPGAs has changed significantly.

“FPGAs used to be second-class citizens, where the main cool thing, the main performance-related processing stuff, was done by the CPU or the main ASIC or SoC, and some minimum tasks were given to the FPGA,” said Anush Mohandass, vice president of marketing and business development at NetSpeed Systems. “What the CCIX Initiative did was make the FPGA a first-class citizen. It now accesses the same memory. It accesses the same information as the main processor would. Companies like Microsoft figured out they could accelerate workloads and how processing was done by giving it to an FPGA.” (Microsoft Labs published a paper at Hot Chips 2014, detailing an almost 20% reduction in power and improvement in performance by offloading to an FPGA.)

This spawned the idea that it was possible to do things in an FPGA, and have a chip-to-chip interconnect that connects with the main processor, he said.

“The embedded FPGA takes that one level higher, and this is what many datacenter providers such as Facebook, Alibaba, and Amazon are realizing,” Mohandass said. “Their software is moving very fast. There are new algorithms. They want to accelerate searches, or how fast they can get to a shopping list, and that changes in a one-month or three-month cadence. If you want to do a hardware chip for this, it will take three years, and by the time you get it the algorithm is out of date. An embedded FPGA allows for a piece of an FPGA to be put in an SoC, and all they need to change is that portion. That’s software-programmable hardware, where you gain the benefits of the rest of the ASIC or SoC and still get the programmability.”

Verification challenges
When it comes to verification challenges for embedded FPGAs there are a few significant ones, Frank Schirrmeister, senior group director, product management in the System & Verification Group at Cadence, pointed out. “The first is how to verify the chip while you’re building it, and that’s really a very singular problem. There is also the verification challenge for people who use that chip with an embedded FPGA. If this is a chip which has an embedded FPGA like a Xilinx Zynq, they have a fairly comprehensive verification flow that they provide including, for example, representations of the processor subsystem that is in there, along with a software view for software development. The verification challenge really comes to what you are now adding on to the existing system.”

And, when it comes down to it, for rightsizing memory and performance, assessments are done to determine, for example, whether to add a piece of functionality in software, or add a dedicated accelerator. “The interesting thing when you have an embedded FPGA is that the user can really try it out, so to speak. In the ASIC world, everything is pretty final when you’re getting to tapeout,” he added. “But in the embedded FPGA world, when you have an FPGA there you really have to understand how moving the functionality out into hardware helps the performance, and so forth. Some of it can be tried out, and that’s where virtual platforms play a bit, as well, where TLM models are added in. For the FPGA vendors, much of this is high-level synthesis, which allows the engineer to get into the block. Another intriguing and interesting concept here is using OpenCL programming for these devices.”
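One common first-pass way to gauge whether moving a function into hardware helps is an Amdahl's-law estimate. The sketch below is an illustration only, not the flow Schirrmeister describes. The function name, the offloaded fraction, the acceleration factor, and the data-movement overhead are all hypothetical placeholders that a real assessment would replace with profiled numbers.

```python
def offload_speedup(offload_fraction, accel_factor, transfer_overhead=0.0):
    """Amdahl-style estimate of whole-workload speedup from offloading.

    offload_fraction:  share of the original runtime moved to hardware (0..1)
    accel_factor:      how many times faster that share runs in hardware
    transfer_overhead: extra time for moving data to and from the
                       accelerator, as a fraction of the original runtime
    """
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")
    if accel_factor <= 0:
        raise ValueError("accel_factor must be positive")
    if transfer_overhead < 0:
        raise ValueError("transfer_overhead must not be negative")

    # New runtime relative to the original runtime of 1.0.
    new_time = (
        (1.0 - offload_fraction)            # part that stays in software
        + offload_fraction / accel_factor   # part that runs in hardware
        + transfer_overhead                 # cost of moving the data
    )
    return 1.0 / new_time


# Hypothetical workload: 60% of runtime is offloadable and runs 10x faster.
ideal = offload_speedup(0.6, 10.0)
with_overhead = offload_speedup(0.6, 10.0, transfer_overhead=0.04)

print(f"ideal speedup:         {ideal:.2f}x")
print(f"with 4% data movement: {with_overhead:.2f}x")
```

Even a small data-movement overhead eats into the gain in this model, which ties back to the earlier point about tight on-chip coupling between the processor and the FPGA fabric.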
