Adding NoCs To FPGA SoCs

As complexity and device sizes rise, so does the need for an on-chip network.


FPGA SoCs straddle the line between flexibility and performance by combining elements of both FPGAs and ASICs. But as they find a home in more safety- and mission-critical markets, they also are facing some of the same issues as standard SoCs, including the ability to move larger and larger amounts of data quickly throughout an increasingly complex device, and the difficulty in verifying and debugging any problems that might show up along the way.

FPGA SoCs are a hybrid device, and they are gaining traction as chipmakers and systems companies are tasked with completing more designs per year, often in markets where protocols and algorithms are still in flux, such as automotive, medical devices and security. Using a standard FPGA chip can provide the necessary flexibility, but only an ASIC can meet the higher performance requirements, both for new and existing markets such as aerospace. FPGA SoCs offer a compromise solution that basically splits the difference, providing some of the performance and low-power benefits of an ASIC and the flexibility to avoid early obsolescence.

But this level of complexity also adds issues that are very familiar to SoC design teams.

“The complexity and capabilities of FPGAs have grown so much that you can build big systems with multiple interfaces and protocols in a single FPGA, and such designs require a fabric to integrate different IP and hardware modules working at various clock domains and data protocols,” said Zibi Zalewski, general manager for Aldec’s Hardware Division.

Modern FPGAs — especially those with hard embedded processors and controllers — fit somewhere between traditional logic FPGAs and ASICs, with a nod to the ASIC direction. “A NoC is definitely needed, because having a NoC simplifies the interfacing from the verification point of view,” Zalewski said. “A NoC in the design allows the engineering team to manage the top-level interfacing, which can be further used to create a main prototyping channel to the host computer or a transactor for emulation, instead of multiple interfaces that increase the complexity, time and cost of the verification process.”

This has some interesting implications for FPGA SoC tooling. FPGA vendors generally sell their own tools with their hardware, which has made it difficult for EDA vendors to make a significant dent in that market. But as these two worlds begin to merge, there are questions about whether the kind of complex tooling and IP that makes a finFET possible, for example, also may be required in an FPGA SoC—particularly in safety-critical applications that demand traceability.

“When using high-capacity FPGAs for design verification and prototyping purposes, one of the key requests is for appropriate debug capabilities,” said Juergen Jaeger, product management director at Cadence. “However, the architecture in today’s no-NoC FPGAs makes it challenging to provide such debug features, mostly due to finite (limited) connectivity resources in the FPGA, especially as all the FPGA-internal routing resources are needed to implement the design itself and run it at sufficient performance. Also, debug requires being able to access as many internal design nodes as possible, ideally all, and route those probe points to the outside. This is almost impossible, and results in many challenges and debug shortcomings. This is where an FPGA-internal NoC could help, as it would provide the ability to probe many nodes locally, route the data through the NoC to an aggregator without wasting precious FPGA routing resources, and then export the debug data through some standard interface, such as gigabit Ethernet, to the outside world.”
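Jaeger’s probe-and-aggregate scheme can be sketched as a toy software model. Everything here — the `ProbeSample` and `NocDebugAggregator` names, and the frame format — is hypothetical; the sketch only illustrates the idea of carrying probe data over the NoC to one aggregation point and serializing it for an external interface.

```python
from dataclasses import dataclass, field

@dataclass
class ProbeSample:
    node: str     # internal design node being observed
    cycle: int    # capture cycle
    value: int    # sampled value

@dataclass
class NocDebugAggregator:
    """Collects probe traffic carried over the NoC instead of dedicated wires."""
    samples: list = field(default_factory=list)

    def receive(self, sample: ProbeSample) -> None:
        # In hardware this would arrive as NoC packets; here we simply queue it.
        self.samples.append(sample)

    def export_frames(self) -> list:
        # Serialize for export over a standard interface such as gigabit Ethernet.
        return [f"{s.cycle}:{s.node}={s.value:#x}" for s in self.samples]

agg = NocDebugAggregator()
agg.receive(ProbeSample("alu.result", 100, 0xDEAD))
agg.receive(ProbeSample("fifo.level", 101, 7))
print(agg.export_frames())  # → ['100:alu.result=0xdead', '101:fifo.level=0x7']
```

The point of the structure is that each probe only needs a short local hop to its nearest NoC router; the long-haul routing to the aggregator rides on links that already exist.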

Not all FPGAs will need NoCs, however. “It might help if the design is a data-path heavy design, moving a lot of data around,” Jaeger said. “However, if the design is more control-centric, and/or requires the highest possible performance, the inherent latency and non-deterministic nature of a NoC might be counterproductive. It will also require new FPGA design tools that can take advantage of a NoC component inside an FPGA.”

Fig. 1: Intel’s FPGA SoC lineup. Source: Intel

Lower power
ASICs inherently are more power-efficient than FPGAs. Now the question is how much power overhead can be shaved off by combining these devices and utilizing some of the low-power techniques that have been developed for SoCs, such as more efficient signal routing through a NoC.

“The NoC enables FPGA resources to be shared by IP cores and external interfaces and facilitates power management techniques,” said Aldec’s Zalewski. “With a NoC, the FPGA logic can be divided into regions, each of which can be handled by individual NoC nodes called routers and turned off selectively into sleep mode if not used.”
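A minimal sketch of the region-level power gating Zalewski describes, with invented names (`Region`, `Router`) and a deliberately simplistic wake-on-traffic, sleep-when-idle policy — real NoC power management would track far more state than this.

```python
class Region:
    """A slice of FPGA fabric attached to one NoC router."""
    def __init__(self, name):
        self.name = name
        self.awake = True
        self.pending = 0   # outstanding NoC transactions targeting this region

    def idle(self):
        return self.pending == 0

class Router:
    """NoC node that power-gates its attached region when it has no traffic."""
    def __init__(self, region):
        self.region = region

    def deliver(self, n_flits):
        if not self.region.awake:      # incoming traffic wakes the region
            self.region.awake = True
        self.region.pending += n_flits

    def retire(self, n_flits):
        self.region.pending -= n_flits
        if self.region.idle():
            self.region.awake = False  # gate the region off once idle

r = Router(Region("dsp_block"))
r.deliver(4)
assert r.region.awake        # traffic keeps the region powered
r.retire(4)
assert not r.region.awake    # region put into sleep mode when unused
```

Because the routers sit at the boundary of every region, they are a natural place to observe traffic and make this gating decision locally.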

This notion of flexibility is what drove the formation of the CCIX Consortium, which was founded to enable a new class of interconnect focused on emerging acceleration applications such as machine learning, network processing, storage off-load, in-memory database and 4G/5G wireless technology.

The standard is meant to allow processors based on different instruction set architectures to extend the benefits of cache coherent, peer processing to a number of acceleration devices including FPGAs, GPUs, network/storage adapters, intelligent networks and custom ASICs.

This is especially key when using an FPGA to accelerate a workload. Anush Mohandass, vice president of marketing at NetSpeed Systems, noted that during the Hot Chips conference a few years ago, Microsoft said it wanted to accelerate image search in Bing using FPGAs rather than running it in a regular server. “They found higher efficiency and lower latency using FPGA acceleration for images, so that’s a place where FPGAs can come into the forefront. Instead of using it as a general-purpose compute, you use it for acceleration.”

In fact, Mohandass suggests this is the genesis behind the CCIX movement. “Even when Microsoft did it, it said, ‘We have the Xeon processor, that’s the main CPU, that’s the main engine — when it detects something that the FPGA can do, it offloads it to the FPGA.’ If that is the case, why should you treat the accelerator as a second-class citizen? In CCIX, acceleration literally has the same privileges as your core compute cluster.”

There are other technical issues with today’s advanced FPGAs that may benefit from the structure of a NoC, as well.

“Each FPGA fabric can look like an SoC just in terms of sheer gate count and complexity,” said Piyush Sancheti, senior director of marketing at Synopsys. “But now that you have all this real estate available, you’re obviously jamming more function into a single device, and that’s creating multifunctional complexity as well as things like clocking. We see that clocking structures in FPGAs are becoming much more complex, which creates a whole bunch of new issues.”

IP reuse
This simplifies design reuse, as well. “Typically, if the design is in any kind of a SoC environment, whether that’s implemented on ASIC or FPGA, the more IPs that are integrated, the more asynchronous clocks there are in the design,” Sancheti said. “There may be a PCIe running at 66 MHz, there may be other aspects of the design that are running at a much higher frequency, and these by design are not synchronous with each other. What that means, essentially, is that there is logic operating at different frequencies, but this logic is communicating with each other. This causes clock domain crossing issues. How do you make sure that when a signal goes from a fast clock domain to slow, and vice versa, the signal is reliable, and that you don’t have metastable signals, where essentially the timing of those signals is not completely synchronized?”

Just like an SoC design, a very complex synchronization scheme is needed, along with the tools and methodologies to ensure the proper synchronization is in place. “Everybody who’s doing anything more than jelly bean FPGAs has a complete methodology around the clock domain crossing verification, which is actually somewhat new to the FPGA design community,” he said. “If you map all of these challenges to design flows and methodologies, there are new things being added to their flows that historically they didn’t need to worry about purely because they didn’t have that many IPs and they didn’t have that many clock domains to deal with. It goes back to the simplicity of the design and the end application. As FPGAs become more SoC-like, unfortunately they have to deal with all the challenges of doing SoC design.”
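The classic remedy for the single-bit crossings Sancheti describes is a two-flop synchronizer. The Python model below is a rough behavioral sketch, not RTL: it treats near-edge sampling as the first stage randomly resolving to either the old or the new value, so the second stage only ever presents a settled bit to the receiving domain (the delay and the random resolution are the modeling assumptions).

```python
import random

def two_flop_sync(async_samples, seed=0):
    """Behavioral model of a 2-stage synchronizer in the receiving clock domain.

    Stage 1 may sample near an input edge and resolve unpredictably; stage 2
    captures stage 1's settled value one cycle later, so the design downstream
    never sees an unresolved level -- only a possibly delayed one.
    """
    rng = random.Random(seed)
    s1 = s2 = 0
    prev = 0
    out = []
    for bit in async_samples:
        s2 = s1                           # stage 2 takes stage 1's settled value
        if bit != prev:
            s1 = rng.choice([prev, bit])  # near-edge capture: old or new, never 'X'
        else:
            s1 = bit                      # stable input samples cleanly
        prev = bit
        out.append(s2)
    return out

print(two_flop_sync([0, 1, 1, 1, 0, 0]))
```

The observable cost is one to two cycles of latency and uncertainty about exactly which cycle a transition lands in — which is precisely why CDC verification tools check that the design tolerates that uncertainty rather than assuming cycle-exact arrival.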

Bridging the gap
So are today’s FPGA SoCs enough like traditional, digital SoCs that all the same rules apply for a network on chip? The answer appears to be somewhat, but not completely.

“Both of the main FPGA vendors have proprietary network-on-chip tools, and if a user chooses to use one of those, they can hook up their functions using a form of network on chip,” said Ty Garibay, CTO of ArterisIP. “It is more of a conceptual approach to the system. Does it look enough like a standard SoC that it makes more sense to think of it as having a NoC as the connectivity backbone? Many FPGA applications do not. They look a lot more like networking chips or backbone chips that are fundamentally data flow. Data comes in the left, you have a whole bunch of munging units, and data goes out the right. That is not a traditional SoC. That’s a normal network processor or baseband modem or something like that, where it’s a data flow chip. So in those types of FPGA soft designs, there’s no need for a network on chip.”

But if it conceptually looks like a bunch of independent functional units that communicate with each other and are controlled generally by a central point, then it does make sense to have those connected with a soft network on chip, he said. “The next generation of high-performance FPGAs are expected to contain hard NoCs built into the chip because they are getting to the point where the data flow is at such a high rate—especially when you have 100-gigabit SerDes and HBM2, where trying to pipe a terabit or two per channel through soft logic essentially uses all the soft logic and you’ve got nothing left to be processing with.”

As a result, that bandwidth is going to require a hardening of the data movement, much as compute already has been hardened into dedicated DSP blocks and memory controllers. Successive generations of FPGAs may be expected to look like a checkerboard of streets, where the streets are hard 128-, 256-, or 512-bit buses that go from end to end in one or two cycles and don’t use up any soft logic to do it.

“Along with this would be the synthesis function that allocates on-ramps and off-ramps to those channels as part of hardening the function onto the FPGAs, because we’re moving so much data around I just don’t see how they can continue to do that in soft logic,” Garibay said. “That will be the coming of real NoCs onto FPGAs, because NoCs are always a good idea.”
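Some quick arithmetic shows why terabit-class flows overwhelm soft routing. The ~400 MHz soft-fabric clock used below is an illustrative assumption, not a figure from the article; the helper function is likewise hypothetical.

```python
def required_clock_mhz(bandwidth_gbps, bus_width_bits):
    """Clock rate (MHz) needed for a bus of the given width to sustain the bandwidth."""
    return bandwidth_gbps * 1e3 / bus_width_bits  # Gb/s -> Mb/s, then per-bit beats

# Piping 1 Tb/s (1,000 Gb/s) through a 512-bit soft bus:
print(required_clock_mhz(1000, 512))  # → 1953.125 MHz, far above typical soft-fabric clocks

# Holding the fabric at an assumed 400 MHz instead forces the width up:
print(1000e3 / 400)                   # → 2500.0 bits wide, an enormous soft-routing burden
```

Either way the numbers land outside what soft logic comfortably delivers, which is the case for hardened NoC channels that run fast, fixed-width links without consuming fabric resources.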

Related Stories
FPGAs Becoming More SoC-Like
Embedded FPGAs Come Of Age
Tech Talk: EFPGA Verification
Tech Talk: EFPGA Timing
Tech Talk: EFPGA Programming
