Prototyping Partitioning Problems

Gap widens between increasing design complexity and FPGA capabilities, making prototyping a lot harder than it used to be.

Gaps are widening in the prototyping of large, complex chips because the speed and capacity of FPGAs are not keeping pace with the rapid rollout of advanced ASICs.

This is a new twist for a well-established market.

Indeed, prototyping with FPGAs is as old as the FPGAs themselves. Even before they were called FPGAs, logic accelerators or LCAs (logic cell arrays) were used by engineering teams to put together prototypes of their designs. The engineers immediately latched onto the fact that they could make a temporary model of what their chip was going to be, which back then was something like a 2K gate array.

“Engineering teams were saving maybe a few tens of thousands of dollars in NRE by doing FPGA logic accelerators first,” observed Doug Amos, product marketing manager for ASIC prototyping at Mentor, a Siemens Business.

Fast forward a few dozen years, and not much has changed on the FPGA side. But during that time, designs have gotten bigger and far more complex.

“The biggest FPGA, which is what we care about, goes in steps,” Amos said. “At the moment, we’ve been on the same FPGA for the last two years, and that probably won’t change for another year or so. You end up with what equates to an elastic tension between what an FPGA can do and what the SoC needs it to do. The FPGA falls behind, and then one day catches up again in a big step. This discontinuity happens every three or four years because there is a lot involved in bringing a new FPGA to market.”

The usage of prototyping boards is definitely growing, especially with the growth of FPGA capacity and I/O, pointed out Zibi Zalewski, general manager of Aldec’s Hardware Division. “The problem is complexity of the designs is also growing which has made the prototyping boards a very advanced, multi-discipline project itself. Verification teams who require such tools face a challenging decision to make – design in-house prototyping board or buy one from a specialized vendor? In-house designed boards require to have the additional engineering team focused only on designing, producing and testing of such boards, with multiple problems to resolve on the way related to even such things like parts logistics or purchasing. Not to mention before we start using the board for our target project, we actually need to develop diagnostic designs which can verify the board after the production and many times during the typical usage.”

This is not as trivial as it looks, he stressed. “When there are several big FPGAs with hundreds of interconnection lines and multiple peripherals to validate, the result will be the board required for the company needs, but at much higher cost and longer lead time than purchasing the boards from a prototyping vendor. The board purchased from a prototyping systems provider is usually a universal board, which could be reused in multiple projects and even in different verification modes, it doesn’t require to have the separate design team, warehouse and delivery time is simply shorter. If you look well some of such vendors provide even FPGA partitioning tools which is another challenge to face in the prototyping project, not at all less complicated than the board itself.”
 
Both Xilinx and Intel (which bought Altera in 2015) consider prototyping in FPGAs to be a special-use model. These are typically high-priced, low-volume applications with some very specific needs.

“The prototype market is really about creating the biggest plate of gates that you can, and what you do on that plate of gates is probably different from how you would do things for other markets,” said Amos. “If you look at a device like the Xilinx UltraScale VU 440, it’s much too expensive to consider using in production applications. There might be some weird corner cases in the network area where you might use it as a switch, but it’s intended for FPGA prototyping. Taking that to logical extremes, a good FPGA company would be looking at how to make another device maybe even bigger or maybe even more tailored to prototyping. To some extent, the FPGA vendors realize this is a market that’s here to stay. People have been prototyping in FPGAs for 30-plus years now, but how do they do something better than the other guy so they can bring in tens of millions of dollars? The silicon sold into the space is worth a lot of money.”

There is no simple answer to that question, because none of them is keeping pace with ASICs. At this point, ASIC designs are faster than FPGAs, and the two take different approaches to clocking and memory subsystems. Moreover, because ASIC designs are now larger than FPGAs, they need to be mapped onto multiple FPGAs. The maximum number of gates available today on an FPGA prototyping board is up to about 20 million, Amos said.
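As a rough illustration of that mapping arithmetic, the sketch below estimates how many devices a design would need. It treats the roughly 20 million gate figure above as the capacity of one unit. The function name and the utilization factor (the fraction of each device that can realistically be filled) are assumptions made for illustration and do not come from any vendor tool.

```python
import math


def fpgas_needed(design_gates: int, capacity_gates: int, utilization: float = 1.0) -> int:
    """Estimate how many FPGAs are needed to hold a design.

    design_gates   -- size of the ASIC design in gates
    capacity_gates -- gate capacity of one FPGA (about 20 million per the article)
    utilization    -- assumed usable fraction of each device (hypothetical knob)
    """
    if design_gates < 0 or capacity_gates <= 0:
        raise ValueError("gate counts must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    usable_per_fpga = capacity_gates * utilization
    return math.ceil(design_gates / usable_per_fpga)


if __name__ == "__main__":
    # A hypothetical 100-million-gate design on 20-million-gate devices.
    print(fpgas_needed(100_000_000, 20_000_000))        # fully packed devices
    print(fpgas_needed(100_000_000, 20_000_000, 0.5))   # half-filled devices
```

Even this simple estimate shows why lower utilization, which partitioning and routing overhead tend to force, quickly multiplies the number of devices on the board.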

Recombination issues
That creates problems when those different pieces are put back together. “Designs have to be broken into FPGA-sized pieces, and then connected together, but those connections run at a much slower rate than inside of the FPGA,” said Dave Kelf, vice president of marketing at OneSpin Solutions. “There are tools that try to automatically do this partitioning. Other engineering groups do this manually, which means they are looking at their design and breaking it up themselves, which is a real pain. It’s very hard to do and leads to all kinds of trouble. Usually it means the rapid prototype’s design isn’t like the final design is going to be, which creates all sorts of other issues.”
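To make the cost of those inter-FPGA connections concrete, here is a minimal first-order sketch. It counts the signals that cross between devices for a given partition, then estimates how far the system clock drops if those signals are time-multiplexed over a limited number of pins. The block names, net widths, pin count and link rate are all hypothetical, and the model ignores synchronization overhead that real partitioning tools have to account for.

```python
import math
from typing import Dict, List, Tuple

Net = Tuple[str, str, int]  # (block_a, block_b, width in signals)


def cut_signals(nets: List[Net], placement: Dict[str, int]) -> int:
    """Sum the widths of nets whose two endpoints sit on different FPGAs."""
    return sum(width for a, b, width in nets if placement[a] != placement[b])


def mux_ratio(signals: int, pins: int) -> int:
    """Signals that must share each physical pin (at least 1)."""
    if pins <= 0:
        raise ValueError("pins must be positive")
    return max(1, math.ceil(signals / pins))


def est_system_clock_mhz(link_mhz: float, ratio: int) -> float:
    """First-order estimate: each pin carries `ratio` signals per system cycle."""
    return link_mhz / ratio


if __name__ == "__main__":
    # Hypothetical design: four blocks split across two FPGAs (0 and 1).
    nets = [("cpu", "noc", 512), ("noc", "ddr", 256), ("cpu", "dsp", 64)]
    placement = {"cpu": 0, "dsp": 0, "noc": 1, "ddr": 1}

    crossing = cut_signals(nets, placement)
    ratio = mux_ratio(crossing, pins=128)
    print(crossing, ratio, est_system_clock_mhz(400.0, ratio))
```

Moving a block to the other device changes the cut, which is why the choice of partition boundary has such a direct effect on prototype speed.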

Equivalency checking can help here. “If you’re dealing with a synthesis tool that does aggressive optimizations, the more aggressive they are, the more likely it might break the design,” Kelf said. “Equivalency checking can make sure the design is still the same as it was from the RTL to the gates after the optimizations. If you have that equivalency checking and it will point out where the errors are, then it allows you to use all of the optimizations while safely knowing the design functionality is preserved. Therefore, the optimizations can be switched on. Almost certainly there will be errors in there that can be quickly diagnosed, tweaked and figured out.”
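The idea behind equivalency checking can be shown with a toy example. The sketch below compares a reference function (standing in for the RTL) against an optimized version (standing in for the gates) over every input combination and reports the first mismatch. This is only an illustration: production equivalence checkers rely on formal techniques rather than brute-force enumeration, and all function names here are hypothetical.

```python
from itertools import product
from typing import Callable, Optional, Tuple


def find_mismatch(
    ref: Callable[..., bool],
    impl: Callable[..., bool],
    n_inputs: int,
) -> Optional[Tuple[bool, ...]]:
    """Return the first input vector where ref and impl differ, or None if equivalent."""
    for bits in product((False, True), repeat=n_inputs):
        if ref(*bits) != impl(*bits):
            return bits
    return None


def rtl_mux(sel: bool, a: bool, b: bool) -> bool:
    """Reference behavior: a 2-to-1 multiplexer."""
    return a if sel else b


def gates_mux(sel: bool, a: bool, b: bool) -> bool:
    """Gate-level form of the same multiplexer."""
    return (sel and a) or ((not sel) and b)


def broken_mux(sel: bool, a: bool, b: bool) -> bool:
    """An over-aggressive 'optimization' that dropped the (not sel) term."""
    return (sel and a) or b


if __name__ == "__main__":
    print(find_mismatch(rtl_mux, gates_mux, 3))   # no counterexample expected
    print(find_mismatch(rtl_mux, broken_mux, 3))  # a counterexample pinpoints the bug
```

The counterexample plays the role Kelf describes: it points to where the optimized design diverges, so the error can be diagnosed quickly.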

Juergen Jaeger, product management director in the System & Verification Group at Cadence, agrees that logic equivalence checking can be used to a certain degree, especially when going through the RTL transformation—from RTL synthesis to the gate level netlist. But he said there are limits to what formal verification can do here.

“You can constrain that enough so you don’t get too many false positives from the formal verification tool,” Jaeger said. “But where it breaks down is when you go through the implementation step, where you now take that netlist and map it into the FPGA structure. There, you need different ways of doing it because formal verification would create nothing but false positives. Now you are mapping into a totally different structure.”

A technology called post-partition verification creates a model (a Verilog netlist) that includes all of the FPGA-specific transformations, such as clocks, pin multiplexing, multi-FPGA partitioning and memories. That model can be used in either an emulator or a simulator to verify that the functionality was preserved, Jaeger said.

Partitioning issues
The partitioning of the design creates another set of challenges.

“One approach is to say you’ve got a huge SoC made up of blocks of IP with some kind of interconnect between them,” Kelf noted. “You could easily take a card with a whole bunch of FPGAs on it, and for each block in the SoC you place into one FPGA you can wire them up the way the SoC is wired up. There’s your rapid prototype. That would make sense. Years ago, that would be exactly what they did and it worked fine because the interconnect on the ASIC was asynchronous and slow, so you could get away with having the slow interconnect between the FPGAs. Nowadays, the interconnect has sped up dramatically with networks on chip, and the blocks on these SoCs are far bigger than what you can fit on the biggest FPGA devices, so they can’t get away doing that anymore; they can’t model an SoC in this kind of fashion. This is the big difference that design teams have come across in the last few years. They have to go in there and break up this device. That’s a huge problem.”
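The block-per-FPGA approach Kelf describes, and the point where it breaks down, can be sketched as a simple packing exercise. The code below uses first-fit decreasing, a generic bin-packing heuristic chosen here for illustration and not taken from any partitioning tool, to place whole IP blocks onto FPGAs and to flag blocks that are too large for any single device. Block names and sizes (in millions of gates) are hypothetical.

```python
from typing import Dict, List, Tuple


def place_blocks(blocks: Dict[str, int], capacity: int) -> Tuple[List[List[str]], List[str]]:
    """Place whole blocks onto FPGAs using first-fit decreasing.

    Returns (blocks assigned to each FPGA, blocks too big for one FPGA).
    Oversized blocks are the ones that would have to be broken up.
    """
    fpgas: List[List[str]] = []   # block names per FPGA
    free: List[int] = []          # remaining capacity per FPGA
    oversized: List[str] = []

    # Largest blocks first, so big blocks claim space before small ones.
    for name, size in sorted(blocks.items(), key=lambda kv: kv[1], reverse=True):
        if size > capacity:
            oversized.append(name)
            continue
        for i, room in enumerate(free):
            if size <= room:
                fpgas[i].append(name)
                free[i] -= size
                break
        else:
            # No existing FPGA has room, so open a new one.
            fpgas.append([name])
            free.append(capacity - size)
    return fpgas, oversized


if __name__ == "__main__":
    soc = {"gpu": 45, "cpu": 12, "modem": 9, "usb": 3, "audio": 2}
    placed, too_big = place_blocks(soc, capacity=20)
    print(placed)
    print(too_big)
```

Any block that lands in the oversized list cannot be wired up the way the SoC is wired up, which is the situation that forces teams to break the device apart.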

Another problem is that SoC teams need to run the software as well as hardware.

“They want to boot Linux, among other things, so a lot of the FPGAs have an Arm processor built in them, which can then connect up with the logic,” Kelf said. “That works pretty well if the final device also has an Arm processor that is running a protocol stack. But because the device doesn’t fit in an FPGA, you may be using a protocol stack on one particular FPGA and communicating that with the other FPGA, which gets to be complex. In this way, the hardware/software system becomes broken by the fact that it gets spread across more than one FPGA.”

FPGAs are getting faster, but there’s still a big gap. As a result, designers have to allow for the timing issues, as well as make sure there are no bugs or something that looks like a bug.

Often, designers change the design to allow it to be prototyped. “They’ll say, ‘I’ve got this great baseband; it’s working really well but I can’t put it on one FPGA, I’ve got to put it across two,’” Kelf said. “Let’s figure out two logical sections. One can go on one FPGA, the other can go on the other FPGA, and set it up so the communication between those two sections can be quite slow and asynchronous, and it will work. If they do that up front, then putting it back together after it’s prototyped isn’t too difficult because they’ve already thought it through. Another group of engineers will have a design, not have thought about how they’d prototype it, get the whole thing working in a simulator and maybe even an emulator, then when they want to run the software they realize they have to break it up. So they start fiddling with it to get it on the prototyping board, they have to make changes inevitably, and then they find they can’t take the original design and fabricate it. They have to go back because they’ve tested it but the design has changed to make sure the software is working and they can’t recreate it.”

This essentially breaks the design flow, forcing teams to go way back up front and figure out how to redo the design so it works for the synthesis process and place and route, and that it works in the prototyping system. The impact on time to market can be disastrous.

Amos agreed the best place to think about these issues is at the very beginning of the project. He even wrote a book on this topic. (“FPGA-Based Prototyping Methodology Manual: Best Practices in Design-For-Prototyping”)

“The idea was, exactly that,” Amos said. “It was, ‘Hey guys, we know there are all sorts of design for’s—design for test, design for low power. Design for prototyping just means keeping things portable. Don’t put anything into the design that’s specific to your end silicon if you at all can avoid it. If you have to do it, put it in a way that’s easily swappable with something that’s equivalent. Ask yourself: How do I write code that is portable and all these good things? Additionally, how do I make sure that the guys at the start of the project care about me if I am doing the FPGA prototype job and if the software guys want something fast that they can run on? How do I make sure that that happens much earlier in the process? Shift Left is the cliché. We want all of this to be considered, and it is the procedural stuff within the company and the way companies are organized that prevents this happening. The RTL writers and the people creating the chips are under pressure, too. They have to get their job done, and the FPGA guy comes in the room and says, ‘If you do it a little bit differently, it makes my job easier.’ The RTL guys are going to say, ‘Get out of here. I’ve got enough to do getting this chip working without making your job easier.’”

However, Amos stresses perseverance. “If you get to the right places within these companies there is somebody somewhere who has oversight of the entire thing and he understands. Software guys, they are late in the project but yet they have a lot of work to do. Increasingly, the value of the system you’re going to sell is in the software so you have to be thinking about the software guys sooner in the process. If that means giving them a prototype sooner, if that means making the RTL easier to prototype, it ripples back to the left side of the program that somebody somewhere will say, ‘Let’s help these software people get something done quicker and if that means Design For Prototyping, great, let’s do it.’”

Another approach is to use the prototype to do block-level prototyping. Then the entire design can be taken across to the emulator, which is fine if what you’re doing is normal RTL verification. But the goal here is to create a model for the software engineers. There are a lot of software engineers, but there aren’t so many emulators, so you end up with a bottleneck.

Hybrid solutions also are being adopted, whereby a design team can keep part of the design on a host and run a less accurate version of that part of the design. “This is typically where people use Arm Fast Models, and keep the processor system running over here in a virtual environment in SystemC, and then have transactor interfaces into the novel bit – the RTL that they want to test at the block level – in the prototype. People have been doing this with emulation for years. Now they are doing it with prototypes, as well, so that they don’t have to worry about squeezing all of the design into one FPGA,” Amos added.

The future
Looking ahead, higher-density FPGAs will achieve much higher capacity ranges.

“Traditionally, FPGA-based prototypes were used for subsystems, IP blocks, and just parts of the design due to the various partitioning and mapping issues,” said Cadence’s Jaeger. “Now, with the partitioning and clocking technologies, the next step will be prototype systems with hundreds of FPGAs instead of four, five or six, as we do today. That will allow users to prototype multi-billion-gate homogeneous designs in it.”

This may encroach on emulation systems, so users need to understand which system does what best. Emulation will still have better hardware debug capabilities, while high-capacity prototype systems will allow higher performance for targeting hardware/software integration and software bring-up speed. In that way, they are complementary, Jaeger noted.

Capacity is the other half of this equation, Jaeger added. “If you think about large ASICs today—for instance, what goes into a cell phone—there are a lot of different interfaces, such as keyboard interaction, screen, Bluetooth, wireless, LTE, USB, audio, video. In order to verify all of these external interfaces with the software that drives those interfaces, you need some level of virtualization for those interfaces. It’s simply not feasible to have all of these interfaces connected through an emulator or prototype system as physical hardware, so more and more virtualization will happen on the periphery, on the peripheral devices, and on the interfaces in order to be able to verify these larger and more complex ASICs and devices.”

As the industry awaits the next bigger and better FPGA for its prototypes, there is plenty of technology that takes these challenges into account, both to keep engineering teams busy and to fill in the gaps where FPGAs fall short.