As the EDA industry gears up for its biggest event, the Design Automation Conference (DAC), held this year in Las Vegas, it is interesting to observe what is going on in hardware-assisted development, namely emulation and prototyping. The trends I outlined after last year's DAC in 2018 (system design, cloud, and machine learning) have only grown stronger and are changing the development landscape.
One of the key changes is how emulation and prototyping interact, and expect to see more of that at DAC 2019 as well. A great example of how emulation and prototyping are growing closer together and are used in a seamless flow was highlighted this April in a presentation from Toshiba, a developer of storage controllers, at CDNLive Silicon Valley. The following graph illustrates their joint usage of emulation and prototyping:
Source: Toshiba Memory America, Inc.
The initial bring-up happens in emulation. At some point, the firmware teams are enabled using full-system emulation, followed by prototyping as a high-speed option for faster execution of the hardware under development; Toshiba measured prototyping running 4.6X faster than emulation. From there on, the hardware teams continue to use emulation, and later prototyping for hardware regressions that require less debug. Even post-silicon, firmware development continues on the prototype until the silicon platform is stable. And even then, to debug issues found post-silicon, users go back to emulation because of its superior debug. Toshiba describes this in more detail in an Expert Insights video called “Early Firmware Development on Palladium and Protium, Enables 1st Silicon Success at Toshiba Memory.”
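As a back-of-the-envelope illustration of what that speedup buys, here is a minimal sketch. Only the 4.6X figure comes from Toshiba's presentation; the 100-hour regression workload is a hypothetical number I picked for the example:

```python
# Effect of the reported 4.6X prototyping speedup on a long regression run.
# The 100-hour emulation workload is a hypothetical example, not a figure
# from the Toshiba presentation.
EMULATION_HOURS = 100.0   # hypothetical regression time on emulation
SPEEDUP = 4.6             # prototyping vs. emulation, as reported by Toshiba

prototyping_hours = EMULATION_HOURS / SPEEDUP
print(f"Emulation:   {EMULATION_HOURS:.1f} h")
print(f"Prototyping: {prototyping_hours:.1f} h ({SPEEDUP}X faster)")
# -> roughly 21.7 h, turning a multi-day run into an overnight one
```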
So, is this proof that we are all done and good to go? Are we done with innovation in this space? Far from it!
Probably the first significant innovation in FPGA-based prototyping in more than a decade was the reduction of bring-up time by an average of 80% with the Protium S1 FPGA-Based Prototyping Platform in 2017. This was achieved by automating the bring-up process using techniques known from emulation, especially around clocking, and by seamlessly integrating the native FPGA layout tools into the flow. Still, at the end of the day, FPGA routing needs to happen. Bring-up, especially for a design that already runs in emulation, is greatly accelerated, but it will always have a hard time matching the bring-up time of processor-based emulation. Debug for hardware in FPGA systems either relies on intrusive instrumentation that slows down execution or uses capabilities like the native read-back functionality of the FPGAs, which is much slower than emulation.
On top of this, an often-misunderstood item is the crucial difference between emulation with custom silicon and FPGA-based emulation, as illustrated in the following graph, which shows how the user's design logic is mapped into the underlying fabrics of custom processors and FPGAs:
Source: Cadence Design Systems
In processor-based emulation, like the Palladium Z1 Enterprise Emulation Platform, the logic is processed as a stream of executions mapped into a massively parallel array of our custom processors; for four million gates of a user's design, we use about 3,000 custom processors. Compile is fast (we have pushed it well above 140 million gates per hour on a single workstation) and parallelization helps even further. If the design compiles, then it will run, because execution is pre-scheduled and cycle-based. And we have awesome debug, with streaming as well as partial and full vision into all signals. The only downside is that the execution speed of the user design is capped in the low-MHz range. For more details, refer to the chapter I co-authored describing this process in the second edition of the EDA Handbook.
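To make the "pre-scheduled, cycle-based" point concrete, here is a toy sketch of the principle: user logic is compiled once into a statically ordered stream of Boolean operations, and every emulation cycle simply replays that fixed schedule. This is my own illustration of the idea, not the actual Palladium microarchitecture:

```python
# Toy model of processor-based emulation: a 1-bit full adder compiled at
# "compile time" into a dependency-ordered schedule of Boolean operations.
# Each entry: (destination signal, operation, source operands).
SCHEDULE = [
    ("t1",   "xor", ("a", "b")),
    ("sum",  "xor", ("t1", "cin")),
    ("t2",   "and", ("a", "b")),
    ("t3",   "and", ("t1", "cin")),
    ("cout", "or",  ("t2", "t3")),
]

OPS = {"xor": lambda x, y: x ^ y,
       "and": lambda x, y: x & y,
       "or":  lambda x, y: x | y}

def emulate_cycle(inputs):
    """Execute one cycle by replaying the fixed schedule, in order."""
    signals = dict(inputs)
    for dest, op, (x, y) in SCHEDULE:
        signals[dest] = OPS[op](signals[x], signals[y])
    return signals["sum"], signals["cout"]

# Because the schedule is fixed at compile time, there is no placement,
# routing, or timing closure to fail: if the design compiles, it runs.
print(emulate_cycle({"a": 1, "b": 1, "cin": 0}))  # -> (0, 1)
```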
In contrast, FPGA-based execution always needs to run ASIC-style full place-and-route (P&R). With our automated flow, users see out-of-the-box speeds of about 5X that of emulation. Still, the compile is much slower than in processor-based systems even with parallel execution, debug is less flexible, and when timing closure fails, the process must restart.
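A rough way to see this compile-time trade-off is to model the restart behavior. In the sketch below, only the 140-million-gates-per-hour emulation compile rate comes from the text above; the design size, P&R runtime, and timing-failure probability are hypothetical placeholders:

```python
# Hypothetical compile-time comparison. The emulation compile rate
# (>140M gates/hour) is from the article; all FPGA numbers are
# illustrative placeholders for a multi-FPGA P&R flow.
import random

random.seed(7)

DESIGN_GATES = 100e6   # hypothetical design size
EMU_RATE     = 140e6   # gates/hour, processor-based compile (from article)

# Processor-based compile: a single deterministic pass that always succeeds.
emulation_hours = DESIGN_GATES / EMU_RATE

def fpga_compile_hours(pnr_hours=12.0, p_timing_fail=0.3, max_tries=10):
    """Model full P&R restarting from scratch whenever timing closure fails."""
    total = 0.0
    for _ in range(max_tries):
        total += pnr_hours
        if random.random() > p_timing_fail:  # timing closed, done
            break
    return total

print(f"Emulation compile: ~{emulation_hours:.1f} h")
print(f"FPGA P&R compile:  ~{fpga_compile_hours():.1f} h (including retries)")
```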
Bottom line: the best option for users is simply to use processor-based emulation and FPGA-based prototyping in a combined flow, as Toshiba's example shows. And while in the above graph I assign hardware debug plainly to emulation and software bring-up to prototyping, this is simply meant to identify the primary use models; in practice, the lines can be blurry. Emulation will extend into software development, especially in hybrid configurations with virtual platforms, and FPGA-based prototyping will extend into hardware regressions.
What's next in this space? DAC will likely show more innovation. There are always more use models that can be added to emulation. More and more virtualization of the interfaces is happening, not only in emulation but also in prototyping. And finally, in “AI and Machine Learning Drive New SoC Verification Choices,” I outlined the traditional degradation of prototyping performance as designs grow bigger, specifically beyond 400 million gates. It's about time somebody addressed this issue.
See you at DAC!