The Agony Of Hardware-Assisted Development Choices

Users are adopting a continuum of development engines for virtual prototyping, RTL simulation, acceleration, emulation and FPGA prototyping.


“When defining a product, if you haven’t upset at least one part of the organization, then the product is probably ill defined and tries to address too many things!” That’s what one of my mentors taught me early in my career as a product manager. Ever since then I have been interested in portfolio management. Our most recent announcement, the Protium Rapid Prototyping Platform (our next-generation FPGA-based prototyping system), is a great example of how to address multiple customer requirements with a portfolio of adjacent products.

I have written extensively before about the different pre-silicon execution engines in the Cadence System Development Suite and how one does not fit all. The spider chart below shows 10 criteria that are important to customers, and how processor-based emulation like the Palladium XP Series and FPGA-based prototyping measure up. Only emulation and FPGA-based prototyping are shown here. They also have to be considered in the larger context of virtual prototypes, with or without available models, RTL simulation, acceleration (as the combination of RTL simulation and emulation), as well as the actual silicon when it becomes available as engineering samples.


Major Requirements
The top three requirements on customers’ minds are execution speed, capacity, and cost.

“Execution speed” is often looked at as a primary concern for software developers, which is not surprising given the number of cycles necessary to boot an operating system (OS). However, as my previous blog on how many cycles it took to verify the ARM big.LITTLE processor configuration showed, it is also an important parameter for verification. Traditionally, emulation is in the MHz range and FPGA-based prototyping in the range of 10s of MHz.
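To make these speed ranges a bit more tangible, here is a rough back-of-the-envelope sketch: wall-clock time is simply the number of cycles divided by the execution frequency. The 10-billion-cycle boot workload and the two specific frequencies below are illustrative assumptions chosen from within the ranges mentioned above, not measurements of any particular design or product.

```python
def wall_clock_seconds(cycles: int, frequency_hz: float) -> float:
    """Return the seconds needed to execute `cycles` at `frequency_hz`."""
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    return cycles / frequency_hz


# Hypothetical workload: an OS boot assumed to need 10 billion cycles.
BOOT_CYCLES = 10_000_000_000

# Illustrative engine speeds only; real speeds depend on the design.
ENGINE_SPEEDS_HZ = {
    "emulation (assumed 1 MHz)": 1e6,
    "FPGA-based prototyping (assumed 20 MHz)": 20e6,
}

for engine, hz in ENGINE_SPEEDS_HZ.items():
    minutes = wall_clock_seconds(BOOT_CYCLES, hz) / 60
    print(f"{engine}: {minutes:.1f} minutes")
```

Under these assumed numbers, the same boot takes hours on one engine and minutes on the other, which is why speed matters to software developers and verification teams alike.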

Capacity—the design size that can be mapped—is often a key criterion. In emulation, users can map billion-gate designs. For FPGA-based prototyping, the upper limit is typically in the area of 100 million gates.

The “marginal cost per unit” is usually measured in dollars per gate and is of key interest in balancing the investment customers make. The higher cost of emulation compared to FPGA-based prototyping reflects the value of its advanced debug, shorter bring-up, earlier time of availability, and, of course, its versatility of use models.

Additional Requirements
Besides the top three requirements above, seven more requirements help to make the decision for the appropriate engine and are becoming cost functions helping to moderate the top three.

The “effort in addition to RTL” is an often-overlooked component. Emulation consumes RTL in an almost unmodified form to map it into the hardware. FPGA-based prototyping requires users to manually modify the RTL to map it into the FPGA, as opposed to the actual target technology for which the RTL was meant. Memories have to be re-modeled, the design needs to be partitioned between FPGAs, clock domains must be managed, etc. Both FPGA-based prototyping and FPGA-based emulation require an actual layout of the partitioned design into the individual FPGAs, a process that often does not close on the first try and is very time-consuming, even when using an array of PCs. In stark contrast, processor-based emulation like the Palladium solution maps into the hardware at 75 million gates per hour on a single workstation. This impacts power and cost considerations.
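As a quick illustration of what the 75-million-gates-per-hour figure means in practice, the sketch below estimates compile time for a few design sizes. It assumes compile time scales linearly with gate count, which is a simplification, and the design sizes are hypothetical examples rather than customer data.

```python
# Mapping rate quoted in the text for processor-based emulation.
GATES_PER_HOUR = 75_000_000


def estimated_compile_hours(design_gates: int,
                            gates_per_hour: int = GATES_PER_HOUR) -> float:
    """Estimate compile time, assuming it scales linearly with gate count."""
    if design_gates < 0 or gates_per_hour <= 0:
        raise ValueError("design_gates must be >= 0 and gates_per_hour > 0")
    return design_gates / gates_per_hour


# Hypothetical design sizes, for illustration only.
for gates in (150_000_000, 300_000_000, 1_000_000_000):
    hours = estimated_compile_hours(gates)
    print(f"{gates / 1e6:.0f}M gates: ~{hours:.1f} hours")
```

Under this linear assumption, even large designs compile within a working day or so on a single workstation, which fits with mapping new RTL several times a day for smaller designs.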

Hardware debug, a key requirement for hardware verification, is almost as great in emulation as it is in RTL simulation, with full visibility and interactivity while not slowing down the execution. In FPGA-based prototyping, hardware debug is less advanced, with probes inserted in advance and waveforms analyzed off-line after execution. In both FPGA-based prototyping and FPGA-based emulation, the instrumentation for debug slows down the execution and modifies the actual RTL.

Software debug is a criterion in which emulation and FPGA-based prototyping are similar in capabilities. Standard software debuggers like the ARM DS-5, GDB, and Lauterbach’s Trace32 can be attached via JTAG. Techniques are emerging that allow synchronized off-line debug of software traces together with hardware, perhaps giving emulation a slight edge.

Bring-up time is closely tied to the modifications that are necessary to map RTL into the hardware. We have users of processor-based emulation who map new RTL several times a day. In traditional FPGA-based prototyping, users would have to wait several months until bring-up is complete.

Time of availability is a key difference and is tied to maturity of the RTL being mapped. Due to its fast compile time, processor-based emulation can be used very early in the development cycle when RTL becomes available. Due to its longer bring-up and efforts to modify RTL, FPGA-based systems are more suitable once RTL is stable, later in the design flow, but still prior to silicon availability.

Hardware accuracy is not a big differentiator between emulation and FPGA, and is in the diagram mostly to differentiate from other related technologies like virtual prototyping for which the hardware is abstracted. Emulation has a slight edge because of fewer modifications to RTL.

Finally, system connections represent how an engine connects to the actual chip-environment. Both emulation and FPGA-based prototyping fare well here as they can be connected to the system environment using rate adaptors (SpeedBridge interfaces). FPGA-based prototyping has a slight edge due to the higher possible speed that allows some interfaces to be executed natively.

Requirement Interaction
From a portfolio perspective, the interaction of the different requirements becomes important. In traditional FPGA-based prototyping, speed is achieved largely with significant effort in bring-up and manual optimization.

With the Protium platform, we are giving users a choice, indicated with the oval in the diagram. With an “out of the box” automated flow that is “adjacent” to Palladium emulation and reduces the bring-up time from months to weeks, users get speed significantly higher than emulation, but remain in the single-digit-MHz range. With more manual optimization, that is, by spending more time on bring-up, they can optimize the speed well into the 10s of MHz range and, with black boxing and manual clock optimization, even into the 100MHz range. It’s a time-versus-speed tradeoff.
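One way to reason about this time-versus-speed tradeoff is to ask how many cycles a project must run before the extra bring-up effort pays for itself. The sketch below compares total time (bring-up plus execution) for the two flows and solves for the break-even cycle count. All numbers in it (2 weeks at 5 MHz for the automated flow, 12 weeks at 50 MHz for the manually optimized flow) are hypothetical values picked from the ranges described above, not Protium specifications.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def total_seconds(bringup_s: float, cycles: float, speed_hz: float) -> float:
    """Total elapsed time: bring-up effort plus time to execute `cycles`."""
    return bringup_s + cycles / speed_hz


def break_even_cycles(auto_bringup_s: float, auto_hz: float,
                      manual_bringup_s: float, manual_hz: float) -> float:
    """Cycle count at which both flows take the same total time.

    Solves auto_bringup + C / auto_hz == manual_bringup + C / manual_hz for C.
    """
    if manual_hz <= auto_hz or manual_bringup_s <= auto_bringup_s:
        raise ValueError("manual flow must be both faster and slower to bring up")
    return (manual_bringup_s - auto_bringup_s) / (1 / auto_hz - 1 / manual_hz)


# Hypothetical figures, for illustration only.
AUTO_BRINGUP_S = 2 * SECONDS_PER_WEEK     # automated flow: weeks
AUTO_HZ = 5e6                             # single-digit MHz
MANUAL_BRINGUP_S = 12 * SECONDS_PER_WEEK  # manual optimization: months
MANUAL_HZ = 50e6                          # 10s of MHz

cycles = break_even_cycles(AUTO_BRINGUP_S, AUTO_HZ, MANUAL_BRINGUP_S, MANUAL_HZ)
print(f"Break-even at roughly {cycles:.2e} cycles")
```

Below the break-even point the automated flow finishes first; above it, the manual optimization effort starts to pay off. Where that point lies for a real project depends entirely on the actual bring-up times and achieved speeds.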

The other arrows in the diagram indicate some of the resulting improvements: less effort to re-model RTL, resulting in reduced bring-up time and earlier availability. We also improved debug quite a bit. The Protium platform allows monitoring, forcing, and releasing of signals, internal memory upload and download, probes that capture data of predefined signals for off-line waveform viewing, waveforms across partitions showing a design-centric view rather than an FPGA-centric view, as well as runtime capabilities for starting and stopping clocks (i.e., run “N” cycles).

The resulting change in our portfolio adds the use model that we call “throughput regressions.” Users today often consider FPGA-based emulators in that domain, even though they have longer bring-up time than processor-based emulation and debug slows them down. In this phase, however, hardware debug is a bit less important, as the majority of hardware defects in the design have already been removed and users are focused on optimizing the throughput of regressions at optimized speed and cost points.

Bottom line, as part of their efforts to achieve a “shift-left” for software development, hardware verification, and hardware/software integration, users are adopting a continuum of development engines from virtual prototyping, through RTL simulation, acceleration, and emulation, to FPGA-based prototyping. The different engines are used both serially and in hybrid combinations. With the Protium platform’s compatibility with Palladium emulation in our updated portfolio, including the bring-up flow and the ability to re-use the Palladium verification environment, we have significantly eased the flow from acceleration and emulation, to throughput regressions requiring less hardware debug (users jump back to the Palladium solution if needed), to software development at higher speeds (the Protium platform works without the Palladium environment in the flow, too).

For more information, there are some great write-ups on the Palladium/Protium flow as well as a detailed presentation by Juergen Jaeger.

The curse of the product manager is that the results of portfolio decisions are a matter of the future, which is traditionally hard to predict! In this case, given the competitive landscape—one competitor focusing on FPGA-based emulation and not offering FPGA-based prototyping at all, the other competitor offering FPGA-based emulation and FPGA-based prototyping on two different systems based on commercial FPGAs—I am confident my team and I have positioned us well. Time will tell.