Are You Virtually Fast Enough?

Key factors to understand for speedy virtual prototypes.


During high school, my class was divided into two camps: Ayrton Senna fans and Michael Schumacher fans. Regardless of camp, the truly heated question was whether the driver or the race car matters more. Over the years I lost track, and I don't remember whether any of those discussions ever reached a conclusion. So now, 25 years later, let me attempt a similar but hopefully more balanced investigation: what makes a virtual prototype fast?

Let’s be clear: being fast during simulation is not enough to win with virtual prototyping. Having the virtual prototype available early and supporting a wide range of use cases are at least as important. Moreover, how fast a virtual prototype needs to be really depends on its intended use case.

  • Interactive use cases require running hundreds of iterative code-compile-debug cycles without delay.
  • Regression use cases are bounded by the number of tests, the number of available machines, and the available server time.

Furthermore, the required simulation horsepower certainly depends on the complexity of the system to be simulated. Below is a list of key technology factors that move the needle:

  • Optimizing single-threaded simulation performance
  • Applying parallel simulation concepts to the SystemC kernel
  • Using checkpoint strategies to avoid redundant simulation runs
  • Using hardware acceleration capabilities

Optimizing single-threaded simulation performance
SystemC is essentially a single-threaded application, and the CPU cycles are spent either in the kernel or in the SystemC models. Concepts like temporal decoupling, the quantum, and the Direct Memory Interface (DMI) are described in the TLM-2.0 standard to reduce the kernel overhead to a minimum. This leaves the biggest impact with the Instruction Set Simulator (ISS) and with the user’s SystemC modules. Writing optimal SystemC code requires good coding methodology and diligence. It is hard to spot bottlenecks just by looking at the code, so one of the key tools for improving single-threaded simulation performance is a SystemC-aware profiler. A good profiler lets you monitor parameters like real-time performance or kernel activation count and drill down by SystemC module or by SystemC time. We have seen even senior virtual prototyping teams improve their platform’s performance by 2-5x just by being able to locate and isolate their bottlenecks; a good profiler lets you harvest significant speed-ups by identifying the low-hanging fruit.


SimSight Profiler in Synopsys Virtualizer typically unlocks 2-5x speed-up

Parallel simulation concepts for the SystemC kernel
The basic idea of parallel simulation technologies is to distribute the simulation across multiple host CPUs. The following two technologies yield good results:

  1. Partitioning the virtual prototype into separate subsystems, often at the natural boundaries of off-chip interfaces. Each chip then runs as a separate simulation, and the environment takes care of dynamically synchronizing the separate simulation kernels. Depending on the partitioning and the nature of the application, this technology can yield a linear speed-up.
  2. Automatically mapping the compute-intensive parts inside a single virtual prototype onto different host CPU cores. This is done primarily for the ISS, which is naturally the most compute-intensive piece inside the virtual prototype; the ISS needs to be enabled for this capability. This option yields the largest benefit when there is high parallelism in the embedded software running on the ISS.


Example of MultiSim to manage synchronization and communication of separate virtual prototypes that run on different host cores and MultiCore to distribute simulated cores onto different host cores.

Checkpoint restart to avoid simulation
Checkpointing enables skipping simulation by reloading a previously saved simulation state of the virtual prototype. This is a very powerful approach for fault injection use cases, where it takes significant time to simulate up to the initial fault injection point. It can only be used when the virtual prototype and the software do not change between runs. The software developer can then reload the checkpoint and inject thousands of different error patterns.

Hardware acceleration capabilities
To win the race to shift left, prototyping teams often face a trade-off: develop a SystemC model from scratch or reuse existing legacy RTL code. Creating a fast model takes effort, while reusing RTL in co-simulation carries a huge simulation speed penalty. For these situations, hybrid hardware acceleration technologies deliver huge benefits: the existing RTL runs on an FPGA or emulator that synchronizes with the virtual prototype at the transaction boundary.

So, what makes a virtual prototype fast? It depends on the use case. With the right investment, software developers can have all of these technologies at their fingertips and accelerate where needed.


