Exascale Emulation Debug Challenges

Length of tests, failure reproduction, and the sheer amount of data generated pose problems for emulation.


For years, semiconductor industry surveys have shown that functional verification is the dominant phase in chip development and that debug is the most time-consuming task for verification. The problem is getting worse in today’s era of exascale debug, in which software applications drive tests of more than a billion cycles run in emulation on designs of more than a billion gates. System-on-chip (SoC) designs for server, networking, graphics, mobile, and artificial intelligence (AI)/machine learning applications meet or exceed this debug complexity metric. The verification of these devices presents three primary challenges: reducing the effort to find the time window around the root cause of a test failure, reproducing the root cause deterministically, and using scalable waveform-based debug to identify the exact root cause.

The first exascale debug challenge arises from the length of the tests. An emulation test typically involves resetting the SoC design, booting an operating system, starting a variety of system services, and running user applications. This process typically takes at least two billion emulated clock cycles. If an application fails or the operating system hangs, the root cause of the failure often occurred during a much earlier phase. For example, a corrupted memory location due to a cache coherency bug in the design may happen during operating system boot but not be read for a billion cycles or more. Traditional waveform-based debug is effective for a window of only a few million cycles or fewer. Getting to the root cause of the test failure and finding the design bug by iterating backwards through thousands of waveform windows is impractical.

The second major challenge is failure reproduction due to the requirement for high throughput in emulation. Running a slow simulated testbench in lockstep with emulation is highly inefficient since it can leave the emulator idle while simulation catches up. To avoid this problem, emulation tests can be set up with the design and testbench running concurrently. The problem is that this leads to non-determinism since the testbench may communicate with the design at different times depending upon server load or interface traffic. As shown in Figure 1, re-running the same test may not reproduce the failure, making it nearly impossible to debug the underlying design error.

Figure 1: Rerunning a typical emulation test with multiple testbench interfaces may not reproduce a failure

The third challenge for exascale debug is the result of the sheer size of today’s SoCs. If the failure can be reproduced and the appropriate debug window can be identified, data must be dumped from the emulator for debug purposes. The volume of data is now in the gigabytes, much more than the megabytes experienced with older and smaller designs. This raw debug data, typically just the values for state elements, must be expanded into a full dump file with values for all signals. This process may take an hour or more and result in dump files of many gigabytes. Finally, the time to load this data into a waveform-based debug tool has grown from a few minutes to as much as an hour. The longer times to dump, expand, and load debug data can dramatically increase the time to find and fix design errors.
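The expansion step described above can be illustrated with a toy sketch: the emulator dumps only state elements (flip-flop values), and every combinational signal is recomputed by evaluating the netlist in dependency order. The netlist representation, gate set, and signal names here are invented for illustration and do not reflect any emulator's actual dump format.

```python
# Toy sketch of debug-data "expansion": given only flip-flop values,
# recompute all combinational signals by evaluating gates in
# topological order. Netlist format is hypothetical.
from graphlib import TopologicalSorter

GATES = {
    "and": lambda a, b: a & b,
    "or":  lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}

def expand(state_values, netlist):
    """state_values: {flop_name: 0/1}
    netlist: {output_signal: (gate, input_a, input_b)}
    Returns values for every signal, state and combinational alike."""
    values = dict(state_values)
    # Map each gate output to the signals it depends on.
    deps = {out: set(ins) for out, (_, *ins) in netlist.items()}
    # Evaluate gates only after all of their inputs are known.
    for sig in TopologicalSorter(deps).static_order():
        if sig in netlist:
            gate, a, b = netlist[sig]
            values[sig] = GATES[gate](values[a], values[b])
    return values

# Two flops feeding a small cone of combinational logic.
full_dump = expand(
    {"q0": 1, "q1": 0},
    {"n1": ("and", "q0", "q1"), "n2": ("or", "n1", "q0")},
)
# full_dump now holds values for q0, q1, n1, and n2.
```

A real expansion flow performs this reconstruction for billions of signal-cycle pairs, which is why the article notes it can take an hour or more without high-performance parallel processing.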

Fortunately, modern emulation technology can provide solutions to all three exascale debug challenges. Root-cause identification can be accelerated if the emulator can stream system-level data such as abstract logs for interfaces, key events, checkers, assertions, and selected signals. Users can leverage this data to narrow down the problem and focus on a specific window of a few million cycles appropriate for waveform debug. Streaming of this high-level data must be continuous through the entire test run (“infinite depth”). Further, the emulation must run without pause to prevent any reduction in throughput.
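The narrowing step can be sketched in a few lines: scan the streamed checker and assertion events for the first failure, then derive a waveform-sized cycle window biased to end just after it, since the root cause almost always precedes the symptom. The log format, field names, and window size below are assumptions for illustration, not any emulator's actual output.

```python
# Hypothetical sketch: use a streamed event log to narrow an
# exascale test failure to a window practical for waveform debug.
WINDOW = 2_000_000  # cycles; a few million is typical for waveform tools

def find_debug_window(log_lines, window=WINDOW):
    """Scan streamed events (assumed format: cycle,severity,source,message)
    for the first failure and return a (start, end) cycle range to dump."""
    for line in log_lines:
        cycle_str, severity, source, message = line.split(",", 3)
        if severity == "FAIL":
            cycle = int(cycle_str)
            # Place ~90% of the window before the failing event.
            start = max(0, cycle - window * 9 // 10)
            return start, start + window
    return None  # test passed; nothing to debug

log = [
    "1000,INFO,boot,reset released",
    "2500000000,FAIL,coherency_checker,stale data read",
]
window = find_debug_window(log)  # a 2M-cycle range around cycle 2.5B
```

Because the log is streamed continuously with infinite depth, this scan works even when the failing event lands billions of cycles into the run.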

The best way to reproduce a failure deterministically is for the emulator to record the stimuli from the testbench during the test run. If a failure occurs, the emulation can run the test again by replaying the stimuli file, eliminating any non-determinism due to variations in testbench response. In addition, the emulator should support save and restore so that a test can be rerun from an intermediate point instead of from reset at time zero. Once the users have identified the debug window, emulation can start at the save/restore point closest to this window. As shown in Figure 2, this process speeds up debug by getting to waveforms much more quickly.
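Picking the restart point is a simple lookup: given the cycles at which save/restore snapshots were taken during the recorded run, choose the latest one at or before the start of the debug window, and replay the recorded stimuli from there. The checkpoint cadence below is an assumption for illustration.

```python
# Minimal sketch: select the save/restore point closest to (but not
# past) the start of the identified debug window.
import bisect

def nearest_checkpoint(checkpoints, window_start):
    """checkpoints: sorted list of cycles where snapshots were saved.
    Returns the latest checkpoint at or before window_start,
    falling back to cycle 0 (reset) if none qualifies."""
    idx = bisect.bisect_right(checkpoints, window_start) - 1
    return checkpoints[idx] if idx >= 0 else 0

# Snapshots assumed every 500M cycles during the recorded run.
checkpoints = [0, 500_000_000, 1_000_000_000, 1_500_000_000, 2_000_000_000]
restart = nearest_checkpoint(checkpoints, 1_730_000_000)  # → 1_500_000_000
```

The rerun then covers only the cycles from the chosen checkpoint to the end of the debug window rather than the full multi-billion-cycle test, which is the speedup Figure 2 depicts.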

Figure 2: Save/restore points rerun shorter tests to dump debug data around the root cause of a failure

Finally, the emulator must support a data dump flow that scales to exascale debug. The raw data must be able to stream from the emulator via a high-speed interface at a terabyte per second or more. High-performance parallel expansion capability must be able to convert raw debug data to complete dump files in a matter of minutes rather than hours. The waveform debug tool must load the dump data efficiently and respond quickly to interactive commands. For example, selecting a signal and adding it to the waveform via drag-and-drop should take less than a second. Effective debug requires both crisp interactive performance and minimal loop time to run a test in emulation and bring the results into the debug tool.

Synopsys provides an exascale debug solution today with ZeBu Server 4 and Verdi. This combination provides all the solutions described above and has been used on many SoC projects. A case study presented by AMD at the 2017 Design Automation Conference (DAC) provides an example. The AMD team emulated a large graphics IP block in ZeBu while running all non-graphics parts of the SoC on a virtual platform connected to a virtual machine modeling a host PC. They captured all I/O signals so that failing tests could be rerun deterministically in emulation without any of the software virtual models. The team also defined regular restore points from which tests could start for efficient debug.

The era of exascale debug is here, and yesterday’s technology does not scale to address the three key challenges of reducing root-cause time, deterministically reproducing the root cause, and scaling waveform-based debug. A well-proven solution to address these challenges is available today, enabling debug of even the largest SoC designs with the most complex testbenches.
