Two Methods For Debugging SW Workloads On Arm-Based SoCs

Alternatives to using actual software workloads on pre-silicon designs.


By Andy Meier and Tomasz Piekarz

In a typical system-on-a-chip (SoC) development project, chip architects will make a given SoC’s initial specification available to design teams years in advance of the silicon. As requirements change, they will modify both the hardware and software specifications. Typically, a large portion of the software development occurs much later in the development program.

A major challenge for the SoC verification and validation teams has been ensuring correct system operation before software development takes place. Several emerging techniques address this issue, including running actual software on fully functional pre-silicon designs. This approach is especially appealing simply because of the technology advances available in hardware-assisted verification platforms; validation engineers can put a design through a set of realistic workloads and mimic a design’s eventual day-to-day use.

However, using actual software workloads is not problem-free, as limited visibility and debuggability are among the obstacles designers can face.

In this article, we’ll look at two different techniques used for debugging Arm-based SoCs at the register transfer level (RTL), including adding an enhanced software debugger to the hardware verification environment. The example we will use in this article is an Arm 926-based design.

In a typical flow for embedded processor design, IP vendors provide a processor model in the form of a precompiled, encrypted design simulation model (DSM) or a Verilog model based on the design’s architecture, such as Arm or RISC-V. This CPU model is then connected to the bus fabric and the rest of the RTL design, and the design is executed using a logic simulator or an emulator. Once the basic embedded design is available, the next step involves building and partitioning embedded software to load into the design’s (RTL-based) program memory.

While the broad outlines of the process are understood, designers and design verification engineers often face a challenge when debugging CPU subsystem designs. The models are not designed for debug: they don’t provide an API to connect to a debugger that shows the actual software execution, variable values, or memory contents. As a result, the values shown are more often than not out of step with program counter (PC) execution. This happens because the debug view observes a different stage of the pipeline than the one executing the instruction, creating an inconsistent view. Consider a scenario where two consecutive instructions update the same register. Since the CPU pipeline is generally not aligned with overall execution, only the last change would be visible.

Manual debugging: not for the faint of heart

Let’s examine what’s involved in manually debugging embedded software running on the CPU. Consider the relatively common failure where the CPU executes code and then at some point goes into “flatline” mode where it basically quits executing further instructions.

To debug this failure, we pose at least four questions to find the state of the CPU just before it failed:

Question 1: What line of code was last executed in the simulation?

To answer this, the last executed instruction must be identified. This is where the eis.log file comes in. The eis.log file, also known as a tarmac file, is generated by the Arm cores during simulation and contains a static instruction trace. According to the file, the last instruction executed was an LDRB instruction at address 0x118.
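The search for the last executed instruction can be scripted. The sketch below is a minimal illustration, assuming a simplified trace line of the form "time IT address opcode disassembly". Real tarmac layouts differ between cores and tool versions, so the pattern, the sample records, and the function name last_executed are hypothetical; only the final LDRB at 0x118 is taken from the example.

```python
import re

# Assumed (hypothetical) record layout: "<time> IT <address> <opcode> <disassembly>"
TRACE_LINE = re.compile(
    r"^\s*(?P<time>\d+)\s+IT\s+(?P<addr>[0-9a-fA-F]+)\s+"
    r"(?P<opcode>[0-9a-fA-F]+)\s+(?P<asm>.+?)\s*$"
)

def last_executed(lines):
    """Return (time, address, disassembly) of the last instruction record, or None."""
    last = None
    for line in lines:
        match = TRACE_LINE.match(line)
        if match:  # register and memory records do not match and are skipped
            last = (int(match["time"]), int(match["addr"], 16), match["asm"])
    return last

# Made-up sample records for illustration.
sample_log = [
    "1000 IT 00000110 e1a00000 NOP",
    "1010 R r3 00000000",
    "1020 IT 00000114 e59f0010 LDR r0,[pc,#16]",
    "1030 IT 00000118 e5d03000 LDRB r3,[r0]",
]

time, address, asm = last_executed(sample_log)
print(f"last instruction: {asm} at {hex(address)}")
```

Scanning the whole file and keeping only the final match avoids any assumption about how the log ends, at the cost of reading every record once.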

Next, we have to identify the corresponding line in the source code. This can be found in the line table, which maps address 0x118 to line 135 in the demo_diag.c file.
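The line-table step amounts to finding the last table entry that starts at or before the address. Here is a minimal sketch, assuming the line table has already been exported as sorted (start address, file, line) tuples. The table contents are made up apart from the 0x118 to demo_diag.c line 135 pairing, and source_line is a hypothetical helper name.

```python
from bisect import bisect_right

# Hypothetical line table, sorted by start address.
LINE_TABLE = [
    (0x100, "demo_diag.c", 131),
    (0x10C, "demo_diag.c", 133),
    (0x118, "demo_diag.c", 135),
    (0x124, "demo_diag.c", 137),
]

def source_line(address, table=LINE_TABLE):
    """Return (file, line) of the last entry that starts at or before address."""
    starts = [entry[0] for entry in table]
    index = bisect_right(starts, address) - 1
    if index < 0:
        raise LookupError(f"no line entry covers {hex(address)}")
    _, filename, line = table[index]
    return filename, line

print(source_line(0x118))
```

An address that falls between two entries, such as 0x11C, resolves to the earlier entry, because one source line usually expands into several instructions.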

Question 2: What function was being executed when the execution stopped?

To answer this, go to the symbol table and look for address 0x118, which is shown as belonging to the send_to_dbg_port function.
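The symbol-table lookup is a range check: an address belongs to a function if it lies between the function’s start and its start plus size. The sketch below assumes the symbol table is available as (start, size, name) tuples. The start addresses and sizes are invented for illustration; only the fact that 0x118 falls inside send_to_dbg_port comes from the example.

```python
# Hypothetical symbol table: (start address, size in bytes, function name).
SYMBOLS = [
    (0x0F0, 0x40, "send_to_dbg_port"),
    (0x130, 0x20, "init_dbg_port"),
    (0x700, 0x120, "main"),
]

def function_at(address, symbols=SYMBOLS):
    """Return the name of the function whose address range contains address, or None."""
    for start, size, name in symbols:
        if start <= address < start + size:
            return name
    return None

print(function_at(0x118))
```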

Question 3: From where, in terms of source line number and function name, was the function called?

To answer this, one must look into the disassembly file, which here shows that the function was called from main at address 0x7e0. This address can be cross-referenced with the line table, which shows that the function was called from line 411 of the main.c file.
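Finding the caller in the disassembly can also be automated by scanning for branch-with-link (BL) instructions that target the function. This is a minimal sketch, assuming objdump-style lines of the form "address: opcode bl target symbol". The sample lines, the 0xf0 start address of send_to_dbg_port, and the call_sites helper are hypothetical; the call address 0x7e0 is the one from the example.

```python
import re

# Assumed objdump-style call line: "<address>: <opcode> bl <target address> <symbol>"
CALL = re.compile(
    r"^\s*(?P<addr>[0-9a-f]+):\s+[0-9a-f]{8}\s+bl\s+[0-9a-f]+\s+<(?P<target>\w+)>"
)

def call_sites(disassembly, function):
    """Return the addresses of all BL instructions that branch to function."""
    sites = []
    for line in disassembly:
        match = CALL.match(line)
        if match and match["target"] == function:
            sites.append(int(match["addr"], 16))
    return sites

# Made-up excerpt of main's disassembly.
sample_disassembly = [
    " 7dc: e1a00007 mov r0, r7",
    " 7e0: ebfffe42 bl f0 <send_to_dbg_port>",
    " 7e4: e1a00000 nop",
]

print([hex(site) for site in call_sites(sample_disassembly, "send_to_dbg_port")])
```

Each address returned can then be looked up in the line table to get the calling source line. If a function has several call sites, the static disassembly alone cannot tell which one was taken, so the instruction trace has to be consulted as well.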

Question 4: What was the value of variable “p” in main() when the simulation stopped?

Next, look at main in the main.c file to see where the value of “p” is changed. Three statements on line 400 could affect the value of “p.”

Now, use the line table to find the address of the third statement on line 400 (p++) and the disassembly view to find out where “p” is stored. It turns out that the address we are looking for is 0x784 and “p” is stored in R7.

Now go back to the waveform and look at the values in R7 around the function call. In this example, they are zero.
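Reading a register out of the waveform at a given time means finding the most recent value change at or before that time. The sketch below assumes the changes for R7 have been exported as (time, value) pairs sorted by time. The timestamps and the value_at helper are hypothetical; the value of zero matches the example.

```python
from bisect import bisect_right

def value_at(changes, time):
    """Return the value held at time, given (time, value) changes sorted by time."""
    times = [t for t, _ in changes]
    index = bisect_right(times, time) - 1
    return changes[index][1] if index >= 0 else None

# Made-up change list: R7 is reset to zero and never written again.
r7_changes = [(0, 0x0)]

# Made-up timestamp of the call to send_to_dbg_port.
call_time = 1010
print(value_at(r7_changes, call_time))
```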

Enhanced debug techniques

So far so good, but we still don’t know the cause of the failure. We can find additional clues in the logic simulator transcript file, which shows that an “X” propagated into the status register, followed by a conditional branch, but doesn’t indicate why.

Let’s state the obvious: manual debugging is labor intensive and tedious. The previously described example involves generating static log files, searching through them and then trying to correlate the gathered data to the correct time in the waveform that represents the hardware simulation environment in the CPU. And all of that is done to find the last CPU state before failure, the logical starting point of the debug process.

The bottom line is that hardware verification environments are great for verifying hardware but are lacking when it comes to bringing more visibility to the hardware-software environment. The key to such visibility is linking the software debugger to the RTL simulation environment. Now let’s answer the same questions using Veloce Codelink from Siemens EDA to troubleshoot what happened during a simulation run.

Veloce Codelink does not require changes to software or hardware design while allowing for logging and replaying simulation. It supports multi-core and multi-CPU environments and offers the ability to step forward and backward.

One key feature of Veloce Codelink is the ability to debug offline from the emulator and simulator environment. This is critical to freeing up valuable resources for other uses.

What line of code was last executed in the run?

To find this out, load the software execution file (rlf) from the example run, move the cursor to the last instruction executed, and use the Veloce Codelink debugger to look at the source code window. The last line executed was line 135 in the demo_diag.c file.

From where, in terms of source line number and function name, was the function called?

To answer this question, scroll up in the source code window to see what function the code belongs to. From there you can single-step backwards to the caller. Here, the function is send_to_dbg_port and the caller is main.c line 411. In an environment like this, being able to step backwards is very important because so much RTL debug happens on previously executed runs.

What was the value of variable “p” in main() when the simulation stopped?

Moving the cursor and hovering it over the “p” variable shows the latest value: zero, in the example.

Connecting the debugger to the RTL processor model

The best way to connect the debugger to the RTL CPU model is to split the task into two parts: logging a live simulation into the database and then opening the debugger in post-simulation mode using this database. This matches the hardware debug environment, where it’s very common to do a batch simulation and then debug it in post-simulation mode to avoid long simulation times. The setup, which operates on prerecorded data, provides an ability to step forward and backward. This allows a verification engineer to start from the point of failure and trace back the cause, which certainly beats the alternative: re-running the simulation every time the interesting place in the simulation is passed.
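The log-then-replay split can be pictured with a small model. The sketch below is an illustration of the general idea only, not of how Veloce Codelink is implemented: it assumes the live run has been logged as a list of per-instruction records, and it lets a cursor move in both directions over that list. The Step and ReplayCursor names, the record fields, and the sample data are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Step:
    time: int        # simulation time of the instruction
    pc: int          # address of the executed instruction
    registers: dict  # register name -> value after the instruction

class ReplayCursor:
    """Move forwards and backwards over a prerecorded run without re-simulating."""

    def __init__(self, steps):
        if not steps:
            raise ValueError("recording is empty")
        self._steps = list(steps)
        self._index = len(self._steps) - 1  # start at the point of failure

    @property
    def current(self):
        return self._steps[self._index]

    def step_back(self):
        if self._index > 0:
            self._index -= 1
        return self.current

    def step_forward(self):
        if self._index < len(self._steps) - 1:
            self._index += 1
        return self.current

    def run_back_until(self, predicate):
        """Step back until predicate(step) holds; return that step, or None.

        If nothing matches, the cursor is left at the first recorded step.
        """
        while self._index > 0:
            self._index -= 1
            if predicate(self.current):
                return self.current
        return None

# Made-up recording that ends at the failing LDRB at 0x118.
recording = [
    Step(1000, 0x7DC, {"R7": 0}),
    Step(1010, 0x7E0, {"R7": 0}),
    Step(1020, 0x0F0, {"R7": 0}),
    Step(1030, 0x118, {"R7": 0}),
]

cursor = ReplayCursor(recording)
# 0x700 is an assumed start address for main, used to find the caller.
caller = cursor.run_back_until(lambda step: step.pc >= 0x700)
print(hex(caller.pc), caller.registers["R7"])
```

Because every query is answered from the recording, stepping backwards costs a list index rather than a new simulation run, which is the property the post-simulation flow relies on.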


Adding a software-enabled verification environment to aid in debugging SW workloads executing in the hardware verification environment yields many benefits to designers. Tools such as Veloce Codelink provide the functionality described without requiring changes to the software or hardware design.

Adding the software debugger to the hardware verification environment increases productivity when writing and debugging software. This helps to avoid the tendency to oversimplify the software used in the verification of the embedded CPU because of a lack of execution visibility. With this additional visibility and debug capability, teams can expand beyond simple C and C++ programs into complex boot-up software and full Linux execution on the CPU to fully verify the design under real-world workload conditions.

Andy Meier is a product marketing manager in the Scalable Verification Solutions Division at Siemens EDA.

Tomasz Piekarz is a technical marketing engineer in the Scalable Verification Solutions Division at Siemens EDA.
