High-efficiency trace for RISC-V for today’s SoCs.
Adoption of RISC-V processors is accelerating. This technology, like everything, comes with benefits and risks. The open standard means freedom for many developers, but success depends on the development of a support ecosystem around RISC-V. Industry collaboration is making broad adoption of RISC-V possible, and one example is the introduction of efficient trace for RISC-V cores.
When incorporating RISC-V processor cores, designers need to know if the RISC-V core is verified, compliant with the standard, and bug-free. What tools are available for design, verification, compilation, operating system support, debug, and trace?
For debug and trace, you’ll need a solution that confirms the software behaves as expected, shows what happens when it doesn’t, and reveals how the core interacts with the rest of the system.
Understanding program behavior in a complex system is challenging. Traditional techniques typically require halting the core to debug the software (run-stop debugging), but what is needed is non-intrusive, full-speed observation of program behavior.
Efficient trace for RISC-V provides the visibility to see full-speed program execution. Trace is a debugging technique where executed processor instructions are captured and compressed on-chip, then transmitted to host software that reconstructs the program execution sequence later (figure 1). It allows for observation at full speed for forensic debugging, code profiling, finding random bugs, and avoiding Heisenbugs (those that vanish when you study them).
Fig. 1: Processor trace lets you monitor program execution of a CPU in real time.
Processor trace, however, comes with its own challenges as SoCs become larger and more complex. The solution is found in better encoding (compression) and decoding.
Instead of trying to capture every executed instruction, which would lead to unmanageable volumes of data, the RISC-V standard employs processor branch trace, where only branches (or deltas) in the program code are reported. The standard is named “Efficient Trace for RISC-V” in recognition of this. Processor branch trace achieves very high compression. The higher compression allows you to trace more data, for example, tracing multiple cores simultaneously or saving a longer trace history to a fixed-size trace buffer.
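To make the link between compression and trace history concrete, here is a small back-of-the-envelope sketch. The function name, the buffer size, and the bits-per-instruction figures are illustrative assumptions, not measurements from the standard or from any encoder.

```python
def history_depth(buffer_bytes: int, bits_per_instruction: float, cores: int = 1) -> int:
    """Approximate number of instructions of history, per core, that fit in a
    shared fixed-size trace buffer. All inputs are hypothetical."""
    total_bits = buffer_bytes * 8
    return int(total_bits / (bits_per_instruction * cores))


# Hypothetical 4 KiB buffer, single core, two made-up encoding efficiencies.
print(history_depth(4096, 1.0))           # one bit per instruction on average
print(history_depth(4096, 0.25))          # four times better compression
print(history_depth(4096, 0.25, cores=4)) # same buffer shared by four cores
```

Under these assumed numbers, a fourfold improvement in compression either quadruples the history held for one core or lets four cores share the buffer at the original depth.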
The RISC-V standard describes several optional and run-time configurable modes. Some are designed to increase the encoding efficiency even further, while others can be used as a debugging aid for use during the development of software trace decoders.
Trace begins by reporting a known start address, which the decoder software can locate within the program binary (or ELF file). Only branches in the program code are reported; branches can be jumps, calls, returns, interrupts, or exceptions. All instructions between branch instructions are assumed to execute sequentially, so there is no need to report them. This saves considerable trace bandwidth, because for most branches the only thing reported is whether the branch was taken.
Indirect jumps, interrupts, and exceptions (known as “uninferable program counter discontinuities”) change the program counter by an amount that cannot be determined from the program binary alone, so the destination address must be reported. Interrupts (and exceptions) generally occur asynchronously, so the address where normal program flow ceased must also be reported. Branch trace is also known as “Instruction Delta Tracing” since deltas are typically introduced by branch instructions.
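The decoding idea described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the packet format defined by the Efficient Trace for RISC-V specification: the `Instr` class, the `decode` function, the toy program, and the use of an explicit instruction count to stop decoding are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instr:
    # kind is one of: "seq" (sequential), "branch" (conditional, direct),
    # "jump" (direct), "ijump" (indirect, target not in the binary)
    kind: str
    target: Optional[int] = None  # known from the binary for "branch"/"jump"
    size: int = 4


def decode(program: Dict[int, Instr], start: int, taken_bits: List[bool],
           addresses: List[int], count: int) -> List[int]:
    """Rebuild the executed PC sequence from the start address, one
    taken/not-taken flag per conditional branch, and one reported address
    per uninferable discontinuity."""
    taken = iter(taken_bits)
    addrs = iter(addresses)
    pc = start
    pcs = []
    for _ in range(count):
        pcs.append(pc)
        ins = program[pc]
        if ins.kind == "branch":
            # Target is in the binary; only the outcome had to be reported.
            pc = ins.target if next(taken) else pc + ins.size
        elif ins.kind == "jump":
            pc = ins.target          # fully inferable, nothing reported
        elif ins.kind == "ijump":
            pc = next(addrs)         # uninferable: use the reported address
        else:
            pc += ins.size           # sequential, nothing reported
    return pcs


# Hypothetical toy program standing in for the ELF contents.
program = {
    0x00: Instr("seq"),
    0x04: Instr("branch", target=0x10),
    0x08: Instr("seq"),
    0x0C: Instr("ijump"),
    0x10: Instr("seq"),
}

print([hex(pc) for pc in decode(program, 0x00, [False], [0x10], 5)])
```

In this model, five executed instructions are recovered from one start address, one branch flag, and one reported destination address.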
As an example, let’s follow the execution of the code used to calculate the factorial of a given number (figure 2). Assume that the register ‘a0’ contains the value 2 (for which the factorial will be calculated). Trace begins by reporting the start address. The following sequential instructions do not need to be reported. The first branch instruction to be encountered is not taken and is reported as such. The subsequent instructions execute sequentially and are not reported. The next branch is taken, jumping back to address 14, and is reported as taken. As before, sequential instructions are not reported. Finally, the second time round, the final branch is not taken and is reported as such.
Fig. 2: An example of RISC-V processor branch trace.
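The walk-through can be mimicked with a toy model that runs a factorial loop and records only what a branch tracer would report. The assembly shown in the comments is a hypothetical stand-in, not the code from figure 2, and `trace_factorial` is an illustrative name.

```python
def trace_factorial(n: int):
    """Run a toy factorial loop and record branch outcomes only.

    Returns (result, reported, executed), where `reported` holds one flag per
    conditional branch (True = taken) and `executed` counts all instructions.
    """
    reported = []
    executed = 0

    acc = 1
    executed += 1                # li   t0, 1
    executed += 1                # beqz a0, done
    reported.append(n == 0)      # taken only when the input is zero

    while n != 0:
        acc *= n
        executed += 1            # mul  t0, t0, a0
        n -= 1
        executed += 1            # addi a0, a0, -1
        executed += 1            # bnez a0, loop
        reported.append(n != 0)  # taken while another iteration remains

    return acc, reported, executed


result, reported, executed = trace_factorial(2)
print(result, reported, executed)
```

With an input of 2, this model yields the same pattern as the walk-through: not taken, taken, not taken. Eight instructions execute, but only the start address and three branch outcomes would need to be reported.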
The Embedded Analytics group of Siemens has been instrumental in driving the development of the RISC-V trace standard and is a pioneer of commercially available implementations. The group delivered the first RISC-V trace encoder in January 2018, before the RISC-V trace working group was formed. In May 2022, version 2 of the Efficient Trace for RISC-V specification was ratified, which includes data trace as already implemented in the Siemens encoder.
Download the Efficient Trace for RISC-V specification (PDF) for details.
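The Efficient Trace for RISC-V standard requires significantly fewer bits per instruction than older encoding techniques (figure 3). Higher encoding compression is important for efficient trace for several reasons. As noted earlier, it lets you trace multiple cores simultaneously or keep a longer trace history in a fixed-size trace buffer.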
Fig. 3: The Efficient Trace for RISC-V standard requires the fewest bits per instruction.
The high efficiency achieved by the RISC-V trace standard makes it suitable for the SoC designs of today and of the future. The Siemens implementation, the Tessent Enhanced Trace Encoder (figure 4), includes all of the mandatory and optional features in the Efficient Trace for RISC-V standard, plus a significant feature that is not yet part of the standard: cycle-accurate trace. Cycle-accurate trace lets you optimize software performance by identifying where the hart is stalling. The Enhanced Trace Encoder enables this by reporting the number of consecutive cycles in which instructions were retired, followed by the number of cycles in which no instructions were retired. These cycle counts are encoded to compress the information.
Fig. 4: The Siemens Tessent Enhanced Trace Encoder is a fully featured RISC-V trace solution.
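The cycle-count reporting described for cycle-accurate trace resembles a run-length summary of retirement activity. The sketch below illustrates that idea only: the function name and the per-cycle boolean input are assumptions, and it does not reproduce how the Tessent encoder actually encodes or compresses its counts.

```python
from typing import Iterable, List, Tuple


def cycle_counts(retired: Iterable[bool]) -> List[Tuple[int, int]]:
    """Collapse a per-cycle 'did anything retire?' stream into
    (retiring_cycles, stalled_cycles) pairs."""
    pairs = []
    busy = 0
    stall = 0
    for flag in retired:
        if flag:
            if stall:
                # A stall run just ended, so the current pair is complete.
                pairs.append((busy, stall))
                busy = 0
                stall = 0
            busy += 1
        else:
            stall += 1
    if busy or stall:
        pairs.append((busy, stall))  # flush the final, possibly partial pair
    return pairs


# Hypothetical activity: 3 retiring cycles, 2 stalled, 1 retiring, 4 stalled, 2 retiring.
stream = [True] * 3 + [False] * 2 + [True] + [False] * 4 + [True] * 2
print(cycle_counts(stream))
```

A long stall shows up as a pair with a large second value, which is the kind of signal that points to where the hart is waiting.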
The Tessent Enhanced Trace Encoder is just one part of a whole-system solution. Debugging and tracing a system doesn’t end with capturing the data. In addition to a range of analytic, message, and communicator modules, Tessent also provides the software to analyze and interpret the trace results. Using Tessent Embedded Analytics components, you can debug and trace any design from simple single‑processor systems to highly complex superscalar multi‑processor systems.