UVM: What’s Stopping You?

Easier UVM and FPGA-based acceleration work together to increase UVM productivity.


These days, verification of the most complex designs is performed using a standard verification methodology, probably SystemVerilog-based UVM. Many verification teams have ramped up on UVM, but others have yet to take the plunge. Why is that? And how big a “plunge” is it, anyway?

If UVM is as great as all that, then why hasn’t everybody adopted it already? Is it not as “Universal” as we first thought, or are there barriers to its adoption? If so, what might these barriers be and how can they be overcome?

Real and imaginary barriers
To answer my own question (I do that a lot, sorry), these barriers might be technical, procedural, or economic. Other barriers might be based on perception, fear of a steep learning curve, or uncertainty about the potential return on investment in adopting UVM.

To group them another way: some of these barriers are real, but others may be imaginary.

This article explores two of these barriers: the learning curve and simulation throughput. Are they real or imaginary?

Learning curve: an imaginary barrier?
At the Verification Futures Conference in 2012, Janick Bergeron, one of the fathers of constrained random verification methodology, was asked for some advice on how to learn UVM. The answer was, “Don’t try to learn all of it.” This sentiment has been reflected by others at various times, and in at least one case, used as a basis of adoption. For example, Doulos, the well-respected training company, has been teaching UVM classes and helping to define the standards since UVM started, and they have created something called “Easier UVM” as a way to accelerate adoption and reuse.

Easier UVM comprises guidelines and a code generation tool, and has been adopted for real-life projects and tape-outs. With Easier UVM, new users can gain understanding and early success, which in turn promotes the wider use of UVM in other projects. John Aynsley, CTO of Doulos, asserts that the learning curve is not an imaginary barrier but a very real one. Verification engineers really do need to know what they’re doing with UVM, but Easier UVM is a useful way to get started.

At Aldec, Easier UVM guidelines have been followed in order to tailor verification tools and VIP, which are portable and re-usable across simulation, emulation and even hybrid environments including SystemC virtual models. This portability has become important because the use of emulation, in particular, is becoming more widespread partly owing to the need for greater and greater throughput in UVM-based verification.

Throughput: A real UVM barrier?
There is a suggestion that UVM encourages the substitution of “dumb” simulation cycles for domain knowledge; that it is easier to throw more simulation cycles at a verification problem in order to achieve coverage goals, rather than creating more pertinent directed tests, or more intelligent constraints. Readers may have their own opinions on that, but a widely adopted best practice is to plan to use both directed tests and constrained random tests. The directed tests are created in order to reach a pre-agreed functional coverage score by testing all the obvious major and minor features of the Design Under Test (DUT). Only then does one rely on a constrained random approach to test non-obvious corner cases and bring the coverage up to the desired 100% score.
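The two-phase plan described above can be sketched as a toy model. This is not UVM code; the coverage bins (opcode and burst length pairs), the function names and the seed are all hypothetical, chosen only to show directed tests reaching an agreed score before random stimulus closes the remaining bins.

```python
import random

# Hypothetical coverage model: one bin per (opcode, burst length) pair.
OPCODES = ["READ", "WRITE", "BURST_READ", "BURST_WRITE"]
LENGTHS = [1, 2, 4, 8]
ALL_BINS = {(op, n) for op in OPCODES for n in LENGTHS}


def run_directed():
    """Directed phase: deliberately hit the obvious features."""
    covered = set()
    for op in OPCODES:
        covered.add((op, 1))        # every opcode at the minimum length
    for n in LENGTHS:
        covered.add(("WRITE", n))   # every length for one opcode
    return covered


def run_constrained_random(covered, seed=1, max_tests=10_000):
    """Random phase: keep generating legal stimulus until all bins are hit."""
    rng = random.Random(seed)
    covered = set(covered)
    tests = 0
    while covered != ALL_BINS and tests < max_tests:
        covered.add((rng.choice(OPCODES), rng.choice(LENGTHS)))
        tests += 1
    return covered, tests


directed = run_directed()
final, random_tests = run_constrained_random(directed)
print(f"directed coverage: {len(directed)}/{len(ALL_BINS)} bins")
print(f"random tests needed to close coverage: {random_tests}")
```

In this toy, each random test costs one loop iteration. In a real environment each one costs a full simulation run, which is why the random phase is where throughput starts to matter.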

What does this mean for the use of UVM? UVM can be used for directed tests as well as constrained random, of course, but the latter may require many more cycles of simulation runtime, leading to simulation throughput becoming a bottleneck. This is especially a problem when the above best practice is not followed, so that teams prematurely switch to constrained random instead of using design knowledge to create further directed tests. Coverage-driven and metric-driven techniques help, but emulation increasingly is being used to accelerate UVM in order to overcome the constrained random throughput bottleneck. It also is helpful for long-duration directed tests.

Increasing throughput with emulation
Emulators have a number of use modes including simulation acceleration, for which we are required to place part or all of the DUT, and perhaps part or all of the testbench, into hardware. In the case of the Aldec HES platforms, that hardware is based on FPGAs, but other types of emulator exist which use an array of bespoke processors.

Let’s consider the common approach of partitioning the DUT in the emulator, and the testbench on the simulator. We are then faced with the need to link the testbench with the DUT via bi-directional interfaces between the simulator and the hardware. For each port between the testbench and the DUT, we need to ensure that every simulator event that produces a signal change in that port produces an equivalent voltage representing logic 1 or 0 at a physical pin in the hardware (and vice versa). We can see this represented in Figure 1, which shows a simplified example of a single port in a UVM test environment. Let’s imagine that it represents a top-level bus port.

Figure 1: Partitioning at the signal level.

As might be obvious, even a simple transaction on that bus port would involve multiple simulation events and signal changes. The hardware cannot run any faster than the simulator can make those changes and pass them, signal by signal and event by event, between the testbench and the DUT across the hardware-software partition. Often the overall speed of the simulation acceleration will be governed by the clock rate, complexity and traffic of such interfaces, whatever the speed of the hardware. Even so, the acceleration achieved is still useful for large DUTs and long tests.

For greater acceleration, we need a way of not only speeding up the communication, but allowing the hardware to run at its own (higher) speed for most of the time. That’s where transactors come in.

Rationalizing interfaces using transactors
Taking a look at the top-level ports of a typical DUT, in many cases, these will be standard peripheral or bus interfaces (such as USB, SATA, APB etc.), the behavior of each of which is well understood. We can use this known behavior to agree short cuts in the communication between simulator and hardware. For example, instead of relying on the simulator to drive every signal change for the writing of a data value over a standard port, we place some extra hardware alongside the DUT in the FPGA(s), which makes those changes for us locally. This is shown in Figure 2 as a BFM, or Bus Functional Model.

A single command or function call on the simulator side could then initiate all the necessary changes on the hardware side. The mechanism by which this all happens is called a transactor, and in Figure 2 it comprises the BFM interface on the simulator side, a transaction layer and the BFM in the FPGA hardware.

Figure 2: Partitioning using a transactor.

Not only does the transactor simplify the communication, it also allows the hardware to run faster because it is not reliant on slavishly following signal-by-signal events in the simulator. If all the interfaces between the simulator and the DUT employ the relevant transactor, then we can achieve greater acceleration overall.
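As a rough illustration of the idea, the sketch below models a transactor in plain Python. It is only an analogy: the class names are hypothetical, the write sequence is loosely based on an APB-style setup/access handshake and is not protocol-accurate, and a real BFM would be synthesizable HDL running in the FPGA.

```python
from dataclasses import dataclass, field


@dataclass
class PinLevelBus:
    """Stands in for the DUT-side pins; records every signal change."""
    changes: list = field(default_factory=list)

    def drive(self, name, value):
        self.changes.append((name, value))


class WriteBfm:
    """Hypothetical hardware-side BFM: expands one write into pin activity."""

    def __init__(self, bus):
        self.bus = bus

    def write(self, addr, data):
        b = self.bus
        # Setup phase: select the peripheral and present address and data.
        b.drive("psel", 1)
        b.drive("pwrite", 1)
        b.drive("paddr", addr)
        b.drive("pwdata", data)
        # Access phase: assert enable to complete the transfer.
        b.drive("penable", 1)
        # Return the bus to idle.
        b.drive("psel", 0)
        b.drive("penable", 0)


class TransactorProxy:
    """Simulator-side view: one call equals one boundary crossing."""

    def __init__(self, bfm):
        self.bfm = bfm
        self.crossings = 0

    def write(self, addr, data):
        self.crossings += 1
        self.bfm.write(addr, data)


bus = PinLevelBus()
proxy = TransactorProxy(WriteBfm(bus))
for i in range(3):
    proxy.write(0x100 + 4 * i, i)

print(proxy.crossings, len(bus.changes))  # prints "3 21"
```

Three calls cross the boundary, while the 21 signal changes stay local to the (modelled) hardware side. With signal-level partitioning, each of those changes would have been a simulator event.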

UVM already employs transactional level communications, but how do we convert our simulator-only UVM test environment to use transactors and FPGA-based hardware?

Linking UVM and FPGA
Preferably, we should have written our UVM/SystemVerilog testbench in a style of UVM that allows easy inclusion of transactors. As it happens, the style recommended by Easier UVM is exactly such a style, and with a few simple substitutions, the verification team can re-compile and run the design using FPGA-based acceleration. In Aldec’s case, this adaptation of the UVM is performed mostly automatically in their Design Verification Manager (DVM) tool.

At the heart of transactor-based acceleration is the representation of interface transactions as function calls. The UVM agent’s driver makes a single function call, which can result in hundreds of signal changes in the hardware, at a much higher clock rate than the simulator event rate. The same effect occurs in the other direction, on outputs from the DUT into the simulator via the UVM agent’s monitors. It is this ratio of signal changes to function calls that increases throughput.
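A back-of-envelope model makes the effect of that ratio visible. Everything here is an assumption for illustration: the cost model (a fixed latency per boundary crossing, hardware otherwise free-running at its own clock) is deliberately crude, and the numbers are invented rather than measured on any emulator.

```python
def signal_level_seconds(n_txn, changes_per_txn, crossing_s):
    """Signal-level partition: every signal change crosses the boundary."""
    return n_txn * changes_per_txn * crossing_s


def transaction_level_seconds(n_txn, cycles_per_txn, crossing_s, hw_clock_hz):
    """Transactor: one crossing per transaction, then hardware runs alone."""
    return n_txn * (crossing_s + cycles_per_txn / hw_clock_hz)


# Invented inputs: 1000 transactions, 200 signal changes each, 10 us per
# crossing, 50 hardware clock cycles per transaction at 10 MHz.
sig = signal_level_seconds(1000, 200, 10e-6)
txn = transaction_level_seconds(1000, 50, 10e-6, 10e6)

print(f"signal-level:      {sig:.3f} s")
print(f"transaction-level: {txn:.3f} s")
print(f"ratio:             {sig / txn:.1f}x")
```

The model also shows the limit: once there is only one crossing per transaction, the crossing latency itself dominates, so further gains come from making transactions larger rather than from a faster hardware clock.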

We see this simplified in Figure 3, which also differentiates the two realms by the languages used to describe them: a Hardware Verification Language (HVL), in this case SystemVerilog, on the one side, and the transactors, written in a Hardware Description Language (HDL) such as Verilog or VHDL, on the other. This implies that any SystemVerilog items appearing on the right-hand side must be synthesizable, or the tools have to be able to interpret them as such.

Figure 3: HVL and HDL communication via function calls.

We might also think of the boundary as dividing the timed and untimed domains, between events and clocks, and the clock is never used to synchronize communication across the boundary.

SCE-MI and DPI-C: The essential enablers for UVM acceleration
The approach of HVL-HDL partitioning has been captured in Accellera’s Standard Co-Emulation Modeling Interface (SCE-MI) which they describe as “allowing a model developed for simulation to run in an emulation environment and vice versa.” Aldec and other emulation providers follow the SCE-MI standard which recommends that the cross-boundary function calls are made via SystemVerilog’s Direct Programming Interface (DPI), which allows it to communicate with other programming languages. In the case of the C language, we term this DPI-C for short.

DPI-C allows us to make calls to externally defined (and imported) C functions from within a SystemVerilog testbench, and to export SystemVerilog items, allowing them to be accessed from C. This is very helpful for accelerating UVM, as we can use DPI-C as the wrapper between the testbench calls and the transactors, which will be synthesized along with the DUT into the FPGA. We can see this in Figure 4 (note the naming convention used by Aldec).

Figure 4: DPI-C acts as the boundary between simulator and hardware.

As noted above, the HDL side must be synthesizable. However, the definition of what is and isn’t synthesizable varies from tool to tool, depending on how much effort each tool vendor has put into implementing the HVL and HDL standards. Aldec has taken the view that, to allow easier conversion of UVM to use transactors, some traditionally non-synthesizable HVL needs to be handled. For example, Figure 5 shows part of the code for the driver in the BFM for the trivial example in Figure 3.

Figure 5: BFM Code partitioned and made synthesizable.

This code was previously in the UVM agent but must be moved to the HDL side of the boundary when using hardware acceleration. Notice the use of “while” statements and an implicit state machine, both typically non-synthesizable SystemVerilog code. Other constructs that appear in UVM code are also non-synthesizable, for example those that might be interpreted as registers driven by multiple sources.
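To show what converting an implicit state machine involves, here is a Python analogy. It is not the code in Figure 5 and not the output of any Aldec tool. The handshake (wait for `start`, hold `valid` until `ready`) and all names are hypothetical. In the first function the state lives in the position within nested "while" loops, as in behavioural testbench code; in the second it lives in an explicit state variable, as synthesizable HDL normally requires.

```python
def implicit_driver(inputs):
    """State is implied by where execution is; next() models a clock edge."""
    clk = iter(inputs)
    valid = []
    try:
        while True:
            start, ready = next(clk)
            while not start:            # idle until a request arrives
                valid.append(0)
                start, ready = next(clk)
            valid.append(1)             # request seen: assert valid
            start, ready = next(clk)
            while not ready:            # hold valid until accepted
                valid.append(1)
                start, ready = next(clk)
            valid.append(0)             # handshake complete
    except StopIteration:
        return valid


def explicit_driver(inputs):
    """Same behaviour with an explicit state register, one step per clock."""
    state = "IDLE"
    valid = []
    for start, ready in inputs:
        if state == "IDLE":
            if start:
                state = "WAIT"
                valid.append(1)
            else:
                valid.append(0)
        else:  # state == "WAIT"
            if ready:
                state = "IDLE"
                valid.append(0)
            else:
                valid.append(1)
    return valid


# One (start, ready) sample per clock edge.
stimulus = [(0, 0), (1, 0), (0, 0), (0, 1), (0, 0), (1, 1), (0, 1)]
print(implicit_driver(stimulus))  # [0, 1, 1, 0, 0, 1, 0]
print(explicit_driver(stimulus))  # [0, 1, 1, 0, 0, 1, 0]
```

The two functions produce the same output for the same stimulus. The rewrite from the first form to the second is the kind of transformation a tool has to perform, or the engineer has to do by hand, before such code can be placed in hardware.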

Aldec’s DVM handles this by employing a specific SCE-MI2 Compiler, which interprets the code, converts non-synthesizable items into equivalent HDL, and extracts the DPI-C elements into so-called Emulation Bridge code. This is done automatically as part of the DVM flow, which also does all those other jobs involved in making FPGA-hostile SoC HDL ready for implementation in an FPGA-based emulator, including gated-clock conversion, memory modelling, partitioning and synchronizing to a single clock in order to allow clock stopping and single-stepping during emulator runtime.

Hit the accelerator
This article has shown that UVM need not be hard to learn and need not take a long time to run. The use of Easier UVM and easy access to FPGA-based hardware acceleration can bring UVM to a wider user base, including teams which are creating today’s most complex FPGA and SoC designs.

For further detail, see Aldec’s webinar: FPGAs for Verification, UVM Simulation Acceleration with Scalable FPGA Platforms.