Accelerating Simulation Of PCIe Controllers For DMA Applications

Speeding up the PCIe link training and initialization process, plus creating custom testbenches that can dynamically adapt to different IP topologies and configurations.


For memory-intensive and high-performance computing, direct memory access (DMA) is indispensable. A typical DMA operation in PCI Express (PCIe) entails the transfer of data from the system memory to end point devices using a point-to-point PCIe bus to reduce latency and increase memory access throughput between the CPU and the device.

Verification of DMA engines focuses on the data transfer aspect of PCIe, which resides primarily in the transaction layer. These complex architectures require a highly scalable and configurable testbench for verification.

This article shows how verification engineers can use Questa Verification IP (QVIP) from Mentor, A Siemens Business, to improve productivity during the functional verification of PCIe designs with DMA engines. QVIP speeds up the PCIe link training and initialization processes as well as PCIe device enumeration. Furthermore, the flexibility of QVIP is key to creating custom testbenches from scratch that can dynamically adapt to different IP topologies and configurations, mixing PCIe interfaces with multiple AMBA-AXI interfaces.

PCIe link training

PCIe is a dominant technology for hardware applications that require high-speed connectivity from networking, storage, FPGA, and GPGPU boards to servers and desktop systems. It is a robust technology that has evolved through successive generations to keep pace with the growing throughput and speed demands of I/O connectivity in computing.

PCIe is built upon a layered architecture consisting of a transaction layer for payload transfers, a data link layer for link management, and a physical layer for initialization and training of a reliable PCIe link between two devices. In terms of PCIe verification, each layer has its own challenges and complexities.

The essential first step in a functional test using PCIe is link training and initialization, which must complete before data transfer can begin between the two PCIe devices. Because this step is part of every test that uses PCIe for data transfer, optimizing link-up directly reduces the simulation runtime needed to reach the L0 state (the fully operational link state for data transfer).

The link training and status state machine (LTSSM) in a PCIe device traverses four main states to establish a reliable PCIe link: Detect, Polling, Configuration, and Recovery (figure 1). Starting from Detect, the link moves through these four main states and the various sub-states defined within them, following the path shown by the highlighted arrows, until it reaches the L0 state.
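As a rough illustration of the happy path described above, the Python sketch below lists the sub-states a link typically visits during an error-free initial link-up and collapses them into the main-state sequence. It is a simplified model, not QVIP code: Polling.Compliance, timeouts, error arcs, and the individual Recovery sub-states are omitted, and the later speed change is modeled only at the main-state level.

```python
from itertools import groupby

# Simplified, error-free initial link-up as (main state, sub-state) pairs.
# Polling.Compliance, timeouts, and error/retry arcs are deliberately omitted.
INITIAL_LINK_UP = [
    ("Detect", "Detect.Quiet"),
    ("Detect", "Detect.Active"),
    ("Polling", "Polling.Active"),
    ("Polling", "Polling.Configuration"),
    ("Configuration", "Configuration.Linkwidth.Start"),
    ("Configuration", "Configuration.Linkwidth.Accept"),
    ("Configuration", "Configuration.Lanenum.Wait"),
    ("Configuration", "Configuration.Lanenum.Accept"),
    ("Configuration", "Configuration.Complete"),
    ("Configuration", "Configuration.Idle"),
    ("L0", "L0"),
]

# A later speed change leaves L0 through Recovery and returns to L0.
# Recovery's own sub-states are not modeled here.
SPEED_CHANGE = [("Recovery", "Recovery"), ("L0", "L0")]


def main_states(path):
    """Collapse a (main state, sub-state) trace into the main states visited."""
    return [state for state, _ in groupby(main for main, _ in path)]


print(" -> ".join(main_states(INITIAL_LINK_UP + SPEED_CHANGE)))
# Detect -> Polling -> Configuration -> L0 -> Recovery -> L0
```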

When verification engineers develop a testbench environment specifically for verifying DMA features, they must configure the LTSSM parameters of the DUT and the settings of the verification IP in sync. Only then can both devices move through the LTSSM states in step with each other and achieve PCIe link-up in less simulation time.

Fine-tuning these configuration parameters for both devices is cumbersome and error-prone, especially when the testbench developer does not know their values. In that case, a DUT and verification IP that are both highly configurable become an absolute necessity for achieving the desired optimization.
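One simple way to catch out-of-sync settings early is to diff the DUT and verification IP parameter sets before running a test. The sketch below assumes both configurations can be exported as Python dictionaries; the parameter names and values (`polling_active_ts1_count` and so on) are hypothetical placeholders, not QVIP or DUT API names.

```python
def ltssm_mismatches(dut_cfg, vip_cfg):
    """Return {name: (dut_value, vip_value)} for every parameter that differs.

    A parameter set on only one side is reported with None for the other side.
    """
    names = dut_cfg.keys() | vip_cfg.keys()
    return {
        name: (dut_cfg.get(name), vip_cfg.get(name))
        for name in sorted(names)
        if dut_cfg.get(name) != vip_cfg.get(name)
    }


# Hypothetical parameter names and values, for illustration only.
dut_ltssm = {"polling_active_ts1_count": 1024, "detect_quiet_timeout": 12, "eq_bypass": True}
vip_ltssm = {"polling_active_ts1_count": 16, "detect_quiet_timeout": 12, "eq_bypass": True}

for name, (dut_val, vip_val) in ltssm_mismatches(dut_ltssm, vip_ltssm).items():
    print(f"{name}: DUT={dut_val} VIP={vip_val}")
# polling_active_ts1_count: DUT=1024 VIP=16
```

Reporting a one-sided parameter as `None` keeps a setting that exists only on the DUT side from silently passing the check.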

Aside from this fine-tuning, PCIe Gen5 introduces an optional link equalization bypass mode for faster link-up at 32 GT/s. In the conventional speed change process for training a PCIe link to 32 GT/s, the link is first trained to L0 at 2.5 GT/s; speed changes, each followed by link equalization, then take it through the intermediate rates of 8 GT/s and 16 GT/s and finally to 32 GT/s. Equalization at every data rate of 8 GT/s and above is essential for higher link reliability and a lower bit error rate, but the time spent on speed changes and equalization at each rate consumes approximately 100 ms of simulation runtime. With equalization bypass mode, a link in L0 at 2.5 GT/s changes speed directly to 32 GT/s and performs equalization only once, at the highest rate, eliminating the steps through the intermediate data rates of 8 GT/s and 16 GT/s.
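The difference between the two flows comes down to which data rates receive an equalization pass. The sketch below models only that; it does not model timing, and the function name is hypothetical.

```python
from __future__ import annotations

# PCIe data rates in GT/s. Link equalization is required at 8 GT/s and above.
RATES_GTS = [2.5, 5.0, 8.0, 16.0, 32.0]
EQ_MIN_RATE_GTS = 8.0


def equalization_rates(target_gts: float, bypass: bool) -> list[float]:
    """Data rates at which equalization runs when speeding up from 2.5 GT/s.

    Conventional flow: equalize at every rate from 8 GT/s up to the target.
    Bypass flow: change speed directly to the target rate and equalize once.
    """
    if target_gts not in RATES_GTS:
        raise ValueError(f"unsupported rate {target_gts} GT/s")
    if bypass:
        return [target_gts] if target_gts >= EQ_MIN_RATE_GTS else []
    return [r for r in RATES_GTS if EQ_MIN_RATE_GTS <= r <= target_gts]


for bypass in (False, True):
    rates = equalization_rates(32.0, bypass=bypass)
    label = "bypass" if bypass else "conventional"
    print(f"{label:>12}: equalize at {rates} GT/s ({len(rates)} pass(es))")
```

For a 32 GT/s target, the conventional flow yields three equalization passes (8, 16, and 32 GT/s) against one in bypass mode.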

PCIe QVIP provides a well-documented, standard set of APIs to access LTSSM-related configuration variables, including training sequence (TS) ordered set counters, timeouts, and LTSSM state- and sub-state-specific timeout settings. Because so many LTSSM parameters are configurable in QVIP, it is imperative to keep the use model as simple as possible. For ease of use, the default settings are chosen so that QVIP achieves an optimized LTSSM transition for PCIe link-up. A highly configurable QVIP combined with an optimized default setup greatly improves the usability of the verification IP in a testbench.

QVIP enumeration flow

In a functional verification environment, QVIP is configured as a root port connected to the PCIe DUT. Enumeration is a lengthy sequence of configuration reads and writes performed by QVIP. QVIP's built-in features and capabilities can shorten this process significantly by reducing the number of configuration reads and writes after link-up, which cuts the simulation runtime spent on setup.

The steps below describe QVIP’s enumeration flow for end point device discovery.

  1. Read all the base address registers (BARs), starting at offset 10h, to determine the memory space region and the memory size requirements for this physical function.
  2. Read the capability pointer at offset 34h. PCIe uses a linked-list structure to access the standard device capabilities and extended capabilities registers supported by a physical function.
  3. After traversing the nodes of the linked list to read the complete capabilities list in the configuration space registers, the enumeration sequence turns to physical functions that support SR-IOV. The SR-IOV capability defines a set of lightweight PCIe functions, called virtual functions, that share one or more physical resources with the physical function. The enumeration sequence then follows a similar process to discover the virtual functions supported by the physical function.
  4. Finally, the enumeration sequence configures the device by issuing a series of configuration writes, based on the settings provided by the user in the QVIP agent configuration and the capabilities it discovered. These configuration writes specifically set up the following:
  • Initialization of BAR addresses for all physical functions and virtual functions, based on their memory requirements.
  • Initialization of device capabilities such as power management, maximum payload size, maximum read request size, and read completion boundary.
  • Enabling the bus-mastering capability of the device so that it can initiate transactions on the PCIe bus.
  • Initialization of MSI/MSI-X addresses for the devices.
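The capability walk in steps 2 and 3 and the BAR sizing in step 1 can be sketched over a raw configuration space image. This is a generic illustration based on the PCI/PCIe configuration space layout (Capabilities Pointer at 34h, extended capabilities starting at 100h), not QVIP's implementation, and the toy configuration space contents are made up. In the standard sizing procedure, software writes all 1s to a BAR and then reads it back; `bar_size` decodes that read-back value for a 32-bit memory BAR.

```python
import struct

# Capability IDs from the PCI/PCIe specifications (subset, for display only).
CAP_NAMES = {0x01: "Power Management", 0x05: "MSI", 0x10: "PCI Express", 0x11: "MSI-X"}
EXT_CAP_NAMES = {0x0001: "AER", 0x0010: "SR-IOV"}


def standard_caps(cfg):
    """Walk the capability list that starts at the Capabilities Pointer (34h).

    Returns (offset, capability ID) pairs. Each entry holds its ID in byte 0
    and the next-capability pointer in byte 1; a pointer of 0 ends the list.
    """
    caps, seen = [], set()
    ptr = cfg[0x34] & 0xFC  # the two low bits are reserved
    while ptr and ptr not in seen:  # `seen` guards against a looping list
        seen.add(ptr)
        caps.append((ptr, cfg[ptr]))
        ptr = cfg[ptr + 1] & 0xFC
    return caps


def extended_caps(cfg):
    """Walk the PCIe extended capability list that starts at offset 100h.

    Header layout: bits 15:0 = ID, bits 19:16 = version, bits 31:20 = next offset.
    """
    caps, seen, off = [], set(), 0x100
    while off and off not in seen and off + 4 <= len(cfg):
        seen.add(off)
        header = struct.unpack_from("<I", cfg, off)[0]
        if header == 0:  # no extended capabilities implemented
            break
        caps.append((off, header & 0xFFFF))
        off = (header >> 20) & 0xFFC
    return caps


def bar_size(readback):
    """Size of a 32-bit memory BAR from the value read back after writing all 1s."""
    return (~(readback & 0xFFFFFFF0) + 1) & 0xFFFFFFFF


def build_example_config():
    """Toy 4 KiB configuration space image (made-up contents)."""
    cfg = bytearray(4096)
    cfg[0x34] = 0x40
    cfg[0x40:0x42] = bytes([0x01, 0x50])  # Power Management -> next at 50h
    cfg[0x50:0x52] = bytes([0x11, 0x70])  # MSI-X -> next at 70h
    cfg[0x70:0x72] = bytes([0x10, 0x00])  # PCI Express -> end of list
    struct.pack_into("<I", cfg, 0x100, (0x140 << 20) | (1 << 16) | 0x0001)  # AER -> 140h
    struct.pack_into("<I", cfg, 0x140, (1 << 16) | 0x0010)  # SR-IOV -> end of list
    return cfg


if __name__ == "__main__":
    cfg = build_example_config()
    for off, cid in standard_caps(cfg):
        print(f"{off:03X}h  {CAP_NAMES.get(cid, hex(cid))}")
    for off, cid in extended_caps(cfg):
        print(f"{off:03X}h  {EXT_CAP_NAMES.get(cid, hex(cid))}")
    print(bar_size(0xFFFF0000))  # 65536 (64 KiB)
```

Finding the SR-IOV extended capability is what would trigger the virtual function discovery described in step 3.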

While executing the enumeration sequence, QVIP stores and maintains a data structure for each physical function of the PCIe DUT, so that this information can be used to develop user-specific test scenarios during the test phase, after enumeration is complete. This feature enables QVIP to easily execute extensive verification scenarios based on design capabilities by providing the test writer with APIs to query DUT capabilities and obtain address offsets for updating the configuration space registers in the DUT.
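A per-function record of this kind might look like the following sketch. The class and method names are hypothetical and do not reflect QVIP's actual data structure or APIs. The example only shows how a stored capability offset lets a test compute the address of a register inside that capability (here, Device Control, which sits at offset 08h within the PCI Express capability).

```python
from dataclasses import dataclass, field


@dataclass
class FunctionRecord:
    """Hypothetical per-function record captured during enumeration."""
    bdf: str                                  # bus:device.function, e.g. "01:00.0"
    bars: dict = field(default_factory=dict)  # BAR index -> (base address, size)
    caps: dict = field(default_factory=dict)  # capability name -> config-space offset
    num_vfs: int = 0                          # VFs reported via SR-IOV (0 if none)

    def has_cap(self, name):
        return name in self.caps

    def cap_reg_offset(self, name, reg_offset):
        """Config-space offset of a register located inside a capability."""
        if name not in self.caps:
            raise KeyError(f"{self.bdf} does not implement the {name} capability")
        return self.caps[name] + reg_offset


# Example values are made up; Device Control is at offset 08h in the PCIe capability.
pf0 = FunctionRecord(
    bdf="01:00.0",
    bars={0: (0xF000_0000, 0x1_0000)},
    caps={"Power Management": 0x40, "MSI-X": 0x50, "PCI Express": 0x70, "SR-IOV": 0x140},
    num_vfs=4,
)
print(hex(pf0.cap_reg_offset("PCI Express", 0x08)))  # 0x78
print(pf0.has_cap("AER"))  # False
```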

The number of configuration transactions executed in the enumeration sequence scales with the number of physical functions and virtual functions per device. This setup phase must be performed for every test that uses the PCIe link to verify DMA functionality, so it adds simulation runtime before any user-specific test scenario executes. Lowering the number of configuration transactions therefore significantly reduces the setup time needed for a device.

QVIP provides two verification capabilities for enumeration sequences to reduce runtime dramatically: fast enumeration and backdoor enumeration.

In fast enumeration mode, QVIP is configured through a backdoor mechanism, while the DUT is configured through configuration writes only. The advantage is that no configuration reads of the configuration space registers take place; instead, QVIP issues only the configuration writes needed to configure and set up the device. Because configuration writes are fewer in number than the configuration reads performed during the enumeration sequence, runtime in this mode is reduced by half or even more.
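A back-of-the-envelope model makes the multiplication factor and the fast-mode savings concrete. The per-function read and write counts below are hypothetical inputs, and the model treats every function alike, which real devices do not. Backdoor enumeration, the second mode, issues no configuration transactions at all.

```python
def enum_config_transactions(num_pfs, vfs_per_pf, reads_per_fn, writes_per_fn, mode="full"):
    """Rough count of configuration transactions issued during enumeration.

    Every PF and VF is assumed to need the same number of reads and writes;
    that uniformity, and the per-function counts, are simplifying assumptions.
    """
    per_fn = {
        "full": reads_per_fn + writes_per_fn,  # discover by reading, then configure by writing
        "fast": writes_per_fn,                 # capabilities supplied via backdoor; writes only
        "backdoor": 0,                         # both sides configured via backdoor
    }[mode]
    num_functions = num_pfs * (1 + vfs_per_pf)
    return num_functions * per_fn


# Hypothetical device: 2 PFs with 4 VFs each, 40 reads and 15 writes per function.
for mode in ("full", "fast", "backdoor"):
    print(mode, enum_config_transactions(2, 4, 40, 15, mode))
```

With these made-up counts, full enumeration issues 550 transactions and fast enumeration 150; doubling the VFs per PF nearly doubles both, which is the multiplication factor at work.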

Even though QVIP performs no configuration reads in this mode, it still needs the device capabilities and memory resource requirements to issue the configuration writes. QVIP provides built-in utilities that capture this crucial information accurately in a testbench-usable format. The following steps extract this information and feed it back into QVIP.

  1. Run a test case with the default full enumeration mode setting:

  2. Enable the following commands to capture the configuration space register settings of the device:

  3. Run the test, then open the simulation log to find the settings captured by QVIP, which will be used in the configuration phase of the test:

The output between the FAST_BUS_ENUM CONFIGURATION banners in the simulation log is copied directly into the testbench to configure QVIP through the backdoor. Once these settings are applied, the test case configuration is complete and ready to run in fast enumeration mode.
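Copying the captured block by hand is straightforward, but it can also be scripted. The helper below is a hypothetical sketch. It assumes the settings appear between two log lines that both contain the banner text, optionally preceded by a simulator transcript prefix such as `# `. The actual banner layout in a QVIP log may differ, and the sample log is invented.

```python
def extract_between_banners(log_text, banner, strip_prefix=""):
    """Return the lines between the first two log lines that contain `banner`.

    Assumes the captured settings are bracketed by two banner lines; the exact
    QVIP banner layout is an assumption. `strip_prefix` removes a transcript
    prefix (for example "# ") from each returned line when present.
    """
    lines = log_text.splitlines()
    marks = [i for i, line in enumerate(lines) if banner in line]
    if len(marks) < 2:
        raise ValueError(f"expected two '{banner}' banner lines, found {len(marks)}")
    body = lines[marks[0] + 1 : marks[1]]
    if strip_prefix:
        body = [ln[len(strip_prefix):] if ln.startswith(strip_prefix) else ln for ln in body]
    return body


# Invented log excerpt; real captured lines would be the QVIP settings to paste.
sample_log = """\
# (earlier transcript output)
# ====== FAST_BUS_ENUM CONFIGURATION ======
# <captured setting line 1>
# <captured setting line 2>
# ====== FAST_BUS_ENUM CONFIGURATION ======
# (later transcript output)
"""

for line in extract_between_banners(sample_log, "FAST_BUS_ENUM CONFIGURATION", strip_prefix="# "):
    print(line)
```

Under the same assumption about banner layout, backdoor mode would call the helper with "BACKDOOR CONFIGURATION" as the banner.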

Providing QVIP with these settings ensures that no further configuration reads are needed to access device capabilities. QVIP now performs only the configuration writes needed to set up the device for normal operation.

In backdoor enumeration mode, no configuration reads or writes are performed at all. The configuration space registers of both QVIP and the device are configured through a backdoor mechanism, and the enumeration sequence is skipped entirely.

This feature requires that the DUT be able to update its configuration space register settings through a backdoor mechanism before the link is trained. PCIe design IP built with this capability can take advantage of this QVIP feature and reduce the initial simulation runtime even further than fast enumeration does.

The steps to extract the configuration space settings are similar to those for fast enumeration mode, with minor updates to the configuration option assigned to the PCIe QVIP agent.

  1. Run a test case with the default full enumeration mode setting:

  2. Enable the following commands to capture the configuration space register settings of the device:

  3. Run the test, then open the simulation log to find the settings captured by QVIP, which will then be used in the QVIP configuration phase of the test:

For this mode, the output between the BACKDOOR CONFIGURATION banners in the simulation log is copied directly into the testbench to configure QVIP through the backdoor. Once these settings are applied, the test case configuration is complete and ready to run in backdoor enumeration mode.

Providing QVIP with these settings ensures that no configuration reads are needed to access the device capabilities. Because the test runs with the backdoor enumeration option, QVIP also assumes that no configuration writes are needed.

When running QVIP in this mode, after the link is established the user can start initiating test scenarios with the assurance that QVIP and the device have completed the enumeration process.


QVIP speeds up the initial PCIe setup and reduces the initial simulation runtime required to set up tests targeting verification of PCIe controllers with DMA functionality.

Initial PCIe link training and the enumeration process are essential parts of every test for verifying DMA engines using PCIe. On average, with the most optimized settings in a QVIP-assisted testbench, the simulation runtime to establish a PCIe link is reduced to roughly twenty percent of its original value. In one typical case, link training time dropped from 61 microseconds to 13 microseconds. This reduction in simulation runtime boosts the productivity of verification engineers developing tests and shortens the overall turnaround time for debug and analysis.
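For reference, the typical case quoted above works out as follows:

```latex
\frac{t_{\text{optimized}}}{t_{\text{original}}}
  = \frac{13\ \mu\text{s}}{61\ \mu\text{s}} \approx 0.21,
\qquad
1 - 0.21 = 0.79
```

That is, the optimized link-up took roughly 21% of the original link training time, a reduction of about 79%.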

To learn more and read about a QVIP PCIe verification use case, please download the whitepaper PCIE Simulation Speed-Up Using Mentor QVIP with PLDA PCIE Controller for DMA Applications.
