Connecting Emulated Designs To Real PCIe Devices

While connecting an emulator with an external environment usually requires some additional up-front effort, it pays off in the long run.


These days verification teams no longer question whether hardware assisted verification should be used in their projects. Rather, they ask at which stage they should start using it.

Contemporary System-on-Chip (SoC) designs are already sufficiently complex to make HDL simulation a bottleneck during verification, without even mentioning hardware-software co-verification or firmware and software testing. Thus, IC design emulation is an increasingly popular technique of verification with hardware-in-the-loop.

Recently, hardware assisted verification became much more affordable thanks to the availability of high-capacity FPGAs (like the Xilinx Virtex UltraScale VU440) and their adoption for emulation by EDA vendors.

A good example of such an emulation platform is Aldec’s HES-DVM. It can accommodate designs of over 300 million ASIC gates on multi-FPGA boards (HES-US-1320), and the capacity scales further by interconnecting multiple boards in a backplane.

Design compilation, synthesis and partitioning are fully automated by the DVM tool which is part of the HES-DVM platform. Worried about tracing design problems and debugging while in emulation? Not a problem. HES-DVM has a HW Debugger tool that can capture design signals across multiple FPGAs and correlate them with your RTL design hierarchy, so the waveforms captured from emulation look as if they had been captured during an HDL simulation.

There are other advantages to using emulation, one of which clearly differentiates this verification method from others: emulation allows you to connect the emulated design with real devices, a technique known as In-Circuit Emulation (ICE). This is the subject I wish to elaborate on in this blog.

Connecting an emulator with an external environment is an advanced use case and usually requires some additional up-front effort, but it pays off in the long run because less effort is required to create complex test scenarios.

Now, live data streams will stimulate your design under test.

However, please be advised that all the great features of emulation – such as fully automatic design setup and sophisticated debugging capabilities – come at the price of slower than real-time design clocks.

A typical emulation clock frequency will be between 1 and 10MHz. It becomes obvious that in many cases such clocks are too slow to communicate with external peripherals directly. Thus, the ‘additional up-front effort’ mentioned above concerns the development of Speed Adapters, which buffer data and provide clock domain crossing (CDC) between the slower emulation domain and the usually faster peripheral device domain. It’s also important that the Speed Adapters remain transparent to both domains.

Fortunately, this up-front effort does not need to be high.

Let’s take an example of the PCI Express (PCIe) interface which, in my experience and through talking with Aldec customers, is one of the most popular interfaces to use for ICE.

Imagine you have an SoC design containing a CPU, on-chip SRAM, an interrupt controller and a timer, all interconnected with an AXI bus. This is a typical subset of SoC components required for running an embedded Linux operating system.

Clearly, the SoC is built to communicate with the external world so, along with many external peripherals, it provides a PCIe root complex controller to allow connecting external PCIe devices. This is just the hardware and, as we all know, no matter how large our hardware team, the software team – responsible for developing firmware, embedded Linux and application software – will be larger.

To some extent the software team can use virtualization techniques and simulation models to develop many software components but thorough software testing and debugging requires a fast and accurate model of the target hardware.

With an emulation platform the software team receives the exact hardware model of the SoC and, more importantly, receives it much earlier than if they had to wait for the first chip samples from the foundry.

Let’s assume one of the external devices to be a Network Interface Card (NIC) with a standard PCIe interface. Connecting such a board with the emulator provides the opportunity to test the SoC with real network traffic and TCP/IP protocols with all their properties in the natural environment. With the alternative being to have your DV team develop a TCP/IP stack and UVM constrained random test sequences, I know which I prefer. So, let’s do this.

Developing a PCIe speed adapter from the ground up would be a complex and time-consuming task, requiring a level of effort comparable to developing a complete device controller. Thankfully, we have access to reusable components and Xilinx, for example, provides some ready-to-use IP-Cores for PCIe with the Vivado Design Suite. Among these is the ‘AXI Bridge for PCI Express’ which seems to be appropriate for this task.

As a proof of concept, I created an SoC design based on the Xilinx MicroBlaze soft processor, as shown in the following block diagram.

The design contains a complete SoC with MicroBlaze CPU, a timer, an interrupt controller, local RAM and a DMA controller. External devices required for this SoC are a DDR4 memory controller, a PCIe root complex, and UART and SPI Flash controllers. All components of our SoC are connected using AXI interconnect infrastructure.

Please note, there are four external peripheral controllers and all need to be implemented as speed adapters. For the sake of simplicity, I’ll explain the implementation of just the PCIe speed adapter but the other three were implemented in a similar manner.

The ‘PCIe Root Complex’ controller is provided with the AXI wrapper as the ‘AXI Bridge for PCI Express Gen3 Subsystem’ IP-Core.

It’s convenient to use in an AXI-based SoC because all the lower-level PCIe interfaces are already wrapped into a higher-level, memory-mapped AXI device wrapper. Thus, the IP-Core contains one AXI slave port for controller configuration (S_AXI_CTL) and, for data transfers, two other ports: an AXI slave (S_AXI) and a master (M_AXI).

The AXI Master port is capable of transferring data through DMA channels, hence we have the DMA controller in our SoC design (see above block diagram). Apart from AXI interfaces the Bridge provides PCIe physical interface lines (pcie_7x_mgt) and requires a dedicated reference clock (refclk) plus a clock for the MGT physical lines (sys_clk_gt). Note also, the IP-Core provides an output clock (axi_aclk) which should be used to synchronize all its AXI ports.

The remaining ports are used to handle interrupts from the PCIe subsystem.

Because the Bridge IP implements the physical layer (PHY) of the PCIe interface, its input clocks must run in real time; in our design, both refclk and sys_clk_gt are driven from the 200MHz on-board oscillator. Thus, the axi_aclk (which is derived from refclk) also runs at a real-time frequency, so it is not suitable for the emulation side.

So, how do we connect this IP block with the rest of the design, which is emulated and driven from the emulation clock domain? To implement the speed adapter, we need to use clock converters, which are easy to implement with ‘AXI Clock Converter’ IPs also available in Xilinx Vivado. They are connected as shown in the following figure.

Note that the ‘AXI Clock Converter’ IP provides an S_AXI slave and an M_AXI master port with two corresponding clock inputs: s_axi_aclk for the slave and m_axi_aclk for the master.

Internally they have clock domain crossing circuits assuring that data can be safely transferred between devices in different clock domains. Hence, the two clock lines can be asynchronous.

Let’s check how the clocks are connected. In the above diagram they are marked in red and green. The red line is driven from the axi_aclk output port of AXI Bridge instance and this is the free-running hardware clock derived from the 200MHz reference clock.

The green line is driven from the input port which, in our design, will be connected to the emulated clock, the same one used for the MicroBlaze SoC subsystem. Besides the AXI ports, one more AXI Bridge port is used in the design: interrupt_out. Since it is in the refclk clock domain (synchronous to the axi_aclk output), it needs to be synchronized to the emulation domain.

A custom synchronizer, based on a 1-bit-wide FIFO, was developed in Verilog RTL so that the output port pcie_interrupt is properly synchronized to the emulation clock domain.

The experiment ended up as a real hardware implementation. Here is the photo of the emulation workbench.

The largest board in the photo is the HES-US-440 main emulation board with its Xilinx Virtex UltraScale XCVU440 FPGA. The entire SoC design is implemented in this FPGA and the PCI Express interface I/Os are locked to the site which connects with one of the three FMC connectors.

Next, there is an FMC-PCIE daughter card hooked on top of the respective FMC connector. This board provides standard PCI Express sockets, one of which was used to connect the Network Interface Card with an Intel chipset.

It’s also worth mentioning that the HES-US-440 board was configured for emulation mode, hence it provides an emulation controller (the smaller chip with the fan on it in the above photo, a Xilinx Zynq-7000) and another PCIe link used to connect the emulation board with a host workstation.

Note: I should mention too that I developed a simple GPIO transactor to transfer some status flags (e.g. PCIe link up, DDR calibrated, etc.) from the hardware to the host workstation instead of using on-board LEDs. This is, of course, a very simplistic use of a SCE-MI channel; in more advanced scenarios you can implement complex bus transactors or monitors and integrate your emulation with a sophisticated testbench running on a host workstation, while other interfaces (like PCIe in our case) are connected with real devices. Since a picture is worth a thousand words, here’s the entire system as implemented in the emulation workbench.

Aldec’s HES-DVM was used to set up the design for emulation and to integrate it with the SCE-MI infrastructure.

I created a simple C++ testbench application that links with the SCE-MI API and allows me to read the GPIO transactor to decode the status information. The same application also initially configures the FPGA with the bitstream file of the design. OK, let’s power up the workbench and see the result.

The design status looks perfect. Immediately, I could see that the reset lines were at their proper levels – indicating the design was out of reset – and that the FMC daughter card was powered correctly and the PCI Express link was up (and its lines synchronized).

Besides the PCIe link, I also have a UART port in the design, which will be used as the STDIO channel by the MicroBlaze CPU. A UART-to-USB controller is available on the HES-US-440 board, so I can connect a serial port terminal to this device. I used the minicom program on my Linux host workstation.
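On the host side, attaching the terminal is a one-liner; the device node and baud rate below are assumptions for this setup (check dmesg after plugging in the USB cable to see which node the adapter actually enumerates as):

```shell
# Attach a serial terminal to the board's UART-to-USB bridge.
# /dev/ttyUSB0 and 115200 baud are assumptions; adjust to your setup.
minicom -D /dev/ttyUSB0 -b 115200
```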

The MicroBlaze CPU subsystem contains a debug interface module hooked to the JTAG port of the FPGA. I can connect the Xilinx Software Command-line Tool (XSCT) to this JTAG port and control the MicroBlaze CPU.
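Attaching to the CPU looks roughly like the session below; the target filter string is an assumption, and `targets` shows what is actually available on your JTAG chain:

```shell
# Launch the Xilinx Software Command-line Tool on the host.
xsct

# --- inside the XSCT console ---
connect                                           # connect to the local hw_server
targets                                           # list the debug targets on the JTAG chain
targets -set -nocase -filter {name =~ "*microblaze*"}   # select the MicroBlaze target
```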

The embedded Linux image, a PetaLinux build customized for my SoC design, was first uploaded to the design’s DDR at 0x85000000 and then booted using the U-Boot method. The two steps of the software upload process, followed by the ‘con’ command, brought me to the U-Boot prompt in the UART terminal and took about 2 minutes.

Then I used the ‘bootm’ command to boot from the memory location 0x85000000.
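The whole upload-and-boot sequence, as run from XSCT and then from the U-Boot console, looked roughly like this; the file names are placeholders for my build artifacts:

```shell
# --- XSCT session, attached to the MicroBlaze over JTAG ---
dow u-boot.elf                  # download the U-Boot ELF (placeholder name)
dow -data image.ub 0x85000000   # load the PetaLinux image into DDR as raw data
con                             # release the CPU; U-Boot comes up on the UART

# --- then, at the U-Boot prompt in the UART terminal ---
bootm 0x85000000                # boot the uploaded image from DDR
```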

I then waited 10 minutes for the PetaLinux bring-up, but while waiting I observed the boot-log messages. I was pleased to see that my NIC card was detected correctly, matched with Intel’s e1000e driver module and configured as eth0.

Next, I received the PetaLinux root prompt (#) and was able to interact with the OS.

First, I checked that the PCIe card was up and running (lspci) and that the Ethernet interface was configured (ip link). Then I checked the network connection to my workstation (ping). Everything was working fine.
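On the booted target, these checks amount to something like the following; the interface name and the IP addresses are assumptions for this particular setup:

```shell
# List PCI devices; the Intel NIC should show up as an Ethernet controller.
lspci

# Verify the interface exists, then bring it up with a test address.
ip link show eth0
ip addr add 192.168.1.10/24 dev eth0
ip link set eth0 up

# Check connectivity to the host workstation at the other end of the link.
ping -c 4 192.168.1.1
```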

I left the system running over a weekend and also tested it by transferring large files from my workstation to the embedded SoC on the emulation board via FTP, achieving a transfer rate of around 460 KB/s.
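To put that rate in perspective, a quick back-of-the-envelope calculation (the 100 MB file size is an assumption) shows what such a transfer costs in wall-clock time:

```shell
# Estimate the transfer time of a 100 MB file at the observed ~460 KB/s.
size_kb=$((100 * 1024))                   # file size in KB
rate_kbps=460                             # observed FTP throughput in KB/s
echo "$((size_kb / rate_kbps)) seconds"   # → 222 seconds (~3.7 minutes)
```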

Everything was stable, which proved that the design of the PCIe Speed Adapter was correct.

Speed adapters are very useful when emulating or prototyping in FPGA because they allow connecting the design with external peripherals, which have to operate at their target speeds while the design is driven by slower emulation domain clocks.


In this blog I presented the principles of developing a reliable speed adapter and showed an example implementation of one; specifically, the PCI Express root complex controller.

At Aldec, we have developed a scalable FPGA-based emulation and prototyping platform, but we are also working closely with engineers and helping them to build an efficient verification environment with hardware in the loop.
