Unblocking The Full Potential Of PCIe Gen6 With Shared Flow Control

Creating a common pool of resources to avoid exhaustion of individual buffer space.


As technology advances at a rapid pace, PCI Express (or PCIe) has grown tremendously, allowing data transfer up to 64 GT/s in Gen6. This technology is widely used in data centers, artificial intelligence and machine learning computing, high-performance computing accelerators, and high-speed applications—including high-end SSDs, automotive, IoT, and mil-aero.

To fully utilize this high-speed link’s potential, traffic transmission on the bus must double compared to PCIe Gen5, yet the packet-processing speed of PCIe-based devices has barely improved. Existing buffer spaces are therefore not sufficient to accommodate the large number of packets that need to be transmitted. But increasing the receiver buffer proportionally would inflate the hardware and cost of the design.

Shared flow control was introduced for FLIT mode in PCI Express to solve this issue by introducing the concept of a shared pool of buffer resources. In shared flow control, there are two types of resources available for each enabled virtual channel (VC)—shared and dedicated—for each type of packet (posted/non-posted/completion). All the shared resources of every active virtual channel form a combined shared pool that can be used as required. If an individual buffer space is exhausted, more traffic can still be received if other resources are available in the common pool. This article introduces the implementation and verification requirements of shared flow control, which we explore in greater depth in our full paper, Effective resource utilization in PCIe Gen6: Shared flow control.

The current flow control log jam

In PCIe, a credit-based flow control mechanism is used: each receiver advertises its available buffer space to the transmitter to prevent data loss. As in all PCIe architectures, there are primarily three types of packets: posted, non-posted, and completion. Each packet can have a header and data, so we need to maintain six distinct buffer spaces:

  • Posted Request TLP headers (PH)
  • Posted Request TLP data (PD)
  • Non-Posted Request TLP headers (NPH)
  • Non-Posted Request TLP data (NPD)
  • Completion TLP headers (CPLH)
  • Completion TLP data (CPLD)

The transmitter uses these credits to transmit transaction layer packets (TLPs). If enough credits are not available for a particular type of TLP, it is blocked until the receiver frees credits and advertises them to the transmitter through an UpdateFC. Credits are freed when the packets are processed at the receiver’s end.
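The transmitter-side bookkeeping described above can be sketched as a simple model. This is an illustrative sketch, not the specification’s modular-arithmetic counters; the class and method names are our own:

```python
from enum import Enum

# The six credit types maintained by normal PCIe flow control.
class CreditType(Enum):
    PH = "Posted header"
    PD = "Posted data"
    NPH = "Non-posted header"
    NPD = "Non-posted data"
    CPLH = "Completion header"
    CPLD = "Completion data"

class CreditCounter:
    """Transmitter-side credit bookkeeping for one VC (simplified model)."""

    def __init__(self, advertised: dict[CreditType, int]):
        # Credit limits as advertised by the receiver during initialization.
        self.limit = dict(advertised)
        # Credits consumed per type, starting at zero.
        self.consumed = {t: 0 for t in CreditType}

    def can_send(self, ctype: CreditType, credits_needed: int) -> bool:
        return self.consumed[ctype] + credits_needed <= self.limit[ctype]

    def send(self, ctype: CreditType, credits_needed: int) -> bool:
        """Consume credits for a TLP; return False if it must be blocked."""
        if not self.can_send(ctype, credits_needed):
            return False
        self.consumed[ctype] += credits_needed
        return True

    def update_fc(self, ctype: CreditType, new_limit: int):
        """Receiver processed packets and advertised a larger limit via UpdateFC."""
        self.limit[ctype] = new_limit
```

A blocked TLP type simply waits: `send` keeps returning `False` until an `update_fc` raises the limit, mirroring how the transmitter stalls until an UpdateFC arrives.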

The increased buffer space requirement could be met by enabling more VCs in addition to the default VC (VC0). These additional VCs prevent traffic on one channel from being blocked by another, improving QoS. A multi-VC implementation can dynamically enable up to seven additional VCs (VC1-VC7). Each VC independently maintains the six distinct buffers mentioned above.

Consider two enabled VCs: VC0 with less allocated buffer space and VC1 with more. Due to TC-VC mapping, however, more traffic needs to be transmitted through VC0 and less through VC1. In normal flow control, the unused space of the second VC is therefore wasted.

This solution is inefficient, as it leads to more power and area consumption and subsequently increases the cost of the chip.

Opening the flood gates for high-speed designs

As the problem could not be solved using only multi-VC implementation, shared flow control was introduced for FLIT mode in PCIe. FLIT mode is used to organize data into units of uniform size.

Unlike a multi-VC implementation, shared flow control does not require a full set of receiver buffers for each additional VC (VC1-VC7). Instead of allocating a separate buffer space per VC, every enabled VC contributes to a common shared pool of buffer space.

With shared flow control, there is a common pool of buffer space that can be used by traffic directed to any enabled VC that contributed to it, irrespective of TC-VC mapping, consequently increasing the throughput of the link.
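The shared pool behavior can be sketched for a single buffer type. This is a minimal model under our own naming; it assumes each enabled VC contributes a fixed number of slots and that any contributing VC’s traffic may draw from the combined total:

```python
class SharedPool:
    """Minimal sketch of a shared flow-control pool for one buffer type.
    Each enabled VC contributes some buffer slots; traffic directed to any
    contributing VC may consume from the combined pool."""

    def __init__(self, contributions: dict[int, int]):
        # e.g. {0: 3, 1: 5}: VC0 contributes 3 slots, VC1 contributes 5.
        self.enabled_vcs = set(contributions)
        self.free = sum(contributions.values())

    def receive(self, vc: int, slots: int = 1) -> bool:
        """Accept a TLP on `vc` if the shared pool has room, irrespective
        of which VC originally contributed the slots."""
        if vc not in self.enabled_vcs or slots > self.free:
            return False
        self.free -= slots
        return True

    def release(self, slots: int = 1):
        """Receiver processed a TLP; its slots return to the common pool."""
        self.free += slots
```

With contributions of 3 slots from VC0 and 5 from VC1 (as in Figure 1 below), five packets on VC0 are all accepted even though VC0 alone contributed only three slots, which is exactly the gain over normal flow control.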

In the following figures, T_m_n denotes the TLP to be transmitted, where m is the VC through which the TLP is to be transmitted, and n is the packet number of the TLP. V_m_n denotes a buffer space of any type, where m is the VC number and n is the slot number of the buffer space.

Figure 1 shows an illustration of a shared buffer pool. There is a shared set of buffer resources combining three buffer spaces from VC0 and five buffer spaces from VC1. Five packets will be transmitted through VC0, and three packets will be transmitted through VC1.

Fig. 1: Shared flow control solution showing how resources are shared.

Figure 2 shows that any packets can use any buffer space in the shared pool irrespective of the TC-VC mapping, resulting in efficient utilization of buffer space.

Fig. 2: TLPs utilizing shared buffer space irrespective of TC-VC mapping.

Keeping high-priority traffic flowing

Along with these shared resources, there are dedicated resources: six distinct buffer spaces maintained at the receiver for each enabled VC. These can be used to transmit high-priority traffic on a VC at the cost of some overhead in the packet header. FLIT mode introduces a special prefix type which, when attached to a TLP, signifies that it requires dedicated resources.

Because the shared pool is a common resource across all VCs, there may be a requirement that high-priority traffic should always flow. To fulfill this, each enabled VC has dedicated buffer space that exhaustion of the shared pool cannot block. This is a small but high-priority resource and works similarly to normal flow control (as we describe in the full paper).

Figure 3 shows all the shared buffers consumed by TLPs directed to the receiver while some high-priority traffic still needs to be transmitted. This is done by attaching the required prefix to the TLP, which then uses the dedicated buffer space of the VC to which the packet is directed.

Fig. 3: Transmitting high-priority traffic through a dedicated channel.
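Combining the two mechanisms, the receive path can be sketched as follows. The class, its fields, and the `use_dedicated` flag are illustrative names, not the specification’s terminology; the flag stands in for the special FLIT-mode prefix:

```python
class VcFlowControl:
    """Sketch combining the shared pool with per-VC dedicated space.
    A TLP carrying the special FLIT-mode prefix requests dedicated
    credits, so high-priority traffic still flows when the shared
    pool is exhausted."""

    def __init__(self, shared_slots: int, dedicated: dict[int, int]):
        self.shared_free = shared_slots          # combined shared pool
        self.dedicated_free = dict(dedicated)    # per-VC dedicated slots

    def receive(self, vc: int, use_dedicated: bool = False) -> bool:
        if use_dedicated:
            # Prefix attached: consume the VC's dedicated space, which
            # exhaustion of the shared pool cannot block.
            if self.dedicated_free.get(vc, 0) > 0:
                self.dedicated_free[vc] -= 1
                return True
            return False
        # Ordinary traffic draws from the common shared pool.
        if self.shared_free > 0:
            self.shared_free -= 1
            return True
        return False
```

Even after ordinary traffic drains the shared pool, a prefixed TLP on a VC with remaining dedicated slots is still accepted, which is the guarantee the dedicated resources exist to provide.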

Dissolving verification challenges

Not surprisingly, users will encounter several challenges during the verification of the shared flow control feature. ICVIP from Siemens EDA provides a comprehensive solution for verifying it, with a built-in sequence library, functional coverage, and an exhaustive set of assertions and debug messages.

ICVIP PCIe comes with a built-in sequence library that lets you generate valid or erroneous stimulus. These self-checking sequences verify positive scenarios as well as the behavior of the device under test (DUT) in error injection cases.

ICVIP PCIe also comes with a coverage test plan. Functional coverage comprises four constructs: covergroups, coverpoints, crosses, and bins. A covergroup includes multiple coverpoints; for example, there could be covergroups for basic and advanced scenarios. Each covergroup contains coverpoints, such as one for DLLP packets, and each coverpoint defines bins covering all possible DLLP packets.

ICVIP PCIe provides coverage-driven verification with user configuration to enable or disable optional features. You can disable a feature your DUT does not support; this configurability also helps you find features missing from your DUT.

ICVIP PCIe provides an exhaustive set of assertion checks that help debug and fix issues efficiently, covering the rules of the specification. Assertions are provided for error scenarios and for illegal field values to enable exhaustive verification.

Shared flow control is an efficient solution for coping with the increasing demand for higher data rates and the larger buffer space it requires. However, the shared flow control feature presents verification challenges. This is why Siemens DISW designed a solution that effectively verifies shared flow control and avoids common verification pitfalls.

To gain a much deeper understanding of this topic, please read the whitepaper, Effective resource utilization in PCIe Gen6: Shared flow control.
