Why PCIe And CXL Are Essential Interconnects For The AI Era

Dynamically share memory, storage, and accelerators across multiple compute nodes.


As the demand for AI and machine learning accelerates, the need for faster and more flexible data interconnects has never been more critical. Traditional data center architectures face several challenges in enabling efficient and scalable infrastructure to meet the needs of emerging AI use cases.

The wide variety of AI use cases translates into different types of workloads. Some require substantial compute resources, some need vast memory or storage, and many others have their own unique requirements. Servers are often over-provisioned to accommodate peak demands, which means resources sit idle during less demanding workloads, leading to inefficiency and unnecessary cost. With traditional architectures, communication between peripherals (e.g., CPU to GPU or GPU to memory) must often pass through the CPU, adding latency. And when multiple compute devices are needed to service a workload, data must often be replicated across them, further exacerbating bandwidth and storage overhead.

By leveraging PCI Express (PCIe) and CXL to enable disaggregated compute, systems can dynamically share memory, storage, and accelerators across multiple compute nodes, increasing utilization and avoiding the costs of over-provisioning. For example, CXL-enabled memory pooling allows multiple CPUs to access a shared memory pool, thereby maximizing memory utilization.
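To make the utilization argument concrete, the sketch below compares per-server DRAM provisioned for peak demand against a single shared CXL memory pool. The server count, working-set sizes, and pool sizing factor are hypothetical numbers chosen only to illustrate the arithmetic; they are not measurements.

```python
# Back-of-envelope comparison: dedicated per-server DRAM vs. a shared CXL memory pool.
# All capacities are hypothetical and serve only to illustrate the utilization argument.

peak_demands_gb = [512, 256, 768, 384]   # assumed per-server peak working sets
avg_demands_gb  = [128, 96, 300, 150]    # assumed per-server average working sets

# Traditional approach: each server carries enough DRAM for its own peak.
dedicated_capacity = sum(peak_demands_gb)
dedicated_utilization = sum(avg_demands_gb) / dedicated_capacity

# Disaggregated approach: servers draw from one pool sized below the summed peaks,
# assuming the peaks rarely coincide (here, 60% of the summed peaks).
pool_capacity = int(0.6 * sum(peak_demands_gb))
pooled_utilization = sum(avg_demands_gb) / pool_capacity

print(f"Dedicated DRAM:  {dedicated_capacity} GB provisioned, "
      f"{dedicated_utilization:.0%} average utilization")
print(f"CXL memory pool: {pool_capacity} GB provisioned, "
      f"{pooled_utilization:.0%} average utilization")
```

Under these assumed numbers, the same average demand is served with roughly 40% less provisioned capacity, which is the essence of the over-provisioning argument above.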

Compared to traditional Ethernet-based connectivity, which introduces latencies in the microsecond range, PCIe interconnects offer sub-microsecond latencies. For instance, a PCIe Gen 5 interconnect through retimers and switches can achieve device-to-device latencies on the order of a few hundred nanoseconds. Using PCIe peer-to-peer (P2P), devices such as GPUs can access memory or storage directly, without CPU intervention, further reducing latency across the PCIe fabric.
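The sketch below puts rough numbers on the P2P benefit by comparing a transfer bounced through host memory against a direct peer-to-peer transfer across a PCIe switch. The per-hop figures are assumptions chosen to be consistent with the "few hundred nanoseconds" fabric latency cited above, not vendor specifications.

```python
# Rough latency budget: transfer bounced through the host vs. PCIe peer-to-peer (P2P).
# Per-hop numbers are assumptions for illustration only, not measured values.

FABRIC_HOP_NS = 150   # assumed device-to-device traversal through switch and retimers
HOST_STAGE_NS = 700   # assumed cost of staging the data through the CPU and host DRAM

# Bounced path: device A -> host memory -> device B (two fabric traversals plus staging).
bounced_ns = 2 * FABRIC_HOP_NS + HOST_STAGE_NS

# P2P path: device A -> switch -> device B, with no CPU involvement.
p2p_ns = FABRIC_HOP_NS

print(f"Host-bounced transfer: ~{bounced_ns} ns")  # ~1000 ns, approaching Ethernet-class latency
print(f"P2P transfer:          ~{p2p_ns} ns")      # a few hundred ns across the PCIe fabric
```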

CXL interconnects can reduce latency further compared with a traditional PCIe interconnect. CXL provides a lower-latency path via CXL.mem when the CXL controller architecture is optimized for latency. A typical PCIe controller round-trip latency is between 30-40 ns, and a latency-optimized CXL controller can cut this by more than 50%. With latency-optimized CXL controllers at both the root port and the endpoint of a representative disaggregated system, a rough estimate of 40 ns of latency reduction can be realized versus PCIe.
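That rough 40 ns figure can be reproduced with simple arithmetic. The sketch below takes the upper end of the quoted 30-40 ns controller round trip and a conservative 50% reduction, applied at both the root port and the endpoint; both inputs come straight from the paragraph above.

```python
# Reproducing the rough ~40 ns estimate from the figures quoted above.

pcie_controller_roundtrip_ns = 40   # upper end of the 30-40 ns range cited above
cxl_latency_reduction = 0.50        # "greater than 50%" reduction, taken conservatively as 50%

savings_per_controller_ns = pcie_controller_roundtrip_ns * cxl_latency_reduction
num_controllers = 2                 # root port + endpoint on the disaggregated link

total_savings_ns = savings_per_controller_ns * num_controllers
print(f"Estimated latency reduction vs. PCIe: ~{total_savings_ns:.0f} ns")  # ~40 ns
```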

The reach of PCIe signaling is also a significant constraint in traditional setups. PCIe Gen 5 and Gen 6, for instance, support trace lengths up to approximately 14 inches (~35 cm) on a PCB. Extending connectivity beyond this range typically requires additional hardware such as retimers or switches, which adds complexity and cost. There is a growing desire to extend PCIe and CXL beyond single servers to enable rack-to-rack or data center-wide connectivity. Technologies like CopprLink for PCIe Gen 6 support up to 2 meters of copper cabling, and optical interconnects can stretch up to 100 meters, sufficient to cover entire racks or data center rows.

While data-centric disaggregation promises numerous benefits, several challenges must be addressed:

  • Signal Integrity and Reach: PCIe 6.0’s adoption of PAM4 signaling doubles the data rate but also makes signal integrity harder to maintain. Keeping the Bit Error Rate (BER) acceptable over longer distances requires advanced retimers and equalization techniques.
  • Cabling and Connectivity: Solutions like CopprLink enable ~2 meters of connectivity at Gen 6 speeds. Additionally, PCIe 7.0 is expected to introduce enhanced provisions for optical signaling, supporting data transfers over 100 meters at 128 GT/s.
  • Latency Considerations: Reducing latency is critical for AI applications. New PCIe features such as Unordered I/O and Port-Based Routing minimize latency by bypassing traditional ordering rules and providing more efficient path selection between endpoints. Peer-to-Peer (P2P) allows PCIe/CXL devices to directly access memory in other PCIe/CXL devices on the Flex Bus without involving the host processor.

Rambus can help system and chip designers address these challenges with a broad suite of PCIe and CXL IP solutions enabling next-generation performance in data center computing.
