CXL: Sorting Out The Interconnect Soup

How Compute Express Link provides a means of connecting a wide range of heterogeneous computing elements.

In the webinar Hidden Signals: Memory and Interconnect Decisions for AI, IoT and 5G, Shane Rau of IDC and Rambus Fellow Steven Woo discussed how interconnects are a critical enabling technology for future computing platforms. One of the major complications they identified was the “interconnect soup” of numerous and divergent interface protocols. The Compute Express Link (CXL) standard promises to sort out much of that complexity by providing a means of interconnecting a wide range of heterogeneous computing elements, including CPUs, GPUs, systems on chip (SoCs), memory and more.

CXL, now at the 2.0 specification level, provides ultra-low-latency links and memory coherency between devices. It promises the performance needed by Artificial Intelligence/Machine Learning (AI/ML) and other compute-intensive workloads. Further, it builds upon the enormous momentum of PCI Express (PCIe) technology by adopting the PCIe 5.0 PHY as its physical interface. With next-generation server platforms from Intel and AMD arriving with PCIe 5.0 support in about a year’s time, a growing ecosystem of PCIe 5.0 devices will follow in the marketplace.

To support a broad range of heterogeneous computing use cases, the CXL standard defines three protocols: CXL.io, CXL.cache and CXL.mem. CXL.io provides a non-coherent load/store interface for I/O devices and can be used for discovery, enumeration, and register accesses. CXL.cache enables devices such as accelerators to efficiently access and cache host memory for improved performance. Combining CXL.io and CXL.cache makes the following use model possible: an accelerator-based NIC can coherently cache host memory on the accelerator, perform networking or other functions, and then pass ownership of the memory back to the CPU for additional processing.
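To make that handoff concrete, here is a minimal conceptual sketch in C of the accelerator-NIC flow described above. It is not a real CXL driver API; the cxl_cache_acquire()/cxl_cache_release() helpers and the ownership field are hypothetical stand-ins for the coherent ownership transfer that CXL.cache performs in hardware.

```c
/* Conceptual sketch only (not a real CXL API): the device caches a host
 * buffer via CXL.cache, operates on it, then hands ownership back to the
 * host CPU. The acquire/release helpers below are hypothetical. */
#include <stdio.h>
#include <string.h>

typedef enum { OWNER_HOST, OWNER_DEVICE } owner_t;

typedef struct {
    char    data[64];   /* host memory buffer */
    owner_t owner;      /* who currently holds coherent ownership */
} host_buffer_t;

/* Hypothetical: accelerator gains coherent ownership over CXL.cache. */
static void cxl_cache_acquire(host_buffer_t *buf) { buf->owner = OWNER_DEVICE; }

/* Hypothetical: accelerator flushes its cached copy and returns ownership. */
static void cxl_cache_release(host_buffer_t *buf) { buf->owner = OWNER_HOST; }

int main(void) {
    host_buffer_t rx = { .owner = OWNER_HOST };

    cxl_cache_acquire(&rx);                       /* NIC caches host memory       */
    strcpy(rx.data, "packet processed on NIC");   /* networking function on device */
    cxl_cache_release(&rx);                       /* ownership passes back to CPU  */

    printf("CPU sees: %s (owner=%s)\n", rx.data,
           rx.owner == OWNER_HOST ? "host" : "device");
    return 0;
}
```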

The CXL.mem protocol enables a host, such as a processor, to access memory attached to a device using load/store commands. This enables some very compelling use cases. Using CXL.mem and CXL.io, one or more processors can be connected via CXL to a memory buffer device to access banks of DDR, LPDDR or other memory types, including non-volatile memory. This provides enormous architectural flexibility by giving processors access to greater memory capacity and bandwidth.
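As a rough illustration, assuming the operating system has already onlined a CXL memory expander as an additional NUMA node (node 1 here is an assumption; on Linux, numactl -H shows the actual topology), software can reach that extra capacity with ordinary loads and stores:

```c
/* Sketch, assuming a CXL-attached memory expander exposed as NUMA node 1.
 * Once onlined, the host accesses it with plain load/store instructions. */
#include <numa.h>      /* link with -lnuma */
#include <stdio.h>
#include <stdlib.h>

#define CXL_NODE 1                        /* assumed node of the CXL memory */
#define BUF_SIZE (64UL * 1024 * 1024)

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA support not available\n");
        return EXIT_FAILURE;
    }

    /* Allocate directly on the CXL-attached node. */
    char *buf = numa_alloc_onnode(BUF_SIZE, CXL_NODE);
    if (!buf) {
        perror("numa_alloc_onnode");
        return EXIT_FAILURE;
    }

    /* Ordinary stores -- CXL.mem makes the expander look like system memory. */
    for (size_t i = 0; i < BUF_SIZE; i += 4096)
        buf[i] = (char)(i & 0xff);

    printf("Touched %lu MiB on NUMA node %d\n", BUF_SIZE >> 20, CXL_NODE);
    numa_free(buf, BUF_SIZE);
    return 0;
}
```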

Coherently sharing memory resources between computing devices, such as a CPU and an AI accelerator, is enabled by using all three protocols together. For instance, a server with a CXL-connected accelerator card would allow the accelerator to coherently cache host memory while also sharing its attached memory, such as high-bandwidth HBM2E devices, between the accelerator and the host server CPU.

The performance of CXL benefits greatly from the 32 gigatransfers per second (GT/s) per-lane signaling rate of PCIe 5.0, a doubling of the data rate over the previous generation. As we have seen in the past, doubling the speed of a PCIe link improves performance but increases the complexity of the analog circuits and accompanying digital cores. On the analog side, complexity is driven in no small part by the growing number of signal and power integrity issues that emerge at higher speeds.
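For a sense of scale, a back-of-the-envelope calculation for an assumed x16 link at the PCIe 5.0 rate, accounting only for the 128b/130b line encoding, lands at roughly 63 GB/s of raw bandwidth per direction:

```c
/* Back-of-the-envelope PCIe 5.0 / CXL raw bandwidth, assuming an x16 link
 * and 128b/130b line encoding (protocol and packet overheads are ignored). */
#include <stdio.h>

int main(void) {
    const double gt_per_s  = 32.0;          /* PCIe 5.0: 32 GT/s per lane     */
    const int    lanes     = 16;            /* assumed x16 link               */
    const double encoding  = 128.0 / 130.0; /* 128b/130b line-code efficiency */

    double gbytes_per_s = gt_per_s * lanes * encoding / 8.0;  /* per direction */
    printf("~%.1f GB/s per direction\n", gbytes_per_s);       /* ~63 GB/s      */
    return 0;
}
```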

Rambus offers a PCIe 5.0 PHY with CXL standard support that is ideal for performance-intensive workloads such as AI/ML. Working with an interface IP vendor such as Rambus, designers have access to a high-performance CXL/PCIe 5.0 PHY that benefits from Rambus’ over 30 years of high-speed signaling expertise and over a decade of implementing PCIe solutions.

Additional Resources:
Website: PCIe 5.0/CXL SerDes PHY
