Building Scalable And Efficient Data Centers With CXL

The performance demands of generative AI and other advanced workloads will require new architectural solutions enabled by CXL.


The AI boom is giving rise to profound changes in the data center: demanding AI workloads are driving an unprecedented need for low-latency, high-bandwidth connectivity and for flexible access to more memory and compute power when needed. The Compute Express Link (CXL) interconnect offers data centers new ways to enhance performance and efficiency across CPUs, accelerators and storage, and to move toward a more disaggregated architectural approach.

Data centers face three major memory challenges as roadblocks to greater performance and lower total cost of ownership (TCO). The first is the limitation of the current server memory hierarchy, compounded by the limited amount of memory that can attach directly to the CPU. There is a three-order-of-magnitude latency gap between direct-attached DRAM (dynamic random access memory) and solid-state drive (SSD) storage. When a processor exhausts its direct-attached memory capacity, it must fall back to SSD, leaving the processor waiting. That waiting, or latency, dramatically degrades computing performance.
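A quick back-of-the-envelope calculation shows why that latency gap matters so much. The tier latencies below are illustrative round numbers, not measurements of any specific product, and the `avg_latency_ns` helper is a hypothetical function for this sketch:

```python
# Illustrative (not vendor-specific) access latencies for each memory tier.
DRAM_NS = 100       # direct-attached DRAM: on the order of 100 ns
CXL_NS = 250        # assumed CXL-attached memory tier: a few hundred ns
SSD_NS = 100_000    # NVMe SSD read: on the order of 100 us

def avg_latency_ns(hit_rate: float, hit_ns: float, miss_ns: float) -> float:
    """Average access latency when `hit_rate` of accesses hit the fast tier."""
    return hit_rate * hit_ns + (1.0 - hit_rate) * miss_ns

# The raw gap between DRAM and SSD is roughly three orders of magnitude:
print(f"DRAM-to-SSD gap: {SSD_NS / DRAM_NS:.0f}x")

# Even a 1% miss rate to SSD lets the misses dominate average latency,
# while missing to a CXL memory tier keeps the average close to DRAM:
print(f"avg, DRAM + SSD overflow: {avg_latency_ns(0.99, DRAM_NS, SSD_NS):.0f} ns")
print(f"avg, DRAM + CXL overflow: {avg_latency_ns(0.99, DRAM_NS, CXL_NS):.1f} ns")
```

With these assumed numbers, just 1% of accesses spilling to SSD raises the average access latency by roughly 10x, whereas spilling to a CXL-attached tier barely moves it, which is the essence of the tiering argument.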

Second, core counts in multi-core processors are scaling far faster than the bandwidth and capacity that main memory channels can support. Beyond a certain core count, processor cores are starved for memory bandwidth, diminishing the benefit of additional cores. Finally, the broad spectrum of workloads run on data center servers does not take advantage of the full density of direct-attached DRAM, leading to underutilized, or stranded, memory resources.

CXL is the broadly supported industry standard developed to provide low-latency, cache-coherent links between processors, accelerators and memory devices. CXL leverages PCI Express (PCIe), which is ubiquitous in data centers, for its physical layer. With CXL, new memory tiers can be implemented that bridge the gap between direct-attached memory and SSD to unlock the full power of multi-core processors. In addition, CXL's memory cache coherency allows memory resources to be shared between processors and accelerators. Sharing memory on demand is key to tackling the stranded memory resource problem.

CXL, now at the 3.1 specification level, builds on the enormous momentum of PCIe technology. CXL 1.1/2.0 use the PCIe 5.0 PHY operating at 32 Gigatransfers per second (GT/s). CXL 3.1 scales signaling to 64 GT/s using PCIe 6.1.
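The signaling rates above translate directly into link bandwidth. The sketch below computes raw unidirectional bandwidth from the transfer rate and lane count; it deliberately ignores FLIT framing and protocol overhead, so real delivered throughput will be somewhat lower, and the `raw_bandwidth_gbps` helper is an illustrative function, not part of any CXL API:

```python
def raw_bandwidth_gbps(gt_per_s: float, lanes: int) -> float:
    """Raw unidirectional bandwidth in GB/s: each transfer carries one bit
    per lane, and 8 bits make a byte. Encoding/FLIT overhead is ignored."""
    return gt_per_s * lanes / 8

# CXL 1.1/2.0 on a PCIe 5.0 PHY at 32 GT/s, x16 link:
print(raw_bandwidth_gbps(32, 16))  # 64.0 GB/s per direction
# CXL 3.1 on a PCIe 6.1 PHY at 64 GT/s, x16 link:
print(raw_bandwidth_gbps(64, 16))  # 128.0 GB/s per direction
```

Doubling the per-lane rate from 32 to 64 GT/s doubles raw x16 bandwidth from 64 GB/s to 128 GB/s per direction, which is what makes the 3.1 generation attractive for bandwidth-starved multi-core hosts.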

To support a broad range of computing use cases, the CXL standard defines three protocols: CXL.io, CXL.cache and CXL.memory. CXL.io provides a non-coherent load/store interface for IO devices and is used for discovery, enumeration and register accesses. It is functionally equivalent to the PCIe protocol. CXL.io is the foundational communication protocol and as such is applicable to all use cases. CXL.cache enables devices such as accelerators to efficiently access and cache host memory for improved performance. As an example, using CXL.io plus CXL.cache, the performance of workloads shared between an accelerator-based NIC and a host CPU can be improved with local caching of data in the accelerator's attached memory. The CXL.memory protocol enables a host, such as a processor, to access device-attached memory using load/store commands. This enables several compelling CXL memory expansion and pooling use cases.
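The CXL specification groups devices into three types according to which of these protocols they combine. The mapping below summarizes that taxonomy as a small sketch; the dictionary and example device descriptions are illustrative, not an API:

```python
# CXL device types and the protocol combination each uses, per the
# device-type taxonomy in the CXL specification (examples are illustrative).
CXL_DEVICE_TYPES = {
    "Type 1 (caching device, e.g. accelerator NIC)": {"CXL.io", "CXL.cache"},
    "Type 2 (accelerator with device-attached memory)": {"CXL.io", "CXL.cache", "CXL.memory"},
    "Type 3 (memory expander / pooled memory device)": {"CXL.io", "CXL.memory"},
}

for device, protocols in CXL_DEVICE_TYPES.items():
    # CXL.io is the foundational protocol: every device type includes it.
    assert "CXL.io" in protocols
    print(f"{device}: {sorted(protocols)}")
```

Note that only Type 3 devices, the memory expansion and pooling case discussed above, omit CXL.cache: they expose memory to the host rather than caching host memory themselves.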

All three CXL protocols are secured via integrity and data encryption (IDE), which provides confidentiality, integrity and replay protection. To meet the high-speed data rate requirements of CXL without introducing additional latency, IDE is implemented in hardware-level secure protocol engines instantiated in the CXL host and device chips.

Rambus CXL IP solutions are designed to deliver the throughput, scalability and security of the latest CXL standard for innovative chip designs. The new Rambus CXL 3.1 Controller IP is a flexible design suitable for both ASIC and FPGA implementations, with a built-in, zero-latency integrity and data encryption (IDE) module that protects against attacks on the CXL and PCIe links.

Join me at our February webinar “Unlocking the Potential of CXL 3.1 and PCIe 6.1 for Next-Generation Data Centers” to learn how CXL and PCIe interconnects will be key to building scalable and efficient data center infrastructures.
