Enabling processing units to access remote memory in other server units or racks.
By Keivan Javadi Khasraghi and Ruben Sousa
Data centers face a growing need to expand their bandwidth capacity. This surge is largely driven by new technologies, especially the rising demand for AI and machine learning applications. As these technologies advance, bandwidth requirements are projected to grow dramatically. As large language models (LLMs) become more accurate and comprehensive, they need progressively more compute and storage, which in turn require faster interfaces with the lowest possible latency. This rising demand for fast LLM processing has exposed inefficiencies within data centers. This article explores PCIe 7.0 over optics, a promising technology to meet the growing bandwidth and power-efficiency demands of data centers.
PCIe, the interface of choice within server racks, links resources together through copper cables or a backplane. With more than six generations deployed and ratification of the PCIe 7.0 specification approaching, PCIe will remain a key player in high-speed interconnects.
Fig. 1: PCIe 7.0 is critical to enable the next-generation of data center interconnects.
The increasing demands placed on data centers have exposed several critical challenges, from resource limitations and latency to energy consumption and scaling difficulties. Understanding and addressing these obstacles is essential for optimizing data center performance and ensuring they can meet future demands.
Current data centers are experiencing efficiency challenges due to limited memory bandwidth and poor memory utilization. Restricting processors to local memory not only limits the speed of data processing but also leads to underutilization of data center memory. This occurs despite processors evolving to include more, and faster, cores.
Latency currently poses a significant bottleneck for most AI/ML applications. Transferring data at high rates with complex modulation schemes over copper cables and backplanes requires advanced equalization techniques and algorithms such as forward error correction (FEC), but these further contribute to system latency.
Power is the scarcest resource in data centers, and current technologies demand power-hungry chips. An estimated 25% of a data center’s total power is used solely for point-to-point data transfers. As the need for data transfer grows, particularly with the advent of AI/ML applications, this energy consumption is predicted to rise dramatically.
Demand for data transfer and data processing keeps rising with emerging requirements and technologies, which directly translates into a need for more memory and faster memory access. Data center growth requires the network architecture to scale accordingly, so designing networks that can scale without excessive cost becomes very important. The ability to scale resources up or down based on demand is crucial for AI workloads, which can be highly variable.
Optical links offer higher bandwidth density compared to electrical links. Initially, PCIe interfaces were developed to be utilized over copper, DAC, and PCB interconnects. However, as data rates increase and electrical losses escalate, this approach is becoming less appealing.
Optical links have the advantage of covering longer distances. Resource limitations, particularly memory constraints, are becoming increasingly challenging to address using the current architecture of PCIe over copper, which only permits access to local memory. Optical technology, however, can overcome this limitation by enabling processing units to access remote memory units in other server units or racks. This is beneficial for resource pooling or sharing over a CXL switch and other similar applications.
When it comes to energy efficiency and cost-effectiveness over longer distances, optical links excel. They are far less lossy than electrical links, so they require fewer retimers and signal-conditioning units over the same distances. Additionally, the use of low-cost, high-yield optical components could further reduce cost per unit distance. Copper interconnects, on the other hand, occupy considerable space and are not suitable for dense data centers. In contrast, optical fibers are more flexible and take up less space, making them a better option for increasing density in data centers.
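As a rough illustration of why loss budgets favor optics, the sketch below estimates how many retimers a link would need for a given per-meter channel loss and per-segment loss budget. The loss and budget figures are hypothetical placeholders for illustration only, not values from the PCIe specification.

```python
import math

def retimers_needed(distance_m: float, loss_db_per_m: float, budget_db: float) -> int:
    """Split the link into segments that each stay within the loss budget;
    one retimer sits between every pair of adjacent segments."""
    total_loss_db = distance_m * loss_db_per_m
    segments = math.ceil(total_loss_db / budget_db)
    return max(segments - 1, 0)

# Hypothetical figures: a lossy copper channel vs. a fiber channel.
print(retimers_needed(3, 16.0, 32.0))      # copper: 48 dB over 3 m -> 1 retimer
print(retimers_needed(100, 0.0003, 32.0))  # fiber: ~0.03 dB over 100 m -> 0 retimers
```

With illustrative numbers like these, even a short copper run exhausts its budget and needs retiming, while a much longer fiber run does not, which is the intuition behind the retimer and power savings described above.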
Finally, linear or direct-drive optical links can also help reduce latency and power consumption. Different optical architectures can be employed for PCIe over optics, leading to improved latency. For instance, linear direct-drive optics avoids an extra retimer in the link, resulting in reduced latency.
Figure 2 shows PCIe over optics use cases for intra-rack and rack-to-rack data center configurations based on requirements from the Open Compute Project (OCP). These applications span compute, storage, accelerator, and memory connectivity scenarios for NVMe- and CXL-enabled disaggregated data centers.
Fig. 2: OCP General PCIe connectivity intra rack & rack to rack.
The PCIe interface was not originally conceived with optical compatibility in mind. Applications of PCIe interconnects, such as CPU to CPU, GPU to GPU, and GPU to memory, were typically addressed using the current PCIe PHY and controller, from the root complex to the endpoint, via copper-based channels. Consequently, transitioning from PCIe over electrical channels to PCIe over optics is not straightforward and has its own challenges.
The first challenge lies in meeting PCIe electrical compliance. This requires clearly defined compliance specifications to ensure interoperability, as well as backward compatibility over optical links. The second challenge concerns support for the PCIe protocol over optics. This may necessitate alterations to the existing protocol to accommodate optical technology. These changes might encompass aspects such as Rx detection (where impedance is currently used to determine whether the remote electrical receiver is ready for traffic, a method not compatible with optics), management of electrical IDLE states, performance of SSC clocks with optics, and handling of sideband signals.
The PCI-SIG Optical Working Group was established in August 2023 to tackle the challenges of adopting PCIe optical technologies. Synopsys is actively involved in these discussions, helping contribute to the advancement toward “optical-friendly” PCIe standards.
The retimed topology is a key approach, in which a maximum of two retimers is permitted within an end-to-end link. Important aspects to consider within this topology include the strategic placement and the exact number of retimers deployed.
Conversely, the non-retimed or linear topology introduces a more complex set of challenges. This is primarily because a linear link disrupts the continuity of the path, making it more difficult to reconcile with the existing PCIe standards and compliance stipulations. Effective regulation of channel losses is paramount in this topology. Moreover, it may necessitate substantial alterations to the protocol layer, and potentially to the PHY layer as well. A comprehensive feasibility study with all types of optical engines is also a critical aspect of this topology.
Fig. 3: Different topologies to enable PCIe over optics.
In addition to link topology, other critical elements such as form-factor standardization and FEC schemes must be considered to successfully establish a PCIe link over optics. Currently, form factors such as CDFP, OSFP, QSFP, and QSFP-DD, among others, are being evaluated, with the advantages and disadvantages of each carefully weighed. The same is happening in the FEC discussions, where concatenated FEC architectures are being considered to relax the optical PMD requirements or extend its reach while providing low latency for the overall system.
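One cost of concatenating an inner code on top of an outer FEC is line-rate overhead, which compounds multiplicatively. The sketch below shows the arithmetic; the overhead ratios are illustrative placeholders, not figures from any PCIe or optical specification.

```python
def line_rate_gbps(payload_gbps: float, outer_overhead: float,
                   inner_overhead: float) -> float:
    """Each code inflates the transmitted rate by its overhead ratio;
    concatenation compounds the two multiplicatively."""
    return payload_gbps * (1.0 + outer_overhead) * (1.0 + inner_overhead)

# Illustrative: a 6% outer FEC plus a 3% inner code on a 100 Gb/s payload.
print(round(line_rate_gbps(100.0, 0.06, 0.03), 2))  # 109.18
```

This is the trade the FEC discussions weigh: a light inner code can relax the optical PMD's error-rate requirement, but only at the price of a few percent more line rate and some added latency.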
At OFC 2024, Synopsys and OpenLight showcased the world’s first PCIe 7.0 data-rate demonstration over optics, using a linear drive approach, alongside a PCIe 6.x over optics demo. The demonstration achieved end-to-end link BER performance orders of magnitude better than the FEC threshold, showing the feasibility of PCIe 7.0 over optics running at 128 Gbps PAM4. This performance was achieved using discrete electrical and optical components to build the PCIe over optics link.
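A margin of "orders of magnitude better than the FEC threshold" can be quantified as a base-10 log ratio of the two bit error rates. The sketch below shows the calculation; the threshold and measured BER values are hypothetical and not taken from the demonstration.

```python
import math

def ber_margin_orders(measured_ber: float, fec_threshold_ber: float) -> float:
    """Decades of margin between the measured pre-FEC BER and the
    threshold BER at which the FEC can still correct the link."""
    return math.log10(fec_threshold_ber / measured_ber)

# Hypothetical: threshold 1e-4, measured 1e-7 -> about 3 decades of margin.
print(ber_margin_orders(1e-7, 1e-4))
```

A positive margin means the link operates comfortably below the correctable error rate; the larger it is, the more room remains for added channel impairment before the FEC limit is reached.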
Synopsys recently announced a complete PCIe 7.0 IP solution, consisting of controller, IDE security module, PHY and verification IP. This solution will enable designers to address the demanding bandwidth and latency requirements of transferring massive amounts of data for compute-intensive AI workloads while supporting broad ecosystem interoperability. Synopsys is demonstrating this technology at PCI-SIG DevCon in Santa Clara on June 12 and 13, 2024.
PCIe over optics is essential for establishing interconnectivity among rack units, enabling them to operate as a cluster. The role of PCIe is central because it acts as a controller, the digital logic that interfaces with the software stack. One of the major hurdles is ensuring the transition to optical PCIe does not disrupt the software stack’s control process. It is clear that PCIe over optics represents the future of signaling. Its development and adoption depend on the enablement of a supportive ecosystem, which Synopsys is actively pursuing. Synopsys’ complete IP solutions for PCIe, with ongoing interoperability demonstrations and strong field results for PCIe 7.0 and PCIe 6.x over optics, help reduce integration effort and risk and make first-pass silicon success possible.
Ruben Sousa is a senior director for SerDes System Architecture at Synopsys.