Link reliability and error minimization are crucial, as a single data packet error can significantly impact throughput.
Large language models (LLMs) are experiencing an explosive growth in parameter count. Training these ever-larger models requires multiple accelerators to work together, and the bandwidth between these accelerators directly limits the size of trainable LLMs in High Performance Computing (HPC) environments.
The correlation between LLM size and interconnect data rates heralds a new era of AI experiences, made possible by the bandwidth and low latency provided by silicon-proven 224G Ethernet PHY technology. To enable the next generation of compute throughput and support the training of ever larger and more sophisticated LLMs, the 224G PHY must be widely adopted and deployed across the entire AI ecosystem. This includes integrating the technology into an array of critical components, such as accelerators, processors, retimers, switches, network interface cards (NICs), data processing units (DPUs), and optical digital signal processors (DSPs).
This article presents a comprehensive analysis of 3nm silicon results and their significance for high-performance compute disaggregation, a critical milestone for achieving 1.6Tbps design success while the electrical specification for 224G technology is still in the drafting phase.
Fig. 1: Tx Eye diagram of 3nm 224G PHY demonstrating high linearity and low jitter.
In the high-performance computing environment, link reliability and error minimization are crucial. In scale-out compute disaggregation, a single data packet error may force the entire packet to be resent, significantly impacting throughput. Minimizing packet errors is therefore essential.
This is where the forward error correction (FEC) implemented within 224G PHYs plays a vital role. The FEC adds redundant data to each packet, allowing the receiving end to detect and correct errors without requiring a full resend. For links using backplane channels, copper twin-ax cables, or direct-detect optics, Reed-Solomon RS(544,514) FEC is used. It adds 544 - 514 = 30 parity symbols, equaling 300 bits (as each symbol is 10 bits).
The FEC can correct errors only when the number of symbols in error does not exceed a certain limit. If the number of parity symbols is x, then RS-FEC can correct up to x/2 symbols. If more than x/2 symbols contain errors, the codeword is uncorrectable. For RS(544,514), x = 30, so up to x/2 = 15 symbol errors per codeword are correctable.
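The RS(544,514) arithmetic above can be verified in a few lines; the only inputs are the codeword parameters already stated in the text.

```python
# RS(544,514) "KP4"-style FEC parameters as described above.
n, k = 544, 514          # total symbols per codeword, data symbols
symbol_bits = 10         # each RS symbol is 10 bits wide

parity_symbols = n - k                       # redundant symbols added
parity_bits = parity_symbols * symbol_bits   # redundancy in bits
t = parity_symbols // 2                      # correctable symbols per codeword

print(parity_symbols, parity_bits, t)  # 30 300 15
```

A codeword with up to `t` (15) symbol errors is fully recovered; one with 16 or more is flagged uncorrectable.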
The performance of RS(544,514) FEC can be analyzed using symbol error bins, or buckets. These bins represent the number of symbol errors in an RS codeword, also referred to as the "bin count". As the RS(544,514) code can correct up to 15 symbol errors per codeword, codewords with 15 or fewer symbol errors are successfully corrected, while those with more than 15 errors cannot be.
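The binning described above can be sketched as follows. The per-codeword error counts here are hypothetical illustration data, not the silicon measurements shown in Figure 2.

```python
from collections import Counter

def fec_bin_counts(symbol_errors_per_codeword, t=15):
    """Bucket codewords by symbol-error count ("bin count").
    Codewords with more than t symbol errors are uncorrectable,
    i.e. they become post-FEC errors."""
    bins = Counter(symbol_errors_per_codeword)
    uncorrectable = sum(count for errs, count in bins.items() if errs > t)
    return bins, uncorrectable

# Hypothetical run: most codewords are error-free, a few have 1-4 errors.
observed = [0] * 9990 + [1] * 6 + [2] * 2 + [3, 4]
bins, bad = fec_bin_counts(observed)
print(dict(bins), bad)  # every bin is <= 4 errors -> zero post-FEC errors
```

Because no bin exceeds the correction limit of 15, all codewords in this sample are recoverable.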
Figure 2 shows an illustrative silicon result of the bin distribution for the 224G 3nm PHY. The plot demonstrates that no codeword experienced more than 4 symbol errors, indicating zero post-FEC errors with substantial margin.
Fig. 2: FEC symbol errors per RS(544,514) codeword.
Standards bodies conduct technical analysis to establish the relationship between pre- and post-FEC performance. This analysis determines the pre-FEC (raw PHY) BER required to achieve an error-free link after FEC is applied. Currently, industry standards are targeting a pre-FEC BER of 1e-4 (1 error in 10,000 bits) as the threshold for ensuring error-free operation post-FEC.
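A raw-BER compliance check against that 1e-4 target can be sketched as below; the error and bit counts are hypothetical.

```python
# Pre-FEC BER threshold targeted by the draft specifications (per the text).
PRE_FEC_BER_SPEC = 1e-4

def meets_pre_fec_spec(bit_errors: int, bits_transferred: int) -> bool:
    """Return True if the measured raw (pre-FEC) BER is at or below
    the 1e-4 threshold for error-free post-FEC operation."""
    ber = bit_errors / bits_transferred
    return ber <= PRE_FEC_BER_SPEC

# Hypothetical measurement: 3 bit errors over 1e12 bits -> BER = 3e-12.
print(meets_pre_fec_spec(3, 10**12))  # True
```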
Two SoCs implementing 224G SerDes can be interconnected through diverse channels, with channel loss ranging from the low teens to over 40dB. To ensure an error-free link, it is crucial to maintain a compliant pre-FEC raw BER. Figure 3 illustrates the performance of 3nm 224G silicon across channels ranging from 13dB to 42dB, demonstrating margins of 100,000x to 1,000,000,000x better than the specification.
Fig. 3: BER with varying channel loss.
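The margin figures quoted above follow directly from the ratio of the 1e-4 specification to the measured raw BER. A minimal sketch, using hypothetical measured BER values consistent with the stated margins:

```python
# Margin = spec BER / measured BER; the measured values below are
# hypothetical examples, not the silicon data plotted in Figure 3.
SPEC_BER = 1e-4

def ber_margin(measured_ber: float) -> float:
    """How many times better the measured raw BER is than the 1e-4 target."""
    return SPEC_BER / measured_ber

print(f"{ber_margin(1e-9):.0e}")   # 1e+05 -> 100,000x margin
print(f"{ber_margin(1e-13):.0e}")  # 1e+09 -> 1,000,000,000x margin
```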
Tail latency refers to a small percentage of response times from a system that are significantly longer than the median response time. A consistent BER over time ensures that the communication link remains stable and reliable over extended periods of operation, which is essential for maintaining low tail latency in accelerator-to-accelerator links for compute disaggregation. For a given channel, figure 4 shows consistent BER over multiple reads.
Fig. 4: BER over time.
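The tail-latency definition above can be made concrete with a small numeric example: compare the median response time with a high percentile on a sample where a handful of responses are much slower than the rest. The latency values are hypothetical.

```python
import statistics

def tail_latency(samples, pct=99.9):
    """Nearest-rank percentile: the response time that pct% of
    samples fall at or below."""
    s = sorted(samples)
    idx = min(len(s) - 1, round(len(s) * pct / 100) - 1)
    return s[idx]

# Hypothetical link latencies (microseconds): mostly fast, three slow outliers.
latencies_us = [10.0] * 997 + [250.0, 400.0, 900.0]

print(statistics.median(latencies_us), tail_latency(latencies_us))
# median 10.0 vs p99.9 tail 400.0
```

Even though the median is low, the rare slow responses dominate the tail, which is why consistent link BER matters for disaggregated compute.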
OIF’s Common Electrical I/O-224G-Linear builds on the CEI-112G-Linear approach and will support 224G full linear optical modules for next-generation applications. The linear interface eliminates the requirement of retiming the electrical signal before driving the optical components. This approach not only saves power by eliminating the additional retiming function but also reduces mission-mode latency. In this scheme, the DSP needed to compensate for signal degradation due to optical transmission is leveraged from the electrical PHY driving the photonic components. Figure 5 shows an implementation of a 224G linear electro-optical-electrical link.
Fig. 5: Linear Electro Optical Electrical Link with 224G PHY.
A demonstration of this technology at OFC 2024, showing Synopsys 224G TX-over-optics performance with OpenLight’s 200G-per-lambda PIC and enabling linear/direct drive without retimers, can be viewed here.
The 224G technology is a crucial enabler for next-generation high-bandwidth applications, including 1.6Tbps Ethernet, Ultra Ethernet, and advanced accelerator-to-accelerator links. A silicon-proven PHY is therefore essential for the entire compute disaggregation ecosystem to harness the full potential of these higher bandwidths. This advancement not only supports current demands for data-intensive applications but also paves the way for future innovations in high-performance computing, AI/ML workloads, and data center architectures.
In addition to providing the world’s first complete 1.6T Ethernet IP solution, the silicon-proven Synopsys 224G Ethernet PHY IP supports a variety of channels and process nodes for next-generation accelerator-to-accelerator links. Learn more here.