Link layer retry fixes packet loss locally and avoids expensive recovery mechanisms.
As Artificial Intelligence (AI) and High-Performance Computing (HPC) systems become the backbone of modern data centers, they generate and consume a massive amount of data. Traditional Ethernet was not built for such high-bandwidth traffic.
In HPC and AI workloads, computation is distributed across many nodes, and data must be shared in real time over low-latency, lossless links. Because all the processes are synchronized with one another, even a slight delay or packet loss can slow down the whole system. Packet loss therefore causes major performance degradation, and traditional recovery mechanisms at higher layers are too slow.
Hence, the Ultra Ethernet Consortium (UEC) is introducing a Link Layer Retry (LLR) mechanism to avoid costly recovery at higher layers.
In traditional Ethernet, packet loss is usually handled at higher layers, for example by TCP, which adds significant latency. Alternatives exist, such as RDMA as used in InfiniBand or RoCE, but they are complex and vendor-specific.
LLR offers a middle ground: it provides reliable loss recovery over existing Ethernet without requiring RDMA, which makes for a simpler solution.
In LLR, packet loss is detected and the affected frames are retried at the local link layer instead of involving the whole protocol stack and the CPU, resulting in better throughput and lower latency.
In layman’s terms, the sender keeps a copy of each transmitted frame in a local retry buffer and retransmits only the lost or corrupted frames from that buffer, avoiding transport-level retransmission. Once the sender receives the ACKs, it removes the corresponding entries from the local retry buffer.
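The retry-buffer behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not the UEC-specified mechanism: the class name `RetrySender`, the per-frame `on_ack`/`on_nack` calls, the sequence numbering, and the buffer depth of 4 are all assumptions made for this example, and the actual specification may signal and replay frames differently.

```python
from collections import OrderedDict


class RetrySender:
    """Toy link-layer sender that keeps unacknowledged frames in a retry buffer.

    Hypothetical model for illustration; names and semantics are assumptions.
    """

    def __init__(self, capacity=4):
        self.capacity = capacity            # assumed buffer depth, in frames
        self.next_seq = 0                   # sequence number for the next new frame
        self.retry_buffer = OrderedDict()   # seq -> payload, oldest first

    def send(self, payload):
        """Store a copy of a new frame and return (seq, payload) for the wire."""
        if len(self.retry_buffer) >= self.capacity:
            raise BufferError("retry buffer full; wait for ACKs")
        seq = self.next_seq
        self.retry_buffer[seq] = payload
        self.next_seq += 1
        return seq, payload

    def on_ack(self, seq):
        """The peer received frame `seq` intact: remove its buffer entry."""
        self.retry_buffer.pop(seq, None)

    def on_nack(self, seq):
        """The peer reported frame `seq` lost or corrupted: resend the local copy."""
        return seq, self.retry_buffer[seq]


sender = RetrySender()
for data in (b"a", b"b", b"c"):
    sender.send(data)

sender.on_ack(0)              # frame 0 arrived, so its entry is removed
resent = sender.on_nack(1)    # frame 1 was lost, so it is replayed from the buffer
sender.on_ack(1)              # the retransmission arrived
sender.on_ack(2)
```

In this sketch the retransmission never leaves the link: the higher layers do not see the loss of frame 1 at all, which is the point of recovering locally.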
UEC outlines a layered Ethernet stack consisting of:
LLR sits above the PHY and below the Ultra Ethernet Transport (UET) layer, providing loss resilience before the higher layers get involved.
By fixing packet loss locally, Link Layer Retry avoids expensive recovery mechanisms and provides a more open, scalable, and efficient solution for AI networking and HPC.
With the availability of the Cadence Verification IP for Ethernet UEC, adopters can start working with these specifications immediately, ensuring compliance with the standard and achieving the fastest path to IP and SoC verification closure. Incorporating the latest protocol updates, the mature and comprehensive Cadence Verification IP (VIP) for the Ethernet protocol provides a complete bus functional model (BFM), integrated automatic protocol checks, and a coverage model. Designed for easy integration in test benches at the IP, system-on-chip (SoC), and system levels, the VIP for Ethernet helps you reduce the time to test, accelerate verification closure, and ensure end-product quality. The VIP for Ethernet runs on all major simulators and supports the SystemVerilog and e verification languages and associated methodologies, including the Universal Verification Methodology (UVM).