SPONSOR BLOG

Accelerating AI And ML Applications With PCIe 5

Key aspects of building a PCIe 5 implementation that can meet the rapidly growing demands of cloud computing and AI/ML applications.

January 16th, 2020 - By: Suresh Andani

The rapid adoption of sophisticated artificial intelligence/machine learning (AI/ML) applications and the shift to cloud-based workloads have significantly increased network traffic in recent years. Historically, the intensive use of virtualization ensured that server compute capacity adequately met the needs of heavy workloads. This was achieved by partitioning a single physical server into multiple virtual servers to extend and optimize utilization. However, this paradigm can no longer keep up with the AI/ML applications and cloud-based workloads that are quickly outpacing server compute capacity.

AI & ML applications
AI workloads – including machine learning and deep learning – require a new generation of computing architectures. This is because AI applications generate, move and process massive amounts of data at real-time speeds. For example, a smart car generates around 4TB of data per day, while AI and ML training model sizes continue to double approximately every 3-4 months!
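To put the doubling figure in perspective, here is a minimal sketch of the compound-growth arithmetic. The function name `growth_factor` and the 24-month horizon are illustrative assumptions, not figures from the article; only the 3-4 month doubling period comes from the text.

```python
def growth_factor(elapsed_months: float, doubling_months: float) -> float:
    """Multiplicative growth after `elapsed_months` for a quantity
    that doubles every `doubling_months`."""
    return 2.0 ** (elapsed_months / doubling_months)


if __name__ == "__main__":
    horizon = 24  # assumed comparison window, in months
    for period in (3, 4, 24):
        factor = growth_factor(horizon, period)
        print(f"doubling every {period} months: x{factor:.0f} over {horizon} months")
```

Under these assumptions, a quantity that doubles every 3 to 4 months grows by a factor of 64 to 256 over two years, while one that doubles every two years grows by a factor of 2 in the same window.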

AI applications across multiple verticals demand significant amounts of memory bandwidth to support the processing of extremely large data sets. Moreover, rather than relying on traditional multi-level caching architectures, AI applications require direct and fast access to memory. Additional characteristics and requirements of AI-specific applications include parallel computing, low-precision computing and empirical analysis. Simply put, AI/ML workloads are extremely compute intensive – and they are shifting system architecture from traditional CPU-based computing towards more heterogeneous/distributed computing.

Cloud computing & networking
Looking beyond AI/ML applications, the conventional data center paradigm is evolving due to the ongoing shift to cloud computing. Enterprise workloads are moving to the cloud: 45% were cloud-based in 2017, while over 60% were cloud-based in 2019! As such, data centers are leveraging hyperscale computing and networking to meet the needs of cloud-based workloads. Because the economies of scale are driven by increasing the bandwidth per physical unit of space, this new cloud-based model (along with AI/ML applications) is accelerating the adoption of higher speed networking protocols that double in speed approximately every two years: 100GbE -> 200GbE -> 400GbE -> 800GbE. The steady march towards 400GbE cloud networking and the evolution of sophisticated AI/ML workloads are pushing the need to double PCIe bandwidth every two years to effectively move data between compute nodes.
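The link between Ethernet port speed and PCIe generation can be illustrated with a rough back-of-the-envelope check. The sketch below assumes a x16 link, counts only 128b/130b line encoding, and ignores packet and protocol overhead, so real headroom is smaller than shown. The names `x16_gbps_per_direction` and `generations_that_fit` are hypothetical helpers, not part of any library.

```python
# Per-lane signaling rates in GT/s (published PCIe spec values).
PCIE_GT_PER_S = {"3.0": 8, "4.0": 16, "5.0": 32}

# PCIe 3.0 through 5.0 use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def x16_gbps_per_direction(gen: str, lanes: int = 16) -> float:
    """Usable Gb/s in one direction, after line encoding only."""
    return PCIE_GT_PER_S[gen] * lanes * ENCODING_EFFICIENCY


def generations_that_fit(port_gbps: float) -> list:
    """PCIe generations whose x16 link can carry one Ethernet port at line rate."""
    return [g for g in PCIE_GT_PER_S if x16_gbps_per_direction(g) >= port_gbps]


if __name__ == "__main__":
    for port in (100, 200, 400, 800):
        print(f"{port}GbE ->", generations_that_fit(port) or "no single x16 link listed")
```

With these simplifications, a single 400GbE port already exceeds what a PCIe 4.0 x16 link can deliver in one direction, and only PCIe 5.0 x16 keeps up; an 800GbE port exceeds all three generations listed.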

PCIe 5 interface requirements
PCIe 5 – with an aggregate link bandwidth of 128GB/s in a x16 configuration – addresses these demands without ‘boiling the ocean’ as it is built on the proven PCIe framework. Essentially, the PCIe interface is the backbone that moves high-bandwidth data between various compute nodes (CPUs, GPUs, FPGAs, custom-built ASIC accelerators) in a heterogeneous compute setup. For system designers, significant signal integrity experience is required to support the latest networking protocols like 400GbE. The performance of SoCs is contingent upon how fast data can be moved in, out and between other components. Because the physical size of SoCs remains approximately constant, bandwidth increases are primarily achieved by increasing the data rate per pin. Issues related to higher speeds – such as loss, crosstalk and reflections – all become more pronounced as data rates increase.

As we discussed above, significant increases in speed are necessary to support AI/ML applications such as massive training models and real-time inference. This means that all supporting technologies – such as CPU performance, memory access bandwidth and interface speeds – need to double every 1-2 years. PCIe 5.0, the latest PCIe standard, represents a doubling over PCIe 4.0: 32GT/s vs. 16GT/s per lane, with an aggregate x16 link bandwidth of 128 GB/s.
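The 128 GB/s figure follows from simple arithmetic: 32 GT/s per lane, 16 lanes, one bit per transfer, 8 bits per byte, counted in both directions. The sketch below reproduces it; `link_bandwidth_gbytes` is an illustrative helper, and the optional `encoded` flag (an addition not discussed in the article) applies the 128b/130b line-encoding overhead to show the slightly lower usable rate.

```python
# Per-lane signaling rates in GT/s (published PCIe spec values).
PCIE_GT_PER_S = {"3.0": 8.0, "4.0": 16.0, "5.0": 32.0}

# PCIe 3.0 through 5.0 use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gbytes(gen: str, lanes: int = 16, encoded: bool = False) -> dict:
    """Return per-direction and aggregate (both directions) bandwidth in GB/s."""
    gbits_per_direction = PCIE_GT_PER_S[gen] * lanes  # one bit per transfer per lane
    if encoded:
        gbits_per_direction *= ENCODING_EFFICIENCY
    per_direction = gbits_per_direction / 8  # bits -> bytes
    return {"per_direction": per_direction, "aggregate": 2 * per_direction}


if __name__ == "__main__":
    for gen in ("4.0", "5.0"):
        raw = link_bandwidth_gbytes(gen)
        print(f"PCIe {gen} x16: {raw['per_direction']:.0f} GB/s per direction, "
              f"{raw['aggregate']:.0f} GB/s aggregate (raw)")
```

For PCIe 5.0 x16 this gives 64 GB/s per direction and 128 GB/s aggregate before encoding overhead, exactly double the PCIe 4.0 values.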

To effectively meet the demands of AI/ML applications and cloud-based workloads, a PCIe 5.0 interface should be a comprehensive solution built on an advanced process node such as 7nm (FinFET). In addition, the solution should comprise a co-verified PHY and digital controller. The PCIe 5.0 interface should also support Compute Express Link (CXL) connectivity between host processor and workload accelerators for heterogeneous computing. More specifically, the introduction of CXL (which runs over the same physical layer as PCIe 5) provides high-performance computing (HPC) and AI/ML system designers with a low-latency, cache-coherent interconnect to virtually unify the system memory across various compute nodes.

Additional key features and capabilities should include:

32 GT/s data rate per lane with 128 GB/s aggregate bandwidth in x16 configuration
Backward compatibility with PCIe 4.0, 3.0 and 2.0
Advanced multi-tap transmitter and receiver equalization to compensate for more than 36dB of insertion loss
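To give a sense of what 36dB of insertion loss means for the receiver, the sketch below converts a loss in decibels into the fraction of signal amplitude and power that survives the channel. It uses the standard decibel definitions only; the helper names are illustrative, and the calculation says nothing about how any particular equalizer recovers the signal.

```python
def amplitude_ratio(loss_db: float) -> float:
    """Fraction of voltage amplitude remaining after `loss_db` of insertion loss."""
    return 10.0 ** (-loss_db / 20.0)


def power_ratio(loss_db: float) -> float:
    """Fraction of signal power remaining after `loss_db` of insertion loss."""
    return 10.0 ** (-loss_db / 10.0)


if __name__ == "__main__":
    loss_db = 36.0  # channel loss figure quoted in the list above
    print(f"{loss_db:.0f} dB loss leaves {amplitude_ratio(loss_db):.2%} of the amplitude "
          f"and {power_ratio(loss_db):.4%} of the power")
```

At 36dB, roughly 1.6% of the transmitted amplitude reaches the receiver, which is why multi-tap equalization on both ends of the link is needed at 32 GT/s.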

In conclusion, the rapid adoption of sophisticated AI/ML applications and cloud-based workloads is significantly increasing network traffic. The insatiable demand for more bandwidth to support these applications and workloads has accelerated the adoption of higher speed networking protocols that double in speed approximately every two years. PCIe 5.0, the latest PCIe standard, represents a doubling over PCIe 4.0: 32GT/s vs. 16GT/s, with an aggregate x16 link bandwidth of 128 GB/s. At these speeds, it is important for system designers to have significant signal integrity experience to mitigate loss, crosstalk and reflections.

Additional Resources
Complete Interface Solution for PCI Express 5.0 Launched (Blog)
Rambus PCIe 5.0 SerDes PHY
Rambus PCI Express (PCIe) 5.0 PHY Product Brief
PCIe 5.0 Controller Product Brief

Suresh Andani

Suresh Andani is the senior director of product marketing at Rambus. He is responsible for SerDes interface IP products. Prior to joining Rambus in 2019, Andani worked at Intel Programmable Solutions Group (previously Altera) for 13 years, where he led a Systems Engineering and Applications team responsible for building and supporting FPGA-based system solutions for Enterprise-Cloud-Wireline-Wireless applications. Andani holds an MSEE degree from the University of Southern California in Los Angeles and a BSEE degree from the National Institute of Technology in India.
