PCIe Benefits From AI, Despite Scaling Protocols

CXL is also gaining traction in AI processing, while MIPI and others are growing at the edge.

June 10th, 2026 - By: Bryon Moyer

Key takeaways:

PCIe remains a critical technology for non-AI processing.
For AI, PCIe will be strengthened by scale-out, agentic AI, and even some scale-up.
CXL is seeing uptake, and some even think it could participate in AI processing.

PCIe has been the go-to network for most data traffic moving from a processor to devices located elsewhere, which is also what the new data-center AI scaling networks do. The existence of those new networks could suggest that PCIe is inadequate to the task. But rather than pushing PCIe out of the way, AI is strengthening it, thanks to a new focus on agentic and other novel forms of AI.

CXL is closely related to PCIe and lies atop the PCIe stack. It’s had a slow start as developers evaluate its utility, and some have questioned whether it will ever take off. But it’s starting to show signs of increased vitality as switches enter the market and system developers figure out where they want to employ it.

A long legacy and a slow startup
PCIe was first released in 2003 as a serial interconnect scheme that put the older PCI parallel approach to attaching computer peripherals in the rear-view mirror. Its initial performance was 2.5 Gb/s per lane, with the maximum ×16 configuration yielding 4 GB/s of throughput.

It has since evolved to version 7.0 of the specification, which was released in 2025. It features 128 Gb/s per lane (which includes error-correction bits), yielding a net 242 GB/s throughput.
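The two throughput figures above can be reproduced with simple arithmetic. The sketch below is a rough check, not a figure taken from the specification text. It assumes 8b/10b line encoding for the first generation, and it assumes that in the PCIe 7.0 flit format 242 of every 256 bytes remain once CRC and error-correction bytes are set aside. The function name and both efficiency constants are illustrative.

```python
# Back-of-the-envelope PCIe link throughput, one direction.
# Both efficiency factors are assumptions, not numbers quoted in the article.

def link_throughput_gbytes(rate_gbps_per_lane: float, lanes: int, efficiency: float) -> float:
    """Return usable throughput in GB/s for one direction of a link."""
    raw_gbits = rate_gbps_per_lane * lanes   # raw bits on the wire, in Gb/s
    return raw_gbits * efficiency / 8        # keep the usable share, convert bits to bytes

# PCIe 1.0 (assumed): 8b/10b encoding carries 8 data bits in every 10 transmitted.
GEN1_EFFICIENCY = 8 / 10

# PCIe 7.0 (assumed flit layout): 242 of every 256 flit bytes are left
# after CRC and forward-error-correction bytes are excluded.
GEN7_EFFICIENCY = 242 / 256

gen1_x16 = link_throughput_gbytes(2.5, 16, GEN1_EFFICIENCY)
gen7_x16 = link_throughput_gbytes(128, 16, GEN7_EFFICIENCY)

print(f"PCIe 1.0 x16: {gen1_x16:.0f} GB/s")
print(f"PCIe 7.0 x16: {gen7_x16:.0f} GB/s")
```

Under these assumptions the results match the 4 GB/s and 242 GB/s figures, both of which describe a single direction of the link.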

“PCIe speeds are doubling,” said Arif Khan, vice president of product marketing for design IP at Cadence. “We’ve had a knee in the curve, and the hockey stick has gone way up over the past couple of years.”

PCIe has been the go-to interconnect for pretty much anything computer-oriented. While its initial focus was on personal computers, the performance of recent versions exceeds what is necessary for PCs and is instead being targeted more for data centers.

Years later, in 2019, Compute Express Link, or CXL, came into existence with its 1.0 specification. It’s effectively a memory and coherency add-on atop PCIe, and the first version coincided with PCIe 5.0. The most recent version, CXL 4.0, was released in 2025 and is based on PCIe 7.0.

It has three fundamental components:

A basic non-coherent standard for tasks such as initialization, management, and device discovery (CXL.io).
A means of allowing attached CXL devices to cache host memory and maintain coherency (CXL.cache).
A means of accessing attached memory using load/store semantics just as if the memory were internal to the server (CXL.mem).

CXL adoption has been slow, leading some to question whether it will take off. There may also be some inertia when it comes to learning about CXL. “CXL is not well understood,” said Antonio Costa, senior director of product marketing for PCIe at Synopsys. “As we see more use cases, people will understand how they can use it.”

Recent activity suggests that CXL is showing more signs of life.

New kids on the block
The AI boom, meanwhile, has put intense focus on data centers that execute training and inference workloads. GPUs have hogged the spotlight, and all hands have been focused on improving the performance of GPU-based systems, including their interconnect.

The result has been something of a networking bifurcation, yielding fundamentally different ways of scaling:

Scale-up. Aggregates GPUs in a manner that makes them appear as a single giant GPU with a unified memory space using memory semantics.
Scale-out. Accesses more remote resources using RDMA semantics.
Scale-across. A variant that’s similar to scale-out, but it covers a longer distance.

UALink is a new standard that helps implement scale-up. Nvidia’s proprietary NVLink provides various forms of interconnects, including scale-up.

Scale-out means reaching farther out on the network, an area where Ethernet dominates. But Ethernet has some weaknesses that hurt tail latency in particular, so various modifications are being added to it to improve performance.

These developments have kept the spotlight on AI scaling networks, leaving older interconnects such as PCIe in the shadows. In the past, notions of scale-up would have involved PCIe since that was the basic way of interconnecting processors locally. That role is retreating, at least to some extent. So what does that leave?

It’s about what you’re connecting
If you’re trying to follow all the AI-related developments, you might conclude that GPUs are all that matters, which can lead to the assumption that scale-up networks are taking over. But that’s not the case.

GPUs do much of the heavy mathematical lifting for AI. Scale-up networks specifically interconnect GPUs, with no intervening CPU. That’s very different from other interconnects, which typically go through the CPU.

“What chip protocol are you relying on for your CPU to talk to your accelerator card? In many cases, your accelerator is sitting in a PCIe slot,” Khan said.

And that’s where PCIe’s role remains. When the CPU talks to a GPU, it won’t be over something like UALink. It will be over PCIe, as will anything else the CPU touches.

This directly affects scale-out, which exploits Ethernet variants. “PCIe is particularly well suited for enabling scale-out architectures,” observed Lou Ternullo, senior director, product marketing, silicon IP at Rambus.

PCIe connects the CPU to the network interface card (NIC), and so even though scale-out discussions typically omit this, PCIe is involved. “Smart NICs have a lot of bandwidth in and out, so that’s a place where the newer speeds of PCIe are still very widely used,” noted Khan.

Others agree. “Scale-out actually broadens the market for PCIe because you use the NIC, which relies on PCIe,” said Hui Wu, product marketing director for high-speed SerDes at Cadence.

That has been true for as long as scale-out has been a consideration, but newer AI developments are changing the playing field. Although traditional AI workloads have been all about the GPUs, with the CPU having little role other than to summon and support the GPU, agentic AI is changing that.

Agentic AI needs CPUs
AI agents perform tasks that today help humans make better decisions or be more productive. But an agent largely executes on a CPU, offloading various inference workloads to the GPU. It must make decisions, and decisions typically involve branching, an operation poorly suited to GPUs. That means launching an agent sets off a process in which workloads can bounce back and forth between the CPU and GPU before the task is complete.
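The back-and-forth can be pictured as a loop in which the CPU owns the control flow and each model call is an offload. The following is a minimal sketch under loud assumptions: `run_inference` is a hypothetical placeholder for an accelerator call, not a real API, and the stopping rule is invented purely so the example runs.

```python
# Hypothetical agent loop: branching stays on the CPU, model calls are offloaded.
# run_inference() is a stand-in for an accelerator call; no real GPU API is used.

def run_inference(prompt: str) -> str:
    """Placeholder for an accelerator-side model call (assumption)."""
    return "done" if "step 3" in prompt else "continue"

def run_agent(task: str, max_steps: int = 10) -> int:
    """Return how many CPU-to-accelerator round trips the task needed."""
    round_trips = 0
    for step in range(1, max_steps + 1):
        verdict = run_inference(f"{task}: step {step}")  # offload to the accelerator
        round_trips += 1
        if verdict == "done":   # the branching decision is made on the CPU
            break
        # Otherwise the CPU would act on the result (tool call, I/O) and loop again.
    return round_trips

trips = run_agent("book travel")
print(trips)  # 3 round trips with the placeholder model above
```

Each pass through the loop is one trip across the CPU-to-accelerator link, which is why more agent activity translates into more traffic on that interconnect.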

“You have AI accelerators that compute the next step, but then you have the CPU that will take the action,” Synopsys’ Costa said. “The more agentic AI you have, the more CPUs you need to execute those actions, and you need to scale compute.”

As agentic AI takes off, demand for CPUs is growing, and those CPUs need interconnect. “We used to have an 8:1 ratio of GPUs to CPUs,” said Bob Beachler, corporate vice president, marketing at Efinix. “With agentic AI, it’s 1:1 — one CPU and one GPU.”

PCIe is the natural choice for such connections. And that means PCIe’s role even within AI should grow.

PCIe does scale-up
The introduction of UALink for scale-up would suggest that something was needed to fill the gaps in PCIe’s capabilities. In some cases, that’s true. But not every system requires the ultimate in throughput, and it turns out PCIe can serve for scale-up, as well.

“Today, almost every endpoint supports PCIe interconnect to interface with a CPU,” explained Ternullo. “If we set aside NVLink, virtually 100% of accelerators are using PCIe as their primary interconnect. This ubiquity is a key reason PCIe remains the easiest option for scale-up scenarios such as connecting multiple accelerators (GPUs) via a PCIe switch. While emerging networks like UALink may offer higher throughput compared to PCIe, the accessibility and widespread adoption of the PCIe fabric and its ecosystem make it a compelling option for GPU/ASIC accelerator scale-up applications. I believe a primary reason we are seeing the PCIe specifications advance so quickly, now promoting PCIe 8.0 version 0.5, is to further promote scale-up with PCIe.

“UALink will certainly have its place in scale-up architectures and will be used where it fits best, but PCIe’s compatibility with all endpoints makes it easier to adopt in most environments. It’s also important to keep in mind that the best-performing solution doesn’t always become the standard. Cost and accessibility are significant factors in these decisions, often outweighing pure technical performance.”

The edge has other protocols, too
Much of the scaling discussion applies to data centers, which are garnering more than their fair share of attention. But AI is also growing at the edge, which can feature other protocols, especially when sensors are involved.

Cameras are a good example of this, and the MIPI protocols focus on getting camera data to whoever needs it. “MIPI typically lives at the edge — cameras, displays, sensors, mobile storage,” said Justin Endo, director, marketing and sales at Mixel, a Silvaco company. “It’s built for streaming traffic with low power, low latency, and a thin protocol stack.”

This isn’t a job that PCIe would typically perform. “PCIe and CXL handle the heavy lifting between chips for general-purpose compute and storage. PCIe and MIPI overlap in places, such as UFS over M-PHY vs. NVMe-over-PCIe in mobile storage, but they sit at different layers of the same stack rather than fighting for the same socket,” Endo explained.

Often, that edge data will be destined for the data center, but sending raw data would require too much bandwidth. “Take MIPI CSI-2 over MIPI PHYs, which carries camera data from image sensors into edge SoCs across phones, cars, drones, and industrial vision systems,” he said. “For most inference deployments, you can’t push that raw imagery wholesale into the data center, because the volume, power, and latency costs would all be prohibitive. CSI-2 feeding a local image pipeline lets the time-critical work happen right at the sensor, for example, for ISP, perception, sensor fusion, or on-device inference. What flows upstream [through PCIe] is typically the distilled result, such as features, embeddings, and metadata, rather than raw pixels. CSI shapes the traffic before PCIe ever sees it. That frees PCIe and the fabrics around it, such as CXL, UALink, and Ultra Ethernet, to focus on what they’re good at: training, memory pooling, and accelerator-to-accelerator at rack scale.”
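A rough calculation shows why distilled results travel upstream instead of raw pixels. Every number in the sketch below is an illustrative assumption (a 1080p sensor, 12-bit raw pixels, 30 frames per second, and one 512-value float32 embedding per frame). None of them comes from the article or from the MIPI specifications.

```python
# Raw camera stream versus a distilled per-frame embedding.
# All values are illustrative assumptions, not figures from the article.

WIDTH, HEIGHT = 1920, 1080   # assumed sensor resolution
BITS_PER_PIXEL = 12          # assumed raw bit depth
FPS = 30                     # assumed frame rate

EMBEDDING_DIM = 512          # assumed number of values per frame embedding
BITS_PER_VALUE = 32          # float32

raw_bps = WIDTH * HEIGHT * BITS_PER_PIXEL * FPS
embedding_bps = EMBEDDING_DIM * BITS_PER_VALUE * FPS
reduction = raw_bps / embedding_bps

print(f"raw stream:       {raw_bps / 1e6:.1f} Mb/s")
print(f"embedding stream: {embedding_bps / 1e3:.1f} kb/s")
print(f"reduction factor: {reduction:.0f}x")
```

With these assumed values, a single camera drops from roughly 746 Mb/s of raw pixels to under 0.5 Mb/s of embeddings, a reduction of about three orders of magnitude before any traffic reaches PCIe.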

Older PCIe revisions still have plenty of life
While AI demands the highest bandwidth available, plenty of other applications work just fine with older versions of PCIe.

“With a lot of SSDs and other consumer appliances, customers are happy with PCIe 2.0 and 3.0 data rates, and you will see that for a long time to come,” said Khan.

Other systems may require higher speeds, but not necessarily the highest one. “We still don’t have a PCIe 6.0 CPU system in the market today,” added Khan. “The lead OEMs have not productized those yet. They’re still in internal testing or testing with customers. But even for PCIe 5.0, after the spec was finalized, it did take a long time for AMD and Intel systems to enter the market.”

SSDs are migrating to faster versions of PCIe, although not always the fastest. “PCIe 6.0 SSDs are fairly common now, and we’ve got customers that are designing those,” said Khan. “We’ve got customers that have started designs with PCIe 7.0 because they want to hit those points of the system.”

CXL on the upswing
Since PCIe remains strong, it remains firmly in place as the underlying CXL PHY. The revision timing, however, may have hurt CXL’s acceptance. “CXL was hamstrung by the fact that it was piggybacking on the PCIe 5.0 PHY, and those systems took forever to come to market,” Khan observed. “Then, the CXL spec evolved so much in the same time period. So what would the adopters do? CXL 3.0 systems were just getting designed and getting ready for market, and the specs moved on to 4.0.”

On top of that, different use cases for CXL have not earned equal acceptance. Three use cases have been formally defined, and the CXL devices that implement them are classified as Type 1, Type 2, or Type 3:

Type 1 devices are accelerators that coherently cache host memory but expose no memory of their own to the host.
Type 2 devices are accelerators that can coherently cache host memory and that also have their own attached memory, which the host can access.
Type 3 devices allow a host to access and manage remote memory coherently. Depending on how the system is built, that memory can serve as a memory extension or as a shared memory pool.
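The three device types line up with the three CXL protocols described earlier. The mapping below follows the way the types are commonly summarized in CXL Consortium material. It is a sketch for orientation rather than a restatement of the specification, and the dictionary and helper names are illustrative.

```python
# Protocol mix per CXL device type, as commonly summarized.
# Names below are illustrative; consult the CXL specification for exact requirements.

CXL_DEVICE_PROTOCOLS = {
    "Type 1": {"CXL.io", "CXL.cache"},             # caching accelerator, no host-visible memory
    "Type 2": {"CXL.io", "CXL.cache", "CXL.mem"},  # accelerator with its own attached memory
    "Type 3": {"CXL.io", "CXL.mem"},               # memory expansion or pooling device
}

def device_types_using(protocol: str) -> list[str]:
    """List the device types that implement the given protocol."""
    return sorted(
        dev_type
        for dev_type, protocols in CXL_DEVICE_PROTOCOLS.items()
        if protocol in protocols
    )

print(device_types_using("CXL.cache"))  # ['Type 1', 'Type 2']
print(device_types_using("CXL.mem"))    # ['Type 2', 'Type 3']
```

Seen this way, the caching concerns discussed next apply to Type 1 and Type 2 devices, while memory extension and pooling rely on the CXL.mem path that Type 3 devices provide.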

Using CXL for caching has met some resistance due to latency. “What we see on CXL is a desire to expand cache memory and main memory. On the cache side, CXL has been a bit weak because the cache has to demonstrate very low latency, and we don’t see customers using that relief for expanding cache,” said Costa. “But for memory extension, it’s popular because you have limited HBM or DDR memory in your system. If you want to expand beyond that, CXL is a very good technology because you have low latency and you can still get data back into your cache.”

Memory pooling debate
Meanwhile, there’s a debate about the effectiveness of memory pooling. Such considerations slow the overall adoption rate as the industry figures out which parts of CXL work best for them.

That said, the CXL ecosystem is expanding, giving developers more confidence in CXL’s traction. “The broader PCIe and CXL interconnect ecosystems continue to mature, further enhancing the value of memory expansion,” said Ternullo. “CXL switches are beginning to appear more frequently in the market, and extensions to PCIe reach are further supporting scalable memory pooling architectures.”

Agentic AI may also boost CXL. “I believe the next generation of CXL will be even more important, because we see all this agentic AI requiring more and more compute, and CXL — we are confident — will play a role there,” said Costa.

Developments continue
Further bolstering the continued demand for PCIe and CXL is the expectation of future revisions that build on what’s available now. PCIe 8.0 is expected in 2029 and will double the bandwidth of PCIe 7.0.

“We will see 256 Gb/s per lane, and a maximum of 16 lanes,” said Costa. “It will keep the same signaling and use the same flit structure on the controller side, so there’s no radical change.”

CXL is working on its 5.0 revision, but the CXL Consortium declined to forecast a target availability date.

In addition, efforts are underway to lengthen PCIe connections with new cabling standards. Most prominently, CopprLink was released two years ago to support connections as long as two meters at PCIe 5.0 and 6.0 speeds. Work is underway to include PCIe 7.0, although PCI-SIG declined to provide an expected release date.

Beyond copper, exploratory efforts continue for defining an optical format for carrying PCIe traffic. Here again, no expected release date is available.

“The growing deployment of PCIe retimers and switches, combined with emerging PCIe cabling technologies, such as the CopprLink and PCIe over optical, will extend the reach of the PCIe fabric, further enabling both scale-out and scale-up deployments,” said Ternullo.

No end in sight
These add to the indicators that, rather than AI being a threat to PCIe, it’s likely to strengthen it. Many devices will happily remain with older versions since the newer speeds primarily benefit data centers. But demand for higher bandwidth remains strong.

And while CXL activity has been slow, it’s accelerating. It’s still too early to declare absolute victory, but signs are strengthening.

All of this reinforces the fact that the newer standards are merely adding to what already exists, rather than replacing it. It makes for more networking complexity, given the number of options, but application expectations are clear. There’s no question as to what kinds of systems require, for example, UALink. But whether or not you’re scaling for AI in the data center, you’ll most likely be using PCIe.

Related Articles

Scale-up, Scale-out Get A New Partner
For reaching farther into another data center, developers are now talking about scale-across.

Confusion Grows With More Interconnect Options And Tradeoffs
Each standard serves a specific use case, so chip architects are choosing more than one for a single design.

Options Grow For Standardizing Data Movement And Sharing Resources
But figuring out which ones to use, and when to use them, isn’t always clear.

Bryon Moyer is a technology editor at Semiconductor Engineering.
