Is UCIe Really Universal?

Why developing a multi-vendor standard for plug-and-play chiplets is so difficult.

Chiplets are rapidly becoming the means to overcome the slowing of Moore’s Law, but it isn’t yet clear whether one interface is capable of joining them all together. The consortium behind Universal Chiplet Interconnect Express (UCIe) believes it is, but some in the industry remain unconvinced.

At least part of the problem is that interconnect standards are never truly finished. Even today, the protocols that power the Internet (TCP/IP) continue to evolve. New technologies, materials and packaging concepts emerge that require standards be flexible enough to evolve over time in order to meet the needs of all players within the industry, including IP providers, designers, foundries and packagers.

UCIe 1.0 was released March 2, 2022, and at that time the initial goals included a physical die-to-die I/O, the adoption of PCIe/CXL high-level protocols for a near-term volume market, and a structure that would enable future extension. One important aspect was that all major packaging technologies were considered — silicon interposer or bridge, RDL fan-out, and organic substrate or laminate.

Moore’s Law fundamentally is the ability to economically increase the number of transistors within a package. “If you look at our ability to scale down the transistors, we’re going to hit a wall because of the physics,” says Nick Ilyadis, senior director of product planning for Achronix. “Being able to integrate functionality in 2D, 2.5D, or even 3D, is going to be key to taking us to the next level of processor, or system performance. The industry needs a standardized interconnect to allow us to scale up these systems — not just in a 2.5D world, but also in a 3D world.”

Continuation of the integration trend means taking things that used to be connected at the board level and moving that integration inside the package. “Within the system and PCB, you’re trying to remove packages and put each chip on the substrate of the package,” says Ramin Farjadrad, founder of Eliyan and developer of Bunch of Wires (BoW). “By doing that, you save the cost of these packages and solve the bandwidth problem between these chiplets, because they can have very high bandwidth between them (see figure 1). That translates into more performance. But we need to have very efficient die-to-die connectivity between these chiplets, as if they’re sitting on the same chip.”

Fig. 1: Main use cases for chiplet technology. Source: Eliyan

Other companies are being limited by reticle size and forced to go to multiple dies. “Most of the die-to-die implementations are, and will remain, targeted to connectivity across designs from the same provider,” says Guillaume Boillet, senior director of product management for Arteris IP. “Under this scenario, more lightweight and custom solutions can be built around a simple solution such as 112G XSR, or OpenHBI and BoW for smaller pitch and power consumption.”

The UCIe rollout was not perfect. “One of the issues with the rollout was the messaging around CXL and PCIe,” says Mick Posner, senior director of HPC IP at Synopsys. “That left a number of people in the market with this notion that UCIe carries a lot of baggage. But anyone who’s read the spec sees there are multiple layers defined in the spec. It makes UCIe applicable to multiple use cases, not just the use case where you would utilize either a CXL or PCIe connection on top of it.”

This is where ‘universal’ often is interpreted as ‘master of none.’

“UCIe took the approach of ultimate interoperability at the sacrifice of everything else,” says Elad Alon, co-founder and CEO of Blue Cheetah and adjunct professor at UC Berkeley. “They took this PCIe, PCB notion of things, and tried to force that into the chiplet space. A whole lot of overhead was added, which for the vast majority of cases is not necessary. More importantly, it excludes important segments of the overall market from participating because of cost. For example, it disallowed you from using packages with less than a certain number of layers. Ultimate interoperability comes at the expense of things people care about, such as cost, performance, complexity, power, and things of this nature.”

Still, that’s not necessarily all bad. “The PCI and CXL protocols do carry some legacy that is not desirable for some applications,” says Manuel Mota, senior product manager for UCIe IP at Synopsys. “That’s where some people get shocked. But it’s not the only way of using it. The streaming protocol is an excellent example. It enables very lightweight, very low-latency implementations that extend wires from one fabric, on one SoC, to another SoC, and that covers a lot of the use cases we see in the market. It is a step in the right direction for the chiplet ecosystem. Other standards and proprietary implementations are not enabling that, at least at this stage, because they are not complete, because they are relying on implementation decisions, on critical aspects that define their operation.”

Both sides believe time will show them to be right.

What is UCIe?
Like most communications protocols, UCIe is divided into three stack layers:

  • Physical Layer: This is the electrical specification to the package media. It includes the transmitter and receiver, as well as a sideband channel to enable parameter exchange and negotiation between two dies. That includes the logic PHY, which implements the link initialization, training and calibration algorithms, as well as test and repair functionality.
  • Die-to-die Adapter Layer: This defines link management functionality, as well as protocol arbitration and negotiation. Optional error correction functionality is defined, which is based on a CRC and retry mechanism.
  • Protocol Layer: Multiple protocols could be defined, but release 1.0 defines PCIe, CXL, and a streaming protocol.
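
The adapter layer's optional CRC-and-retry mechanism can be illustrated with a minimal sketch. It uses Python's `zlib.crc32` as a stand-in checksum. The actual CRC polynomial, flit format, and retry protocol are defined by the UCIe specification and are not reproduced here. The names `make_flit`, `check_flit`, `send_with_retry`, and `flaky_channel_factory` are hypothetical and exist only for illustration.

```python
import zlib


def make_flit(payload: bytes) -> bytes:
    """Append a 4-byte CRC to the payload (illustrative framing, not the UCIe flit format)."""
    crc = zlib.crc32(payload)
    return payload + crc.to_bytes(4, "big")


def check_flit(flit: bytes) -> bool:
    """Return True if the trailing CRC matches the payload."""
    payload, received = flit[:-4], int.from_bytes(flit[-4:], "big")
    return zlib.crc32(payload) == received


def send_with_retry(payload: bytes, channel, max_retries: int = 3) -> int:
    """Resend until the receiver's CRC check passes; return the number of attempts used."""
    for attempt in range(1, max_retries + 2):  # one initial send plus max_retries resends
        received = channel(make_flit(payload))
        if check_flit(received):
            return attempt
    raise RuntimeError("link error: retries exhausted")


def flaky_channel_factory(corrupt_first_n: int):
    """Build a hypothetical channel that flips one bit in the first N transmissions."""
    state = {"sent": 0}

    def channel(flit: bytes) -> bytes:
        state["sent"] += 1
        if state["sent"] <= corrupt_first_n:
            return bytes([flit[0] ^ 0x01]) + flit[1:]
        return flit

    return channel


attempts = send_with_retry(b"chiplet data", flaky_channel_factory(2))
print(attempts)
```

In this sketch the first two transmissions are corrupted, so the transfer succeeds on a later attempt. A real adapter would track sequence numbers and replay buffers rather than resend a single payload.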

It is primarily at the physical layer where other standards and proprietary implementations compete with UCIe. UCIe uses clock forwarding and single-ended, low-voltage DDR signaling (see figure 2).

Fig. 2: Block diagram of the UCIe PHY architecture. Source: Synopsys
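
As a rough illustration of what DDR signaling implies, the sketch below computes a per-lane transfer rate and a raw aggregate bandwidth. It assumes one bit per lane per clock edge (single-ended signaling, two edges per cycle). The clock frequency and lane count are example values chosen for illustration, not figures taken from this article.

```python
def lane_rate_gtps(forwarded_clock_ghz: float) -> float:
    """DDR transfers one bit on each clock edge, so two transfers per clock cycle."""
    return 2.0 * forwarded_clock_ghz


def raw_bandwidth_gbps(lanes: int, forwarded_clock_ghz: float) -> float:
    """Raw one-direction bandwidth in Gb/s, before any protocol or CRC overhead."""
    return lanes * lane_rate_gtps(forwarded_clock_ghz)


# Example values (assumptions): an 8 GHz forwarded clock and 16 data lanes.
print(lane_rate_gtps(8.0))
print(raw_bandwidth_gbps(16, 8.0))
```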

Many of the other standards bodies have been working on this layer for a longer period of time, but have yet to tackle the higher levels of the stack (see figure 3).

Fig. 3: Scope of physical layer standards. Source: Synopsys

Of these, Bunch of Wires is the farthest ahead. “Bunch of Wires has a small group of companies that are all working together to build a family of chiplets,” says Achronix’ Ilyadis. “It is slightly ahead of UCIe, because it’s got some tapeouts. But it was a little ahead of where the market was, and a lot of companies fell out of the standards process when UCIe came in. It sucked the air out of the room for everything else. UCIe will be the long-term survivor, because a lot of companies want to have ensured interoperability.”

Work on BoW certainly has not stopped. “Within ODSA, there is a link layer standard that is available in the draft format, and we are expecting to get it formally approved soon,” says Blue Cheetah’s Alon. “It specifically addresses the problem of how to take multiple on-die NoCs and get them connected in a reasonably compatible way, across multiple chiplets, while still retaining flexibility.”

The logical component of the ODSA interface aims to support the protocols used for the two most common chiplet use cases, package aggregation and die disaggregation. Those protocols include PCIe, CXL, CCIX, AXI, and proprietary streaming protocols, running across a wide range of open and proprietary D2D PHYs (see figure 4).

Fig. 4: ODSA layered communications stack. Source: ODSA

Companies already are attempting to build PHYs that can accommodate both standards, and even extend upon them. “We have a PHY that is backward-compatible with UCIe,” says Eliyan’s Farjadrad. “We can have it operate with the UCIe PHY for applications that need that. But we go beyond UCIe, and can have simultaneous bi-directional communications. That gives every wire a 2X advantage over UCIe. We can provide similar performance to advanced packaging, but do this with an organic substrate.”

Other companies are going in the opposite direction. “We have customers that are currently using BoW, and they are asking for backward compatibility because they want to go to UCIe,” says Sue Hung Fung, product line marketing manager for UCIe at Cadence. “BoW had a lot of popularity, but we are seeing members that have pivoted to UCIe instead. We expect HBI to go into dormancy. Another comparison is AIB, which was originally designed for EMIB. In the UCIe spec, Section C, it mentions future plans for implementations to design a UCIe AIB interoperable PHY.”

Longer-term convergence
Ultimately, all of these standards groups are heading toward a similar goal, which is to support a number of link layers and protocols operating over high-performance PHYs. “UCIe, in its present form, is not an ideal die-to-die interface because it is basically taking the protocols that were running on card cages — PCIe cards — and collapsing them down to a die-to-die interface,” says Ilyadis. “This is fine if you’re putting peripherals around a processor. But if you’re trying to build a disaggregated system, then you need some lower-level protocols that are supported by the data-link layer, and specifically the Arm AMBA protocols and things like CHI for coherency. These are what system designers use for on-chip fabrics like AXI. They have to be supported to allow you to truly disaggregate devices and maintain very low latency interconnect between them.”

That legacy may have long-term costs. “The PHY is basically something that muxes a bunch of bits and then demuxes them on the other side,” says Farjadrad. “But if you want to follow the exact protocol defined by UCIe, you have to provide certain sideband signals (see figure 5). Those signals have been done very inefficiently and require four sideband signals plus two additional status signals — six bumps to manage a link. In BoW we do this with just one extra bump. It is not a challenge to add them, but it does impact the efficiency of the bump map. Even if you do not use these in the higher-level protocols, you still have to include them.”

Fig. 5: UCIe main band and sideband signals. Source: Synopsys
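
Farjadrad's bump-count argument can be made concrete with simple arithmetic. The sketch below takes the six link-management bumps he attributes to UCIe and the single extra bump he cites for BoW, and computes the share of signal bumps spent on link management for a given number of data lanes. The lane counts (16 and 64) are assumptions chosen for illustration, and power, ground, and clock bumps are ignored.

```python
def management_overhead(data_lanes: int, management_bumps: int) -> float:
    """Fraction of signal bumps (data + management) used for link management."""
    if data_lanes <= 0 or management_bumps < 0:
        raise ValueError("expected a positive lane count and a non-negative bump count")
    return management_bumps / (data_lanes + management_bumps)


UCIE_MANAGEMENT_BUMPS = 6  # four sideband + two status signals, as quoted in the article
BOW_MANAGEMENT_BUMPS = 1   # "one extra bump", as quoted in the article

for lanes in (16, 64):  # assumed interface widths, for illustration only
    ucie = management_overhead(lanes, UCIE_MANAGEMENT_BUMPS)
    bow = management_overhead(lanes, BOW_MANAGEMENT_BUMPS)
    print(f"{lanes} lanes: UCIe-style {ucie:.1%}, BoW-style {bow:.1%}")
```

The overhead shrinks as the interface gets wider, which fits his remark that adding the bumps is not a challenge but does affect the efficiency of the bump map.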

The journey
Today, everyone creating a package that includes multiple dies is designing everything. They have full control over the PHY layer and the protocols they use for communications. It doesn’t really matter if they fully conform to any of the standards. They only need compatibility between their own dies.

“The third-party target market is like Plato’s ideal world,” says Ilyadis. “That’s the point where you have interoperability, plug and play, between devices. It requires having a packaging technology that is more accessible. It needs to be democratized so smaller companies have access. But it’s a journey.”

The communications standard is just one piece of the puzzle. “There are many issues that one has to work out to result in a true plug-and-play chiplet market,” says Alon. “What’s the bump-out for these chiplets? What’s the footprint for each chiplet? What’s the standard package that you use for each of these? How do they interact from a power supply standpoint? What’s the partitioning you want? There is a long list of things that nobody has an answer to yet and they’re hard questions.”

How long will it take? “I have talked to a lot of people at various conferences,” says Cadence’s Hung Fung. “The general response is that it is several years away. I’ve heard people say 5, I’ve heard people say even 10. UCIe is moving so fast right now, since its inception, that it could be sooner.”

One of the problems that must be overcome is compatibility testing. “How do you get chiplets from one vendor that are guaranteed to interoperate with the chiplet from some other vendor? That takes a lot of simulation,” Ilyadis says. “And then you put it together and hope it works, because it will be hard to do probing. There has got to be a mechanism by which you have visibility into the actual interfaces and are able to see what’s happening. That testability, verification of the die-to-die, is as important as verifying the innards of any of the devices.”

That process is starting. “When we develop IP, we also develop a test chip,” says Synopsys’ Posner. “Through that test chip development we gain significant expertise in what is required to meet the performance. We do a huge amount of analysis across various interposer topologies and routing configurations. That is exactly the kind of knowledge companies built up when they did proprietary interfaces in the past. We’re taking that information and it becomes part of our design deliverables as collateral.”

The fact that these will now be integrated in-package adds some complications. “There always will be the concept of silicon validation,” says Synopsys’ Mota. “Any reputable IP vendor will build test chips that include multiple dies, so that we can do actual silicon testing. While this is between our own dies, we also are doing that beyond our own dies. We are engaging with other companies to try that with them. That’s the spirit of what UCIe defines as an inter-operation test. It’s going to be a mini PlugFest within the package. It probably will use something like a golden die, or a reference design, connected through UCIe. Then you can test the die-to-die interfaces. Those tests must pack a lot of functionality to ensure it gives you very high coverage.”

True universality
Is UCIe 1.0 really universal? Far from it. It has taken a single use case and defined a standard that addresses the issues of that market. It has not looked at the consumer market, where costs are a larger factor, or where the integration of analog and RF dies may be required. Nor has it addressed the needs of markets such as automotive, which place specific demands on semiconductors.

“If you look at a lot of IP that has been developed, it is targeting 7nm, 5nm, or even 3nm,” says Ilyadis. “That’s not the technology you will be using to build analog devices. This is a low-voltage interface with 16-gigabit clock rates, and that is going to push you into leading-edge nodes. It’s going to be very good for heterogeneous digital systems, but I don’t think it’s going to give you that full array of chiplets.”

Available IP is tracking current users that are driving the standard. “What you hear companies talking about today is going to higher speeds, pushing the higher data rates, very high bandwidth,” says Mota. “But you have to separate that from what the standard defines, and what the standard dictates you to support. In your implementation you have to enable low-frequency operation so you can interoperate with devices that, by the nature of the process node, cannot go to 60 megabit per second. Some things need to stay at low data rates. They can interoperate with these circuits.”
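
Mota's point about low-frequency operation amounts to the two dies settling on a data rate both can support. A minimal sketch of that idea follows, assuming each die advertises a set of supported rates and the link runs at the highest rate common to both. The function name and the rate values are hypothetical and are not taken from the UCIe specification, which defines its own parameter exchange over the sideband.

```python
def negotiate_rate(local_rates, remote_rates):
    """Pick the highest data rate that both dies advertise (illustrative policy)."""
    common = set(local_rates) & set(remote_rates)
    if not common:
        raise ValueError("no common data rate: the dies cannot interoperate")
    return max(common)


# Hypothetical capability sets, in GT/s.
leading_edge_die = {4, 8, 12, 16, 24, 32}
mature_node_die = {4, 8}  # e.g. an analog-friendly node that cannot run fast

print(negotiate_rate(leading_edge_die, mature_node_die))
```

Under this policy the faster die drops to the slower die's best rate, which is why an implementation has to keep its low-rate modes even when its own process could go much faster.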

There are other issues that will be addressed over time. One such problem is caused by the reach of the interconnect, as defined today for interposers and bridges. “GPUs burn several hundreds of watts, and they can operate hot, at 100°C to 105°C,” says Farjadrad. “But DRAM, which may sit next to them, cannot operate at high temperature. It has to operate at 80°C to 85°C. Because of this temperature crosstalk, they are limited in the rate they can operate.”

Conclusion
Combining multiple dies in a package remains a technology only being used by some of the largest semiconductor companies today, but it is seen as one of the brightest hopes for the continuation of Moore’s Law. Leading-edge companies that are forging the way forward have to solve many problems, especially if the ultimate end goal is the universal plug-and-play of chiplets between vendors.

Getting there will take many small steps, because it is not possible to solve the larger problem in one sprint. It is likely that many pieces will be defined, some based on legacy solutions that are known to have worked in the past, and it is equally likely this approach will lead to solutions that are not optimal for every application.

Nobody can foresee the future, and trying to predict it is often a fool’s errand. The industry has a successful track record of building on the past, even though it is known that many of these decisions are highly sub-optimal today.

3 comments

Raj says:

Great article, thank you for sharing. I think the heat issue alone will make a GPU chiplet irrelevant to graphic intensive users. Also, testing of all permutations of varying chiplets will get mindboggling at some point. It will be interesting to see what innovations will be made in this arena.

Spike says:

As an interconnect beginner, it is a very helpful article. Especially, “the Arm AMBA protocols and things like CHI for coherency. These are what system designers use for on-chip fabrics like AXI” statement is striking to me. It would be very interesting how the UCIe adaptation will proceed.

Thermal Guy says:

Great article! Thanks for the overview. There is a lot going on in this space. It will be interesting to see what applications dictate which interface. Latency often determines what happens next.
