Version 3.0 of the interconnect standard doubles bandwidth and supports new use cases and enhanced manageability.
Tracking the rise in chiplet usage, especially in the data center, the UCIe Consortium announced its 3.0 revision, keeping to the annual update cadence it has followed since 2023. The release doubles the maximum data rate, improves manageability, and addresses three use cases that prior revisions did not handle easily.
The demand driving these changes comes from applications that no longer fit on a single die.
“AI data center workloads are now demanding compute and bandwidth at the scale that simply can’t be delivered by traditional monolithic dies,” said Archana Cheruliyil, principal product marketing manager at Alphawave Semi, in a joint webinar with Siemens EDA. “The reticle limit, yield constraints, and power delivery challenges make it impractical — and in many cases impossible — to build these massive devices as single chips.”
This means the chiplets must talk to each other. “Some of the toughest heterogeneous-integration problems involve achieving effective communication between chiplets and managing the complexities of routing and connectivity,” added Kendall Hiles, 3D-IC packaging flow senior product specialist at Siemens EDA.
UCIe and Bunch of Wires (BoW) are the two primary die-to-die standards targeting inter-chiplet connections. UCIe was launched in 2022 and takes a comprehensive view of chiplet interconnect, covering both physical signaling at the lowest level and adapting protocols atop it. The UCIe Consortium has an eye on a possible future marketplace for chiplets, where interoperability will be critical.
The 2024 2.0 release introduced new management functions, among other things. A general industry concern at the time was the expanse of those features, which gave UCIe a heavyweight reputation. What was not as clear at that time was that many of those features are optional, allowing developers to decide which features to implement.
“Initially, when people went to the spec, they thought, ‘Oh, it’s very heavy, and it has all these features I don’t need,’” said Mayank Bhatnagar, product marketing director for die-to-die interface IP at Cadence. “But most of the features are optional.”
That sentiment has abated since then, as has some of the reluctance to adopt something perceived as so heavy. The 3.0 release appears to have generated less controversy.
Bandwidth for planar packaging
The maximum allowed data rate has risen from 32 to 64 GT/s (where T is “transfer”) for UCIe-S (standard packaging with organic routing and C4 bumps) and UCIe-A (advanced packaging), but only for 2D and 2.5D designs. A 48-GT/s data rate is also defined.
“That’s the major upside for UCIe 3.0, and that’s one that drives the market,” said Manuel Mota, senior product manager at Synopsys. “It’s not that the previous limits were insufficient for all applications, just that the industry is moving, and we are moving with it.”
The reason 3D was left out of that jump has to do with shoreline. In planar designs, signals migrate toward the edge of the die for connection to a neighboring die, which makes shoreline a resource limited by the periphery of the die.
In a 3D stack, one die isn’t talking to a die next to it. It’s communicating with a die above or below it using through-silicon vias (TSVs). The number of allowed TSVs is determined by the area of the die, not the periphery. The UCIe Consortium believes that area should provide plenty of room for signals without doubling the data rate.
“For 3D, we don’t need to double the data rate,” said Debendra Das Sharma, chairman of the UCIe Consortium and an Intel senior fellow. “In fact, what we have there now is really good.”
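The shoreline argument comes down to how perimeter and area scale with die size. The sketch below is a simple illustration, assuming a square die. The function names are hypothetical and nothing here comes from the UCIe specification.

```python
def shoreline_mm(side_mm: float) -> float:
    """Perimeter of a square die: the resource planar (2D/2.5D) links compete for."""
    return 4.0 * side_mm


def face_area_mm2(side_mm: float) -> float:
    """Face area of a square die: the resource vertical (3D) links draw on."""
    return side_mm ** 2


# Illustrative die sizes only; real dies are rarely square.
for side in (5.0, 10.0, 20.0):
    print(f"{side:5.1f} mm side: {shoreline_mm(side):6.1f} mm edge, "
          f"{face_area_mm2(side):7.1f} mm^2 area")
```

Doubling the side doubles the available edge but quadruples the available area, which is consistent with the consortium's view that 3D stacks can add connections rather than raise the per-connection data rate.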
One clock, four edges
The higher data rates are made possible by quarter-rate signaling, also called quad-data-rate (QDR) signaling. It was previously available at 32 GT/s but not higher, and it now extends to 48 and 64 GT/s. Two DDR clock signals are generated internally from the basic clock. They are driven by the same source but are 90 degrees apart.
For 64 Gbps, “the clock is 16 Gbps, and there are two clocks running,” said Das Sharma. “One is 90 degrees off the other. The sending side is going to shift it by 90 degrees, and then the receiving side is also going to shift it by 90 degrees and capture the data.”
This means there is a rising and a falling edge on each of these two internal clocks, giving edges at 0, 90, 180, and 270 degrees.
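The arithmetic behind quarter-rate clocking can be sketched in a few lines. This is an illustration of the description above, not an implementation from the specification, and the function names and defaults are assumptions.

```python
def forwarded_clock_ghz(data_rate_gtps: float, edges_per_cycle: int = 4) -> float:
    """Clock frequency needed when each clock period yields `edges_per_cycle` transfers."""
    return data_rate_gtps / edges_per_cycle


def sampling_phases_deg(num_clocks: int = 2, offset_deg: float = 90.0) -> list:
    """Phases of the rising and falling edges of each internal clock."""
    phases = []
    for k in range(num_clocks):
        start = k * offset_deg
        phases.append(start % 360)          # rising edge of clock k
        phases.append((start + 180) % 360)  # falling edge of clock k
    return sorted(phases)


for rate in (32, 48, 64):
    print(f"{rate} GT/s -> {forwarded_clock_ghz(rate)} GHz clock")
print("edge phases:", sampling_phases_deg())
```

With four usable edges per period, a 64-GT/s lane needs only a 16-GHz clock, which matches the figure Das Sharma cites.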
“Quarter-rate is a really good thing because it significantly reduces the risk users and vendors have when they’re making new IP,” noted Bhatnagar.
Reliability and power
The bit error rates (BER) for 48 and 64 GT/s are different, however. The 48-GT/s BER is 10^-15, whereas the 64-GT/s BER is 10^-12. While one is three orders of magnitude lower than the other, both are acceptable, according to the UCIe Consortium, especially given CRC and replay.
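To put those error rates in perspective, the following rough calculation estimates the average time between bit errors on a single lane. It assumes the BER figures are raw rates before CRC and replay, that the lane carries traffic continuously, and that each transfer carries one bit. The function name is hypothetical.

```python
def mean_seconds_between_errors(rate_gtps: float, ber: float) -> float:
    """Average time between bit errors on one lane running flat out."""
    bits_per_second = rate_gtps * 1e9
    return 1.0 / (bits_per_second * ber)


t48 = mean_seconds_between_errors(48, 1e-15)
t64 = mean_seconds_between_errors(64, 1e-12)
print(f"48 GT/s: one raw error about every {t48 / 3600:.1f} hours per lane")
print(f"64 GT/s: one raw error about every {t64:.1f} seconds per lane")
```

Under these assumptions, a 64-GT/s lane sees a raw error roughly every 16 seconds, versus every several hours at 48 GT/s, which is why CRC and replay matter more at the top speed.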
Power remains below 0.5 pJ/bit at slower data rates. Faster designs require enhanced equalization that consumes more energy, taking the target to 0.75 pJ/bit, as shown in Figure 1.

Fig. 1: Key UCIe 3.0 metrics. The first four items are specifications that are expected to yield the performance shown in the subsequent lines. Source: The UCIe Consortium
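Energy-per-bit targets translate directly into per-lane power. The sketch below treats the 0.5 and 0.75 pJ/bit figures as upper bounds and assumes one bit per transfer per lane. Pairing 0.5 pJ/bit with 32 GT/s is an illustrative choice for a "slower" rate, not a figure from the specification.

```python
def lane_power_mw(energy_pj_per_bit: float, rate_gtps: float) -> float:
    """Per-lane power: pJ/bit x Gbit/s = 1e-12 J x 1e9 /s = 1e-3 W, i.e. milliwatts."""
    return energy_pj_per_bit * rate_gtps


# (energy target in pJ/bit, data rate in GT/s); the pairings are assumptions.
for pj, rate in ((0.5, 32), (0.75, 64)):
    print(f"{rate} GT/s at {pj} pJ/bit -> at most {lane_power_mw(pj, rate)} mW per lane")
```

Doubling the rate while raising the energy target by half roughly triples the per-lane power ceiling in this example, from 16 mW to 48 mW.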
The added bandwidth is available with no change in bump locations from prior versions, making it fully backward compatible. The higher speeds, however, can make it more challenging to ensure signal integrity.
Those higher speeds carry consequences for system design and integration, and they are making analysis harder.
“AI is driving unprecedented bandwidth demands, and the die-to-die connectivity is becoming exponentially harder to analyze,” noted Emily Yan, senior marketing strategy manager at Siemens EDA.
Heterogeneous integration brings new layers of complexity, from escalating power and thermal demands to full system-level verification across stacked architectures. “As UCIe pushes toward 64 Gbps, design margins shrink, routing densities increase, and signal integrity risks multiply, making system-level closure far more challenging than ever,” said Siemens’ Hiles.
Better booting and priority
Manageability was introduced in UCIe 2.0, and it’s received some upgrades with 3.0. One aspect addresses how an advanced package will boot up.
The original way to boot a chiplet-based package was to have separate firmware files or loaders for each chiplet requiring boot. With the current revision, those files can be combined into one source for all chiplets, or some combination of them.
“If you have a system built of many chiplets, then if each chiplet needs its own firmware, you will get this chicken-and-egg problem where you either have an independent path to load firmware into each chiplet, and then things can start all at the same time, or you need to wake up all the UCIe interfaces along the way, and you need to have a path to bring firmware to all the chiplets,” Mota explained. “What UCIe is giving you is a path and the methodology to distribute firmware across your chiplets through UCIe, either main band or sideband, in a way that is consistent and avoids you having to have flash memory links to each chiplet, or independent parallel paths to a chiplet, to bring in firmware.”
Before UCIe 3.0, priority notification events were sent on the main band. That occasionally resulted in an important message being blocked by some lower-priority data. In addition, everything that goes through the main band must pass muster through the root of trust (RoT) on the “lead” chiplet, slowing messages down. Those messages can now be passed over the sideband, which is slower, but it should be more available than the main band and doesn’t suffer RoT delays.
“You pull the clock low for eight cycles, and that tells the other side that on an 8 UI [unit interval] boundary, the next one is a higher priority one,” said Das Sharma. “It will go for 64 UI, and then after that you resume the longer payload.”
Security for the sideband, however, is a work in progress.
Longer sideband reach and new open-drain pins
Signal reach is limited for these high-speed interfaces, and with UCIe, it has been the same for main-band and sideband signals. But the sideband operates at a much lower frequency than what the main band is capable of, so that reach limitation ends up being more restrictive than it needs to be.
“With the side band, you could go only 25mm. Now you can go as far as 100mm,” said Bhatnagar.
“The reason they do that is to enable the trace to be shared between multiple chiplets so you can have a star connection of the sideband,” said Mota. “The main band stays at the same distances [as before] because it is a point-to-point connection.”
The 3.0 revision allows signals to travel farther in one hop. It is still possible, however, to have messages forwarded from one chiplet to another.
Two new open-drain pins enable fast throttling and emergency shutdown. If any of the chiplets detects that it’s getting too warm, then one threshold allows for slowing things down, while a higher one provides for all chiplets to shut down to avoid damage from overheating.
“If there is a problem, you need to act quickly, and if you do it through the usual means — sideband or main band — this will be slow,” said Mota. “What they’re giving you is a standard way of reacting quickly with an open drain. If one chiplet gets too hot, you tell everybody at the same time, ‘I’m going to the next state,’ and they all get it immediately without going through protocols for shutdown.”
Such a capability can have a big impact on overall system reliability. “This support for fast throttling and emergency shutdown is very important, especially when it comes to automotive applications from a reliability perspective,” observed Bhatnagar.

Fig. 2: One example of how to exploit the open-drain signals. This is intended to allow throttling or shutdown of all chiplets at once as temperature rises. “Prochot” is a signal indicating that the processor is too hot. Source: The UCIe Consortium
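The behavior of a shared open-drain line can be modeled as a wired-AND: a pull-up holds the line high unless any one chiplet pulls it low. The sketch below is a minimal model of that idea. The active-low polarity, the temperature thresholds, and all names are assumptions for illustration and are not taken from the UCIe specification.

```python
THROTTLE_C = 95.0    # hypothetical throttling threshold
SHUTDOWN_C = 110.0   # hypothetical emergency-shutdown threshold


def open_drain_level(pulls_low) -> bool:
    """Wired-AND: the line reads high (True) only if no driver pulls it low."""
    return not any(pulls_low)


def shared_lines(temps_c):
    """Return (throttle_line, shutdown_line) levels seen by every chiplet at once."""
    throttle_line = open_drain_level(t >= THROTTLE_C for t in temps_c)
    shutdown_line = open_drain_level(t >= SHUTDOWN_C for t in temps_c)
    return throttle_line, shutdown_line


print(shared_lines([60.0, 70.0, 65.0]))   # all chiplets cool
print(shared_lines([60.0, 97.0, 65.0]))   # one chiplet past the throttle threshold
print(shared_lines([60.0, 112.0, 65.0]))  # one chiplet past the shutdown threshold
```

Because every chiplet observes the same wire, a single hot die changes the level for all of them in one step, with no sideband or main-band message exchange.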
Better streaming and recalibration
UCIe 3.0 covers some situations that were not addressed in prior versions. The most prominent is continuous streaming. Technically, this is not a new feature, but it was difficult to implement without interruptions. It required markers that consumed main-band bandwidth, and the available clock frequencies were limited.
The applications requiring this generate data at a constant rate, and that data must be transferred at that rate. Antennas are a good example, potentially having digital data generated on one chiplet that must then communicate with another one, such as an SoC.

Fig. 3: Continuous streaming. Here, ADCs on one chiplet generate a constant stream of data that must be transferred to another chiplet over UCIe. The main band can now be fully utilized to maintain that constant flow, with synchronization and parity sent over the Valid signal. Source: The UCIe Consortium
Clocking rates in such systems are fussy, as you don’t want to work with frequencies that might cause beating against the signals being sampled. In prior revisions, UCIe clocking had specific allowable frequencies that may not have been what the designers would choose. By allowing a range of frequencies, “designers can use UCIe in frequencies that will not beat with the RF channels,” said Mota.
Rather than a specified clock with PLLs to derive the desired clock, the data rate comes directly from the reference clock. Changes to the data rate come simply from adjusting that clock frequency. The encodings employed on the valid signal have been reused so that synchronization and parity can be handled out of band.
Another new feature allows the receiving side of a link to request recalibration from the transmitter. This can take some burden off the receiver for accepting data that’s drifting over time. In addition, it eases initialization since recalibration is possible without reinitializing. That reduces power.
“You are giving the PHY the means to recalibrate while in operation, so you can support a wider range of ambient environment conditions,” noted Mota.
Deeper sleep and CHI
The final new use case deals with deep sleep mode (L2). Before UCIe 3.0, the sideband always remained on even when the main band went to sleep. A new capability has been added to shut the sideband down while keeping a small circuit awake to detect when to exit the sleep mode, reducing power while sleeping.
“The sideband has to stay active, and it has to be awake so that it can wake up the rest of the things,” explained Das Sharma. “This means we are allowing for power gating most of the sideband.”
Not much must remain always-on to implement a wake-up when necessary. “Two or three gates watch for the transition,” Das Sharma said. “I don’t even need a clock there. If I just keep that powered on and the rest of the things powered off, I can get even deeper power savings.”
“You now have a means to wake up the sideband without going through the whole initialization procedure,” Mota said.
One last development comes from outside the UCIe Consortium. The consortium already had built PCIe and CXL over UCIe. Now Arm has followed suit, providing its popular CHI coherent protocol over UCIe.
“A lot of people want to run the AMBA protocol across UCIe,” said Eddie Ramirez, vice president of marketing for the Infrastructure Line of Business at Arm. “People like the protocol because it allows memory and I/O coherency. Nvidia runs the AMBA protocol across NVLink in its Grace Blackwell implementation. The comparison point is not necessarily to UCIe itself, because it’s just the physical interface. The closest comparison is what people are trying to do with CXL Type 2.”
This was done by fitting the CHI data into the UCIe format. “Arm defined ways of mapping CHI chip-to-chip into flits that can be transported in a simple way over UCIe,” explained Mota.
Together, the deeper sleep state and CHI support extend UCIe in both power efficiency and protocol coverage.
The major pieces are in place
UCIe has gone down the road of being released initially with a minimal set of features and relying on feedback and revisions to fill in the gaps. The first couple of versions were significant upgrades to the original, but although 3.0 includes some nontrivial new features, none of them require any change to the interface. Existing signals, such as the Valid pin, are being reused for new features. The open-drain pins are new, but they are additions to, not changes from, the prior interface version.
While the Consortium presumably will continue to make necessary upgrades, we may now be at the point where the primary missing features have been addressed, along with the concerns about it being too heavy an implementation. Fewer objections are heard, and there are more tales of progress.
“Before UCIe, almost everybody was using a custom solution,” said Bhatnagar. “Even Cadence had a custom solution called Ultralink running at 40-gig speed. People were hesitant to move to UCIe because they were worried about overhead — area, performance, or speed penalties. But now with UCIe moving to 64 Gbps, we feel there is more adoption.”
BoW also will continue to be a factor, especially for designs requiring a minimal interface at the lowest possible power. “Clearly, BoW is still there, and they have their proponents,” said Mota. “They will not necessarily disappear. But the bulk of industry has moved toward UCIe, or at least to something that is based on the UCIe standard.”