Many features of UCIe 2.0 seen as “heavy” are optional, causing confusion.
UCIe, a standard for die-to-die interconnect in advanced packages, has drawn concern since its 2.0 release that it is too heavyweight. But the fact that many of the new features are optional seems to have been lost in much of the public discussion.
In fact, new capabilities that support a possible future chiplet marketplace are not required for designs that don’t target that marketplace.
“It’s the blessing and the curse of UCIe,” said Mick Posner, senior product marketing group director at Cadence. “The spec is defined with so many variants that you can tailor it to your exact needs. It’s applicable for everything from automotive to high-performance compute to AI to mil/aero because it has so many flavors. That’s also a curse to an IP provider. How do you support all those flavors?”
Fig. 1: UCIe PHY in multi-die package. Source: Cadence
Two standards, Bunch of Wires (BoW) and UCIe, compete with proprietary designs. Today, proprietary designs predominate, because virtually all projects underway are internal: every chiplet is created and used by the same company. Interoperability with externally sourced chiplets is therefore not a concern.
Features seen as necessary for promoting broad chiplet interoperability provide little utility for captive designs, and the industry has indicated resistance to building such capabilities into designs that don’t require them.
However, one critical UCIe 2.0 message that hasn’t penetrated the hubbub is, “A set of UCIe features is optional to implement,” said Brian Rea, marketing working group chair for UCIe Consortium. “You don’t need to use silicon for features you won’t use. UCIe — similar to other industry standards like PCIe, CXL, and NVMe — allows flexibility.”
The promise of a future marketplace
Today’s commercial advanced-packaging offerings come from well-funded companies with the resources to create all the components internally, except possibly high-bandwidth memory (HBM). Such projects often originate as deconstructed SoCs, in which blocks such as compute cores may become their own chiplets to scale compute capacity or reduce cost. Other blocks, such as cache or I/O, can be split out into their own chiplets.
These projects are particularly beneficial when a monolithic version would exceed the reticle limit or require an extremely expensive advanced process node. Because the company designed the original SoC, every chiplet separated from it originates with that company. Apart from HBM, which is a die stack rather than a single chiplet, there is no widespread use of commercially available chiplets. That gives the designing company full control over how the chiplets interact.
Longer-term, the vision is for a general marketplace resembling the one we have today for soft design IP. “Most of the customers that we talk to say they want to be part of an ecosystem,” said Patrick Soheili, co-founder and chief strategic officer at Eliyan. But instead of buying RTL, one is buying hard silicon.
There is, however, one big difference from IP. “RTL IP blocks don’t just plug together,” noted Peter Onufryk, manageability and security working group co-chair for the UCIe Consortium. “You add a lot of glue. When you have chiplets, you can’t add that glue. They just need to plug together.”
But if a single company can’t control every chiplet, then there must be broad agreement on a number of parameters to ensure that architects can source chiplets from different companies and have them plug and play. “The vision of being able to mix-and-match chiplets from everywhere will become a reality when there are standards for all aspects of a chiplet design, and when there is convergence in technology that allows that,” said Posner.
Although such a market doesn’t yet exist, the UCIe Consortium says it’s putting the necessary features in place to guide those who will pioneer it. “These features are looking into future aspects of an open chiplet ecosystem and backwards compatibility,” said Manmeet Walia, executive director, product management at Synopsys. “Making them optional is what most customers prefer.”
Most implementations today aren’t likely to include these extra features. “Ninety percent of people don’t care about these things because they are captive systems,” said Walia. “The 10% or so who do care do so only for future-proofing.”
Management is optional, in whole or in part
Much of what UCIe 2.0 adds consists of management features to ensure bring-up and composability. These features generally affect higher layers of the communication stack rather than the PHY. “You can think of UCIe manageability as AXI streaming between chiplets,” said Onufryk. “You can read and write registers on chiplets. There’s a capability structure. All of it’s optional. The capability structure lies at a defined address in a chiplet. It describes the UCIe-defined vendor ID and device ID, just like in PCIe.” Bring-up therefore requires reading only a few registers. Those reads add some latency, but the intent is for that latency to be minor.
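To make the “few register reads” point concrete, here is a minimal sketch of identifying a chiplet from a capability structure at a fixed address. The base address, offsets, field widths, and byte order below are hypothetical placeholders, not the layout defined in the UCIe 2.0 spec, and the register space is simulated in software.

```python
from dataclasses import dataclass

# Hypothetical register layout, for illustration only. The real address and
# field layout of the capability structure come from the UCIe 2.0 spec.
CAP_BASE = 0x0000          # assumed fixed address of the capability structure
VENDOR_ID_OFFSET = 0x0     # assumed 16-bit vendor ID field
DEVICE_ID_OFFSET = 0x2     # assumed 16-bit device ID field


@dataclass(frozen=True)
class ChipletId:
    vendor_id: int
    device_id: int


class FakeChipletRegs:
    """Stand-in for a chiplet's read-only register space (byte-addressed)."""

    def __init__(self, image: bytes):
        self._image = image
        self.reads = 0  # counts register reads issued during bring-up

    def read16(self, addr: int) -> int:
        self.reads += 1
        # Little-endian byte order is an assumption of this sketch.
        return self._image[addr] | (self._image[addr + 1] << 8)


def read_chiplet_id(regs: FakeChipletRegs) -> ChipletId:
    """Bring-up step: two reads at known offsets identify the chiplet."""
    return ChipletId(
        vendor_id=regs.read16(CAP_BASE + VENDOR_ID_OFFSET),
        device_id=regs.read16(CAP_BASE + DEVICE_ID_OFFSET),
    )


regs = FakeChipletRegs(bytes([0x34, 0x12, 0x78, 0x56]))
chiplet = read_chiplet_id(regs)
print(hex(chiplet.vendor_id), hex(chiplet.device_id), regs.reads)  # 0x1234 0x5678 2
```

Because the structure sits at a known address, software needs no search; it simply reads the fields it expects.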
Management commands can be issued on one of two interfaces. “UCIe has a main-band interface, which is the main data path,” said Onufryk. “We also have a sideband wire per module that is used for link training. Management can run either on the sideband or main band.”
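One way to picture the two management paths is a simple routing choice. The sketch below assumes, as its own hypothetical policy rather than anything the spec mandates, that software prefers the main band once it is trained. Because the sideband carries link training, it is presumably usable before the main band is, so early management traffic falls back to it.

```python
from enum import Enum


class MgmtPath(Enum):
    SIDEBAND = "sideband"
    MAINBAND = "mainband"


def pick_mgmt_path(mainband_up: bool, prefer_mainband: bool = True) -> MgmtPath:
    """Choose where to send a management command (hypothetical policy).

    The sideband carries link training, so it is available before the
    main band is; management traffic issued that early must use it.
    """
    if not mainband_up:
        return MgmtPath.SIDEBAND
    return MgmtPath.MAINBAND if prefer_mainband else MgmtPath.SIDEBAND


print(pick_mgmt_path(mainband_up=False).value)  # sideband
print(pick_mgmt_path(mainband_up=True).value)   # mainband
```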
If implemented, the management capabilities provide a toolbox of features, each of which is optional. They include:
At a high level, each of these features provides clear utility. But they far exceed what would be necessary for a minimal connection between two chips.
A variety of other options
Many of these features require management software that will run on a processor. But the minimal set of required features aims for blind die bring-up. The idea is that the connections between chiplets must be functional without requiring the processor to boot first.
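Blind die bring-up means the link walks itself from reset to active using only hardware handshakes. The following is a simplified sketch of that sequencing. The stage names are hypothetical stand-ins; the UCIe spec defines its own link-training state machine with more states and error handling than shown here.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical, simplified stage names. The UCIe spec defines its own
# link-training state machine, with more states and error handling.
STAGES = ("reset", "sideband_init", "mainband_init", "mainband_train", "link_active")


def blind_bring_up(run_stage: Callable[[str], bool],
                   stages: Sequence[str] = STAGES) -> Tuple[bool, List[str]]:
    """Walk the stages in order using only hardware handshakes (run_stage).

    No processor or firmware is consulted. Returns (success, completed stages)
    and stops at the first stage that fails.
    """
    completed: List[str] = []
    for stage in stages:
        if not run_stage(stage):
            return False, completed
        completed.append(stage)
    return True, completed


# Simulate a link whose mainband training fails.
ok, done = blind_bring_up(lambda stage: stage != "mainband_train")
print(ok, done)  # False ['reset', 'sideband_init', 'mainband_init']
```

Management software, if present, would only come into play after this sequence has already produced a working link.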
The spec includes mandatory elements such as lane reversal that must be handled with no external control. “Just like in PCIe, we can flip the order of lanes,” said Onufryk. “If you’re connecting a chiplet on the east or west edges, and then you want to connect it on the north or south edges, you’ve got to flip the order of the lanes. So you need a mux to flip the lanes.”
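Functionally, the lane-reversal mux is just an index flip. This sketch models it in software, assuming (hypothetically) that during training each lane transmits its own index so the receiver can tell whether the order arrived straight or reversed. Real hardware does this with a mux and a training pattern defined by the spec, not with lists.

```python
from typing import List


def detect_reversal(received: List[int]) -> bool:
    """Assumes each lane sends its own index as a training pattern (hypothetical).

    Straight order means no flip; fully reversed order means the mux must flip.
    """
    n = len(received)
    if received == list(range(n)):
        return False
    if received == list(range(n - 1, -1, -1)):
        return True
    raise ValueError("lane order is neither straight nor reversed")


def lane_mux(lanes: List[int], flip: bool) -> List[int]:
    """Lane-reversal mux: physical lane i drives logical lane n-1-i when flipped."""
    return lanes[::-1] if flip else list(lanes)


received = [3, 2, 1, 0]          # chiplet mounted on the "other" edge
flip = detect_reversal(received)
print(flip, lane_mux(received, flip))  # True [0, 1, 2, 3]
```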
But even this “mandatory” feature is expendable in a custom implementation. “If you know that you’re always connecting on one side, you don’t need to flip the lanes,” he noted. “You can get rid of that mux. It doesn’t consume any measurable power, but that’s an issue that people bring up as ‘heavyweight.’”
Importantly, features requiring circuits, whether mandatory or optional, specify behavior, not design details. “They don’t tell you how to design the circuits,” said Kevin Donnelly, vice president of strategic marketing at Eliyan. “In fact, they specifically avoid that.”
It’s worth noting that the prior revision also had options. “Even UCIe 1.1 has options, and there is flexibility if you want to do a non-raw die-to-die mode UCIe–UCIe connection,” said Pratyush Kamal, director, central engineering solutions at Siemens EDA.
“There’s an organic-substrate variant, and then there’s an advanced-package version for CoWoS or EMIB,” said Posner. “For the advanced package, the standard definition is 64 transmit and 64 receive lanes. But if you don’t need all that bandwidth, you can cut it in half.”
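Halving the lane count halves raw bandwidth, which a quick calculation shows. The sketch assumes an example per-lane rate of 32 GT/s and one bit per transfer per lane, and it ignores protocol and encoding overhead, so the figures are upper bounds rather than delivered throughput.

```python
def raw_bandwidth_gb_per_s(lanes: int, gt_per_s: float) -> float:
    """Raw one-direction bandwidth in GB/s: lanes x GT/s x 1 bit per transfer / 8.

    Ignores protocol, encoding, and retry overhead.
    """
    return lanes * gt_per_s / 8


RATE = 32.0  # GT/s per lane, an example value
print(raw_bandwidth_gb_per_s(64, RATE))  # 256.0  (full 64-lane module)
print(raw_bandwidth_gb_per_s(32, RATE))  # 128.0  (cut in half)
```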
Discovery as the poster child
One of the features that has received the most attention is discovery. Because of how the term is used in standards such as PCIe, the word “discovery” has raised concern, and many have interpreted it differently than intended.
Discovery is an important feature of many networks, particularly those with dynamic configuration options. If a network can boot up with any of a number of cards or nodes added or missing, then each bring-up must account for everything that’s out there. This could be called dynamic discovery to emphasize the point that elements can come and go from the network.
That, of course, makes little practical sense for an advanced package. Although there’s a remote possibility that someone might disassemble an advanced package, replace a chiplet, and reassemble it so that it still operates, the practical chances of that happening are nil.
Instead, it may be beneficial to confirm what’s in the package (essentially taking inventory) and to negotiate any low-level features the chiplets need in order to talk. One might think of this as static discovery, or enumeration.
The difference is important. Dynamic discovery involves much more communication because it starts from zero knowledge. With a chiplet, you know what you’re expecting, so a quick register read can confirm that. This is the essence of UCIe 2.0’s discovery feature.
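The difference in communication cost can be illustrated with a toy model that counts ID reads. Here the package is simulated as a map from hypothetical “slots” to device IDs. Dynamic discovery probes every possible slot, while static discovery reads only the slots listed in a known manifest. The slot abstraction and the 256-slot search space are assumptions of this sketch, not UCIe terms.

```python
from typing import Dict, Optional


class Package:
    """Simulated package: slot -> device ID, counting every ID read (hypothetical model)."""

    def __init__(self, chiplets: Dict[int, int]):
        self.chiplets = chiplets
        self.reads = 0

    def read_id(self, slot: int) -> Optional[int]:
        self.reads += 1
        return self.chiplets.get(slot)


def dynamic_discovery(pkg: Package, max_slots: int) -> Dict[int, int]:
    """Zero prior knowledge: probe every possible slot and keep those that answer."""
    found = {}
    for slot in range(max_slots):
        dev = pkg.read_id(slot)
        if dev is not None:
            found[slot] = dev
    return found


def static_discovery(pkg: Package, expected: Dict[int, int]) -> bool:
    """Known manifest: one read per expected chiplet, then compare."""
    return all(pkg.read_id(slot) == dev for slot, dev in expected.items())


manifest = {0: 0xA1, 1: 0xB2}
p1, p2 = Package(manifest), Package(manifest)
print(dynamic_discovery(p1, max_slots=256) == manifest, p1.reads)  # True 256
print(static_discovery(p2, manifest), p2.reads)                    # True 2
```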
“Our belief is that, in this open chiplet ecosystem, the long pole in the tent will be SoC firmware,” said Onufryk. “The hardware is the easy part. Therefore, we want to make it so that not only can you plug these chiplets together, but you could deliver firmware with the chiplets. The point of dynamic discovery was to allow reuse of firmware.”
Some of those closer to the standard wonder why a simple register read is considered heavy. “The cost of discovery is read-only registers,” said Onufryk. “It’s actually even simpler than PCIe enumeration, but the principles are very similar.”
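Firmware reuse follows the same pattern PCIe drivers use: match on the IDs read during discovery and run the code written for that part. The sketch below shows the idea with a lookup table. The vendor and device IDs and the init routines are hypothetical placeholders.

```python
from typing import Callable, Dict, Tuple

# Hypothetical vendor/device IDs and init routines, for illustration only.
FIRMWARE: Dict[Tuple[int, int], Callable[[], str]] = {
    (0x1234, 0x0001): lambda: "compute chiplet initialized",
    (0x1234, 0x0002): lambda: "I/O chiplet initialized",
}


def bind_firmware(vendor_id: int, device_id: int) -> str:
    """Select reusable firmware by the IDs read during discovery, PCIe-driver style."""
    init = FIRMWARE.get((vendor_id, device_id))
    if init is None:
        raise LookupError(f"no firmware for {vendor_id:#06x}:{device_id:#06x}")
    return init()


print(bind_firmware(0x1234, 0x0002))  # I/O chiplet initialized
```

A chiplet vendor could then ship the table entry and routine alongside the silicon, and any SoC that reads the matching IDs could reuse it unchanged.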
Bundles of features
Some see the possibility of various features accreting into natural application-oriented bundles. They could even be recognized by the UCIe Consortium. “Maybe one day chiplets will be certified at different levels of UCIe compatibility,” Kamal noted.
Synopsys offers three different levels of its UCIe interface IP, called Compliant, Compatible, and Custom. “The Compliant version is fully compliant to the UCIe spec,” explained Walia. “Then we have Compatible, which can talk to the other side but may not meet the spec limits. The third is Custom, where we strip it down to reduce power, improve the metrics, and make it lightweight.”
This can color what some think about the utility of standards with too many options. “It’s manageable if there’s a clear hierarchy in options, like, ‘Do you support the level one, level two, level three?’” said Marc Swinnen, director of product marketing at Ansys. “But if it’s higgledy-piggledy and everybody makes their own smorgasbord of what they support, then it’s not a standard.”
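The difference between a hierarchy and a smorgasbord can be expressed as a subset check: in a clean hierarchy, each level includes everything in the level below, so any chiplet maps to one comparable level. The feature names and level contents below are hypothetical; neither UCIe nor any certification program defines these bundles.

```python
from typing import List, Set

# Hypothetical feature names and levels; UCIe does not define these bundles.
LEVELS: List[Set[str]] = [
    {"die_to_die_link", "sideband"},                                     # level 1
    {"die_to_die_link", "sideband", "discovery"},                        # level 2
    {"die_to_die_link", "sideband", "discovery", "telemetry", "debug"},  # level 3
]


def is_hierarchy(levels: List[Set[str]]) -> bool:
    """A clean hierarchy: every level includes everything in the level below."""
    return all(lower <= upper for lower, upper in zip(levels, levels[1:]))


def highest_level(features: Set[str], levels: List[Set[str]] = LEVELS) -> int:
    """Highest level whose required features are all present; 0 if none."""
    level = 0
    for i, required in enumerate(levels, start=1):
        if not required <= features:
            break
        level = i
    return level


print(is_hierarchy(LEVELS))                                         # True
print(highest_level({"die_to_die_link", "sideband", "discovery"}))  # 2
print(highest_level({"die_to_die_link", "sideband", "telemetry"}))  # 1
```

The last case shows the cost of a pick-and-choose feature set: a chiplet with telemetry but no discovery can only claim level 1, and its extra feature is invisible to anyone comparing by level.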
But even those closest to the standard may take liberties. “Internal to Intel, we use UCIe, but we modify the data-link layer for our specific use cases because we drive tremendous volume,” said Onufryk. “The market will decide which capabilities are useful and which ones aren’t. And the ones that are useful will evolve, and the ones that aren’t will naturally fade away.”
Competing with BoW
BoW and UCIe compete not only with each other but also with custom proprietary implementations, and those custom versions are likely to persist to some degree.
The BoW versus UCIe competition doesn’t have a clear winner at present. BoW is generally thought of as being lighter weight, and that impression was probably strengthened with the new UCIe features. Given optional features, the question then becomes, “Which is lighter in its minimal viable configuration?”
There’s not a clear answer here. Some elements still give BoW the nod, but selecting which to use will require more than a simple “heaviness” score. Two feature examples are the use of transceivers and signal placement.
BoW permits transceivers. “You have a transmitter and receiver on either side of the lane,” explained Eliyan’s Soheili. “You can either transmit or receive on the same wire.” Using standard technology, this must be half-duplex traffic. Full duplex would require two lines, one for each direction. Eliyan’s signaling technology permits full-duplex traffic on a single line, but it’s newer and not yet widely adopted. This choice gives the flexibility of having a lane consist of one or two lines.
UCIe doesn’t permit transceivers, so every lane has two lines, one per direction. For applications that can use transceivers, BoW would therefore require half as many lines.
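The wire-count arithmetic is simple enough to state directly. This sketch uses the lane definition from the text (a lane carries traffic in both directions) and counts signal wires only, ignoring power, ground, clocks, and sideband. Note that the half-duplex option saves wires by having the two directions take turns on each wire.

```python
# Wires per lane for each signaling style described in the text.
WIRES_PER_LANE = {
    "pair": 2,         # one wire per direction (UCIe: no transceivers)
    "half_duplex": 1,  # one shared wire; directions take turns (BoW transceivers)
    "full_duplex": 1,  # one wire, both directions at once (newer signaling)
}


def wires_needed(lanes: int, signaling: str) -> int:
    """Total signal wires for `lanes` lanes, ignoring power, ground, and clocks."""
    if signaling not in WIRES_PER_LANE:
        raise ValueError(f"unknown signaling style: {signaling!r}")
    return lanes * WIRES_PER_LANE[signaling]


for style in WIRES_PER_LANE:
    print(style, wires_needed(16, style))
# pair 32
# half_duplex 16
# full_duplex 16
```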
As a separate aspect of the standard, UCIe includes bump details. “UCIe specifies bump locations, how many grounds and powers you have, how they’re physically oriented, and how you place everything,” noted Donnelly. “Not everybody wants that level of constraint.” The intent is to specify the PHY footprint, which the UCIe Consortium calls the “form factor,” to help in assessing implementation compliance.
BoW imposes no such requirement, allowing any pattern of bumps and any footprint size or shape. “You can be as deep or wide as you want,” Donnelly said. “It’s easier to make chiplets interconnect with different pitches and different PHY sizes using something like BoW than it would be with UCIe unless you follow the spec exactly.” To some designers, that flexibility makes BoW feel “lighter,” at least with respect to that one feature.
“I would describe BoW as more of an architectural spec, in that it gives guidelines like an Arm AMBA bus,” said Donnelly. “But there are many ways to configure it. To ensure compatibility between two sides, you have to know which configuration options you’ve chosen.”
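Donnelly’s point about knowing which options each side chose amounts to a field-by-field comparison before connecting two chiplets. The option set in this sketch is hypothetical and far smaller than a real BoW configuration; it only illustrates the check.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class LinkConfig:
    # Hypothetical option set for illustration; not BoW's actual parameter list.
    data_rate_gbps: float
    lanes: int
    bidirectional: bool


def mismatches(a: LinkConfig, b: LinkConfig) -> List[str]:
    """Options on which the two sides disagree; an empty list means they match."""
    return [f.name for f in fields(LinkConfig) if getattr(a, f.name) != getattr(b, f.name)]


left = LinkConfig(data_rate_gbps=16.0, lanes=16, bidirectional=True)
right = LinkConfig(data_rate_gbps=16.0, lanes=16, bidirectional=False)
print(mismatches(left, right))  # ['bidirectional']
```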
A remarkably civil competition
The differences between UCIe and BoW aren’t trivial, and even their philosophies differ. But useful examples for either one abound, so each side has adherents. Getting the word out that heavy features are optional should help UCIe in lighter-weight designs.
Despite the competition, many of the IP providers and others involved in die-to-die interconnect view both standards positively and don’t want to be seen as disparaging either one. There’s remarkably little open trash talk. It’s less an all-out war than a matter of letting both play out to see what happens.
In the meantime, proprietary captive designs will remain. “We see the closed architecture continuing to implement its own highly efficient PHY, because 2nm or 3nm silicon is very, very expensive,” noted Soheili.
Part of this may simply be the nature of standards. “Proprietary solutions can be highly optimized for specific designs, offering superior efficiency in both area and power,” said Andy Heinig, group leader, advanced system integration, department head of efficient electronics at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “Furthermore, the standardization process is typically slower to evolve as updates require consensus among multiple stakeholders. This can delay the adoption of new features compared to the faster iteration possible with proprietary implementations.”
Others see many benefits in implementing an industry-standard die-to-die interface. “These are essential properties if a chiplet is to be sold as a product,” said Mark Knight, director of architecture product management at Arm. “However, if a semiconductor company is using chiplets as a manufacturing technology to mix process nodes or pack more transistors into a package with no desire to sell the chiplets, then they may choose to use a custom interface between those chiplets.”
Soheili noted an obvious exception to companies jumping onto the standards bandwagon. “NVIDIA can continue to leverage its NVLink inside the package,” he said. “It’s been designed to do exactly what the company wants for its own chiplets.”
Meanwhile, everyone else will be watching both standards carefully, perhaps picking and choosing features, and awaiting the emergence of the marketplace for which all of this should pay off.
—Ed Sperling contributed to this report.