NoCs In 3D Space

The network on chip has become essential for complex designs, but it needs to evolve to support 3D designs and enable the integration of chiplets.


A network on chip (NoC) has become an essential piece of technology that enables the complexity of chips to keep growing, but when designs go 3D, or when third-party chiplets become pervasive, it’s not clear how NoCs will evolve or what the impact will be on chiplet architectures.

A NoC enables data to move between heterogeneous computing elements, while at the same time minimizing the resources required to connect them. Designers can trade off the topology of a NoC, the resources it consumes, and the latency of traffic at a defined bandwidth. A NoC also can help keep data coherent between distributed computing elements.
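The topology tradeoff described above can be illustrated with a rough sketch. It compares a 16-node bidirectional ring with a 4x4 mesh on average hop count (a crude stand-in for latency) and link count (a crude stand-in for resources). The node count, the row-by-row numbering, and the XY routing are assumptions made for illustration, not details from any vendor's NoC.

```python
from itertools import product

def ring_hops(n, a, b):
    """Shortest hop count between nodes a and b on a bidirectional ring of n nodes."""
    d = abs(a - b)
    return min(d, n - d)

def mesh_hops(cols, a, b):
    """Hop count under XY routing on a 2D mesh; nodes are numbered row by row."""
    ax, ay = a % cols, a // cols
    bx, by = b % cols, b // cols
    return abs(ax - bx) + abs(ay - by)

def average_hops(n, hops):
    """Mean hop count over all ordered pairs of distinct nodes."""
    pairs = [(a, b) for a, b in product(range(n), repeat=2) if a != b]
    return sum(hops(a, b) for a, b in pairs) / len(pairs)

N = 16      # assumed number of endpoints
COLS = 4    # assumed mesh width (4x4)

ring_avg = average_hops(N, lambda a, b: ring_hops(N, a, b))
mesh_avg = average_hops(N, lambda a, b: mesh_hops(COLS, a, b))

ring_links = N                                   # one link per node closes the ring
mesh_links = 2 * COLS * (COLS - 1)               # horizontal plus vertical links

print(f"ring: {ring_links} links, {ring_avg:.2f} average hops")
print(f"mesh: {mesh_links} links, {mesh_avg:.2f} average hops")
```

In this toy model the mesh spends more links to get a lower average hop count. A real evaluation would also have to account for link width, traffic patterns, and congestion.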

“Every design needs a NoC or can benefit from one — even the smaller designs,” says Frank Schirrmeister, vice president solutions and business development at Arteris. “For very complex designs, you might see hierarchies of more than 10 NoCs on a chip. There are several reasons for this. The first is the separation of the coherent and the non-coherent domains. The second is that you have mixed criticality for safety of a chip. Part of it is just divide-and-conquer. As more hierarchies are being connected, it’s just natural for people to do a separation of the designs, of the problem, by having the different domains.”

Fig. 1: Divide and conquer showing different blocks connected using coherent and non-coherent NoC IP. Source: Arteris

Communications networks continue to evolve. “NoCs famously arrived on the scene in the early 1990s, with various proprietary implementations to solve the problem of what do we do with multi-processor cores and their ability to communicate outside their own memory domain,” says Gordon Allan, product manager for verification IP at Siemens Digital Industries Software. “Packet-organized network-on-chip topologies were developed at that time for distributed processing. Now, 2.5D and 3D-IC are opening up new opportunities for communication topologies. The innovation that has happened over the decades can continue in a new way with 3D-IC because of the ability to have many more cores in close proximity, in a very fast, very wide networking arrangement.”

As this progresses toward a world of 3D chip construction, the design hierarchies become deeper. “Large, complex, multi-core chiplets will require new concepts for advanced communication between the cores, the memory, and the peripherals,” says Andy Heinig, head of department for efficient electronics at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “Such communication structures must be hierarchical, with at least one hierarchy level on the chip and one hierarchy level on the system level that combines both chips.”

Success means the NoC stays essentially invisible. “NoCs must extend seamlessly across chips and across chiplets,” says Raymond Nijssen, vice president and chief technologist at Achronix. “This is essential to realizing the system in a package paradigm, which is essential to making technology scale to keep Moore’s Law alive. It’s also important that no new protocols or use models are required to use an inter-chiplet NoC from a design perspective. For example, if an existing intra-chip NoC transports transactions between AXI ports, then an inter-chiplet NoC should look the same.”

There currently appear to be at least three ways in which the overall problem is being approached. “The first is single-vendor, meaning that a company effectively controls all pieces, including all chiplets,” says Elad Alon, co-founder and CEO of Blue Cheetah. “They will do whatever makes sense in the context of either a specific product or product family. The other extreme is the plug-and-play chiplet market. This is where companies build chiplets that I can buy and integrate those together to make a unique design. It requires not only that the functionality is partitioned how I want it, but that they are electrically and mechanically compatible, have all the protocol choices consistent, and makes sense in the required use cases.”

Fig. 2: Chiplet use cases. Source: Blue Cheetah

Is that an impossible dream? “If you have true heterogeneous systems laid out side-by-side in a package, perhaps using an EMIB-style linkage or substrate linkage, you can use standard interfaces such as UCIe on each package,” says Siemens’ Allan. “You can tunnel higher-level protocols, or layer them over the top, such as Ethernet or CXL or PCIe. It doesn’t matter that your dies have different geometries, or different thicknesses, or different electrical characteristics. There will be attributes in the package associated with the space to leave around the die. But generally speaking, that’s the point of chiplets. You can integrate those heterogeneous, different-geometry dies without consequences.”

Until we get there, there is a third approach that is gaining traction. “Multi-vendor ecosystems are emerging,” says Blue Cheetah’s Alon. “The distinction here is that there is cooperation between a group of companies. They get together, do planning as larger monolithic organizations would be doing. If company A is good at task X, and company B is good at task Y, then here is how we’re going to put these things together. The product was pre-conceived and is designed for a particular target market and target specs. People know what they need to build.”

A separation of concerns
Hierarchical design relies on some fundamental tenets that can provide significant advantages. “There are several ways in which this can happen,” says Arteris’ Schirrmeister. “The development team can be distributed, and you could have multiple disciplines not fully understanding what the other one does. For example, there may be a safety island specialist who has responsibility for the hardware security module sub-system. They do not need to know anything about the CPU clustering, which is 4×4 CPUs connected coherently.”

Security is a growing concern. “The advanced NoCs themselves must provide features that support security,” says Fraunhofer’s Heinig. “In an open chiplet system, each node can cause safety problems, so concepts for advanced NoCs are necessary to close such gaps.”

But the NoC cannot get in the way. “Most people architect these things such that the NoC effectively looks transparent, other than the fact that there’s extra latency,” says Alon. “But from a functional perspective, you’re not supposed to be able to tell. Having said that, one doesn’t want to do a blind cut somewhere. If you do that, you can cause yourself more bandwidth pain and suffering across the data interface than would have been necessary if you partitioned things a little bit differently.”

Well-placed functional boundaries can have other benefits. “How do we, as human engineers, grasp the complexity of the overall design sufficiently to be able to verify it?” asks Allan. “We keep saying this design is too big to understand. Well, no, it’s not if we abstract it sufficiently into hierarchies and components that we can understand. We don’t need to understand all of it all at once. We need to understand the detail in what we are verifying today, whether it’s a low-level block, a chiplet, an interface between two chiplets, or a network protocol carried over that interface. This separation of concerns is what gets us to move beyond Moore’s Law and still understand what we’re actually designing.”

Top-down or bottom-up?
The semiconductor industry has always combined top-down planning with bottom-up implementation and verification. This was cemented in place with the development of the IP building block model, and it is expected to become even more pronounced when those blocks are hardened into chiplets.

However, some top-down questions need to be answered. “How do you design a system built from chiplets?” asks Schirrmeister. “Starting from the top, there are architectural questions, such as those involving latencies that can be tolerated across various interfaces and the bandwidth requirements. Are interfaces bi-directional? How many channels do you need? In the past you had PCIe lanes, but now you have UCIe lanes. You need to put down a substrate across chips, and you need to consider those architecture effects.”

But that only has to go down the hierarchy so far. “A company may be planning to use individual chiplets in a product family, or for some number of generations of products,” says Alon. “They would have fairly detailed specifications about the functionality provided by each of these partitions, and that does mean you have to make some choices a priori, such as details about the protocols and what capabilities each chiplet has. Generally, people can provision those things appropriately so long as you functionally understand what’s happening on the two sides.”

Protocols become core architectural decisions. “If I have a processor that is relying on cache coherency, potentially with another chiplet, it may speak CHI,” says Schirrmeister. “There are people using CXL, which is a slightly different form of coherency. That’s what the NoC on the chiplets will speak. Then you need to figure out how the data is packed. There are interfaces for streaming, such as AMBA CXS. UCIe has this thing called FDI, which is a Flit interface, where these parallel bits are basically presented to the link layer, and the PHY carries the data across. It has performance impacts, because you’re packing the data. While these things alter latency, so does moving to a different technology node.”
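The cost of packing that Schirrmeister mentions can be estimated with simple arithmetic. The sketch below assumes fixed-size flits with a fixed per-flit header, and assumes messages may straddle flit boundaries. The function name and every size used are hypothetical and are not taken from the UCIe or AMBA specifications.

```python
import math

def pack(message_bytes, flit_bytes, header_bytes):
    """Return (flits needed, link efficiency) for a batch of messages.

    Assumes messages are packed back to back and may span flits.
    """
    payload_per_flit = flit_bytes - header_bytes
    if payload_per_flit <= 0:
        raise ValueError("header must be smaller than the flit")
    total_payload = sum(message_bytes)
    flits = math.ceil(total_payload / payload_per_flit)
    efficiency = total_payload / (flits * flit_bytes)
    return flits, efficiency

# Hypothetical batch: two small requests, one cache line, one response.
messages = [16, 16, 64, 8]
flits, efficiency = pack(messages, flit_bytes=64, header_bytes=4)
print(f"{flits} flits, {efficiency:.1%} of link bytes carry payload")
```

The unused tail of the last flit and the per-flit header both reduce effective bandwidth, and waiting to fill a flit adds latency. That is one way packing choices show up as performance effects.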

Standardization bodies are trying to rationalize it. “The ODSA OCP spec defines two concepts,” says Alon. “One is the interface profile, and the other is the bus variant. The interface profile defines that, ‘This chiplet, with this particular set of die-to-die interfaces, will carry the following sets of protocols.’ For example, an interface profile may say, ‘I’m carrying this number of AXI requester ports, and this number of responder ports. Here is how they’re packed.’ It defines the set of protocols available for this NoC connection point and how they are carried. The second concept is the bus variant. When you say something uses CHI, that’s not a unique definition. There’s a lot of optionality and optimizations that people will do in the specific fields. The bus variant is a way of stating that this particular interface is using this specific version of the protocols. From an overall NoC performance perspective, that doesn’t guarantee that everything will work at the performance level people want it to. But at least functionally, it says these connections can be made in a consistent way, as long as everyone has published what it is they are actually doing at that boundary.”
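One way to picture the two concepts is as published metadata that an integrator compares before connecting two chiplets. The sketch below is an illustration only. The class, its field names, and the variant labels are hypothetical and do not reflect the actual ODSA data format. As Alon notes, it checks functional consistency, not performance.

```python
from dataclasses import dataclass, field

@dataclass
class DieToDiePort:
    """What one chiplet publishes about a die-to-die connection point (hypothetical)."""
    name: str
    packing: str                                   # interface profile: how traffic is packed
    ports: dict = field(default_factory=dict)      # interface profile: protocol -> (requesters, responders)
    variants: dict = field(default_factory=dict)   # bus variant: protocol -> variant label

def mismatches(a, b):
    """List the functional reasons two published ports cannot be connected."""
    problems = []
    if a.packing != b.packing:
        problems.append(f"packing: {a.packing} vs {b.packing}")
    for proto in sorted(set(a.ports) | set(b.ports)):
        if proto not in a.ports or proto not in b.ports:
            problems.append(f"{proto}: carried on one side only")
            continue
        a_req, a_rsp = a.ports[proto]
        b_req, b_rsp = b.ports[proto]
        # Each requester port on one die needs a responder port on the other.
        if a_req != b_rsp or a_rsp != b_req:
            problems.append(f"{proto}: requester/responder ports do not pair up")
        if a.variants.get(proto) != b.variants.get(proto):
            problems.append(
                f"{proto}: bus variant {a.variants.get(proto)} vs {b.variants.get(proto)}"
            )
    return problems

cpu = DieToDiePort("cpu_die", "packing-A", {"AXI": (2, 1)}, {"AXI": "variant-1"})
io = DieToDiePort("io_die", "packing-A", {"AXI": (1, 2)}, {"AXI": "variant-1"})
accel = DieToDiePort("accel_die", "packing-A", {"AXI": (1, 2)}, {"AXI": "variant-2"})

print(mismatches(cpu, io))      # compatible pair: empty list
print(mismatches(cpu, accel))   # same profile, different bus variant
```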

Every layer in the communications protocol is seeing rapid progress. “We are seeing Ethernet evolving in the direction of synchronous, time-sensitive protocols,” says Allan. “Depending on the processor functionality, we may see some innovations in synchronous networks from chip to chip. Some people want to bring optical to the table. We’ll see innovations in the transport, and UCIe is one of them. There will be innovations in the topologies and in the approach, whether it’s packetization versus synchronous networking — possibly even analogous to the token ring networks of old, where there’s bandwidth created, and you jump on that bandwidth if you need it. It’s a decentralized organization rather than a switched top-down organization.”

Bringing more functionality inside the package will significantly change latency and bandwidth. In addition, the shorter distances made possible by 3D technologies will reduce communication times, even compared with single-chip solutions. “In all cases, there will be a latency impact for NoC transactions going across chips, even in the 3D case, albeit to a lesser extent,” says Achronix’ Nijssen. “The bandwidth between chiplets will be much less, and there will be an increased power cost for going between chiplets. This is not fundamentally different from multi-chip routing where there’s no NoC. What’s different is that a NoC multiplexes transactions over the same physical connections and can trade off QoS (like latency) between different streams. One challenge with this modeling is that most designs do not yet specify latency constraints between communicating blocks.”
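The QoS tradeoff Nijssen describes can be sketched with a toy arbiter on a single shared link. The model assumes one flit is sent per cycle and that all flits are queued at cycle 0. The stream names and the strict-priority policy are assumptions chosen for illustration, and real NoCs typically use more elaborate arbitration.

```python
import heapq

def drain(flits, use_priority):
    """Send one flit per cycle over a shared link; return the cycle each stream finishes.

    flits: list of (stream, priority) in arrival order, all queued at cycle 0.
    A lower priority number means more urgent traffic.
    """
    queue = []
    for order, (stream, priority) in enumerate(flits):
        # Without QoS every flit gets the same rank, so arrival order decides.
        key = (priority if use_priority else 0, order)
        heapq.heappush(queue, (key, stream))
    finish = {}
    cycle = 0
    while queue:
        _, stream = heapq.heappop(queue)
        cycle += 1
        finish[stream] = cycle   # overwritten until the stream's last flit is sent
    return finish

# Hypothetical traffic: a bulk transfer queued ahead of a latency-sensitive read.
traffic = [("bulk_dma", 1)] * 6 + [("cpu_read", 0)] * 2

fifo = drain(traffic, use_priority=False)
qos = drain(traffic, use_priority=True)
print("FIFO:", fifo)
print("QoS: ", qos)
```

The link is busy for the same eight cycles in both cases. Prioritization only changes which stream absorbs the waiting time, which is why per-stream latency constraints matter when modeling this.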

Put simply, not everyone needs the same solution. “When you enter an open chiplet environment, as people would like it to be, you need standards,” says Schirrmeister. “You have to make decisions, and this will eventually cluster around the types of ecosystems that will drive some of the subsets. Imec has an automotive chiplet initiative, and one of the discussions is what interfaces you need in that ecosystem. That might not work for a data center guy. Consumer devices might be very different. It’s essentially an extension of the challenges we were already facing for hierarchical NoCs on chip, but now in the face of disaggregation for 2.5D and 3D environments, it’s becoming much more complex.”

That is expected to change over time. “Imagine in five years, we will be beyond talking about an ecosystem of UCIe-compatible chiplets,” says Allan. “We may be talking about compatibility at a higher layer, for example, some networking topology that a chiplet can offer, with easy plug-and-play participation in some network as a SiP device. Standardization is what enables us to even imagine how to get our heads around that. It’s a necessary step, and it enables us in EDA to provide standards-based verification IP.”

But the path to get there may require smaller steps. “For the next several years, at least, we don’t actually have to solve the hardest, thorniest general problems,” says Alon. “We just need to get folks together to go after specific targeted markets, and that’s very much happening. It’s not that these issues go away. But if you try and solve it specifically for a given target, as opposed to all possible things you might do, you can get traction much faster.”

Further Reading
3D-ICs May Be The Least-Cost Option
Advanced packaging has evolved from expensive custom solutions to those ready for more widespread adoption.
Commercial Chiplet Ecosystem May Be A Decade Away
Technology and business hurdles must be addressed before widespread adoption.
An Entangled Heterarchy
The informal structural hierarchy used in semiconductor design is imperfect but adequate for most tasks, yet other hierarchies are needed.
