Complex chips need coherent and non-coherent sub-NoCs to ensure efficient data paths. Correct hierarchy is essential.
Moving, managing, and keeping track of data is becoming a much bigger challenge as the amount of data that needs to be processed, stored, and accessed by different processing elements continues to grow.
Complex SoCs and multi-die implementations, particularly those involving AI, may contain numerous networks-on-chip (NoCs) to manage and prioritize data movement. They can include coherent or non-coherent caches and I/Os, or they can just handle a single physical portion of the system. But all of this needs to be planned much earlier in the design flow than in the past, and it needs to be monitored throughout a system’s lifecycle.
“From what we’re hearing across AI SoC teams, training and inference haven’t just increased data volumes — they’ve exposed data movement itself as the dominant system constraint,” said Andy Nightingale, vice president of product management and marketing at Arteris. “Compute capability is scaling far faster than Moore’s Law, but data movement, congestion, and energy efficiency increasingly determine whether that compute is usable at all.”
Data traffic and coherence are good starting points for determining the best NoC for a particular use case. “[Chip architects should] ask who needs coherence and why, which agents generate bursts versus steady streams, where latency bounds really matter, and how much reuse or scaling is expected across derivatives or chiplets,” Nightingale said. “CPU clusters tend to require coherent NoCs because their programming model depends on them. NPUs are usually non-coherent because explicit data movement and local memory deliver better power and throughput.”
Fig. 1: CPU vs. NPU networks on chip. Source: Arteris
Others concur that coherency is a good starting point. “Choose a coherent NoC for shared-memory CPU clusters where consistency matters, and a non-coherent NoC for NPUs and accelerators where throughput matters more than strict coherency,” said William Wang, CEO of ChipAgents.
NoCs come in multiple flavors. They can be fully cache-coherent, last-level cache coherent, I/O coherent (also known as one-way coherent), or non-coherent.
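As a rough illustration of that starting point, the sketch below sorts a few agents into coherence domains. The agent names, the two boolean attributes, and the decision rule are assumptions made for this example, not a vendor methodology, and the last-level-cache-coherent flavor is left out to keep it short.

```python
from dataclasses import dataclass
from enum import Enum


class Coherence(Enum):
    FULL = "fully cache-coherent"
    IO = "I/O coherent (one-way)"
    NONE = "non-coherent"


@dataclass(frozen=True)
class Agent:
    name: str
    caches_shared_memory: bool  # keeps private cached copies of shared data
    reads_cpu_caches: bool      # must see data still sitting in CPU caches


def required_coherence(agent):
    """Hypothetical rule of thumb: pick the cheapest domain that still works."""
    if agent.caches_shared_memory:
        return Coherence.FULL
    if agent.reads_cpu_caches:
        return Coherence.IO
    return Coherence.NONE


def partition(agents):
    """Group agent names by the coherence domain they need."""
    domains = {}
    for agent in agents:
        domains.setdefault(required_coherence(agent), []).append(agent.name)
    return domains


# Illustrative agents only; a real SoC would have many more.
AGENTS = [
    Agent("cpu_cluster", caches_shared_memory=True, reads_cpu_caches=False),
    Agent("nic", caches_shared_memory=False, reads_cpu_caches=True),
    Agent("npu", caches_shared_memory=False, reads_cpu_caches=False),
]

for domain, names in partition(AGENTS).items():
    print(domain.value, "->", names)
```

Keeping the fully coherent group as small as possible is the point of the exercise, since that domain carries the most overhead.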

Fig. 2: Examples of coherent NoC deployments. Source: Arteris
Coherent networks tend to be more expensive and power-hungry than non-coherent networks. “A very common paradigm is to take the big, powerful CPUs that have these caches, and connect them up in a coherent network while trying to keep the coherent part as small as possible,” said Kent Orthner, principal solutions architect at Baya Systems. “You really want to keep that only between the memory and the CPUs, and maybe AI accelerators. The rest of the system will typically use a simpler protocol that just reads and writes. The simple protocol doesn’t have any notion of who last touched the data or who’s responsible for it. They just go to an endpoint like a memory or PCIe controller, or something like that, and get the data that they need. One common breakdown when people talk about multiple NoCs on an SoC is the coherent versus non-coherent domain.”
In the past, large chipmakers often developed their own relatively simple NoCs. But as the complexity of data movement has grown, the industry has migrated toward commercially developed NoC IP. “They each have their approaches to the question, ‘How can I provide you configurable blocks based on the input requirements?’” said Frank Schirrmeister, executive director for strategic programs and system solutions at Synopsys. “You say, ‘Here are all my sources. I have 30 sources. Some are coherent, non-coherent, cache-coherent, and some are I/O-coherent only.’ Then you get into the environment where you configure the NoC, push a button, and they have building blocks, being able to instantiate these caches, TLBs (translation lookaside buffers), and the like, to then build a NoC. When it comes to the non-coherent bits, the focus is on making sure they can be implemented. Looking at the layout, what is the timing? Where are these things? It is quite messy on those pesky multi-billion-gate chips to determine where to put things and how to carry data back and forth. The challenge is significant.”

Fig. 3: Examples of non-coherent NoC deployments. Source: Arteris
Cache coherency is more challenging than I/O coherency. “The space is much bigger than for I/O coherency,” said Schirrmeister. “You typically have the peripheral, such as a GPU or network interface, and you read the CPU cache data, but not vice versa. But in cache coherency, you have multiple CPU cores that need to see a consistent view of the memory across all of them, which is much more complicated because there are more options.”
Caches are used to temporarily store data close to the processor. “The processor says, ‘I need this, I need that, and I need this other thing,’” said Orthner. “It doesn’t have to go all the way to external memory, which can take 100 nanoseconds. It can get it really quickly. But with the big systems that we tend to look at, you could have many, many different processors. And as they share memory, if you have one processor saying, ‘Look, I need this little bit of information,’ and it doesn’t know that another processor already has a copy of it locally in its cache, then it can end up getting the wrong value. So cache coherency is all about making sure that each processor’s view of the information is the same.”
In fact, each type of compute processor has its own private cache of data. “They have to keep track of who has what piece of data and share it efficiently, and that’s what the cache coherency protocol is all about,” said Orthner. “If you have an SoC with 500 processors, it knows what those 500 processors are and how to talk to them, because all of that was decided before you manufactured anything.”
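The wrong-value problem Orthner describes can be reproduced in a few lines. The sketch below is a toy model, assuming write-through caches and a simple directory that invalidates other copies on a write. Production protocols track more states and are considerably more involved, and the class and function names here are invented for illustration.

```python
class Directory:
    """Tracks which caches hold a copy of each address."""

    def __init__(self):
        self.sharers = {}  # address -> set of caches holding that address

    def record_read(self, addr, cache):
        self.sharers.setdefault(addr, set()).add(cache)

    def invalidate_others(self, addr, writer):
        # Drop every copy except the writer's, so no stale data survives.
        for cache in self.sharers.get(addr, set()) - {writer}:
            cache.lines.pop(addr, None)
        self.sharers[addr] = {writer}


class Cache:
    def __init__(self, memory, directory=None):
        self.memory = memory        # shared backing store (a dict)
        self.directory = directory  # None models a non-coherent system
        self.lines = {}             # private cached copies

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
            if self.directory is not None:
                self.directory.record_read(addr, self)
        return self.lines[addr]

    def write(self, addr, value):
        if self.directory is not None:
            self.directory.invalidate_others(addr, self)
        self.lines[addr] = value
        self.memory[addr] = value   # write-through keeps the sketch short


def value_seen_by_second_cpu(coherent):
    memory = {0x40: 1}
    directory = Directory() if coherent else None
    cpu0, cpu1 = Cache(memory, directory), Cache(memory, directory)
    cpu1.read(0x40)       # cpu1 caches the old value
    cpu0.read(0x40)
    cpu0.write(0x40, 2)   # cpu0 updates the shared location
    return cpu1.read(0x40)


print(value_seen_by_second_cpu(coherent=False))  # 1 (stale copy)
print(value_seen_by_second_cpu(coherent=True))   # 2 (copy was invalidated)
```

The directory is the "who has what piece of data" bookkeeping in miniature, and the invalidation traffic it generates is part of why coherent networks cost more than plain read/write networks.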
Chiplets, multi-die, and 3D
Multi-die assemblies require additional management of coherent and non-coherent NoCs.
“Our physical AI chiplet platform is typically a minimum of three chiplets,” said Mick Posner, senior product marketing group director for chiplets & IP solutions at Cadence. “In the center, you have the system chiplet. On one side you have a CPU chiplet. On the other side you have an AI accelerator. That’s the base three chiplets of any physical AI chiplet platform, and that center system chiplet must have both a coherent interface and a non-coherent interface because it will talk to a CPU. So it must have coherency between the system and the CPU, because it’s managing the memory. It also needs a coherent interface to it. But the link to the AI accelerator is only I/O coherent. It doesn’t require cache coherency, because the accelerator is like an extension. You’re just sending something to it. It typically can have its own memory, maybe it’s sharing memory, but it doesn’t need cache coherency.”
Multiple NoCs are needed to achieve this. “A NoC that is designed for coherency usually has a lot more overhead than a non-coherent one,” said Posner. “They probably have a link between them, but that link is non-coherent by default because there’s no coherency on one side.”

Fig. 4: SoCs can contain a mixture of non-coherent and coherent NoCs. Source: Arteris
Balancing coherency, programmability, and integration
Orchestrating data movement in multi-die assemblies puts added demands on programmability and system discovery.
NoCs can help chiplets find each other in the package. “In a chiplet approach you might have all the same chip, on the same die, the same piece of silicon, but you might have it by itself in a package,” said Baya’s Orthner. “You might have it with an array of four other chiplets in a package. You might stack it vertically in a different package. So now you have this whole interesting problem of, when you first power up, how do the chiplets discover each other, and how do they learn who I am in the context of this package, and where is everybody? As a result, you end up needing a much higher degree of programmability.”
At boot-up, systems need a management agent. “It comes in and says, ‘If you’re looking for this subset of processors, you’re going to have to send your traffic to the north side of the die, because that’s where they’re located, unless you’re the chiplet that’s on the north, because then you’re going to have to send your stuff south,’” Orthner said. “You end up waking up, communicating, discovering, and then reconfiguring the routing in the network so that different chiplets can recognize the same destination, but via different routes.”
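A minimal sketch of that last step, once discovery has finished, might look like the following. It assumes the management agent already knows each chiplet's grid position in the package (row 0 being the north edge) and uses simple row-then-column routing. The chiplet names and the placement are hypothetical.

```python
def next_hop(src, dst):
    """Direction to leave `src` in order to reach `dst`; rows first, then columns."""
    (src_row, src_col), (dst_row, dst_col) = src, dst
    if dst_row < src_row:
        return "north"
    if dst_row > src_row:
        return "south"
    if dst_col < src_col:
        return "west"
    if dst_col > src_col:
        return "east"
    return "local"


def build_routing_tables(placement):
    """Per-chiplet table mapping every destination to an exit direction."""
    return {
        name: {dest: next_hop(pos, dest_pos) for dest, dest_pos in placement.items()}
        for name, pos in placement.items()
    }


# Hypothetical package: three identical dies stacked in one column.
placement = {"die_n": (0, 0), "die_mid": (1, 0), "die_s": (2, 0)}
tables = build_routing_tables(placement)

print(tables["die_s"]["die_mid"])  # north
print(tables["die_n"]["die_mid"])  # south
```

The same destination resolves to different directions depending on where the asking chiplet sits, which is why the tables have to be programmed after power-up rather than fixed in the silicon.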
Stacked-die configurations add more networking challenges. “With the different vendors in a die, how are you going to manage the different types of parts and determine how well they are integrated into the system?” said Hee Soo Lee, high-speed digital design segment lead at Keysight EDA. “These challenges are making networks and I/Os complicated. With stacked die configurations, managing the thermal and mechanical issues will be more significant problems than electrical issues. All of these systems are driven by the market, especially the ever-increasing demand for data in data center AI workloads.”
Optimizing for PPA
Whether coherent or non-coherent, multiple NoCs can be considered from a top-down approach to chip design.
“Let’s say you have two at the main level, with the top ones connecting all the subsystems,” said Cadence’s Posner. “Underneath those, there are typically localized networks on chips. There could even be multiple — one for I/O peripherals that is low-bandwidth, low-performance, and a separate one for high-bandwidth peripherals. You want to tailor your NoC to whatever you’re connecting it to and configure it for power, performance, and area (PPA) based on the network that it’s connecting to. You need to understand that hierarchy because it all still needs to be controlled.”
AI-driven EDA technology is helping to sort through PPA tradeoffs. “Tools are evolving toward a fully autonomous, multi-agent workflow operating system that reasons across spec-to-silicon with real-time design-quality feedback, PPA optimization, and cross-domain co-design,” noted ChipAgents’ Wang.
In large, powerful SoCs, designers will often create a separate configuration network. “Typically, they want the configuration network to be separate from their coherent and primary non-coherent data paths, and configuration is pretty low throughput,” said Baya’s Orthner. “It’s your control plane. It wants to read registers, check up on the performance of the system, maybe manage some power control stuff where you shut down parts of the device and turn it on again. That’s also typically implemented as a distinct network. It’s the same protocols as you would see in your data flow network, but on a much smaller scale. You want to be able to access it without affecting your primary data traffic, stealing some of the network bandwidth, for instance.”
Geographical or topological differences also drive the need for distinct NoCs. “You might have, in the west part of your chip, an array of compute cores,” said Orthner. “In the east part of your chip, you might have a different array of compute cores, and for your primary data paths, they do not need to talk to each other, so you can implement that as a west network, an east network, and a third network that brings them all together.”
Memory requirements are forcing new network design decisions to enable high-performance systems. “Even though they’re geographically the same location, people will take half the memory space and call it the ‘even’ network, with the other half the ‘odd’ network, just so they can get twice as much data flowing around the system without stepping on each other’s toes,” Orthner explained. “There are many ways to have different logical networks. When you’re designing a chip, you think, ‘Here’s the NoC, and here’s the area that it represents.’ Then the physical design engineers want to place all of the pieces of that inside that rectangle and get it to meet timing. If you’ve used different tools, or done different projects with the same tool to do all of your different NoCs, you’d end up with a lot of logic with different locations in the hierarchy that are superimposed on each other, which makes it very difficult for the physical designer.”
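One way to picture the even/odd split is address interleaving. The sketch below assumes the split is made at 64-byte line granularity, which is an assumption for illustration. The quote does not say how the memory space is divided, and real designs may hash or interleave differently.

```python
LINE_BYTES = 64  # assumed interleave granularity, not taken from the article


def select_network(address, line_bytes=LINE_BYTES):
    """Send alternating lines of the address space to alternating networks."""
    return "odd" if (address // line_bytes) % 2 else "even"


def split_stream(addresses):
    """Count how many requests in a stream land on each network."""
    counts = {"even": 0, "odd": 0}
    for address in addresses:
        counts[select_network(address)] += 1
    return counts


# A sequential stream of 1,024 line-sized requests divides evenly.
stream = range(0, 1024 * LINE_BYTES, LINE_BYTES)
print(split_stream(stream))  # {'even': 512, 'odd': 512}
```

With a sequential stream each network carries half the requests, which is where the "twice as much data" comes from. A strided access pattern that only touches even lines would lose that benefit.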
These distinct network approaches underscore the complexity of modern chip design, especially as physical and logical requirements diverge. To address these challenges and maintain efficiency, designers are increasingly looking for unified solutions that seamlessly integrate multiple NoCs within a single project.
Unified NoC software refers to a way of managing the overall project of multiple NoCs, rather than a confluence of all the data in a single NoC. “When we talk about a unified network, what we’re saying is you can have one design, one top-level project, one idea, for the routers and the logic and the different tracks and the positioning and everything else,” Orthner explained. “And within the design of the big picture network, we’re keeping all of the smaller networks distinct. If you look at coherent versus non-coherent, you can say, ‘I know enough about my traffic flow to say I want to allow my non-coherent traffic and my coherent traffic to be logically separate, but still use the same wires,’ which is a really strong capability nowadays when you’re trying to optimize for area and cost and everything else. You can have one run of the tool that embraces all of the different networks that are running in parallel, and which allows them to share resources under the system designer’s control.”
As chip complexity continues to rise, it becomes increasingly important to consider how these unified NoC strategies translate to larger system architectures. Bridging the gap between on-chip networks and broader infrastructure, designers also must address how these concepts scale up to the data center level.
Data center hierarchy
Considering NoCs from the perspective of a data center, several fundamental questions arise. How do the racks communicate with one another? How do the cards within a rack communicate? What methods allow chips on a card to interact? And finally, how is communication achieved between chiplets?
Answering these questions provides the top-down view of the data center, which helps to establish hierarchy between the numerous NoCs in use. “The top level is really the data center level, but you start at the data because the whole data center acts as your computer at this point,” said Saurabh Gayen, chief solutions architect at Baya Systems. “You need to make sure that you understand how the racks are organized, how data is flowing across the scale-out domain, then, going into the scale-up domain, and further hierarchically going down through your particular package level, into the chiplet level, etc. You have to create that top-down view because that defines the hierarchical design and how to group things.”
While it’s easier to build many small things that are loosely connected, that’s not the best approach. “You need much denser, tightly packed, smaller-level, flatter, higher-performance things that are tightly coupled and hierarchically organized together,” said Gayen. “You cannot cheat by having tons of networks. The old data centers could do that. We would see a lot more scalar design, and the AI models were stalling. Now, as an industry, we bit the bullet and said, ‘Yep, we’re going to go all in on this to make our hierarchies flatter, higher performance, and denser.’”
These same hierarchy-level concepts apply as designers go down into packages and chiplets. “You take a top-down view,” said Gayen. “We don’t want to think about bottom-up, ‘Here’s a NoC and here’s a NoC and here’s a NoC and here’s a NoC, and how do we stitch them up together?’ The best way is to look top-down at the overall system, and how to break it up into these smaller things. There are also differences between NoCs on a particular chiplet versus how they communicate with each other, die-to-die. The performance is top down, but the reality of engineering is bottom up, so you have to balance the two things hierarchically.”
Correct data management comes down to the sub-NoC hierarchy. “In terms of the hierarchy within the design of a piece of silicon, how do you construct the sub NoCs, and the NoCs talking to each other, and the boundaries?” Orthner said. “It is important to get it right, all the way down to physical design, when you’re placing the transistors on the silicon, and take advantage of the hierarchy that you thought through, to minimize the effort that goes into it. Hierarchy is something that, when designing the framework for NoC design, we take as a first-class concern.”
For performance analysis, it’s especially important to look at the top level. “Instead of thinking about how fast the memory interface is, you think about it in terms of the big picture data flows,” said Orthner. “Who needs to talk to who and why? Where’s the data going to be moving? Almost always you have multiple data sources running at the same time, so, how are those going to affect each other and impact each other?”
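A back-of-the-envelope version of that big-picture analysis is to add up what every concurrent flow asks of each link. The sketch below does only that, with made-up node names, bandwidths, and link capacity. It ignores burstiness, arbitration, and latency, which real NoC performance tools model in far more detail.

```python
from collections import defaultdict


def link_loads(flows):
    """Sum the bandwidth demand of all concurrent flows on each link."""
    loads = defaultdict(float)
    for flow in flows:
        path = flow["path"]
        for link in zip(path, path[1:]):  # consecutive node pairs
            loads[link] += flow["gbps"]
    return dict(loads)


def congested_links(flows, capacity_gbps):
    """Links whose combined demand exceeds the (assumed uniform) capacity."""
    loads = link_loads(flows)
    return sorted(link for link, load in loads.items() if load > capacity_gbps)


# Hypothetical traffic: two sources sharing one path to memory.
flows = [
    {"path": ["cpu", "router", "ddr"], "gbps": 40.0},
    {"path": ["npu", "router", "ddr"], "gbps": 80.0},
]

print(congested_links(flows, capacity_gbps=100.0))  # [('router', 'ddr')]
```

Neither source overloads the memory link on its own. The bottleneck only appears when both run at once, which is the interaction Orthner is pointing to.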
Conclusion
In the past, NoCs often were an afterthought in chip design, addressed once the physical layout was nearly complete. Today, designers must integrate NoCs into the early stages of development, ensuring that communication infrastructure is optimized alongside processing elements. This approach improves performance, power efficiency, and scalability, allowing complex systems to achieve better overall results.
“What’s changing in practice is a reframing of priorities,” said Arteris’ Nightingale. “Data movement is now a first-class design axis alongside compute and memory, especially as systems scale from monolithic dies to chiplets and distributed architectures. Leading teams are investing earlier in architectures that provide visibility, quality-of-service guarantees, and long-term scalability, rather than treating interconnect as a late-stage optimization.”
To keep pace with evolving system architectures, the industry must prioritize the development of robust, standardized security protocols and invest in scalable interconnect solutions from the earliest stages of design. Collaboration across the supply chain is essential to ensure trust, interoperability, and resilience against emerging threats, especially as chiplet and multi-die integration becomes more prevalent. Moving forward, ongoing research, cross-vendor partnerships, and the adoption of adaptive network topologies will be critical to meeting the demands of secure, high-performance data movement in next-generation systems.
Related Article
Data Boom Puts Pressure On NoCs, Fabrics
New adaptive, mesh NoC topologies are enabling chip designers to optimize data movement in complex SoCs and multi-die systems.