Exhaustive proofs are the only way to find deep corner-case bugs that can result in deadlocks and silent data corruption.
By Ashish Darbari and Bing Xue
It’d be inconceivable these days to design a modern high-performance SoC without a network-on-chip (NoC) fabric. AI hyperscalers are inherently multi-threaded and rely on using hundreds of processing elements (PEs). Crossbar-based fabric would just not scale. What also changes with the adoption of the NoC is how to handle coherency between PEs. ACE is no longer adequate, and the Arm CHI protocol is widely used to stitch the compute elements on a coherent NoC.
The NoC sits at the center of every data movement in a coherent SoC: cache lines, snoop responses, DMA traffic, interrupts — all of it.
In this blog, we describe why verification of NoCs is hard, what kind of bugs can escape to silicon with a few specific examples, and why exhaustive proofs are the only way to find the deep corner-case bugs that can also result in deadlocks and silent data corruption.
We describe what kind of bugs can be caught by nocProve – Axiomise’s latest domain-specific app powered by the proof technology CoreProve. The app can find deep bugs as well as verify complete mesh configurations of NoC designs generated from NoC code generator. We provide insights on the scalability of nocProve to handle exhaustive proofs.
A non-coherent NoC moves bytes. Its correctness obligations are bounded: in-order or out-of-order delivery per channel, no deadlock, no loss, well-formed protocol packets at every boundary. It is very typical to use AXI or custom AXI protocols for such NoCs. The greatest challenge is to ensure that data integrity is preserved and there are no deadlocks and livelocks. With low-power optimizations and CDC, the verification problem is sufficiently hard for simulation alone to be able to establish that there will be no respins or last-minute bugs. Exhaustive verification of complex routers with VCs, reorder buffers and protocols that support bursts is a formidable challenge for any formal verification testbench.
A coherent NoC must preserve the coherence protocol invariant across every cache, every snoop filter, every directory entry, every Point-of-Serialization, and every Point-of-Coherence in the fabric. AMBA 5 CHI, ACE, and ACE-Lite NoCs sit here. Consider the several bus bridges that add complexity. The verification surface explodes: a single mis-routed Comp flit, a single double-counted credit, a single stale snoop response, can corrupt the line’s coherence state at one Requester without any other agent noticing. The corruption could be structurally silent; the Arm RAS architecture sees a clean transaction because every individual flit was well-formed.
Coherent NoCs therefore demand a different verification posture. The properties are not flit-local; they must be end-to-end invariants over the entire fabric and the entire protocol state space.
A coherent NoC makes the verification much harder. We not only have to look at verifying data transport checks but additionally the full context of coherency (MOESI) must be addressed, and the conformance against the CHI protocol must be done. With seven cache-line states, 11 reads, 14 writes, 18 combined writes, 21 snoops and 4 atomics – the combination of transactions and states is just staggering. If you weave DCT/DMT/DWT on top and look at the NoCs with virtual channels, transaction IDs, coherence states, credit accounts, routing-table contents, QoS classes, and outstanding-transaction queues – we’re simply not able to rely on simulation alone.
No simulation budget reaches the corners where the bugs live.
Worse, the bugs that cost respins are not the bugs that simulation misses by accident. A simulation that hits one of the three preconditions is vanishingly unlikely to also hit the other two within any practical simulation budget. The bug stays hidden until silicon, where real workloads naturally produce the precondition combination — sometimes within minutes, sometimes only under specific application traffic — and always without warning
We’ve been studying the bug patterns on NoCs for a few years and examining them lately from the context of coherent NoCs. We are happy to share a high-level view of the split of bugs into families.
Due to space limitations, we are not expanding on the numerous sub-classes of bugs that may be coherency related. Any verification plan must consider these. Items 1-11 are about correctness, while item 12 is the family that crosses into a security threat model.
We have cataloged several bugs in our internal technical report. Here, we take a look a couple of example bugs.
The waveform shows an AXI ID-ordering issue in which the reorder buffer writes responses to memory locations indexed by transaction ID, but two consecutive responses with the same ID are written to the same address so that the earlier response is silently overwritten. This bug is caught through the checks that are shipped out-of-the-box with nocProve.

Fig. 1: Silent data corruption.
The bug is subtle in that the earlier response is “silently” overwritten. These kinds of bugs are very hard to catch as they silently corrupt data. The system may continue operating, yet data has already been lost or corrupted with no immediate visible crash.
This particular bug is a deadlock that occurred due to NodeID aliasing in routing-table configuration. To understand this better, we first need to describe the context.
The bug occurred in a 2D mesh NoC carrying AMBA 5 CHI traffic. The key agents involved in this were:

Fig. 2: Deadlock caused by routing table.
HN-F is returning a sequence of nine response flits r0:r8 on the RSP virtual channel, all transiting router (1,1):
There is no bit-flip, no flit-header corruption, no parity error, no ECC event. Every header is intact. Every parity check passes. Every routing decision is consistent with the routing table at the moment it is made. The fault is purely a consistency hazard between in-flight flits and a non-atomic routing-table update causing a deadlock.
Finding bugs such as these on real NoC has historically been hard for three reasons: state-space blow-up across virtual channels and transaction IDs; concurrent flows whose interleaving defeat naive constraints; and proof engines that run out of memory or time before convergence.
nocProve, part of the Axiomiser platform, is purpose-built to remove these barriers. It is delivered as an automated, abstraction-based, push-button flow that is minimally invasive, maximally expansive in scope.
Engineers provide the NoC RTL and a configuration. nocProve does not require a hand-written formal testbench, hand-written SystemVerilog Assertions, or a parallel verification environment. The invariants are pre-coded inside the tool. The user does not write properties. They configure the NoC, run the flow, and receive three deliverables per invariant: a mathematical proof of bug absence when the invariant holds across all reachable states; or a concrete counter-example trace when it does not; and coverage data showing which parts of the design state space contributed to the proof. This triad replaces the “billions of cycles, no bug found” outcome of constrained-random verification with exhaustive proofs of bug absence or deep corner case bugs before they hit the silicon.
The flow is being designed to scale across AXI, ACE, ACE-Lite, and CHI; across mesh, ring, and crossbar topologies; across multiple VCs and transaction classes.
The proof technology CoreProve applies intelligent abstraction-driven techniques developed across 25+ years of formal verification work on CPUs, GPUs, cache-coherent systems, and bus bridges. The abstractions are tuned to NoC structural patterns, which is why invariants that would normally not converge out-of-the-box with formal tools, do converge on nocProve in practical wall-clock time.
The figure below shows the scalability of nocProve on different mesh configurations of FlooNoC. Without nocProve, most properties at the router level alone (never mind about the mesh) remain inconclusive (bounded), and bugs could remain hidden behind the bounds. The gate counts for different mesh configurations are shown in Table 1. The NoCs were generated with full AXI4 support for bursts, and FIFOs depths of up to 8 were analyzed.
| Mesh Size | FIFO depth | Gate count |
| 2X2 | 4 | 1,013,566 |
| 6 | 1,178,606 | |
| 8 | 1,361,166 | |
| 4X4 | 4 | 4,327,516 |
| 6 | 5,015,836 | |
| 8 | 5,774,236 |
Table 1: Mesh configurations of FlooNoC with respective gate counts verified with nocProve.

Fig. 3: Proof run times for different mesh configurations with varying FIFO depths, bursts are enabled.
Verification of NoCs is a formidable challenge that gets compounded for coherent NoCs. The perception about formal verification is that it would not scale for such designs. At Axiomise, we have consistently shown how this can be done for all kinds of designs, and our latest offering nocProve demonstrates that “proving bug absence” is not limited to toy designs. The exercise of going beyond the run-of-the-mill formal by investing in automated proof engineering offers the benefits of identifying deep corner case bugs – a side effect of building exhaustive proofs.
For design teams building custom NoC architectures for AI, HPC, automotive, or data center silicon, the path from “these bug families exist” to “these bug families are closed in our design” is a single nocProve engagement.
— Bing Xue is a senior formal verification engineer at Axiomise.
Leave a Reply