Systems & Design
SPONSOR BLOG

Why Your NoC Verification Strategy Must Consider Using Formal

Exhaustive proofs are the only way to find deep corner-case bugs that can result in deadlocks and silent data corruption.

popularity

By Ashish Darbari and Bing Xue

It’d be inconceivable these days to design a modern high-performance SoC without a network-on-chip (NoC) fabric. AI hyperscalers are inherently multi-threaded and rely on using hundreds of processing elements (PEs). Crossbar-based fabric would just not scale. What also changes with the adoption of the NoC is how to handle coherency between PEs. ACE is no longer adequate, and the Arm CHI protocol is widely used to stitch the compute elements on a coherent NoC.

The NoC sits at the center of every data movement in a coherent SoC: cache lines, snoop responses, DMA traffic, interrupts — all of it.

In this blog, we describe why verification of NoCs is hard, what kind of bugs can escape to silicon with a few specific examples, and why exhaustive proofs are the only way to find the deep corner-case bugs that can also result in deadlocks and silent data corruption.

We describe what kind of bugs can be caught by nocProve – Axiomise’s latest domain-specific app powered by the proof technology CoreProve. The app can find deep bugs as well as verify complete mesh configurations of NoC designs generated from NoC code generator. We provide insights on the scalability of nocProve to handle exhaustive proofs.

Coherent and non-coherent NoCs: Formidable verification challenge

A non-coherent NoC moves bytes. Its correctness obligations are bounded: in-order or out-of-order delivery per channel, no deadlock, no loss, well-formed protocol packets at every boundary. It is very typical to use AXI or custom AXI protocols for such NoCs. The greatest challenge is to ensure that data integrity is preserved and there are no deadlocks and livelocks. With low-power optimizations and CDC, the verification problem is sufficiently hard for simulation alone to be able to establish that there will be no respins or last-minute bugs. Exhaustive verification of complex routers with VCs, reorder buffers and protocols that support bursts is a formidable challenge for any formal verification testbench.

A coherent NoC must preserve the coherence protocol invariant across every cache, every snoop filter, every directory entry, every Point-of-Serialization, and every Point-of-Coherence in the fabric. AMBA 5 CHI, ACE, and ACE-Lite NoCs sit here. Consider the several bus bridges that add complexity. The verification surface explodes: a single mis-routed Comp flit, a single double-counted credit, a single stale snoop response, can corrupt the line’s coherence state at one Requester without any other agent noticing. The corruption could be structurally silent; the Arm RAS architecture sees a clean transaction because every individual flit was well-formed.

Coherent NoCs therefore demand a different verification posture. The properties are not flit-local; they must be end-to-end invariants over the entire fabric and the entire protocol state space.

A coherent NoC makes the verification much harder. We not only have to look at verifying data transport checks but additionally the full context of coherency (MOESI) must be addressed, and the conformance against the CHI protocol must be done. With seven cache-line states, 11 reads, 14 writes, 18 combined writes, 21 snoops and 4 atomics – the combination of transactions and states is just staggering. If you weave DCT/DMT/DWT on top and look at the NoCs with virtual channels, transaction IDs, coherence states, credit accounts, routing-table contents, QoS classes, and outstanding-transaction queues – we’re simply not able to rely on simulation alone.

No simulation budget reaches the corners where the bugs live.

Worse, the bugs that cost respins are not the bugs that simulation misses by accident. A simulation that hits one of the three preconditions is vanishingly unlikely to also hit the other two within any practical simulation budget. The bug stays hidden until silicon, where real workloads naturally produce the precondition combination — sometimes within minutes, sometimes only under specific application traffic — and always without warning

The bug families that escape to silicon

We’ve been studying the bug patterns on NoCs for a few years and examining them lately from the context of coherent NoCs. We are happy to share a high-level view of the split of bugs into families.

Due to space limitations, we are not expanding on the numerous sub-classes of bugs that may be coherency related. Any verification plan must consider these. Items 1-11 are about correctness, while item 12 is the family that crosses into a security threat model.

  1. Flow-control accounting errors: Producer and consumer disagree on how many buffer slots are free, so a flit is launched into a slot that still holds an earlier one. The result is silent overwrite of in-flight data.
  2. Ordering and serialization violations: Transactions that the architecture requires to be observed in a specific order are reordered by the fabric. Software then sees a memory state that no sequential execution could have produced.
  3. Coherence state-machine bugs: Caches and home nodes disagree about the architectural state of a line, often because of an unhandled transient, a CompAck race, or a dropped forwarded transfer. The system is incoherent without any protocol error firing.
  4. Head-of-line deadlock: A single stranded flit at the head of a shared resource blocks every other transaction sharing that virtual channel. Coherency freezes on the affected lines while the rest of the fabric runs.
  5. Arbitration livelock and starvation: The arbiter keeps advancing transactions, but one requester or class of requesters is deferred indefinitely. The fabric makes progress; the victim never does.
  6. Burst integrity: AXI bursts lose a beat, lose LAST, or arrive with the wrong beat count, usually at a width converter, a 4 KB boundary, or a clock crossing. The slave hangs waiting, or the master silently consumes a stray beat as the next transaction.
  7. Response channel integrity: A completion is dropped, reordered across IDs, or delivered with a corner-encoded BRESP/RRESP value the consumer mishandles. The requester treats a wrong response as authoritative because the handshake on the wire is legal.
  8. Reset and power-domain crossings: Transactions in flight when a reset edge or power-down crosses the fabric are not drained, cancelled, or quiesced cleanly. A pre-reset request completes as a post-reset one, or a slave disappears mid-transaction.
  9. Clock-domain crossing and metastability paths: A signal crosses between clock domains with insufficient synchronizer depth, a gray-code pointer slip, or a missed handshake pulse. The bug appears at one PVT corner and not others, never reproduces in RTL simulation, and shows up in the field.
  10. X-propagation: A flop or RAM bit was never deterministically reset, drives an X, and a downstream control decision is made on that unknown value. Simulation’s 2-state optimism hides it; silicon resolves the X to whatever the gate happens to decide.
  11. Control-field mutation in transit: A control field (opcode, transaction ID, destination, length) was correct at the producer but arrived wrong at the consumer because of a glitch, a CDC misalignment, a stuck-at fault, or an ECC-uncovered SRAM flip. The wire carried a deterministic but wrong value, and no protocol check fires.
  12. Security and side-channel paths: The fabric routes traffic from multiple security domains (Secure/Non-Secure, Realm/Root, multiple guest VMs) and must enforce isolation across them. A routing-table misconfiguration, a shared-buffer leak, or a snoop-filter timing channel lets one domain read, infer, or corrupt another domain’s data.

Example bugs caught in NoCs

We have cataloged several bugs in our internal technical report. Here, we take a look a couple of example bugs.

Silent Data Corruption: ID-ordering in ROB

The waveform shows an AXI ID-ordering issue in which the reorder buffer writes responses to memory locations indexed by transaction ID, but two consecutive responses with the same ID are written to the same address so that the earlier response is silently overwritten. This bug is caught through the checks that are shipped out-of-the-box with nocProve.

Fig. 1: Silent data corruption.

The bug is subtle in that the earlier response is “silently” overwritten. These kinds of bugs are very hard to catch as they silently corrupt data. The system may continue operating, yet data has already been lost or corrupted with no immediate visible crash.

Deadlock: NodeID aliasing in a coherent NoC

This particular bug is a deadlock that occurred due to NodeID aliasing in routing-table configuration. To understand this better, we first need to describe the context.

Context

The bug occurred in a 2D mesh NoC carrying AMBA 5 CHI traffic. The key agents involved in this were:

  1. RN-F (1,1): a fully coherent Requester Node co-located with router (1,1). For this scenario it is acting as the upstream port that forwards RSP/DAT-channel flits sourced by the Home Node toward two downstream RN-Fs. (Equivalently, this could be an HN-F whose physical placement is at mesh node (1,1).)
  2. RN-F (2,1): a coherent Requester Node with one outstanding ReadShared to a cache line owned by the HN-F. It is the intended recipient of response flit r0.
  3. RN-F (0,1): a coherent Requester Node with eight outstanding requests (a mix of ReadOnce / ReadShared) to the HN-F. It is the intended recipient of r1:r8.
    1. RN-F (0,1) is implemented with a single-entry RSP-channel ingress buffer at its CHI Link Layer (a low-area peripheral PE choice). Note that CHI does not specify a minimum depth.
  4. An HN-F (Home Node, fully coherent) is the sourcing agent for r0:r8. It is co-located at mesh node (1,1) and injects all nine flits into the NoC through router (1,1)’s local injection port, in transmission order r0, r1, …, r8 on the CHI Response virtual channel (RSP VC). Under the correct NodeID-to-coordinate binding, router (1,1) would route r0 east toward router (2,1) (RN-F (2,1)) and r1:r8 west toward router (0,1) (RN-F (0,1)). The misroute that triggers this bug consists of r0 being sent west instead of east, after which it follows the path router (1,1) to router (0,1) along with r1:r8.
  5. All nine flits travel on the same CHI virtual channel: the RSP VC (CHI mandates VC separation only across REQ/RSP/DAT/SNP, not within a class: IHI 0050 §2.6). Each flit’s header carries a CHI TgtID field, which the NoC’s local routing logic at each router translates to an (x,y) egress decision via a firmware-programmable routing table / SAM.
  6. The system supports runtime reconfiguration of NodeID-to-coordinate bindings, used for power-domain transitions, hot-plug of accelerator tiles, and DVFS-driven re-binding of master ports. Routing-table updates are pushed to each router by a control-plane agent (PMU / system controller) via a separate configuration channel.

Fig. 2: Deadlock caused by routing table.

Bug description

HN-F is returning a sequence of nine response flits r0:r8 on the RSP virtual channel, all transiting router (1,1):

  1. r0 is a CompData (or, equivalently, a Comp for a write transaction) destined for RN-F (2,1), completing that RN-F’s outstanding ReadShared. HN-F encodes r0’s header with TgtID = NID_A, where NID_A is the NodeID value that at the moment of encoding mapped to physical coordinate (2,1) in every router’s routing table. A few cycles later, the system controller commits a routing-table update that rebinds NID_A to a different coordinate (specifically, one reachable via the West port from router (1,1)). r0’s header is unchanged — it still carries NID_A. But the routing-table lookup at router (1,1), now using the new entry, resolves NID_A to the West egress instead of the East one. r0 is misrouted toward (0,1).
  2. r1:r8 are responses destined for RN-F (0,1): TgtID(r1..r8) = NodeID(0,1) : completing eight separate transactions outstanding from that requester.
  3. Concurrently with HN-F’s response stream, the system controller initiates a NodeID reconfiguration. Triggered by a power-domain transition, the controller is rebinding the NodeID previously held by RN-F (2,1), call it NID_A to a different physical node.
  4. The destination of the rebinding does not matter for the failure; what matters is the propagation of the routing-table update. The configuration write is broadcast by the system controller to every router. Because routers (0,1), (1,1), and (2,1) sit at different distances from the controller and apply table writes asynchronously, the update does not commit atomically across the mesh.
  5. There is a window of several cycles in which:
    1. HN-F’s egress decoder at its injection port has already committed the new mapping. So, when r0 is launched, HN-F encodes its TgtID field using the old NodeID NID_A: it has been queued in HN-F’s response pipeline since before the reconfiguration commit, with TgtID already latched into its header.
    2. Router (1,1)’s routing table has just committed the new mapping. It now resolves TgtID = NID_A to a different (x,y) : specifically, to coordinate (0,1), the West neighbor: because NID_A has been rebound to a node reachable via the West port. This is a valid, parity-correct, table-driven routing decision.
    3. Router (0,1)’s routing table has also committed the new mapping, consistent with router (1,1).

There is no bit-flip, no flit-header corruption, no parity error, no ECC event. Every header is intact. Every parity check passes. Every routing decision is consistent with the routing table at the moment it is made. The fault is purely a consistency hazard between in-flight flits and a non-atomic routing-table update causing a deadlock.

How are these bugs found?

Finding bugs such as these on real NoC has historically been hard for three reasons: state-space blow-up across virtual channels and transaction IDs; concurrent flows whose interleaving defeat naive constraints; and proof engines that run out of memory or time before convergence.

nocProve, part of the Axiomiser platform, is purpose-built to remove these barriers. It is delivered as an automated, abstraction-based, push-button flow that is minimally invasive, maximally expansive in scope.

Minimally invasive

Engineers provide the NoC RTL and a configuration. nocProve does not require a hand-written formal testbench, hand-written SystemVerilog Assertions, or a parallel verification environment. The invariants are pre-coded inside the tool. The user does not write properties. They configure the NoC, run the flow, and receive three deliverables per invariant: a mathematical proof of bug absence when the invariant holds across all reachable states; or a concrete counter-example trace when it does not; and coverage data showing which parts of the design state space contributed to the proof. This triad replaces the “billions of cycles, no bug found” outcome of constrained-random verification with exhaustive proofs of bug absence or deep corner case bugs before they hit the silicon.

Maximally expansive

The flow is being designed to scale across AXI, ACE, ACE-Lite, and CHI; across mesh, ring, and crossbar topologies; across multiple VCs and transaction classes.

Abstraction-based

The proof technology CoreProve applies intelligent abstraction-driven techniques developed across 25+ years of formal verification work on CPUs, GPUs, cache-coherent systems, and bus bridges. The abstractions are tuned to NoC structural patterns, which is why invariants that would normally not converge out-of-the-box with formal tools, do converge on nocProve in practical wall-clock time.

Scalability

The figure below shows the scalability of nocProve on different mesh configurations of FlooNoC. Without nocProve, most properties at the router level alone (never mind about the mesh) remain inconclusive (bounded), and bugs could remain hidden behind the bounds. The gate counts for different mesh configurations are shown in Table 1. The NoCs were generated with full AXI4 support for bursts, and FIFOs depths of up to 8 were analyzed.

Mesh Size FIFO depth Gate count
2X2 4 1,013,566
6 1,178,606
8 1,361,166
4X4 4 4,327,516
6 5,015,836
8 5,774,236

Table 1: Mesh configurations of FlooNoC with respective gate counts verified with nocProve.

Fig. 3: Proof run times for different mesh configurations with varying FIFO depths, bursts are enabled.

Closing thoughts

Verification of NoCs is a formidable challenge that gets compounded for coherent NoCs. The perception about formal verification is that it would not scale for such designs. At Axiomise, we have consistently shown how this can be done for all kinds of designs, and our latest offering nocProve demonstrates that “proving bug absence” is not limited to toy designs. The exercise of going beyond the run-of-the-mill formal by investing in automated proof engineering offers the benefits of identifying deep corner case bugs – a side effect of building exhaustive proofs.

For design teams building custom NoC architectures for AI, HPC, automotive, or data center silicon, the path from “these bug families exist” to “these bug families are closed in our design” is a single nocProve engagement.

— Bing Xue is a senior formal verification engineer at Axiomise.



Leave a Reply


(Note: This name will be displayed publicly)