NoC Versus PIN: Size Matters

Complexity and flexibility are the real drivers of fabric choice, not the number of initiators and targets.


Since I first helped introduce the concept of applying networking techniques to address SoC integration challenges in 2007, I have been asked many hundreds of times how to determine when and where it is best to use an on-chip network (NoC) instead of a passive interconnect network (PIN). Is there a minimum number of initiators and targets below which it makes more sense to use a PIN for the SoC architecture in “smaller” designs?

The answer is yes. For very simple SoC designs of no more than 8 initiators by 8 targets, it does make sense to use a PIN-based architecture. By simple, I mean all of the following (a quick rule-of-thumb check is sketched after the list):

  1. Uniform protocol (e.g. all AXI);
  2. Uniform data widths (e.g. all 64b);
  3. Uniform clocking (e.g. only one clock);
  4. Moderate clock frequency (< 400 MHz), and
  5. Small distances between the ports (largest distance from end to end of about 2-3 mm).
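
As a rough illustration, the five criteria above can be boiled down to a quick screening check. The sketch below is just a rule-of-thumb encoding of this list; the field names and thresholds are my own shorthand for the criteria, not part of any specification or tool:

    from dataclasses import dataclass

    @dataclass
    class SocFabricSpec:
        """Hypothetical summary of an SoC's interconnect requirements."""
        initiators: int
        targets: int
        protocols: set        # e.g. {"AXI"} or {"AXI", "AHB"}
        data_widths: set      # bits, e.g. {64} or {32, 64, 128}
        clock_domains: int    # number of asynchronous clock domains
        max_clock_mhz: float  # fastest fabric clock
        max_span_mm: float    # longest end-to-end distance between ports

    def pin_is_reasonable(spec: SocFabricSpec) -> bool:
        """Apply the 'simple design' rule of thumb from the list above."""
        return (spec.initiators <= 8 and spec.targets <= 8
                and len(spec.protocols) == 1
                and len(spec.data_widths) == 1
                and spec.clock_domains == 1
                and spec.max_clock_mhz < 400
                and spec.max_span_mm <= 3.0)

    # Example: a uniform 4x4, single-clock AXI design fits the PIN criteria.
    print(pin_is_reasonable(SocFabricSpec(4, 4, {"AXI"}, {64}, 1, 300, 2.5)))  # True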

However, in my nearly 20 years of working with SoC designers on interconnect fabrics and SoC architectures, I’ve never seen such a simple design! Let’s examine today’s most common SoC attributes in relation to using NoC versus PIN fabrics in “smaller” – but not necessarily simpler – designs.

Most SoCs mix current protocols like AXI with legacy protocols like AHB, and PIN protocol converters are very large. Most SoCs also have initiators and targets with differing data widths. PIN fabrics have a single internal data width, so designers must either convert all interfaces to a single width or partition their designs into multiple PINs with different widths. The latter choice adds latency and area, and it can also have a detrimental impact on wire routing and timing convergence if the data-width partitioning doesn’t match the chip floorplan.

Most SoCs have to deal with multiple asynchronous clock frequencies, and PIN asynchronous bridges are very large because they operate on the massive number of signals in a 5-channel AXI interface. Many SoCs have a range of clock frequencies, including some at 800+ MHz to keep up with DRAM speeds, yet PINs face significant timing-closure challenges at around 500 MHz, even with short wires. Their register-slice components, needed to operate at moderate frequencies with long wire runs, add substantial area and can be difficult to place properly for timing closure.
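
To see why those bridges and register slices are so big, it helps to count wires. The sketch below tallies the signals in a single 64-bit AXI4 read/write interface; the 4-bit ID and 32-bit address widths are assumptions chosen for illustration, and optional USER signals are omitted:

    # Back-of-envelope signal count for one 64-bit AXI4 interface.
    # Assumed widths: 4-bit IDs, 32-bit addresses, no USER signals.
    ID_W, ADDR_W, DATA_W = 4, 32, 64

    aw = ID_W + ADDR_W + 8 + 3 + 2 + 1 + 4 + 3 + 4 + 4 + 2  # LEN, SIZE, BURST, LOCK, CACHE, PROT, QOS, REGION, VALID/READY
    w  = DATA_W + DATA_W // 8 + 1 + 2                        # WDATA, WSTRB, WLAST, VALID/READY
    b  = ID_W + 2 + 2                                        # BID, BRESP, VALID/READY
    ar = aw                                                   # read address channel mirrors write address channel
    r  = ID_W + DATA_W + 2 + 1 + 2                            # RID, RDATA, RRESP, RLAST, VALID/READY

    print(aw + w + b + ar + r)  # roughly 290 signals

Under these assumptions, roughly 290 signals have to be synchronized across every asynchronous bridge, or registered at every register slice, which is where the area goes.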

Most SoCs have relatively long Manhattan distances at the top level of the design between the components connected by the fabric. For an SoC that is 7×7 mm, you can expect to see some endpoints that are 7 to 10 mm apart. With PINs, these require multiple register slices to pipeline the wire runs, with each register slice wide enough to cover the 5-channel AXI protocol and two registers deep to provide sequential *VALID/*READY flow control. Furthermore, an 8×8, 64b-wide bus-based crossbar PIN can easily consume over 10 meters of wire, even with a maximum endpoint spacing of 2 mm, making routing congestion a huge issue!
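
The 10-meter figure is easy to sanity-check. The arithmetic below is purely illustrative: it assumes full 8×8 connectivity, roughly 290 wires per 64-bit AXI connection (as counted above), and an average route length well under the 2 mm maximum spacing:

    # Rough wire-length estimate for an 8x8 bus-based AXI crossbar (illustrative only).
    initiators, targets = 8, 8
    wires_per_connection = 290    # approximate 64-bit AXI signal count from the sketch above
    avg_route_mm = 0.7            # assumed average Manhattan route, max endpoint spacing 2 mm

    routes = initiators * targets # every initiator bus routed to every target mux
    total_wire_mm = routes * wires_per_connection * avg_route_mm
    print(total_wire_mm / 1000, "meters")  # ~13 meters of top-level wire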

The case for using NoCs in “smaller” SoC designs:

  1. Some NoCs have much more area-efficient protocol conversion because the internal network carries a superset of the IP core protocols in addition to providing integrated support for very low area AHB and APB local buses for register and peripheral interfaces.
  2. Some NoCs include non-uniform switching elements that enable a single switch to handle a mix of data widths with no loss in throughput or increase in area. When partitioning NoC fabrics into multiple switches is desired, for example, to support clustering of IP cores, the links between the switches are much narrower than AXI.
  3. Some NoCs support internal asynchronous clock domain crossing with optional voltage/power domain crossing, which saves area because the internal links have fewer wires to synchronize. Additionally, NoCs that offer virtual channels can have multiple data flows that share an asynchronous crossing without blocking each other.
  4. Some NoCs are architected around the idea of scalable pipelines, with options to limit the maximum logic cone depth per pipeline stage to support over 2 GHz operation at 14nm. NoC flow-control schemes also provide better wire pipelining than *VALID/*READY, supporting higher-frequency operation.
  5. NoCs are engineered to close timing in the context of the high frequencies and long inter-core distances common in SoCs. NoC link retiming consumes much less area than the PIN equivalent, both because the links are narrower and because some NoCs use more advanced flow control to reduce total retimer depth. These narrow links also ease routing congestion and may support optional source-synchronous and mesochronous timing profiles that eliminate the need to distribute low-skew synchronous clocks across the SoC, gaining valuable design margin. Furthermore, link sharing using virtual channels substantially reduces routing congestion. When combined with floorplan-directed partitioning, switched on-chip networks can save more than 75% of the top-level routing versus bus-based crossbars, as the rough comparison after this list illustrates.
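
As a rough point of comparison only: the NoC link width and single shared-switch topology assumed below are illustrative guesses, not measurements of any particular product. Even so, they show how packetizing traffic onto narrow, shared links collapses the top-level wire budget:

    # Illustrative comparison of top-level routing: 8x8 AXI crossbar vs. a packetized NoC.
    axi_wires_per_link = 290   # ~64-bit AXI interface, from the earlier count
    noc_wires_per_link = 72    # assumed 64-bit flit plus a handful of framing/flow-control wires
    avg_route_mm = 0.7         # same assumed average route length for both fabrics

    crossbar_routes = 8 * 8    # full initiator-to-target connectivity
    noc_routes = 8 + 8         # each endpoint ties into a shared, VC-multiplexed switch

    crossbar_mm = crossbar_routes * axi_wires_per_link * avg_route_mm
    noc_mm = noc_routes * noc_wires_per_link * avg_route_mm
    print(f"savings: {100 * (1 - noc_mm / crossbar_mm):.0f}%")  # well over 75% under these assumptions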

If you have a simple SoC design, PINs do provide a simple interconnect fabric solution. But if you are among the majority of SoC teams facing the complex requirements covered here, NoCs are a more effective solution for the reasons I’ve mentioned, even for “smaller” designs.


