Optimizing NoC-Based Designs

Switching from crossbar interconnects to NoCs opens the way to further optimization through RTL repartitioning.


Semiconductor development is in a phase of rapid evolution, driven by a combination of new technologies and methodologies. Systems-on-chips (SoCs), which combine multiple functions, continue to grow in complexity. Rapid advances in market segments such as data centers, robotics, ADAS, and artificial intelligence/machine learning (AI/ML) are resulting in a new breed of SoCs. These fields demand designs that are optimized for both power and performance efficiency. Designers are finding that networks-on-chip (NoCs) provide the enabling technology to meet this demand and are accelerating the move away from crossbar interconnect technology.

Many design teams are transitioning from crossbar interconnects to NoCs to reduce congestion, ease timing closure, and optimize for target bandwidth and latency. Making the switch to a NoC-based interconnect helps designers better optimize increasingly complex SoCs. But does that switch alone leave further optimization opportunities on the table? Often the answer is yes, as many builders of large SoCs have discovered. NoC technology also enables further optimization through RTL repartitioning.

Figure: RTL restructuring for physical design, to minimize inter-block wirelength (top) and to maximize connection by abutment (bottom).

What’s left behind?

Crossbar switches are built around clumps of highly connected wiring and logic which are difficult or impossible to disentangle through the narrow channels available in a tightly packed floorplan. This tends to force the area of the design to expand as these connections become more complex. NoCs resemble networks, branches of which are easily threaded through narrow spaces and are therefore less likely to force blocks apart.

But NoCs can only optimize to the extent that gaps between blocks in the floorplan will allow. If gaps are not big enough, either these blocks must be pushed apart to make enough room for a wide interconnect (increasing area), or the interconnect must be routed around intervening blocks (increasing latencies). Further gains could be realized if these limitations could be mitigated.

RTL repartitioning

Functional blocks act as natural barriers to interconnect routing and are bounded by RTL hierarchy. These vary in size and aspect ratio in physical realization – from big to small and from square to long and thin. Larger blocks create more of a barrier, while too many small blocks limit the opportunity for physical synthesis to optimize the logic. Finding the best floorplan solution requires trial and error, but the project schedule can limit the time dedicated to this. Designers need flexibility to adjust block boundaries by splitting up or merging functions to find a solution more conducive to interconnect layout.

Physical design tools can do some of this, but RTL hierarchy boundaries are limiting. To experiment with the partitioning, surgical alterations must be done in the RTL, which is possible but challenging. Consider splitting a block in two, merging blocks into a new hierarchy level or unmerging a block from its parent. These are all conceptually easy to describe but complicated to implement.

Wires/buses must be restitched in the RTL inside the block and at the next level up, and ports must be created or removed. Care must be taken with tie-offs, feedthroughs, and backing out changes. There are often multiple answers to how these cases are handled. Implementation teams do what they can in experimentation but can only do a few trials because of schedule constraints.
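The port bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Net` record, the `group_blocks` function, and the block and net names are all hypothetical, each net has a single driver, and tie-offs and feedthroughs are ignored. It shows how merging blocks under a new hierarchy level decides which nets become internal to the new wrapper and which need new ports on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Net:
    """Hypothetical single-driver net between top-level blocks."""
    name: str
    driver: str    # block that drives the net
    loads: tuple   # blocks that receive the net


def group_blocks(nets, members):
    """Classify nets when `members` are merged under a new wrapper block.

    Returns (internal, new_inputs, new_outputs) as sorted lists of net names.
    """
    members = set(members)
    internal, new_inputs, new_outputs = [], [], []
    for net in nets:
        driver_in = net.driver in members
        loads_in = [b for b in net.loads if b in members]
        loads_out = [b for b in net.loads if b not in members]
        if driver_in and not loads_out:
            internal.append(net.name)      # fully enclosed: no wrapper port
        elif driver_in:
            new_outputs.append(net.name)   # driven inside, used outside
        elif loads_in:
            new_inputs.append(net.name)    # driven outside, used inside
        # nets touching no member are unaffected by the regrouping
    return sorted(internal), sorted(new_inputs), sorted(new_outputs)


# Illustrative top level (all names are made up for this sketch).
nets = [
    Net("cpu_req", "cpu", ("noc",)),
    Net("noc_rsp", "noc", ("cpu", "dma")),
    Net("dma_req", "dma", ("noc",)),
    Net("irq", "dma", ("cpu",)),
    Net("ddr_cmd", "noc", ("ddr",)),
]

internal, new_inputs, new_outputs = group_blocks(nets, ["cpu", "dma"])
print("internal:", internal)
print("wrapper inputs:", new_inputs)
print("wrapper outputs:", new_outputs)
```

Even in this toy model, one regrouping changes the port list of the new wrapper and the wiring at the parent level, which hints at why manual trials are slow on a real design.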

Another way to avoid the added latency of running an interconnect around an obstruction is to feed it through the block, allowing neighboring blocks to connect directly by abutment. Physical design tools can support adding simple feedthroughs. However, the NoC structure can be reconfigured as the design evolves, which might require a restart on each physical design pass. One way to avoid this overhead is to define feedthroughs in the RTL through similar repartitioning capabilities.
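As a rough illustration of the feedthrough idea, the sketch below uses a deliberately simplified one-dimensional model: blocks sit in a single abutting row, and any connection between non-adjacent blocks must be fed through every block between them. The function name `feedthroughs_needed`, the row model, and the block names are assumptions made for this example, not a description of any tool.

```python
def feedthroughs_needed(row, connections):
    """Map each block to the connections that must be fed through it.

    row: block names in left-to-right abutment order (hypothetical 1-D model).
    connections: (src, dst) pairs of block names from `row`.
    """
    position = {name: i for i, name in enumerate(row)}
    result = {name: [] for name in row}
    for src, dst in connections:
        lo, hi = sorted((position[src], position[dst]))
        # Every block strictly between the endpoints needs a feedthrough.
        for name in row[lo + 1:hi]:
            result[name].append((src, dst))
    return result


# Illustrative row of abutting blocks and the links between them.
row = ["cpu", "gpu", "noc", "ddr"]
connections = [("cpu", "noc"), ("gpu", "noc"), ("cpu", "ddr")]

for block, links in feedthroughs_needed(row, connections).items():
    print(block, "feeds through", links)
```

If the connection list changes as the NoC is reconfigured, the set of required feedthroughs changes with it, which is why deriving them from the RTL rather than patching them in per physical design pass is attractive.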

Automated repartitioning

Automating the repartitioning task is the obvious solution, and Arteris IP has the capability, proven across many customers. The technology can support multiple passes per day if needed. Design teams will still run equivalence checks for each generation. Even so, this automated approach is much easier than the manual approach.
