Considering Semiconductor Implementation Aspects Early During Network-on-Chip Development

How implementation can fundamentally impact architecture.


As they say, while history may not repeat itself, it sure rhymes.

In 2015, I wrote the blog “Why Implementation Matters To System Design And Software.” At the time, I mused that while abstraction is essential in system design, it has limitations that users must consider. Critical decisions, such as those regarding power and performance, require more accuracy than can be feasibly abstracted. But it takes time to get to this increased accuracy. Power analysis driven by RTL-based emulation provides more accurate power predictions because it captures the implementation effects that only appear when the semiconductor technology is modeled more accurately.

Fast forward eight years, and I am facing a situation that rhymes. This time it is about the topology development of Networks-on-Chips (NoCs) and how semiconductor implementation effects can fundamentally impact architecture.

Consider the high-level flow graph below. Starting from a whiteboard, developers will separate the compute and peripheral building blocks from the interconnect. Which protocols the NoC supports, such as AMBA 5 ACE-Lite, AXI, AHB, APB, OCP, or PIF, often depends on the processors in the system on chip (SoC) to be developed.

Co-optimization of network-on-chip architecture and its layout

For the interconnect implementation, we at Arteris provide the tools and infrastructure for developing and optimizing the NoC topology, i.e., the number of switches, buffers, and safety features, all based on the interface, traffic, clock, and power domain specifications. At the level of “functional performance,” our tools and our partners’ tools for performance exploration allow assessments of memory and other architectural effects.

So we’re done from here, right? Well, that would be nice.

From RTL to implementation

Place & Route (P&R) and the effects of semiconductor-technology-specific implementation want their fair say. Here are three areas in which the floor planning, and sometimes even the actual layouts after P&R, can heavily impact NoC topology development.

Firstly, while the interconnect between the significant building blocks often becomes the long pole in the tent called timing closure, its silicon real estate, more often than not, depends on the primary building blocks in the system. From a NoC-centric view of the world, we call them “blockages.” Designers ask the NoC to use what’s left.

Secondly, these blockages, together with the ports of the compute and peripheral building blocks, determine the layout positions of the critical connections communicating through the NoC. Connecting all of them using the NoC is like solving a puzzle.

Thirdly, signal propagation becomes an issue now that we have the port positions and the area of silicon real estate that is made available for the NoC. Determining signal propagation has become very complicated, especially at smaller geometry nodes, as outlined in the following figure:

The transport delay is a function of many parameters, including the actual foundry, the routing stack used, the type of driving cell, the process used, the voltage, the temperature, and many more. Yes, developers still use rules of thumb that a signal will travel about 1 mm in about 500 ps and place pipeline registers accordingly. But still, it is complicated, and there is no one number for the transport delay. Gone are the days of my first chip design, in which we manually calculated the metal routing layer capacitance and decided to invert the clock tree right before tape-out. The chip worked on the first try. It used a 0.8-micron (ahem, 800 nm) process, and I think we had only one metal layer. In contrast, in today’s smaller geometries, architects will be hard-pressed to determine which of the many metal layers to use for which type of signal.
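As a back-of-the-envelope illustration of the rule of thumb above (roughly 1 mm of travel per 500 ps, not any particular foundry’s numbers), here is a sketch of how the required number of pipeline registers falls out of route length and clock frequency:

```python
import math

def pipeline_stages_needed(route_mm: float, clock_mhz: float,
                           ps_per_mm: float = 500.0) -> int:
    """Registers required so each route segment's transport delay
    fits within one clock cycle. The default of 500 ps/mm is the
    rule-of-thumb value; real numbers depend on foundry, routing
    stack, driving cell, process corner, voltage, and temperature."""
    cycle_ps = 1e6 / clock_mhz        # clock period in picoseconds
    reach_mm = cycle_ps / ps_per_mm   # distance a signal covers per cycle
    return max(0, math.ceil(route_mm / reach_mm) - 1)

# A 7 mm route at 1 GHz: each cycle covers ~2 mm, so the route
# needs ceil(7 / 2) - 1 = 3 pipeline registers.
print(pipeline_stages_needed(7.0, 1000.0))  # → 3
```

The point of the sketch is what it leaves out: in practice, `ps_per_mm` is not one number, which is exactly why this estimate breaks down at smaller geometry nodes.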

The schedule impact – tool results are far from instant

Bottom line – all of this comes down to an actual information dilemma. The IP development tools know the architectural topology issues. The P&R tools know all the implementation effects. But as the right side of the first illustration shows, when delving down the various layers of abstraction, the turn-around times become longer and longer.

The following illustration shows a project schedule of a layout-aware but manual flow.

It can easily take two to five weeks in the front end to optimize the NoC architecturally. The team used an abstract floor plan to co-optimize the NoC topology. They manually developed the constraints to steer the digital implementation flow and P&R. They automatically exported the NoC from our environment to RTL and went through synthesis and P&R. But they could not close timing, which they only found out after layout runs that took five to six days. They returned to manually re-adjust the constraints to update the pipelining, and three times even the topology itself (updates to other blockages can require that, too). In the end, they had spent about ten weeks on the physical closure phase.
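The schedule arithmetic is unforgiving. A small model, with loop counts and rework durations assumed purely for illustration (not measured project data), shows how multi-day P&R runs compound into the roughly ten weeks described above:

```python
# Hypothetical schedule model for the manual, layout-aware flow:
# each failed timing-closure loop costs one multi-day P&R run plus
# manual constraint rework before the next attempt.

def closure_weeks(pr_run_days: float, rework_days: float, loops: int) -> float:
    """Total physical-closure time in weeks for a given loop count."""
    return loops * (pr_run_days + rework_days) / 7.0

# Assuming 5.5-day layout runs, ~3 days of manual rework per loop,
# and eight closure loops, the phase lands near ten weeks.
print(round(closure_weeks(5.5, 3.0, 8), 1))  # → 9.7
```

The model makes the lever obvious: since run time per loop is fixed by the P&R tools, the only way to compress the schedule is to cut the number of loops, which is the argument of the next paragraph.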

By now, it should be clear that it is critical to minimize the number of loops that lead back to the architecture phase, which is only possible when co-optimizing the IP with its layout.

Estimation based on abstraction to the rescue

It is probably no surprise that we have been working on the issues outlined above.

Firstly, teams can consider early-stage and even late-stage floor plan information during IP development. My colleague Andy Nightingale recently illustrated this further in “Why network-on-chip IP in SoC must be physically aware.” As of today, we have updated our IP development tools to read in floor plan information from chip images, Visio files, or LEF/DEF definitions.

Secondly, importing blockages along with the positions of the ports to which the NoC needs to connect allows automating the placement of the main components of the NoC topology. Check. We implemented that. It beats the manual development of constraints in a project’s first phase.
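To make the idea concrete, here is a deliberately naive sketch of blockage-aware placement: put a switch at the centroid of the ports it connects, then nudge it out of any blockage it lands in. The actual automation in our tools is far more sophisticated; the rectangles and coordinates below are invented for illustration only:

```python
from dataclasses import dataclass

@dataclass
class Rect:
    """A blockage rectangle (e.g., a hard macro the NoC must avoid)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 < x < self.x1 and self.y0 < y < self.y1

def place_switch(ports, blockages):
    """ports: [(x, y), ...] endpoints the switch must connect.
    Returns a candidate (x, y) position outside all blockages."""
    x = sum(p[0] for p in ports) / len(ports)
    y = sum(p[1] for p in ports) / len(ports)
    for b in blockages:
        if b.contains(x, y):
            # Nudge to the nearest vertical edge of the blockage.
            x = b.x0 if abs(x - b.x0) < abs(x - b.x1) else b.x1
    return (x, y)

cpu_block = Rect(0, 0, 4, 4)  # a compute block the NoC routes around
# The centroid of these two ports falls inside the block, so the
# switch is pushed to the block's nearest edge.
print(place_switch([(1, 5), (5, 1)], [cpu_block]))
```

Even this toy version shows why imported port positions matter: without them, there is no centroid to start from, and the placement has to be guessed by hand as constraints.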

Thirdly, abstracting technology information, such as gate delay and area as well as wire delays, allows estimating the positioning of NoC components and the insertion of pipelines significantly better and faster than any manual estimate would allow. Does it rival or even attempt aspects of P&R? Absolutely not! We are partnering with all P&R vendors in this domain. These capabilities provide a significantly better starting point for digital implementation flows as they pick up the RTL generated by our IP development tools.

And best of all, as the following figure illustrates, this much better starting point enables much shorter schedules, as we now allow the co-optimization of NoC IP with its layout.

In addition, this physically aware flow further reduces the area and power consumption of the NoC by optimizing wiring and pipeline stages. We are simply avoiding overprovisioning, which is often the result of more manual flows.

The introduction of these capabilities is a significant step forward – check out FlexNoC 5 from Arteris. But please know, we are far from done yet! Looking at the last illustration, one can quickly identify areas for further optimization, like even closer integration with the information we can derive from our partners’ P&R engines. And the early phase of NoC topology development is also ripe for further optimization. Watch this space!



1 comment

Eric Esteve says:

Frank, as far as I remember, 0.8um was a 2-metal-layer technology. The big jump (moving from 1 to 2 metal layers) was for 2um… in 1983 at MHS, and it was far from easy (it took one year to manage proper metal-line crossings).


