The Architect’s Dilemma And Closing The Loop With Implementation

Getting reliable timing estimates earlier can reduce the overall project schedule.


Gordon Moore has left a mark on our industry. Moore’s Law has shaped decades of development. The EDA industry has been moving up the layers of abstraction to increase the productivity and predictability of design flows in our efforts to address the ever-increasing complexity of semiconductor and electronics development. I wrote about this in “Chasing The Next Level Of Productivity” not long ago.

In his keynote at the Synopsys Users Group (SNUG), Aart de Geus likened the challenge architects face today to a Vegas slot machine when trying to balance the characteristics of chiplets for memory, computing, accelerators, and connectivity. Artificial intelligence and machine learning will certainly be part of the answer. Since my chip development days in the 90s, the architect’s dilemma has fascinated me: We would like fast and accurate simulations to answer trade-off questions and make early architectural decisions. In reality, we get simulation accuracy only at restrictively lower speed, much later in a project, when fundamental architecture decisions are no longer feasible. One can always use brute-force approaches, as I outlined in “Confessions Of An ESL-Aholic” almost a decade ago. In his 2023 keynote, Aart used the term eDT – electronic Digital Twin – when announcing ZeBu 5, using an automotive example. ZeBu 5 ran “real-life automotive scenarios, billion-cycle software workloads, NoC throughput stress testing, and 5,000 regressions a day” – a mix of analysis, optimization, and validation.

Is there a magic wand?

Electronic System Level (ESL) design promised to enable early decisions, albeit with later refinement. The magic wand would let users reliably predict what happens later in the implementation flow and make architectural decisions accordingly. At this SNUG, we showed how we used RTL Architect for Networks-on-Chip (NoCs) to confidently predict what the rest of the place and route (P&R) flow would do. My colleague Shivakumar Musini and I presented in a session moderated by Synopsys’ Jim Schultz, the product management lead for RTL Architect.

Arteris announced in February how we are making network-on-chip development physically aware. In the associated flow diagrams, you can see how we help optimize the NoC topology for a given set of initiators and targets that connect the blocks of a System on Chip (SoC), meeting specific requirements for throughput, latency, and priority. We then automatically generate the RTL and feed it into the implementation flow of synthesis, place and route (P&R), and timing closure.
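For illustration only, here is a minimal sketch of the kind of information such a topology specification captures, written as a plain Python data structure. The names and values are hypothetical and do not represent the actual Arteris configuration format.

```python
# Hypothetical NoC requirement capture -- illustrative only, not the Arteris format.
# Each connection lists an initiator, a target, and its performance constraints.
noc_requirements = {
    "initiators": ["cpu_cluster", "gpu", "dma", "modem"],
    "targets": ["ddr_ctrl", "sram", "periph_bridge"],
    "connections": [
        {"from": "cpu_cluster", "to": "ddr_ctrl",
         "throughput_gbps": 25.6, "max_latency_ns": 80, "priority": 0},
        {"from": "gpu", "to": "ddr_ctrl",
         "throughput_gbps": 51.2, "max_latency_ns": 200, "priority": 1},
        {"from": "dma", "to": "sram",
         "throughput_gbps": 6.4, "max_latency_ns": 150, "priority": 2},
    ],
}

# Topology optimization would then pick switch placement, link widths, and clock
# domains so that every connection meets its constraints before RTL generation.
```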

In the reference projects, timing issues found after P&R could only be addressed by going back to topology development to make adjustments, or by inserting pipeline registers so that signals can cross the silicon within the timing budget. By abstracting a limited set of technology characteristics – gate and wire delay, gate and flop area – we can avoid some of those painful loops using high-level estimates.
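To make that abstraction concrete, here is a minimal back-of-envelope sketch of how abstracted gate and wire delays can estimate whether a NoC link needs pipeline registers. All numbers are assumed for illustration, not taken from a real technology library.

```python
import math

# Back-of-envelope pipeline estimate from abstracted technology characteristics.
# All numbers below are illustrative assumptions, not real library data.
CLOCK_PERIOD_NS = 1.0        # 1 GHz target clock
WIRE_DELAY_NS_PER_MM = 0.25  # assumed routed wire delay
GATE_DELAY_NS = 0.30         # lumped logic delay on the link (mux, arbitration)
FLOP_OVERHEAD_NS = 0.10      # setup plus clock-to-q of an inserted register

def pipeline_stages_needed(route_length_mm: float) -> int:
    """Estimate how many pipeline registers a NoC link needs to close timing."""
    total_delay = route_length_mm * WIRE_DELAY_NS_PER_MM + GATE_DELAY_NS
    budget_per_stage = CLOCK_PERIOD_NS - FLOP_OVERHEAD_NS
    clock_periods = math.ceil(total_delay / budget_per_stage)
    return max(0, clock_periods - 1)  # registers inserted between the endpoints

# A 6 mm route: ceil((6 * 0.25 + 0.30) / 0.90) - 1 = 1 pipeline register.
print(pipeline_stages_needed(6.0))  # -> 1
```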

So what are other automation opportunities?

In the examples we have seen with some customers, P&R for the NoC can easily take several days. If developers then encounter timing issues that force them back to topology development, they wish they had reliable timing estimates earlier. Looking at the project flow diagram, reducing the 8-10 such turns to a smaller number can significantly shorten the overall project schedule, as shown in a rather idealized form here:


(Source: Arteris)
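To make the schedule argument concrete, here is a trivial sketch of the arithmetic, with assumed turn times rather than customer data.

```python
# Illustrative schedule arithmetic -- every number here is an assumption,
# not customer data.
PR_TURN_DAYS = 4           # assumed synthesis + P&R turnaround per iteration
TOPOLOGY_REWORK_DAYS = 2   # assumed effort to adjust the NoC topology per loop

def iteration_schedule_days(turns: int) -> int:
    """Days spent looping between topology development and implementation."""
    return turns * (PR_TURN_DAYS + TOPOLOGY_REWORK_DAYS)

baseline = iteration_schedule_days(9)   # 8-10 turns without early estimates
improved = iteration_schedule_days(3)   # fewer turns with reliable predictions
print(baseline - improved)              # -> 36 days saved in this idealized case
```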

And that’s where RTL Architect came in for us. When using DC and DC Topo for synthesis, the synthesis margins don’t capture all the physical effects that show up during implementation. In our day-to-day interaction with customers, this situation led to several iterations with our application engineering team. DC run times also increased with design size.

(Source: Arteris)

In our experiments, RTL Architect predicted the implementation PPA more accurately, allowing our customers to arrive at implementable NoC configurations faster. Using RTL Architect for early congestion analysis let teams review cell density maps and hot spots, check utilization, and consider potential floorplan changes. Our customers were able to catch placement issues early and address them well before the actual layout runs by refining the floorplan, running explorations with placement bounds, and considering RTL recoding.
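As an illustration of that kind of early congestion review (and not RTL Architect’s actual reporting interface), here is a small sketch that flags hot spots in a cell-density grid.

```python
# Illustrative hot-spot screen over a cell-density grid -- not a tool API.
# density[i][j] is the fraction of placement sites used in bin (i, j).
HOTSPOT_THRESHOLD = 0.85  # assumed utilization above which a bin is suspect

def find_hotspots(density):
    """Return (row, col, utilization) for bins exceeding the threshold."""
    return [(i, j, u)
            for i, row in enumerate(density)
            for j, u in enumerate(row)
            if u > HOTSPOT_THRESHOLD]

grid = [
    [0.62, 0.71, 0.88],
    [0.55, 0.91, 0.79],
    [0.48, 0.66, 0.70],
]
print(find_hotspots(grid))  # -> [(0, 2, 0.88), (1, 1, 0.91)]
```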

With RTL Architect and Fusion Compiler (for P&R) built on the same engines, the overall estimates in our assessments were within 2% for area, 10% for timing, and 5% for power, with Fusion Compiler as the reference. Just as importantly, RTL Architect’s runtime was three times faster than Fusion Compiler and more than six times faster than DC NXT.
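For illustration, here is a minimal sketch of how such a correlation check could be expressed, using the tolerances above and made-up metric values.

```python
# Correlation check of early estimates against the P&R reference.
# Metric values below are made up; tolerances match the assessment cited above.
TOLERANCES = {"area": 0.02, "timing": 0.10, "power": 0.05}  # 2%, 10%, 5%

def within_correlation(estimate: dict, reference: dict) -> dict:
    """Per metric, is the early estimate within tolerance of the reference run?"""
    return {metric: abs(estimate[metric] - reference[metric]) / reference[metric] <= tol
            for metric, tol in TOLERANCES.items()}

# Assumed units: area in um^2, timing as critical-path delay in ns, power in mW.
rtl_architect = {"area": 1.21e6, "timing": 0.92, "power": 312.0}    # early estimates
fusion_compiler = {"area": 1.19e6, "timing": 0.88, "power": 305.0}  # signoff reference
print(within_correlation(rtl_architect, fusion_compiler))
# -> {'area': True, 'timing': True, 'power': True}
```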

Where does that leave us, and what is next?

As an industry, we are still far from solving the architect’s dilemma. AI/ML, as brilliantly outlined by Aart in his SNUG keynote, may offer a path forward. We may be on track toward an EDA version of Tony Stark’s “Just A Rather Very Intelligent System” (JARVIS), which seems to help him quite a bit in the Marvel Universe. Until then, within the scope of RTL through synthesis and P&R to layout, flows like RTL Architect feeding Fusion Compiler have the potential to offer significant productivity boosts.


