ECO Should Not Stand For Extended Challenge Order

Why ECO has become such a dominant factor in design signoff.

There’s an old saying that the first 90% of a task takes 90% of the schedule, and the remaining 10% takes the other 90% of the time. In chip development, design-signoff closure has become one such task. Ideally, when the design has been placed and routed (physical implementation), final analysis of timing and other metrics is performed and an engineering change order (ECO) file is issued to the physical implementation (layout) tools to make final tweaks and correct any issues found. This process worked well for designs at older technology nodes, but it has broken down in recent years.

As projects have moved to advanced nodes, the ECO task frequently takes numerous iterations back through physical implementation, with a lot of manual effort needed to have any chance of signoff closure. The ECO challenge consumes resources and extends the project schedule just when the chip should be taping out. Increasing cost and delaying time to market (TTM) are serious issues that can compromise the viability of the project and the end product.

There are several reasons why ECO has become such a dominant factor in design signoff. One key factor is that the types of analysis have increased due to the physical effects of finer geometries such as 7nm, 5nm, and 3nm. Timing is still the first metric that comes to mind, but both dynamic and leakage power are also extremely important. Dynamic and static IR (voltage) drop analysis has become critical due to IR effects on timing.

Other checks include the clocking network, metal layers, area, route congestion, reliability, robustness, and ability to handle process variations. All these optimizations are typically run sequentially, and ECO tools provide varying levels of support for improving the results. A traditional ECO solution built within a timing analysis tool architecture, and designed only to optimize timing closure, may run very inefficiently when asked to close area alongside timing, or may not handle these additional dimensions of optimization at all.

The enormous size and complexity of deep-submicron designs means that every analysis and ECO optimization run to fix issues takes longer and consumes more compute resources. The situation is exacerbated by the number of corners or scenarios that must be considered. Dozens or even hundreds of scenarios must be run, and the count keeps growing as processes shrink. The result of all this analysis is that millions of violations are reported on early physical implementation runs. Handling this large number and driving convergence to reduce it to zero is a big part of the ECO challenge.
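To make the scenario problem concrete, here is a minimal sketch of how per-scenario timing reports might be merged into a single violation list. It assumes the common convention that an endpoint counts as violating if its slack is negative in any scenario. The scenario names, endpoint names, and slack values are hypothetical, and the data layout is an illustration rather than any tool's actual report format.

```python
def worst_slack_per_endpoint(reports):
    """Merge {scenario: {endpoint: slack_ns}} into {endpoint: (worst_slack_ns, scenario)}."""
    worst = {}
    for scenario, slacks in reports.items():
        for endpoint, slack in slacks.items():
            # Keep the most negative (worst) slack seen for each endpoint.
            if endpoint not in worst or slack < worst[endpoint][0]:
                worst[endpoint] = (slack, scenario)
    return worst


def violations(reports):
    """Return only the endpoints whose worst slack across all scenarios is negative."""
    merged = worst_slack_per_endpoint(reports)
    return {ep: ws for ep, ws in merged.items() if ws[0] < 0}


# Hypothetical reports for three scenarios (slack in nanoseconds).
EXAMPLE_REPORTS = {
    "func_ss_125c": {"u1/D": -0.12, "u2/D": 0.05},
    "func_ff_m40c": {"u1/D": 0.30, "u2/D": -0.02},
    "scan_ss_125c": {"u1/D": -0.20, "u2/D": 0.10, "u3/D": 0.40},
}

if __name__ == "__main__":
    for endpoint, (slack, scenario) in sorted(violations(EXAMPLE_REPORTS).items()):
        print(f"{endpoint}: worst slack {slack} ns in {scenario}")
```

Even in this toy case, no single scenario reveals both violations, which is why every scenario has to be analyzed and why the merged violation count grows so quickly with design size.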

The turnaround time (TAT) for each ECO iteration loop (implementation to signoff) is also an issue. Two to three days for 7nm designs and four to six days for 3nm designs are typical TAT numbers, requiring more than 100 compute servers to run all scenarios simultaneously. On large designs, a single complete iteration may take as long as two weeks. It’s bad enough that each ECO pass is so hard, but dozens of iterations are typically required to achieve convergence.
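The arithmetic behind that complaint is simple but worth spelling out. The sketch below multiplies the per-iteration TAT ranges quoted above by an iteration count, assuming iterations run strictly one after another. The counts of 24 (standing in for "dozens") and 5 (a handful) are illustrative assumptions, not figures from any particular project.

```python
# Per-iteration TAT ranges in days, as quoted in the text.
TAT_RANGES = {"7nm": (2, 3), "3nm": (4, 6)}


def closure_days(tat_days, iterations):
    """Total days spent in ECO loops, assuming strictly serial iterations."""
    return tat_days * iterations


def closure_range(node, iterations):
    """Return (best_case_days, worst_case_days) for a node and iteration count."""
    low, high = TAT_RANGES[node]
    return closure_days(low, iterations), closure_days(high, iterations)


if __name__ == "__main__":
    for node in TAT_RANGES:
        # 24 and 5 are assumed iteration counts chosen for illustration.
        for iterations in (24, 5):
            low, high = closure_range(node, iterations)
            print(f"{node}: {iterations} iterations -> {low} to {high} days")
```

Under these assumptions, two dozen iterations at 3nm would occupy roughly 96 to 144 days, while five iterations would take 20 to 30 days. The gap between those two ranges is the schedule at stake.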

The biggest reason for many iterations is the lack of correlation between digital implementation tools and the post-layout signoff checks, especially static timing analysis (STA). Timing-aware place and route cannot converge quickly if its calculations don’t match STA. In the other direction, ECO tools should be layout-aware so that the optimization instructions sent back to place and route are appropriate and likely to achieve the desired effect of reducing violations in STA and other analysis steps.

It is common for the ECO process to consume 50% or more of the design-closure time at advanced technology nodes. Accordingly, chip developers are clamoring for a modern ECO solution that converges accurately, and much more quickly. Their wish list for such a solution is extensive. They require close cooperation between signoff STA, ECO, and physical implementation tools. The ECOs must be physically aware so that the optimization instructions produce the best results in terms of power, performance, and area (PPA). The place and route process must evaluate timing fully aligned with signoff STA results.

100% correlation and predictable results are the keys to reducing the number of ECO iterations down to a handful, another requirement high on the list. Each ECO run must be much faster than today’s technology, capable of covering thousands of scenarios, combining as many as possible into single runs to save time and compute resources. The ECO solution must have high capacity to handle today’s designs, capable of running several billion instances with TAT of a single day or less. ECO solutions must use artificial intelligence (AI) technologies to offer smart optimizations for the best PPA.

It must be possible to run block-level ECOs from the top level of the design, perform hierarchical ECOs for blocks instantiated many times, and permit manual ECOs when desired. An efficient and intuitive graphical user interface (GUI) is essential for manual optimization. Two-stage ECOs must also be supported, since some designers fix the base layers and start silicon fabrication as they are finalizing the design and routing the metal layers.

The solution must be capable of handling finFET designs where design convergence requires ECO awareness of the placement and routing rules. 2.5D/3D designs are becoming much more common as designers reach the limits of what they can put on a single die. Thus, a modern ECO solution must also be capable of performing multi-die analysis and providing instructions for optimization and closure of multiple dies at the same time.

Above all, a better ECO solution must be golden signoff accurate. Quick convergence and design closure can happen only with a unified flow and tools that are fully aware of each other. The industry and its chip developers are waiting eagerly for an answer.


