Monitoring how process variation and aging affect timing of actual chips in real-world deployment.
Accurate static timing analysis is one of the most important steps in the development of advanced node semiconductor devices. Performance numbers are included in chip and system specifications from the earliest marketing requirements. The architects and designers carefully determine clock cycle times that can achieve the required performance using the chosen high-level architecture, micro-architecture, detailed implementation, and underlying silicon technology. Failure to achieve the specified cycle times due to long paths will compromise chip performance and may render the final product uncompetitive in the marketplace. On the other hand, races and other flaws related to short paths can keep a fabricated device from operating at all and may require a very expensive chip turn to fix.
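The long-path and short-path distinction above corresponds to the textbook setup and hold checks. The following is a minimal sketch of those two checks for a single register-to-register path, not the algorithm of any particular tool; the `Path` class, the function names, and the convention that skew is capture clock arrival minus launch clock arrival are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """Hypothetical register-to-register path with its delay extremes."""
    name: str
    max_delay_ns: float  # slowest data arrival (matters for setup)
    min_delay_ns: float  # fastest data arrival (matters for hold)


def setup_slack(path, clock_period_ns, setup_time_ns, clock_skew_ns=0.0):
    """Long-path check: data must settle before the next capture edge."""
    return clock_period_ns + clock_skew_ns - setup_time_ns - path.max_delay_ns


def hold_slack(path, hold_time_ns, clock_skew_ns=0.0):
    """Short-path check: data must stay stable after the same capture edge."""
    return path.min_delay_ns - hold_time_ns - clock_skew_ns


if __name__ == "__main__":
    p = Path("alu_to_regfile", max_delay_ns=0.90, min_delay_ns=0.10)
    print("setup slack (ns):", setup_slack(p, clock_period_ns=1.0, setup_time_ns=0.05))
    print("hold slack (ns):", hold_slack(p, hold_time_ns=0.03))
```

Note that the hold check has no clock period term. A negative setup slack can be recovered by running the clock more slowly, at a cost in performance, whereas a negative hold slack fails at any frequency, which is why short-path problems can keep a device from operating at all.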
The criticality of static timing analysis has influenced the path of modern electronic design automation (EDA) solutions. Historically, logic synthesis and physical layout (floorplanning, placement, and routing) were accomplished with separate EDA tools. Designers were forced to allocate timing margins during synthesis to accommodate unpleasant surprises that might arise in post-layout static timing analysis. This often led to overdesign, with unnecessary power consumed and precious silicon area wasted to compensate for potential layout issues that never arose. Even with timing margins, path deviations sometimes exceeded the guard bands, leading to manual iterations between synthesis and layout tools. This has largely been resolved by combining these two stages into a single solution, with much smaller margins and any iterations occurring automatically.
Even though the term “timing closure” almost always refers to the pre-silicon design process, in fact timing is not truly closed until the physical chips are fabricated. Nasty surprises related to both long and short paths can appear in the post-silicon stage. There are many reasons why there may be a gap between predicted values and actual chip timing, including process variation across wafers and chips and silicon aging over a device's lifetime.
To some extent, designers and EDA tools can anticipate these effects and increase timing margins to try to accommodate them, but overdesign is a significant risk. In addition, these effects do not necessarily affect timing for every device. Process variation means that not every fabricated wafer or chip has identical timing characteristics. Increasing long path timing margins in anticipation of an issue that may affect only a small percentage of parts is probably not a good tradeoff; it is better to accept that these few chips will be binned separately during production test and marked for less performance-critical applications. Further, some of these effects may only occur due to, or be exacerbated by, silicon aging. Having every system in the bring-up lab running at full speed does not necessarily mean that millions of chips in the field will retain the same timing characteristics over their lifetime.
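The binning tradeoff can be illustrated with a small sketch. It assumes that production test yields one number per chip, the delay of its slowest measured path, and that each speed grade is defined by a clock period. The function name, the guard band parameter, and the grade table are hypothetical and do not describe any real test program.

```python
def assign_bin(worst_path_delay_ps, guard_band_ps, bins):
    """Return the fastest speed grade this chip can safely support.

    bins: list of (grade_name, clock_period_ps), ordered from the
          shortest clock period (fastest grade) to the longest.
    """
    required_period_ps = worst_path_delay_ps + guard_band_ps
    for grade_name, clock_period_ps in bins:
        if required_period_ps <= clock_period_ps:
            return grade_name
    return "reject"  # too slow even for the slowest grade


if __name__ == "__main__":
    # Hypothetical grades: a 1.0 GHz part and a 0.8 GHz part.
    grades = [("fast", 1000.0), ("slow", 1250.0)]
    for delay_ps in (900.0, 980.0, 1300.0):
        print(delay_ps, "->", assign_bin(delay_ps, guard_band_ps=50.0, bins=grades))
```

With per-chip measurements available, the guard band only has to cover measurement uncertainty and expected aging. It no longer has to cover the full spread of process variation, since each chip is graded on its own silicon.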
These are daunting challenges indeed, and ones that can only be addressed by extending the notion of timing closure to production silicon. This implies that there must be some way to gather information on how process variation and aging affect actual chips in real-world deployment. Fortunately, there is a technology available today to harvest and report this feedback. A recent blog post introduced the concept of path margin monitor (PMM) intellectual property (IP). PMM units provide fine-grained observability of silicon timing while chips are operating in the bring-up lab, in production test, or in the field. They measure the delay of real functional paths without compromising functional operation.
PMM units provide a novel and valuable form of timing analysis for fabricated devices. The first steps can be performed in the bring-up lab as soon as the initial chips arrive from the foundry. Much more data reflecting process variation can be gathered as production quantities of chips are tested. As these chips are installed in end systems and deployed in the field, even more data under a wide variety of operating conditions becomes available. Over time, the continuous collection of path delays from the field will reflect aging effects, but of course this may take years. Fortunately, the chip bring-up and qualification process already uses burn-in chambers to check for infant mortality and model accelerated aging. Performing burn-in with PMM units included in the silicon provides early warning of timing issues that might not be detected in the field for a long time.
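As a rough illustration of how collected margin readings might be turned into an early warning, the sketch below fits a straight line to margin versus stress time and extrapolates to the point where the margin reaches a floor. This is a plain least-squares linear fit chosen for simplicity. Real aging mechanisms are generally not linear in time, and the function names and data layout are assumptions rather than part of any PMM product.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) samples of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_margin(hours, margins_ps, floor_ps=0.0):
    """Extrapolate when the fitted margin falls to floor_ps.

    Returns None when the fitted trend is flat or improving.
    """
    slope, intercept = fit_line(hours, margins_ps)
    if slope >= 0:
        return None
    return (floor_ps - intercept) / slope


if __name__ == "__main__":
    # Hypothetical burn-in readings: stress hours and measured margin in ps.
    stress_hours = [0, 100, 200, 300]
    margin_ps = [100.0, 90.0, 80.0, 70.0]
    print("estimated hours to zero margin:", hours_until_margin(stress_hours, margin_ps))
```

Applied to burn-in data, the result is expressed in stress hours. Converting it to field lifetime requires the acceleration factor of the burn-in conditions, which is outside the scope of this sketch.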
Clearly, PMM units collect and report a lot of data, but the key question is how chip developers and manufacturers can use this information for better designs in the future. Path delay information gathered from bring-up, burn-in, production chip test, and the field can be used to refine the margins used during the design process and the silicon models used during static timing analysis. If the chip is turned for any reason, tighter margins and more accurate timing analysis will yield a revised design better optimized for power, performance, and area (PPA). The same is true for chip variants, derivative designs, and follow-on projects using the same silicon technology. PMM feedback from deployed chips can also improve the manufacturing and test processes, resulting in better yield and more accurate binning.
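One simple way to picture this feedback loop is to compare the delay that static timing analysis predicted for each monitored path with the delay measured on silicon, then derive a scaling factor that covers the observed spread. The sketch below takes the mean of the measured-to-predicted ratios plus a chosen number of standard deviations. It is an illustrative heuristic with hypothetical names and does not represent how PrimeTime or PrimeShield consume PMM data.

```python
import statistics


def suggested_derate(predicted_ps, measured_ps, sigma=3.0):
    """Scale factor covering mean + sigma * stdev of measured/predicted ratios."""
    if not predicted_ps or len(predicted_ps) != len(measured_ps):
        raise ValueError("need matching, non-empty delay lists")
    ratios = [m / p for p, m in zip(predicted_ps, measured_ps)]
    mean_ratio = statistics.mean(ratios)
    spread = statistics.stdev(ratios) if len(ratios) > 1 else 0.0
    return mean_ratio + sigma * spread


if __name__ == "__main__":
    # Hypothetical per-path delays for the same monitored paths.
    predicted = [100.0, 200.0, 400.0]
    measured = [90.0, 200.0, 440.0]
    print("suggested derate factor:", suggested_derate(predicted, measured))
```

If the factor obtained this way is smaller than the derate applied in the original design, the difference is margin that a chip turn or derivative design could reclaim for power, performance, or area. If it is larger, the models were optimistic and should be tightened before the next project.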
The Synopsys Silicon Lifecycle Management (SLM) platform includes a PMM solution and is tightly linked to the Synopsys PrimeTime static timing analysis tool and the Synopsys PrimeShield design robustness analysis and optimization solution. When Path Margin Monitor IP is included in fabricated chips, all the benefits outlined above are available to designers of related devices. PMM feedback provides greater visibility and insight into post-silicon timing, from the first units in the bring-up lab to aging chips deployed in the field. No timing analysis or timing closure process can be truly complete without bridging the gap from pre-silicon design to actual chips.
For more information on the complete SLM solution, a white paper is available.