Next-Generation Liberty Verification And Debugging

How machine learning can improve the traditional process of verifying Liberty files.


Accurate library characterization is a crucial step for modern chip design and verification. For full-chip designs with billions of transistors, timing sign-off through simulation is infeasible due to run-time and memory constraints. Instead, a scalable methodology using static timing analysis (STA) is required. This methodology uses the Liberty file to encapsulate library characteristics such as timing, power, and noise for standard cells, memory macros, and custom cells. The Liberty file is then used by all downstream tools for synthesis, physical design implementation, as well as full-chip timing, power, and signal integrity sign-off.

In order to achieve accurate results through STA, the Liberty files must be meticulously checked for errors and problems. There are only a few off-the-shelf EDA tools for library verification, so verifying Liberty files has traditionally been accomplished through a combination of first-generation commercial rule-based checkers, in-house scripts, and creative manual hacks by highly skilled experts. This process is effort-intensive and time-consuming, taking weeks or months to verify and debug a single characterized library. Despite the effort required, the current process is still highly prone to missing key problems in the Liberty files.

The Solido team at Mentor are experts in machine learning-accelerated EDA tools, and have applied their techniques to achieve fast and accurate variation-aware design and verification of memory, standard cell, and analog/RF circuits. Recently, the team has developed a set of machine learning and information visualization techniques to address the challenge of Liberty verification and debugging. These techniques have been integrated in a new Liberty verification and debugging tool called MLChar Analytics, which forms the basis for the next-generation approach to verifying and debugging libraries. The tool verifies and debugs libraries by:

  1. Building a machine learning model from the characterized results to detect outliers and properties that go against expected semiconductor behavior, and
  2. Using information visualization methods to provide a master dashboard that links all identified library issues with their source information, giving users a global view of their library while letting them easily navigate to the details of any error found.

This new approach speeds up library validation and debugging by more than 10X, and can consistently find entirely new classes of errors that conventional rule-based checkers miss.

The Liberty verification and debugging challenge
In both the design and sign-off stages, STA tools rely heavily on the accuracy of the Liberty files. The functionality of STA tools centers around using information from the Liberty files to verify the final design; the tools do not thoroughly validate the Liberty files themselves. If the Liberty files are incorrect, STA tools give the wrong answers, resulting in re-spins, lower chip yield, schedule delays, or unnecessary over-margining, all of which are very expensive problems.

Unfortunately, characterization results can still contain many errors and inaccuracies that library teams need to detect and fix before releasing the library. Modern libraries contain up to a few thousand cells that need to be characterized for a range of process, voltage and temperature (PVT) conditions. Many cells also have body-bias variants and different threshold voltages (Vt). This leads to hundreds of millions of simulations that are required to characterize a library. Due to the sheer scope of characterization, it is naive to believe that a library can be characterized correctly 100% of the time.
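A back-of-envelope calculation shows how the simulation count reaches that scale. All of the counts below are illustrative assumptions chosen for round numbers, not figures for any specific library:

```python
# Back-of-envelope estimate of characterization scale.
# Every count below is an illustrative assumption, not a figure
# from any particular library or foundry.
cells = 2000          # standard cells in the library
pvt_corners = 40      # process/voltage/temperature combinations
arcs_per_cell = 10    # timing arcs per cell, on average
table_points = 7 * 7  # slew x load grid points per arc
measurements = 5      # delay, transition, constraint, power, noise

simulations = cells * pvt_corners * arcs_per_cell * table_points * measurements
print(f"{simulations:,} characterization simulations")  # 196,000,000 characterization simulations
```

Even with conservative per-category counts, the product of these dimensions lands in the hundreds of millions, and adding Vt and body-bias variants multiplies it further.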

Consider an example where a few values of a propagation delay table were incorrectly characterized to be optimistic. If the conditions in the design and a particular cell instance use those delay values, the STA tools report that the timing path has met timing when, in reality, it has not. This can lead to functionality errors that must be fixed in a re-spin, or timing errors where the path marginally meets timing, causing the design to fail at certain PVTs. Both cases can result in decreased yield for the design.

The reverse is also true: if, due to characterization or simulation errors, the Liberty file specifies overly-pessimistic timing or power, the STA tool will report inaccurate timing or power numbers accordingly. This means that implementation tools must work harder to meet false requirements, which might lead to over-margining, increased chip area, or higher power consumption.

The key components that go into a successful library characterization run include: transistor models from the foundry PDK, testbenches from the library team or characterization tool, as well as measurement definitions, characterization configuration, and simulator settings. It is not surprising that a characterization team cannot replicate a characterized library without significant effort, even when using the same simulator and characterization tool.

There are additional causes of characterization errors to consider. To decrease the overall run-time, aggressive simulator and characterization settings are used. There may be bugs in the tool chain or in custom scripts. Arcs may be identified incorrectly, and there may be IT failures. Any of these issues will “pollute” the final characterization results. All characterization errors need to be caught, debugged, and fixed before the library can be released to the product team.

Given the vast amount of characterized data in a library, the task of validating it is daunting. Validation requires understanding and analyzing hundreds of millions of characterized values, some of which are waveform shapes (CCS), or statistical distribution parameters (LVF). Without the right tools, this becomes a significant hurdle to the library release process.

Limitations of the first-generation approach
Traditional Liberty validation flows include a first-pass check using rule-based checkers, where static checks are performed to identify specific issues such as pin inconsistency, minimum or maximum capacitance and transition time violations, cell mismatches, and any detected syntax failures. This step is typically used in combination with in-house scripts to perform additional checks, and even more scripts to parse reports from the rule-based checkers.

After the static checks are complete, library teams go through a time-consuming task of tracing any problem to the source, comparing reference library data, and finding other data points with the same issue. In most cases, finding out why an error occurred is a harder problem than identifying the error itself, due to the myriad of potential causes. While this traditional approach has worked out of necessity so far, there are two main problems with it: missing critical library issues and adding weeks to months of delay to production schedules.

Static rule-based checks will miss critical issues in Liberty files
Static rule-based checks are only as good as the explicitly-coded rules they are programmed to check. Identifying and fixing library issues using this method is the EDA equivalent of trying to find a cure for all strains of the common cold. It is an uphill battle because there are so many combinations to consider, and each new process node brings new considerations to check. As soon as a batch of static rule-based checks are implemented, more requirements will emerge due to the endless mutation of factors that must be considered.

With rule-based checks, if the specific rule is not programmed into the rule checker, any violation that is not covered by the rule will not be checked. An outlier within a propagation delay table or a propagation delay outlier across a voltage sweep are examples of errors that static, rule-based checks cannot find. In addition, new potential issues become relevant with each new process node, caused by shrinking process geometries or unique properties of processes such as FD-SOI. Adding statistical information with the Liberty Variation Format (LVF) also exacerbates the problem by extending the domain of the Liberty format.

Library teams might attempt to compensate for these challenges by constantly adding and fine-tuning rules, which will eventually be able to catch some of the non-categorized errors. However, not only is this inefficient due to new factors that are found with each process node or variant, but it also forces the library team to constantly play catch up with new problems that emerge after verification, leading to more schedule delays and re-characterization.

Manual debugging adds weeks to production schedules
After a potential issue has been identified, understanding the scope of the problem remains tricky. Library errors seldom happen as one-off problems. Issues usually surface in clusters across certain voltage, temperature, process, extraction, arcs, and slew-load table regions. Therefore, after an error is discovered, the library team must also analyze why the error happened in the first place. For example, they must determine whether it was due to circuit layout issues, characterization configuration, simulation settings, or other factors, then determine what other characterized results are affected by this issue.

The task of understanding the problem scope itself takes considerable data mining effort and skill to accomplish. Re-running the problematic characterization point at the SPICE level is required, followed by analyzing the root cause of failure. After the root cause is identified, perhaps due to lowered SPICE accuracy settings to reduce runtime, the library team must determine all instances of the issue, and identify a fix that can be systematically applied to the characterization flow without breaking other results.

Fixing all characterization results impacted by a particular issue (which may run up to hundreds or thousands of data points) within the time pressures of a production schedule is a daunting task that often results in schedule delays. Using current-generation library verification methods, validating and debugging Liberty files often takes longer than the characterization itself.

The need for a new approach
As mask and wafer costs rise, wasteful over-margining and re-spins are increasingly expensive mistakes for a product team. In terms of production and schedule, the cost of problems in Liberty models can potentially be massive. Library release and rollout is a key component of the chip development critical path, so every additional week spent verifying, debugging, and fixing library issues translates to schedule delay for the final full-chip product.

Instead of playing catch up by tweaking static library checking rules in an attempt to include all possible scenarios for failure (which is impossible), what is needed is a data-driven approach where the tool “knows” the correct answer to all characterized data points after analyzing the library in its entirety. The tool must be able to detect errors accordingly and enable the user to debug and fix all related errors in the most efficient way possible.

Next-generation Liberty verification and debugging
The use of machine learning in EDA is gaining traction today as we identify problems suitable to be solved using this approach. The Solido team at Mentor has identified Liberty verification and debugging as one such problem, and they have developed the next-generation solution for this challenge: MLChar Analytics.

MLChar Analytics, part of the Solido Machine Learning Characterization Suite, is an all-new technology built from the ground-up for identifying and understanding characterization problems. Instead of relying solely on static, rule-based checks for Liberty verification, MLChar Analytics takes a new approach by using machine learning techniques to identify all potential errors, and pairs that with information visualization methods that provide the user with a fast and powerful dashboard to streamline all library debugging tasks.

Machine learning-driven outlier analysis
The main idea behind machine learning-driven outlier analysis is to create a machine learning model of the entire library’s characterized results. This model is built across input variables such as process, temperature, voltage, extraction condition, pin input slew and output load, and other factors that affect the behavior of library cells. The outputs are the library characterization measurements, which include: propagation delays, transitions, constraints, CCS waveforms, LVF statistical moments, power, and noise.

By modelling the behavior of each output, MLChar Analytics can find outlier data points that do not fit the trend and it can find regions where trends are noisy. This is done by comparing the characterized data directly with the predicted output, and flagging areas where there is disagreement between the two (Figure 1).


Figure 1: An example chart from MLChar Analytics showing a group of outliers identified by comparing machine learning model results to actual characterized data.

After all of the outliers have been identified, MLChar Analytics sorts them by severity (those having the largest model-versus-data differences first). This allows the user to filter by criteria, narrowing down the scope of the data to investigate (Figure 2).


Figure 2: A summary of outliers sorted by severity, produced by MLChar Analytics.

By sorting a global list of all possible outliers, and by filtering the sort by input variable (for example PVT condition, cell type, or channel length), the user can quickly see outlier trends across any combination of input variables. This helps the user to quickly determine whether this category of outliers is confined to a certain voltage, temperature, cell type, or another type of input factor, making it easier to determine how to fix this issue.
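Sorting and filtering of this kind is straightforward once each outlier record carries its input-variable coordinates. The record fields, corner names, and severity scores below are hypothetical examples, not output from MLChar Analytics:

```python
from collections import Counter
from operator import itemgetter

# Hypothetical outlier records, each tagged with the input-variable
# coordinates of the data point plus a severity score (the size of the
# model-vs-characterized disagreement). All names/values are illustrative.
outliers = [
    {"cell": "NAND2X1", "corner": "ss_0p72v_125c", "arc": "A->Y",  "severity": 8.1},
    {"cell": "NAND2X1", "corner": "ss_0p72v_m40c", "arc": "A->Y",  "severity": 6.3},
    {"cell": "INVX2",   "corner": "ss_0p72v_125c", "arc": "A->Y",  "severity": 5.7},
    {"cell": "DFFX1",   "corner": "tt_0p80v_25c",  "arc": "CK->Q", "severity": 1.2},
]

# Worst problems first, so the user triages the biggest disagreements.
ranked = sorted(outliers, key=itemgetter("severity"), reverse=True)

# Grouping by an input variable exposes clusters: here most outliers sit
# at the slow low-voltage corners, hinting at a corner-specific root cause.
by_corner = Counter(o["corner"] for o in ranked)
print(ranked[0]["cell"])         # NAND2X1
print(by_corner.most_common(1))  # [('ss_0p72v_125c', 2)]
```

Grouping the global list by each input variable in turn is what turns a flat list of flagged points into an actionable pattern, such as "all severe outliers share the same voltage corner."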

Information visualization-powered debugging
The example above highlights the importance of incorporating information visualization (InfoVis) methods into a Liberty verification and debug solution. Historically, InfoVis methods in EDA tools have been more of an afterthought, with the justification that if information is present, that is sufficient because the value of the tools is in the underlying engines. While there is some validity to this approach for tools running primarily in batch mode, it is certainly not the case for verification and debugging tools.

The success of a verification and debugging tool is driven heavily by its ability to help the user understand at a glance the information that is provided. MLChar Analytics was built with information visualization in mind. Dashboard views are organized in a way that not only displays all relevant information needed to debug an issue in an intuitive manner, but also allows the user to access all related information with a single click on a point of interest from any of the windows in the dashboard (Figure 3).


Figure 3: The MLChar Analytics Dashboard showing information visualization methods to accelerate debugging of Liberty files.

In Figure 3, the user is debugging CCS versus NLDM output inconsistencies in the library, which is one of the major culprits behind timing differences between full-chip implementation and signoff STA results. In this case, the user is not only able to see how CCS and NLDM values correlate (or fail to correlate) with each other within the entire library, but the user can also see a sorted summary starting with the most severe cases. By clicking on any point on the correlation chart or the summary list, the tool brings up CCS and NLDM waveforms, as well as the cell and pin information related to that point of interest.
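The underlying consistency check amounts to comparing the delay each model implies for the same arc and ranking the disagreements. In the sketch below, the arc names and delay values are hypothetical, and extracting a comparable delay from a CCS current waveform is nontrivial and is assumed to have been done upstream; the 5% tolerance is likewise an assumed threshold, not a standard:

```python
# Hypothetical per-arc delays (ps) for the same slew/load conditions,
# one number taken from the NLDM table and one derived from the CCS
# model. All arc names and values here are made up for illustration.
nldm = {"NAND2X1/A->Y": 42.0, "INVX2/A->Y": 18.5, "DFFX1/CK->Q": 95.0}
ccs  = {"NAND2X1/A->Y": 42.3, "INVX2/A->Y": 24.1, "DFFX1/CK->Q": 95.4}

# Relative disagreement between the two models for each arc.
mismatch = {arc: abs(ccs[arc] - nldm[arc]) / nldm[arc] for arc in nldm}

# Rank arcs by disagreement; the worst go to the top of the debug list,
# mirroring the sorted summary view in the dashboard.
worst = sorted(mismatch.items(), key=lambda kv: kv[1], reverse=True)
for arc, err in worst:
    flag = "CHECK" if err > 0.05 else "ok"  # assumed 5% tolerance
    print(f"{arc}: {err:.1%} {flag}")
```

Arcs where the two delay models agree fall on the diagonal of a correlation chart like the one in Figure 3; the flagged arcs are the off-diagonal points worth a click.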

Liberty files are huge and take a long time to debug due to the sheer amount of data that is present in text and table formats. Information visualization methods in MLChar Analytics eliminate the bottleneck of tracing through large files using text editors or specialized scripts. It also presents data in a way that makes it extremely easy to navigate from one point in the debug space to another. For example, the user can quickly navigate from PVT information, to cell and pin properties, and to the characterized Liberty data tables for those pins.

The next-generation verification and debug solution is here
Liberty files are crucial to ensuring correct design functionality and they need to be verified and debugged prior to release to product teams. Errors in Liberty files directly result in costly consequences such as chip re-spins, schedule delays, and over-margining. Traditional commercial Liberty verification tools based on static rule-based checks only validate what they are explicitly programmed to check, and are inadequate to identify all potential issues in a library. With the increase in the types of library problems due to new and smaller processes, adding or modifying static checks to cover the ever-growing list of potential issues is not a scalable approach. This also leads to a costly delay between the emergence of new types of issues and the ability to detect them.

In addition, due to the historical lack of effective commercial library debug tools, library teams often develop their own scripts and internal tools to make sense of library verification reports, and often have to manually hack their Liberty files to fix any issues found. Library releases are often in the critical path of the product delivery schedule, so this significantly prolongs the library verification and debug cycle, leading to delays in the product design schedule.

MLChar Analytics uses machine learning to identify potential issues in Liberty files. This data-driven approach enables the tool to detect whole categories of errors and outliers that traditional, rule-based checks are not able to find. The tool also incorporates powerful information visualization methods to speed up the process of debugging and fixing library issues, through a fully-connected database that gives users a global view of all issues in the library while quickly locating the root cause of each issue and any surrounding problems stemming from that root cause.

Library teams can speed up the Liberty validation cycle by 10X or more, allowing the task of verifying and debugging libraries to be completed within hours instead of the weeks or months required using first-generation library verification tools.


