SPONSOR BLOG

Overcoming Regression Debug Challenges With Machine Learning

Automatically discover the root causes of simulation regression failures.

June 22nd, 2023 - By: Robert Ruiz

Developing a modern semiconductor device requires running many electronic design automation (EDA) tools repeatedly over the course of the project. Every stage, from architectural exploration and design to final implementation and manufacturing preparation, has multiple methodology loops that must be repeated again and again.

Even in such a complex development flow, functional simulation stands out. It takes billions of simulation cycles to verify that a chip design does everything it is supposed to do, with no unintended behavior. This is not a one-time effort. Every time any part of the design changes, the entire simulation test suite, or at least a substantial portion of it, must be rerun. The suite grows throughout the verification and development effort as tests are added to verify new features or to focus on areas of the design where bugs are being found.

Simulation regressions involve running a large number of tests on a regular basis, usually a sample set nightly and the full set weekly. Running these tests consumes significant resources, but the bigger challenge comes when tests fail. Engineers make mistakes when adding new features to the design and enhancing the test suite, and the resulting errors must be debugged and resolved.

Furthermore, some previously passing tests fail in updated regression runs. New features often break existing features, and any code edits can have ripple effects. Sometimes every test fails, especially after major changes are made to the verification environment. Debugging these failures is primarily a manual effort, requiring multiple steps:

Check in the latest changes to the design and testbench code
Run the regression simulations
Analyze the log files containing thousands of test failures
Categorize the failures and sort them into “bins” based on the type of error
Triage each bin to determine where the problem most likely occurred
Perform root cause analysis (RCA) to try to pinpoint the actual bug
Change the design or verification code to try to fix the bug
Start the loop all over again!
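As a rough illustration of the log-analysis and binning steps, the sketch below scans logs for UVM-style error reports and groups failing tests by severity and message ID. It assumes the default UVM report layout `UVM_ERROR <file>(<line>) @ <time>: <component> [<ID>] <message>`; real log formats depend on the simulator and on the testbench's report settings, and `parse_uvm_failures` and `bin_by_id` are hypothetical helpers, not part of any tool.

```python
from __future__ import annotations

import re
from collections import defaultdict
from dataclasses import dataclass

# Assumed UVM default report layout:
#   UVM_ERROR <file>(<line>) @ <time>: <component> [<ID>] <message>
# End-of-run summary lines such as "UVM_ERROR :    3" deliberately do not match.
UVM_REPORT = re.compile(
    r"^(?P<severity>UVM_ERROR|UVM_FATAL)\s+"
    r"(?P<file>\S+)\((?P<line>\d+)\)\s+@\s+(?P<time>\S+):\s+"
    r"(?P<component>\S+)\s+\[(?P<id>[^\]]+)\]\s+(?P<message>.*)$"
)


@dataclass
class Failure:
    test: str
    severity: str
    component: str
    msg_id: str
    message: str


def parse_uvm_failures(test: str, log_text: str) -> list[Failure]:
    """Extract UVM_ERROR/UVM_FATAL reports from one test's log (hypothetical helper)."""
    failures = []
    for line in log_text.splitlines():
        m = UVM_REPORT.match(line.strip())
        if m:
            failures.append(
                Failure(test, m["severity"], m["component"], m["id"], m["message"])
            )
    return failures


def bin_by_id(failures: list[Failure]) -> dict[str, list[str]]:
    """Group failing tests by severity and message ID: a coarse first-pass binning."""
    bins: dict[str, list[str]] = defaultdict(list)
    for f in failures:
        key = f"{f.severity}:{f.msg_id}"
        if f.test not in bins[key]:  # list each test once per bin
            bins[key].append(f.test)
    return dict(bins)


if __name__ == "__main__":
    logs = {
        "test_a": "UVM_ERROR tb/scb.sv(88) @ 1200: uvm_test_top.env.scb [SCB_MISMATCH] exp 0x1f got 0x2e",
        "test_b": "UVM_ERROR tb/scb.sv(88) @ 3400: uvm_test_top.env.scb [SCB_MISMATCH] exp 0x07 got 0x00",
        "test_c": "UVM_FATAL tb/drv.sv(12) @ 0: uvm_test_top.env.agt.drv [NOVIF] virtual interface not set",
    }
    failures = [f for name, text in logs.items() for f in parse_uvm_failures(name, text)]
    print(bin_by_id(failures))
    # {'UVM_ERROR:SCB_MISMATCH': ['test_a', 'test_b'], 'UVM_FATAL:NOVIF': ['test_c']}
```

Binning on exact message IDs is only a coarse first pass: it can split failures that share a cause but report different IDs, and it can merge unrelated failures that reuse the same ID.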

This process relies heavily on the expertise of the development engineers. Years of experience help them develop a sense of how best to bin the failures, triage the bins, and assign the failures to the right design and verification engineers for root cause analysis and fixes. However, experts are hard to find, and this manual approach consumes significant project time and resources. Chip development teams have long been clamoring for a better way to manage and debug regression loops.

Recently, artificial intelligence (AI) based on machine learning (ML) has become available to automatically analyze, bin, triage, probe, and discover the root causes of regression failures. Leveraging the enormous amount of information gleaned from thousands of regression runs on the project, AI acts as a companion to traditional engineering expertise. By automating and accelerating three steps in each loop, ML techniques can provide faster and more accurate debug than manual methods. Because engineers can find, understand, and fix bugs much more quickly, ML can improve overall debug efficiency by up to 30X.

The Regression Debug Automation (RDA) capabilities in Synopsys Verdi Automated Debug System use such ML techniques to automatically discover the root causes of simulation regression failures. RDA classifies and analyzes raw regression failures and identifies the root causes of failures in the design and testbench. Automating the regression log analysis, binning, triage, and RCA greatly reduces manual effort.

RDA starts by collecting data from the regression run, including simulation log files, value change dump (trace) files, and compiled simulation databases with the design and testbench. It uses ML to mine relationships among the verification log failures and bin the results. This process has been shown to be 90% accurate in identifying related failures, which reduces overall triage time. After binning, RDA performs failure analysis and triage: it examines each bin of failures and, based on the characteristics of the failures, determines whether the issues originate in the design or in the testbench.
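The article does not describe the ML model RDA uses, so the following is only a simple stand-in for the idea of mining relationships among log failures. Each failure message is normalized into a token-set signature, masking hex values, indices, and numbers that change from run to run, and failures are then grouped by greedy leader clustering on Jaccard similarity. The helper names and the 0.6 threshold are assumptions chosen for illustration.

```python
from __future__ import annotations

import re


def signature(message: str) -> frozenset[str]:
    """Normalize a failure message into a token set, masking run-specific values."""
    text = message.lower()
    text = re.sub(r"0x[0-9a-f]+", "<hex>", text)  # data/address values
    text = re.sub(r"\[\d+\]", "[<idx>]", text)     # array or instance indices
    text = re.sub(r"\d+", "<num>", text)           # times, counts, port numbers
    text = re.sub(r"[:;,=]", " ", text)            # punctuation that glues tokens
    return frozenset(text.split())


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two token sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def bin_failures(messages: dict[str, str], threshold: float = 0.6) -> list[list[str]]:
    """Greedy leader clustering: each failure joins the first bin whose
    representative signature is at least `threshold` similar, else starts a new bin."""
    bins: list[tuple[frozenset[str], list[str]]] = []
    for test, msg in messages.items():
        sig = signature(msg)
        for representative, members in bins:
            if jaccard(sig, representative) >= threshold:
                members.append(test)
                break
        else:
            bins.append((sig, [test]))
    return [members for _, members in bins]


if __name__ == "__main__":
    failures = {
        "t1": "scoreboard mismatch on port 0: expected 0x1f got 0x2e",
        "t2": "scoreboard mismatch on port 3: expected 0x07 got 0x00",
        "t3": "timeout waiting for axi bresp after 5000 cycles",
        "t4": "scoreboard mismatch on port 1: expected 0xAA got 0xAB",
    }
    print(bin_failures(failures))  # [['t1', 't2', 't4'], ['t3']]
```

Leader clustering is order-dependent and its threshold needs tuning; the point of the sketch is only that masking run-specific values lets failures that differ merely in data land in the same bin.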

RDA uses multiple technologies to find the root causes of failures. For the design, it compares signal values from passing and failing tests to isolate the points near the test errors where they differ. A visualization shows the RCA path along with the signal value changes in the design. To root cause testbench failures, the RDA debug facilitator automatically collects debug data for each failure bin. It shows protocol transactions with associated details and uses a reverse debug capability to trace each issue back in time to its source.
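A minimal sketch of the passing-versus-failing comparison, assuming the two runs are already aligned in time and that each trace has been loaded (for example, from a VCD file) into a hypothetical `{signal: [(time, value), ...]}` dictionary of time-sorted value changes. It lists the signals that differ within a window before the failure, earliest divergence first, since the earliest difference is often the better place to start looking. This is not Verdi's algorithm, and the signal names are invented.

```python
from __future__ import annotations

from bisect import bisect_right
from typing import Dict, List, Tuple

# Hypothetical trace model: signal name -> value changes sorted by time.
Trace = Dict[str, List[Tuple[int, str]]]


def value_at(changes: List[Tuple[int, str]], t: int) -> str:
    """Value of a signal at time t: the last change at or before t ('x' if none yet)."""
    times = [time for time, _ in changes]
    i = bisect_right(times, t)
    return changes[i - 1][1] if i else "x"


def divergences(passing: Trace, failing: Trace, fail_time: int, window: int) -> List[Tuple[int, str]]:
    """Return (first_divergence_time, signal) for every signal whose value differs
    between the two runs inside [fail_time - window, fail_time], earliest first."""
    start = fail_time - window
    found = []
    for sig in sorted(passing.keys() & failing.keys()):
        # Piecewise-constant values can only start to differ at the window start
        # or at a value change, so those are the only instants worth checking.
        checkpoints = {start} | {
            t for t, _ in passing[sig] + failing[sig] if start <= t <= fail_time
        }
        for t in sorted(checkpoints):
            if value_at(passing[sig], t) != value_at(failing[sig], t):
                found.append((t, sig))
                break
    return sorted(found)


if __name__ == "__main__":
    passing = {
        "clk_en": [(0, "1")],
        "fifo_full": [(0, "0")],
        "wr_ptr": [(0, "0"), (40, "1"), (80, "2")],
        "data_out": [(0, "00"), (90, "a5")],
    }
    failing = {
        "clk_en": [(0, "1")],
        "fifo_full": [(0, "0"), (60, "1")],
        "wr_ptr": [(0, "0"), (40, "1")],
        "data_out": [(0, "00"), (90, "5a")],
    }
    # Error reported at t=100; look back 50 time units.
    print(divergences(passing, failing, fail_time=100, window=50))
    # [(60, 'fifo_full'), (80, 'wr_ptr'), (90, 'data_out')]
```

Here the FIFO filling up early in the failing run is the first candidate, while the wrong `data_out` value closest to the error is more likely a symptom.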

Synopsys Verdi RDA includes additional capabilities to save the engineers even more time and effort:

Failing tests are automatically rerun in simulation with reverse debug and other debug features enabled
Testbench RCA includes awareness of the widely used Universal Verification Methodology (UVM)
RCA is performed on test failures related to unknown (X) values to reduce the number of groups
Test failures due to simulation X-pessimism are filtered out
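X-pessimism occurs when simulation propagates an unknown (X) value that real hardware would never produce. The classic case is a gate-level 2:1 mux, y = (sel & a) | (~sel & b): with sel at X and a = b = 1, gate-by-gate evaluation yields X even though the output is 1 for either value of sel. The sketch below flags such an X by enumerating every 0/1 resolution of the X inputs. It is a hypothetical illustration of the concept, not how RDA filters failures, and exhaustive enumeration only scales to small logic cones.

```python
from __future__ import annotations

from itertools import product
from typing import Callable, Dict

# Three-valued logic over "0", "1", "x"; the gate functions treat any other value as unknown.


def and3(a: str, b: str) -> str:
    if a == "0" or b == "0":
        return "0"
    if a == "1" and b == "1":
        return "1"
    return "x"


def or3(a: str, b: str) -> str:
    if a == "1" or b == "1":
        return "1"
    if a == "0" and b == "0":
        return "0"
    return "x"


def not3(a: str) -> str:
    return {"0": "1", "1": "0"}.get(a, "x")


def gate_mux(sel: str, a: str, b: str) -> str:
    """Gate-level 2:1 mux, y = (sel & a) | (~sel & b), evaluated gate by gate."""
    return or3(and3(sel, a), and3(not3(sel), b))


def is_x_pessimistic(fn: Callable[..., str], inputs: Dict[str, str]) -> bool:
    """True if fn yields 'x' although every 0/1 resolution of its X inputs
    produces the same known value, i.e. the X is a simulation artifact."""
    if fn(**inputs) != "x":
        return False
    x_names = [name for name, v in inputs.items() if v == "x"]
    outcomes = set()
    for combo in product("01", repeat=len(x_names)):  # 2**k resolutions
        outcomes.add(fn(**{**inputs, **dict(zip(x_names, combo))}))
    return len(outcomes) == 1 and "x" not in outcomes


if __name__ == "__main__":
    print(is_x_pessimistic(gate_mux, {"sel": "x", "a": "1", "b": "1"}))  # True
    print(is_x_pessimistic(gate_mux, {"sel": "x", "a": "1", "b": "0"}))  # False
```

In the first call the X on the output is pessimistic, so a failure traced to it is a candidate for filtering; in the second the output genuinely depends on the unknown select, so the X is real.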

All these automated techniques, backed by the power of ML, accelerate the three most challenging steps in regression loops. More accurate debug means that fixes are much more likely to be correct the first time, considerably reducing the number of loops throughout the project. Verdi RDA saves significant time and effort in debugging each failing test while reducing the number of failing tests that must be debugged. This maximizes regression utilization, focuses manual effort on high-value debug rather than automatable tasks, and cuts the overall regression debug effort on a chip project in half.

For further information, a white paper is available.

Robert Ruiz

Robert Ruiz is the director of product management at Synopsys, Inc. Ruiz has held various marketing and technical positions for the test automation and functional verification products at Synopsys, Novas Software and Viewlogic Systems. His background includes over 17 years in advanced design‐for‐test methodologies as well as several years as an ASIC designer. Ruiz has a BSEE from Stanford University.
