SPONSOR BLOG

Measuring The Complexity Of Processor Bugs To Improve Testbench Quality

Determining the ability of verification methodologies to find the last bugs.

April 28th, 2022 - By: Philippe Luc

I am often asked “When is processor verification done?” or, in other words, “How do I measure the efficiency of my testbench, and how can I be confident in the quality of the verification?” There is no easy answer. The industry relies on several common indicators, such as coverage and bug curves. While these are absolutely necessary, they are not enough to reach the highest possible quality, because they do not really reveal the ability of a verification methodology to find the last bugs. With experience, I have learned that measuring the complexity of processor bugs is an excellent indicator to use throughout the development of a project.

What defines the complexity of a processor bug and how to measure it?

Experience taught me that we can define the complexity of a bug by counting the number of independent events or conditions that are required to hit the bug.

What do we consider an event?

Let’s take a simple example. A typical bug is found in the caches, when a required hazard is missing. Data corruption can occur when:

A cache line at address @A is Valid and Dirty in the cache.
A load at address @B causes an eviction of line @A.
Another load at address @A starts.
The external write bus is slower than the read, so the load @A completes before the end of the eviction.

External memory returns the previous data because the most recent data from the eviction got lost, causing data corruption.

In this example, 4 events – or conditions – are required to hit the bug. These 4 events give the bug a score of 4, or in other words a complexity of 4.
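As a minimal sketch of this scoring rule, the example can be written down as a list of events whose length is the complexity. The `BugReport` class and its field names are hypothetical and used only for illustration; they are not part of any tool mentioned here.

```python
from dataclasses import dataclass, field


@dataclass
class BugReport:
    """A bug described by the independent events needed to hit it (hypothetical type)."""
    title: str
    events: list[str] = field(default_factory=list)

    @property
    def complexity(self) -> int:
        # The score is simply the number of independent events or conditions.
        return len(self.events)


missing_hazard = BugReport(
    title="Missing hazard between eviction and reload",
    events=[
        "Line @A is Valid and Dirty in the cache",
        "Load @B causes an eviction of line @A",
        "Another load at @A starts",
        "Write bus slower than read: load @A completes before the eviction ends",
    ],
)

print(missing_hazard.complexity)  # 4
```

Recording the events as text, and not only the score, keeps the reasoning behind each number visible when the team reviews it later.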

Classifying processor bugs

To measure the complexity of a bug, we can come up with a classification that will be used by the entire processor verification team. In a previous blog post, we discussed 4 types of bugs and explained how we use these categories to improve the quality of our testbench and verification. Let’s go one step further and combine this method with bug complexity.

An easy bug requires between 1 and 3 events to be triggered: the first simple test fails. A corner case needs 4 or more events.

Going back to our example above, we have a bug with a score of 4. If one of the four conditions is not present, then the bug is not hit.

A constrained random testbench will need several features to be able to hit the example above. The sequence of addresses should be smart enough to reuse addresses from previous requests, and delays on external buses should be sufficiently atypical to produce fast Reads and slow-enough Writes.
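The two features can be sketched in a few lines of Python. This is only an illustration of the idea, not how a production testbench is written: the function names, the reuse probability, the line size and the delay ranges are all assumptions chosen for the example.

```python
import random


def generate_requests(n, rng, reuse_prob=0.5, line_bytes=64, num_lines=1024):
    """Return n (operation, address) pairs, biased toward earlier addresses."""
    history = []
    requests = []
    for _ in range(n):
        if history and rng.random() < reuse_prob:
            # Reuse an address from a previous request to provoke hazards.
            addr = rng.choice(history)
        else:
            # Otherwise pick a fresh line-aligned address.
            addr = rng.randrange(num_lines) * line_bytes
        history.append(addr)
        requests.append((rng.choice(["load", "store"]), addr))
    return requests


def pick_bus_delays(rng, atypical_prob=0.2):
    """Return (read_delay, write_delay) in cycles (assumed ranges)."""
    if rng.random() < atypical_prob:
        # Atypical case: fast reads combined with slow writes.
        return rng.randint(1, 2), rng.randint(20, 40)
    # Typical case: both delays drawn from the same range.
    return rng.randint(3, 10), rng.randint(3, 10)


rng = random.Random(2022)  # fixed seed so a failing run can be replayed
reqs = generate_requests(1000, rng)
addresses = [addr for _, addr in reqs]
reused = len(addresses) - len(set(addresses))
print(f"{reused} of {len(addresses)} requests reuse an earlier address")
print("read/write delays:", pick_bus_delays(rng))
```

Without the reuse bias and the atypical delay branch, conditions 3 and 4 of the example would depend on pure luck.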

A hidden case will need even more events. Perhaps a more subtle bug has the same conditions as our example, but it only happens when an ECC error is discovered in the cache at the exact same time as an interrupt occurs, and only when the core finishes an FPU operation that results in a divide-by-zero error. With typical random testbenches, the probability of hitting all these conditions together is extremely low, making it a “hidden” bug.
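To see why the probability collapses, here is a rough back-of-the-envelope sketch. It assumes that the events are independent and that each one has some per-test probability of occurring. The numbers are made up for illustration and are not measurements from any real testbench.

```python
import math

# Hypothetical per-test probabilities, assuming the events are independent.
event_probs = {
    "four cache conditions from the example": 1e-3,
    "ECC error discovered in the cache": 1e-2,
    "interrupt at the exact same time": 1e-2,
    "FPU operation ends in divide-by-zero": 1e-2,
}

# Probability that every event occurs in the same test.
p_all = math.prod(event_probs.values())

# Mean number of tests before the first hit (geometric distribution).
expected_tests = 1 / p_all

print(f"P(all events in one test) = {p_all:.1e}")
print(f"Expected tests to hit the bug once = {expected_tests:.1e}")
```

With these assumed figures, a random testbench would need on the order of a billion tests to hit the bug once, which is why raising the probability of each individual event matters so much.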

Making these hidden bugs more reachable in the testbench improves the quality of verification: it amounts to turning hidden cases into corner cases.
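The categories can be summarized as a small function. The boundaries for easy bugs and corner cases come from the text above; the `hidden_threshold` parameter is my own assumption for this sketch, since a hidden case is only described as needing "even more" events, and where it starts depends on the testbench.

```python
def classify(complexity, hidden_threshold=7):
    """Map a complexity score to a bug category.

    1 to 3 events: easy bug. 4 or more events: corner case.
    hidden_threshold is an assumed, testbench-dependent cut-off above which
    a typical random testbench is unlikely to hit the bug.
    """
    if complexity < 1:
        raise ValueError("a bug needs at least one triggering event")
    if complexity <= 3:
        return "easy"
    if complexity < hidden_threshold:
        return "corner case"
    return "hidden case"


for score in (2, 4, 7):
    print(score, classify(score))

# A better testbench raises the threshold: the same bug becomes a corner case.
print(7, classify(7, hidden_threshold=9))
```

In this reading, improving the testbench does not change the score of a bug. It raises the threshold at which bugs stop being reachable.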

This classification does not have any limit. Experience has shown me that a testbench capable of finding bugs with a score of 8 or 9 is a strong simulation testbench and is key to delivering quality RTL. From what I have seen, today the most advanced simulation testbenches can find bugs with a complexity level up to 10. Fortunately, the use of formal verification makes it much easier to find bugs that have an even higher complexity, paving the way to even better design, and giving clues about what to improve in simulation.

Using bug complexity to improve the quality of a verification testbench

This classification and methodology are useful only if they are applied from the moment verification starts and throughout project development, for 2 reasons:

Bugs must be fixed as they are discovered. Leaving a level 2 or 3 bug unfixed means that many failures occur when launching large soak tests. Statistically, a similar bug (from the same squadron) that requires more events could then go unnoticed.
Bug complexity is used to improve and measure the quality of a testbench. Because the complexity level matches the number of events required to trigger the bug, the higher the complexity scores of the bugs it finds, the more stressing the testbench is. Keeping track of and analyzing the events that triggered a bug is very useful for understanding how to tune random constraints or where to create a new functional coverage point.
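A minimal sketch of such tracking is shown below. The bug log, its identifiers and its event names are invented for illustration; in practice this information would come from whatever bug tracker the team uses.

```python
from collections import Counter

# Hypothetical bug log: (bug id, how it was found, events that triggered it).
bug_log = [
    ("BUG-1", "simulation", ["first load after reset"]),
    ("BUG-2", "simulation", ["line dirty", "eviction", "reload same line",
                             "slow write bus"]),
    ("BUG-3", "formal", ["line dirty", "eviction", "reload same line",
                         "slow write bus", "ECC error", "interrupt"]),
]

# Complexity score of each bug: the number of events required to hit it.
scores = {bug_id: len(events) for bug_id, _, events in bug_log}

# How many bugs were found at each complexity level.
histogram = Counter(scores.values())

# Highest complexity reached by simulation: a rough measure of how
# stressing the simulation testbench is.
max_sim = max(len(ev) for _, found_by, ev in bug_log if found_by == "simulation")

# Events that recur across bugs are candidates for new random constraints
# or functional coverage points.
event_counts = Counter(e for _, _, events in bug_log for e in events)

print("bugs per complexity score:", dict(sorted(histogram.items())))
print("highest complexity found in simulation:", max_sim)
print("most frequent events:", event_counts.most_common(2))
```

In this invented log, formal verification found a bug of higher complexity than simulation did. The extra events of that bug are where I would look first when tuning the random constraints.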

Finally, by combining this approach with our methodology of hunting bugs that fly in squadrons, we ensure high-quality verification and gain confidence that we are going beyond the verification sign-off criteria.

Philippe Luc is director of verification at Codasip.
