Can ML Help Verification? Maybe

A lot of engineers are trying to use machine learning to improve the functional verification process, but the industry is struggling to make headway.


Functional verification produces an enormous amount of data that could be used to train a machine learning system, but it’s not always clear which data is useful or whether it can help.

The challenge with ML is understanding when and where to use it, and how to integrate it with other tools and approaches. With a big enough hammer, it is tempting to call everything a nail, and just throwing ML at a problem doesn’t necessarily solve it.

“I’ve seen a lot of examples of people trying to use AI for problems that it is not well suited for,” said Clark Barrett, professor of computer science at Stanford University. “Even the government is guilty of this, saying we are going to give you money to solve this problem, but only if you use AI. There are things AI is really good at, but if you need to do precise reasoning, if you need to find things that have never been found before — like bugs that nobody’s seen before — then maybe AI is just not the right tool for the job.”

Machine learning is like a rear-view mirror in that it can use patterns found in the past and apply them to the current problem. It is much less effective in predicting something it has never seen before. So how could it be used to help with the verification problem?

“Verification is trying to fight machine learning,” said Avi Ziv, research staff member at IBM Research. “Verification is about trying to create chaos, to create a lot of randomness, and this makes it harder to learn. This is one of the main reasons that you don’t see a lot of ML contributing to the main methodologies. If we want to make it happen, then we may need some new machine learning technologies, maybe changes in verification methodologies, how and what data we produce, and how we feed into the machine learning.”

Still, that doesn’t mean we should give up. “I am the founder of a startup with an audacious goal of reducing time to market and speeding up verification by 50%,” said Sandeep Srinivasan, CEO VerifAI. “If we look at it as bringing a beginner’s mindset to an old problem, then sometimes when you do that there is dramatic innovation. We have seen this happen in several domains of machine learning.”

Several problems potentially need to be solved, or aspects of verification need to be improved, but to Erik Berg, principal for SoC verification at Microsoft, one stands out from the rest. “I’ve been focused on the debug problem for several years and I want to hear about whether or not it’s something we should try and solve, or whether we could solve it. Whoever solves the debugger automation problem — that’s kind of the big prize for us in verification.”

If ML only can use the past to predict the future, it would make sense to concentrate on data that has inertia. “The area in which we see significant uptake is knob settings,” says VerifAI’s Srinivasan. “This is not creating randomness, but it’s figuring out the saliency of inputs to optimize a particular output. Using reinforcement learning you take away a little bit of randomness. From similar input settings, you can optimize objectives like coverage and bug finding exploration. Customers say it results in reduction of regression time, simulation time, and increase in coverage.”

Still, this is a non-trivial challenge. “Reinforcement learning is a good way to do machine learning for verification,” says IBM’s Ziv. “But it’s not easy, because what we are chasing is an unknown target. That means it is very hard to define the cost function, or whatever function we are trying to optimize. One area of success is with formal verification, where you have a bunch of engines available to you. ML has been successful in deciding which engine to try first in order to try to prove or disprove a property.”

The long tail of verification
While this may be helpful, it is not really attacking the primary problem. “It may be that you can reduce some of this time, which is great,” says Stanford’s Barrett. “But isn’t the bottleneck really the long tail of verification? The long tail is where you’re looking for bugs that are extremely rare, very difficult to trigger. I’m skeptical that a technique like machine learning is going to find those.”

It is tempting to focus on the easy parts. “I agree that the long tail is the interesting part,” says Microsoft’s Berg. “There’s a certain portion of coverage that is super easy to hit, and I don’t actually care that much about it. It’s that long tail, the conditions that didn’t make it to the test plan, didn’t anticipate, didn’t think about, and we do not have appropriate ways of identifying and measuring low frequency events that you couldn’t really anticipate. The trick is figuring out what the proxy is for quantifying those low frequency events.”

Many of the optimization methodologies in use today are based on statistics. “Statistics are good at looking at what your regression is doing,” says Ziv. “Then you can identify efficient tests, utilize them more and reduce the effort, the number of machines, and improve the efficiency of verification. These things are not necessarily machine learning.”

It is important to understand the capabilities and limitations of the techniques being used. “Automated reasoning is a technique where you are trying to prove theorems,” says Barrett. “You’re doing a very precise mathematical analysis. This is what’s underlying all formal engines and hardware verification. The great thing about automated reasoning is you get a yes or no answer. There is a bug, or there’s not a bug. There are certain kinds of problems where that’s really the kind of answer that you want. In many parts of verification, especially on the long tail, that would be the ideal. Now, there are other problems where fuzzy reasoning is good enough, and that can help solve more problems. In a previous project the fuzzer was finding more bugs than the developers had time to fix. They didn’t need the formal hammer. Do we really need the ML hammer? Or should we just use fuzzers to find most of the easy bugs, and then if you really care about the hard bugs, use some kind of formal?”

It is more about creating the right tools for the toolbox. “We need techniques for advanced analysis, whether it’s machine learning, statistics, reasoning, and things that help verification,” says Ziv. “Humans don’t have enough time, enough brain power, to do all the work that is needed for modern designs.”

Choosing a tool itself can be confusing. “You’re probably never going to get a fully automatic formal solution,” says Barrett. “You are probably never going to get a fully automatic ML solution. But if you can somehow bring all these tools together in a smart way, maybe that’s the role for ML — helping the human coordinate these different pieces and knowing when to use the right tool for the right job.”

Still, the amount of data and complexity continues to increase. “It is very hard for a human to maximize cumulative reward when there’s a lot of data,” says Srinivasan. “It means making non-obvious short-term decisions to achieve a long-term goal. Reinforcement learning is getting there. It can help us to optimize long-term decisions. Not that a human cannot do it, but it is just too much data.”

Some researchers have started looking into neuro-symbolic computation. “This is using machine learning to give you a shape, template, or something that gets you part of the way there,” says Barrett. “Then you want to use precise analysis to do the actual verification or checking. For example, if you’re looking for an invariant, you might use machine learning to give you a template for the invariant. Then you would actually check that with a formal tool. There are some really interesting things that can be done there.”

Is it possible to learn from previous designs? “I think you can,” says Ziv. “By analyzing the differences between specifications in a natural language, or the design itself, you can identify what’s changed and identify, to some extent, at least the semantic differences between the designs. That can help you create properties for formal, or tune your machine learning base or other test benches to do whatever they need.”

But there is not universal agreement about that. “I don’t think you can quite do that, because a lot of the properties are design-specific, and only the designer knows what they are,” says Barrett. “It requires that it should behave similarly when given similar stimulus, even if the microarchitecture is different. That may help us get some of the way there.”

A focus on debug
The piece of verification that is the most unpredictable and often takes the most time is debug.

“We pay the smartest human beings, the engineers, to look through log files,” says Srinivasan. “With 40,000 failures a week, using natural language processing on unstructured log data helps to speed up the debug process and make decisions. It’s not one silver bullet, but if you look at it through this lens of different categories, there’s a huge opportunity to save in each one.”

Others report similar findings. “In Microsoft, roughly 50% of our headcount goes toward debug,” says Berg. “Part of the problem is that the vast majority of DV engineers don’t understand how they debug. Has anybody ever found debug manuals to be tremendously useful? There is just a lot of hand waving. When you get a strong sense for how people are debugging, and about the data that’s relevant to debug, then go through that feature engineering process — then perhaps you can automate it. The debug problem is too highly dimensional unless you do a really smart approach toward engineering your features. You are never going to be able to find the patterns that you need, especially in the failures are very infrequent.”

Is there a middle ground? “The art of debug is very hard,” says Ziv. “But machine learning has already failed in something that a lot of people consider much easier, and this is triage. How can you take the thousands of failures that you have every night and bucket them into things that have a common root cause? Identify things that are already known and move them aside, and then assign the new failures to the right person. Machine learning wasn’t able to provide solutions that are good, that people are willing to trust. Before we get to debug, we need to solve these kinds of problems.”

Srinivasan says this can be split into two tasks. “One is triage and the other is clustering. Clustering is when you have 10,000 failures and you place them it into 10 buckets. And then triage is taking your bugs that you classify as a DV bug or RTL, creating a machine learning model, and then predicting new bugs that occur into one of these categories. I would say that this has worked really well. I don’t know why you would say that it doesn’t work. Maybe in some specific cases, maybe your model didn’t have the right features. Or maybe your clustering has very high dimensions.”

The way forward
Perhaps, if machine learning is having difficulty with verification, the industry should consider how we do verification and see if that can be changed. “Maybe we need to write our test benches a little bit differently to make it easier for machine learning to interact with them,” says Ziv. “Maybe we should change the way we get data out of the design, such that it will be easier for machine learning tools to use. Trace should not be a natural language. Trace data should be as structured as possible, because it’s much easier to understand structured data. And since it is produced by machines, and going to be consumed by a machine for the machine learning, that makes it easier for both sides to work. There are things the verification community and EDA vendors can do to help machine learning get there. But is it going to be enough? I’m not sure yet. I believe that at some point we’ll find the right combination of how to use it, and which machine learning techniques to use in order to make them really efficient.”

Does the industry have the right models? “We’re not actually trying to develop new models. We’re just trying to figure out how to leverage existing solutions,” says Berg. “If you ask the ML folks to do a classifier for dogs and cats, they can look at the results of their work. It’s very difficult for them to do the work with our data sets, absent our input, to help them judge their results. Most verification folks can’t really describe their process in any kind of algorithmic way. We need to understand how we are debugging, to understanding the data that’s relevant, understand the features that we need to be able to be cleaner about the data that we’re giving to the ML folks.”

Progress may be possible if the two groups come closer together. “I’d like to see the DV engineer and the domain expert learning more about machine learning, and machine learning engineers coming closer to learning more about DV,” says Srinivasan.

Berg agrees. “Verification engineers need to become much more data-aware to be able to think about how the collateral we’re generating can map to, or be used by, machine learning models. We don’t need verification engineers to be ML folks, but we need them to be ML-fluent. We need to be much more aware of the data that we’re creating, the formats we’re creating it in, and how we’re exposing parts of the design as we run simulations.”

And finally, this may all need to be looked at from a fresh perspective. “Sometimes you have people walking around with a hammer looking for a nail,” says Barrett. “They’re not actually trying to solve a real problem. They are just looking for anything that looks like a problem that their hammer will hit. And that’s certainly been a problem with AI and other techniques. I always tell my students, first find an interesting problem and be open minded about what technique you’re going to use to solve it. Maybe ML plays a role, maybe it doesn’t.”

Leave a Reply

(Note: This name will be displayed publicly)