(Artificially) Intelligent Verification

While it seems like an obvious application, utilizing data for verification is highly nuanced and harder than it looks.

Functional verification produces a lot of data, but does that make it suitable for Artificial Intelligence (AI) or Machine Learning (ML)? Experts weigh in about where and how AI can help, and what the industry could do to improve the benefits.

“It’s not necessarily the quantity,” says Harry Foster, chief scientist for verification at Mentor, a Siemens Business. “It’s the quality that matters. How unique is that data in exploring the state space? We claim to have a lot of data, but we don’t have that much compared to many other industries. We need solutions that are basically abstracting the data or which allow us to do data mining analysis with small data sets.”

Acquiring sufficient amounts of good data is only part of the challenge, though. “It’s important to look at what questions you want to ask about the data,” says Raik Brinkmann, president and CEO of OneSpin Solutions. “You can run your simulation or emulation and you get these traces, but what’s there to be asked? What’s the thing that you want to know about it?”

Adding to the complexity, verification produces many types of data. “Verification data sources are almost inexhaustible,” says Darko Tomusilovic, verification director for Vtool. “Data comes in many different forms including log files, waveforms, coverage data, and the code base itself. This data must be understood as a whole. We need to find the minimum amount of data that is useful for a machine learning (ML) tool. We must not overload the situation with a lot of garbage data which will make it inefficient.”


Fig 1: Analyzing data to find the root cause. Source: VTool

In fact, effectively leveraging data requires a complete verification environment. “AI requires both results data and, with high granularity, the input data used to produce that results data,” says Daniel Schostak, architect and fellow for the central engineering group at Arm. “In general, the first is readily available, but the second may require a fair bit of work to extract. Unlike some cases, there is no standard set of data that can be used for training. Changes to the DUT or the verification environment over time will invalidate data sets that have been used previously. However, none of these difficulties are insurmountable.”

Data can be used in different ways, too. “We spend a lot of time extracting project data,” says Colin McKellar, vice president of verification platforms for Imagination Technologies. “How long does each task take? How long do we spend on this testbench, how long on that testbench? How many problems did we uncover? What was the best way to find bugs? The amount of data and the number of tests are getting so mind-blowing that anything that can consume some of that data and find the needles in that haystack would be advantageous.”

Projects have been using analytics for many years. “There is a fair chance that you can extrapolate something about the project sizes, about how many hours will you need to find a bug or look at trends and closure of coverage,” says OneSpin’s Brinkmann. “You can learn about that for your organization. But it will be different from design to design, and from project to project, and from company to company.”

Ultimately, verification is trying to assess if the design works. “You need to piece the data together in an intelligent way, and be able to identify to an engineer, possibly from hints in a log file, or from this coverage point, that something could be an issue,” says Farzeen Nathoo, platform engineer for Metrics Technologies. “As you keep mining that data, your AI gets better and starts learning these patterns. So there is room for ML and AI.”

And companies are reporting initial success. “We monitor code commits in order to understand what kind of changes might cause disturbance in the design,” says VTool’s Tomusilovic. “By tracking the code base, we can try to pinpoint changes that might help us understand which of the modified pieces of code may have broken the design.”
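To make that idea concrete, here is a minimal sketch in Python of how commit data and per-test coverage data might be correlated. The commit hashes, file names, and test names are invented for illustration, and the scoring heuristic (count how many newly failing tests exercise files a commit changed) is just one plausible choice, not a description of VTool's actual method.

```python
# Minimal sketch (hypothetical data): rank recent commits by how many of the
# files they touched are exercised by newly failing tests, using per-test
# file coverage as the link between the code base and the regression results.
from collections import Counter

# Hypothetical inputs: commit -> files changed, failing test -> files it exercises.
commits = {
    "a1f3c9": {"rtl/fifo.sv", "rtl/arbiter.sv"},
    "b72e10": {"tb/scoreboard.sv"},
    "c90d44": {"rtl/decoder.sv"},
}
failing_tests = {
    "test_burst_write": {"rtl/fifo.sv", "rtl/decoder.sv"},
    "test_backpressure": {"rtl/fifo.sv", "rtl/arbiter.sv"},
}

def rank_suspect_commits(commits, failing_tests):
    """Score each commit by the number of failing tests that touch its files."""
    scores = Counter()
    for sha, changed in commits.items():
        for test, covered in failing_tests.items():
            if changed & covered:  # the commit modified something the test exercises
                scores[sha] += 1
    return scores.most_common()

if __name__ == "__main__":
    for sha, hits in rank_suspect_commits(commits, failing_tests):
        print(f"commit {sha}: implicated by {hits} failing test(s)")
```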

Removing unnecessary data
If simulation and emulation create large amounts of useless data, it might be fair to assume that reducing the amount of irrelevant execution would be advantageous. Perhaps a fair question would be, ‘What is the minimum number of tests I need to run to close my coverage?’

“That technology is kind of there already with smart coverage analysis,” says Metrics’ Nathoo. “You run some regressions and find that certain tests don’t contribute to the coverage that you need, so you take that seed out and say that is a useless test.”
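That kind of smart coverage analysis can be approximated with a greedy set-cover pass over the regression database. The sketch below is a simplified illustration using made-up seed and coverage-bin names, not a description of any particular tool.

```python
# Minimal sketch (hypothetical data): greedily select the smallest set of
# test seeds whose combined functional coverage matches the full regression.
def minimize_regression(test_coverage):
    """test_coverage: mapping of test name -> set of coverage bins it hits."""
    remaining = set().union(*test_coverage.values())   # bins not yet covered
    selected = []
    while remaining:
        # Pick the test that covers the most still-uncovered bins.
        best = max(test_coverage, key=lambda t: len(test_coverage[t] & remaining))
        gained = test_coverage[best] & remaining
        if not gained:                                  # leftover tests add nothing
            break
        selected.append(best)
        remaining -= gained
    return selected

if __name__ == "__main__":
    coverage = {
        "seed_01": {"cp_read", "cp_write"},
        "seed_02": {"cp_write"},                        # adds nothing beyond seed_01
        "seed_03": {"cp_read", "cp_reset"},
    }
    print(minimize_regression(coverage))                # ['seed_01', 'seed_03']
```

Note that such pruning keeps only the tests that add new bins for the current design and coverage model, which is exactly the exposure raised below.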

Care needs to be taken. “There have been studies that failed miserably,” says Mentor’s Foster. “IBM has been attempting to use ML for coverage closure for over 20 years. Coverage closure is not the easiest of problems, and it’s going to take a lot of work to really get that done.”

It’s also potentially risky. “This is a dangerous thing to do,” warns Brinkmann. “We need to put some feedback into this and ask what it means in a negative sense? If I do this and then make a change in the design, is there something I am now missing? You’re not covering as much anymore. If you only think about what coverage you need for this specific case, you can obviously optimize for that.”

Optimization relies on the usefulness of the metric. “We need to be looking at the effectiveness of the tests and the effectiveness of the coverage,” says Imagination’s McKellar. “It would be interesting to bring together several companies, a few projects, and analyze the lack of effectiveness of tests. Where are you spending lots of time and gaining little traction?”

And efficiency needs a qualifier. “It is difficult to understand whether regression is efficient enough, or we simply have too many scenarios targeting one feature,” says Tomusilovic. “Today, we only look at how many times each coverage target has been hit. We do not explore, or fully understand, if regression is inefficient and ML can probably help with that.”

Defining coverage
Functional coverage is not an absolute metric, and that creates problems. “If you have the right data, you can apply AI and be successful,” says Nathoo. “Unfortunately, when someone defines functional coverage, it may not have been written well and doesn’t do a complete job. You are limited by what somebody wrote. If there’s nothing to check the code against, other than functional coverage, then you have a problem.”

And defining coverage is one of the toughest problems. “If people with massive intelligence cannot work out when they are done, or what they need to cover next, why would a relatively immature AI framework be able to do that?” asks McKellar. “The market needs to start thinking beyond coverage as the end goal. Coverage is one piece of information. Conformance, compliance and stress testing environments around real applications and representative environments would potentially be a better way of measuring overall quality.”

Another problem with coverage is the lack of standardization. “It’s already possible to capture a lot of information that is useful for AI with coverage, and the process of extracting it so that it can be used with AI is relatively straightforward,” says Arm’s Schostak. “However, the extraction process could be more standardized across different tools if the promise of initiatives like UCIS were fully realized. That would help streamline the process.”

It comes back to asking the right questions. “Coverage closure is too vague,” says Tomusilovic. “To effectively use AI, we need to look at the data we have, structure it, label it in a way that is useful and then we need to choose the algorithm from the palette of algorithms that exist in the market. Finally, we can train and test the data. We are still somewhat far from the real solution.”

Cheap compute and storage
AI has taken off because of changes in the data centers. “What is really driving these discussions today is the fact that we can handle big data,” says Foster. “Storage has become cheap, compute has increased, and then we’re seeing the emergence of open-source solutions. It’s that kind of convergence that’s making it possible now.”

It also enables us to generate more data. “We do have a lot of computing power, but we do not have the luxury to draw in every possible method and every possible signal that we want,” says Uri Feigin, product lead for VTool. “If you do, you will find out that computing power is still not enough. First, we have to reduce the amount of garbage data that is being created. The hardest question that data scientists are dealing with today is how to clean data. Too much misleading information will not help. It will just lead to wrong answers. So while you have more toys to play with than before, you still need to play with them wisely.”

But does the increase in compute power make us less efficient? “The increase in compute power does not necessarily mean that it is better or more efficient,” says McKellar. “It’s easier for people to run more tests today than it was a few years ago. It’s easier for people to add lots of features and seek high-quality coverage than it was a few years ago because of improvements in the tools. But that has not made the process any more efficient. It has not delivered any more bang for the buck in overall quality per unit of time.”

Any overhead from using AI has to result in an improvement. “Having more compute power available definitely expands what is possible,” says Schostak. “However, the use of AI only makes sense, in terms of making verification more efficient, if the cost of running the optimized verification is still lower than that of running the unoptimized verification — after factoring in the cost of optimizing the verification with AI.”

Fixing the design
The goal of verification is not to complete coverage, but to ensure the design works in the field. Along the way, bugs are identified, analyzed, and fixed. That is a time-consuming process. “There are aspects of the flow, such as root-cause analysis to help debug failures, where it would be useful if AI could help,” says Schostak. “This problem is complex to solve.”

Several companies have attempted to perform root-cause analysis. “From a formal perspective, it is not easy to identify the essence of a set of failures,” says Brinkmann. “But maybe, with all the modern technology at our disposal, we could look at it in a different way. By putting more compute power on the problem, could we save human effort? You want to save on human intelligence and time, and make them more effective, but you have to invest in the infrastructure.”

Progress is being made. “If a test is built with X transactions and some of them are passing while others are failing, we can use AI to tell us what is unique to all failing transactions,” says VTool’s Feigin. “This is just one of the examples of how AI can be used in debugging.”
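One way to answer “what is unique to all failing transactions” is to treat it as a small supervised-learning problem: label each logged transaction as passing or failing and fit an interpretable classifier. The sketch below uses a shallow scikit-learn decision tree on invented transaction fields; it illustrates the general technique rather than any vendor's implementation.

```python
# Minimal sketch (hypothetical data): fit a small decision tree on labelled
# transactions and print the rules that separate passing from failing ones,
# as a stand-in for "what is unique to all failing transactions".
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical transaction log: one row per transaction, plus a pass/fail label.
df = pd.DataFrame({
    "burst_len":  [1, 4, 8, 8, 16, 16],
    "addr_align": [1, 1, 0, 1, 0, 0],     # 1 = aligned, 0 = unaligned
    "qos":        [0, 2, 1, 3, 1, 2],
    "failed":     [0, 0, 1, 0, 1, 1],
})

features = ["burst_len", "addr_align", "qos"]
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(df[features], df["failed"])

# The printed rules hint at the trait common to the failing transactions
# (in this made-up data, unaligned addresses).
print(export_text(tree, feature_names=features))
```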

Such an approach can look at what has changed to cause those differences. “Not only what’s been changed, but what is the impact on other related components,” says Foster. “It’s a non-trivial problem, because you could have things that are essentially distributed state machines. I make this change here, and I don’t see this connection, but it is kind of somehow connected.”

It could be viewed as a form of differential analysis. “Two designs, where you essentially think that the features are supposed to be the same,” says Nathoo. “But the design has changed, and you’ve added new parts or modified code. Being able to differentiate those two designs could be very helpful for test engineers.”

That analysis could lead to more efficient regressions. “I tried to do this back in the ’90s using BDDs, but it ran into many issues,” says Foster. “We realized very early on that if we could identify its impact, then we could order the regression suite so that we can test the stuff quicker. And that was an overall objective.”
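A simple present-day stand-in for that impact-based ordering is to sort the regression by how strongly each test overlaps the changed files, using recent failure history as a tie-breaker. The sketch below uses hypothetical test metadata and a deliberately crude scoring function, offered only to illustrate the ordering idea.

```python
# Minimal sketch (hypothetical data): order the regression so that tests
# exercising recently changed files, and tests that failed recently, run first.
def prioritize(tests, changed_files):
    """tests: list of dicts with 'name', 'covers' (set of files), 'recent_fails'."""
    def score(t):
        impact = len(t["covers"] & changed_files)       # overlap with the change
        return (impact, t["recent_fails"])              # history as tie-breaker
    return sorted(tests, key=score, reverse=True)

if __name__ == "__main__":
    tests = [
        {"name": "test_dma",   "covers": {"rtl/dma.sv"},                "recent_fails": 0},
        {"name": "test_fifo",  "covers": {"rtl/fifo.sv"},               "recent_fails": 2},
        {"name": "test_mixed", "covers": {"rtl/fifo.sv", "rtl/dma.sv"}, "recent_fails": 1},
    ]
    order = prioritize(tests, changed_files={"rtl/fifo.sv"})
    print([t["name"] for t in order])   # test_fifo runs before test_dma
```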

It also can help to explore around a problem. “Consider taking a design and a few assertions,” says Brinkmann. “Then let a tool figure out more traces that actually violate the properties. If you do that in iterations, you may be able to focus toward the essence of why it’s failing. That’s looking like a pretty promising approach.”

Who builds the model?
AI applications depend on a trained model. “The main process in machine learning is to take data, train a model, and then you sell the model,” says Feigin. “You then use the model to input more data and see the results. The big challenge is the creation of a useful model. You cannot really take a verification engineer, or design engineer. You need a data scientist for this task. This is one of the challenges that we tackle in-house. And we hope we can make the technology accessible, because building a machine learning model requires a different modeling skillset.”

The problem is that the model is not necessarily portable across a large enough set of customers. “Users are likely to have more success where they have the knowledge of what is important,” says Schostak. “For example, users are more likely to know what should be used with AI to guide a testbench to focus on an area of interest. However, when it comes to selecting between the algorithms used for constraint solving, or finding proofs, EDA companies are more likely to know, and have access to, the information AI would need to help with this.”

This isn’t so simple. “When people talk about AI and ML, they envision a push button tool in which they input a bunch of data, click a button, and they expect to get a solution to their problem,” says Tomusilovic. “This is not the case in verification. Human intelligence is mandatory, and I believe it can’t be replaced. You cannot just ask for the solution to the problem. But you must be very specific. You must provide meaningful inputs to the tool, and even more important, you must be aware of what you expect from the tool. You must ask the right questions.”

We continue looking for ways to improve. “I want to put the focus on looking at what we can do with the data,” says McKellar. “It is wrong to push the focus to AI rather than data analytics. A lot of people are still scratching their heads about the data that is generated, trying to work out what they need to do next. Or what’s the highest-priority thing I need to do based upon having so much data?”

Hype can drive people in the wrong direction. “Given the buzz around AI at the moment, the term is used rather broadly,” says Schostak. “It is important to not restrict ourselves to what can be strictly described as AI when deciding what algorithms and techniques to use with the data. AI can be used to provide more actionable information from data, but it is unlikely to be able to replace human judgement completely at this point in time.”

Others agree. “As a tool, AI is useful for some things and not relevant to others,” says Feigin. “We are wrong to put too much emphasis on machine learning. On the other hand, it would be wrong to say that we should do analytics and not machine learning. We should just understand the capabilities of each technique and use them where they fit best. Sometimes, visualization can provide everything you need to answer a question. Sometimes it can be an engineer sitting next door, and sometimes it can be machine learning.”

Fundamentally, though, all of this raises an important point — do we know the right question? “The learning that people will apply here is that they have some notion of what they want to build, and they have some experience on what it is going to look like,” says Brinkmann. “In order to apply machine learning, you would need to formalize what they actually want. And there’s not too much available on this level as data that you can mine or use.”

So at least for the foreseeable future, engineers still have an important role to play. “Real intelligence is always more important than artificial intelligence,” concludes Foster.


