Using Verification Data More Effectively

There are more tips and tricks than ever to get the most from verification.


Verification is producing so much data from complex designs that engineering teams need to decide what to keep, how long to keep it, and what they can learn from that data for future projects.

Files range from hundreds of megabytes to hundreds of gigabytes, depending on the type of verification task, but the real value may not be obvious unless AI/machine learning algorithms are applied to all of that data. Increasingly, verification teams are looking for ways to ensure better coverage and faster time to results, and much of that can be found in data that often was discarded in the past.

A number of data types are used in verification:

  • Input data, which is the design source code;
  • Testbench source code, scripting, and files, which are required either for generating or creating the tests;
  • Verification output results, which tend to be the biggest data sets;
  • Simulation logs, which include such things as failure traces for tracking checker failures in the test benches.

Each regression can generate huge files, but keeping everything comes with a price tag, so it’s important to understand the value of the data.

“There is debug data for waveforms, and debug tools,” said Mike Stellfox, a Cadence fellow. “These can be in the tens to hundreds of gigabytes depending on how much data you’re dumping. There’s also coverage data — that’s the other big one — which is used to track the progress toward the verification objectives.”

Exactly how big those files are depends on the size and complexity of the project. For some time, engineering teams have run simulation, especially with constrained random stimulus, whether UVM or Portable Stimulus-based, and this has allowed automation to create lots and lots of tests.

“Typical regression runs will have thousands, or tens of thousands of tests, with the first order of business being to manage that data effectively so that the engineering team can determine where to focus,” Stellfox said. “This includes how to optimize the coverage collection, how to optimize and merge the data, connecting it back to the verification objectives/verification plan so that you can immediately see that 1,000 tests were run, all the data was merged, the holes can be seen, and where to focus. You can also see if you have bugs. Then, if there are lots of failures from the checkers, failure triage comes into play. How do you take all of the errors that you get out of a simulation, cluster those in some meaningful fashion, and then distribute them out to the engineering team so they can be root-caused and then debugged?”
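The merge-and-find-holes flow Stellfox describes can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's coverage database format. The plan items, bin names, and test names are all hypothetical, and real tools track far richer data than a hit count per bin.

```python
from collections import Counter

def merge_coverage(per_test_hits):
    """Sum the hit counts of every coverage bin across all tests."""
    merged = Counter()
    for hits in per_test_hits.values():
        merged.update(hits)  # Counter.update adds counts from a mapping
    return merged

def find_holes(plan, merged):
    """Map each verification plan item to the bins no test has hit."""
    holes = {}
    for item, bins in plan.items():
        missing = [b for b in bins if merged.get(b, 0) == 0]
        if missing:
            holes[item] = missing
    return holes

# Hypothetical verification plan: plan item -> coverage bins it requires.
plan = {
    "fifo": ["fifo.empty", "fifo.full", "fifo.overflow"],
    "arbiter": ["arb.req0", "arb.req1"],
}

# Hypothetical per-test coverage results: test -> {bin: hit count}.
per_test_hits = {
    "test_a_seed1": {"fifo.empty": 3, "arb.req0": 1},
    "test_b_seed7": {"fifo.full": 2, "arb.req0": 4, "arb.req1": 1},
}

merged = merge_coverage(per_test_hits)
holes = find_holes(plan, merged)
print(holes)
```

Tying the holes back to plan items, rather than listing raw bins, is what lets a team see at a glance which verification objective still needs attention.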

Data management
Just being able to manage all of that data is a challenge. “A lot of focus has been on taking the data for those specific tasks and trying to present it to the user in an optimal and efficient way for the engineering team to actually do effective work on top of it,” he said. “It’s also agreed in the industry that we’re really just at the tip of the iceberg in terms of leveraging the data compared with other industries, such as data mining and really discovering what more could we do — how to be smarter with the data by taking the disparate forms of the data and coming up with new automation leveraging machine learning — techniques like that.”

Accomplishing those goals requires getting a handle on data, and in some cases reducing the amount at the source.

“The data needs to be compressed as much as possible, giving the user the option to allow the data to be saved on their server,” said Vladislav Palfy, director of applications engineering at OneSpin Solutions. “Formal tools can provide advanced in-code waivers, as well as the possibility to have complex waiver definitions. This allows the user to reduce the amount of data generated by the tool. Having focused info on real design issues can be achieved through formal techniques and can help improve design quality early in the process. Pulling the verification curve further to the left ultimately cuts verification time significantly.”

In cases where the volume cannot be reduced enough, different approaches are required. “If we are working in the old way of transferring these back and forth, it is a problem,” said Neil Hand, director of marketing for IC verification solutions at Mentor, a Siemens Business. “The first tier way [the industry] has addressed this is to take a debug file, for instance, which can be in multiple gigabytes, and perform reconstruction. If you’re dumping that much data, the I/O bandwidth on the machine becomes the bottleneck, which makes simulation seem slow. But it’s because you’re dumping so much data. You can reconstruct it later if you know all the inputs and outputs. When you have the key inputs and outputs, you have a netlist. Later on your debug system can say, ‘I happen to know this because I can propagate these things.’ The idea here is that it’s actually quicker to reconstruct in real time than it is to dump all the data. The old way of doing this was to dump everything. You’d say, ‘I want all these waves.’ The tool said, ‘You want every signal? I’ll give you every signal.’ Today’s tools will look at that and say, ‘What does that mean? What data do I need to give you?’ That will drastically reduce the amount of data being dumped, which then accelerates the simulation so you don’t have that backlog. You have smaller files. This is done with simulation and with coverage.”
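The reconstruction idea can be illustrated with a toy example: record only the primary inputs each cycle, then recompute internal signals from the netlist when someone asks for them. The sketch below assumes a purely combinational, made-up netlist listed in topological order. Production tools also have to handle sequential state, timing, and much larger designs, so this shows the principle only.

```python
# Two-input gate functions used by the hypothetical netlist below.
GATES = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}

# Hypothetical netlist: (output, gate, input_a, input_b), in topological order.
NETLIST = [
    ("n1", "and", "a", "b"),
    ("n2", "xor", "n1", "c"),
    ("y", "or", "n2", "a"),
]

def reconstruct(netlist, recorded_inputs):
    """Recompute every internal signal for each cycle from recorded inputs."""
    waves = []
    for inputs in recorded_inputs:
        values = dict(inputs)  # start from the dumped primary inputs
        for out, gate, x, y in netlist:
            values[out] = GATES[gate](values[x], values[y])
        waves.append(values)
    return waves

# Only the primary inputs were dumped during simulation (one dict per cycle).
recorded_inputs = [
    {"a": 1, "b": 1, "c": 0},
    {"a": 0, "b": 1, "c": 1},
]

waves = reconstruct(NETLIST, recorded_inputs)
for cycle, values in enumerate(waves):
    print(cycle, values)
```

Here three signals per cycle are stored instead of six. The saving grows with the ratio of internal nets to recorded signals, which is why dumping less can also relieve the I/O bottleneck Hand mentions.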

The next question is how the files are used. “If you look to coverage, these files are used across a division,” said Hand. “Should you be dumping individual files, trying to merge them together, and then moving those files to a verification management system? The answer is, probably not.”

EDA tool providers have been working to do more with the data, including better automation, as well as applying technologies such as machine learning that traditionally have been used on big data problems, because there is the ability to collect, store, and correlate all the data.

“Then, based on set coverage goals, machine learning techniques can be used to reach 99% of the same coverage achieved in regression in 3X to 5X less time,” Cadence’s Stellfox said, pointing to a huge opportunity in analytics. “Humans actually can make better judgments on the data, and by bringing the data together in some form, it makes it more obvious where the engineering team should focus.”

Stellfox used failure clustering as an example. Lots of failures may be coming back from a regression, but many times the checker is pointing to the same failure, albeit from different messages. “If you have a regression of thousands of tests, and you get hundreds of messages, you can waste a lot of time trying to weed through that and figure out where to start the root cause analysis.”
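One simple way to approximate this kind of clustering is to strip the run-specific values (timestamps, addresses, data) out of each message and group tests by what remains. The sketch below is an assumption about how such a grouping could work, not a description of any commercial triage tool. The log messages and test names are invented for illustration.

```python
import re
from collections import defaultdict

def signature(message):
    """Mask run-specific values so equivalent failures share one signature."""
    sig = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", message)  # addresses and data
    sig = re.sub(r"\d+", "<NUM>", sig)                 # timestamps and counts
    return sig

def cluster_failures(failures):
    """Group (test, message) pairs by normalized message signature."""
    clusters = defaultdict(list)
    for test, message in failures:
        clusters[signature(message)].append(test)
    return dict(clusters)

# Hypothetical checker messages from a regression.
failures = [
    ("test_1", "ERROR @ 1200ns: scoreboard mismatch addr=0x1F00 exp=0xAA got=0xAB"),
    ("test_2", "ERROR @ 98450ns: scoreboard mismatch addr=0x0040 exp=0x10 got=0x00"),
    ("test_3", "ERROR @ 30ns: protocol checker: valid dropped before ready"),
]

clusters = cluster_failures(failures)
for sig, tests in clusters.items():
    print(len(tests), sig, tests)
```

With the three messages above, the two scoreboard mismatches collapse into one cluster, so an engineer starts root-cause analysis on two distinct problems rather than three.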

The aim with verification data files is to reduce noise and provide useful data. With too much data, it can be impossible to focus on real issues.

Philippe Luc, director of verification at Codasip, described an example scenario in which the verification file was 100GB. “This was a Linux boot, and during the boot, from reset to the prompt, we logged all the instructions, all the memory transfers, all of what happens inside the CPU. This was used for debugging. One of the constraints of such a big file is that it takes up a lot of space, but it’s also complex to open in a simple editor.”

From the perspective of random testing using this file, lessons were learned from going back and looking at the data that could then be used for a future project. “We ran regression testing on the cluster, which would generate 100GB during a night of regression, separated into 1,000 tests, where each test produced about 100MB of data,” Luc said. “[Custom] tools were developed to handle this because you can’t produce 100GB of data every night and store it for more than a week. It really doesn’t fit.”

For this purpose, it’s important to define verification. Two considerations are important here, he stressed. “First, you must ensure the quality of the CPU as a deliverable, meaning you have to prove to the customer that you have run billions of cycles with plenty of good results showing that your product has been tested, and it works. The other part is quality. Quality means you have to find every bug before the customer. This is about breaking the system, going into the corner cases, and finding the bugs. And these are treated differently.”

There is data to show to the customer that the device works. There is also more internal data used to look for bugs. “I used to joke and to say we want more clusters to launch more tests, have more space,” Luc said. “I used to say that when I run verification, and I want to break things. I only need to run the test that failed. When I want to break things, I don’t care about the test that will pass. I’m just consuming power of the cluster, but I don’t care if we don’t find any bug. With the tests that pass, I want to count the number of cycles, count the number of instructions in order to log and to prove that this amount of data has been applied on the CPU, on the test. I keep the necessary information to be able to reproduce exactly the same test. I don’t keep all the data produced from a test.”
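The retention policy Luc outlines might look something like the following sketch: for every test keep the name, seed, and cycle and instruction counts, but hold on to the bulky artifacts only when the test fails. The record layout, field names, and file names are assumptions made for illustration, not Codasip's actual tooling.

```python
from dataclasses import dataclass, field

@dataclass
class RunResult:
    """Outcome of one test run (hypothetical record layout)."""
    name: str
    seed: int
    passed: bool
    cycles: int
    instructions: int
    artifacts: list = field(default_factory=list)  # logs, traces, waveforms

def retention_record(run):
    """Keep reproduction info for every test, bulky artifacts only on failure."""
    return {
        "name": run.name,
        "seed": run.seed,
        "passed": run.passed,
        "cycles": run.cycles,
        "instructions": run.instructions,
        "keep": [] if run.passed else list(run.artifacts),
    }

def summarize(runs):
    """Totals that show how much passing work was applied to the design."""
    passing = [r for r in runs if r.passed]
    return {
        "tests": len(runs),
        "passed": len(passing),
        "cycles_on_passing": sum(r.cycles for r in passing),
        "instructions_on_passing": sum(r.instructions for r in passing),
    }

# Hypothetical nightly results.
runs = [
    RunResult("boot", 101, True, 5_000_000, 1_200_000, ["boot.log", "boot.trace"]),
    RunResult("irq_storm", 202, False, 40_000, 9_000, ["irq.log", "irq.trace"]),
]

records = [retention_record(r) for r in runs]
summary = summarize(runs)
print(records)
print(summary)
```

The small per-test record is enough to rerun a passing test exactly, while the summary provides the cycle and instruction totals that can be shown to a customer.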

Using that data to reproduce the test requires extra work on the data. The first iteration would be to collect simple statistics and to communicate to the customer how much work was done to prove the quality. The next step is deciding what to do with the data, and that’s where coverage and other statistics come into play.

“You should use the statistics to tell if your verification is strong or weak for this area. This requires engineering time to deal with the data and improve the strength of the test bench,” Luc said. “Some engineers may be happy saying, ‘We found a bug. Just fix it and go to the next bug.’ But good engineering practice is to look at why, and how to improve repeatedly.”

Understanding and utilizing failures
Optimization of regression runs adds new challenges. “Some companies don’t talk about the seeds for the failing tests, so in some cases they don’t see it in the next regression failure and don’t know if they fixed the bug or not,” said Olivera Stojanovic, business development manager at Vtool. “In this case the test should be traced with the seed that was failing, leave it in regression until it passes, then somehow remove it. Ideally, all of this should be done automatically, not left to the verification engineer to do manually. Also, there are questions about how long the results should be saved in terms of the regression logs, or whether just the results about failing tests should be saved.”
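The automatic seed tracking Stojanovic calls for could be as simple as a set of (test, seed) pairs that is updated after every regression. The sketch below is one possible bookkeeping scheme under that assumption. The test names and seeds are hypothetical, and a seed that was not rerun is deliberately kept on the list.

```python
def update_tracked_seeds(tracked, results):
    """Return the (test, seed) pairs to pin in the next regression.

    tracked: set of (test, seed) pairs pinned from earlier failures.
    results: dict mapping (test, seed) -> True if the run passed.
    """
    # A pinned seed stays until it is rerun and passes.
    still_failing = {key for key in tracked if not results.get(key, False)}
    # Any run that failed tonight is pinned as well.
    new_failures = {key for key, passed in results.items() if not passed}
    return still_failing | new_failures

# Hypothetical data: one seed is already pinned from an earlier regression.
tracked = {("dma_test", 42)}

night_1 = {("dma_test", 42): False, ("irq_test", 7): False, ("dma_test", 99): True}
tracked = update_tracked_seeds(tracked, night_1)
after_night_1 = set(tracked)

night_2 = {("dma_test", 42): True, ("irq_test", 7): False}
tracked = update_tracked_seeds(tracked, night_2)
after_night_2 = set(tracked)

print(after_night_1)
print(after_night_2)
```

Pinned seeds are removed only when they are rerun and pass, which gives direct evidence that a fix worked instead of relying on the failure not reappearing under a different seed.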

Darko Tomusilovic, verification manager at Vtool, agreed that only information about the pass/fail ratio and coverage needs to be saved. “There, in principle, at least in the software but also in the verification world increasingly, continuous integration should provide that. In software, it’s standard, but in hardware we are starting to notice that it has quite a lot of benefits. So, how do we have as little data storage as possible to save the meaningful information?”

This speaks to the ability to find bugs. Is it possible to take a verification run that passes, look at it, and then learn from it? Can the verification strategy be reproduced for a specific kind of activity for finding bugs in a certain way, even though something passed?

“In general, we are running random scenarios, and pretty much everything here is based on randomization, which is not really stable,” Tomusilovic said. “Any change in the behavior of either verification environment, or the RTL for that matter, can disturb your scenario, and it gets tougher and tougher to reproduce the case you once had. What I usually do in such cases is, over the course of development — even if I don’t think of some critical scenario in the beginning — as soon as it catches my eye, I make sure that I dedicate a specific cover point for it. Even though it is very hard to achieve, I make sure that I don’t rely simply on running the same seed, because you simply can’t trust them. In the end you must make sure that your test case will keep this scenario, regardless of the seed and regardless of the changes that happen.”

This points to the idea of continuous integration, whereby the tool will ping the verification engineer every time there is a failure. “They can then make sure to add some mechanism of tracking it forward,” he said. “And for all safety-related areas, this is a requirement. You must prove over the course of development cycle that the lifecycle of the bugs, and all of the failures, are traced.”

Stojanovic agreed. “Some people, when they reach a certain point of verification, like to use a tool to analyze the coverage per test so they can, for example, see which test or which seed reached how much coverage, and then create the subset of the test with certain seeds that reach 100% coverage. On top of that, they add some randomization. In this way, the size of the regression logs is controlled by decreasing the number of the runs so it will help you to reach coverage easily, but with additional randomization you will hit some corner cases or different combinations that might not be visible in the RTL coverage.”
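Picking a small set of test/seed pairs that reproduces the coverage of the full regression is a set-cover problem, and a greedy heuristic is a common way to approximate it. The sketch below assumes per-run coverage is available as a set of bin names. The run names and bins are made up, and the greedy choice is not guaranteed to find the smallest possible subset.

```python
def select_runs(coverage_by_run):
    """Greedily pick runs until their combined coverage matches the full set."""
    target = set().union(*coverage_by_run.values())
    covered = set()
    selected = []
    remaining = dict(coverage_by_run)
    while covered != target:
        # Take the run adding the most uncovered bins; sorting makes ties stable.
        best = max(sorted(remaining), key=lambda run: len(remaining[run] - covered))
        selected.append(best)
        covered |= remaining.pop(best)
    return selected

# Hypothetical per-run coverage: (test, seed) -> set of bins hit.
coverage_by_run = {
    ("smoke", 1): {"a", "b"},
    ("rand", 11): {"a", "b", "c", "d"},
    ("rand", 12): {"c"},
    ("corner", 5): {"e"},
}

selected = select_runs(coverage_by_run)
print(selected)
```

In this example two of the four runs reproduce all the coverage the regression reached. The compute that frees up can then go to fresh random seeds, as Stojanovic suggests.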

Profiling can help here, as well.

“There’s always a ton of focus by our engineering teams and within our customers on how to tweak out more performance,” said Stellfox. “Profiling the user code (the test bench code especially, where complex object-oriented test benches and SystemVerilog are being written) provides a lot of room to make bad coding mistakes. But if you can profile that, you can actually identify places where a couple lines of code can improve simulation performance by 7X or 8X. In verification/performance audits, we try to bring those best practices in, and we recommend to periodically do some profiling just to find the gross issues, because it’s pretty easy to do. Taking advantage of that kind of data to get better utilization is something that should be part of the discipline as part of the process. Verification is probably the biggest workload of EDA, so anything you can do to speed it up, besides what we do to speed up the simulator, involves the code that runs on it. That really can help get more efficiency out of your compute.”

The space intersecting verification, data, and efficiency is where new approaches are coming to light, including large data concepts that follow a modern, distributed-data approach in the cloud. For instance, engineers could attach from their desktops to a multi-user database, which doesn’t have to be in a commercial cloud. It could be in a private cloud at a customer site.

“It’s a distributed database that everyone is feeding into, so you may have huge amounts of data, but everyone’s putting their little bits and pieces in,” Mentor’s Hand said. “And now what that means is you have centralized data, which then feeds into how it can be used. For this, we’re using AI/ML to ask what tests can we eliminate. Where are we not seeing benefit? Where should you be looking for the key issues?”

Moving forward, questions will center not so much around how much data there is, but on how data is being collected and what is being done with it.

“We’re all going to find the fastest way to get the data, whether that’s through reconstruction, whether it’s through the duplication, whatever it is that you need to do,” Hand said. “You’re going to make those files smaller. The bigger question is, do you have a collaborative environment or an individual environment? For some data, an individual environment is all you need, so a reconstructed database for desktop debug for a designer is all good. But if you have a design methodology where you have to start handing off data — and coverage is a good example — then you want to have a database-driven approach where you can share that and people don’t have to sync up all the data. They can just say, ‘My tools can look into this data, give me the information I need.’ And not only that, now you have a centralized database, you can start using AI and ML on that database, and it becomes so much more powerful. And honestly, if the database size is terabytes, I don’t care because I’m getting richer data, and it becomes a different question. And then, how do you analyze that data holistically?”
