Improving Verification Methodologies

The verification problem space is outpacing the speed of the tools, placing an increasing burden on verification methodologies and automation improvements.

Methodology improvements and automation are becoming pivotal for keeping pace with the growing complexity and breadth of the tasks assigned to verification teams, helping to compensate for lagging speed improvements in the tools.

The problem with the tools is that many of them still run on single processor cores. Functional simulation, for example, cannot make use of an unlimited number of cores, except by distributing separate simulations across them. While this helps total simulation throughput, it helps less in a tight debug cycle where time-to-results is all that matters.

Three areas are showing improvement, but at different speeds and with varying effectiveness, according to Frank Schirrmeister, executive director for strategic programs and systems solutions at Synopsys. “There is the base speed of the engines, and that is increasing,” he said. “The second thing we refer to as ‘smarter verification.’ That’s where you have methodology and you have the right scope. Solving the right problems at the IP level may not transpose to the SoC level or the system level. The third is automation and AI. It is a three-legged stool — making things faster, smarter verification, which is methodology, and then there’s automation on top of that.”

Doing things smarter and applying automation are big steps forward. Included in this is knowing when you are done, which is one of the toughest problems in verification.

“Knowing what you’ve tested is critical,” says Bryan Ramirez, director of solutions management within the digital verification technologies division of Siemens EDA. “Tying that back into requirements is going to be increasingly important. How to automate going from requirements, or design specification, to your test plan and coverage is an area where the industry is putting a lot of investment. But that doesn’t solve the problem of how do I close coverage. Another part of this is being able to bring together coverage from different engines within the verification spectrum. Things that I can solve in formal, let’s mark that as solved and make sure I don’t try to close that in simulation. Or where can I leverage emulation or FPGA prototyping to accelerate some of that, so that’s running the cycles faster. You have to be able to merge your coverage results across all different engines so that you can have one holistic view of the solution.”
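
As a rough illustration of that merging step, the sketch below combines coverage results from hypothetical formal, simulation, and emulation runs into a single view, treating formally proven items as closed. The data structures, engine labels, and coverage item names are illustrative placeholders, not any particular tool's coverage database format.

```python
# Sketch: merging coverage results from multiple verification engines into one
# holistic view. The bin names and engine labels are illustrative only; a real
# flow would read UCIS or tool-specific coverage databases instead.

from collections import defaultdict

def merge_coverage(results):
    """results: list of (engine, {coverage_item: status}) pairs.
    A status of 'proven' (formal) or 'hit' (dynamic) closes the item."""
    merged = defaultdict(lambda: {"status": "open", "engines": []})
    for engine, items in results:
        for item, status in items.items():
            entry = merged[item]
            entry["engines"].append(engine)
            if status in ("proven", "hit"):
                entry["status"] = "closed"
    return dict(merged)

if __name__ == "__main__":
    formal = ("formal", {"fifo.overflow": "proven", "fifo.underflow": "proven"})
    sim    = ("simulation", {"fifo.underflow": "hit", "arb.starvation": "miss"})
    emu    = ("emulation", {"arb.starvation": "hit"})
    for item, info in merge_coverage([formal, sim, emu]).items():
        print(f"{item}: {info['status']} (seen by {', '.join(info['engines'])})")
```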

While closure is the ultimate goal, an equally important question is which tests must be run now. “When you make a change to your RTL, do you need to run the whole regression suite? You have tens of thousands of test cases, and you don’t know which test cases are giving you the biggest bang for the buck,” says Bradley Geden, director of product marketing at Synopsys. “One of the goals of automation is to figure out which ones are the high-value tests and run those first. Can we use AI to focus on running the regression that just focuses on those changes? It’s a very dynamic environment, because designs are changing, testbenches are changing. It’s not always the same recipe, so you have to continuously reassess and figure out which is the highest-value test. Now they can hit their targets much quicker. The goal of running regression is to fail. Now you can fail much quicker, and you can find those bugs much quicker. You free up the hardware.”
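
A minimal sketch of that kind of prioritization, assuming only two readily available signals, historical failure rate and overlap with the files that just changed, might look like the following. The test names, metadata fields, and scoring weights are hypothetical, and a production flow would learn the weights from regression history rather than fixing them by hand.

```python
# Sketch: ranking regression tests after an RTL change, using historical
# failure rate plus overlap between the files a test exercises and the files
# that changed. All metadata and weights below are illustrative assumptions.

def rank_tests(tests, changed_files, w_overlap=2.0, w_failrate=1.0):
    """tests: list of dicts with 'name', 'files_touched', 'fail_rate' (0..1)."""
    def score(t):
        overlap = len(set(t["files_touched"]) & set(changed_files))
        return w_overlap * overlap + w_failrate * t["fail_rate"]
    return sorted(tests, key=score, reverse=True)

if __name__ == "__main__":
    tests = [
        {"name": "axi_burst_random",   "files_touched": ["axi_if.sv"],              "fail_rate": 0.08},
        {"name": "cache_evict_stress", "files_touched": ["l2_cache.sv", "axi_if.sv"], "fail_rate": 0.21},
        {"name": "reset_sanity",       "files_touched": ["top.sv"],                 "fail_rate": 0.01},
    ]
    for t in rank_tests(tests, changed_files=["l2_cache.sv"]):
        print(t["name"])  # highest-value tests first
```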

It’s also important to understand which tests you do not have to run. “Somebody once said to me, the fastest simulation cycle is one that you don’t have to run at all,” says Siemens’ Ramirez. “So how can we help them better determine up front when they don’t have to run a simulation or an emulation cycle? That’s an area where we can improve.”

Constrained random test pattern generation is not the most efficient environment. “This is where we have seen the biggest gain from AI,” says Matt Graham, product management group director at Cadence. “AI can help optimize wasted cycles and redundant tests. Which tests are useful and can we target those areas of the design we actually care about, rather than hitting other areas of the design for the thousandth time? By its nature, it is massively inefficient. But we might be cracking that nut, as well, in improving its efficiency.”
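
One simple way to picture trimming that redundancy is a greedy pass that keeps only the seeds contributing new coverage, sketched below with made-up seeds and coverage bins. Real AI-driven flows rank and regenerate stimulus rather than just filtering completed runs, so this is an illustration of the waste being targeted, not of any specific product.

```python
# Sketch: pruning redundant constrained-random runs by keeping only seeds that
# add new coverage (a greedy set-cover pass). Seed names and bins are made up.

def select_useful_seeds(seed_coverage):
    """seed_coverage: dict of seed -> set of coverage bins hit by that seed."""
    remaining = dict(seed_coverage)
    covered, kept = set(), []
    while remaining:
        seed, bins = max(remaining.items(), key=lambda kv: len(kv[1] - covered))
        if not (bins - covered):
            break                      # every remaining seed is pure repetition
        kept.append(seed)
        covered |= bins
        del remaining[seed]
    return kept, covered

if __name__ == "__main__":
    runs = {
        "seed_1": {"b0", "b1", "b2"},
        "seed_2": {"b1", "b2"},        # fully redundant with seed_1
        "seed_3": {"b3"},
    }
    kept, covered = select_useful_seeds(runs)
    print("keep:", kept, "coverage:", sorted(covered))
```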

Running the best test cases is the first step. “The problem is that you come back the next day and have failures that need to be debugged,” says Ramirez. “How do you address them? This is where we really start seeing how AI and ML are helping to solve the problem. How do you start addressing these issues in a more data-driven approach? There is a lot of information coming out of verification, and how do you help the customers make more sense of that? We’re seeing customers look at that very strongly over the past one or two years. It makes them smarter about how they address verification.”

Debug currently takes more time than any other aspect of verification. “We’ve made a lot of investments in AI, to do smart binning, to do triaging, and to try to automate the process of getting as close to the cause of the bug as possible,” says Synopsys’ Geden. “When a verification engineer comes to work in the morning, that job is made a whole lot easier by being able to predict, before you even start regression and based on history, where and what is most likely going to be the cause of the failure.”
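
The sketch below shows the binning idea in its simplest form, grouping failures whose error messages normalize to the same signature. The log lines and normalization rules are invented for illustration; production triage uses far richer features, history, and learned models.

```python
# Sketch: "smart binning" of overnight regression failures by signature, so an
# engineer starts the morning with a few buckets instead of hundreds of logs.
# The error messages and regexes below are illustrative placeholders.

import re
from collections import defaultdict

def signature(error_line):
    # Strip hex values and numbers so failures differing only in data values
    # or simulation time land in the same bin.
    line = re.sub(r"0x[0-9a-fA-F]+", "<hex>", error_line)
    line = re.sub(r"\d+", "<n>", line)
    return line.strip()

def bin_failures(failures):
    """failures: list of (test_name, error_line) pairs."""
    bins = defaultdict(list)
    for test, error in failures:
        bins[signature(error)].append(test)
    return bins

if __name__ == "__main__":
    failures = [
        ("test_12", "ERROR @ 4520ns: scoreboard mismatch, got 0xDEAD expected 0xBEEF"),
        ("test_47", "ERROR @ 9981ns: scoreboard mismatch, got 0x1234 expected 0x5678"),
        ("test_03", "FATAL: timeout waiting for grant after 100000 cycles"),
    ]
    for sig, tests in bin_failures(failures).items():
        print(f"{len(tests)} failure(s): {sig}")
```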

But AI also could make the situation worse. “Everybody’s hopeful that GenAI is going to help us create RTL much faster,” says Ramirez. “GenAI is going to make the problem worse, because you’re going to be able to create code much faster, but it’s going to be of lower quality. There’s going to be more stuff to verify at the end of the day. You look at software companies and they’re relying heavily on GenAI for their code — 20% to 25%. But that’s a very different world, where you can go patch software. When we’re creating silicon that costs millions of dollars to re-spin, you have to get it right the first time. Those two domains, the software world and the hardware world, do not have the same challenges in how we can fully utilize GenAI.”

Abstraction and Hierarchy
Verification engineers have long tried to ensure the right kind of bugs are found at the most appropriate stage of development. “If you find IP bugs at the integration phase, then you go back to the IP methodology and figure out why you didn’t find them earlier,” says Synopsys’ Schirrmeister. “If you look into the scope, which includes IP, subsystem, chip, chip in system (see Fig. 1), and if you look into the engines — transaction level, simulation, emulation, prototyping, silicon — our customers are telling us that if they find an IP bug in emulation, there will be a strongly worded conversation over in the IP department. Why has that not been found in advance? From a hardware speed perspective, you have these very big designs. And from a scope perspective, you start at the IP level, you go to the sub-system level, and so forth.”

Fig. 1: Appropriate bugs targeted at each stage of development. Source: Synopsys

The best solution is to approach it from a couple of different perspectives. “Higher-level abstraction is one of them, and then focus,” says Bernie Delay, senior engineering director at Synopsys. “At the SoC level, the scenarios you’re trying to create are not lower-level IP scenarios. They are focused on areas like cache coherency. You need to have both languages and methodologies specifically focused on those problems. You also want to scale it up and do more system-level stimulus generation. Focus your stimulus and coverage toward those types of scenarios. That’s where the SoC level and the multi-chip level issues are going to be. Doing smarter generation of the tests, both from what you’re focusing on within the system, and using the right abstraction level to do that, is another key to this whole thing.”

In some cases, repetition can be used to simplify the problem. “Repetition is essentially a hierarchical approach, where you capture the behavior of one of the repeated elements and then model that and replicate it across the array,” says Marc Swinnen, director of product marketing at Ansys. “There are two problems that complicate this approach. One is that the elements at the edge of the array see a different neighbor than the ones in the center, so the edge needs to be treated somewhat differently. Also, some may have wires running over the top of them, which has some influence. But a lot of designers avoid running wires over these elements precisely for that reason — to not break up the hierarchical symmetry.”
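
A toy version of that hierarchical classification, with made-up array dimensions, simply tags each position as interior, edge, or corner so that one characterized model can be reused across the interior while the boundary instances get separate treatment.

```python
# Sketch: the hierarchical-repetition idea in miniature. One element model is
# characterized once and reused across the interior of an array, while edge
# and corner positions, which see different neighbors, are flagged separately.
# Array dimensions and category names are illustrative.

def classify_array(rows, cols):
    """Return a dict mapping (row, col) to the model category to reuse."""
    layout = {}
    for r in range(rows):
        for c in range(cols):
            on_row_edge = r in (0, rows - 1)
            on_col_edge = c in (0, cols - 1)
            if on_row_edge and on_col_edge:
                layout[(r, c)] = "corner"
            elif on_row_edge or on_col_edge:
                layout[(r, c)] = "edge"
            else:
                layout[(r, c)] = "interior"   # single shared model
    return layout

if __name__ == "__main__":
    from collections import Counter
    layout = classify_array(4, 5)
    print(Counter(layout.values()))  # how many instances each model covers
```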

Model replacement has significant benefits. “Black-boxing is becoming more popular,” says Cadence’s Graham. “This is where you take parts of the RTL and replace them with a higher abstraction model. This has been made possible by advances in sequential equivalence checking that compare a C algorithm and the RTL that’s been developed. That enables an overlap where we are working in C, we’re working in RTL, but you don’t have to completely verify a C algorithm before we can start to build it. We can now parallelize those, because we can guarantee equivalency for some portion of that flow and shift everything left. Our customers are getting smarter about doing that.”

As more people adopt a top-down development methodology, having multiple abstractions will become common. “If someone initially creates a C model, before they move into implementation, you can start swapping in some of those models for lower-level blocks, especially where there’s repetition,” says Ramirez. “That’s where you can start seeing some big performance improvements. What will be interesting to see is how we can better automate the swapping in and out of models. What is a safe time to use the abstract model, and when do you actually need to have the RTL? Can I do that dynamically, on the fly, based on the activity that’s happening in the test bench or the simulator?”
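
One way to picture such a policy is a per-block decision that falls back to RTL whenever a block is under test or heavily exercised, and uses the abstract model otherwise, as in the hypothetical sketch below. The activity metric and threshold are assumptions, not features of any existing tool.

```python
# Sketch: a policy for deciding, per block, whether the abstract model is safe
# to use or the RTL is required. The rule (RTL when under test or heavily
# active, abstract model otherwise) is an illustrative assumption.

def select_model(block, blocks_under_test, activity, activity_threshold=0.3):
    """activity: fraction of cycles (0..1) in which the block is exercised."""
    if block in blocks_under_test:
        return "rtl"
    if activity.get(block, 0.0) > activity_threshold:
        return "rtl"
    return "abstract"

if __name__ == "__main__":
    activity = {"l2_cache": 0.7, "pcie": 0.05, "ddr_ctrl": 0.1}
    for blk in ("l2_cache", "pcie", "ddr_ctrl"):
        print(blk, "->", select_model(blk, blocks_under_test={"l2_cache"}, activity=activity))
```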

Selecting the right models, and design pruning, becomes even more important when looking at issues such as power. “If you are doing a specific use-case analysis, like power, you are not taking the processor subsystem at the detailed level and just mapping that into the whole chip and trying to do it there,” says Schirrmeister. “You just take the subset of the chip that is relevant to this question. You become verification task-specific in your verification setup. For a particular verification task, I might only select a power model for one individual processor, and higher abstraction models for the PCI express interface, the memory, and the caches so that I understand the software workload. I do not need the full chip.”

Methodology issues remain, such as how coverage from models at different levels of abstraction can be combined. “We are seeing more use of C and C++ driving RTL, but how much they trust that it works is still unknown,” says Ramirez. “We also are seeing customers wanting to create abstract models of their design, and they want to verify those models. They want to do coverage on those models as a way of better solidifying the architecture, but also define what their test environment is going to be.”

Doing more verification than necessary is wasteful. “Constraining verification is crucial to ensure the best ROI and achieve high confidence that the IP will work correctly in the target project,” says Sergio Marchese, vice president of application engineering at SmartDV. “If, for example, it is known that the target chip will operate at a certain frequency or range of frequencies, verification should focus on those frequency scenarios. When the IP will be used in a different project, it stands to reason that verification will have to be executed again according to the new scenario. Or if it is known that the bus topology of an interface IP is using a single manager, verifying multi-manager scenarios is a low priority. Effective communication between the IP provider and IP integrator is important to make sure that verification is executed in an optimal way.”
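
As a simple illustration of that narrowing, the sketch below filters a hypothetical IP configuration space down to the integrator's stated operating conditions, such as a limited frequency range and a single-manager topology. The parameter names and values are placeholders for whatever the IP's real configuration space looks like.

```python
# Sketch: constraining an IP's verification space to the target project's
# actual use case. Parameters, ranges, and values are illustrative only.

FULL_SPACE = {
    "clk_freq_mhz": list(range(100, 1601, 100)),   # everything the IP supports
    "num_managers": [1, 2, 4],
    "data_width":   [32, 64, 128],
}

TARGET_PROJECT = {
    "clk_freq_mhz": [800, 1000],                   # the chip will only run here
    "num_managers": [1],                           # single-manager topology
}

def constrain(full_space, target):
    """Keep only values relevant to the target project; leave unconstrained parameters as-is."""
    return {k: [v for v in full_space[k] if k not in target or v in target[k]]
            for k in full_space}

if __name__ == "__main__":
    for param, values in constrain(FULL_SPACE, TARGET_PROJECT).items():
        print(param, values)
```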

Conclusion
Verification is a process, not just a collection of tools, and the best way to use those tools depends heavily on the design being undertaken. The optimal methodology will be different for every company, even though those methodologies may overlap significantly.

“You are trying to verify a design in the least amount of time and still maintain the highest level of quality,” says Synopsys’ Delay. “If you’re looking for one silver bullet, I don’t think you’re going to find that. What you are going to find is that there are AI/ML techniques, there are formal techniques, there are portable stimulus techniques, there are base performance considerations. You need all of them to get to the point where you can tape out a chip with a high level of confidence.”

Related Reading
Improving Verification Performance
Verification tools are getting faster and capacity is increasing, but they still can’t keep up with the problem space. Verification is crossing more silos, requiring expanded skill sets.


