Measuring Verification Accuracy

When is enough verification good enough to tape out?


Verification is an unbounded challenge that continues to confound engineering teams around the globe, all of which want to know when “enough” is “good enough” to proceed to tapeout. The answer is not straightforward, and it now involves more variables than in the past, particularly around power.

Harry Foster, chief verification scientist at Mentor Graphics, noted that the term ‘measuring’ implies some kind of metric. That is difficult, because realistically only two metrics are typically used in functional verification, code coverage and functional coverage, plus a new and emerging one, statistical coverage.

“Code coverage is an automatic metric, and really all it does is quantify the accuracy of the input stimulus in terms of its ability to activate a line of code or an instruction in the design, and that’s it,” said Foster. “That’s pretty fundamental, because I could actually activate a line of code that has a bug on it but never observe it. So all it measures is how good or bad my input stimulus is. That’s all it’s telling us.”
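To make Foster’s point concrete, here is a toy Python sketch. All names are hypothetical, and a hand-instrumented set of “lines” stands in for a real simulator’s code-coverage engine. The buggy line executes, so line coverage reaches 100%, but downstream logic masks the wrong value and no check ever fails.

```python
ALL_LINES = {"sum", "saturate"}  # instrumented "lines" in the toy design
covered = set()                  # lines activated by the stimulus so far


def dut_add(a: int, b: int) -> int:
    """Toy 4-bit saturating adder with a deliberate bug (hypothetical)."""
    covered.add("sum")
    s = a + b
    if s > 15:
        covered.add("saturate")
        s = 14  # BUG: should saturate to 15
    return s


def system_out(a: int, b: int) -> bool:
    """Downstream logic only observes whether the sum exceeds 10,
    so it cannot tell 14 from 15: the bug is activated but masked."""
    return dut_add(a, b) > 10


def reference_out(a: int, b: int) -> bool:
    """Golden model of the same observable output."""
    return min(a + b, 15) > 10


def run_regression(stimuli):
    """Return (line coverage, number of output mismatches) for a stimulus list."""
    covered.clear()
    mismatches = sum(system_out(a, b) != reference_out(a, b) for a, b in stimuli)
    return len(covered) / len(ALL_LINES), mismatches


line_cov, mismatches = run_regression([(1, 2), (8, 9)])
print(f"line coverage = {line_cov:.0%}, mismatches = {mismatches}")
# line coverage = 100%, mismatches = 0, yet dut_add(8, 9) returns 14, not 15
```

Coverage says the stimulus reached every line; only a checker that can observe the saturated value would expose the bug.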

Functional coverage is another metric used to gauge accuracy. But unlike code coverage, which is automatic, functional coverage is a manual metric. “I have to create a functional coverage model,” he said. “What that does is tell me the input stimulus’ ability to activate key functionality in the design, and generally that functionality is defined in a test plan and captured by creating a functional coverage model. The challenge there, in terms of accuracy, is that where code coverage is automatic, functional coverage inherently is not. You run the risk that the fidelity of the functional coverage model isn’t good, which means you’re missing things you need to check, and you would never know it.”
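The fidelity risk Foster describes can be illustrated with a minimal Python sketch of a hand-written coverage point. The `CoverPoint` class is hypothetical, loosely modeled on a SystemVerilog coverpoint, and the burst-length bins are an assumed test plan. The same stimulus scores 100% against an incomplete model and 75% against one that also asks about the maximum burst length.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CoverPoint:
    """Minimal functional-coverage point with user-defined bins (hypothetical API)."""
    name: str
    bins: Dict[str, Callable[[int], bool]]
    hits: Dict[str, int] = field(init=False)

    def __post_init__(self) -> None:
        self.hits = {b: 0 for b in self.bins}

    def sample(self, value: int) -> None:
        # Count every bin whose predicate matches the sampled value.
        for b, pred in self.bins.items():
            if pred(value):
                self.hits[b] += 1

    def coverage(self) -> float:
        return sum(1 for h in self.hits.values() if h) / len(self.bins)

    def holes(self) -> List[str]:
        return [b for b, h in self.hits.items() if h == 0]


stimulus = [1, 2, 4, 8]  # burst lengths actually generated

# Model written without thinking about the maximum burst as its own case.
incomplete = CoverPoint("burst_len", {
    "single": lambda v: v == 1,
    "short": lambda v: 2 <= v <= 4,
    "long": lambda v: 5 <= v <= 16,
})
# Same plan, but the corner case gets its own bin.
better = CoverPoint("burst_len", {
    "single": lambda v: v == 1,
    "short": lambda v: 2 <= v <= 4,
    "long": lambda v: 5 <= v <= 15,
    "max": lambda v: v == 16,
})

for v in stimulus:
    incomplete.sample(v)
    better.sample(v)

print(incomplete.coverage(), incomplete.holes())  # 1.0 []
print(better.coverage(), better.holes())          # 0.75 ['max']
```

Nothing in the first model’s report hints that a 16-beat burst was never exercised; the hole only appears once someone asks the question.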

Statistical coverage is the model beginning to emerge, he said. “We used code coverage and functional coverage basically as metrics to quantify the ‘goodness’ of our IP test environment, but that starts to fall apart when you move to the system or SoC, and those types of metrics don’t work there. Statistical coverage is totally different. It also requires you to manually create a coverage model, but what it’s doing is looking across IPs and across many simulation runs, and evaluating this in terms of a system as opposed to an IP. In other words, there are aspects of the design that you’d never know you verified by just looking at a single IP. You have to look across IPs and across time, which is complicated. An example of that would be if I’ve got multiple caches and multiple processors in my design (which you do in large systems/SoCs). Traditional code coverage and functional coverage start to fall apart there. Again, it suffers the same problem in terms of accuracy. It’s like my dad always used to tell me, ‘If you don’t ask the question, you won’t get the answer.’ It’s the same thing with these coverage models, functional or statistical. If you don’t think about what you want to check, you’re not going to check it.”
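The idea of looking across IPs and across runs can be sketched in Python under loudly stated assumptions. This is not how any vendor implements statistical coverage. The event format, the 64-byte line size, the 10-unit time window and the chosen scenario (two different CPUs writing the same cache line close together) are all hypothetical. No single cache’s block-level model can see this scenario, because it only exists when events from different IPs are correlated.

```python
from collections import namedtuple
from itertools import combinations

Event = namedtuple("Event", "time cpu op addr")  # hypothetical trace record
LINE_BYTES = 64  # assumed cache-line size


def concurrent_write_hits(events, window=10):
    """Count pairs of writes from *different* CPUs to the same cache line
    within `window` time units of each other in one simulation run."""
    writes = [e for e in events if e.op == "W"]
    hits = 0
    for a, b in combinations(writes, 2):
        if (a.cpu != b.cpu
                and a.addr // LINE_BYTES == b.addr // LINE_BYTES
                and abs(a.time - b.time) <= window):
            hits += 1
    return hits


def aggregate(runs, window=10):
    """Roll the per-run counts up into system-level statistics across runs."""
    per_run = [concurrent_write_hits(r, window) for r in runs]
    return {
        "runs": len(runs),
        "runs_with_scenario": sum(1 for h in per_run if h),
        "total_hits": sum(per_run),
    }


run1 = [Event(0, 0, "W", 0x100), Event(5, 1, "W", 0x108)]   # same line, close in time
run2 = [Event(0, 0, "W", 0x100), Event(50, 1, "W", 0x100)]  # same line, too far apart
run3 = [Event(0, 0, "R", 0x100), Event(2, 1, "W", 0x100)]   # only one write

print(aggregate([run1, run2, run3]))
# {'runs': 3, 'runs_with_scenario': 1, 'total_hits': 1}
```

As with functional coverage, the scenario has to be written down by hand; if nobody thinks to define it, the aggregate report will never mention it.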

Along these lines, Frank Schirrmeister, group director, product marketing for System Development Suite at Cadence, said the intent behind what you’re trying to do must not be forgotten. “There are really two intents and they are often confused. First, there’s a desire to get the design as bug-free as possible, and that’s really a desire that translates into product quality. ‘I have no recalls, I have no bugs.’ It may translate in certain application domains into security aspects, like in military, security or safety in automotive. ‘The brakes shall never do this, otherwise people are dying.’”

Second, overlaid on this and often confused with it is the question of when the design is ready for tapeout, he said. “In the context of measurement, we’ve been leading the charge on the verification side with what we call metric-driven verification, which is essentially saying, ‘Here’s my plan of what I want to verify. Then I use coverage of all sorts, from code to functional to assertion, to track against that.’ Then the metric-driven piece says, ‘I have my targets, and I’m currently getting the following coverage, for example, which gives me a certain verification confidence.’ And then the automation piece is able to tailor new tests to cover the spaces that haven’t been checked yet.”
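The plan, measure and retarget loop Schirrmeister describes can be reduced to a few lines of Python. This is an illustrative toy, not Cadence’s metric-driven verification tooling. The plan is assumed to be ten value ranges, the initial random test is assumed to be constrained to the lower half of the space, and the “automation piece” is simplified to generating one directed value inside each uncovered range.

```python
import random

# Hypothetical verification plan: ten value ranges that must each be exercised.
BINS = [(lo, lo + 9) for lo in range(0, 100, 10)]


def coverage(hit):
    return len(hit) / len(BINS)


def sample(hit, value):
    """Record which planned range a stimulus value falls into."""
    for i, (lo, hi) in enumerate(BINS):
        if lo <= value <= hi:
            hit.add(i)


def closure_loop(target=1.0, max_iters=5, seed=0):
    """Return the coverage measured after each iteration of the loop."""
    rng = random.Random(seed)
    hit = set()
    # Initial random test, biased toward low values by its constraints.
    for _ in range(20):
        sample(hit, rng.randint(0, 49))
    history = [coverage(hit)]
    for _ in range(max_iters):
        if coverage(hit) >= target:
            break
        holes = [i for i in range(len(BINS)) if i not in hit]
        # Retarget: aim one new directed stimulus at each uncovered range.
        for i in holes:
            lo, hi = BINS[i]
            sample(hit, rng.randint(lo, hi))
        history.append(coverage(hit))
    return history


print(closure_loop())  # initial coverage is at most 0.5; the next pass reaches 1.0
```

Real flows rarely close this neatly, because holes can hide behind constraints that directed tests cannot easily reach, but the shape of the loop is the same.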

Tapeout has many pieces, but if teams waited for verification to be finished, it’s unlikely that anyone would ever tape out. “Verification by itself is an unbounded problem,” Schirrmeister said. “You’re never done. It’s really a question of how much confidence do you have that you’ve done enough for a certain purpose. Because of this ‘unboundedness’ you need to decide what not to verify. You need to decide, ‘I’m building within a set of scenarios I have defined to say that’s what I’m verifying against, but I also need to be explicit about what I’m not doing so that I’m not surprised later on if somebody is using it in a mode it wasn’t intended for.’ There may be bugs. The verification may not have covered that.”

He stressed that what’s needed in verification accuracy is a top-down approach. “The IP level verification, and perhaps smaller subsystem verification I would argue are solved problems at this point to the extent that UVM and so forth helps to do those at the block level. But that’s a bottom-up view. What needs to change is shifting this from a bottom-up view to a top-down view. That’s what’s missing. It’s a question of, ‘I have 150 IP blocks I’m integrating, I’m creating 20 new blocks, some of them with high-level synthesis, and some of them manually coded. How do I make sure in a top-down fashion that everything works?’ That’s what’s missing and that’s what we’re going toward.”

More than coverage
Foster stressed that accuracy is even more critical than coverage. “If you look at the way we do verification today, we quite often compromise accuracy for performance. We do that deliberately, with a risk. For example, in an RTL model we compromise accuracy to speed up the simulation by throwing out timing and power. That means we run the risk that there are aspects of the design we can’t even model in RTL. A good example would be metastability across clock domains. That’s the reason you’ve seen a lot of technologies emerge, like clock-domain crossing verification.”

Power is another good example, he pointed out. “Power is an interesting one today. This is a problem we didn’t even have 10 years ago. We have multiple IPs with different power domains. We basically compromise accuracy by throwing that out, but we’ve gotten to the point where we can’t do that anymore, and we can’t model it in RTL. That’s led to the emergence of UPF, which allows us to describe the power intent and then do power-aware simulation to get a little more accuracy. But see what we’re doing? It’s interesting: we turn to techniques to speed up simulation, but then all of a sudden the accuracy is poor and we’re getting [burn on the nets], so we have to bring it back in.”
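Why throwing out power costs accuracy can be shown with a deliberately tiny Python model. It is not how UPF or any simulator works internally; it only mimics the commonly described behavior that power-aware simulation corrupts the outputs of a switched-off domain to an unknown value. The `Block` and `consumer` names, the string `"X"` for an unknown, and the isolation clamp value of 0 are all assumptions for illustration.

```python
X = "X"  # stands in for the unknown value a power-aware simulator injects


class Block:
    """Toy IP block in a switchable power domain (hypothetical)."""

    def __init__(self, reg: int = 0):
        self.reg = reg
        self.powered = True

    def read(self, power_aware: bool):
        # Plain RTL-style simulation ignores power state and returns the
        # stale register value; power-aware simulation returns X instead.
        if power_aware and not self.powered:
            return X
        return self.reg


def consumer(block: Block, isolate: bool, power_aware: bool):
    """Always-on logic reading the block. With isolation enabled, the
    off domain's output is clamped to 0 (assumed clamp value)."""
    if isolate and not block.powered:
        return 0
    return block.read(power_aware)


blk = Block(reg=5)
blk.powered = False  # domain switched off
print(consumer(blk, isolate=False, power_aware=False))  # 5: missing isolation goes unnoticed
print(consumer(blk, isolate=False, power_aware=True))   # X: the bug becomes visible
print(consumer(blk, isolate=True, power_aware=True))    # 0: isolated output is safe
```

The power-unaware run passes with a plausible-looking value, which is exactly the kind of accuracy trade-off Foster describes.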

Michael Sanie, senior director of verification marketing at Synopsys, pointed out that moving up to the system level brings in tools beyond simulation and formal verification. “You start to do architectural exploration or performance analysis at a very high level, even post RTL. You also start looking at doing emulation, where you have the whole design running with a bunch of real software thrown at it. And you also may end up doing FPGA prototyping. People are bringing the notion of coverage into those tools as well.”

He expects the industry to start demonstrating how to track coverage within emulation environments, but declined to be more specific about tool announcements.

“At some point, we need to start looking at performance coverage, because when you look at a system, you now have a metric that says you need to be running this at a certain speed. Otherwise, the whole thing is useless. And how many of your tests have you run against a performance metric? This is not going to get easier, of course, but maybe at that point there will be other ways of managing confidence, not just coverage. Maybe, look, we’ll take it to the lab and run a whole bunch of tests.”

He agreed the whole area of verification is an unbounded problem that’s just getting bigger and bigger. “Our understanding as an industry of how much of it is handled is that we have no idea. Still, we will come up with new ideas and new techniques.”