Experts at the table, part 1: The definitions of coverage are changing, and the industry is struggling to define what is needed.
Semiconductor Engineering sat down to discuss the definition of sufficient coverage as a part of verification closure with Harry Foster, chief scientist at Mentor Graphics; Willard Tu, director of embedded segment marketing for ARM; Larry Vivolo (who at the time of this roundtable was senior director of product marketing for Atrenta); Simon Blake-Wilson, vice president of products and marketing for Cryptography Research, a unit of Rambus; and Pranav Ashar, chief technology officer at Real Intent. What follows are excerpts of that conversation.
SE: Are coverage metrics sufficient for recent advances in design?
Ashar: Coverage has to cover the failure modes. Over the past few years, the SoC design process has become better understood, and as a result the kinds of failures a chip is likely to encounter have become better known. Coverage is meant to provide the confidence that those failure modes will not happen. Sufficiency should be measured in that regard.
Tu: If we think about this from an automotive perspective, which is one place where a lot of SoCs are being used, we have to consider ISO 26262. People using this are going through a learning phase. Previously, these applications used dual lock-step, MCU-oriented solutions, but SoCs don't lend themselves to this because they want performance. So now they have a challenge: they either have to do software lockstep or adopt other hardware architectures. I can define the failure modes I am going to try to detect, and that is where the challenge is. There is not sufficient coverage for this, and the industry is hungry for more tools to help address this space.
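As a rough illustration of the software-lockstep idea Tu mentions, the sketch below (Python, with purely hypothetical function names; real automotive systems run redundant executions on separate cores with hardware comparators and defined safe states) shows the core mechanism: run the same safety-relevant computation redundantly and compare results before acting on them.

```python
# Minimal sketch of software lockstep: execute the same computation twice
# (ideally on different cores or with diversified code) and compare the
# results before using them. All names here are illustrative only.

def compute_torque_request(sensor_value: float) -> float:
    """Stand-in for a safety-relevant computation."""
    return sensor_value * 0.8 + 1.2

def lockstep_execute(sensor_value: float) -> float:
    primary = compute_torque_request(sensor_value)
    shadow = compute_torque_request(sensor_value)   # redundant execution
    if primary != shadow:
        # A mismatch suggests a random hardware fault; enter a safe state.
        raise RuntimeError("Lockstep mismatch detected")
    return primary

if __name__ == "__main__":
    print(lockstep_execute(10.0))
```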
Foster: Another example is DO-254, and medical has its own standard, but we have to take a step back and remember that verification is all about mitigating risk, and that is a process. As with any process, there has to be something that tells you when you are done. The problem is that there is no perfect coverage model, and it depends on what is really critical. In reality, we can't close on coverage because there are so many things that we could look at.
Ashar: There are two dimensions to getting a measure of coverage. First there is the application-oriented narrowing of the scope in terms of what it means to have good coverage, and then there is the failure-mode orientation. It is the combination of these two that provides a good way to take a fuzzy problem and pin it down to something real.
Foster: The challenge is that it is more art than science. It requires skill to decide what is important and relevant and what is not.
Ashar: Hopefully the line is moving more toward science.
Tu: If you want to get to 100% coverage, it takes too much engineering time, and that is where automation tools come in and help you prioritize. When we have a failure, we can look at the cause of the failure.
Vivolo: We take a more generic approach and look at it from the perspective that any type of coverage that requires a human to create the constraints and define it is inherently risky. We only think about known problems when it comes to testing. What are the things that we did not consider? One approach to the error modes is to look at what you test and say that whenever you see something you never tested, that by definition is suspicious. It could be an error, but it may not be. That is a narrowing down of the focus as you move to the system level.
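To make the "never tested, therefore suspicious" idea concrete, here is a minimal Python sketch under assumed data structures (the transition names and trace format are invented for illustration): collect the behaviors exercised during testing, then flag anything observed later that falls outside that set.

```python
# Minimal sketch: record the set of state transitions exercised during
# testing, then flag any transition observed later that was never tested.
# Transition names and the trace format are illustrative only.

tested_transitions = {("IDLE", "REQ"), ("REQ", "GRANT"), ("GRANT", "IDLE")}

def check_observed(observed):
    """Return transitions that were never exercised in testing."""
    suspicious = []
    for transition in observed:
        if transition not in tested_transitions:
            # Not necessarily a bug, but by definition worth a closer look.
            suspicious.append(transition)
    return suspicious

if __name__ == "__main__":
    field_trace = [("IDLE", "REQ"), ("REQ", "IDLE")]  # second one never tested
    print(check_observed(field_trace))  # -> [('REQ', 'IDLE')]
```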
Tu: One thing you said is that you want to take the human out of testing. I agree with that, but we are still using humans to come up with the failure mode and effects analysis (FMEA). That analysis is still art. Let's take the testing that can be automated away from the human.
Foster: Maybe art is the wrong word because it implies that it is artistic. It requires skill to come up with a coverage model that really represents what is important and what we can realistically close.
Ashar: As we come to understand a lot of these design processes, the static techniques for verifying those steps are becoming clearer and more anchored in the verification process. As formal analysis engines come into play, coverage is a side effect of that. By definition, it is full coverage. As you do more things statically, you become less reliant on the coverage that is achievable in simulation. With static analysis techniques, the coverage side of the management process is implicit, and the areas in which you need these simulation measures are getting marginalized.
Foster: You still need a notion of coverage. I still have to check this in the system, or that. There will always be a checklist.
Vivolo: I am not suggesting we get rid of all the basic coverage, just that we do it more intelligently. Checking off the behavior is an area where automation is difficult because the behavior is high-level, and that is where we need engineers to define the behaviors they want to look at. The flip side is to take an automated approach and go deeper, looking at the state machines and transitions. That is where the corner-case bugs will be lurking. If you focus only on the high level, it is very difficult to debug and to fully define 100% coverage. At the end of the day, it is a blend of all of the above that you have to use. As we start moving from IP blocks to the SoC, that is where you have to get more intelligent about the areas in which you focus.
Foster: We do not have the mechanisms to describe interactions between IPs distributed over time at the system level. Automation can help me with aspects of the lower level but it will not help me at the system level.
Vivolo: The problem at the system level is scalability. You may have a problem that only appears when 10 processors interact in a certain way, spanning billions of cycles. How do you solve that with coverage? Those customers are increasingly relying on emulation and prototyping, and trying to bridge coverage into those types of technologies is a real challenge. How do you do that and make it target only the things that are important?
Ashar: You do need a little bit of an analytical and formal approach. Coverage addresses the known unknowns. You know the areas that will be sensitive to issues and you want to cover those, but there will be things lurking that you don't even know about. Coverage does not measure how good your verification process is. A combination of application-oriented and failure-mode-oriented approaches, and a combination of formal and simulation, are needed.
Blake-Wilson: Coverage is an interesting discussion when you come to cryptography. I heard a few times in this discussion that there is no such thing as 100% coverage. From my perspective, at a macro level, I can think of regular testing as probabilistic analysis, working out what is the most likely outcome. In security, the challenge is that the threat is not random, it is malicious. So even if you get to 99% coverage, the remaining 1% is what the attackers focus on. In the academic world, one of the hot topics is provable security, and the analogy to formal analysis is good here. People used to get excited by the notion that something could be made bulletproof. Then you would ask the more pragmatic people, and they would say that while they like provable security, and a formal model lets you work out which things can be ruled out as weaknesses, an attacker can also look at that formal model and figure out how to work around it. What hints can it give me?
Ashar: This is played out in the normal design space, as well. Formal properties were originally written for formal proofs, but people realized that they don't scale very well. You need to have a reasonable expectation that many of them will not complete. So a lot of people use these checks as a coverage metric. I look at the number of times I hit an assertion, and this provides a measure of how good the stimuli are, since the assertions capture some notion of the things that can go wrong.
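As a minimal sketch of using assertion hit counts as a stimulus-quality metric, the Python below tallies how often each assertion fired in a simulation log and reports the ones never exercised. The log format, the "ASSERT_HIT" tag, and the assertion names are all hypothetical; real flows would pull these counts from the simulator's coverage database.

```python
# Minimal sketch: count assertion hits in a simulation log as a rough
# measure of how good the stimuli are. Log format and names are invented.

from collections import Counter
import re

def assertion_hit_report(log_lines, known_assertions):
    hits = Counter()
    pattern = re.compile(r"ASSERT_HIT\s+(\w+)")
    for line in log_lines:
        match = pattern.search(line)
        if match:
            hits[match.group(1)] += 1
    # Assertions that never fired point at behavior the stimuli never reached.
    never_hit = [a for a in known_assertions if hits[a] == 0]
    return hits, never_hit

if __name__ == "__main__":
    log = ["t=100 ASSERT_HIT fifo_no_overflow",
           "t=250 ASSERT_HIT fifo_no_overflow",
           "t=400 ASSERT_HIT req_implies_grant"]
    asserts = ["fifo_no_overflow", "req_implies_grant", "no_double_grant"]
    print(assertion_hit_report(log, asserts))
```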
Vivolo: A lot of the challenge there is that someone had to write the assertion. Customers combine assertions they write, cover groups they create, and even more assertions created by automated solutions, then combine that with formal analysis. Each is a peeling of the onion, to the point where you whittle down the number of properties that someone actually has to look at.
Ashar: A lot of groups are looking at simulation output to mine information for candidate coverage targets. They create hypotheses and use those to figure out how good the stimuli are.
Foster: This is a multi-dimensional problem and those solutions work well at the lower range, but they still do not solve the higher-level notion of coverage where I have distributed state machines across multiple IP blocks. It does not solve the problem.