System Coverage Undefined

What does it mean to have verified a system and how can risk be measured? The industry is still scratching its head.


When is a design ready to be taped out? That has been one of the toughest questions confronting every design team, and it’s the one verification engineers lose sleep over.

Exhaustive coverage has not been possible since the 1980s. Several metrics and methodologies have been defined to help answer the question and to raise confidence that important aspects of a block have been verified. But those metrics and methodologies have struggled to keep up with growing complexity, and they do not scale to the system level. A new framework for understanding coverage and completeness is required.

Functional coverage, as defined for constrained random verification at the block level, is not an appropriate metric. It is based on unverified observations of activity within a design. Coverage models built on this notion are difficult to define, and their completeness can be ascertained only by relating them back to code coverage of the design. Meanwhile, code coverage had been discarded a decade earlier as an inadequate primary coverage metric because it does not take into account concepts such as data dependence and concurrency.

Put simply, two imperfect metrics are being relied on today to provide the necessary confidence that a design has been adequately verified.

The advent of the Portable Stimulus Standard from Accellera should be initiating a new round of discussion to answer this question in a way that scales for the future. At the moment, the industry is primarily focused on working out how to retrofit existing notions and ideas into the new arena.

In an age where chips are built from pre-verified blocks, it is important that system-level verification concentrates on what has not been verified at the block level. But the question remains: What are good metrics that are strongly correlated with system-level issues?

A change of focus
Several things have changed in the verification environment since SystemVerilog and UVM were created. Some of these are related to the mechanics of verification, while others are related to the coverage space. “For IP and sub-system verification, using UVM and constrained random, the goal is typically to verify all of the design features exhaustively, or as exhaustively as possible, independent of any SoC context,” says Mike Stellfox, fellow at Cadence. “The design features map nicely into functional cover groups and crosses of those. It was not built for large-scale SoCs where you are typically verifying and integrating software across multiple platforms, such as simulation, emulation, FPGA prototyping and even silicon.”

While coverage may have been defined for each of the blocks, that may not help provide an answer to system-level coverage. “The coverage model at the system level is not just an aggregated model of what you did in all of the blocks,” asserts Mark Olen, product marketing manager for Mentor, a Siemens Business. “That would just make you repeat all of the work, and do it in a more difficult environment. So it must be something different.”

In fact, scale takes on a different dimension. “System-level coverage and big data have some overlap,” states Adnan Hamid, CEO of Breker. “A fairly simple ARM system has 10^93 possible paths through a Portable Stimulus graph. We may also need to look at all of the possible combinations of these. So while Portable Stimulus may define the coverage space, it begs the question of what coverage means. And we also have no way to know if this is complete.”

Stellfox agrees. “At the SoC level, you are dealing with sparse coverage. Nobody will try to verify every interaction of every design feature in an SoC. The scale is just too big.”

Defining the coverage space
A Portable Stimulus graph defines the intended functions that a design is meant to perform. These are often called scenarios or use cases. It is essentially an executable requirements document, defined in such a way that tools can generate testcases from the graph. Those testcases are self-checking and may be generated to run in several execution environments.

The graph is a partial specification in that it does not have to be complete for it to be usable. As soon as a single path through the design has been defined, it can be used to generate testcases. If the graph does cover every intended system-level function of the target device, then the graph does define one aspect of the coverage space, but not all of it.
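
To make the idea concrete, here is a minimal sketch of that notion, written in plain Python rather than PSS syntax and using hypothetical action names: every complete path through the scenario graph corresponds to one generatable testcase.

```python
# Minimal sketch (plain Python, not PSS syntax) of a scenario graph in which every
# complete path from "init" to "done" corresponds to one generatable testcase.
# All action names are hypothetical.
scenario_graph = {
    "init":      ["dma_copy", "cpu_copy"],
    "dma_copy":  ["check_crc"],
    "cpu_copy":  ["check_crc"],
    "check_crc": ["done"],
    "done":      [],
}

def enumerate_paths(graph, node="init", prefix=()):
    """Yield every complete path (i.e., candidate testcase) through the graph."""
    path = prefix + (node,)
    if not graph[node]:                      # terminal action: one complete scenario
        yield path
        return
    for nxt in graph[node]:
        yield from enumerate_paths(graph, nxt, path)

for testcase in enumerate_paths(scenario_graph):
    print(" -> ".join(testcase))
```

Even this two-branch graph already yields multiple testcases; a real SoC graph multiplies such choices at every node, which is where path counts like the one Hamid cites come from.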

Fig 1: Coverage Space. Courtesy Breker.

In addition to every possible path, we also have to include every combination of concurrent activities through the graph. If only two things could happen concurrently, then the total coverage space becomes X × (X − 1), where X is the total number of paths. This assumes that all paths are independent, which may not be the case. In fact, some of those dependencies cannot be known from the PS graph alone.
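
As a back-of-the-envelope illustration of that growth, here is a sketch assuming a hypothetical path count and that any two distinct paths can run concurrently:

```python
# Back-of-the-envelope sketch of the growth described above, assuming a
# hypothetical path count X and that any two distinct paths can run concurrently
# (ordered pairs, so X * (X - 1) combinations).
num_paths = 1_000_000                   # hypothetical X: single-thread paths through the graph
pairs = num_paths * (num_paths - 1)     # two-way concurrent combinations
print(f"{num_paths:,} paths -> {pairs:,} two-way concurrent combinations")
```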

PS does not model what the hardware is, only what it should be capable of doing. So it is not always possible to know ahead of time what a given test will actually do. Consider a test in which two threads each require access to a resource that can service only one of them at a time. A PS tool can schedule when each thread should have access, but the test generator does not know how long preceding tasks may take, and thus may not know the order or way in which the requests from the two threads will overlap. There has to be an information path back from the executed test to the PS tool so that coverage data can be annotated onto the graph.
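
A toy sketch of that scheduling ambiguity, using ordinary Python threads as stand-ins for the two test threads, shows why the observed order is only known after execution and has to be reported back:

```python
import threading

# Toy model of the scheduling ambiguity described above: two threads both need a
# shared resource, and only the executed test reveals which one wins the race.
# The recorded order is exactly the kind of data that has to flow back to the
# coverage model.
resource = threading.Lock()
observed_order = []

def worker(name):
    with resource:                       # only one thread can hold the resource at a time
        observed_order.append(name)

threads = [threading.Thread(target=worker, args=(n,)) for n in ("thread_a", "thread_b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Annotate coverage with what actually happened, not with what was scheduled.
print("observed access order:", observed_order)
```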

There are other reasons why aspects of test generation may be deferred until execution time, and those decisions, made within the target, also have to be communicated back to the coverage tool.

Completeness
Completeness of the PS graph is difficult to assess because it relates back to the requirements document, which is probably written in a natural language. “You need to identify the major things that the system is meant to be able to do,” says Hamid. “Then you can start successively decomposing them so that you think through all of the use models that a design can go through and cross-check that against the feature list.”

In the past, completeness was verified by comparing against a different kind of coverage metric. For example, functional coverage was verified by looking at code coverage and identifying parts of the design that had not been exercised. This would indicate tests had not been written or that cover points were missing.

That was only possible because functional and code coverage were both tied to the implementation and thus there was a natural overlap between the models. But implementation coverage and intent coverage are two different things and at the system level, it is intent coverage that is important. “We have used implementation coverage as a proxy for intent coverage for the past 20 years,” continues Hamid. “Verification is proving that an implementation matches intent. Intent is what we need the chip to do. 100% implementation coverage is no guarantee that you have any coverage of intent.”
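
A contrived example makes the distinction concrete. The function and the use case below are purely hypothetical, but they show how 100% implementation coverage can coexist with zero intent coverage:

```python
# Contrived illustration of the point above. The function and the use case below
# are hypothetical.
def transfer(src, dst, size):
    """Copy 'size' bytes from src to dst; trivially easy to cover line by line."""
    dst[:size] = src[:size]
    return size

# Two tiny directed tests achieve 100% line coverage of transfer()...
buf = bytearray(16)
assert transfer(b"abcd", buf, 4) == 4
assert transfer(b"", buf, 0) == 0

# ...yet the intent-level requirement "sustain a 4KB transfer while the display
# controller is active" was never exercised, so intent coverage remains at zero.
```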

So how do we ensure completeness of the coverage space? We know that a realistic coverage goal can never approach all of the coverage space. With sparse coverage, there is no expectation that all parts of the design will have been covered by system-level tests. That is still the role of block-level verification. What is required is a way to define the subset of the total coverage space that is to be deemed sufficient for verification at the system level.
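
One way to picture this is as selecting a goal subset out of the full cross product of use cases and background loads. The sketch below, with purely hypothetical names, tracks goal coverage separately from the much larger total space:

```python
from itertools import product

# Hedged sketch: the full space is the cross product of use cases and background
# loads, but the system-level goal is only an architect-selected subset of it.
# All names below are hypothetical.
use_cases   = ["camera_to_accel", "video_decode", "modem_dma", "secure_boot"]
backgrounds = ["idle", "cpu_stress", "memory_stress"]

full_space = set(product(use_cases, backgrounds))   # everything that could be run
goal_space = {                                       # the subset deemed sufficient
    ("camera_to_accel", "memory_stress"),
    ("video_decode", "cpu_stress"),
    ("secure_boot", "idle"),
}

covered = {("camera_to_accel", "memory_stress")}     # fed back from executed tests
print(f"goal coverage: {len(covered & goal_space)}/{len(goal_space)} "
      f"({len(goal_space)} of {len(full_space)} total combinations targeted)")
```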

“You can’t look at it in the same way as the current approach, which is code coverage, FSM coverage, etc.,” says Ashish Darbari, director of product management for OneSpin Solutions. “While they provide incremental value-add, you need a holistic view of the entire coverage space which you cannot get from the parts.”

Defining coverage goals
This is where the industry is struggling today. System-level coverage is not just looking at functionality. It is the place where performance and power have to be verified, as well.

“In SoC verification, power and performance become peers to functionality in terms of their importance,” says Stellfox. “Consider an automotive chip with vision recognition processing, and you have a use case where you can’t get enough bandwidth between the camera interface and the accelerator. That can be a catastrophic failure, and it is a performance issue. What’s needed is the ability to build environments that can stress performance and power, and to track that you have achieved the use cases that you wanted.”
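
Treating performance as a peer of functionality suggests coverage items that only count as hit when a measured figure meets its requirement. A minimal sketch, with illustrative names and numbers:

```python
# Sketch of treating performance as a coverage item: a use case counts as covered
# only if the measured bandwidth met its requirement. Names and numbers are
# purely illustrative.
bandwidth_required_mbps = {"camera_to_accel": 4000, "video_decode": 1500}
bandwidth_measured_mbps = {"camera_to_accel": 3200, "video_decode": 1700}

for use_case, required in bandwidth_required_mbps.items():
    measured = bandwidth_measured_mbps[use_case]
    status = "covered" if measured >= required else "NOT covered (performance hole)"
    print(f"{use_case}: required {required} MB/s, measured {measured} MB/s -> {status}")
```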

It may seem reasonable to ask: What is the worst-case combination of scenarios? “If you design the product for worst case, you end up with a product that is so heavily overdesigned that it is not interesting economically,” says Drew Wingard, CTO at Sonics. “The simplifying assumptions associated with worst case are too conservative. The art is trying to figure out in the time domain what things are not going to be happening at the same time and convince yourself that this is safe and reliable. This is a hard problem. That makes the job of the architect more interesting.”

The industry is currently trying to define some best practices. “How do we bring a more systematic approach to SoC verification, in a similar way that UVM brought a systematic approach to best practices for the application of constrained random at the IP and sub-system level,” says Stellfox. “PS has the possibility to bring the same type of approach to SoCs and large systems.”

Defining a coverage model

So should the working group be looking to define a coverage model for Portable Stimulus? “The standard brings agreement on syntax, format and grammar,” says Olen. “It is not looking to standardize what one does with it. Will there be standardized ways of writing metrics? We have mechanisms in place today that allow our customers to set up some of these metrics. They may use a standard syntax (SV cover groups, models, points), but they are adapted very specifically to the target measurable. We envision something similar to that where there are standard ways to describe these measures.”

Do we even need coverage at the system level? PS is primarily about creating directed tests from a model, and while those tests may employ some randomization, the test areas are going to be defined by the user. “I have not seen many people doing directed testing who worry about coverage,” says Darbari. “Most of the time they just write tests.”

Adds Stellfox: “The bulk of what you are trying to do is to make sure you have someone with application knowledge clearly define what are the most important use cases to test, and even how you should chain those use cases together in ways that could easily approximate how an end software application might be using the system.”

We have to acknowledge that companies are doing system-level verification today. “They have a lot of home grown stuff, so maybe it means that we as vendors are not doing a good enough job of providing a standard yet,” says Olen. “As an engineer, if I can’t find something that works for me in the market, I build my own. Once someone has a scalable, reusable alternative, then I cannot waste my time doing it anymore.”

For implementation coverage, vendors have gone off in their own directions, making it difficult for users to aggregate coverage from multiple tools. UCIS was defined to try to bring those metrics into alignment, but each vendor’s coverage semantics are different enough that this has been problematic.

It is very likely that users will adopt multiple PS tools, because each vendor is likely to excel in certain areas, such as targeting emulation or chip bring-up. This means they will want to relate all coverage data back to the original graph, which is a common model that feeds all of the tools.

System-level coverage needs to be defined or, at the very least, the mechanisms to define it need to be common across all of the vendors. Without that the industry will be doing a disservice to the design community.

Related Stories
When Is Verification Complete?
The answer depends on an increasing number of very complicated factors.
FPGA Prototyping Gains Ground
The popular design methodology enables more sophisticated hardware/software verification before first silicon becomes available.
Portable Stimulus Status Report
The Early Adopter release of the first new language in 20 years is under review as the deadline approaches.



1 comment

Paddy3118 says:

| “The standard brings agreement on syntax, format and grammar,” says Olen. “It is not looking to standardize what one does with it.

Standardising before its use-cases are known? How will that affect the usability of the “standard”?
It sounds like horse-before-cart, but it at least gets people thinking about what comes next…
