Software-Driven Verification (Part 2)

Experts at the table, part 2: Defining use cases is more than picking the obvious scenarios, it requires deep analytics.


has been powered by tools that require hardware to look like the kinds of systems that were being designed two decades ago. Those limitations are putting chips at risk, and a new approach to the problem is long overdue. Semiconductor Engineering sat down with Frank Schirrmeister, group director, product marketing for System Development Suite at Cadence; Maruthy Vedam, senior director of system validation engineering at Intel; Tom Anderson, vice president of marketing at Breker; the founder and chief executive officer of Vayavya Labs; and John Goodenough, vice president of design technology at ARM, to talk about software-driven verification. Part One provided the panelists' views about software-driven verification and how it relates to constrained random generation and UVM. What follows are excerpts from that conversation.

SE: Do tests for coherence and security have to be run on a cycle-accurate model?

Goodenough: Yes. You can develop them, and we do develop them, early on virtual models, but when you want to verify functionality or, more critically, performance, you need a cycle-accurate model; otherwise it behaves differently.

Anderson: This is an important point when it comes to bare metal testing. You said that if you are just testing your hardware with one version of your software, your coverage is not good. And when the next version comes out, the software doesn’t work anymore. So the ability to remain orthogonal to any particular piece of production software, and to manipulate the design independently of the view the OS provides, is very important.

Goodenough: We have to distinguish between verifying the hardware, where I might need different bare metal layers to get coverage, and verifying the hypervisor switcher layer on top of the hardware which is a critical integration. I have to do both.

SE: How do scenarios get defined especially when we consider the different use models and different types of IP?

Vedam: This is a very important point. You have a very different problem between an open-ended ecosystem and a closed ecosystem. A tablet or an entry-level smartphone is a closed ecosystem, and you don’t see too many people trying to change the OS on these. But if you are looking at the processor that goes into laptops or servers, it is a completely open-ended ecosystem. The levers we have to pull to be able to meet some of our schedule challenges are very different at either of these edges. Understanding the ecosystem is critical because this helps us to optimize. If we had the same validation strategy for high-end servers and a low-end smartphone that sells for less than $10, we are really not optimizing things. The challenge is looking at the use cases. On the open-ended side, another challenge that has been around for a long time is compatibility validation. If you come up with a new processor, you have to make sure that existing software, and all versions of existing software, operates in the same way. For closed systems, it is partnering with your customers so that you understand what their problems are, what problems they are solving, and how that translates back into use cases in your system. That drives a lot of the validation. This defines the context in which the system is to be used, and no other. If we can cover that, we get much higher quality in that context and get to market faster.

Schirrmeister: I can easily imagine, and make up on the fly, scenarios such as being able to watch a video when a call comes in and ensuring that the whole phone doesn’t shut down. That is a definition of things that just have to work.

SE: At a recent Cadence summit it was said that Apple has something like 40,000 or 50,000 scenarios that define the operation of a single device.

Schirrmeister: It is fairly easy to wrap your head around the scenarios that you want to enable. There are use model scenarios that the user exercises at the product level, and requirements trickle down from that. There is another aspect, which is that someone will come up with a use model that you had not considered. You have no ability to predict all of them yourself. So there is a second component, an inside-out component, which is where a lot of software-driven verification comes in. You can’t write all of these test cases by hand, so you need to make sure that when it comes to the integration of components, you understand their properties, characteristics and requirements, and from those you can write tests, or generate tests, so that a particular block in the system is never used in a way that will shut the system down or that it cannot support. It may not be scalable to write all of your tests by hand, even in a graph-based fashion; while better than writing C code, the scale may still be limited. So you need the inside-out technology as well: one where you can say these are the characteristics of my blocks and then automatically generate the tests that will stress each module in the ways it is supposed to be verified.

Anderson: Questions come up in a graph world, such as: can you develop a model, and a graph is an example of a model, where you have captured the behavior in a way that makes it possible to generate all of the scenarios, including those that you may not have been able to anticipate? I think the answer is yes, but it is not easy to do. For some of those tests you may have to go back and consider whether this is a reasonable thing for the device to do and, if not, put some constraints into the model. I think it is possible to anticipate scenarios that the user might discover.
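The graph-based approach Anderson describes can be illustrated with a minimal sketch: nodes are abstract user-level actions, edges are legal follow-on actions, and enumerating walks through the graph produces scenarios, including orderings nobody wrote by hand, while a constraint predicate prunes walks the device is not meant to support. The graph, action names, and constraint below are all made up for illustration; real graph-based tools work at far greater scale and sophistication.

```python
# Toy scenario graph for a phone-like SoC (illustrative names only).
GRAPH = {
    "idle":          ["play_video", "incoming_call"],
    "play_video":    ["incoming_call", "idle"],
    "incoming_call": ["answer_call", "reject_call"],
    "answer_call":   ["idle"],
    "reject_call":   ["play_video", "idle"],
}

def walks(node, depth):
    """Yield every walk of exactly `depth` edges starting at `node`."""
    if depth == 0:
        yield [node]
        return
    for nxt in GRAPH.get(node, []):
        for tail in walks(nxt, depth - 1):
            yield [node] + tail

def legal(walk):
    # Example constraint added after review, as Anderson suggests:
    # a made-up rule that a scenario rejects at most one call.
    return walk.count("reject_call") <= 1

# Every 4-step scenario the model allows, hand-written or not.
scenarios = [w for w in walks("idle", 4) if legal(w)]
```

The point of the sketch is the inversion Anderson highlights: the engineer captures legal behavior once, and the generator enumerates scenario orderings, so unanticipated-but-legal sequences fall out of the traversal rather than out of someone's imagination.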

Goodenough: Today, people write the use cases down, from that they write a validation plan, and from that they farm out the creation of a load of test cases. These may be canned pieces of software or a set of coverage points and random stimulus. They try to meet the test plan through a combination of the two. I know what I can do, but it is the bit I can’t do that worries me. I know I will complete my validation plan and I can still have a bug escape. What I am interested in is not what I defined I was going to do, but what happened in and around it. Because of this we take a very metrics-driven approach, not to coverage but to state-space traversal, which is subtly different. The payload you are running, be it unit level and constrained random, or software at the system level, tells you what happened to the things that you care about. The science that we then apply is: what do I care about given the system I am building? And this is partly experiential. Analyze the bugs, look at them and where they came from. We have looked at cyclomatic complexity analysis, where you might be comparing two critical events in a system such as an interrupt and a cache eviction event. I can write lots of tests in UVM, or using graph-based methods or software, and the net result may be that in all of those tests the cache eviction always comes two cycles before the interrupt. If I never thought about that, the first bug in the field may be caused by the cache eviction happening two cycles after the interrupt. We need to look critically not just at what the testbench is doing but how it relates to the hardware. In this case it may mean looking at how deep the pipeline is for a write from the processor to getting posted in main memory and all the way back through the coherent network. This may be 60 cycles, so maybe you should look at your testbench and try to make it hit scenarios through that entire range.
We started using data analytics about three or four years ago to guide what the software-driven validation should do. The key is knowing what you are trying to do.
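Goodenough's eviction-versus-interrupt example can be sketched as a simple analysis over event logs: measure the signed cycle offset between the two events in each test run and bin the offsets. If every run lands in the same bin, the testbench never explored the other orderings, which is exactly the blind spot he describes. This is an assumed, minimal illustration, not ARM's actual flow; the log format, bin width, and event names are invented.

```python
from collections import Counter

def offset_bins(runs, width=2):
    """Bin the signed cycle offset between two events across test runs.

    runs: list of (eviction_cycle, interrupt_cycle) pairs, one per test.
    A positive offset means the interrupt fired after the eviction.
    """
    bins = Counter()
    for evict, irq in runs:
        delta = irq - evict
        bins[(delta // width) * width] += 1
    return bins

# Hypothetical logs: every test happens to hit the same +2-cycle ordering.
runs = [(100, 102), (250, 252), (400, 402)]
cov = offset_bins(runs)

# The opposite ordering (eviction 2 cycles *after* the interrupt) is a hole.
never_hit_reverse = cov.get(-2, 0) == 0
```

In practice one would sweep offsets across the whole pipeline depth Goodenough mentions (on the order of 60 cycles in his example) and steer stimulus toward the empty bins, which is the sense in which this is state-space traversal rather than plain coverage closure.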
