Experts at the table, part 2: Is it time to embrace big data and will it tell us that we do too much verification?
Without adequate coverage metrics and tools, verification engineers would never be able to answer the proverbial question: are we done yet? But a lot has changed in the design flow since the existing set of metrics was defined. Does it still ensure that the right things get verified, that time is not wasted on things deemed unimportant or on duplicated effort, and can it handle today’s hierarchical development methodologies? Semiconductor Engineering sat down with Harry Foster, chief scientist at Mentor Graphics; Frank Schirrmeister, group director, product marketing for System Development Suite at Cadence; Vernon Lee, principal R&D engineer at Synopsys; and Yuan Lu, chief verification architect at Atrenta. In part one, the panelists talked about the suitability of metrics, adoption rates, and the need to think about how each metric is used. What follows are excerpts of that conversation.
SE: Have we passed the buck to the users? We still don’t know how to close the gap between coverage and generating stimulus. Should we rethink part of the system to make automation more possible?
Foster: Absolutely. Everyone is working on that. There is also research using analytics to help with this. It is early stage technology and involves machine learning and data mining, but finding the outliers and exploring around them is one thing we don’t do well today.
Schirrmeister: Big data and analytics in EDA. We generate so much data from the engines that we sometimes don’t have the apps to deal with it.
Foster: And it gets worse when you go to FPGA prototypes and post-silicon. You cannot solve the coverage problem by coming up with functional coverage that can be applied at those levels. The solution has to be based on statistical coverage techniques.
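One way to picture statistical coverage, where no explicit functional coverage model exists, is to estimate from an event trace how much behavior has not yet been seen. The sketch below uses a Good-Turing estimate (the fraction of events observed exactly once) on an invented trace; it illustrates the idea only and does not describe any particular tool.

```python
from collections import Counter

def unseen_coverage_estimate(event_log):
    """Good-Turing estimate of the probability that the next sample
    exercises a behavior never seen before in this trace."""
    counts = Counter(event_log)
    n = len(event_log)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n

# Hypothetical trace of abstract events from a post-silicon run.
trace = ["reset", "rd", "wr", "rd", "irq", "wr", "rd", "dma"]
print(round(unseen_coverage_estimate(trace), 3))  # → 0.375
```

As the estimate approaches zero, additional random stimulus is unlikely to exercise anything new, which is a statistical, model-free signal of saturation.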
Schirrmeister: Another dimension to all of this is software. Consider the amount of data we generate from emulation, where hardware trace is synced with software, and you want to run for a second of time using offline debug – that generates on the order of hundreds of gigabytes, or a terabyte, of data. That is a large data-handling problem, and then you have different disciplines that want to look at it. The hardware guy understands that something happened to this register that got rewritten, but he doesn’t know that the software guy switched him off and put him into a low-power state – those interactions are more difficult to cover at the system level.
Lee: Even if there were a magic solution that could use data mining to automatically close the loop, it still wouldn’t mean anything if the coverage model is no good.
Foster: You have to capture some domain knowledge about the architecture. You still have to think. Data mining only works if you know what questions to ask.
Lee: First ask why you are using coverage. If the answer is that it is the standard thing that people do, then there is an issue.
Schirrmeister: Development teams are uneasy because they don’t really know how far along they are. They think in terms of confidence intervals – 90% confident that I am there. Coverage is part of it.
SE: The fact that so few chips fail probably means that people are doing too much verification. Are we taking enough risk?
Schirrmeister: How do you define whether a chip fails? First-time-right silicon means that software has to follow all of the errata. Successful chips may have problems, and in the strict sense they are failures. We see this a lot. This comes back to what does not need to be covered: which areas, from a functional perspective, can I live with not fully working?
Lee: We need to define the threshold. Are the corner cases small enough, and can I describe what is wrong? Even if there is a major flaw, if you can describe it and the workaround, then that is successful silicon.
Schirrmeister: Exactly and this is why there are not many failed chips.
Foster: In 2007 the signoff criterion was, ‘Did I execute this set of tests?’ It has shifted to where coverage is now more important and says, ‘Did I achieve this coverage and were these tests run?’
Lee: There are still many major design companies whose signoff remains more about the tests. The coverage is secondary. These companies are using scenarios with randomization in them.
SE: But processor people found functional coverage to be inadequate and defined their own metrics.
Schirrmeister: Yes, it depends on the domain, and you need to know the application domain in order to write the test that represents the end goal. Assertions haven’t caught up. The complexity that people deal with is beyond what one person can understand. Consider a power expert who has to think about the power regions on a chip and which of them can be shut down, and then another guy who is a coherency expert on some cache interconnect. If you attempt to shut down part of the cache, you need a way to combine these two sets of information, because someone has to think about problems that span both domains when defining the tests. If the cache has a problem because a power domain is switched off at the wrong time, and nobody has created that test case, then I could be in trouble. A software workaround may require flushing all of the caches, and that is a costly operation.
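The gap Schirrmeister describes can be made concrete: if each expert contributes a list of the states in their own domain, crossing the two lists enumerates the interaction scenarios that neither expert’s individual test plan may cover. A minimal sketch, with invented state names and an invented test plan:

```python
from itertools import product

# Hypothetical state lists contributed by two different experts.
power_states = ["on", "retention", "off"]
cache_ops = ["read_hit", "read_miss", "writeback", "snoop"]

# Scenarios the existing test plan already exercises (illustrative).
tested = {("on", "read_hit"), ("on", "read_miss"),
          ("on", "writeback"), ("on", "snoop")}

# Cross the two domains and flag interaction scenarios nobody tests,
# e.g. a snoop arriving while the cache's power domain is off.
untested = [s for s in product(power_states, cache_ops) if s not in tested]
print(len(untested))  # 12 combinations in the cross, 4 tested → 8 gaps
```

The cross product grows quickly with more domains, which is exactly why nobody writes the power-off-during-snoop test by hand: enumeration has to be automated before it can be prioritized.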
Lu: SoC coverage is an interesting problem. If I think about code and functional coverage at the block level and then integrate those blocks into a system, I don’t want to be redundant. I don’t want to see coverage points that are unreachable at the system level, because that is one of the reasons people refuse to use code or functional coverage. Secondly, you need it to be automated. Code coverage is slightly easier: if we do data mining on the IP block, we can migrate this information to the SoC level, and it will tell me which things have already been covered at the IP level and don’t need to be repeated.
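As a sketch of the migration Lu describes, IP-level results can be subtracted from the SoC-level target so only the remaining bins need system-level stimulus. The data model here is invented for illustration; real coverage databases are tool-specific.

```python
# Hypothetical coverage bins, keyed by (block, item).
ip_covered = {("uart", "tx_fifo_full"), ("uart", "rx_parity_err"),
              ("dma", "burst_len_max")}
soc_target = {("uart", "tx_fifo_full"), ("uart", "rx_parity_err"),
              ("dma", "burst_len_max"), ("dma", "abort_mid_burst"),
              ("soc", "uart_dma_concurrent")}

# Bins already closed at the IP level need no SoC-level repetition;
# what is left is the true system-level verification obligation.
remaining = soc_target - ip_covered
print(sorted(remaining))
```

In this toy example only two bins survive the subtraction, one of which is a cross-block scenario that could never have been closed at the IP level anyway – which is Foster’s point about interactions in the next exchange.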
Lee: Do you mean observing a subsystem or block and turning that into a system-level coverage model?
Foster: The problem with that is that there are interactions that you cannot capture that way.
Lee: If you have one finite state machine here and another over there, in separate blocks, you need to be able to capture their interactions.
Schirrmeister: The concept is valid. There are three levels of reuse. The first is reuse of IP-level coverage and tests – vertical reuse.
Lu: Yes, that is what I am thinking.
Schirrmeister: Then there is horizontal reuse between engines from virtual prototypes through FPGA systems, and finally there is reuse between users. Someone focused on power is looking at a different domain compared to someone looking at coherency or performance. The Accellera Portable Stimulus Working Group is attempting to cover all three. So we do want to ensure there is reuse from IP to system, but the guy verifying the IP may not have foreseen all of the use cases.
Foster: I agree, but when reusing coverage you run the risk that some of it is unreachable at the system level. That may be a risk worth taking.
Lu: The fundamental difference between SoC coverage and IP coverage is that at the IP level all coverage is of equal importance. At the SoC level the weights change, and things like scenarios become high-value coverage.
Lee: Is this intuitive or measurable?
Lu: Measurable. Consider a loop-back test. This has high value because it touches a lot of blocks and gives me confidence that the whole system works – at least for that path, and for that one packet. Often people write coverage at the interface level, but this is low value.
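One way to make Lu’s weighting measurable – a sketch with invented names and numbers, not a standard metric – is to score each coverage item by how many blocks its scenario correlates:

```python
# Hypothetical coverage items with the set of blocks each one exercises.
items = {
    "uart_if_toggle": {"uart"},                       # interface-level
    "dma_reg_access": {"dma"},                        # block-level
    "loopback_pkt":   {"uart", "dma", "noc", "mem"},  # scenario-level
}

# Weight each item by the number of blocks it touches; a loop-back
# packet crossing four blocks outweighs a single interface toggle.
weights = {name: len(blocks) for name, blocks in items.items()}
best = max(weights, key=weights.get)
print(best, weights[best])  # → loopback_pkt 4
```

Under this scoring, closing one scenario-level item buys more system-level confidence than several interface-level items, matching Lu’s loop-back example.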
Foster: Exactly – the latter example does not correlate all of the activity, or the distribution of state.
Lu: Scenario-based SoC coverage is extremely high value, but very difficult.
Schirrmeister: Users are not always willing to invest the time and this is where automation comes in.
The final part of this roundtable will explore the work of the Portable Stimulus Working Group and new coverage metrics that may be required.