Reliability concerns throughout a device's lifetime are driving fundamental changes in where and when verification and test are performed.
While the disciplines of functional verification and test serve different purposes, their histories were once closely intertwined. Recent safety and security monitoring requirements, coupled with the monitoring capabilities now being embedded into devices, are bringing them closer together again. But can the two disciplines cooperate successfully enough to improve both? Getting there may be difficult.
Three phases in a system's lifetime have to be considered: pre-silicon design and verification, manufacturing test, and operation in the field.
The first two phases have become highly specialized disciplines, and the amount of contact between them is minimal today. What is changing, as indicated in figure 1, are the capabilities available for system monitoring. More applications require continual checking that a system is operating correctly and remains up to date on security. That requires the ability to update devices in the field, and the continuation of post-silicon functional verification and validation, including performance and power consumption, while the devices are in operation.
Fig 1: Overlap between product phases. Source: Semiconductor Engineering.
“Many applications, like automotive or avionics, have lifespan expectations measured in decades,” says Ryan Ramirez, strategic marketing manager for Siemens EDA. “There is a lot that changes over that time, from over-the-air updates that improve functionality, to increased likelihood of device degradation due to environmental effects over that period. All of these alter the functionality, and thus we will need to continue to monitor for correctness as these changes happen. It is also unreasonable to think a design can be fully verified upfront. Not only are designs increasingly complex, but new requirements like safety and security are impossible to fully understand upfront because you are attempting to verify the unknown space. As a simple example, you can verify that a design is free of known security threats at that time, but new vulnerabilities are constantly being uncovered, and you need a way to keep monitoring for those.”
How did we get here?
Specialization has created this separation. “I don’t think people are really thinking about this in terms of commonality,” says Simon Davidmann, CEO of Imperas Software. “There are definitely overlaps and everybody has the same sort of goals, but they have evolved techniques to solve different types of problem. It’s all about quality. But each team has different requirements for quality, whether it’s getting the specification right, whether it’s the design to the specification, whether it’s the fabrication to the spec, or whether it’s actually the lifetime of the device in the field. People are getting more confused as the scope of verification is widening to include verifying for vulnerabilities. It is actually fragmenting more because everything is becoming specialized in some ways.”
Test and verification are different when it comes to notions of time and abstraction. “In verification you can keep everything at a higher abstraction level,” says Juergen Jaeger, product management group director at Cadence. “In testing, you are dealing with physical effects and you have to deeply understand the characteristics of this cable or this trace between point A and point B. Also, in testing every millisecond counts. Test has to be very efficient in terms of run time, and it needs a very good understanding about coverage. You don’t want to re-test something that has been tested because that is a waste of time and it costs money. In verification, we don’t seem to have that concern. Optimizing a testbench or test suite is not seen as being as important. There could be a lot of opportunity to enhance that significantly.”
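Jaeger's point about not re-testing what has already been covered can be made concrete with a small sketch. The snippet below is purely illustrative (the test names and coverage bins are invented, not drawn from any particular tool); it prunes a regression suite greedily, keeping only tests that contribute new coverage.

```python
# Minimal sketch of coverage-driven test-suite pruning, assuming each test's
# hit coverage bins are already known (names and data here are hypothetical).
def minimize_suite(coverage_by_test):
    """Greedy set-cover: keep only tests that add new coverage bins."""
    selected, covered = [], set()
    # Try the highest-coverage tests first so redundant ones get dropped.
    for test, bins in sorted(coverage_by_test.items(),
                             key=lambda kv: len(kv[1]), reverse=True):
        new_bins = set(bins) - covered
        if new_bins:                      # this test contributes something new
            selected.append(test)
            covered |= new_bins
    return selected, covered

suite = {
    "smoke_boot":  {"reset", "boot", "uart_tx"},
    "uart_stress": {"uart_tx", "uart_rx", "uart_overrun"},
    "uart_basic":  {"uart_tx", "uart_rx"},          # fully redundant
}
kept, total = minimize_suite(suite)
print(kept, len(total), "bins covered")
```

A greedy pass is only a rough heuristic; a production flow would also weigh bins by importance and tests by runtime cost, but the principle of spending runtime only where coverage is added is the same.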
When monitoring something in the field, time continues to be important, but there is no tester present to help you. It all has to be done internally. “The ASIL spec says you need to be able to find out if something is broken really quickly,” says Colin McKellar, vice president of verification platforms for Imagination Technologies. “But because these are genuinely complicated chips, people struggle to get good enough answers in the required timeframe. Self-test needs to run very quickly and needs to ensure that your logic is functionally sound and hasn’t broken or had any transient faults. This is also done by a separate test team today.”
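As a rough illustration of the timing pressure McKellar describes, the sketch below runs a placeholder self-test against an assumed per-pass time budget. The interval value, function names, and behavior are invented for illustration and are not taken from any ASIL standard.

```python
# Illustrative sketch only: a periodic in-field self-test pass that must
# complete within an assumed diagnostic time budget (values are invented).
import time

DIAGNOSTIC_INTERVAL_S = 0.010   # assumed budget for one self-test pass

def run_logic_bist():
    """Placeholder for kicking off a hardware logic BIST and reading the result."""
    return True                  # True = signature matched, no fault found

def self_test_pass():
    start = time.monotonic()
    healthy = run_logic_bist()
    elapsed = time.monotonic() - start
    if not healthy:
        raise RuntimeError("logic fault detected, enter safe state")
    if elapsed > DIAGNOSTIC_INTERVAL_S:
        raise RuntimeError(f"self-test overran its budget: {elapsed:.4f}s")
    return elapsed

print(f"self-test passed in {self_test_pass()*1e3:.2f} ms")
```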
Safety may be the driver for integration. “Work is being done in some of the standardization committees to formalize how safety data is shared throughout the supply chain,” says Siemens’ Ramirez. “In the future, you will see more standard approaches to sharing information throughout the supply chain, as different stakeholders will need proof of testing at different points along the lifecycle. There are technologies being deployed to enable traceability between requirements and functional verification results. These techniques can sometimes bring in results from other testing approaches if you have a coverage database that is open and can accept any type of user defined test data.”
It is here that the need for commonality starts. “With ASIL-D, people get hung up on this idea of random bit flips to do with photons,” says Imperas’ Davidmann. “But it’s about how you can trace back flaws.”
Defining behavior
Behavior ties functional verification and device monitoring together. “The first place I see these coming together is with requirements and traceability down to the testing results,” says Ramirez. “Product requirements drive testing requirements, which drive test plans. As your product is being tested, the results should be incorporated back into the test plan and ultimately linked to the requirements to provide end-to-end traceability. We are seeing automation for this at the functional verification level, and I anticipate this will continue as you move to other testing techniques. Functional verification is the obvious starting point because of the vast amount of coverage data generated from regression testing. That makes it impossible to manually link verification results to requirements.”
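One way to picture the traceability Ramirez describes is a simple cross-reference from regression results back to requirement IDs. The sketch below is hypothetical (requirement IDs, test names, and tags are invented) and simply flags requirements that are untested or failing.

```python
# Hypothetical sketch of closing the loop from requirements to test results.
requirements = {"REQ-001": "Reset releases within 10 cycles",
                "REQ-002": "UART recovers from overrun",
                "REQ-003": "Secure boot rejects unsigned images"}

# Each regression result carries the requirements it claims to exercise.
results = [
    {"test": "reset_seq",   "status": "pass", "reqs": ["REQ-001"]},
    {"test": "uart_errors", "status": "fail", "reqs": ["REQ-002"]},
]

coverage = {rid: [] for rid in requirements}
for r in results:
    for rid in r["reqs"]:
        coverage.setdefault(rid, []).append((r["test"], r["status"]))

for rid, desc in requirements.items():
    hits = coverage[rid]
    verdict = ("UNTESTED" if not hits else
               "FAIL" if any(s == "fail" for _, s in hits) else "PASS")
    print(f"{rid}: {verdict:8s} {desc}")
```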
Formal verification has forced functional verification to look more broadly at coverage metrics. “High reliability hardware requires a combination of functional verification technologies,” says Rob van Blommestein, head of marketing for OneSpin Solutions. “Simulation is the most widely used method, but simulation alone cannot get the job done. Formal adds a second efficient method for thorough verification because of its exhaustive nature. Only after understanding where the coverage holes are can exhaustive verification be achieved.”
We have to keep in mind that the goals are different. “If you’re doing design verification, then your starting point is a specification — some kind of documentation that says this is what it is supposed to do,” says Cadence’s Jaeger. “As a verification engineer, your goal is to make sure the design itself functions within the parameters of the specification. For testing, the functionality is set. You have to be way more creative in the verification phase because you have to come up with all kinds of scenarios. That work is already done once you get to testing. At that point you know what it is supposed to do. Then you want to make sure that it keeps doing that during the lifecycle of the product.”
This leads to different approaches. “There are similarities in terms of how people try to look at things, such as fault coverage,” says Imagination’s McKellar. “But most people in the test areas seem to look at it from a statistical analysis point of view, as opposed to a functionality point of view. Going from a functional verification point of view down into functional safety, down to manufacturing failure rates, is challenging. For that reason, redundancy plays a large role.”
Analysis often requires injecting faults and then determining whether those faults can be detected. "That happens both in the test world and the functional verification space," says Davidmann. "Consider the challenge of creating a compliance suite for RISC-V. We basically inject faults, such as register stuck values, run the compliance suite, and write out the signature. We are basically doing mutation testing. What we found were tests where an input change didn't propagate to the output. Mutation testing looks at the quality of the tests. It has nothing to do with the device. It shows you which functional coverage should be discarded because the result never went into the signature. So a lot of the testing was a waste of time."
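A stripped-down version of that mutation-testing idea is sketched below: a toy design model with an injected stuck-at fault, and a check for which tests actually see the fault in their signature. The model and tests are invented for illustration.

```python
# Minimal mutation-testing sketch: force a register bit stuck at zero, rerun
# the tests, and flag any test whose signature does not change.
def alu(op, a, b, stuck_bit=None):
    if stuck_bit is not None:            # injected fault: bit forced to 0 in operand a
        a &= ~(1 << stuck_bit)
    return (a + b) & 0xFF if op == "add" else (a ^ b) & 0xFF

tests = {
    "add_small": ("add", 0x01, 0x02),    # never exercises bit 7
    "add_large": ("add", 0x80, 0x01),
    "xor_walk":  ("xor", 0xAA, 0x55),
}

def signature(stuck_bit=None):
    return {name: alu(op, a, b, stuck_bit) for name, (op, a, b) in tests.items()}

golden, faulty = signature(None), signature(stuck_bit=7)
for name in tests:
    detected = golden[name] != faulty[name]
    print(f"{name}: {'detects' if detected else 'MISSES'} stuck-at-0 on bit 7")
```

Tests whose signatures never change under the injected fault are exactly the ones Davidmann describes as wasted effort, and the coverage they claim deserves a second look.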
Common points
Fault analysis creates a common point. “Another related area is automatic test pattern generation (ATPG),” says Jaeger. “If you try to come up with a test pattern that exposes potential issues, those two are pretty tightly linked together when it comes to the functional tests. Fault insertion is probably the area where, from my perspective, there is the most progress, and the most standardization efforts.”
There also are limitations in manufacturing test. “With design for test (DFT), we are seeing more functional tests being created to get to zero defects, because ATPG by itself isn’t sufficient,” says Ramirez. “Fault simulation is performed to grade the effectiveness of those functional tests. The coverage results are getting combined across ATPG and functional fault grading in order to give a single view of their test effectiveness.”
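A rough sketch of that combined view, assuming each flow simply reports the set of fault IDs it detected (the fault list and numbers here are fabricated):

```python
# Sketch of merging fault-detection results from ATPG and functional fault
# grading into a single coverage view.
all_faults = {f"F{i:03d}" for i in range(10)}           # the full fault list
detected_by_atpg       = {"F000", "F001", "F002", "F003", "F004", "F005"}
detected_by_functional = {"F004", "F005", "F006", "F007"}

combined = detected_by_atpg | detected_by_functional
def pct(s): return 100.0 * len(s) / len(all_faults)

print(f"ATPG only:       {pct(detected_by_atpg):.0f}%")
print(f"Functional only: {pct(detected_by_functional):.0f}%")
print(f"Combined view:   {pct(combined):.0f}%")
print("Still undetected:", sorted(all_faults - combined))
```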
Functional verification also is being extended beyond sign-off. “Functional tests are often looking for things like performance or latency, integration between systems, error handling, etc.,” adds Ramirez. “You need more than pass/fail to help triage this, so you build in additional tracking information to help understand why something is failing and how to reproduce it in a higher-fidelity environment. However, much of this is non-standard, and thus can be difficult to re-use and combine with other testing results. The other challenge with functional tests is getting real-world use case stimulus to test under the right conditions. This is again where something like a digital twin, which we are seeing for autonomous driving, can help create real use-case scenarios for performing functional tests before you build the vehicle or do road testing.”
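The kind of tracking information Ramirez mentions can be as simple as a structured record carried alongside pass/fail, so a failure seen in the field or on an emulator can be triaged and replayed later. The fields below are illustrative, not any standard format.

```python
# Hedged sketch of recording more than pass/fail with each functional test.
from dataclasses import dataclass, asdict
import json

@dataclass
class FunctionalResult:
    test: str
    status: str            # "pass" / "fail"
    seed: int              # lets the run be reproduced in simulation
    latency_us: float      # the metric the test was actually checking
    scenario: str          # real-world use case the stimulus modeled
    failure_note: str = ""

r = FunctionalResult("dma_burst_latency", "fail", seed=0xC0FFEE,
                     latency_us=812.4, scenario="camera_pipeline_4k",
                     failure_note="latency budget of 500us exceeded")
print(json.dumps(asdict(r), indent=2))   # ready to push into a results store
```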
Built-in monitoring provides new opportunities. It allows you to measure actual system performance or power consumption, and that data can be collected and transmitted back to the company that developed the system. “Functional verification has not done a good job of connecting with software, but we are making progress,” says McKellar. “There are safety features in the hardware that are above basic error checking and correction around memory, and there are features that are a combination of software control and specific hardware. For example, you may have a range of checksums that indicate if you are good. The software can detect these, and then runs a frame with a known good value as a reference.”
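A minimal sketch of that checksum scheme, assuming a deterministic reference frame and a golden value captured when the device was known to be good (both are fabricated here):

```python
# Illustrative frame self-check: recompute a checksum over a known reference
# frame and compare it against a stored golden value.
import zlib

def render_reference_frame():
    """Placeholder for the hardware producing a frame from fixed inputs."""
    return bytes(range(256)) * 16        # deterministic dummy frame

GOLDEN_CRC = zlib.crc32(bytes(range(256)) * 16)   # captured when known-good

def frame_self_check():
    crc = zlib.crc32(render_reference_frame())
    if crc != GOLDEN_CRC:
        raise RuntimeError(f"frame checksum mismatch: {crc:#010x}")
    return True

print("reference frame check:", "OK" if frame_self_check() else "FAIL")
```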
New functional verification methodologies use generated software that runs on the embedded processors to exercise the design and check the results. These often rely on micro-kernels that facilitate the verification, together with either real drivers or specially modified test drivers.
One example of this is produced by Valtrix Systems. “We have bare-metal software specifically designed to serve as a platform for the design verification of IP/SoC implementations,” said Shubhodeep Roy Choudhury, CEO of Valtrix Systems. “The software stack consists of test generators, checkers, device drivers and a lightweight kernel, which can be configured into a portable program as per the needs of the verification environment. A consistent execution environment and reliable failure reproduction makes it easy to recreate a post-silicon failure in pre-silicon tools. In addition, debug hooks and mechanisms implemented in the tool allow rapid failure debug and resolution.”
Testing from the inside-out is becoming a widespread practice. “Sometimes you load a specialized kernel, and that may be used for diagnostics,” says Jaeger. “We ship test programs with our emulation products, with a runtime environment around it. The user can, at any time, run these test programs on the system. They tell you that this cable is not plugged in, or this wire on that cable is broken, but also things like if there is a bad contact or if you have the wrong cable — and, of course, defects in the individual components themselves.”
Some tester companies are beginning to notice. "There are a few large tester companies that can take in waveform information, or functional design information, and use that as a starting point to develop test programs," says Jaeger. "It's very opportunistic today, and there is no industry-wide effort behind it, but there are the beginnings of it: developing test programs from information that was already created and made available during the design verification of a chip or product."
In the functional verification space, work has been done to bring together coverage data from very disparate tools. “Common and open coverage databases that were built for functional coverage can be used to collect any kind of test information and then used to help paint a holistic view of the testing,” says Ramirez. “That information is being integrated into requirements management tools to provide closed-loop traceability to prove that every requirement has been tested. This is not only important for safety critical applications, where it’s a requirement, but it’s becoming increasingly used by anyone doing chip development because it’s a good design practice that can help you identify gaps much sooner, and that ultimately reduces costs.”
Conclusion
Can we expect more cooperation in the future? “Test and verification are very different domains,” says Jaeger. “But there are opportunities for each domain to learn best practices from the other domain.”
The teams are a long way apart today. "There is actually more specialization happening today," says Davidmann. "Is there overlap in these techniques and goals? The goal is about quality. While they may use a lot of the same techniques, the end usage model and the way the tools evolve target different goals."
But change also can be a driver. “Change will come from some of the real-time monitoring and how it relates to the digital twin,” says Ramirez. “I think we will see dynamic monitors that will sense threats, or at least that something is abnormal, and send that back to the digital twin to perform a full simulation where it can predict the failures that are to come. This obviously involves significant changes in infrastructure to filter and send important data from the edge devices back to the cloud where the digital twin can utilize it. This is going to be a really interesting revolution for how we test.”
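A speculative sketch of that edge-side filtering, with thresholds, sensor names, and buffering all assumed for illustration:

```python
# Keep only readings that look abnormal and batch them for the cloud-side
# digital twin; everything else is filtered out at the edge.
from collections import deque

NORMAL_RANGE = {"die_temp_c": (0, 95), "core_vdd_mv": (870, 930)}
upload_queue = deque(maxlen=1000)        # bounded buffer awaiting a sync window

def monitor(sample):
    """Queue a sample only if any field falls outside its expected range."""
    abnormal = {k: v for k, v in sample.items()
                if k in NORMAL_RANGE
                and not (NORMAL_RANGE[k][0] <= v <= NORMAL_RANGE[k][1])}
    if abnormal:
        upload_queue.append({"sample": sample, "flags": sorted(abnormal)})

monitor({"die_temp_c": 71, "core_vdd_mv": 902})   # normal: filtered out
monitor({"die_temp_c": 104, "core_vdd_mv": 899})  # over-temperature: queued
print(len(upload_queue), "samples waiting for the digital twin")
```

However simple, the pattern is the one Ramirez describes: filter at the edge, then let the digital twin in the cloud do the heavy simulation.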