High-quality and efficient verification requires a focus on details.
Verification is undergoing fundamental change as chips become increasingly complex, heterogeneous, and integrated into larger systems.
Tools, methodologies, and the mindset of verification engineers themselves are all shifting to adapt to these new designs, although with so many moving pieces the changes are not always easy to follow. Ferreting out bugs in a design now requires a multi-faceted, more holistic approach, particularly with AI/ML, safety-critical designs, and the proliferation of multi-chip packages. As a result, verification engineers need a deeper understanding not only of the problems that are found, but also of those that are not, shifting the verification task both left and right.
This requires a certain mindset for verification engineers, as well as a willingness to adapt to the new design landscape. “You have to be a person that likes to understand how things work,” observed Hagai Arbel, CEO of Vtool. “This is engineering, in general, but to be able to take apart or to want to take apart a machine and understand how all of it works is one fundamental. This also comes with a sense of responsibility. Good verification engineers are the kind of people who say, ‘It will work. Don’t worry, I have it covered.’”
Success also needs to be tied to a solid understanding of the overall verification challenge — something that has been less defined in the past due to divide-and-conquer approaches and the short lifespan of most chips.
“What’s missing is a definition of verification quality, and a definition of when the verification is done,” Arbel said. “Automotive standards are contributing to more stringent quality definitions because nobody can live with a scenario in which a chip company does the best it can, but if the chip doesn’t work, the customer will not buy it, and it’s the developer’s problem. If you run out of time, and you’re going to miss the market window, you will take a bigger risk, but you should be able to say, ‘Tool X gave me a grade of 82 out of 100, and I will still go to tapeout.’ Or to say, ‘No. We are only going out with 95 and above.’ Functional verification and code coverage wants to give you all of that, but they are not enough. What is needed is a good way to formalize the logging of information and the ability to make conclusions, because you can only remember so much.”
In the same vein, Juergen Jaeger, product management group director at Cadence Design Systems, said the qualities that make a great verification engineer include being very organized. “In verification, it’s all about structure, discipline, accurate documentation, and the desire for perfection.”
Great verification engineers are also very curious, have a desire to understand the to-be-tested product, and also have a desire to understand how an end-user would use the product, he said. “Great verification engineers are persistent and have a mindset of never giving up in finding elusive bugs. Creative and courageous in finding new ways of doing things, no matter if it has been done before or not. And, they are not obsessed with perfection. They know when enough is enough, and that nothing will ever be 100% perfect.”
Fig. 1: Zeroing in on an endless loop bug. Source: Vtool
Safety-critical markets such as automotive add other important elements to the verification process. One is a longer projected lifetime, which means that latent bugs may become serious problems as devices age. Bugs that may go unnoticed in a consumer device may cause injury, death, and liability issues in a car or medical device. That demands a level of attention to detail, and a focus on finding hidden bugs, that was previously less vital.
“For safety-critical systems such as avionics, safety is of the utmost importance,” said Louie De Luna, director of marketing at Aldec. “An FPGA chip with a Design Assurance Level (DAL) A has 10⁻⁹ probability of failure per flight hour. It’s extremely improbable, but a failure of this FPGA is classified as catastrophic and would prevent the safe flight and landing of the aircraft, resulting in fatalities of all occupants. Just having a mindset to verify such functions that would produce extremely improbable conditions is insufficient. What’s needed is for the entire team to adopt an appropriate culture where they can follow well-structured best-engineering practices, such as those defined in the DO-254 specification.”
What gets verified
The first step in any verification process is understanding the problem that needs to be addressed.
“For verification quality, one of the first things a verification team has to do is define what is being verified, and what the metrics are,” said Simon Davidmann, CEO of Imperas. “As the famous quote states, ‘If you don’t measure it, you can’t improve it.’ You’ve got to measure what you do in detail. Good verification teams sit and write a verification spec right upfront. When they look at the problem, they determine what they have to do, what they have to specify, what they have to verify. They’ll study the specifications before they pick anything up. If the spec says it can be A or B when they’re writing the verification plan, it’s got to be tested at A, it’s got to be tested at B, etc., and they write down the details of what they’re doing.”
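As a rough illustration of the "A or B" point, the sketch below tracks which spec-allowed values a set of tests has actually exercised. It is a minimal Python sketch, not any team's real flow: the `PlanItem` class, the `mode` and `burst_len` features, and the sampled values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PlanItem:
    """One spec feature and every legal value the plan says must be tested."""
    feature: str
    required: frozenset
    observed: set = field(default_factory=set)

    def record(self, value):
        # Only count values the spec actually allows for this feature.
        if value in self.required:
            self.observed.add(value)

    def missing(self):
        return sorted(self.required - self.observed)


def coverage_percent(plan):
    """Share of required (feature, value) pairs seen at least once."""
    total = sum(len(item.required) for item in plan)
    hit = sum(len(item.observed) for item in plan)
    return 100.0 * hit / total if total else 100.0


# Hypothetical plan: the spec allows mode A or B, and burst lengths 1, 4 or 8.
plan = [
    PlanItem("mode", frozenset({"A", "B"})),
    PlanItem("burst_len", frozenset({1, 4, 8})),
]

# Pretend these values were sampled from test runs.
for feature, value in [("mode", "A"), ("burst_len", 1), ("burst_len", 4)]:
    for item in plan:
        if item.feature == feature:
            item.record(value)

# Features that still have untested legal values.
holes = {item.feature: item.missing() for item in plan if item.missing()}
```

Here `holes` lists mode B and burst length 8 as untested, which is the kind of written-down, measurable gap the plan is meant to expose.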
As the verification plan is being written, the team needs to think carefully about the specifications. “When they work out a plan, they will tend to review that plan and work out what the metrics are. Then they’ll start putting those things into place, including specifying what it is that they will try to measure. Then they’ll try to put the measurements in place, and start on the testing to see if they get there,” he explained.
There are likely different teams for test and design, so the spec is interpreted by both to check for correctness. If it’s a processor, there may be a third team doing the toolchains, as well, so they’ll end up with three different sets of eyes on it. The goal in all cases is functional coverage, even though the level of coverage and the metrics used to define it can vary greatly by application, as well as from one component of a system to the next. But understanding where to draw the line requires a deep understanding of potential problems and what can cause them.
“There are two aspects of verification quality,” said Philippe Luc, director of verification at Codasip. “One is to cover everything to ensure that the system is working. The other requires a white hat hacker to try to break the design before the customer does so that the bug is found before the customer finds it.”
A design is like a metal brain teaser, he said. There is a way to break it, and it can be figured out. “We can say there is a bug in this design because there is a way to detach them. If a project manager puts more pressure on the delivery, more pressure to have 100% test passing, what a young engineer might do is just make sure to write some tests. Even if they achieve 100% coverage, they may conclude the brain teaser can’t be separated. Similarly, every design is a brain teaser, and you have to break it. You have to break it, and then there is a very simple solution to unlock it. And the goal of the verification is to try to find where the design is broken. Go ahead and break it. Then you will find a solution.”
Experience counts
While a verification plan in black and white is critical, there is a less-obvious sense that many verification managers and engineers obtain over time. It’s more like a verification hunch, a sixth sense, or feeling of where to look for bugs and other issues. This is difficult to describe, let alone teach to other engineers, but it’s a trait that great verification engineers have.
“In the office, we have four different brain teasers,” Codasip’s Luc said. “This is a way to develop the critical spirit to say, there are bugs everywhere, you have to find them. In order to develop that, there is a trick I will share with you. It is just asking the designer how the block is working. You get information on where there’s a FIFO, where there’s an arbiter, etc. You listen to that and start to ask questions. ‘What happens if there are two requests at the same time on this arbiter?’ They might answer, ‘When this happens or this happens or this happens, then it works well.’ Or the designer may say, ‘When there are two requests, I don’t know. Perhaps it will behave like this.’ This is a sign that the designer doesn’t understand the big picture. As a verification guy, I keep that in mind, and it means I probably want to test more with arbiters.”
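The arbiter question can be made concrete with a small behavioral model. The sketch below is an assumption-laden illustration in Python, not a description of any real design: it assumes a round-robin policy, and the `RoundRobinArbiter` class and both check functions are hypothetical names. It tries every request pattern, including simultaneous requests, from every internal priority state.

```python
from itertools import product


class RoundRobinArbiter:
    """Toy behavioral model: grants one requester per cycle, rotating priority."""

    def __init__(self, n):
        self.n = n
        self.last = n - 1  # so requester 0 has highest priority at reset

    def grant(self, requests):
        """Return the index granted this cycle, or None if nobody requests."""
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx
                return idx
        return None


def check_all_request_patterns(n=3):
    """Try every request pattern from every internal priority state."""
    for last, requests in product(range(n), product([False, True], repeat=n)):
        arb = RoundRobinArbiter(n)
        arb.last = last
        g = arb.grant(requests)
        if any(requests):
            # A grant must be issued, and only to someone who asked.
            assert g is not None and requests[g], (last, requests, g)
        else:
            assert g is None, (last, requests, g)
    return True


def check_no_starvation(n=3, cycles=12):
    """With everyone requesting every cycle, grants must rotate evenly."""
    arb = RoundRobinArbiter(n)
    grants = [arb.grant([True] * n) for _ in range(cycles)]
    return all(grants.count(i) == cycles // n for i in range(n))
```

A model this small can be checked exhaustively. On a real block the same questions would more likely be posed as constrained-random tests or formal properties, but the intent is the same: the "two requests at once" case is tested explicitly rather than assumed.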
Unfortunately, there’s no good way to learn from one process to the other. “This hunch is somehow related to a kind of déjà vu,” said Vtool’s Arbel. “It is similar to being on a street in a familiar place. You need to find a particular store. You know where to go, but if you had to explain it and the other person doesn’t know the place, you will have no idea what you did. Similarly, in chip design, because there are a lot of similarities in chip functionalities, the bugs that are being created can be grouped. But with a lack of good methodology of how to log that, and how to remember that, in order to really benefit from it you need to see a lot of them. Then you develop this kind of hunch. It’s simply that you saw enough, and you have a good enough database in your head. Even though you’re not working with conclusion or logic, necessarily, you simply know. To some extent, it was burned in your memory.”
Experience weighs heavily in verification, and it can have a significant impact on where verification engineers spend their time — and equally important, where they don’t. “As a verification engineer or manager, over time you realize the things that have been overlooked, and you get a feel for where people are going to overlook things,” said Imperas’ Davidmann. “In the RISC-V CPU world, there have been a lot of people spending time worrying about the ISA definitions. But actually, the issues and bugs that are found aren’t really in the ISAs, they’re in all the complex asynchronous bits that interact with other things. Yes, there are going to be bugs in the ISA’s simple instruction set, but those things are relatively simple in comparison to the three-dimensional, time-driven asynchronous aspects that people can’t really get their heads around because it’s all over the place.”
Additionally, when a bug is found, a smart verification team will look around it for the other bugs. “How come that bug wasn’t found by the other methodologies? Some people do it wrong,” said Davidmann. “They call it debugging designs into existence, where they’ll fix this bug, then fix the next bug. That’s not the way to do it. When you find a bug you’ve got to say, ‘How come that bug wasn’t designed out in the design process? How come compilers didn’t find it? How come it got to me? What could we do better?’ Rather than fix the bug, what you should do is fix the upstream processes and things like that to make sure the bug doesn’t happen in the first place because what happens is that bugs typically cluster. There are other routes close to it that have come through to it. Solving this is an art, not a science.”
At the same time, Cadence’s Jaeger does not agree that verification experts often develop a hunch of where problems may occur. “Verification is not about following a ‘hunch.’ Verification is about discipline, logical thinking, deduction, statistical probabilities, and mathematics. You might get lucky occasionally following a ‘hunch,’ but you will not be able to repeat it predictably and reliably. And you will not be able to teach ‘following a hunch.’”
Preventing bugs in the first place
So how do DV teams get to the point where they understand how the bug came into existence in the first place, and how to prevent it?
“There are three different aspects to this,” said Neil Hand, director of marketing for IC verification solutions at Siemens EDA. “First, there are tools that prevent you from even getting the error there in the first place. We’ve had formal tools for a long time. We’re now working on linting tools. There’s also the idea of a platform meant for designers, taking verification tools — but without the need to understand verification — just to eliminate bugs before they even get into the design. Run early, run often, don’t worry about constraints, don’t worry about lots of false failures, just get the low-hanging fruit out of the design.”
Second, identify how to target the verification to areas that are most at risk. “No one wants a bug in the design. Where is the highest probability of that bug? Try to use AI and ML to go after that,” he said.
Third, learn from the bugs. “Identify why this happened. If we look at some recent ML projects, some of them are about ranking tests, some of them are about identifying which check-ins are the most likely to be problematic, some about identifying causes and effects. You can start to use this history over time, even if you can’t share it between customers. This can be used within your own company, by moving to a cloud-centric data model, by moving to an ability to look at coverage over time, and across products. Doing analysis on that, you can start to get insight,” he said.
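To give a feel for what "ranking tests" from history can mean, here is a deliberately simple Python sketch. It uses a recency-weighted failure count, which is a plain heuristic swapped in for illustration and not the ML techniques Hand describes. The `rank_tests` function, the `decay` value, and the regression history are all hypothetical.

```python
from collections import defaultdict


def rank_tests(history, decay=0.5):
    """Order test names so recently failing tests run first.

    history: list of runs, oldest first; each run maps test name -> passed (bool).
    decay:   weight multiplier applied per step back in time (assumed value).
    """
    scores = defaultdict(float)
    weight = 1.0
    for run in reversed(history):          # newest run gets weight 1.0
        for name, passed in run.items():
            scores[name] += 0.0 if passed else weight
        weight *= decay
    # Highest score first; ties broken by name for a stable order.
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical regression history, oldest run first.
history = [
    {"fifo_overflow": False, "arb_two_req": True,  "reset_seq": True},
    {"fifo_overflow": True,  "arb_two_req": False, "reset_seq": True},
    {"fifo_overflow": True,  "arb_two_req": False, "reset_seq": True},
]

order = rank_tests(history)
```

With this history the arbiter test, which failed in the two most recent runs, is scheduled first, followed by the FIFO test that failed only in the oldest run. A production flow would draw on far richer data, such as check-ins and coverage over time.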
Fortunately, there is already a natural tendency to design so there aren’t bugs to begin with, he observed. “That’s by using more IP, where IP is already pre-verified, and by using high-level synthesis where you’re writing fewer lines of code. We humans are imperfect, and the more code we write, the more mistakes we will make. With high-level synthesis, we’re writing a fraction of the amount of code. Also, there are just good design styles. Twenty-plus years ago when I was an engineer at Ericsson, we had several binders of design style guides that would tell us how to build these things. So whether it is linting tools, good design style, the use of high-level synthesis and/or formal verification technologies, you can try to eliminate bugs before they get in, or at least make them easier to find.”
“A lot of issues could be avoided if designers and verification architects sat together before they started creating the RTL itself,” noted Pratik Mahajan, group director R&D, formal verification at Synopsys. “At the verification plan level, it’s important to clearly distinguish a number of things. What is the control part of the logic? What is the data path there? Where are the security regions? All of this information is available to the architect at the time of designing the architect plan. Do this before you start doing the RTL, and then you could even identify and debug critical parts. ‘These are the parts of the interfaces, this is the IP that I’m going to get from someone else, so I’m not bothered about the verification part of it, I’m only bothered about the interfaces.’ That’s the part users can identify. Designers themselves are writing white block assertions. Those are the most effective, and they are the best for the formal tools because from a designer’s perspective, they’ve written the assertion in the scope of the module or the block that they’re looking for but the scope is very limited there. Formal verification can churn them out very fast, and once the segregation is very clear that everyone knows, this is the part for which the designers are going to write all the white block assertions, and the verification engineer is going to write all the interface assertions there. Then there is the data path, which exists in the design, and a completely different tool will be used to do the datapath verification.”
One of the areas that is getting much more attention these days is security verification, which needs to be done at the system level. “There may already be a set of blocks identified as secure, or insecure, and how the dataflow is happening,” said Mahajan. “At the architecture level, sometimes it is decided that this chip is going to go into automotive, so redundancy logic is added. Once you have that complete thinking, that makes it very difficult. This is why the architect, the designer, the verification architect, the security architect, and the functional safety architects all have to come together to do that planning upfront.”
Conclusion
With all of this complexity, it takes a certain mindset to keep everything in a straight line and achieve the highest-quality, most efficient verification. This comes down to finding the right people, tools, and methodologies.
“This speaks to the first days as a student starting to do verification with the whole notion of creating a design, and wanting to see that it works,” said Vtool’s Arbel. “How will we see if it works? We’ll create another design in a different language. We’ll call it verification by a different team. We’ll compare them one against the other. If they are not behaving the same, then we will see who’s right. If they are behaving the same in enough scenarios, we can conclude that it is safe because the two teams both had the same interpretation of how this should work. This is crazy when you come to think about it. We’ll do it twice, in two different representations, and then we’ll compare.”
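The "do it twice and compare" idea can be sketched in a few lines. The Python example below is a toy under stated assumptions: the saturating 8-bit adder, both function names, and the stimulus mix are hypothetical, chosen only to show two independently written models driven by the same inputs.

```python
import random


def sat_add_design(a, b, width=8):
    """'Design' view: clamp the sum to the largest unsigned value."""
    limit = (1 << width) - 1
    return min(a + b, limit)


def sat_add_reference(a, b, width=8):
    """'Verification' view, written independently: detect the carry-out."""
    total = a + b
    carry_out = total >> width
    return ((1 << width) - 1) if carry_out else total


def compare(design, reference, trials=10_000, width=8, seed=0):
    """Drive both models with the same stimulus; collect mismatches."""
    rng = random.Random(seed)
    top = (1 << width) - 1
    # Always include corner cases, then add random pairs.
    stimulus = [(0, 0), (top, top), (top, 1), (1, top)]
    stimulus += [(rng.randint(0, top), rng.randint(0, top)) for _ in range(trials)]
    return [(a, b, design(a, b, width), reference(a, b, width))
            for a, b in stimulus
            if design(a, b, width) != reference(a, b, width)]


mismatches = compare(sat_add_design, sat_add_reference)
```

An empty `mismatches` list shows only that the two models agree on the scenarios tried. As Arbel notes, agreement means both teams interpreted the spec the same way, not that the interpretation is right.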
Still, not everyone is cut out for this kind of work. “There is a certain type of person that is good at finding bugs,” Davidmann added. “Some people’s minds don’t pay attention to the detail. Some people are wishy-washy in architecture, and that’s what you need in certain places. But when it comes to verification, you’ve got to be pedantic, and precise, but also look at things differently. To be a good design or verification engineer you’ve got to think differently. Some people are really good at verification, and some people aren’t. It’s not the brightest, the smartest, or the most educated. It’s just the way their mind works. It’s all about detail, and not making assumptions. If you assume nothing then everything becomes explicit and nothing becomes implicit, which means that, when it says it’s going to do something, don’t assume it, test it, and be precise.”