…Why the heck did it take me three days to find this bug?
While semiconductor verification techniques have evolved considerably over the last 25 years, the debugging of design problems found during verification has barely changed. New approaches, drawing on machine learning, visualization, and research into problem-solving, enable a different style of debugging that can save up to an order of magnitude in debug time.
Since the inception of Hardware Description Languages (HDLs) and simulation, we have viewed verification output the same way we analyzed discrete digital circuit cards: through the lens of the logic analyzer. It's true that there are new windows on top of the good old-fashioned waveform tool, and the GUI is a bit fancier, but fundamentally we still look at tool output signal by signal, one time tick at a time.
In this period, we have seen verification tools move beyond simulation, with new emulation techniques, formal verification, and more. Test descriptions have transitioned from 1s and 0s to complex object-oriented, constrained-random programs. The designs themselves have, of course, grown dramatically in size and complexity, presenting us today with the full-blown System-on-Chip (SoC). Yet we still examine and inspect these developments from our traditional signal-level perspective. It's not surprising that debug consumes 50% or more of total verification time, itself the most resource-intensive phase of the entire chip development process.
We decided to apply new thinking to debug methodology, drawing on psychological studies of problem-solving and on large-scale data analysis. For the last three years, we have been asking ourselves questions like: What is the nature of design bugs? Can we categorize them into groups? Why do some debugging processes take longer than others? We then conducted a series of experiments to validate our assumptions and findings.
One thing we noticed after logging hundreds of bugs and their debugging processes is that bugs can be categorized into several types. There are exceptions, of course, but in the vast majority of cases a bug falls into one of these categories. To read more about them, you can refer to our white paper.
Secondly, we explored the debug process itself. The debug process starts when a failing test is reported and ends when we are sure of the root cause of the failure. In essence, it consists of a chain of assumptions, each raising questions that must be answered and validated. We can look at this process as a flow chart, as illustrated below.
In this example, there are three possible scenarios.
The shorter the chain, the faster we find the bug. Ideally, we want to take shortcuts as often as we can. However, the more incorrect assumptions we make, or the more questions we answer wrongly, the longer the debug takes. Balancing aggressive shortcuts against properly validated assumptions is the essence of effective debugging.
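To make the model concrete, here is a minimal sketch, in Python, of the chain-of-assumptions view described above. The names and the example chain are hypothetical, purely for illustration; this is not any particular tool's implementation. Each link pairs a claim about the failure with a check, and the first claim that fails to validate is where the root-cause hunt should focus:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Assumption:
    # One link in the debug chain: a claim about the failure plus a check for it
    claim: str
    validate: Callable[[], bool]  # returns True if the assumption holds

def debug_chain(assumptions: List[Assumption]) -> Optional[str]:
    # Walk the chain in order; the first assumption that fails to validate
    # is where to dig. Every wrong assumption accepted silently earlier on
    # would have added links to this chain.
    for step, a in enumerate(assumptions, start=1):
        if not a.validate():
            return f"step {step}: '{a.claim}' does not hold -- dig here"
    return None  # every assumption held; the chain must be extended

# Hypothetical chain for a failing test (validators stubbed with lambdas)
chain = [
    Assumption("the right files were compiled", lambda: True),
    Assumption("the stimulus reached the DUT", lambda: True),
    Assumption("the DUT response matches the reference model", lambda: False),
]
print(debug_chain(chain))  # -> step 3: '...' does not hold -- dig here
```

A shortcut, in this view, is skipping a validation you believe is safe; the cost of a wrong shortcut is re-walking the chain from an earlier link.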
Some assumptions are so obvious that they are not even recognized as assumptions. We simply "know" them to be true. Yet when such an assumption turns out to be wrong, it can add enormous delay to the root-cause analysis. A simple example most of us know (but may not admit) is compiling the wrong set of files…
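As a minimal sketch of turning that "obvious" assumption into an explicit check, one could compare the timestamp of the compiled simulation snapshot against the sources it is supposed to contain. The paths and build layout below are hypothetical, not those of any specific simulator:

```python
from pathlib import Path

def stale_sources(snapshot: Path, sources):
    # Return the source files modified after the snapshot was built --
    # i.e. edits that the simulation you are staring at has never seen.
    built = snapshot.stat().st_mtime
    return [s for s in sources if s.stat().st_mtime > built]

# Hypothetical paths: a simulator build artifact and SystemVerilog sources
stale = stale_sources(Path("work/simv"), Path("rtl").glob("**/*.sv"))
if stale:
    print("Recompile first -- these edits are not in the snapshot:")
    for f in stale:
        print(" ", f)
```

Cheap checks like this one validate the invisible links at the start of the chain before any waveform is opened.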
These are examples of the research topics that brought us to a deep understanding of the debug problem. They are also why Vtool developed Cogita, a next-generation debug solution built around a logical methodology, allowing engineers to visualize the essence of the data and track down the root cause of problems efficiently and reliably. Working alongside existing debug environments, Cogita is particularly effective on large-scale designs verified using emulation and regression simulation, offering significant improvements right across the verification process.
Stay tuned for the next blog, in which I will outline some of the solutions to these problems.