Getting Formal About Debug

Formal technology is playing an increasing role in the debug process, and some fundamental new directions are being explored to make debug more automated.


While much of the design and verification flow has been automated, debug remains the problem child. It has defied automation and presents a management nightmare due to the variability of the process. In recent articles about debug, we examined how much time development teams spend in the debug process and some of the reasons why it is becoming a bigger problem. These include debug expanding beyond purely functional failures into areas such as performance, power and security.

Another recent article examined the expanding scope of debug. It is no longer just a hardware functionality problem, and debug today has several stakeholders, each of whom may want different data, different forms of visualization and different analysis tools.

In this article, we will look into the ways in which formal technology is becoming increasingly entwined with the debug process, the reasons why debug automation failed in the past, and some new developments that may bring the problem under more control.

Debug and formal
Formal technology can help the debug process in several ways. “Formal can provide huge productivity leaps that enable you to see a problem in a waveform and then get the formal tool to generate a waveform for that example,” says Frank Schirrmeister, senior director of product marketing for the System and Software Realization Group at Cadence. “Then you can do what-if scenarios.”

Even if you have a trace, it may not be very usable. “Formal analysis can describe failures in terms of counterexamples with the shortest-possible path to failure,” says Dave Parry, chief operating officer for Oski Technology. “As a result, the design or verification engineer can see the misbehavior in its simplest and most direct form, with the events leading to the failure concentrated in a small number of cycles. This is instead of wading through thousands of simulation cycles with the events leading to the failure spread throughout.”

But what if you don’t exactly know where the problem is? “Formal is of great value in recreating and refining error conditions from partial error data and limited visibility into signals, and in trying out different scenarios,” says Dave Kelf, vice president of marketing for OneSpin Solutions. “In essence, a verification engineer assumes some error condition, assumes some more things that are known or suspected, and can ask the tool how this can happen. Being able to start the analysis from a partially specified state of the design is the big advantage of formal.”
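
To make both points concrete, here is a minimal sketch in Python rather than a production formal engine. Breadth-first search over a toy state machine guarantees that the first failing trace found is the shortest, and the search can begin from a set of states matching a partial constraint just as easily as from reset. The counter, its enable input and the failure condition are all invented for illustration.

```python
from collections import deque

# Toy 2-bit counter with an enable input. The "property" under debug:
# the counter should never reach 3. All names here are invented.
def next_states(count):
    for enable in (0, 1):                 # each cycle, enable may be 0 or 1
        yield (count + 1) % 4 if enable else count

def bad(count):
    return count == 3                     # the failure being investigated

def shortest_counterexample(initial_states):
    """BFS guarantees the first failing trace found is the shortest."""
    queue = deque((s, [s]) for s in initial_states)
    seen = set(initial_states)
    while queue:
        state, trace = queue.popleft()
        if bad(state):
            return trace
        for nxt in next_states(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, trace + [nxt]))
    return None

print(shortest_counterexample({0}))       # from reset: [0, 1, 2, 3]
print(shortest_counterexample({1, 2}))    # from a partially known state: [2, 3]
```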

The combination of simulation and formal working in tandem has led to a number of solutions. “The full solution in some cases is a combination of static-semantic, static-formal and simulation methods,” says Pranav Ashar, chief technology officer at Real Intent. “Even when the bug is eventually found in simulation, the static analysis methods would have created instrumentation in the testbench for monitoring and stimulus generation that would make the post-failure-detection debug process more directed and focused.”

Ashar provides an example of where this could be used. “A good example of the last point is X verification, which is verification in the presence of simulation X values. An upfront static analysis would be able to instrument the testbench for detection of X-optimism, X-pessimism and X-propagation during simulation, so that if an X-related failure is detected in simulation, finding the root cause becomes much easier.”
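
As a rough illustration of what such instrumentation looks for, the sketch below evaluates a 2-to-1 mux under three-valued logic twice: once with the optimistic if/else interpretation a simulator applies, and once with a more pessimistic, gate-accurate interpretation. The semantics here are deliberately simplified; the input combinations where the two models disagree are exactly the places where X-optimism can hide a bug.

```python
X = "x"  # unknown value in three-valued simulation

def mux_optimistic(sel, a, b):
    # Verilog-style if/else: an X select silently falls into the else branch.
    return a if sel == 1 else b

def mux_pessimistic(sel, a, b):
    # Gate-accurate view: an X select makes the output unknown,
    # unless both inputs agree (then the select cannot matter).
    if sel == X:
        return a if a == b and a != X else X
    return a if sel == 1 else b

# Instrumented comparison: flag every input combination where the two
# interpretations diverge -- candidates for X-related simulation bugs.
for sel in (0, 1, X):
    for a in (0, 1):
        for b in (0, 1):
            opt = mux_optimistic(sel, a, b)
            pes = mux_pessimistic(sel, a, b)
            if opt != pes:
                print(f"sel={sel} a={a} b={b}: optimistic={opt}, pessimistic={pes}")
```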

Oski’s Parry provides another way in which this capability is being utilized. “Present-day formal tools are capable of tracing an identified problem to the specific line of RTL code that is leading to the failure. Using assertion-based verification IP for well-specified interfaces, an engineer can divide-and-conquer more complex debug problems. They can push formal and checkers used at the simpler block level into simulation as assertions at the more complex sub-system or system levels to decrease time to failure when debugging. This provides more detailed insight into the source of the failure.”

Problems found in simulation can also be turned into formal problems. “Bugs discovered in system-level simulation or post-silicon can be reproduced in the formal environment by adjusting constraints and/or checkers in the block-level formal testbench and rerunning it,” continues Parry. “This saves the engineer from trying to duplicate long, complex input/event sequences to get the block- or unit-level simulation model to precisely mimic the system-level behavior.”

One further use of formal is to trap structural problem types, such as an out-of-bounds array access. “In these cases, the tool can inspect the operation of a design and see if an incorrect array access is attempted, immediately highlighting the problem,” says Kelf. “Similarly, zero division, parallel case and other recognizable coding issues have led to bugs in designs.”
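
A toy version of such a structural check, assuming invented bit-widths and an invented index expression: enumerate every value the inputs can take and flag any that drives the array index out of bounds. A real formal tool does this symbolically over the actual RTL rather than by enumeration.

```python
DEPTH = 8  # declared array depth (an assumption for this example)

def read_index(ptr, offset):
    # Index expression lifted from hypothetical RTL: a 3-bit pointer
    # plus a 2-bit offset addresses an 8-entry array.
    return ptr + offset

violations = [
    (ptr, off)
    for ptr in range(2 ** 3)      # all values of the 3-bit pointer
    for off in range(2 ** 2)      # all values of the 2-bit offset
    if not 0 <= read_index(ptr, off) < DEPTH
]

for ptr, off in violations:
    print(f"out-of-bounds access: ptr={ptr}, offset={off}, "
          f"index={read_index(ptr, off)} (array depth {DEPTH})")
```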

Early debug automation attempts
There have been a number of attempts in the past to bring certain aspects of automation to the software debug process. Many software companies use triage processes where failures from overnight runs are automatically analyzed and shipped to the person they believe was responsible for creating an error. But this has significant limitations when applied to hardware due to the increased level of interaction and concurrency. “In complex systems, the number of latent defects is unbounded,” adds Drew Wingard, chief technology officer at Sonics. “Just because I changed something doesn’t mean I am responsible for creating the defect. It could just be that I have activated a corner that has never been executed before.”

A number of now-defunct companies integrated formal analysis into the debug process in an attempt to get the user closer to the root cause of the problem. “These solutions were heavily dependent on formal technology, which had problems scaling,” explains Harry Foster, chief verification scientist for Mentor Graphics. “Another problem is that the technology worked well if designs contained assertions, but the whole industry has not gotten behind them. Without assertions, you would have to enter pragmas into the code to identify what is being checked. There were both technology hurdles and methodology hurdles that they ran into.”

Some of them may just have been ahead of the technology curve. “There have been early attempts that worked great when used by specialists on the formal side,” points out Michael Sanie, senior director of marketing in the Verification Group at Synopsys. “A lot of debug automation companies were missing key engines. Today, formal is getting adopted when mixed in with simulation, and this really adds value to the process. It is the combination of simulation engines and formal technologies that has expanded the debug capabilities.”

Root cause analysis would be a highly valuable tool for any development team if the technology could be advanced. “We believe that there is much more that can be accomplished using formal in this area,” says Kelf. “One approach could be to take the list of possible fault sources and then use formal to eliminate false sources to accurately depict the real problem.”
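
One way to read Kelf's suggestion, sketched here with an invented three-node design and a crude fault model: inject a fault at each candidate site, check whether it can reproduce the observed failure, and eliminate the candidates that cannot. A formal engine would do this symbolically; the exhaustive version below only conveys the elimination idea.

```python
# Toy combinational "design": three named internal nodes. A fault at a
# site inverts that node. All names and values are invented.
def design(a, b, faults=frozenset()):
    n1 = (a and b) if "n1" not in faults else not (a and b)
    n2 = (a or b) if "n2" not in faults else not (a or b)
    out = (n1 or n2) if "out" not in faults else not (n1 or n2)
    return out

# The failure seen in the lab: with inputs (1, 1), the output was 0.
observed = {"inputs": (1, 1), "got": 0}

candidates = {"n1", "n2", "out"}
consistent = {
    site for site in candidates
    if design(*observed["inputs"], faults=frozenset({site})) == observed["got"]
}
print("fault sites consistent with the failure:", consistent)  # {'out'}
```

Here faults at n1 or n2 cannot produce the observed output, so formal-style analysis eliminates them as false sources, leaving only the real suspect.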

While some companies may not be adding assertions in a general sense, verification IP and specialized assertions are helping with a range of problems. “We encourage our customers to instantiate performance monitors and hardware trace capabilities into the target side of the network,” says Wingard. “Then we have rich filtering algorithms that can accept triggers from other parts of the system, detect trigger points from traffic flowing past the target and measure bandwidths and latencies. It can then periodically report the value of those counters out to some memory or actually do hardware trace where you extract out sets of transactions that flow by and dump them into a file.”
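
In software terms, such a monitor reduces to filtering a transaction stream, accumulating counters and reporting them periodically. The transaction fields, target name and reporting window below are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Txn:                      # invented transaction record
    time: int                   # completion timestamp in cycles
    target: str                 # destination on the network
    nbytes: int                 # payload size
    latency: int                # cycles from request to response

class PerfMonitor:
    """Counts bandwidth and latency for traffic matching a target filter."""
    def __init__(self, target, window=100):
        self.target, self.window = target, window
        self.nbytes = self.lat_sum = self.count = 0
        self.next_report = window

    def observe(self, txn):
        if txn.target == self.target:          # filter on matching traffic
            self.nbytes += txn.nbytes
            self.lat_sum += txn.latency
            self.count += 1
        while txn.time >= self.next_report:    # periodic counter report
            avg = self.lat_sum / self.count if self.count else 0.0
            print(f"t={self.next_report}: {self.nbytes} B, "
                  f"avg latency {avg:.1f} cycles over {self.count} txns")
            self.nbytes = self.lat_sum = self.count = 0
            self.next_report += self.window

mon = PerfMonitor("ddr0")
for t in range(0, 400, 40):
    mon.observe(Txn(time=t, target="ddr0", nbytes=64, latency=20 + t % 7))
```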

Adoption today
Because debug can consume about 50% of the total development time, you would expect every company to be using the very latest technology available. That is not the case, however.

“One problem is convincing people that they need a solution,” says Foster. “It can take education and awareness before people realize that they need a solution, especially when they do not keep metrics to see where their time is going. Time to root cause would be a good metric, and that may increase awareness. I would track the class of bug and how long it took so that I can categorize modules for the future and say that these have a high likelihood of being difficult.”
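
The bookkeeping Foster describes need not be elaborate. A minimal sketch, with an invented bug log: record each closed bug's module, class and time to root cause, then rank modules by how difficult their bugs have historically been.

```python
from collections import defaultdict
from statistics import mean

# Invented log of closed bugs: (module, bug class, hours to root cause).
closed_bugs = [
    ("dma_ctrl", "functional",   3.0),
    ("dma_ctrl", "clock-domain", 26.5),
    ("pcie_phy", "functional",   2.0),
    ("mem_arb",  "performance",  14.0),
    ("mem_arb",  "clock-domain", 31.0),
]

by_module = defaultdict(list)
for module, bug_class, hours in closed_bugs:
    by_module[module].append(hours)

# Rank modules by mean time-to-root-cause: a crude "likely to be
# difficult" indicator when planning the next project.
for module, hours in sorted(by_module.items(), key=lambda kv: -mean(kv[1])):
    print(f"{module:10s} bugs={len(hours)} mean TTRC={mean(hours):.1f} h")
```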

There is plenty of evidence that productivity can be greatly increased. “Users are becoming more productive by doing this kind of debug,” says Schirrmeister. “The real question is, ‘Is the manager understanding enough to allow the users to buy a tool? Is the ROI of the tool enough to make them more productive, rather than asking them to get better at reading trace files?’ We are seeing increasing demand, especially for software execution on virtualized hardware. Being able to look at these things is important to users.”

Adoption of some debug tools does not happen without education and training, and that can prevent or slow adoption because design teams are too busy trying to get the product out the door. “In software simulation, progress has been made in terms of determining causality, and this enables what-if types of analysis,” explains Foster. “But being able to see causality requires some sophistication. To really understand some of these issues, you have to do statistical analysis, and you often cannot observe this in a simple simulation. This is often the case with multi-threaded behaviors.”

New technologies
One common aspect of debug systems is the huge amount of data they have to deal with. “We have previously announced the use of big data in our debug products,” says Schirrmeister. “The files we are dealing with are huge trace files. Consider the amount of data that can be spun out of an emulator in one minute. Just downloading it takes longer than creating it. Dealing with it in a network is difficult. Big data has a role in this.”

But big data is only a part of the solution. “Big data and machine learning can both help with debug,” says Sanie. “The volume of data and how you manage it is one aspect of this. Then you need to look for patterns and learn from that, and apply that to the next set of data. This applies not just to debug, but to many facets of EDA. The industry is still working on the best ways that we can do analytics.”

“Root cause analysis is related to this,” agrees Schirrmeister. “Big data techniques can give you the ability to home in on a problem faster and to get to the relevant data within the database. But we still have a lot to do in machine learning before we can give the user unsolicited advice. You want to have a watcher on your emulation run that can make you aware of interesting things that are going on, which may not have been your primary focus. These types of analytics are being looked into.”
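
A crude approximation of such a watcher, assuming only a stream of per-interval event counts from the emulator: flag any interval whose count is a statistical outlier against the running history. The data and the threshold are invented; real analytics would be far richer.

```python
from statistics import mean, stdev

def watch(counts, z_threshold=3.0, warmup=10):
    """Flag intervals whose event count is a statistical outlier."""
    history = []
    for i, c in enumerate(counts):
        if len(history) >= warmup:
            mu, sigma = mean(history), stdev(history)
            if sigma and abs(c - mu) / sigma > z_threshold:
                print(f"interval {i}: count {c} deviates from mean {mu:.1f}")
        history.append(c)

# Mostly steady activity, with one burst the user never asked about.
watch([100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 100, 450, 99])
```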

How fertile these techniques will be remains a matter of conjecture. “A vast and nascent opportunity is to mine simulation data using static-formal, semantics-aware and machine-learning methods for failure detection and debug,” says Real Intent’s Ashar. “Some attempts have been made in terms of trying to reduce trace lengths, narrowing the design scope for debugging a failure, and identifying candidate corrections using formal analysis techniques, but much more is possible in my opinion.”

Still, it may be easy to get carried away and allow hype to get in the way of concrete advancements. “The use of Big Data analytics could be overkill for debug, but may be useful in merging results from disparate verification environments to provide a more complete view of coverage,” suggests Parry.

Schirrmeister also contends that we should keep things pragmatic. “This is not far-out research. Customers are demanding it today. We are already talking about systems that enable you to take a database from a long emulation run and make it accessible to a larger group of people. It behaves just like the emulator, but is not calling the emulator. The formal team has the ability to point you to anomalies in the trace, and applying that to emulation is what many would call big data analytics. We can do things such as finding a combination of signals and identifying when that happened in the past. Then we can ask what happened as a result of that. Using assertions and property checking on top of it is not that far off.”
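
The signal-combination query Schirrmeister mentions maps naturally onto a search over a stored trace database. A toy version, with invented signal names and data: find every cycle where the combination held, then inspect what happened in the cycles that followed.

```python
# Invented trace: per-cycle sampled values of a few signals.
trace = [
    {"cycle": t, "fifo_full": t % 17 == 0, "req": t % 3 == 0,
     "grant": t % 3 == 1}
    for t in range(100)
]

def find_combination(trace, predicate):
    """Return the cycles where a combination of signal values held."""
    return [s["cycle"] for s in trace if predicate(s)]

hits = find_combination(
    trace, lambda s: s["fifo_full"] and s["req"] and not s["grant"])

# "What happened as a result of that": inspect the next few cycles.
for cycle in hits:
    window = [s for s in trace if cycle < s["cycle"] <= cycle + 2]
    granted = any(s["grant"] for s in window)
    print(f"cycle {cycle}: fifo full with pending req; "
          f"grant within 2 cycles: {granted}")
```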

“Debug is still an open field for innovation, and we have many ideas that we are working on,” says Sanie. “Debug has suffered from underinvestment in the past.”

It certainly looks as if all eyes are now focused on the debug process as the last holdout against automation in the development process. Improvements in debug automation could have a significant impact on development times and costs, both of which are under enormous pressure at systems companies.


