Debug: Last Bastion Of Automation

Folklore erroneously claims verification consumes 70% of development time. But does debug really consume 50%?


There have been a number of times when anecdotal evidence became folklore, and only later was the effort made to find out whether there was any truth in it. Perhaps the most famous case is the statement that verification consumes 70% of development time and resources. For years this “fact” was used in almost every verification presentation, and yet nobody knew where the number had come from.

All development teams are different, and for years nobody attempted to come up with a definitive mean figure. In 2007, Mentor Graphics began a series of studies, first with the help of Far West Research and, from 2010 onwards, with the Wilson Research Group, and the truth became known. In 2007 the claim was not true. Verification consumed only 46% of development time, according to the survey results. It is not true today, either. While the figure has been increasing and now stands at around 57%, that has not stopped most papers and presentations from using the fictitious 70%. It may, however, hold for the most advanced designs, because 70% is correct for about 20% of the industry.


Another often-stated “fact” is that debug consumes 50% of development time. Semiconductor Engineering wanted to know if this is true, what the trends are, and what is happening in terms of development of debug tools that can change this figure in the future.

Michael Sanie, senior director of marketing in the Verification Group at Synopsys, says that “conservatively, 35% to 50% of time is spent in debug and it could go beyond that depending on the state or size of the project. Within our internal IP development, 35% is a safe bet, but I am not sure that anyone has sat down and counted the hours. It is not black and white. When you are working on debug, you may be working on other things at the same time. With RTL, you may be debugging the RTL, the testbench, the coverage. Is that all debug? It depends upon how you define it.”

This is the first problem in terms of coming up with a definitive number. “In the Wilson studies, if you ask verification engineers what percentage of time they spend in debug, it has been consistent at 37% for the past two if not three years,” says Harry Foster, chief verification scientist for Mentor Graphics. “But this can be misleading in a couple of ways. First, that does not account for the design engineer’s time spent in debug, and we know that designers spend about half of their time in verification of which a significant proportion of that time is debug. So the total may come close to 50%.”
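Foster's arithmetic is easy to make concrete. The sketch below combines the two data points in a simple weighted average; the team sizes and the designers' debug share are illustrative assumptions, not figures from the Wilson study, chosen only to show how the numbers could approach half of the combined verification effort.

```python
# A minimal sketch of Foster's point. Head counts and the designers' debug
# share are illustrative assumptions, not survey figures.
verif_engineers  = 10      # verification team size (assumed)
design_engineers = 10      # design team size (assumed)

verif_debug  = 0.37        # verification engineers' time spent in debug (Wilson study)
design_verif = 0.50        # designers' time spent on verification tasks (per Foster)
design_debug = 0.70        # portion of that verification time spent in debug (assumed)

# Person-time spent on verification overall vs. on debug specifically
verification_effort = verif_engineers * 1.0 + design_engineers * design_verif
debug_effort        = (verif_engineers * verif_debug +
                       design_engineers * design_verif * design_debug)

print(f"Debug share of verification effort: {debug_effort / verification_effort:.0%}")
# Roughly 48% with these assumptions, i.e. "close to 50%"
```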

Sanie agrees. “Anecdotally, verification teams are getting larger, and for many development teams the verification team is larger than the design team. If 50% of verification is debug, and if the team is expanding, then it means that the debug problem is probably taking more heads. We also know about the expanding complexity of verification and design. When you mix other technologies into that, finding the root cause becomes a lot harder.”

Foster also warns about getting too carried away by a number. “What does 50% mean? A couple of years ago I met a design manager whose team was so well organized that they could put together testbenches in almost no time. He claimed that they spent 85% of their time in debug because everything else had been optimized.”

Even though verification engineers are spending roughly the same percentage of time in debug, Foster believes debug is growing due to today’s increased requirements.

He has plenty of company. “Debug is consuming more of the development time due to the increased complexity of the designs, and it’s not just limited to verification,” said Anupam Bakshi, CEO of Agnisys. “The need for debug comes about because there is a lack of clarity. The lack of clarity comes about because somewhere in the design flow misunderstandings have crept in. The reason for that is a lack of concrete and unambiguous specification — mostly at the interface points, such as the interface between various modules in a chip, the interface between hardware and software, and the interface between verification and validation.”

Debug complexity
It’s important to note that debug isn’t limited to any single group or market. “Debug is a challenge for the whole industry,” says Chi-Ping Hsu, senior vice president and chief strategy officer for EDA products and technologies at Cadence. “Whenever we talk about advanced nodes and costs being high, mask costs, and things like that, they are a very small fraction of the real challenge. The real challenge is putting all of the IPs together to verify that the design works properly, and if it doesn’t, you need to figure out what to do when you find issues. How do you debug it when you are using someone else’s IP, and on top of that, the only way to really verify everything is to put the software together in the verification environment?”

Others are also seeing the problems related to IP reuse. “Debug has always involved a process of ‘onion-peeling,’ but the onions now have many, many more layers,” says Dave Parry, chief operating officer for Oski Technology. “Also, as designs have grown in size and complexity and product life-cycles have shortened, the use of pre-existing IP, via both third-party IP purchase and internal reuse, has proliferated. This can further extend debug time as the nuances and idiosyncrasies of those legacy/purchased IP cores may not be well understood, and a core that worked fine in a previous design may not function perfectly in a new one. One can be lulled into a false sense of security that the core is ‘known-good’ logic.”

Debug is not a singular problem. “The issue is not only about the size of the project but also the different types of data and the target engineers performing the debugging,” says Zibi Zalewski, general manager of the hardware division of Aldec. “The complexity of debugging solutions grows even faster for safety-critical projects like avionics or automotive. These often require no design modification and at-speed testing, which generates tons of data. These extremely large datasets require a very sophisticated debug tool architecture. Even simple debugging requires advanced utilities to visualize, analyze and compare with golden patterns.”

There are indeed many data types and targets. “SoCs contain multiple modeling domains,” says Foster. “Functional, clocking, power, security, software, and then there are other physical concerns such as hydraulics, sensors… Debug is increasing because of all of the other things that have to be done today, and they all need to operate together to solve a designer’s problem.”

Most designs today are mixed in some manner. “Analog and mixed-signal designs are very different and having to debug across that adds a new type of context,” points out Sanie. “They are totally different and when we add power and abstractions… It is an n-dimensional problem.”

One way to make the task simpler is to view the problem differently. “Partitioning the sign-off process by verification vulnerability is the new and successful paradigm for managing complexity in the billion-gate SoCs,” says Pranav Ashar, chief technology officer at Real Intent. “It has caused EDA companies to focus on providing complete verification solutions for individual narrowly-scoped verification problems, one at a time. This is a major mindset shift from the previous approach of selling generic tools like simulators and assertion-based formal verification. This new paradigm benefits analysis as well as debug. The analysis outcome is actionable data that is aligned with the verification-problem context, which, presented in the right manner, allows for a systematic debug workflow.”

But not all problems can be segregated in this manner. Some problems require a process of discovery that starts with visualization and proceeds through root-cause analysis until the problem is found and can be corrected. There is no one right way to do this, but there are two fundamental approaches. The first is to discover a problem and then attempt to find its root cause and determine a solution. The second is to analyze a system and attempt to discover a whole class of problems using broad analysis techniques. (Both of these will be discussed in this series of articles.)

A second set of issues is that debug cannot exist at a single level of abstraction. Looking at signal waveforms may be good for some classes of problems, but other problems do not show up until the full application software is running on the complete hardware system. At this point, the system cannot be run in a software simulator, and probably not even in an emulator. Prototypes may be the only way to get enough reach into the system to be able to exercise the problem. Capturing traces at this stage may produce so much data that analysis becomes untenable, and it causes a host of other problems as well. Specialized hardware that peeks into the functionality or performance of the system may become necessary to get a clear picture of what is going on.
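A rough calculation shows why capturing everything at prototype speed is rarely an option. All of the numbers below are illustrative assumptions, but the conclusion holds across a wide range of realistic values.

```python
# Back-of-the-envelope arithmetic for why full tracing at speed becomes
# untenable. All figures are illustrative assumptions.
probed_nets = 10_000        # signals tapped out of the prototype
clock_hz    = 50e6          # assumed prototype clock rate
seconds     = 10            # length of the software scenario of interest

total_bits  = probed_nets * clock_hz * seconds   # one bit per net per cycle
total_bytes = total_bits / 8

print(f"Raw trace size: {total_bytes / 1e9:.0f} GB")
# About 625 GB for a single 10-second run, before any software-visible
# state (registers, memory contents) is even considered.
```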

The debug process
It all starts in verification. “First you notice that something doesn’t work and then you create a hypothesis as to what might be wrong or where the problem might be,” says Drew Wingard, chief technology officer at Sonics. “The problem is that without adequate tools, finding the right hypothesis can take a long time.”

Anything that can be done to optimize debug is important for a project. “From a management perspective, debug is insidious in that it is unpredictable,” explains Foster. “I can get metrics for various processes such as writing testbenches, constructing tests, and that can be consistent across similar projects, but debug can be all over the map because of its wide variance. It is a management nightmare.”

There also could be several starting points. Consider coverage. “When you start using coverage as a metric for closure, you spend a lot of time on coverage debug,” points out Sanie. “What causes your stimulus not to be able to activate something? What if it can be activated but not propagated? All of these add layers of complexity. Then you can add formal on top of this. If you are using formal for bug hunting, that is a whole debug process by itself. Debug is not just waveform viewing.”
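The distinction Sanie draws between activation and propagation is the heart of coverage debug. The sketch below shows, with assumed record fields and categories, how a triage pass might classify coverage holes before anyone opens a waveform.

```python
# A minimal sketch of coverage-closure triage: before debugging a hole,
# classify *why* it is a hole. The fields and bin names are made up.
holes = [
    {"bin": "fifo.overflow",  "reachable": False, "activated": False},
    {"bin": "alu.carry_out",  "reachable": True,  "activated": False},
    {"bin": "bus.retry_resp", "reachable": True,  "activated": True},
]

def triage(hole):
    if not hole["reachable"]:
        return "unreachable: exclude it or fix the specification"
    if not hole["activated"]:
        return "stimulus never activates it: constraint/testbench debug"
    return "activated but never propagated to a checker: observability debug"

for hole in holes:
    print(f"{hole['bin']:<16} {triage(hole)}")
```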

Other people agree that there are multiple tools that can be used to help in the debug process. “Debug is separate from visualization,” explains Frank Schirrmeister, senior director of product marketing in the System and Software Realization Group of Cadence. “Visualization uses waveforms and everyone uses them. There have been some improvements, such as displaying transactions and the inclusion of software into the same diagrams. There is also a lot of improvement going on, such as bringing in UML diagrams. UML has finally found its way into EDA, but not in the design flow. UML helps with the visualization of scenarios. It provides a way to express the scenario in a UML-type diagram, whose graphical representation is ubiquitous. When you define a scenario you can set a breakpoint such as ‘when these four blocks have been executed in sequence’ and see where the software is at that point.”
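The scenario breakpoint Schirrmeister describes is conceptually simple. The sketch below, using hypothetical block names and a made-up trace format, shows the core of such a check: fire only once a set of blocks has executed in the required order, regardless of what else is interleaved.

```python
# A minimal sketch of a scenario-style breakpoint. Block names and the
# trace format are assumptions for illustration only.

def scenario_breakpoint(trace, sequence):
    """Return the trace index at which `sequence` completes, or None.

    `trace` is an ordered list of (timestamp, block_name) execution events;
    `sequence` lists the block names that must occur in order, with any
    other events allowed in between.
    """
    needed = iter(sequence)
    target = next(needed)
    for i, (_ts, block) in enumerate(trace):
        if block == target:
            target = next(needed, None)
            if target is None:
                return i        # scenario complete: stop here and inspect state
    return None

# Usage: break once DMA setup, transfer, interrupt and handler have run in order.
events = [(10, "cpu_boot"), (25, "dma_setup"), (40, "dma_xfer"),
          (41, "timer_tick"), (55, "dma_irq"), (60, "irq_handler")]
idx = scenario_breakpoint(events, ["dma_setup", "dma_xfer", "dma_irq", "irq_handler"])
print("Scenario completes at event index", idx)   # -> 5, i.e. time 60
```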

Schirrmeister separates this from another class of tools. “Root-cause analysis (RCA) is where you can trace through a system and do things such as click on a line of code or a signal and find the last time it was touched. This type of capability allows you to home in on the problem and provides guidance toward a fix.”
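The “last time it was touched” query is the basic primitive behind such root-cause tools. A minimal sketch, assuming a simple in-memory change list rather than a real waveform database, looks like this:

```python
# Given a signal and a time of interest, find the most recent change to it
# and to its fan-in. The trace structure is an assumption for illustration;
# a real tool would query a waveform database instead.
from bisect import bisect_right

# signal name -> sorted list of (time, value) change events
trace = {
    "fifo_full": [(0, 0), (120, 1)],
    "wr_en":     [(0, 0), (80, 1), (150, 0)],
    "rd_en":     [(0, 0), (200, 1)],
}

def last_touched(signal, at_time):
    """Return (time, value) of the latest change to `signal` at or before `at_time`."""
    changes = trace[signal]
    i = bisect_right(changes, (at_time, float("inf"))) - 1
    return changes[i] if i >= 0 else None

# Why was the FIFO full at t=160? Walk one level back through its fan-in.
for sig in ("fifo_full", "wr_en", "rd_en"):
    print(sig, "last changed at", last_touched(sig, 160))
```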

Another class of debug tools may not even be considered to apply to debug. “The best form of debug is bug prevention,” points out Sanie. “As soon as you write code, you can start finding bugs. Traditionally, something that looks like lint is considered to be a design tool, and something like clock-domain crossing (CDC) checking a verification tool. So debug is a continuum. Debug is the usage of different techniques at different stages of the design, coupled with early bug detection.”

Foster is in full agreement. “One of the most active areas of development is actually bug avoidance. If there are stupid bugs that remain in the design to be found by the verification flow, then it will cost me more to find and debug them there rather than making sure they were never there in the first place.”

There is some hope here, though. “The way out of this debug nightmare is adherence to standards and specification automation,” says Agnisys’ Bakshi. “Here is an example: A decade ago, development teams used to spend countless hours debugging registers and interrupts in the lab. Now, with better tools that automate the specification, this area of the design requires very little debug. Similarly, much time used to be spent debugging proprietary bus transactions between various IPs, but now, with adherence to bus standards like AXI and interface standards like IP-XACT, debug time is minimized. Specification automation and standardization do not exist everywhere, but wherever they do exist, the time spent on debug is minimized.”
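The register example Bakshi gives illustrates the general pattern: a single machine-readable specification generates the collateral each team would otherwise write, and debug, by hand. The sketch below uses a made-up register map to show the idea; real flows generate RTL, verification models and documentation from the same source.

```python
# A minimal sketch of specification automation for registers. The register
# spec and base address are invented for illustration.
REGISTERS = [
    # name,        offset, reset,  description
    ("CTRL",       0x00,   0x0000, "Enable and mode bits"),
    ("STATUS",     0x04,   0x0001, "Ready/error flags (read-only)"),
    ("IRQ_ENABLE", 0x08,   0x0000, "Interrupt enable mask"),
    ("IRQ_STATUS", 0x0C,   0x0000, "Interrupt pending, write-1-to-clear"),
]

def c_header(base_addr=0x4000_0000):
    """Emit C #defines for the firmware team from the same spec."""
    lines = ["/* Auto-generated from the register spec, do not edit by hand */"]
    for name, offset, _reset, desc in REGISTERS:
        lines.append(f"#define REG_{name:<12} 0x{base_addr + offset:08X}  /* {desc} */")
    return "\n".join(lines)

def reset_map():
    """Offset -> expected reset value, for a verification check."""
    return {offset: reset for _name, offset, reset, _desc in REGISTERS}

print(c_header())
print(reset_map())
```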

An expanding problem
Debug today is not the same as it was even 10 years ago. “It used to be that we could separate physical and functional, but now we have power concerns that can affect functionality, so we have to simulate aspects of both,” points out Foster. “When you start mixing concerns, the complexity goes up significantly and that is reflected in the amount of effort required.”

An additional problem is that the data required for debug may be different for each of those concerns. Recently Mentor and Ansys announced an API that would allow power information to be transferred from an emulator to a power analysis tool without requiring the traditional intervening step of writing the trace data out to a file.

“This uses a very different set of data than what I would need for functional debug,” Wingard explains. “It would be great if all debug required the same data and interfaces. For power analysis you are interested in switching activity and toggle rates. That is all they care about. For functional or performance debug you need a lot more data. The data you may be interested in varies widely. If I am looking for a performance problem, I don’t really care about the bits of data that are moving at all. They are a nuisance. I care about the order of the transactions, and maybe the addresses because they impact the memory system scheduling. Whereas if I am looking at the interaction between a couple of processors that are doing some kind of synchronization, maybe I care about the data a lot.”
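Wingard’s point about data volume can be seen in a few lines of code. A power tool only needs per-net toggle counts over a window, so the sketch below, with an assumed activity-stream format standing in for whatever an emulator API delivers, never looks at data values at all.

```python
# A minimal sketch of toggle-rate extraction from a switching-activity stream.
# The event format is an assumption for illustration.
from collections import Counter

def toggle_rates(activity, window_ns):
    """activity: iterable of (time_ns, net_name) toggle events.
    Returns toggles-per-ns for each net over the window."""
    counts = Counter(net for _t, net in activity)
    return {net: n / window_ns for net, n in counts.items()}

events = [(1, "clk"), (2, "clk"), (3, "clk"), (4, "clk"),
          (2, "bus_valid"), (4, "bus_valid"), (3, "data[7]")]
print(toggle_rates(events, window_ns=4))
# Note that the data *values* never appear: only how often each net switched,
# which is what a power model multiplies by capacitance and voltage.
```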

Performance and power debug require different approaches than functional debug. “When companies spend time on top-down design, it happens because of performance and power,” says Schirrmeister. “This is about hardware architecture, and you are stressing the design with stochastic data representing things such as peak traffic. At the system level, you are looking at threads and tasks, and you need information about process load or memory loads.”

Figure courtesy of Frank Schirrmeister, Cadence, and Rob Kaye, ARM. ARM TechCon 2015.

“Performance is one thing that people want to see, including the resources they are using,” says Simon Davidmann, chief executive officer for Imperas. “This is now extending into power, as well. Today, design teams have timing budgets and power budgets. This applies not just to those teams developing new chips; the problems are equally applicable to devices that are built using standard parts. They really want to know about the power profile so that they can optimize the way that the software is running to reduce power.”

But it is not just power and performance. Davidmann reminds us that “there is another area of software that is becoming more important and that is related to security. People are trying to understand how to develop software that can meet these challenges and we need the tools to help with this.”


