Writing testbenches is relatively quick, but up to 90% of total verification time is spent debugging. Here are some recommendations from the front lines to improve debug productivity.
In the world of SoC verification, writing the testbench and the code often takes relatively little time; the rest of the time — up to 90% — is spent debugging. After all, verification is essentially finding the bugs in a design.
Debugging has evolved over the years along the same path and complexity curve as design, and now it needs to keep evolving to keep pace, observed Ellie Burns, product marketing manager for design verification technology at Mentor Graphics.
“Not long ago the debugging requirement was for gates and RTL, and a large design was 1 million gates,” Burns said. “Today many designs are 100 million to 200 million-plus. They have multiple processors with hardware-software requirements, cache-coherent networks, dynamic object-oriented testbenches, and more. Layered on top of all of this, there are additional standards such as UPF for low power and complex protocols like AMBA5 and PCIe that add requirements of abstraction and metadata. These requirements continue to grow as we look at the system-level debug space. Debugging really needs to keep up not only with the design complexity curve, but with the verification complexity curve, which is steeper.”
The size of designs today is driving the need for emulation, and the demand for more productivity in verification is causing growth in the adoption of methodologies like OVM/UVM, she offered. “When users adopt emulation they need debugging tools that provide an easy and consistent way to explore their design and find problems, whether using simulation or emulation, so they do not waste time and productivity moving between the two. Also, because the emulator generates a huge amount of data in a relatively short time, the debugging environment must be specialized to serve up exactly what the user needs on demand. Second, when using languages and concepts such as UVM, which is object-oriented and dynamic, the requirements for debugging are completely different than the requirements of traditional RTL. Because these concepts are interacting with the RTL, the debugger needs to be able to move smoothly between these worlds of testbench and design, and help the user make sense of and quickly find the information that is needed.”
Debug time can be cut in at least two different areas, explained David Larson, director of verification at Synapse Design Automation. “One is you’re debugging your own code. If you think your code is working, now you’re debugging the RTL. That makes our job more challenging because you’re not exactly sure where the problem is. It could be in two different worlds.”
And because verification engineers may be spending 2 to 10 times as long as they should on debug, Larson offered a number of recommendations for improving debug productivity.
FPGA-based prototyping is a growing part of the verification effort today, and it brings its own perspective on debug and validation. Angela Sutton, Synopsys staff product marketing manager for FPGA implementation, said the company very conservatively estimates that over 70% of engineering teams doing ASIC designs are using prototyping as one of their vehicles for validation.
“That really tells us that they are making a significant investment as well as time commitment to debug,” she said. “Another interesting observation on the FPGA side is when I look at the ratio of implementers — the people doing the synthesis and place & route of FPGAs versus people spending time in the lab with the system that’s been created on FPGAs to essentially act as a vehicle for validation of the ASIC — I often will see ratios of 4:1 to 10:1 people in the lab using software to do validation. That really tells me the time investment is significant on validation of ASICs.”
It also means that the cost of bugs is so high that engineers will go to great lengths to avoid them. “I look at the flow of what they’re doing — the people that are doing prototyping are simulating first, then they are emulating to gain early functional debug — they want to get the function of the design working at a relatively low clock speed relative to their final clock speed of their ASIC. Then they are going to their prototype, which lets them do functional debug but with the interfaces running real time at speed. So they are going through all of these different phases of incrementally flushing out functional bugs,” Sutton added.
A multi-player game
In addition, Kishore Karnane, product manager for mixed-signal verification and Specman at Cadence, noted that with growing design complexity, users are spending much more time in debug – and drawing upon more resources to do it. “Today, it’s not like one person does all of these tasks. Because of expertise, you have an RTL designer, you have a testbench designer and then you have somebody who has expertise in all of these different levels — like you have one emulation guy, you have another guy who is doing VIP debug, another guy who is doing mixed-signal debug. It’s becoming more of a multi-player game nowadays. You don’t have one person who understands everything because it is so complex that you really need to get some of these guys sitting together and looking at an environment.”
This is where Cadence is trying to provide a single integrated environment in which all of those levels can come together, he noted.
More tips for improving debug productivity
Verification engineers really should adopt a methodology for all of this, because debug is not just running simulations. There has to be a level of planning, whether it is a block or a full SoC being debugged.
Ganesh Venkatakrishnan, front end team lead at Open-Silicon, said the biggest problem most engineers face when debugging huge SoCs is simulation time. As a design and verification services provider, Open-Silicon follows a step-by-step approach, starting with debugging simulations at the block level, moving on to the subsystem level, and finally to full-chip simulations. “This approach gives us an advantage—whatever we develop at the block level in terms of the assertions or the testbench components, we plan it in such a way that they are reusable all the way to the full chip level, as well as the subsystem level. That gives us a lot of reusability.”
“For example,” noted Dhananjay Wagh, engineering IP manager at Open-Silicon, “let’s say in an SoC there are various blocks which are talking to each other. The verification engineer really starts stimulus from one pin and he has to get something out at the other end of the chip, but there are various blocks in between that could go wrong. Those blocks are verified individually at the block level, but when they come together there could be misunderstanding or misconnection between those blocks. When you run a test case and it just says ‘fail’ to you, it doesn’t make sense and you have to debug all the way through all those waveforms and simulation hierarchy, which is a pain for a verification engineer.”
Added Wagh: “When we do a block-level verification, we bring some assertions at the block level boundary and we use them at the subsystem level, where two or three blocks are talking to each other, or a full chip where an entire chip is working as a chip. If the test case fails, and if the interface is wrong, that particular assertion or checker will prompt a message saying that it didn’t get the data properly. We can directly pinpoint there and go there and debug the issue instead of going through the hierarchy to find the problem.”
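The idea carries over to any verification language. As a rough sketch only — the class, signal and hierarchy names below are hypothetical, and a real flow would typically use SystemVerilog assertions bound to the interface rather than Python — a reusable boundary checker could look like this: the same checker is attached to a block's interface at block, subsystem, and full-chip level, so a violation is reported with the exact interface path instead of a bare "fail."

```python
# Hypothetical sketch of a reusable boundary checker. The same class is bound
# to a block's interface at block, subsystem, and full-chip level, so a
# protocol violation reports the exact interface path rather than a bare fail.

class BoundaryChecker:
    """Checks a simple rule on one valid/ready/data interface."""

    def __init__(self, path):
        self.path = path          # hierarchical name, e.g. "soc.dma.axi_out"
        self.errors = []

    def check_cycle(self, valid, ready, data):
        # Rule: when valid is asserted, data must not be unknown (None here).
        # A real checker would also track valid/ready handshaking across cycles.
        if valid and data is None:
            self.errors.append(f"{self.path}: valid asserted with unknown data")
        return not self.errors

    def report(self):
        for err in self.errors:
            print(f"ASSERTION FAILED: {err}")


# The same class is reused at every level; only the bound path changes.
block_chk = BoundaryChecker("dma.axi_out")       # block-level testbench
soc_chk = BoundaryChecker("soc.dma.axi_out")     # full-chip testbench

soc_chk.check_cycle(valid=True, ready=False, data=None)
soc_chk.report()   # -> ASSERTION FAILED: soc.dma.axi_out: valid asserted with unknown data
```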
Another issue to plan for early, Venkatakrishnan pointed out, is how the testbench is modeled. As part of that, it should produce meaningful debug messages that say what state the DUT or the testbench is in. The messages should distinguish between information, warnings, and fatal errors so they pinpoint exactly where the issue is.
“If you plan well and if you’ve done the right things in your testbench, if you really plot a graph between the timelines versus the debug time spent, the initial phase is where the maximum debug goes in,” he said. “That could be close to 60% to 70% of the time. This is where initial bring up is done. Then, when the testbench matures and you have more meaningful messages, you really don’t rely on the EDA tool as such to tell you where the error is. Over a period of time, it actually should drop down to about 20%. You just have to look at the log, and see what could be the issue and then go fix it.”
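To make that concrete — as a sketch only, with hypothetical component and phase names, not any particular vendor's API — the severity-tagged messages Venkatakrishnan describes can be modeled in plain Python with the standard logging module; in a UVM testbench the same role is played by uvm_info, uvm_warning, uvm_error and uvm_fatal. Every message carries the component, its severity, and the current phase, so scanning the log pinpoints the failure without another trip through the debugger.

```python
# Plain-Python analogue of severity-tagged testbench messages. Component and
# phase names are hypothetical; the point is that each message identifies the
# component, the severity, and the simulation phase it was emitted from.

import logging

logging.basicConfig(format="%(levelname)-7s %(name)s: %(message)s",
                    level=logging.INFO)

class TbComponent:
    def __init__(self, name):
        self.log = logging.getLogger(name)
        self.phase = "build"

    def set_phase(self, phase):
        self.phase = phase
        self.log.info("entering phase '%s'", phase)

    def check(self, condition, msg):
        if not condition:
            # A fatal testbench error: report where we are, not just that we failed.
            self.log.error("[phase=%s] %s", self.phase, msg)
            raise RuntimeError(f"{self.log.name}: {msg}")

seq = TbComponent("soc_tb.pcie_sequencer")   # hypothetical component
seq.set_phase("main")
seq.check(True, "link trained")              # passes silently
# seq.check(False, "completion timeout on tag 0x3")  # would log the error and stop
```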
Like all real productivity gains in verification, the key to productivity in debugging is up-front planning, Burns agreed.
“Good verification planning is the key. For debugging, users need to plan for two things,” she said. “First, what tools and technologies will be used so you will know exactly what data will be used and how they will interact and how the results will be debugged. And second, what methodologies and techniques will be deployed. This should be planned and thought out carefully from architecture through emulation/prototype and silicon.”