Does Fast Simulation Help Debug Productivity?

Optimizing one tool in a flow has less impact than expected and may shift the problem elsewhere, such as debug.


It is nice when a reporter manages to get the scoop of the century, and that was the case at a lunch panel hosted by Cadence at the recent Design and Verification Conference (DVCon) in Santa Clara, CA. Brian Bailey, technology editor for Semiconductor Engineering, was the moderator for the panel and broke the news to the crowd: Cadence had developed a logic simulator that was orders of magnitude faster than anything else in the industry. A few audience members were interested. When it was announced that it would be given away free to attendees, many more people were interested, as were some Cadence executives.

Of course, no such product exists. The announcement was a device to demonstrate that speeding up one piece of a flow, without tackling the entire flow, does not necessarily produce the expected gains. A super-fast, free simulator may find bugs faster, but the bottleneck simply moves: bugs would now be found faster than they could be analyzed and fixed.

The panelists for “Mastering Verification and Debug Productivity” were John Goodenough, vice president for engineering systems at ARM; David Lacey, verification scientist at Hewlett-Packard; Normando Montecillo, associate technical director at Broadcom; and Frank Schirrmeister, group director of product marketing for the System Development Suite at Cadence. What follows are excerpts from that panel.

Montecillo: Within Broadcom, the formal team is playing a big role in schedule compression through early bug detection and the delivery of higher-quality RTL. We want to be able to start exploring the design as soon as the RTL is coded. During debug we try to match each challenge with the right tool (simulation or formal), which simplifies the verification task. Formal has also helped with corner cases: locating the root cause of a bug and then checking that the fix does not have side effects.

Goodenough: Our needs around debug are twofold. First, we need to reduce the latency between finding an issue and fixing it. By this we mean getting the issue to the right person quickly so that it can be fixed. Second, after you debug, you need to make sure you don’t make the same mistake again.

Lacey: Our biggest challenge is increasing the productivity of our engineers, and that certainly applies to debug. We have adopted methodologies such as UVM-e, but incorporating two methodologies creates debug challenges, such as when maintaining legacy IP. There are multi-language challenges, including how to perform debug in that mixed environment. We also have a new focus on mixed-signal verification and how to bring knowledge of modeling and analog debug to a verification team that has only dealt with digital techniques.

Schirrmeister: We see three trends in debug and verification. First is the ability to reuse verification from the IP to the subsystem to the chip level. Second is being able to perform verification across engines, including static and dynamic engines, virtual platforms, RTL executed in a simulator, or an FPGA. Third is reuse across disciplines. Systems have become so complex that you need many experts to interact: hardware, software, power, etc.

SE: Is debug of design or verification more difficult?

Goodenough: They are equally difficult. The problem is that you are integrating lots of other people’s IP, which can come from different people, different companies, and different geographies. It is a knowledge management problem. You need to be able to share contextual information from one person to another.

Schirrmeister: Up to 50% of bugs reside in the testbench. I have to write scenarios and use-cases all the way down to what they may mean at the block level. This is why reuse, including reuse in verification, is so important.

Lacey: There are different challenges in different domains and these are addressed with different types of tools. I am more familiar with the tools on the verification side and they are advancing rapidly. When I look at RTL designers, they are more “old school” and their debug techniques tend to be less sophisticated.

Montecillo: My group concentrates on verification at the block level, so we can enable better-quality RTL by starting earlier. Our goal is to deliver the best-quality RTL we can to the system-level guys.

SE: What kinds of problems can early deployment of formal detect?

Montecillo: We find both architectural and implementation problems. These are the same issues that would be found in simulation, but we can find them earlier and faster. And the later a bug is caught, the more it costs to fix.

SE: Formal used to be used to find problems that simulation missed. Now it is finding bugs before simulation starts.

Montecillo: Exactly. That is one of the ‘shift left’ programs that we deployed. We want to empower the design team to deploy formal techniques.

Goodenough: There is a formal use model that has nothing to do with verification – it is about using formal tools to support better quality design. It is a mistake to have verification and validation engineers doing this when the design team could avoid the problems completely by using formal. Another use-model is to hunt for bugs. We were unsuccessful when we made formal a validation problem. We were successful when we made formal a design problem. We have had huge success in shift left by enforcing good design practices through the deployment of formal.

Schirrmeister: Many issues found at the top level are about interconnect. Formal used to be written off by saying it didn’t scale, but it has grown up and for this problem it scales better than simulation.

SE: What kinds of tools help with bug triage?

Goodenough: There is a multiplicity of techniques. One example is promoting assertions from the block to the system level. If you find a problem at the integration level, the guys doing the debug may know nothing about the blocks. You can provide local context by giving them information about what is happening in the design, along with some system context. Promotion of assertions, promotion of performance counters, use of abstractions: don’t make a software guy look at waveforms; give him access to transaction-level information that may mean more to him.
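
In SystemVerilog, this kind of assertion promotion is commonly done by packaging block-level checks in their own module and attaching them with bind at integration time, so the system-level team inherits the block team’s checks without knowing the block internals. The sketch below is illustrative only; the fifo block and its signal names are hypothetical:

    // Hypothetical block-level protocol checker, packaged as its own module
    // so it can be reused unchanged when the block is integrated.
    module fifo_checks (
      input logic clk,
      input logic rst_n,
      input logic push,
      input logic pop,
      input logic full,
      input logic empty
    );
      // A push must never arrive while the FIFO reports full.
      a_no_push_when_full: assert property (
        @(posedge clk) disable iff (!rst_n) full |-> !push
      ) else $error("push while full");

      // A pop must never arrive while the FIFO reports empty.
      a_no_pop_when_empty: assert property (
        @(posedge clk) disable iff (!rst_n) empty |-> !pop
      ) else $error("pop while empty");
    endmodule

    // At integration time, bind attaches the same checks to every fifo
    // instance, giving system-level debug block-level context for free.
    bind fifo fifo_checks u_fifo_checks (.*);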

Schirrmeister: The human brain does not comprehend all aspects from software to hardware to mixed-signal. As a vendor, we try to provide the right windows into what is happening.

Lacey: When talking about triage and debug, we look at automation. The team may run 10,000 to 15,000 tests overnight. With a fail rate of 1%, that is 100 to 150 cases to be looked at. We have a script that automatically reruns failing tests to provide additional debug information, so the engineers can dig straight into the problems in the morning.

Montecillo: Formal people love to debug because it is easy. We don’t have the simulation problem of staring at long traces and trying to locate the root cause from them. We usually deal with tens of cycles, so finding the root cause is a lot easier.

Goodenough: When you reach a certain level of cyclomatic complexity, or depth of the sequential path, you hit an explosion and counterexample generation becomes challenging.

Montecillo: That is when you have to use abstractions, partitioning and other techniques. But if the problem gets to be too large, we would probably apply simulation to the task.

Audience Question: I am a believer in ‘you get what you measure.’ Does anyone have any ideas about how to measure productivity related to debug?

Lacey: We don’t have any formal metrics, but we do have a feel about the types of things that make a difference. Moving to UVM was a debug enhancer. We now have consistency across all the components, which means I can go and help someone else debug a component and I know how to get around their code.

Goodenough: We do measure time to root-cause analysis. We do not apply this uniformly; instead, we concentrate on the big problems. We also track times from sighting to triage. But there are cultural issues. Engineers are intelligent people, and they resent not being trusted. You have to demonstrate how metrics can enhance productivity rather than having them seen as a constraint. So the real barrier is not what the metrics are, but how you deploy them. Without complete data sets it is easy to jump to conclusions. And sign-off metrics are very different from productivity metrics.

Lacey: Engineers know what causes them the most pain. They know what takes the longest and feels like a waste of time. That is where we focus and try to add more automation.

Audience Question: Why can’t we concentrate on the root cause of the problem and make it less likely that bugs will get injected?

Lacey: We are starting to use some of the more sophisticated features of SystemVerilog so that we don’t have to duplicate as much code. We use code reviews to reduce bugs and to provide feedback to younger engineers.
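
One such feature is the parameterized class: a single type-parameterized component can serve many data types instead of being copied and edited for each one. A minimal sketch, with hypothetical names that are illustrative rather than HP’s actual code:

    // One type-parameterized scoreboard instead of a copy per data type.
    class scoreboard #(type T = int);
      T expected[$];  // queue of items we expect to observe

      function void add_expected(T item);
        expected.push_back(item);
      endfunction

      function void check_actual(T actual);
        T exp;
        if (expected.size() == 0) begin
          $error("unexpected item");
          return;
        end
        exp = expected.pop_front();
        // != compares values for built-in types; class-type payloads
        // would supply their own compare() method instead.
        if (exp != actual) $error("scoreboard mismatch");
      endfunction
    endclass

    module tb;
      // The same code serves two different data types, no duplication.
      scoreboard #(int)          word_sb = new();
      scoreboard #(logic [31:0]) bus_sb  = new();
    endmodule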

Schirrmeister: Raising the level of abstraction means that there are different aspects that you can verify than you could before.

Goodenough: The first step is to make sure you have good RTL coding standards in place and that you enforce them. But this is not the real problem. It is the unknown unknowns that are the problem.

Audience Question: Formal allows you to verify a lot of things without stimulus, but formal has limitations for dealing with complexity.

Montecillo: We use various techniques, such as partitioning and abstraction. These are the key steps you have to take to use formal.

Lacey: You don’t take a whole chip and put it in simulation. You break it down into smaller chunks and focus on smaller pieces of the design. Formal is no different when you have a large problem.

Montecillo: And you can move code between pieces. A checker for one block becomes a constraint on the inputs of another block. So even though I am partitioning the design into smaller pieces, I am also creating the pieces needed to verify the larger block they fit into.
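
In SystemVerilog Assertions this is the classic assume-guarantee pattern: the same property is asserted as an obligation on the block that drives an interface and assumed as an input constraint on the block that receives it. A minimal sketch, with hypothetical handshake signals and a hypothetical compile-time switch selecting which side is under verification:

    module handshake_props (
      input logic clk,
      input logic rst_n,
      input logic valid,
      input logic ready
    );
      // Once raised, valid must hold until ready accepts the transfer.
      property p_valid_stable;
        @(posedge clk) disable iff (!rst_n)
          valid && !ready |=> valid;
      endproperty

    `ifdef PRODUCER_SIDE
      // In the producer's formal environment this is an obligation to prove.
      assert property (p_valid_stable);
    `else
      // In the consumer's environment the same property becomes an input
      // constraint, so the tool only explores legal producer behavior.
      assume property (p_valid_stable);
    `endif
    endmodule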

Goodenough: Formal isn’t a free lunch. It is the same amount of work to write the formal environment as it is to write constrained random. It is arguably more structured which means it is slightly less prone to bad coding habits. You are ultimately limited by the capabilities of your design and validation engineers.

Audience Question: Testbench bring-up is too slow. How do you deal with this?

Lacey: We know when things will be ready and make sure they align. With the move to UVM we can have an initial verification environment very quickly. We have gone from having something in a week or two to having something in a day for most parts of our environment. This is partly because of reuse, but even for new environments, we have templates, so a more structured environment really helped.

Goodenough: Start using formal and standardize your testbench architecture.

Montecillo: If your design team is sitting there waiting for a testbench, then use formal to visualize your design. You don’t need constraints or checkers, just define a scenario and ask the tool to make a trace for you. Once it has created a trace you can manipulate it and start asking questions. You have the ability to explore the design even before you have a testbench.
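
Formal tools commonly expose this kind of exploration through cover properties: the engineer describes a scenario, and the tool constructs a witness trace that reaches it. A minimal sketch, with hypothetical request/grant signals:

    module explore_props (
      input logic clk,
      input logic rst_n,
      input logic req,
      input logic gnt
    );
      // Ask the tool for a witness trace: a request that is granted within
      // one to three cycles. No checkers or input constraints are required.
      cover property (
        @(posedge clk) disable iff (!rst_n) req ##[1:3] gnt
      );
    endmodule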

Audience Question: We can barely afford a couple of simulation licenses. We cannot afford formal tools.

Lacey: We started with engineers saying that things were too painful and that they wanted a better way to do them. They started using their favorite scripting language and came up with a solution.

Montecillo: Start by hiring top quality engineers. They create fewer bugs. Also having a good methodology is important.

SE: If there was one additional tool that would help, what would that be?

Goodenough: When you get a failure, an automagic databook will pop up the relevant pages from the specs for each of the IPs involved and from the system spec and guide you down the right route.

Montecillo: There are two tools we really want today. The first is an over-constraint analysis tool and the second is a sequential-depth analysis tool.

Lacey: My biggest item would be for existing debug features to work in all tools, be it multi-language or mixed-signal.

Schirrmeister: It is easy to generate terabytes of data, but being able to make sense of that amount of data is difficult. This is something we need to tackle.


