Experts at the Table: How is machine learning going to impact debug, and what other improvements are on tap with debug?
Semiconductor Engineering sat down to discuss debugging complex SoCs with Randy Fish, vice president of strategic accounts and partnerships for UltraSoC; Larry Melling, product management director for Cadence; Mark Olen, senior product marketing manager for Mentor, a Siemens Business; and Dominik Strasser, vice president of engineering for OneSpin Solutions. Part one can be found here. Part two is here. What follows are excerpts of that conversation.
SE: In many cases, we are interested in rare occurrences in a system. Could we be looking at data coming from emulation and saying we have seen these patterns many times, but we have just seen a pattern that has never been seen before? Nothing says it is a bug, but is that an indication of behavior that may be anomalous? Maybe it is a bug that hasn’t been seen yet.
Melling: There are two things. One is the detection of the anomaly and being able to create fixes on the fly in the field to seal holes. The other side, where machine learning (ML) technology will be applied, is in the bug hunting side. Can we do a better job of predicting where bugs are likely to occur, or what kinds of checking have a higher probability to cause bugs, and then do more aggressive regressions and testing in those areas to find those bugs, just because we can predict which types of changes have the highest probability of creating bugs? That way you can focus your energies. You still have to debug them once they are found, but both things are happening: ML to triage and get to root cause, but there will also be just finding more bugs and being able to predict where to find them.
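To make the first idea concrete, here is a minimal sketch of flagging a pattern that has never been seen before. It is an illustration only, not a method any of the panelists described. It assumes a trace is simply a list of event names, and the function names (`learn_patterns`, `novel_patterns`), the window length and the sample events are all hypothetical.

```python
from collections import Counter

def ngrams(events, n):
    """Return every window of n consecutive events as a tuple."""
    return [tuple(events[i:i + n]) for i in range(len(events) - n + 1)]

def learn_patterns(traces, n=3):
    """Count each n-event window observed across known-good traces."""
    seen = Counter()
    for trace in traces:
        seen.update(ngrams(trace, n))
    return seen

def novel_patterns(trace, seen, n=3):
    """Return (position, window) for windows never observed before."""
    return [(i, g) for i, g in enumerate(ngrams(trace, n)) if g not in seen]

# Hypothetical bus events from earlier emulation runs.
baseline = [
    ["req", "grant", "data", "ack"],
    ["req", "grant", "data", "data", "ack"],
]
seen = learn_patterns(baseline)

# A new run in which the ack arrives before the data.
new_trace = ["req", "grant", "ack", "data"]
flagged = novel_patterns(new_trace, seen)
```

A flagged window is not a bug report. It only marks behavior that falls outside what has been observed so far, which is the distinction the question draws.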
Fish: How do they use formal and ML techniques?
Strasser: We have tried some stuff in predicting parameters for prover selection for the engine selection. We have seen some success with this. There is one for regression and for the actual checks. We have also tried to predict the time for the proof. Two companies that have tried this have confessed that it didn’t work and one says that it does. Formal technology has traditionally been unpredictable compared to simulation. You want to be predictable and that is a huge field.
Melling: We look at ML from the perspective of inside and outside. Inside is the things inside of formal and improving things under the hood which make it a better tool for the user, but the user is only involved in that it is their data that we are learning from as it is working. Then there is the outside of it, which is trying to give the customer access to the kind of technology that allows them to be more productive in their job. Ultimately, we will all have tools that will allow them to customize algorithms and manipulate data sets to be able to improve the prediction capabilities of what is initially delivered and continues to improve and grow over time.
SE: How will we debug ML applications?
Melling: I like to compare it to the early GPU days. With GPUs, when they were doing 3D shading operations, there was always someone they call Golden Eye. That person sits there and looks at the images and says, ‘That is good enough.’ It isn’t an absolute. There is nothing that says it is right.
Fish: It is a value number.
Melling: That is the problem with ML. It is not an absolute. It is a prediction, it is statistical. That is why value matters. You need someone who can apply judgment that says this is good enough in the majority of cases, such that it provides value to the market. You release something too soon, and if it is not good enough what do you do? These problems are even harder to debug because you are talking about continuing to improve the network to make the prediction better.
Olen: We have seen some customers developing neural networks and AI applications and they try to use as much formal verification as they can.
Melling: That makes sense. It is statistical. It is numerical by nature.
Olen: Try and cover an exhaustive space, which is difficult to do empirically, with simulation or emulation. But even then, they are running into challenges.
Fish: But to the question of accuracy of the net you produce, you have perfect hardware that is doing its job 100% correctly, and there is still the open question of why does it say a cat is a dog?
Strasser: If only it gets confused between cats and dogs.
SE: What improvements have been made in debug? What can we expect before AI takes over?
Melling: One of the cornerstones of debug is driver tracing. Driver tracing algorithms have really expanded. It is no longer just what caused this. It is how far back, especially for X behaviors. The algorithms are becoming a lot more sophisticated, which allows users to get back to root cause with the click of a button rather than having to go stage after stage back through the design.
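The back-tracing idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any commercial debugger works. It assumes the design is available as a dictionary mapping each signal to the signals that drive it, along with a snapshot of signal values at one point in time. The function name `trace_x_roots` and the signal names are hypothetical.

```python
def trace_x_roots(signal, drivers, values):
    """Walk back from `signal` through drivers whose value is 'X'.

    Returns the signals where the walk stops, i.e. those that are
    reached but have no X-valued driver of their own.
    """
    roots, stack, visited = set(), [signal], set()
    while stack:
        current = stack.pop()
        if current in visited:  # guard against combinational loops
            continue
        visited.add(current)
        x_drivers = [d for d in drivers.get(current, []) if values.get(d) == "X"]
        if x_drivers:
            stack.extend(x_drivers)  # keep going back, stage by stage
        else:
            roots.add(current)       # nothing upstream is X: stop here
    return roots

# Hypothetical netlist fragment: signal -> signals that drive it.
drivers = {
    "out": ["mux"],
    "mux": ["a", "sel"],
    "sel": ["cfg_reg"],
    "a": ["in_a"],
}
# Hypothetical value snapshot at the time of the failure.
values = {"out": "X", "mux": "X", "a": "1", "sel": "X", "cfg_reg": "X", "in_a": "1"}

roots = trace_x_roots("out", drivers, values)
```

Here the walk from `out` skips the known-good `a` branch and ends at `cfg_reg`, the one place the unknown value enters. A real tool would also have to track time, since an X may have been captured several clock cycles earlier.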
Olen: Customers want tools that are built on modern stacks. We have a Qt stack, for example, as opposed to Eclipse or Tk. This enables them to be faster and refresh faster. They are more modern, they are more customizable. They actually enable the tools they are working with (the simulator or formal tools) to exchange the data more quickly. We are seeing big improvements in that area. We are seeing an interesting mix of Qt-based desktop debug and analytics being developed that work together with web-based systems for collaborative debug. Now, with teams all around the globe that are designing this part of the chip, ‘If one group has done this, let’s not replicate each other’s work. I have this covered, you have that covered, let’s analyze and understand all of our coverage metrics.’ There are some newer architectures in debug environments that support multiple tools (simulators, emulators, prototypes, formal, C synthesis, analog/mixed-signal), and these are consistent tools that can move from one to the next. They are context aware in the sense that they say, ‘I am doing real number modeling,’ or ‘I need to look at wires, that is what I care about.’ Or over here, ‘I am debugging my UVM testbench, so I need to look at the class libraries and understand what they look like.’ This is a whole new face to debug tools that is not a monolithic product. It also interfaces to cloud-based activities. Do you do debug in the cloud? Or will you run the engines in the cloud and have a clever streamlined pipe to get the necessary data onto your desktop? Or will you actually take advantage of web-based interfaces and technologies to debug something that is not on the desktop?
Fish: Shoving data or pushing data up to a server somewhere, where it can do analytics in a more thorough manner, is more interesting.
SE: That says that by doing it in the cloud you are doing more data crunching, whereas debug traditionally has been seen as being an interactive task.
Olen: We can now define debug to be two pieces. There is analytics and there is visualization. The analytics can definitely happen in the cloud, and that is probably optimal. In the functional verification survey, not only is debug the largest time-consumer, it is the least predictable. If you have peak needs, that is where the cloud can economically be a good solution. You need a lot right now, but I don’t need any for the next two weeks. Then for visualization, you need something that a human can see. That has to get to your display.
Melling: And performance has to be there for visualization. I was going to add on to your context-aware comments that the ability to customize context awareness has improved. The user can use search mechanisms, color coding and highlighting different things so they can create their own context because they understand the design and they understand the data flow. Being able to highlight and track a data value through the system can be very powerful in the debug process.
Fish: What is the correct terminology for finding that you do have a problem, and then finding root cause?
Melling: A failure is where you start.
Fish: So you find a symptom. And then you have to find root cause.
Melling: Usually in the verification world it is all regressions to find the failures. And then there is triage, which is taking and categorizing those failure classes and types. The next stage is identifying the best test to use to actually debug and get to root cause.
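The triage step described here, grouping failures into classes and then choosing a test to debug, can be sketched roughly as follows. This is an illustration under assumptions, not a description of any vendor's flow. It assumes each failure is a (test name, runtime in seconds, log message) tuple, and it uses the shortest-running test in each class as a stand-in for "the best test", which is only one possible criterion. All names and log messages are hypothetical.

```python
import re
from collections import defaultdict

def signature(message):
    """Normalize a log message so run-specific numbers do not split classes."""
    sig = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)  # addresses, data words
    sig = re.sub(r"\d+", "<n>", sig)                    # timestamps, counts
    return sig

def triage(failures):
    """Group failing tests by signature, shortest runtime first in each group."""
    buckets = defaultdict(list)
    for test, runtime, message in failures:
        buckets[signature(message)].append((runtime, test))
    return {sig: [test for _, test in sorted(runs)] for sig, runs in buckets.items()}

# Hypothetical regression results.
failures = [
    ("test_dma_17", 420, "ERROR @ 10450ns: data mismatch at addr 0x1F00"),
    ("test_dma_03", 35, "ERROR @ 980ns: data mismatch at addr 0x0040"),
    ("test_irq_02", 60, "ERROR @ 77ns: timeout waiting for ack"),
]
classes = triage(failures)
```

With this input, the two DMA failures collapse into one class, and `test_dma_03` is listed first because it reproduces that class in the least time.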
SE: Formal has a huge impact when you do identify a failure, finding the minimum path to that failure. How important is this as a role that formal plays in the debug process?
Strasser: We also see the trend where you have the human and you say, ‘Follow the red.’ You start from a failure and you trace through the drivers and you get assistance from the tool to tell you where to look. We also see the user being able to query the values through time, and doing their own analytics on the values that they retrieve from the tool. It is the tool assistance to the human plus the data analytics. The short path is important for both pre-silicon and post-silicon. You see a symptom and you try to mimic the symptom by writing assertions. These show the environment that points to the root cause of what you are seeing.
SE: Is that the encapsulation of the context and then the usage of the tool to find the minimum path for reproducibility?
Melling: Absolutely. It is a place where dynamic verification and formal come together. Dynamic verification can limit the scope of what you need to look at, and then formal can drill down and do the deep dive on that scope and hopefully give you the root cause issue.
SE: What improvements can people expect to start seeing in the next few years? How far off are machine learning techniques? What impact could Portable Stimulus have on debug?
Melling: It will be continuous. Will there be one silver bullet? I don’t think so. There is nothing that will make the problems associated with debug disappear. We will continue to improve. We are constantly learning from our customers and where their challenges are, where the hardest problems are, and continuing to pound down the nail.
Strasser: I see assisted debug, not autonomous debug. You get aids from the system, what to look at. Something that tells you what is wrong and what to change. That is not in the foreseeable future.
Olen: Engineers, both design and verification, still have highly gainful employment for the foreseeable future. But I agree, there is a continuum of developments. There are some that we are all working on that one could consider crawling in terms of applying machine learning techniques to verification, to coverage and to debug. They will continue to evolve.
Melling: And get more sophisticated.
Olen: We have talked about some examples today. Is it fully automated? Not quite yet. We mentioned things like arbitration, out of order analysis. Those types of things are happening right now in emulation.
Fish: One impact that machine learning has had on all of us is that there are nice new chips that started from a clean sheet of paper. It has been a while since we had many of those. For a number of years, there has been 20% change between chips. In the past few years, be it related to training or inferencing, there are blank sheets of paper. They are able to do things differently.
Melling: And they are big and complex and they take big datasets being run through them, and that is why we are seeing the boom in the emulation space. It isn’t a ‘nice-to-have’ anymore. It is a ‘must-have.’
Fish: Yes, the vast majority of our customers use emulators or FPGA prototypes.
SE: Any progress in stopping bugs from going in in the first place?
Olen: We should start talking about design for debug.
Fish: I am a believer in high-level synthesis for certain applications. If you can write fewer lines of C code to describe something, you are likely to generate fewer bugs.
Olen: So the creation of bugs is going to shift left.