Experts At The Table: Debug

Last of three parts: Expanding the definition of debug; different levels of debug for different market segments; the relationship between complexity and debug; what happens after bugs are discovered; why EDA software can’t be perfect.


Semiconductor Engineering sat down with Galen Blake, senior verification engineer at Altera; Warren Stapleton, senior fellow at Advanced Micro Devices; Stephen Bailey, director of solutions marketing at Mentor Graphics; and Michael Sanie, senior director of verification marketing at Synopsys. What follows are excerpts of that conversation.

SE: There are separate areas being created in devices, such as for corporate data on one side of a smartphone and personal data on another. How does that affect debug?
Bailey: Managing the complexity by limiting the number of interactions is a valid way to go—at least to the extent that you can achieve it.
Blake: The question is why you would do that if you need the interaction.
Bailey: If you say you want to launch a multimedia file, there’s enough dedicated memory for the video processing subsystem to deal with it. You just move the data back when you’re done compressing it. It does have to make sense on why you would do that within the context of the system.
Blake: Even if you decide to go with a structured bus, a customer may say they want it to have this extension. The customer is putting a big enough opportunity in front of you that you’ll do what they ask for. And now, suddenly, those interactions don’t work anymore.
Sanie: We see that very often. A lot of the bigger SoC companies license AMBA, not to use it as is, but to work with ARC for their own internal use.
Bailey: It’s a tradeoff they’re making. They’re saying they can get some kind of differentiated value.
Blake: But you’re not going to walk away from that business opportunity. And if it’s third-party IP, you’ve got to hope that your third-party IP vendor will work with you.

SE: Verification needs to start very early in the design process. Is it the same for debug?
Sanie: My definition of debug is more than you can’t do anything until you find bugs. It’s a much broader discussion.
Bailey: There are a lot of things you do to prevent bugs, and you certainly should do all of those. But the debug process is identical. You have a problem. You have to identify it, which requires some amount of discovery. You have to peel back the layer on the onion. The whole process of debugging is making it simpler because that makes it easier to find the bug. Usually, once you find it, it’s straightforward to fix it—although straightforward doesn’t necessarily mean low cost. The hard part is identifying where the problem is.
Stapleton: An adjunct to that is whether the debugger knows how the thing was supposed to work. He may see a symptom that doesn’t look right, but he may not understand the issue as a whole to even make that determination. That goes back to documenting the design and expressing the intent, which is all wrapped up in this debug process.

SE: The more complex the hardware, in theory there should be more bugs due to many more interactions. Is this what’s happening?
Sanie: It’s segment-dependent. For example, in the server market it’s very expensive not to fix a bug. If you’re in the mobile segment, there are some bugs you can fix well enough with software, and some bugs you can fix next time. You also can tape out with some bugs.
Stapleton: Not bugs the customer will see.
Sanie: No, but in some cases you tape out for debug. It’s easier to find those bugs in silicon. It’s more costly. But it allows you to run real silicon to find bugs.
Stapleton: The real silicon is faster than an FPGA prototype.
Bailey: The software folks will release knowing there are bugs. The bottom line is the priority of the bug because the priority is related to the success of the product in the field. I have talked with companies where they had bugs in hardware that were fixed in software.
Blake: 100% of hardware providers do exactly that.
Bailey: It’s all a level of priority. My company was working with a customer to use assertions to isolate the bug. They found the bug, they came up with a workaround in software, and they validated it. Even though the fix didn’t always solve the problem, it significantly reduced by orders of magnitude the frequency of the bug appearing. When you go from having to reboot multiple times a day to once a month, that’s a significant difference.
Blake: If our EDA provider could write perfect software, we wouldn’t be able to afford it. The same is true of hardware. We could make perfect hardware, but you wouldn’t be able to buy it.

SE: It sounds as if you’re saying that EDA is a mirror image of what’s going on inside the chip. Because of that, there are a lot of unknowns that no one knows are there. How does debug work with something like that?
Sanie: You can take all the positives and make them negatives. That’s one approach. That allows you to find stuff you weren’t looking for.
Blake: That’s always been one of the challenges. How do you create test cases you can’t think about? Constrained random was one of the big steps forward in that area, but there are more advanced tools coming. You can try to find a bug that because of latency you know is in there, but you have to come up with alternative ways to pinpoint it.

SE: Isn’t that like debugging your methodology?
Blake: Yes. We’re doing this to find bugs.

SE: How much of this is an education issue, where everyone has to learn new stuff at a rapid pace?
Blake: Most engineers by nature are inquisitive people, so learning new stuff isn’t much of a challenge. We do come across pockets in organizations where there is resistance, but most engineers don’t see that as a problem. It’s a budgetary issue for companies, and there is time out of schedules. But at the same time, when you look at the advantage you gain, you can’t afford not to do it.
Sanie: Verification is easy to learn compared to something like analog. There are lots of blogs talking about new approaches.
Bailey: I agree. When you look at other disciplines, verification engineers are very willing and eager to learn something new and deploy it. That still doesn’t change the fact that it takes about a decade for something really innovative to become mainstream. I don’t know if that time can be shortened. SystemVerilog and UVM had very fast adoption rates, but that was really standardizing stuff that had been done before. I’d like to go back to the question about unknowns, though. The future is providing better visibility for the user from a coverage analysis perspective, going beyond where you are with debug today. When you have a specification that implies a test sequence, at an SoC level, it’s not just the sequences but the interrelationships of the timing of those sequences if you stressed them in different ways for performance and functional reasons. Some customers thought they had really good coverage for their test, only to find out they were only exploring 40% of the state space. There’s another 60% to go.

SE: Does the debug information get relayed back in the flow?
Stapleton: It’s still in silos. If the SoC team has an issue, they feed it back to the IP team, which may go try to recreate the problem in their own environment. They may start looking peripherally to see if it’s a family of problems.
Blake: I’ve seen cases where it works really well, and others where it didn’t work at all. It’s hit or miss.
