Experts At The Table: Verification Strategies

Last of three parts: Verifying IP and software; using margin as a buffer; ‘happy gates’; deadly bugs; too many models; improving verification through better design.


By Ed Sperling
System-Level Design sat down to discuss verification strategies and changes with Harry Foster, chief verification scientist at Mentor Graphics; Janick Bergeron, verification fellow at Synopsys; Pranav Ashar, CTO at Real Intent; Tom Anderson, vice president of marketing at Breker Verification Systems; and Raik Brinkmann, president and CEO of OneSpin Solutions. What follows are excerpts of that discussion.

SLD: How does software fit into the IP verification picture?
Bergeron: IP comes with a very significant software stack.
Foster: The software side is not that well understood, either.
Bergeron: IP providers are supplying more of their verification environment. Customers often demand it, too; they will ask for test suites for an IP protocol as well. Why? Verifying the protocol and verifying the version are two different things. And vendors have failed to provide integration tests. It's a lot easier to give customers your entire verification environment so they can figure out how to do it.
Brinkmann: Integration verification is one thing. But who at the IP provider makes sure that a given set of parameters has ever been verified? There is a chance you'll hit something your IP provider never looked at.
Bergeron: We may never have verified a particular combination.
Anderson: The chances are good that combination has never been verified because there are so many possible combinations. There are dozens or hundreds of parameters. You could simulate until the end of the universe and never hit all the combinations.
Brinkmann: If you're an IP provider, you want to give customers a verification environment that allows them to run their particular parameter settings and re-check the IP.
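Anderson's point about combinations is simple arithmetic: the number of configurations is the product of the number of legal values for each parameter, so it grows exponentially with the parameter count. The sketch below uses a hypothetical IP block with five made-up parameters (the names and value ranges are illustrative assumptions, not any vendor's IP). It counts the configurations, enumerates them exhaustively when that is still feasible, and, in the spirit of Brinkmann's suggestion, draws a random sample of configurations for a regression run when it is not.

```python
import itertools
import math
import random

# Hypothetical IP parameters and their legal values (illustrative only).
PARAMS = {
    "data_width": [8, 16, 32, 64, 128],
    "fifo_depth": [2, 4, 8, 16, 32, 64],
    "num_channels": [1, 2, 4, 8],
    "ecc_enabled": [False, True],
    "low_power_mode": [False, True],
}


def total_configs(params):
    """Number of distinct configurations = product of the option counts."""
    return math.prod(len(values) for values in params.values())


def all_configs(params):
    """Yield every configuration; only practical for small parameter spaces."""
    keys = list(params)
    for values in itertools.product(*(params[k] for k in keys)):
        yield dict(zip(keys, values))


def sample_configs(params, n, seed=0):
    """Draw n random configurations (with replacement) for a regression run."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in params.items()} for _ in range(n)]


if __name__ == "__main__":
    print(total_configs(PARAMS))  # 5 * 6 * 4 * 2 * 2 = 480
    # Fifty independent on/off parameters alone already give 2**50 configurations.
    print(2 ** 50)  # 1125899906842624
    for cfg in sample_configs(PARAMS, 3):
        print(cfg)
```

Five small parameters already give 480 configurations; with dozens or hundreds of parameters, exhaustive enumeration is out of reach. That is why shipping a re-runnable environment, so that a customer can check the configuration they actually use, is more practical than trying to cover the whole space up front.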

SLD: One sure way to make sure a chip worked okay in the past was to add extra margin, but that’s affecting performance and power at advanced process nodes. That puts much more burden on verification. Are the tools up for that?
Ashar: Timing constraints used to be handled by synthesis tools, place and route tools and STA (static timing analysis). All of these are quality-of-results types of tools, so they are forgiving. If you don't have your timing constraints right, you have the opportunity to go back and fix them. Going forward, getting timing constraints right is moving up the abstraction chain. One reason is interface and power checking. The chip needs to be set up for doing power verification at RTL, and the setup information is in the timing constraints. You need to have them specified at that level. Also, at that level the RTL is still evolving. Various blocks are at different levels of maturity, so you have to put all of these constraints together. It's a challenge, and it creates an obligation for a tool at that level to maintain consistency between the various sources of timing constraints and the RTL.
Foster: There are multiple dimensions to timing. I’ve seen many chips have problems with things like power management deadlocks and be unable to get out of that. You have to get it right. If you can’t bring the chip up then you can’t get it into the lab to find the really tough bugs. You do have a minimum requirement.
Bergeron: Given the complexity of the job, you have to specialize. At the functional level, there are so many transistors that we do have slack and margin. We make everything programmable that we can. IP now has thousands and thousands of registers. We don't know how it's going to be used, so we can't always tweak it; instead, we put the margin in the higher layers.
Foster: One of our customers put in some pretty good debug logic to overcome problems. The debug logic identifies the problem, and then they work around it. Extra logic is still being put in there.
Anderson: That’s an interesting idea. Almost every chip today will have defects, so you have to design your chip to work around those defects.
Bergeron: This is design for yield.
Anderson: Yes, but it’s also design for verification failures. You’re building in extra paths if you find you’ve missed a bug.
Foster: This is the same as putting ‘happy gates’ in the chip. You’re happy they’re there when you find a bug.
Anderson: That’s been done for a long time.
Bergeron: Memory has done that for years.
Brinkmann: So do you put in extra IP blocks? The same kind of thing applies to software, as well. As soon as you start monitoring things, it affects performance.
Foster: Yes, it does affect performance, but at least you don’t have to re-spin.
Anderson: The world you're talking about, with clock domains and power, is where we've lost flexibility. Traditionally we find bugs, the chip still works, and if it doesn't work we develop software to work around them. When it comes to some of these new orthogonal areas like CDC (clock-domain crossing) and power, if you get it wrong you're generally dead. With low-power bugs, you basically throw away the chip. If you power down some piece of a chip and it doesn't come back up, what do you do? If you have a clock-domain crossing problem, you can't fix it with software.

SLD: There’s been a push to raise the abstraction level with models, but do we have too many models to deal with them all effectively? There are power models, functional models, software models and system models.
Foster: The problem with TLM 2.0 is that it defines communication. It doesn't define computation. We have communication and computation, and computation is left open. That's an unsolved problem. If we could solve it, we could move synthesis up to the next level of abstraction. But there's no agreement on what the synthesizable subset would be in terms of computation. TLM stops at how to communicate.
Ashar: You don't need to go full bore on synthesizing RTL from the TLM architectural model, as long as you maintain a link between the two.
Foster: You have to be able to prove that it’s consistent.
Ashar: Consistency is in how you translate the spec from the verification obligations on the architectural level to the RT level. You don’t actually have to synthesize from TLM to RTL.
Foster: We do a pretty good job of synthesizing today everything from DSP to graphics. With any of these algorithmic things, we do a good job of staying at the algorithmic level. That works pretty well today. Unfortunately, you can’t do the entire chip.
Bergeron: That's what they said 20 years ago about logic synthesis. We still have to do some by hand.
Foster: It will evolve.
Bergeron: It has to evolve. The way out of the verification conundrum is through design. That's why we moved to RTL synthesis. Gate-level verification was too hard.
Ashar: We’ve solved the data path problem. You don’t touch adders or multipliers in the technology map. All you maintain is a link from the higher level to the RT level and you synthesize the control of the data path part.
Bergeron: If we can find the same kind of model—transportation parameters we can agree on—and provide a library for those things that can be inferred or implemented, maybe we could go to something that is viable for a broad-spectrum application and do full-chip synthesis.
Foster: The constant in the industry for software design is 50 bugs per 1,000 lines of code. If you can move to the next level of abstraction, you write fewer lines of code for the same function, so you introduce fewer bugs. We need to do that, not only for verification performance, but also to reduce the number of bugs.
Brinkmann: Until we are there, we have a verification problem. With these different abstraction levels, we have to make sure they mesh. If you verify manually, it's very difficult. The only way to do that is to take models and abstract from them again so that you have high-level models that are equivalent to what you're doing. Then you re-verify your entire system using these abstracted models.
Foster: We did that when we moved from gates to RTL.
Ashar: And that’s what I see happening here, too. Otherwise you will never find a bug until the chip is out of the fab. What you could do is go back up with the low-level model and integrate it into your high-level simulation.
Bergeron: The transition cost from RTL to whatever higher level it will be is going to be huge compared to gates to RTL. We did not have huge gate-level IP. Things were recaptured all the time. Now we have all those IP cores modeled in RTL. We will have to provide high-level models of those and make sure they're consistent. That will be a huge amount of effort.
Ashar: The programming model from RTL to the next level of abstraction is quite different and very challenging. If we set too high a bar in synthesizing from one programming model to another, we’ll never stop working on it.
Foster: When we went from gates to RTL, there was basically a one-to-one mapping of states between the RTL model and a gate-level model. At a higher level of abstraction, there is a one-to-many mapping. How do you know these models are even consistent?
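One common way to frame Foster's question is a refinement check with an abstraction function: map each detailed (concrete) state to the abstract state it represents, which is many-to-one, then require every concrete transition either to stay within the same abstract state (a "stutter" step) or to match a transition the abstract model allows. The sketch below is a minimal explicit-state version of that idea, assuming tiny hand-written models; the state names, the `ALPHA` mapping and the injected bug are all hypothetical. It checks only transition consistency (a safety-style property), not initial states or liveness.

```python
# Hypothetical transaction-level (abstract) model: IDLE -> BUSY -> DONE -> IDLE.
ABSTRACT = {
    "IDLE": {"BUSY"},
    "BUSY": {"DONE"},
    "DONE": {"IDLE"},
}

# Hypothetical cycle-level (concrete) model with extra intermediate states.
CONCRETE = {
    "idle": {"req"},
    "req": {"wait"},
    "wait": {"wait", "resp"},
    "resp": {"idle"},
}

# Abstraction function: several concrete states map onto one abstract state.
ALPHA = {"idle": "IDLE", "req": "BUSY", "wait": "BUSY", "resp": "DONE"}


def refinement_violations(concrete, abstract, alpha):
    """Return concrete transitions that the abstract model cannot explain.

    A concrete step c -> c2 is accepted if it stays in the same abstract
    state (stutter) or if alpha(c) -> alpha(c2) is an abstract transition.
    """
    bad = []
    for c, succs in concrete.items():
        for c2 in sorted(succs):
            a, a2 = alpha[c], alpha[c2]
            if a != a2 and a2 not in abstract.get(a, ()):
                bad.append((c, c2))
    return bad


if __name__ == "__main__":
    print(refinement_violations(CONCRETE, ABSTRACT, ALPHA))  # []
    # Inject a bug: the concrete model can abort from "wait" straight to "idle".
    buggy = dict(CONCRETE, wait={"wait", "resp", "idle"})
    print(refinement_violations(buggy, ABSTRACT, ALPHA))  # [('wait', 'idle')]
```

The hard part in practice is not this loop but finding a correct abstraction function when one abstract state corresponds to many concrete ones, and doing the check symbolically rather than by enumerating states.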

SLD: When you move to a higher level of abstraction, are the bugs harder to pinpoint and can they cause more destruction?
Foster: At lower levels of abstraction you have a whole bunch of things you have to weed out. They’re still very complex bugs.
Bergeron: They should be more obvious at a higher level.
Ashar: You will have a lot of very destructive bugs at the higher level, but it’s easier to extract these symptoms. At some level, if some state becomes unreachable or deadlocks, you should be able to find it.
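Ashar's point that unreachable states and deadlocks should be findable at a higher level can be illustrated with explicit-state reachability analysis, the basic idea behind model checking (production tools use symbolic and SAT-based engines to scale far beyond this). The sketch below assumes a small, hypothetical power-management state machine with two planted bugs: an `OFF` state with no wake-up path, echoing the "powered down and never comes back up" failure discussed earlier, and a `DEBUG` state that nothing ever enters. A breadth-first search from the initial state flags unreachable states, dead-end states, and reachable states from which the machine can never return to its initial state.

```python
from collections import deque

# Hypothetical power-management FSM: state -> set of successor states.
FSM = {
    "RUN": {"IDLE"},
    "IDLE": {"RUN", "SLEEP"},
    "SLEEP": {"WAKE", "OFF"},
    "WAKE": {"RUN"},
    "OFF": set(),      # planted bug: powered-down state has no wake-up path
    "DEBUG": {"RUN"},  # planted bug: no transition ever enters DEBUG
}


def reachable(fsm, init):
    """Breadth-first search: all states reachable from init (including init)."""
    seen = {init}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        for nxt in fsm.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def check(fsm, init):
    """Return (unreachable states, reachable dead-end states)."""
    reach = reachable(fsm, init)
    unreachable = set(fsm) - reach
    deadlocks = {s for s in reach if not fsm.get(s)}
    return unreachable, deadlocks


def trapped(fsm, init):
    """Reachable states from which init can never be reached again."""
    return {s for s in reachable(fsm, init) if init not in reachable(fsm, s)}


if __name__ == "__main__":
    print(check(FSM, "RUN"))    # unreachable={'DEBUG'}, deadlocks={'OFF'}
    print(trapped(FSM, "RUN"))  # {'OFF'}
```

The `trapped` check matters because a deadlocked design does not always sit in a state with no transitions; it can also loop forever among states that never lead back to normal operation, and that check catches both cases.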