What verification engineers say about the most time-consuming part of chip development.
By Ed Sperling
System-Level Design sat down to discuss the future of verification with Olivier Haller, design verification team leader for STMicroelectronics’ functional verification group; Hillel Miller, functional design and verification tools and methodology manager at Freescale; Kelly Larson, design verification engineering manager at MediaTek Wireless; Adnan Hamid, CEO of Breker; and Alan Hunter, consultant design engineer in ARM’s processor division. What follows are excerpts of that conversation.
Q: Where are the pain points in verification now?
Miller: Inside Freescale we found there were different groups in different areas doing the same thing. We put a lot of effort into making that more efficient. We used to have separate teams doing verification IP on the modeling side, on the RTL side and on the validation side. We developed verification IP in C++ that could be re-used. There wasn’t anything out there for doing this, so we did it ourselves. We are now trying to figure out how to get the vendors—Synopsys, Cadence and Mentor—on board.
Larson: We’re in the midst of developing expertise in SystemVerilog, and we’re adopting class libraries. Where we lack expertise is in our coverage models. There’s a lot more sophistication in the stimulus and in the randomization than in detecting bugs and making sure we have coverage for that.
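To make that distinction concrete, here is a minimal SystemVerilog sketch of what a coverage model adds on top of randomized stimulus. The class, coverpoints and bins are invented for illustration, not taken from MediaTek’s environment:

```systemverilog
// Hypothetical coverage model: records which stimulus shapes the
// randomization actually produced. Names and bins are illustrative.
class txn_coverage;
  covergroup cg with function sample(bit [31:0] addr, bit [2:0] len);
    cp_addr : coverpoint addr {
      bins low  = {[32'h0000_0000 : 32'h0000_FFFF]};
      bins high = {[32'hFFFF_0000 : 32'hFFFF_FFFF]};
    }
    cp_len : coverpoint len;
    // Every cross multiplies the bins someone must later justify,
    // waive or close: the "too much data" trap.
    len_x_addr : cross cp_addr, cp_len;
  endgroup

  function new();
    cg = new();
  endfunction
endclass
```

A bus monitor would call cg.sample(addr, len) on every observed transaction; the holes left at the end of regression are what the coverage model is really there to expose.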
Haller: A few years ago our challenge was quality, but we now have good ways to address quality in the verification flows. What we need are better ways to bring intelligence into the test benches to improve productivity.
Hamid: We need to build coverage around the outcomes you’ve got to test for and the constructs that will produce those outcomes. We also see that people need to be able to verify their IP cores faster. The third thing is vertical re-use. We want to be able to test that pieces of firmware work with our systems.
Hunter: We have a different problem. We’re supplying IP, not systems, and we have a reasonable handle on verification of our IP. We have fairly good experience with coverage models. We have always been an “e” house (e is Cadence’s verification language), and we’re starting to move to SystemVerilog test benches now, which is causing some headaches. But the big problem is IP configurations: how do you verify all of those configurations? We have our own internal system we can plug IP into so we can test all these corner cases. Putting this into a real-time system rather than an FPGA is a big issue for us. We’re addressing it, but slowly.
Q: Is coverage a major problem on the hardware side?
Miller: In coverage we have two big issues. One is the design teams knowing exactly what they need to achieve. Coverage crosses a whole bunch of boundaries. There’s coverage for the unit level, subsystems, the SoC, stimulus, functionality and the toggling of signals. It’s a matter of deciding exactly what coverage goals they need to meet. Based on that, it’s a matter of planning the resources. Today we don’t know how many people or workstations or software licenses we need because we don’t know what coverage the design team needs to achieve. The other big issue is that there are multiple disciplines in coverage, and currently we don’t have a good standard approach. SystemVerilog isn’t complete in some areas.
Larson: For us, it’s easy not to generate enough coverage data. But it’s easy to generate too much data, too. You go from one extreme to the other. You really have to figure out what you need to look at and what you don’t need to look at. That’s an art.
Q: But isn’t that the biggest problem in verification? It’s a question of what you really need to look at, because there is so much data. How do you find the bugs?
Hamid: Intel is claiming that 25 percent of the verification guys’ time is wasted chasing ‘don’t cares.’ And the complexity is rising.
Miller: I’d like to ask designers, when they’re consuming simulation license resources, exactly what value they’re getting out of those resources and how the coverage is improving.
Hunter: What pops up is ‘Don’t care,’ ‘Really do care,’ and ‘Mostly don’t care.’ The problem is these are system-dependent, so we need to go for everything.
Miller: But how do you go after multiple clock domains and frequency relationships?
Hunter: We have to try with our internal system test bench. We have a whole set of different clock domains where we can change the memory types.
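As a sketch of what such a bench has to provide (the clock names and periods here are invented, not ARM’s), a parameterized clock generator lets the same IP be re-verified under deliberately awkward frequency relationships:

```systemverilog
// Hypothetical configurable clocking for a system test bench.
// Assumes a 1ns/1ps timescale; periods are illustrative.
`timescale 1ns/1ps

module clk_gen #(parameter real PERIOD_NS = 10.0) (output logic clk);
  initial clk = 1'b0;
  always #(PERIOD_NS / 2.0) clk = ~clk;
endmodule

module tb_clocks;
  logic cpu_clk, bus_clk, mem_clk;
  // Non-integer ratios between domains flush out synchronizer
  // assumptions that a single "nice" ratio would never exercise.
  clk_gen #(.PERIOD_NS(3.3))  u_cpu (.clk(cpu_clk));
  clk_gen #(.PERIOD_NS(10.0)) u_bus (.clk(bus_clk));
  clk_gen #(.PERIOD_NS(7.5))  u_mem (.clk(mem_clk));
endmodule
```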
Larson: We run into the same thing internally, where we supply a verified block and someone uses it in a different way. You don’t know what your ‘Don’t cares’ are.
Haller: We apply a plan-based verification methodology by defining the features with an associated severity and then map them to verification metrics. Doing it this way, you can be sure that the features that are most important to your design are verified.
Miller: We think you need to go in both directions. You need top-down to focus, and you need bottom-up to ensure that you’ve done exhaustive coverage for your IP.
Hunter: But that makes the assumption that your design is correct.
Haller: You are not qualifying the design. You are qualifying the verification.
Q: Does it matter what language you’re writing everything in? We have SystemVerilog, e, Vera, and a slew of others.
Miller: One of the key things driving this is the consolidation of the semiconductor business. We are partnering much more than we did in the past, and we need to make sure we have everything ready for future partnerships. We use SystemVerilog because it’s widely supported, and we use C++ for the same reason.
Larson: For efficiency and re-use, you really want things to look as much alike as possible, which is why we’re using class libraries. It’s all about efficiencies. We have very small teams and very large chips. Vertical re-use is a big issue. How do you leverage unit-level test benches into your subsystem- and system-level test benches?
Hunter: From a design perspective, we have to go to the lowest common denominator, which is Verilog 95. That’s the only thing you can guarantee people will understand. Until recently we had to support VHDL, as well. We’re going to have to move to Verilog 2001 soon so that we can do parameters.
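The parameter handling Hunter is referring to is the parameter port list added in Verilog-2001. A small, hypothetical example of the difference:

```systemverilog
// Verilog-95 style: parameters live in the module body and are
// overridden positionally or with defparam, which is error-prone.
module fifo (clk, din, dout);
  parameter WIDTH = 8;
  input              clk;
  input  [WIDTH-1:0] din;
  output [WIDTH-1:0] dout;
  // ...
endmodule

// Verilog-2001 style: named parameter ports and ANSI port lists
// make widely configurable IP much easier to deliver and reuse.
module fifo2 #(parameter WIDTH = 8, parameter DEPTH = 16)
              (input  wire             clk,
               input  wire [WIDTH-1:0] din,
               output wire [WIDTH-1:0] dout);
  // ...
endmodule

// Instantiation with named overrides:
// fifo2 #(.WIDTH(32), .DEPTH(64)) u_fifo (.clk(clk), .din(d), .dout(q));
```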
Hamid: Don’t you drive vendors to move up?
Hunter: We try to. We have partnerships with Synopsys, Cadence and Mentor. But a lot of people don’t want to move. From a verification perspective, we supply a simple test bench that executes calls, and we can supply simple test streams for those assembly tests. The fabrication division will sell you a piece of verification IP, but from a core perspective it’s relatively straightforward.
Q: How about everyone else? What languages are you using for verification?
Haller: At ST, our main language is e. We have two projects starting verification with SystemVerilog. The fact that OVM (the Open Verification Methodology) is open-source was one of the factors we considered. We are able to integrate SystemVerilog with e in our OVM environments.
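For readers who haven’t seen OVM, a bare-bones SystemVerilog component looks like the sketch below. The class name is invented and the mixed-language connection to existing e components is only indicated in comments; this is not ST’s actual environment.

```systemverilog
// Minimal hypothetical OVM environment skeleton.
`include "ovm_macros.svh"
import ovm_pkg::*;

class soc_env extends ovm_env;
  `ovm_component_utils(soc_env)

  function new(string name, ovm_component parent);
    super.new(name, parent);
  endfunction

  virtual function void build();
    super.build();
    // New SystemVerilog agents are constructed here. Legacy e
    // verification components run in the same simulation and
    // connect through TLM ports or signal-level interfaces.
  endfunction
endclass
```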
Hamid: We see everyone doing something different. We had to use a lowest-common-denominator solution, which is C. Now it’s evolving to a world where different IP blocks are being developed in different languages. We don’t really care about the languages. What we care about is the interoperability. We can drive into eVCs (e verification components) or VMM (Verification Methodology Manual for SystemVerilog) transactors, and we can drive them all from C. Everyone seems to talk C.
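In a SystemVerilog environment, the usual way “everyone talks C” is the Direct Programming Interface. The sketch below is a generic illustration, not Breker’s API: a C-side generator hands transactions to an HDL-side driver, and every name in it is invented.

```systemverilog
// Hypothetical C-driven stimulus via the SystemVerilog DPI.
interface bus_if (input logic clk);
  logic [31:0] addr, data;
endinterface

// next_transaction() would be implemented in C; it returns 0
// when the C-side generator runs out of test cases.
import "DPI-C" context function int next_transaction(
    output bit [31:0] addr, output bit [31:0] data);

task automatic drive_from_c(virtual bus_if vif);
  bit [31:0] addr, data;
  while (next_transaction(addr, data) != 0) begin
    @(posedge vif.clk);
    vif.addr <= addr;   // hand the C-generated transaction
    vif.data <= data;   // to the signal-level transactor
  end
endtask
```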
Q: This doesn’t sound like progress. It sounds like a regression to the lowest common denominator.
Miller: SystemVerilog, from the block level and user level, is in my eyes a huge success. It’s been a major productivity gain. The debugging tools and the closeness to the Verilog language are progress. C/C++ comes in because we deliver products that use C/C++ as the front end. That’s driving a need for C as the lowest common denominator.
Larson: As verification engineers, we have no interest in writing test benches in C. We need a high-level test bench language. We looked at Specman and Vera, and we never bought into either one of those because of the resistance to going to something proprietary. We jumped on SystemVerilog. It brings all the things a high-level test-bench language brings, and that all adds to the productivity of a verification engineer.
Q: Why not use Cadence’s e?
Larson: That ties you to a specific simulator. We had been flip-flopping, depending upon who gave us the best deal, between different simulators. That’s one of the things that class libraries should be able to solve pretty soon.
Hunter: We were using e, as well, until very recently. It works very well for IP. But financial pressure pushed us to System Verilog.
Q: Is there a way of inserting standards into verification to address some of the corner cases that are creeping into more complicated designs?
Miller: That’s exactly why we have established a verification IP committee in Accellera. We are going to start focusing on SystemVerilog itself. We have to deal with SystemVerilog and this lowest common denominator, which is C or C++.
Hamid: The test bench problem, whether it’s SystemVerilog, Specman, Vera or SystemC, has matured over the past 10 years, so we can start talking about standards. We can only talk about standards when we understand what we’re talking about. We can build a test bench that allows good traffic in and good traffic out. Now it’s a matter of getting good test cases that allow us to hit all the coverage areas we care about. But we also have to figure out what those coverage points mean to us. We’re not even close to standardizing that because we haven’t figured out what the problem is. It’s not the languages that are the issue. It’s how human beings relate to these incredibly complex things that we’re building.
Miller: We started off standardizing languages. Then we went to formats. Now we are doing interoperability. The next thing will be methodology.
Q: Does the problem become more difficult as we move to 45 and 32nm, and then stacked dies?
Miller: You deliver something to the customer and you need to make sure it will work for the kinds of applications the customer wants to write. You need a good coverage environment on your silicon, as well. That’s going to help us when we go to 45nm, where everything is supposed to be correct by design. We will have an environment that complements that. We will be testing our functionality with a good coverage model on the SoC.
Hamid: We need to go to a functional, back-end test because that is the only way we can be sure our design will work.
Miller: We’ve had this assumption that everything is correct by design and it will work. I’ve never believed in that. At 45nm and 32nm, it’s worse.
Q: The coverage model is worse at those geometries because there’s more space on the chip, right?
Haller: From an RTL-and-above verification perspective, the issue is scalability, because we are able to add more and more functionality that needs to be verified. Re-use of qualified IP is key to enabling functional correctness when targeting these dense geometries.
Larson: The other thing that’s been pushing into the verification realm is this whole notion of low-power verification. There are a lot of functional issues with that.
Miller: If you look at this whole thing, we have isolation and retention cells, level shifters and power domains.
Larson: You have starting clocks, stopping clocks, certain sections staying up while others do not, and software routines that are restarting some of the registers. You need to simulate that you’re blowing away all the values. You can’t just wave it off anymore.
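A minimal sketch of the kind of check Larson describes, with invented signal names, corrupts non-retained state on power-down and asserts that isolation precedes power removal:

```systemverilog
// Hypothetical low-power checks. Signal names are illustrative.
module pd_monitor (input logic clk, pwr_on, iso_en);

  // Isolation must already be enabled when power drops.
  property iso_before_pwr_off;
    @(posedge clk) $fell(pwr_on) |-> $past(iso_en);
  endproperty
  assert property (iso_before_pwr_off)
    else $error("Power removed before isolation was enabled");

endmodule

// Elsewhere in the bench, "blow away all the values" so a test
// that relies on stale state fails visibly:
//   always @(negedge tb.dut.pd0.pwr_on)
//     force tb.dut.pd0.status_reg = 'x;  // corrupt on power-down
//   always @(posedge tb.dut.pd0.pwr_on)
//     release tb.dut.pd0.status_reg;     // software must re-init
```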
Hamid: We are also seeing that as these designs get more complicated, getting the device through all those states of power up and power down requires a lot of well-managed operations. The test generation is also getting hard.
Q: How about the addition of multiple cores?
Miller: We’re dealing with interconnect and cache-coherency capabilities, and we have an interconnect cache-coherency module. The issue is that it’s becoming so complex with the state-transition tables—we have books of these tables—that it’s a nightmare. We’re writing assertions and verifying that these transition tables are complete. Also, with multicore, you want to be able to run different operating systems at the same time on different cores. This creates a huge need for address-translation capabilities and resource management on a specific address. And now we have to tightly couple the software with the hardware. In a multicore system, you want to be efficient with your hardware. We have things like queues where you can insert tasks, and you want this to be as parallel as possible, so we have tight coupling of the software tasks and the execution of the hardware. These are real challenges for us. There are real-time requirements that have to be solved with software and hardware.
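The assertion style Miller mentions can be illustrated by encoding two hypothetical rows of a coherence state-transition table as SystemVerilog assertions; the states and signals here are invented for the example:

```systemverilog
// Hypothetical transition-table checks for a cache line.
typedef enum logic [1:0] {INVALID, SHARED, MODIFIED} line_state_t;

module coherency_checker (input logic        clk, rst_n,
                          input line_state_t state,
                          input logic        snoop_inval, local_write);

  // Table row: a snooped invalidate drives any line to INVALID.
  assert property (@(posedge clk) disable iff (!rst_n)
      snoop_inval |=> state == INVALID)
    else $error("snoop invalidate did not invalidate the line");

  // Table row: a local write in SHARED must reach MODIFIED.
  assert property (@(posedge clk) disable iff (!rst_n)
      (state == SHARED && local_write && !snoop_inval) |=> state == MODIFIED)
    else $error("SHARED + write did not move to MODIFIED");

endmodule
```

Completeness then comes from checking that every row in the book of tables has a corresponding property, and that no cycle falls outside all of them.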
Hamid: The CPU guys have been seeing this for a while. It’s all about resource conflicts. Some of this testing has been done before.
Larson: We’re attacking the same number of transistors with a dozen people that we used to do with several hundred.
Q: The foundries have been sending back more and more data about what’s going in the manufacturing process. Is that helping or confusing the issue?
Hamid: Once we figure out how to do all these functional tests, we will see them move to the manufacturing side to make sure this works.
Miller: From our side, we need more things like redundancy and ECO (engineering change order) capabilities. That can also help.
Hunter: We’re one step removed from what goes into silicon. This helps the silicon guys, but it doesn’t help us. We need more feedback.
Hamid: We need to come up with a set of rules that say, ‘Here’s the way we expect you to use our IP,’ with documentation. If we say, ‘These are the use models, and these are the ways you can use a tool,’ then the customer may complain about the rules, but at least he knows how to use it.
Q: Intent about how to use IP or tools is an interesting approach. So far, it hasn’t been done, though.
Hamid: What does it mean for a design to work? That it gives you the same answer for all possible implementations.
Miller: That’s a black-box perspective. We need to emphasize the white-box point of view.
Hamid: The black box is also part of your intent. But no, we haven’t evolved to the point where that’s being done.
Haller: We are trying to agree on mutation-based metrics with our IP providers to ensure their functional correctness. We are not there yet. But it is clearly the most objective method, and the industry should push for these kinds of metrics to ensure the final quality of designs.
Q: Verification is currently 70 percent of chip development time—and cost. What’s a realistic goal for reducing that time?
Miller: The key is automation: writing specifications in a form from which the verification IP can be generated automatically. Today people are writing specifications, but they’re not always doing it in a form that is extractable. There are some successes.
Larson: We’ve gotten quite good at re-using design IP. We have standard bus protocols, and we can take standard IP from multiple sources and meld it together to create a huge system. But none of that helps in verifying that the design does what you want it to do. We’re not nearly as good at re-using verification IP. It takes a lot of effort to put the test bench together and to put the test plan together. We need to get to the same point in verification that we’re at in design. We need standard avenues of communication in verification. Our typical verification engineer cranks out far more new code than a typical RTL engineer.