Experts At The Table: Who Takes Responsibility?

Last of three parts: Coverage issues; who’s responsible for performance degradation; warranties on IP; limitations on risk.


Semiconductor Engineering sat down with John Koeter, vice president of marketing and AEs for IP and systems at Synopsys; Mike Stellfox, technical leader of the verification solutions architecture team at Cadence; Laurent Moll, CTO at Arteris; Gino Skulick, vice president and general manager of the SDMS business unit at eSilicon; Mike Gianfagna, vice president of corporate marketing at Atrenta; and Luigi Capodieci, director of DFM/CAD and an R&D fellow at GlobalFoundries. What follows are excerpts of that conversation.

SemiEngineering: Even for highly skilled verification engineers, getting sufficient coverage is a common complaint. How do we deal with that?
Stellfox: At the IP level, it’s a relatively easy problem to solve. Of course you have all the different configurations of SoCs that you need to support in your IP, and all the different methodologies for that. But at the SoC level, coverage is an interesting problem that we’re trying to solve. It really gets to be use-case coverage. How will you test a chip, both from the software APIs and across the dozens of cores on an SoC? How will those cores interact with various parts of the chip? And then you have all these interfaces. You have to characterize the application space in some formalized way around what the use cases are and whether the chip works functionally, as well as whether performance and power fit those use cases. That’s what the SoC people would really like to have—a much more methodical approach.
Skulick: In a monolithic SoC, right?
Stellfox: Exactly. Is it really realistic to have coverage of everything? It’s not realistic to say everything could be crossed with everything and you have to account for all of that. These SoCs have very specific use cases. You may use the same chip in a tablet or a phone, but there is still a finite set of use cases—what kind of traffic is coming in, what kind of software is operating on that. And then you verify and have coverage of that space.
Skulick: If we have trouble with an SoC, you can imagine what kind of trouble we’re going to have in 2.5D and 3D. Trying to get access to a stack or a die that has not been contemplated in the same domain as an SoC is different. If there’s a SerDes slice from an analog supplier, how do you get in and make sure you get the coverage in a 2.5D environment?

SemiEngineering: There’s a gray area in all of this, as well. What happens if performance isn’t quite as good as you expected or power is a little more than you expected?
Skulick: In an SoC, you have ways to cover some of that. We can boost some things in the process, test for power, test for performance. It’s when you miss it completely that you have a problem. But how do you do that in 2.5D?
Moll: One of the issues with mobile is that as much as use cases are critical to functional, performance and power verification, unfortunately on a mobile device there is no such thing as a fixed use case. In a previous life I was making SoCs at Nvidia. I had my power spreadsheet, and there were use cases I cared about. Today you have no idea how a device is going to be used. In many ways, it looks like the 2.5D problem. It’s so complex—we have chips with 160 IPs—that you will never test the whole product. You have coverage for the main use cases, but there are many others that are very hard to plan for. Nobody in the whole SoC supply matrix has a plan for this. When you do this and this and this, performance is not as good. So who do you blame?
Gianfagna: It’s not impossible to predict, but it’s daunting to enumerate. There’s a data explosion.
Moll: At the architectural level, it’s very hard to have a plan.
Stellfox: I haven’t seen anyone who’s done a good job of capturing all the use cases formally. It’s usually captured in architectural scribbling. We don’t want to be doing constrained random at the system level, but we do want to explore the space by combining the building blocks of the use cases we can think of in interesting ways we wouldn’t necessarily come up with ourselves. That’s where there’s a big opportunity. But the systems are so big, and so much software is required to do those interesting cases, that doing it in simulation is not practical.

SemiEngineering: Are we ever going to see warranties on IP?
Koeter: The practical matter in the IP industry is that license fees don’t allow for coverage and warranties beyond compliance verification. That doesn’t include payment for mask charges. The GSA did a study that showed the average IP vendor offered a 1X indemnification. That’s pretty standard in the industry. You can’t sell a controller for $100,000 and be liable for a $5 million set of masks. That’s not a model that works.
Gianfagna: If part of the warranty is around manufacturing costs, that gets scary. On the other hand, can you come up with a set of conditions under which that IP is going to work? Now what happens if someone uses it in a way it wasn’t intended?
Skulick: We offer warranties, as long as we’ve done reliability testing on that particular device. We qualify every single device, and we offer a warranty for that. On behalf of the IP vendors, we warranty their IP.

SemiEngineering: How about as we move forward to new process nodes and more complexity?
Capodieci: That’s a question of where you are in the yield ramp rather than the faith you have in the IP. You’re not going to blame the foundry if, under some strange software condition, your IP stops working because you downloaded a game on your iPhone. IP comes early in this game. When you have good wafers, you have good faith in your yield. But ultimately, the software integrator of the whole system should be responsible for looking into the hardware and abstracting the hardware. That’s something that’s not done well—even using redundancy and software tricks to make sure that if something isn’t responding, it gets taken out. That’s still not being done. If the system is reaching that level of complexity, blaming a little piece of IP and demanding that everything be perfect probably isn’t reasonable.

SemiEngineering: Are we at a point where risk will at least stop increasing in the future?
Gianfagna: I like to think the methodology is getting better. We have a better vocabulary to talk about expectations—what the deliverables are and how they should behave. That bodes well for predictability. If IP can be more predictable, then the risk of using it is lower.
Capodieci: There are two different views here. One is moving forward to the next node. The other is moving forward on the same node. Going to the next nodes, the risk will diminish, because there will be fewer suppliers willing to take that risk.
Koeter: IP is a matter of scale, exactly for that reason. That’s why you see so much consolidation. You can’t afford the level of investment in qualification and simulation if you don’t have enough scale.