Experts At The Table: Yield And Reliability Issues With Integrating IP

Second of three parts: The growing challenges of communicating complex information across a disaggregated supply chain; why we need a consistent vocabulary for IP quality; understanding the new value of verification IP.


Semiconductor Engineering sat down to discuss the impact of integrating IP in complex SoCs with Juan Rey, senior director of engineering at Mentor Graphics; Kevin Yee, product marketing director for Cadence’s SoC Realization Group; and Mike Gianfagna, vice president of marketing at eSilicon. What follows are excerpts of that conversation.

SE: Do we need to move to subsystems or more restrictive design rules, or do we have to learn to live with more uncertainty?

Gianfagna: There’s no simple answer to that. One path says the big guys get bigger and the smaller guys die. The big guys have enough resources to do the requisite testing, and the supplier base shrinks even further toward those that are more predictable. Some people call that the virtual IDM model. It’s the re-aggregation of the industry using firewalls. But history shows that every market resists being homogenized. There will always be a new kid on the block that does something just a little bit better, and people will always want to use it for the competitive advantage it brings. They’ll deal with the smaller company and make sure it works correctly, because the new functionality will trump whatever everyone else is doing. There will always be that differentiating force. But when you’re talking about high-performance digital, which is basically analog, or true analog, that doesn’t work. It’s still a handcrafted design problem. Small shops make a good living there. They come up with innovative solutions the big guys don’t, and that people want to buy. But it’s custom-made, and it has to be verified each and every time in silicon until someone invents an infinitely accurate SPICE model. And that, by definition, is impossible, because the foundry will never give you enough data to calibrate such a model.

Rey: Even then you need to understand variability from the fab under certain conditions, and you don’t always get that. But the issue here is how we can improve this. One problem is communication. If you have IP, the question is when and whether you will need access to it. The second involves communication across the industry. The whole idea is getting partners to communicate, and to communicate in an efficient manner. An example of this is an IP provider that has done all the characterization required by the industry, and has done its IP so well that it gets an agreement with the factory that says, ‘These kinds of things that you believe are corner cases are not corner cases, so we need a waiver for them.’ Even with that, you still need a way to communicate to customers that everything is fine and there is no problem. We have been working for some time on establishing valid communication mechanisms so that when that waiver information is generated, it is properly generated, preserved, and used for validation purposes on the floor. It’s not enough just for the IP provider to understand that and get an agreement with the manufacturer. There also has to be communication among the IP provider, their customer, and the final company that is integrating that IP. It’s a triangle, and everyone needs that established mechanism for communication.
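What that established mechanism might carry is easy to sketch. Below is a minimal, hypothetical example in Python; the waiver fields, rule IDs, and manifest format are invented for illustration and are not any foundry’s or EDA vendor’s actual interchange format. The idea is simply that waiver data travels with the IP verbatim, with a checksum, so all three corners of the triangle validate against the same approved set.

```python
from dataclasses import dataclass
import hashlib
import json

@dataclass
class RuleWaiver:
    """One agreed-upon exception to a foundry design rule for one IP block."""
    rule_id: str        # the design rule the IP intentionally violates (invented ID)
    ip_name: str
    ip_version: str
    process_node: str
    justification: str  # why the foundry agreed this is not a real corner case
    approved_by: str    # the foundry sign-off authority

def waiver_manifest(waivers: list[RuleWaiver]) -> str:
    """Serialize the waivers with a checksum so the IP provider, the integrator,
    and the fab can all confirm they hold exactly the approved set."""
    payload = json.dumps([vars(w) for w in waivers], sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()
    return json.dumps({"waivers": json.loads(payload), "sha256": digest}, indent=2)

if __name__ == "__main__":
    w = RuleWaiver(
        rule_id="M2.S.5",          # invented rule name
        ip_name="ddr_phy",         # invented block
        ip_version="2.1",
        process_node="16nm",
        justification="Shielded route; agreed not to be a yield-limiting corner case.",
        approved_by="foundry_drc_review_board",
    )
    print(waiver_manifest([w]))
```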

SE: Does the ecosystem need to be reconfigured? Flows may no longer work as planned for complex SoCs.

Yee: The ecosystem is already evolving. How the IP providers are integrated with the tool providers and the foundries has changed. A lot more needs to change, though. For example, we work more closely with foundries than at any time in the past, because we have to. The TSMC 9000 program sets standards for IP coming in, and other foundries are following suit. The foundries realize IP is critical. They’re not just saying, ‘Talk to the supplier.’ They’re setting up programs to say, ‘On this process, this is how we’re rating things.’ That has to be done, because SoCs are getting more complicated and customers are demanding more. We talk to the end customers much more frequently about things like integration than in the past. There’s still a learning curve. We don’t understand everything, because the foundries don’t want to share it with us. Customers don’t want to share everything with us. And as an IP provider, we only want to say so much. But the flow is evolving. Jumping from 28nm to 16nm changes everything anyway, in terms of the flow, how tools are being developed, and how IP is being used. We have been forced to evolve with that.

Gianfagna: There’s a concept called an IP quality vocabulary. It’s a vocabulary for articulating the delivered quality of what you’re getting. That’s early in its evolution. We have a long way to go in articulating what you’re buying, what the quality is, how well it has been simulated across process corners, how many times it’s seen silicon and under what conditions. We struggle with that. There’s no well-defined way of communicating it. What the foundries are doing helps. You see everything the same way on the TSMC website because it’s rendered with the same vocabulary. That’s a start.
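As a rough illustration of what a shared quality vocabulary could look like in machine-readable form, here is a minimal Python sketch. The record fields and values are hypothetical, not the actual TSMC 9000 schema; the point is that when every vendor fills in the same fields, offerings become directly comparable.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class IpQualityRecord:
    """A structured stand-in for one entry in an IP quality vocabulary."""
    ip_name: str
    process_node: str
    corners_simulated: tuple[str, ...]  # which process corners were simulated
    silicon_tapeouts: int               # how many times the IP has seen silicon
    silicon_conditions: str             # under what conditions it was proven
    known_errata: tuple[str, ...] = ()

if __name__ == "__main__":
    rec = IpQualityRecord(
        ip_name="usb3_phy",             # invented example block
        process_node="28nm",
        corners_simulated=("tt/0.90V/25C", "ss/0.81V/125C", "ff/0.99V/-40C"),
        silicon_tapeouts=3,
        silicon_conditions="Two customer SoCs and one shuttle test chip.",
    )
    # Rendering every vendor's IP with the same fields is what makes the
    # offerings comparable; that uniformity is the value of the vocabulary.
    print(json.dumps(asdict(rec), indent=2))
```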

SE: But if IP has been proven in silicon at 45nm, it also may not be the same as 45nm a year later, because that process node may get optimized, too, right?

Gianfagna: In theory it should get better, but that’s not always the case. Different means surprises will crop up.

SE: What happens on test? Even if it yields, it may not function well.

Gianfagna: It’s not just test. It’s verification and test. This is where verification IP comes in handy. If you have a block that implements a certain protocol, you use verification IP to test that protocol. That helps. But in my opinion, the set of verification vectors shipped with IP is woefully inadequate. We need more. And even that isn’t enough, because what the IP provider can give you are the vectors they ran with a standalone piece of IP. That doesn’t say anything about how it works in a system when it’s processing real data packets or MPEG video. We have to bolt that together, use what was done before, and then enhance it with system test. The test problem is a little different. You’re trying to isolate and verify a bunch of blocks on a chip with limited visibility and very limited time, because test is very expensive.
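The gap between standalone vectors and system traffic can be sketched with a toy checker. Everything below is hypothetical: a trivial parity ‘protocol’ stands in for real verification IP, but the pattern is the same, with a handful of directed vendor vectors on one side and long streams of realistic randomized data on the other, both checked against an independent reference model.

```python
# Toy illustration only; a parity function stands in for an IP block and
# its protocol. Requires Python 3.10+ for int.bit_count().
import random

def dut_parity(word: int) -> int:
    """Stand-in for the IP under test: parity bit of a 32-bit word."""
    return bin(word & 0xFFFFFFFF).count("1") & 1

def reference_parity(word: int) -> int:
    """Independent reference model, as verification IP would provide."""
    return (word & 0xFFFFFFFF).bit_count() & 1

def run_vectors(vectors: list[int], label: str) -> None:
    """Compare the DUT against the reference over a set of stimulus words."""
    failures = [v for v in vectors if dut_parity(v) != reference_parity(v)]
    print(f"{label}: {len(vectors)} vectors, {len(failures)} mismatches")

if __name__ == "__main__":
    # Vendor-style standalone vectors: a few directed corner cases.
    run_vectors([0x0, 0xFFFFFFFF, 0x1, 0x80000000], "standalone IP vectors")
    # System-style traffic: long streams of randomized, realistic data.
    random.seed(0)
    run_vectors([random.getrandbits(32) for _ in range(100_000)], "system traffic")
```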

Rey: There is a big effort to pinpoint failures, not just before manufacturing but also once the part has been manufactured. You now have a situation where you have to figure out whether a failure is caused by your own IP, or by some IP you are incorporating from somewhere else. Ultimately the robustness of the whole design depends on multiple IP components that see corner cases. This is part of the same equation. Things get more complicated for test, as well as everything else, and you have to deal with more players rather than a group that is closer to you and more integrated.

Yee: Because designs are getting more complex, test will be a challenge. The problem won’t be solved at the back end. The more you can do up front, the simpler it’s going to be at the back end. If you wait until the back end, it’s too late. You can burn all your time doing test and never find a problem, and chips are only getting bigger and more complex. If you look at the last 20 years, engineering time has shifted from design to verification. In general, the individual block has been tested. It’s not the protocol of standard IP that’s going to be the problem. It’s how you connect it. Most IP is a black box. Even if you want to test PCIe, most people don’t know what’s inside it anymore. It’s what’s coming in and going out. It’s not what’s inside the black box that they’re debugging. It’s connecting one black box to something else, and whether it’s all working together.
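A toy version of that seam problem: with black-box IP, many integration bugs can be caught by checking declared interfaces before simulation ever starts. The blocks, port names, and widths below are invented for illustration; real flows do this with far richer interface descriptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Port:
    """A declared interface pin of a black-box block."""
    name: str
    width: int
    direction: str  # "in" or "out"

def check_connection(driver: Port, receiver: Port) -> list[str]:
    """Return mismatches between a driving port and a receiving port."""
    issues = []
    if driver.direction != "out" or receiver.direction != "in":
        issues.append(f"direction: {driver.name} -> {receiver.name}")
    if driver.width != receiver.width:
        issues.append(f"width: {driver.width} vs. {receiver.width} bits")
    return issues

if __name__ == "__main__":
    # Two invented black boxes meeting at a seam.
    pcie_tx = Port("pcie_axi_tdata", 128, "out")
    noc_rx = Port("noc_axi_tdata", 64, "in")
    for issue in check_connection(pcie_tx, noc_rx):
        print("integration mismatch:", issue)  # flags the 128-vs-64 seam
```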


