Experts At The Table: The Reliability Factor

Last of three parts: The effects of disaggregation, restrictive design rules and economic considerations.

Low-Power Engineering sat down to discuss reliability with Ken O’Neill, director of high reliability product marketing at Actel; Brani Buric, executive vice president at Virage Logic; Bob Smith, vice president of marketing at Magma; and John Sanguinetti, chief technology officer at Forte Design Systems. What follows are excerpts of that conversation.

LPE: Is a more complex supply chain causing reliability issues?
Sanguinetti: That does happen as a result of disaggregation. There’s third-party IP and the associated issues of putting it all together. The most common problem is when you buy a piece of third-party IP and you want it to do something just a little bit different. You make a change—or a third party makes a change and it isn’t fully tested. That’s the way bugs get introduced.
Buric: This touches the gray area between quality and reliability. That’s why people have standards. TSMC is developing TSMC 9000, which is just one measure of an ISO standard. When you build a complex device that’s how you make sure all sub-components conform to quality standards. That’s part of the equation. If you can’t establish quality standards you cannot manage a design these days.
O’Neill: Quality standards would mean things like toggle coverage and code coverage for validation and test programs?
Buric: Yes. Whatever you may need.

LPE: How disaggregated is the supply chain?
O’Neill: It’s becoming more disaggregated. Years ago when I started with Actel we designed the entire chip. We designed the I/Os, the logic, the routing and the programmable interconnect. Today, because of the complexity of our current and future generations of FPGAs, we’re sourcing pieces of IP from outside of our own company. There’s a supply chain issue there. We have to choose our suppliers wisely, as well as the products we purchase from them, and ensure that we’re consistent with our own verification and validation techniques. Going forward, we will do more and more sourcing of our IP as our products become more complicated.

LPE: As the foundries impose restrictive design rules at future process nodes, how will that affect reliability?
Buric: Restrictive design rules have always been there. They have not always been so obvious, though. They help reliability. If you don’t use them, then you have a fundamental failure in yield. Reliability is a marginal case of yield. They are establishing rules to minimize freedom of implementation on silicon, which will result in higher reliability. As the process matures, some of those rules may be waived. They can learn enough to see it is not impacting anything.
Sanguinetti: Over time, our coding requirements have gotten more strict. They don’t have the effect of design rules, though.

LPE: But isn’t this complexity hitting every part of the flow?
Sanguinetti: Yes, it is, and we’ve had to modify our product over time to put out RTL that is more regular and follows different rules. That has been a maturation process. But it’s on the order of tens or hundreds of rules, not thousands.
O’Neill: We see some movement in the end-user community and among their customers to impose coding standards, as well. If you’re doing a design for an industrial process at a petrochemical plant or something that’s safety critical, you may be working to a certain specification imposed by the intended operator of the plant. They will impose coding standards. Similarly, for commercial aircraft we’re seeing the certifying authorities imposing standards on the contractors, who in turn are purchasing FPGAs or ASICs to implement critical digital logic.
Buric: The European Community has been doing that with their contracts for the past 20 years. Companies have had to follow their coding style or they would not get the contract.
Smith: The challenge for the restrictive design rules is that if the foundries are too restrictive, people will not move to the next node. It will be too restrictive, too expensive, and far less attractive. We have a lot of customers designing at 40nm, but a lot of design is still being done at 180nm. There are reasons to go to 40nm, namely density and power, but there’s no free lunch. It costs more to do designs for all the reasons we’re talking about: verification, reliability and everything else.

LPE: Are the economics of reliability changing? Does it become more expensive to guarantee reliability at future process nodes?
O’Neill: Absolutely.
Smith: The number of rules and the number of cases you need to check for goes up, so it’s more expensive.
Buric: Reliability is becoming a liability. You have to design with that in mind and it costs you money. I have seen people doing it for alpha particles where it’s two-bit detection, one-bit correction. I’m now seeing algorithms with 8-bit detection. People need it or they wouldn’t do it. It costs silicon and it costs in the design.
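
Editor’s illustration: the “two-bit detection, one-bit correction” scheme Buric mentions is commonly known as SECDED (single error correction, double error detection). The sketch below is not from the panel discussion. It assumes an extended Hamming (8,4) code, one of the smallest codes with this property, and the names `encode` and `decode` are hypothetical. Production memories typically use wider codes, so treat this only as a minimal demonstration of the idea and of why it costs extra bits.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds an overall parity bit; indices 1..7 hold a Hamming(7,4)
    codeword with parity bits at positions 1, 2 and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]      # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]      # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]      # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2                # makes the total parity even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'double_error'."""
    code = list(code)
    # XOR of the positions of all set bits is zero for a valid codeword;
    # after a single flip it equals the position of the flipped bit.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd total parity: assume exactly one flipped bit. A zero syndrome
        # here means the overall parity bit itself (index 0) was hit.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even total parity: two bits flipped.
        return "double_error", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble
```

Under these assumptions, flipping any one bit of `encode(n)` still decodes to `n` with status `corrected`, while flipping any two bits is reported as `double_error` rather than silently miscorrected. The cost Buric refers to is visible even here: four check bits protect four data bits.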

LPE: Is reliability now part of the cost equation in developing a chip?
Buric: Yes.
Smith: For the fabs, definitely. They’re staking their reputation on being able to deliver a product that meets certain standards. Being able to test that and then being able to somehow abstract that and do a bunch of rules so folks like us can take those rules and make sure everyone follows them is a tough job.
O’Neill: From an FPGA standpoint, we’re not designing for a single application. We’re designing for a whole range of applications. We go from consumer electronics applications, which have a lifespan of two years, to military and aerospace systems that have to survive 20 years. We don’t have the luxury of just designing an FPGA that just survives two years. We have to design for the maximum lifetime. That really causes us to look very carefully at the tradeoffs between density, reliability, power and performance.
Sanguinetti: What would you do differently if you could design products just for the consumer electronics market?
O’Neill: Let the parts run hotter so the power density would be higher. There’s a tradeoff there, too, because you need low power to conserve battery.
Buric: You could save on the package, too.

LPE: Is reliability considered essential all the time?
Buric: It is company-specific. It’s hard to see a trend. In the consumer market, cost is critical. In that market, failure is measured against cost of replacement.
Sanguinetti: If you have a TV set that drops frames every 30 seconds or you get a blocky picture, you get a bad reputation and no one buys your products.
Smith: It depends on the application. In military and aerospace, it’s vital to be reliable. On the other extreme, if you go into the Hallmark store and pick up a card with a synthesizer attached to a battery, it has to last through the shelf life, but design for reliability is minimal.