Chip Reliability Vs. Cost

CEO Outlook: Market shifts, the push for higher productivity per engineer, and the overhead and opportunities that come with security and reliability.

Semiconductor Engineering sat down to discuss cost, reliability and security with Simon Segars, CEO of Arm; Joseph Sawicki, executive vice president of IC EDA at Mentor, a Siemens Business; Raik Brinkmann, CEO of OneSpin Solutions; Babak Taheri, CEO of Silvaco; John Kibarian, CEO of PDF Solutions; and Prakash Narain, CEO of Real Intent. What follows are excerpts of that virtual conversation, which is part of the ESD Alliance’s annual outlook. Parts one and three of this discussion are linked below under Related.

SE: We can’t just develop one chip and have it run across 100 million units anymore. How do we amortize these costs? Is it chiplets, new architectures, or some other platforms? And what does this mean for the industry as a whole?

Narain: One interesting trend is that a lot of the system houses have started doing their own chips, and a lot of it is for their own internal consumption. If you look at Google, they have the cloud infrastructure, and a lot of these chip designs are for custom applications. It’s just that the scale of computing within a system house is now so large that it is cost-effective. The more customized chip design there is, the better for us, which is why Cadence and Synopsys have been doing so well. If you look at the S&P 500 over the last year, they are in the top 20 in terms of performance. Our industry is benefiting from the fact that there’s a lot of customized chip development.

Taheri: I agree with Prakash. While chip design has largely been for consumer electronics, a lot of OEMs that provide a full solution to the customer want to reduce their supply chain costs. They want to cut out some of the middlemen. These are giant companies that put out laptops, cell phones and such, and they have initiatives to design their own custom chips. On top of that, if you look at automotive, it’s usually behind consumer industries by 5 to 10 years, depending on the technology. So yes, everybody looks at different architectures and different packaging, whether they can use wafer-level chip-scale packaging, whether they can use through-silicon vias. They look at all of that in every industry, but there are phases in which these things happen. Consumer goes first, and perhaps automotive goes next. It lags because of the qualification process, but this trend also is picking up in automotive. Car manufacturers like Audi and BMW are trying to bypass the Tier 1s (the Continentals of the world) and go directly to the Tier 2s, which are the chipmakers, in order to reduce margin stacking. As this trend continues, there will be a need for new tools to help the OEMs and the bigger companies that are doing this themselves, which are not as well versed as the Tier 1s. You would add new tools, new capabilities and new know-how to help them do that. That’s part of the trend I see changing, as well.

Segars: Behind that is a change in the business model that fundamentally pays for the development of these devices. The automotive companies are looking at the future of mobility not from a point of view of, ‘I sell a car for X dollars, and those X dollars are going to pay for all the components that went into it and help me ultimately amortize my R&D.’ It’s more thinking about, ‘This is an object that comes with these services.’ That completely changes the relationship between what it costs to develop the thing in the first place, the revenue that object generates throughout its lifetime, and who pays for it. It’s a completely different lens to look through when you consider the cost of developing all the electronics that go into a product like that. This shift includes many more things becoming service-driven, as opposed to a one-time fee that pays for the product and all the software in it, where once it leaves the factory it’s just done. That does change the overall economics of how chip development, and all the tools and IP that are required, get funded in the long run.

Kibarian: If you look at software, it stagnated until Google came along. Google really innovated on the business model. They sold advertising to fund software. They didn’t actually sell their software; you don’t go and buy Google software. The chip industry has been one where we sell silicon and use that to fund all of R&D. And for at least the last decade, customers have said, ‘The software stack in the silicon keeps going up, and I sell acreage of silicon. My R&D on the factory keeps going up, but my capacity is somewhat limited.’ In the end, that technology enables a service that is far more valuable than the acre of silicon or the cost of manufacturing it. Using silicon to innovate on services will be the best thing that happens to the chip industry over the next 10 years, because it will enable a new way of funding technology, both the silicon and the architecture and software systems that run on it. That is super important for us.

SE: One of the things that affects cost is reliability. A lot of this started out of the automotive industry, but we’re even seeing it in smartphones, where we used to think about a two-year cycle. That’s now a four-year cycle. And in the cloud, data centers are looking for at least seven years. How is this impacting both design and cost, and how will this play out? And there’s another piece overlaid on this that makes it more complicated, which is that security is now part of reliability in some of these industries, as well.

Sawicki: We’ve already seen a trend where tools for reliability analysis, whether dynamic simulation-based, static topology checking, physical verification, or more detailed qualification of processes, have started to ramp up significantly over the last few years. Looking forward, tying reliability and security together, what we see is a trend that takes our industry forward, in terms of tools and IP, into what we call lifecycle management. We’re putting on-chip monitors in place that can look for things like security intrusions and parametric drift in the circuitry. That’s going to be part of not just safety-critical systems like automotive, but also data centers, where operators don’t want to have to wait for a machine to crash to know it’s bad. They’d rather reliably manage the process of taking things offline so that their customers never see it. That’s one of the bigger trends we’re going to see going forward, and it’s going to start to take up a reasonable, though not huge, amount of silicon area, as well.
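
To make the kind of check Sawicki describes concrete, below is a minimal sketch of fleet-side logic acting on readings from an on-chip monitor. It is purely illustrative: the ring-oscillator monitor, the baseline value, and the drift thresholds are hypothetical stand-ins, not any vendor’s actual lifecycle-management API.

```python
# Illustrative sketch: classify parametric drift reported by a hypothetical
# on-chip ring-oscillator monitor. All names and thresholds are made up.
from dataclasses import dataclass

@dataclass
class MonitorReading:
    chip_id: str
    ring_osc_mhz: float  # frequency reported by the on-chip monitor

BASELINE_MHZ = 950.0   # captured at manufacturing test
WARN_DRIFT = 0.03      # 3% slowdown: watch this part more closely
FAIL_DRIFT = 0.06      # 6% slowdown: drain workloads and retire the machine

def classify(reading: MonitorReading) -> str:
    # Aging mechanisms such as BTI and hot-carrier injection slow
    # transistors, which shows up as a lower ring-oscillator frequency.
    drift = (BASELINE_MHZ - reading.ring_osc_mhz) / BASELINE_MHZ
    if drift >= FAIL_DRIFT:
        return "retire"   # take offline before it crashes
    if drift >= WARN_DRIFT:
        return "watch"    # increase sampling on this chip
    return "healthy"

print(classify(MonitorReading("node-0413", ring_osc_mhz=921.0)))  # -> watch
```

The point is exactly the one Sawicki makes: the operator acts on measured drift before a failure, rather than waiting for a crash.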

Taheri: Reliability is becoming critical. I used to design mil-spec parts, where MIL-STD-883 and several other standards dictated what the reliability conditions should be. One of the critical things I see is that you really need to go down to the device level and have a very reliable, SPICE-like model that has all the reliability effects included, so that you can accurately simulate these things. There has been a shortage of that. We are putting a bigger focus on it, and I’m sure the industry is doing the same. We need to go from the bottom up to provide solutions that simulate to a level where you can predict reliability more accurately.
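
As a rough illustration of what folding reliability into a device model involves, a commonly used empirical form for bias-temperature-instability (BTI) aging expresses the threshold-voltage shift as a power law in stress time:

$$\Delta V_{th}(t) = A \cdot e^{-E_a/kT} \cdot V_{gs}^{\,\gamma} \cdot t^{\,n}$$

Here $t$ is stress time, $T$ is temperature, $V_{gs}$ is the gate stress voltage, and $A$, $E_a$, $\gamma$, and $n$ (typically around 0.1 to 0.25) are fitted per process from foundry characterization data. A reliability-aware SPICE model degrades $V_{th}$, and hence drive current, along such a curve, which is what lets a designer simulate end-of-life timing rather than only fresh silicon. The form and constants shown here are generic textbook values, not any specific vendor’s model.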

Brinkmann: We see people looking into this problem from multiple angles at the same time, and we also see trends toward standardization. Functional correctness is one aspect we have been working on in verification for many years. Functional safety is a not-so-new addition to address reliability. And lately, trust and security are coming into play. That involves the security itself, but also trust in the supply chain. These are all correlated in different dimensions, but they need to be looked at together during the design and verification process. So there’s a lot of work to be done on defining metrics, specifically on the security side, to allow for proper planning and tracking across these dimensions. On the standards side, the combination of standards like ISO 26262 and ISO/SAE 21434 is a challenge for our customers right now. You see all these new elements that people have to address in their design flow, in the organization, and in verification, which is one of the key pieces in actually ensuring this reliability. When we look at security specifically, there are multiple levels at which to address it. Hardware security is really in its infancy right now. It’s always incomplete and outdated, which is something that is new to us. When you have built a reliable system in the functional-safety sense, you can model what can happen and what can go wrong, and that has allowed us to build systems that are reliable in multiple ways. But with security, attack vectors and other things come into play, and they change over time. So the flexibility to update and reconfigure systems in the field is an important factor, but you also have things coming into play at the organizational and supply chain levels. When we look at security, we’re basically looking at a security culture. You have to establish widespread security expertise across engineering organizations to address this, and it’s not really something people already have on their agenda. And last but not least, the supply chain needs to be looked at. The silicon supply chain is one aspect. But if you look at incident response, you need flows among the OEMs, the Tier 1s, the Tier 2s, and even the IP providers in order to address the security challenge and liability in the field. This is something new, and obviously PLM and other technologies now stretch back into the design phase in order to address these things.

Narain: Reliability is triggering a lot of activity. There are many different aspects of a design that have to be analyzed for reliability, and the increased focus on reliability creates a lot more work. But there are only so many good engineers available, so the pressure on productivity per engineer goes up very dramatically. That is one of the aspects driving advancements in the tool chain. And then there are newer, more subtle failure modes. Your chip will be dead if you don’t properly take care of clock domain crossings, and you cannot address them using traditional verification methods like simulation and formal. Testability is a very different beast. And now we see reset domain crossings increasingly creeping into reliability. So we have the combination of pressure to optimize productivity per engineer, reliability concerns, and security that is really in its infancy. A more holistic approach is needed, and these pressures are creating an opportunity for us to provide solutions that address these problems.
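
Narain’s point about clock domain crossings is ultimately reliability arithmetic. The standard first-order estimate for a synchronizer’s mean time between metastability-induced failures is exponential in the resolution time the circuit is given:

$$\mathrm{MTBF} = \frac{e^{\,t_r/\tau}}{T_W \cdot f_{clk} \cdot f_{data}}$$

where $t_r$ is the time the capturing flop has to resolve, $\tau$ and $T_W$ are device-dependent constants, and $f_{clk}$ and $f_{data}$ are the receiving clock and data-toggle rates. Adding a second synchronizing flop extends $t_r$ by roughly a clock period, multiplying MTBF by an enormous exponential factor, which is why the two-flop synchronizer is the standard fix. An unsynchronized crossing, by contrast, fails rarely and non-deterministically, which is why simulation alone will not reliably expose it and dedicated CDC analysis is needed.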

Segars: There’s no question that reliability, functional safety and security are big, big trends that are changing the way we think about design, not just the configuration of the transistors but the methodologies behind design, as well. The functional safety packages we put together for some of our products are all about documenting process and providing transparency, so that in the event of an issue you can trace it back to what happened. That adds surprisingly large overhead to the cost of doing the design. That’s just a reality. This is a much more complex problem you’re looking to solve, and it costs more. It requires engineering effort, and therefore the ultimate product is going to cost more. It’s the same for security. I am a strong believer that in the future, anything connected to the Internet is going to have to have some service behind it that is monitoring it for security, enabling you to do over-the-air updates and things like that. Otherwise, any connected device is a way in for somebody who wants to hack into a network. So that, again, requires more sophisticated devices. And you can’t just optimize for cost. You have to take security and reliability as key design criteria as you set out to create these products. But again, it comes back to the business model that pays for it. A lot of the reason devices are gathering data is that the data is so valuable, and somebody along the way is making money from it. That data relates to consumers in a lot of ways, and consumers need to care about it. That means that as an industry we’ve got to come together and define some standards around how you articulate security, how you measure it, and how you give people who don’t have advanced degrees in semiconductor design comfort that they can rely on the security of a product. All of that takes work. All of that takes a lot of collaboration. You can’t do it without thinking about who ultimately pays for it and the business model that creates it all, because there is no free lunch. If everyone takes the view of, ‘No one’s going to pay for security because I’m not that interesting to a hacker,’ and you just optimize for cost, then ultimately we lose out on the opportunity that’s in front of us. A data-driven world leads to a lot more efficiencies. So you can’t just focus on the number of transistors. Of course we have to make that as efficient as possible. But we have to think about these issues in a much more holistic way.

Related
2020 CEO Outlook (part 1)
Impacts of the global pandemic and the rising cost of chip design.
Challenges For A Post-Moore’s Law World (part 3)
More customization and a different message for the chip industry.


