Liability And Reliability

The daunting implications of more data everywhere.


As systems vendors accelerate the development of their own architectures, semiconductor companies across the supply chain are getting a seat at the table for architecting the engines in those systems. Rather than competing for a socket, they are directly involved in strategizing the optimal solution that can make a systems vendor or OEM more competitive or far more efficient. That gives the developers of chips, tools and IP a much better understanding of how these devices will be used and what kinds of constraints and operating conditions they will be subject to.

This changes the economic equation for the entire supply chain. It’s no longer about competing for a socket based on squeezing pennies from the final cost. The goal is to develop the most efficient solution, which may be a combination of lower power, higher performance per watt and per square millimeter, and greater functionality over a given period of time. So rather than slightly reducing the cost for billions of units, these designs are meant to save tens of millions of dollars for one or several key customers.

But in taking on this kind of effort, rather than just buying off-the-shelf components, systems companies also are distributing the potential liability. Many of these designs are being used for mission-critical and safety-critical applications, and the big companies expect them to work as designed. That is driving a big push for improved reliability, which in turn is fueling demand for better test coverage and more data analytics.

There are three elements to this, each with its own risks and benefits.

1. Proactive. The ability to measure minor changes in electrical current, temperature and vibration, both in-circuit and with flexible external devices, creates a new opportunity for preventing catastrophic failure, particularly in complex systems. The basic design philosophy in these systems is no single point of failure, but being able to address potential failures ahead of time can both improve uptime and reduce liability costs if a failure does occur. Given that the chip industry is now deeply engaged in the design, it can more accurately focus on projected use cases and limit the number of corner cases that need to be considered. That also limits the amount of margin that needs to be built into a design to deal with all known worst-case scenarios.

2. Predictive. This kind of data also can be used to predict not just that a failure will happen, but when it will happen. For an industrial assembly line, it can mean the difference between scheduled maintenance and required maintenance. And for a data center with thousands of servers, it could mean the difference between regularly upgrading all equipment to avoid problems and pushing out that upgrade by months or even years, because only the handful of servers or network switches that are failing need attention. The problem here is getting the modeling right, because pushing the limits based on wrong or misinterpreted data can easily turn into a finger-pointing exercise if something goes wrong.

3. Post-production. Understanding how devices behave post-production both adds a check on existing designs and quality, and opens up a whole new business for monitoring equipment remotely. Today, most of this knowledge stems from return merchandise authorizations (RMAs). But with precise data tracking across the supply chain, the problem can be pinpointed to a particular part, when it was produced, where on a wafer a particular die was, and which piece of equipment was used to process it with what gas mix from which vendor.
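To make the first two points concrete, here is a minimal sketch of what "proactive" and "predictive" could look like on a single sensor stream. It is an illustration under stated assumptions, not a method described by any vendor: the function names, the 20-sample baseline, the 3-sigma limit, and the straight-line trend are all hypothetical choices, and a real system would need a far more careful model.

```python
from statistics import mean, stdev


def drift_alerts(readings, baseline_n=20, z_limit=3.0):
    """Proactive check (hypothetical): flag readings that stray from a baseline.

    The first `baseline_n` readings are assumed to represent healthy behavior.
    Returns the indices of later readings more than `z_limit` standard
    deviations away from the baseline mean.
    """
    base = readings[:baseline_n]
    mu, sigma = mean(base), stdev(base)
    if sigma == 0:
        return []
    return [
        i
        for i, x in enumerate(readings[baseline_n:], start=baseline_n)
        if abs(x - mu) / sigma > z_limit
    ]


def samples_until_limit(readings, limit):
    """Predictive check (hypothetical): estimate when a trend reaches a limit.

    Fits a least-squares straight line through the readings and returns how
    many more samples it would take for that line to reach `limit`.
    Returns None if the trend is flat or falling.
    """
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")
    x_bar = (n - 1) / 2
    y_bar = mean(readings)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(readings))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = y_bar - slope * x_bar
    crossing = (limit - intercept) / slope  # sample index where line hits limit
    return max(0.0, crossing - (n - 1))
```

The second function is where the modeling risk noted above shows up: a straight-line extrapolation is only as good as the assumption that degradation is linear, and a wrong assumption yields a confident but wrong maintenance date.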

In the past, this kind of data generally was presented as a report, and the data was so voluminous that no one could understand it. Going forward, much of this data analysis can be automated using machine learning tools to find patterns, making the data much more relevant and potentially automating the remediation.
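As a rough illustration of automated pattern-finding in traceability data, the sketch below uses plain counting rather than machine learning: it compares the failure rate for each value of each tracked attribute against the overall rate and reports the outliers. The record fields (`tool`, `lot`, `failed`) and the `min_ratio` threshold are assumptions made for the example, not fields from any real manufacturing data system.

```python
from collections import Counter


def failure_rate_by(records, key):
    """Failure rate for each distinct value of one traceability attribute."""
    total = Counter(r[key] for r in records)
    failed = Counter(r[key] for r in records if r["failed"])
    return {value: failed[value] / total[value] for value in total}


def suspects(records, keys, min_ratio=1.5):
    """Return (attribute, value, rate) where rate >= min_ratio * overall rate.

    `records` is a list of dicts with a boolean "failed" field plus one
    field per attribute named in `keys` (all hypothetical).
    """
    if not records:
        return []
    overall = sum(1 for r in records if r["failed"]) / len(records)
    if overall == 0:
        return []
    found = []
    for key in keys:
        for value, rate in failure_rate_by(records, key).items():
            if rate >= min_ratio * overall:
                found.append((key, value, rate))
    return found


# Hypothetical usage: failures cluster on one etch tool, not on one lot.
records = [
    {"tool": "etch_A", "lot": "L1", "failed": False},
    {"tool": "etch_A", "lot": "L2", "failed": False},
    {"tool": "etch_A", "lot": "L1", "failed": False},
    {"tool": "etch_B", "lot": "L2", "failed": True},
    {"tool": "etch_B", "lot": "L1", "failed": True},
    {"tool": "etch_B", "lot": "L2", "failed": False},
]
flagged = suspects(records, ["tool", "lot"])
```

A report built this way is short enough to read, which is the point of the contrast with voluminous reports. It also inherits the caveat that follows: the result is only as trustworthy as the tracking data fed into it.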

This is a very different way to use data, and one with enormous possibilities for improving efficiency and reliability everywhere. But it also points to a world where everyone is accountable, with the underlying hope that the data is correct.
