Tracking Down Errors With Data

Optimal+’s CTO looks at how data can be deployed to improve quality and yield, and to find out what went wrong.


Michael Schuldenfrei, CTO at Optimal+, sat down with Semiconductor Engineering to discuss how data will be used and secured in the future, the accuracy of that data, and what impact it can have on manufacturing. What follows are excerpts of that conversation.

SE: Can data be shared across the supply chain?

Schuldenfrei: We believe it has to happen. If it doesn’t, the industry will never achieve its quality goals, and the big OEMs can’t permit that. Ideally, suppliers and customers will come together and create a consortium or reach some kind of agreement to share data. It will be very similar to what we saw in the semiconductor industry a couple of years ago, when our customers asked their suppliers to let them run software on test equipment. Ten years ago, that kind of data transparency into the supply chain was unheard of. But once the major customers made it happen, it became a reality across the industry. Once it started, the suppliers began to see it as a win-win: because the data was available, they no longer had to spend as much time on quality issues and could get to the bottom of them very quickly. Additionally, suppliers can use data from their customers to adapt their quality controls to the specific application. This means they can improve quality while increasing yield at the same time.

SE: What can you do with all of this data? Much of it has never been used in the past. What’s going to change?

Schuldenfrei: There are very practical examples now. A year ago, this was all theoretical. We’ve already done work on this with customers working with their suppliers. If you look at the data from a set of board failures, there doesn’t seem to be anything in that data that explains the failures. But if you then correlate those boards back to the chips that are in them, all of a sudden you see that the failures are related to a pattern on the wafer. So something is going on during wafer manufacturing that isn’t being caught at wafer test, but which is causing the board failures downstream. Without connectivity between these two segments, you have no way of seeing that or preventing it. But we’ve seen examples where you correlate the two and the pattern is visible immediately. From there, it is easy to contain the problem and put preventive measures in place so it doesn’t happen again.
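
The board-to-wafer correlation described above can be sketched as a join from boards to dies followed by a group-by on die position. This is a minimal illustration under stated assumptions, not Optimal+’s implementation: the record layouts, the names `board_passed` and `board_to_die`, and the choice to group by die (x, y) position across wafers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical inputs (not a real Optimal+ schema):
#   board_passed: board_id -> True if the board passed board-level test
#   board_to_die: board_id -> (lot_id, wafer_id, die_x, die_y) of the chip on it


def failure_rate_by_die_position(board_passed, board_to_die):
    """Failure rate of boards, grouped by the wafer (x, y) position of their chip."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for board_id, passed in board_passed.items():
        die = board_to_die.get(board_id)
        if die is None:
            continue  # no traceability for this board, so it cannot be correlated
        _lot, _wafer, x, y = die
        totals[(x, y)] += 1
        if not passed:
            failures[(x, y)] += 1
    return {pos: failures[pos] / totals[pos] for pos in totals}


def flag_positions(rates, threshold):
    """Wafer positions whose board failure rate is at or above the threshold."""
    return sorted(pos for pos, rate in rates.items() if rate >= threshold)


# Toy data: boards built with the die at (9, 0) fail on every wafer sampled.
board_passed = {"b1": False, "b2": False, "b3": True,
                "b4": True, "b5": True, "b6": False}
board_to_die = {
    "b1": ("L1", "W1", 9, 0), "b2": ("L1", "W2", 9, 0),
    "b3": ("L1", "W1", 4, 4), "b4": ("L1", "W2", 4, 4),
    "b5": ("L1", "W1", 5, 4), "b6": ("L1", "W3", 5, 4),
}
rates = failure_rate_by_die_position(board_passed, board_to_die)
flagged = flag_positions(rates, threshold=0.75)
print(flagged)  # [(9, 0)]
```

Nothing in the board data alone singles out boards b1 and b2; only after the join does their shared die position stand out.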

SE: Where can you go with this? What comes next?

Schuldenfrei: We believe it will become a generic and accepted business model. That will happen across many different industries. It will grow both in the number of companies that are involved and in the breadth of the data coming in. So rather than just manufacturing data, it also will include in-use data and telemetry data. If you think about the connected car, it will allow adaptive preventive maintenance. You can identify exactly which cars need to be called in at which point in time by combining manufacturing data with in-use data.

SE: You’re focused on quality, but there is an underlying concern about the security of that data, right?

Schuldenfrei: Yes. How do we make sure all of this data is moving around in a secure manner and not enabling one party to understand the secrets of the other party? That’s why we believe the third-party hub in the middle is so critical.

SE: How much of an issue is that?

Schuldenfrei: We fully understood that if you want to do this, you have to make the data anonymous. On one hand, if you want to look at manufacturing data and know exactly how it has behaved, you need accurate measurements. But you don’t necessarily need to know the name of the test you’re looking at, or anything else that could reveal IP or give insight into what the product does. The same is true for end-use cases. You bring back information about equipment out in the field, but you don’t have to identify which particular piece of equipment it came from. That mapping can be kept in a separate silo, so that only when a proactive recall is needed does a system or service convert the anonymous data back to a specific device. Then the vendor can alert the owner that the device needs to go into a repair shop.
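
One way to realize the separate-silo idea is keyed pseudonymization. The sketch below is an assumption, not a description of Optimal+’s system: it replaces the device serial with an HMAC-SHA256 token, replaces test names with opaque labels, and keeps the token-to-device lookup inside a hypothetical `ReidentificationSilo` class, which is the only component that can resolve a token when a recall is needed.

```python
import hashlib
import hmac
import secrets


class ReidentificationSilo:
    """Hypothetical separate service: the only place that can map a token back to a device."""

    def __init__(self):
        self._key = secrets.token_bytes(32)  # secret key that never leaves this silo
        self._lookup = {}

    def pseudonymize(self, serial: str) -> str:
        # Keyed hash: the same serial always yields the same token, but nobody
        # without the silo's key can compute the token for a guessed serial.
        token = hmac.new(self._key, serial.encode("utf-8"), hashlib.sha256).hexdigest()
        self._lookup[token] = serial
        return token

    def resolve_for_recall(self, token: str) -> str:
        """Convert an anonymous token back to a specific device, e.g. for a recall."""
        return self._lookup[token]


def anonymize_record(record, silo):
    """Keep the exact measurements, but hide the device identity and the test names."""
    # Test names become opaque labels, assigned in sorted-name order so that
    # records with the same set of tests use the same labels.
    ordered = sorted(record["measurements"].items())
    return {
        "device": silo.pseudonymize(record["serial"]),
        "measurements": {f"param_{i}": value for i, (_name, value) in enumerate(ordered)},
    }


silo = ReidentificationSilo()
record = {"serial": "DEV-0001", "measurements": {"vmin": 0.71, "idd_leakage": 0.42}}
shared = anonymize_record(record, silo)
print(shared["measurements"])  # {'param_0': 0.42, 'param_1': 0.71}
```

Only the anonymized `shared` record would travel to the hub; the serial `DEV-0001` and the test names `vmin` and `idd_leakage` are made-up examples.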

SE: How much of this can be automated into AI or machine learning? At that point, the technology would get embedded rather than being entirely external.

Schuldenfrei: There is always going to be a combination of the two. For example, machine learning is embedded in autonomous cars. Cars are learning how to drive. That’s all good within the context of one car, but you still need the ability to aggregate a whole fleet of cars. The same is true for fridges and telephones and any other piece of electronics. There will be a level of machine learning going on at the end device. That’s pretty expensive, so not every device is going to have its own machine learning capabilities. But there will be a lot of machine learning in the cloud or inside of company headquarters that is looking at the overall broader picture.

SE: How much of this is pattern recognition taken to the next level, versus taking that data and then building pattern recognition from it? What’s the starting point?

Schuldenfrei: It’s going both directions. We have situations where companies are trying to develop pattern recognition on wafers based on parametric data across multiple wafers.

SE: So you’re creating the Gaussian distributions and trying to figure out what’s not inside those distributions, right?

Schuldenfrei: Exactly. There are a number of different algorithms and methodologies involved. But underlying all of it, even before you get to the pattern recognition, is the question of which machine learning algorithm you’re using. What kind of patterns are you detecting? Are you using deep learning, decision trees, or random forests? The actual algorithm is the smallest part of the problem. It’s the infrastructure around it: gathering the right data, identifying the relevant parameters for the algorithm, executing that algorithm on an ongoing basis, and connecting it to all of the business systems that can take action. We’ve got all of that working across the supply chain, and the data is coming in from all suppliers, not just one. We can deploy a model automatically into the supply chain, apply it to the test results, and actually change the binning of those parts. It’s the entire system, not just the machine learning.
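
The Gaussian-outlier idea raised in this exchange, combined with re-binning, can be sketched as a z-score test against the population of passing parts. This is a deliberately simple illustration, not the algorithm Optimal+ uses: the part records, the `outlier` bin name, and the threshold of k = 3 standard deviations are assumptions.

```python
import statistics


def rebin_outliers(parts, k=3.0):
    """Move passing parts whose value lies more than k standard deviations
    from the passing-population mean into a hypothetical 'outlier' bin."""
    passing = [p["value"] for p in parts if p["bin"] == "pass"]
    mu = statistics.mean(passing)
    sigma = statistics.stdev(passing)  # sample standard deviation; needs >= 2 values
    rebinned = []
    for p in parts:
        z = (p["value"] - mu) / sigma if sigma > 0 else 0.0
        new_bin = "outlier" if p["bin"] == "pass" and abs(z) > k else p["bin"]
        rebinned.append({**p, "bin": new_bin})
    return rebinned


# Toy parametric data for one test: 20 tightly grouped passing parts, one passing
# part far from the rest, and one part that has already failed.
values = [1.0] * 10 + [1.02] * 5 + [0.98] * 5 + [2.0]
parts = [{"id": f"p{i}", "bin": "pass", "value": v} for i, v in enumerate(values)]
parts.append({"id": "p21", "bin": "fail", "value": 5.0})

result = rebin_outliers(parts)
print([p["id"] for p in result if p["bin"] == "outlier"])  # ['p20']
```

Because the outlier itself inflates the standard deviation, robust estimates such as the median and the median absolute deviation are a common alternative to the mean and standard deviation used here.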

SE: Do you ever have any reason to question the accuracy of the data?

Schuldenfrei: We always question the accuracy of the data. Is ‘good’ really good, and is ‘bad’ really bad? With ‘Is good really good,’ very often the issue is that everything is in a silo and you don’t see the big picture. With ‘Is bad really bad,’ we have gained a lot of yield, throughput and efficiency in manufacturing in cases where the test data was unreliable. The test equipment told us a part had failed, but it hadn’t. There may have been a piece of dirt on the probe.

SE: Or you don’t pick up an aberration because the test failed, right?

Schuldenfrei: Yes. We find that about 1% of the parts that test good have questionable results, and about 1% of the parts that test bad are also questionable. That doesn’t necessarily mean those parts are bad or good, but on average about 1% of the data is wrong.

SE: Do companies set a threshold of what is good enough for them? The context for this is that sometimes parts are used in safety-critical devices and the manufacturer doesn’t know that.

Schuldenfrei: Today, manufacturers of any devices using semiconductors will require their suppliers to achieve certain levels of quality. Quality can be measured in defects per million or defects per billion. So yes, people will say they will accept 100 defects per million or 50 defects per million. The more components that go into an end device, and the more mission-critical those devices become, the less acceptable errors become. This is why some people are saying defects per million isn’t good enough. It has to be defects per billion. But every step you take on that scale adds cost, and the costs can become prohibitive. That’s why you need more intelligence to catch those additional escaping parts.

SE: It’s a cumulative effect, isn’t it? So now there are more electronic parts in a car, and if you’re at the latest process node there is no historical data.

Schuldenfrei: One car company says it uses 7,000 chips in a high-end vehicle. With an average of 1 defect per million for each chip, that adds up to a failing car coming out of the factory every hour. Last year, for the first time, the number of cars coming back to repair shops for electronics issues was higher than the number coming back for mechanical issues.
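
The cumulative effect can be checked with a short calculation. The sketch assumes that chip defects are independent and that any one defective chip makes the car a failure; neither assumption comes from the interview. Under them, about 0.7% of 7,000-chip vehicles would contain at least one defective chip at 1 defect per million per chip, roughly 1 car in 143. Whether that works out to one car per hour depends on the factory’s output rate, which is not given here.

```python
import math


def vehicle_failure_probability(chips_per_vehicle, chip_defects_per_million):
    """Probability that at least one chip in a vehicle is defective, assuming
    chip defects are independent and any single defect makes the vehicle fail."""
    p_chip = chip_defects_per_million / 1_000_000
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(chips_per_vehicle * math.log1p(-p_chip))


p_dpm = vehicle_failure_probability(7000, 1)      # 1 defect per million per chip
p_dpb = vehicle_failure_probability(7000, 0.001)  # 1 defect per billion per chip
print(f"DPM: {p_dpm:.5f}, about 1 car in {1 / p_dpm:.0f}")
print(f"DPB: {p_dpb:.2e}")
```

The same function shows why the move to defects per billion matters: at 1 defect per billion per chip, the vehicle-level rate falls to roughly 7 in a million.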

SE: That’s a big shift.

Schuldenfrei: Yes, and it’s new. Some big gorilla will decide this isn’t acceptable and everyone will have to comply and start finding ways to smartly share data, reducing the risk and exposure as much as they can.

SE: So the capability is there to prevent this. It’s just a matter of will the industry change the way it does things, right?

Schuldenfrei: The capability is getting there. Companies use data sporadically with a very limited number of suppliers. Cisco is talking publicly about this. They’ve done it with one or two of their suppliers, not all of them. It’s very difficult to repeat across the supply chain because there is no standardization of systems.

SE: What happens with over-the-air updates, which may not cause problems in every case? Can the data reflect that, as well?

Schuldenfrei: Yes, but not directly. Attributes of the environment in which a particular piece of hardware is operating can be used as inputs to these kinds of algorithms. So with over-the-air updates, are the problems localized to specific versions of software running in humid areas when the driver is doing this early in the morning? That combination of environmental factors may cause problems with this particular version of the software. Glaringly obvious problems will show up in manufacturing. The problems in the field will be a combination of different parameters. Some may be due to manufacturing, and some may be environmental or geographical. One company found problems with a particular supplier. It turned out the problems were limited to parts shipped from that supplier at a particular time of day, when temperatures were higher.
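
Looking for a failure-prone combination of software version, humidity and time of day can be sketched as a failure rate per combination compared against the overall rate. This is a hypothetical illustration: the field names (`sw_version`, `humidity`, `hour_band`, `failed`) and the rule of flagging any combination at three times the baseline are assumptions, not part of the interview.

```python
import itertools
from collections import defaultdict


def failure_rate_by_condition(events, keys):
    """Field failure rate for each combination of the given attributes."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for event in events:
        combo = tuple(event[k] for k in keys)
        totals[combo] += 1
        failures[combo] += int(event["failed"])
    return {combo: failures[combo] / totals[combo] for combo in totals}


def elevated_conditions(rates, baseline, factor=3.0):
    """Combinations whose failure rate is at least `factor` times the baseline."""
    return sorted(combo for combo, rate in rates.items() if rate >= factor * baseline)


# Toy field data: 100 units per combination of software version, humidity and
# time of day; one combination fails far more often than the rest.
events = []
for sw, humidity, hour_band in itertools.product(["v1", "v2"], ["low", "high"], ["early", "day"]):
    n_failed = 20 if (sw, humidity, hour_band) == ("v2", "high", "early") else 1
    for i in range(100):
        events.append({"sw_version": sw, "humidity": humidity,
                       "hour_band": hour_band, "failed": i < n_failed})

baseline = sum(e["failed"] for e in events) / len(events)
rates = failure_rate_by_condition(events, ["sw_version", "humidity", "hour_band"])
print(elevated_conditions(rates, baseline))
```

With real field data, small counts per combination make such ratios noisy, so a significance test (for example a chi-squared or Fisher exact test) would be needed before acting on a flagged combination.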

SE: A lot of the components that electronics companies use, though, are black boxes. They don’t know what’s inside. At that point, you are reliant on outside data, right?

Schuldenfrei: It requires knowledge on both sides to be able to troubleshoot the problem. But even without any of that knowledge, if you run a correlation along the supply chain, the outliers very often point to the problem. So the electronics manufacturer might see that the problem correlates with test number 1,000 on the semiconductor side. They don’t understand why, but now they can point it back to the semiconductor manufacturer to troubleshoot.
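
The supply-chain correlation described above can be sketched as a ranking of upstream test numbers by how well they separate field failures from good units. In this minimal illustration, each field unit is assumed to be traceable to its chip’s test measurements, and the score (the gap between the failed and good group means, divided by the overall standard deviation) is an illustrative choice, not a method stated in the interview. The test numbers can stay anonymous: the board maker only needs to see that test 1,000 stands out.

```python
import statistics


def rank_tests_by_separation(units):
    """Rank upstream test numbers by how strongly they separate units that
    failed in the field from units that did not.

    units: list of (failed_in_field, {test_number: measured_value}) pairs.
    Assumes every unit has the same test numbers and both groups are non-empty.
    """
    scores = {}
    for test in units[0][1]:
        all_values = [m[test] for _, m in units]
        failed = [m[test] for f, m in units if f]
        good = [m[test] for f, m in units if not f]
        spread = statistics.pstdev(all_values)
        gap = abs(statistics.mean(failed) - statistics.mean(good))
        scores[test] = gap / spread if spread > 0 else 0.0
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: field failures sit high on test 1000, while test 2000 looks the same
# for failed and good units.
units = [
    (False, {1000: 1.00, 2000: 3.0}),
    (False, {1000: 1.01, 2000: 3.1}),
    (False, {1000: 0.99, 2000: 2.9}),
    (False, {1000: 1.02, 2000: 3.0}),
    (True, {1000: 1.50, 2000: 3.0}),
    (True, {1000: 1.48, 2000: 3.1}),
]
print(rank_tests_by_separation(units))  # [1000, 2000]
```

A high score only says that a test correlates with field failures; explaining why remains the semiconductor supplier’s job, as the answer above notes.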

