Applying ML In Failure Analysis

When and where machine learning is best used, and how to choose the right model.


Experts at the Table: Semiconductor Engineering sat down to discuss how increasing complexity in semiconductor and packaging technology is driving shifts in failure analysis methods, with Frank Chen, director of applications and product management at Bruker Nano Surfaces & Metrology; Mike McIntyre, director of product management in the Enterprise Business Unit at Onto Innovation; Kamran Hakim, ASIC reliability engineer at Teradyne; Jake Jensen, senior product specialist at Thermo Fisher Scientific; and Paul Kirby, senior marketing manager at Thermo Fisher Scientific. What follows are excerpts of that conversation.


SE:  How has machine learning changed the failure analysis process?

Kirby: Whether you’re a fabless company or an IDM, when you have a sample coming in, you’ve got to get data from it quickly. You don’t have all the fab information. You just have to isolate that defect and then get the root cause data, and you’ve got to get it fast. Humans can’t do that anymore. We’re getting beyond the point where artisan FA processes are going to work. You need automation. It’s not like fab automation; we’re not talking about putting a sample in and getting data out. We’re talking about automating pieces of the application, so that you are guaranteed to get the right answer. That’s where we see the biggest application right now with machine learning. In fact, we just released a new TEM a couple of months ago with a lot of that capability.

SE:  Can you provide an illustrative example of the machine learning you’re using?

Kirby: Say you need to create a recipe for an application around a particular defect on a particular device, but you don’t have the CAD data and you don’t have the time to invest in doing it manually. Machine learning can really help you spot patterns, such as what type of device and what type of features you’re looking at, and then create a recipe based on that information for imaging analysis and the milling.

Chen: In particular, we use it for automatic defect classification. One of the challenges has been that even though you can collect lots of data from your tool, you may not have a lot of reference data or labels. So the question is how to generalize your models as well as you can from limited data, so that you can reuse them. That requires some additional work, for example combining computer vision to simplify the feature space, or combining some physical understanding of the process to constrain the model. That way you can use fewer data points, or you can augment your data to get a robust model. Jumping straight to deep learning is quite challenging with a limited amount of data, as it tends to have overfitting issues. You can be creative within that space, and it could be a multi-phase approach. One thing we’ve done is use a workflow that starts with unsupervised learning, where you don’t need to have labels, and that might guide you to help with the failure analysis. Once you have references for the defects of interest, you can ignore the others. Then you train a supervised learning model.
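
The multi-phase workflow Chen describes (cluster without labels, get references for the clusters that matter, then train a supervised model) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the two-dimensional features, the defect names "particle" and "scratch", the tiny k-means routine, and the nearest-centroid classifier standing in for whatever supervised model a real tool would use.

```python
import math

# Hypothetical 2-D feature vectors, one per defect image (e.g. size, brightness).
points = [
    (1.0, 1.0), (5.0, 5.0), (9.0, 1.0),
    (1.2, 0.9), (5.1, 4.8), (9.2, 1.1),
    (0.9, 1.1), (4.9, 5.2), (8.9, 0.8),
]

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))

def kmeans(data, k, iters=10):
    """Tiny k-means with deterministic farthest-point initialization."""
    centers = [data[0]]
    while len(centers) < k:
        centers.append(max(data, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in data:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

# Phase 1 (unsupervised): group the unlabeled defects.
centers = kmeans(points, k=3)

# Phase 2: an analyst reviews one example from each cluster of interest and names it.
# The third cluster is left unnamed, i.e. treated as a group to ignore.
reviewed = {(1.0, 1.0): "particle", (5.0, 5.0): "scratch"}
cluster_name = {nearest(p, centers): name for p, name in reviewed.items()}

# Phase 3 (supervised): train a nearest-class-centroid classifier on labeled data only.
labeled = [(p, cluster_name[nearest(p, centers)])
           for p in points if nearest(p, centers) in cluster_name]

def train(samples):
    """Return one mean feature vector per class name."""
    by_class = {}
    for p, name in samples:
        by_class.setdefault(name, []).append(p)
    return {name: tuple(sum(axis) / len(ps) for axis in zip(*ps))
            for name, ps in by_class.items()}

model = train(labeled)

def classify(p):
    """Assign p to the class whose centroid is closest."""
    return min(model, key=lambda name: math.dist(p, model[name]))

print(classify((1.1, 1.0)), classify((5.2, 5.1)))
```

One limitation of this sketch: a new sample from the ignored cluster would still be forced into one of the named classes, so a practical version would likely add a distance cutoff or an explicit "other" class.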

Hakim: We do not apply machine learning to the FA process. We approach it in a manual way at this point. But we are attempting to use machine learning in other areas, like testing of the device. However, that is not a trivial task. The number of patterns that are used during sort or final test is upwards of 50,000 different tests, all the way up to 110,000 tests. So how do you use a matrix that has something like 100,000 columns of data, where each row is going to represent one device? This is a huge task to accomplish, and it needs a lot of computing power. For instance, for a given test, you have a specification with a pass-fail criterion. However, there are interactions between these different test vectors. Machine learning is going to highlight the first-, second-, third-, and fourth-level interactions, and those interactions become very beneficial. It can identify signatures that could be used to do failure validation and verification.
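
To give a rough sense of why Hakim calls this a huge computing task, the sketch below counts how many candidate interaction terms exist as the interaction level grows. It assumes, purely for illustration, that a level-r interaction means a product of r distinct test readings from one device; the function names and the sample row are hypothetical, and real test data may define interactions differently.

```python
from itertools import combinations
from math import comb, prod

def interaction_count(n_tests, max_order):
    """Number of candidate terms up to max_order, one per subset of tests."""
    return sum(comb(n_tests, r) for r in range(1, max_order + 1))

def interaction_features(row, max_order):
    """Expand one device's readings into product terms keyed by test indices."""
    features = {}
    for r in range(1, max_order + 1):
        for idx in combinations(range(len(row)), r):
            features[idx] = prod(row[i] for i in idx)
    return features

# Growth of the feature space for 100,000 tests (the scale quoted in the discussion).
for order in (1, 2, 3, 4):
    print(order, interaction_count(100_000, order))

# Small worked example: three readings for one hypothetical device.
print(interaction_features([1.0, 2.0, 3.0], max_order=2))
```

With 100,000 tests there are already about five billion pairwise terms, so enumerating every interaction explicitly is impractical. That is one reason to rely on models or feature-selection methods that search for useful interactions rather than listing them all.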

Kirby: You are all talking about the use of machine learning to look at these large volumes of data and make sense of it all. I don’t have the same volume of data. We’ve got very artisan, decades-old failure analysis workflows and applications that grew out of material science labs. The application of machine learning in this space is a little bit easier because there’s a smaller set of input variables.

McIntyre: We’ve had enough experience with the various models and various machine learning techniques to start recognizing that these machine learning routines have personalities unto themselves, much like subject matter experts. These are data scientists telling me this. You may end up finding out that certain machine learning models are more beneficial in certain places, and less beneficial in others. We’re now trying to figure out which machine learning model works best for specific situations, because they all do a good job. But when we’re trying to guarantee zero escapes coming out of a factory based on image classification, you can’t miss one in 10,000 defects. You’ve got to be able to be precise with your classifications.
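
One way to compare models against a zero-escape goal is to measure each model's escape rate (true defects it failed to flag) alongside its overkill rate (good parts it flagged). The sketch below is a minimal illustration under assumed labels "defect" and "ok" with made-up predictions; it is not a description of any vendor's method.

```python
def escape_rate(truth, predicted, defect_label="defect"):
    """Fraction of true defects that the model failed to flag."""
    outcomes = [p for t, p in zip(truth, predicted) if t == defect_label]
    if not outcomes:
        raise ValueError("no true defects in the evaluation set")
    return sum(1 for p in outcomes if p != defect_label) / len(outcomes)

def overkill_rate(truth, predicted, defect_label="defect"):
    """Fraction of good parts that the model wrongly flagged as defects."""
    outcomes = [p for t, p in zip(truth, predicted) if t != defect_label]
    if not outcomes:
        raise ValueError("no good parts in the evaluation set")
    return sum(1 for p in outcomes if p == defect_label) / len(outcomes)

# Hypothetical ground truth and two models' predictions on the same four images.
truth   = ["defect", "ok", "defect", "ok"]
model_a = ["defect", "ok", "ok", "ok"]          # misses one defect
model_b = ["defect", "defect", "defect", "ok"]  # catches both, one false alarm

for name, preds in (("model_a", model_a), ("model_b", model_b)):
    print(name, escape_rate(truth, preds), overkill_rate(truth, preds))
```

In this toy case model_b has no escapes but pays for it with a false alarm, which is the usual trade-off when tuning for zero escapes. Verifying a miss rate below one in 10,000 would also need an evaluation set containing far more than 10,000 true defects.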

SE:  With different algorithms, does it come back to one size doesn’t fit all?

McIntyre:  Yes, because I’ll get an answer from any machine learning. But whether I get a good answer, or the best answer, is going to be very dependent on how many options I put into my system. And just like back in the old days, you talk to an integration engineer, then you talk to a different integration engineer, and you get two answers. If they’re aligned, you believe it. If they’re different, then you ask a third engineer.
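
The "ask a third engineer" analogy maps naturally onto a simple consensus rule across models: accept the answer when two models agree, and consult a third only when they differ. The following is a minimal sketch of that idea. The three rule-based "models" and their feature names (`area`, `contrast`) are hypothetical stand-ins for trained classifiers.

```python
def consensus(sample, first, second, tiebreaker):
    """Return (label, status) using two models plus a third on disagreement."""
    a, b = first(sample), second(sample)
    if a == b:
        return a, "agreed"
    c = tiebreaker(sample)
    if c in (a, b):
        return c, "tie-broken"
    # Only reachable with more than two classes: all three models differ.
    return None, "needs review"

# Hypothetical stand-in models: simple threshold rules on made-up features.
def model_a(s):
    return "defect" if s["area"] > 50 else "ok"

def model_b(s):
    return "defect" if s["contrast"] > 0.6 else "ok"

def model_c(s):
    return "defect" if s["area"] * s["contrast"] > 30 else "ok"

clear_case = {"area": 80, "contrast": 0.9}
split_case = {"area": 80, "contrast": 0.2}

print(consensus(clear_case, model_a, model_b, model_c))
print(consensus(split_case, model_a, model_b, model_c))
```

Returning a status next to the label keeps track of how often the models disagree, which is itself a useful signal about where a given model is less reliable.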

SE:  How do you maintain your models when the data shifts?

McIntyre: Models definitely will walk, but not as fast as people do. But they can walk over time as the data shifts.
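
A common way to notice that a model may be "walking" is to monitor whether its input data has shifted away from the data it was trained on. The sketch below flags a shift in the mean of one feature using a simple z-score. The readings, the threshold of 3.0, and the function names are assumptions for illustration; production systems often use richer tests over many features.

```python
from math import sqrt
from statistics import mean, stdev

def mean_shift_z(reference, recent):
    """Z-score of the recent window's mean against the reference distribution."""
    standard_error = stdev(reference) / sqrt(len(recent))
    return (mean(recent) - mean(reference)) / standard_error

def has_drifted(reference, recent, threshold=3.0):
    """True when the recent mean is more than `threshold` standard errors away."""
    return abs(mean_shift_z(reference, recent)) > threshold

# Hypothetical readings of one input feature at training time.
reference = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]

stable_window = [10.1, 9.9, 10.0, 10.0]    # looks like the training data
shifted_window = [10.6, 10.5, 10.7, 10.6]  # mean has moved upward

print(has_drifted(reference, stable_window))
print(has_drifted(reference, shifted_window))
```

A flag like this does not say the model is wrong, only that its inputs no longer resemble the training data, which is a reasonable trigger for re-validation or retraining.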

Hakim: Standardization for machine learning is needed. Today, machine learning seems more subjective. It is like having a unit of money without standardizing it across a nation. As a result, the money I have is going to have a different value compared to the money that somebody else has. You really need to come up with a standardization methodology. I do not believe that has been really addressed, because if you look at the papers that have been written, people attempt to repeat the same experiments and they try to break what another person has done. And it’s really becoming a conflicting set of answers with regard to what’s happening.

Read Part 1 of the discussion: Streamlining Failure Analysis Of Chips.
Read Part 2 of the discussion: Isolating Critical Data In Failure Analysis.
