Isolating Critical Data In Failure Analysis

Why a shortage of data often impedes root-cause analysis.


Experts at the Table: Semiconductor Engineering sat down to discuss traceability and the lack of data needed to perform root cause analysis with Frank Chen, director of applications and product management at Bruker Nano Surfaces & Metrology; Mike McIntyre, director of product management in the Enterprise Business Unit at Onto Innovation; Kamran Hakim, ASIC reliability engineer at Teradyne; Jake Jensen, senior product specialist at Thermo Fisher Scientific; and Paul Kirby, senior marketing manager at Thermo Fisher Scientific. What follows are excerpts of that conversation. To view part one of this discussion, click here.

SE: How do data management and analysis differ now that so much of the data is being generated from images?

McIntyre: It comes at two layers. One, there is the need to continually expand your ability to data mine, and that means launching tens of thousands of analytics, most of which are going to be dry wells. Hopefully, if you’re in control, they’re going to launch and come back with no difference found or no significant problems found. But you are going to get a considerable number of results from increasing the data mining and increasing the size of those lakes that you data mine against. The other layer is in the machine learning space, where you’re not looking for a definitive answer. You’re looking for a general direction that you can look at in more detail. These are the self-trained models that we use for automated image classification or complex analytic model building, based on the results that come from the testers. There are new techniques, but it ultimately also comes down to how much you can automate and get results out on a regular basis. Then you’re mining for those results. How does that result match the physical reality of what the TEM showed me? What did the tester data end up giving me from a one-sigma or two-sigma range?

SE: Does a failed die provide insight into how the neighboring dies perform?

McIntyre: Definitely. But we’re data starved in manufacturing, primarily because manufacturers sample for control purposes. From a process control standpoint, you can probably get away with nine points on a wafer, two wafers in a lot and one lot out of every ten. But when it comes to a failed part, you have to go back and look at what was physically happening on the wafer. In the factory, was that lot even measured? Was that wafer potentially measured? Do you have physical measurements close to that die? If so, I can begin to make corrections to my process. But the odds of that happening are low.
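
The sampling plan described above can be turned into rough arithmetic. The sketch below is illustrative only: the 25-wafer lot and 500 dies per wafer are assumptions that do not come from the discussion, and it further assumes each of the nine points lands on a distinct die and that sampled lots, wafers, and dies are chosen uniformly.

```python
from fractions import Fraction

# Sampling plan from the discussion: 9 points per wafer, 2 wafers per lot, 1 lot in 10.
POINTS_PER_WAFER = 9
WAFERS_SAMPLED = 2
LOTS_SAMPLED, LOTS_PER_CYCLE = 1, 10

# Assumptions for illustration only (not stated in the discussion).
WAFERS_PER_LOT = 25
DIES_PER_WAFER = 500

# Chance that the failed die's lot was measured at all.
p_lot = Fraction(LOTS_SAMPLED, LOTS_PER_CYCLE)

# Chance that the failed die's own wafer was one of the measured wafers.
p_wafer = p_lot * Fraction(WAFERS_SAMPLED, WAFERS_PER_LOT)

# Chance that a measurement point landed on the failed die itself.
p_die = p_wafer * Fraction(POINTS_PER_WAFER, DIES_PER_WAFER)

print(f"lot measured:   {float(p_lot):.2%}")
print(f"wafer measured: {float(p_wafer):.2%}")
print(f"die measured:   {float(p_die):.4%}")
```

Under these assumptions, a failed die has a 10% chance that its lot was measured, a 0.8% chance that its wafer was measured, and roughly a 0.014% chance of having a measurement taken directly on it.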

SE: Are chipmakers then adopting 100% inspection on the most critical layers?

McIntyre: Chipmakers are targeting a smarter set of coverage. You now can start taking more advanced samples to understand what’s my variation within the die, die-to-die, wafer-to-wafer, lot-to-lot. These things have to add up, and you have to be cognizant of this when you’re doing your physical benchmarking to help you with your analysis, and ultimately diagnose your failures.

SE: Is it possible to get that kind of information during yield learning on a new product?

McIntyre: That’s probably your best shot, because that’s where you’re going to get the largest suite of diagnostics, especially in the test realm. Once a part qualifies, a manufacturer looks at the 4,000 tests and says, “Okay, I can get by with 400.” But there’s a reason the 4,000 were in there to begin with. If I’m now throwing a dozen parts together, and you picture each one is in the two-sigma range of their own behaviors, you could very well create an edge case where the complex system fails even though every single component in it is passing test. How do you do that FA?
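
One way to see how a dozen passing parts can combine into a failing system is a simple stack-up calculation. The sketch below assumes a hypothetical system parameter that is the plain sum of twelve independent component deviations with equal sigma. Real systems are rarely this simple, and the function name and model are illustrative, not taken from the discussion.

```python
import math

def stackup_in_system_sigmas(n_parts: int, part_offset_sigmas: float) -> float:
    """Return how far the system sits from nominal, in system-level sigmas,
    when every part is offset in the same direction by part_offset_sigmas.

    Assumes the system parameter is the sum of n independent part deviations,
    each with unit sigma, so the system sigma is sqrt(n) (root-sum-square).
    """
    system_sigma = math.sqrt(n_parts)
    worst_case_offset = n_parts * part_offset_sigmas
    return worst_case_offset / system_sigma

# Twelve parts, each sitting at +2 sigma of its own distribution.
ratio = stackup_in_system_sigmas(12, 2.0)
print(f"System offset: {ratio:.2f} system sigmas")
```

In this simplified model, twelve parts that each sit at two sigma, all in the same direction, put the system almost seven sigma from nominal, even though no individual part looks unusual at test.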

Chen: From the data mining of the data lakes, and supplying that amount of data, that’s essentially the problem that we deal with in advanced packaging for failure analysis, especially NPI, which is a good place to start. They’re doing more failure analysis to understand where exactly they need to be on their sampling strategy and the sampling rate. It used to be hard to find these trends if it was a die-to-die trend, wafer-to-wafer trend, or a wafer-level trend, which is a few data points on a wafer. The standard workflow is to take a few dies and send them to the FA lab. It would take a couple of days to get PFA or get 3D CT. Even then, it’s still hard to see that you have a lot of fine warpage happening within the die. And then you have a wafer trend, where there’s some chuck non-uniformity and wafer thickness issues. Once we supply the data lake, that’s what some of our customers have seen. Then they get to trace back to notice some strip-level maps, or these quadrants seem to fail. Then they see where the dies came from. ‘Oh, they’re all from the edge of the wafer.’ So what happened at the edge of the wafer? The dies are too thin. They’re more susceptible to warpage. Supplying all this data gives visibility. By having it available you can backtrack and trace to the issues. That’s been very valuable.
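
The trace-back Chen describes, from failing package units to their position on the source wafer, can be sketched as a lookup plus a grouping step. Everything in the example below is hypothetical: the unit IDs, wafer IDs, die coordinates, the 150 mm wafer radius, and the 15 mm edge band are made-up values for illustration, not data from any tool or customer.

```python
import math
from collections import Counter

# Hypothetical genealogy: package unit ID -> (wafer ID, die x, die y),
# with coordinates in mm measured from the wafer center.
GENEALOGY = {
    "U001": ("W07", 140.0, 10.0),
    "U002": ("W07", 5.0, -12.0),
    "U003": ("W09", -138.0, 30.0),
    "U004": ("W09", 60.0, 60.0),
    "U005": ("W07", 0.0, 143.0),
}

# Hypothetical list of units that failed at package-level test.
FAILED_UNITS = ["U001", "U003", "U005"]

def radial_zone(x_mm, y_mm, wafer_radius_mm=150.0, edge_band_mm=15.0):
    """Label a die 'edge' if it lies within edge_band_mm of the wafer rim."""
    radius = math.hypot(x_mm, y_mm)
    return "edge" if radius >= wafer_radius_mm - edge_band_mm else "interior"

def zone_counts(unit_ids, genealogy):
    """Count how many of the given units came from each wafer zone."""
    counts = Counter()
    for unit_id in unit_ids:
        _wafer_id, x_mm, y_mm = genealogy[unit_id]
        counts[radial_zone(x_mm, y_mm)] += 1
    return counts

failed_zones = zone_counts(FAILED_UNITS, GENEALOGY)
all_zones = zone_counts(GENEALOGY, GENEALOGY)

for zone in ("edge", "interior"):
    print(f"{zone}: {failed_zones[zone]} failed of {all_zones[zone]} units")
```

With this toy data, all three failures map to edge dies while both interior dies pass, which is the kind of pattern that only becomes visible once unit-level results are linked back to wafer coordinates.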

SE: You use the word trace. Is part of data analytics making sure you have traceability so you can connect the data?

Kirby: In the front-end metrology processes you need that reference data, which usually comes from TEM. And now you need reference data on the package structures, as well. So today you are reliant on much more FA data, and that’s why we have to improve cycle times through automation. It mainly is about being able to handle large volumes of data, and the cycle time is really key.

Chen: This certainly requires some automation. For example, you need the right material handling. Are you tracking the IDs, magazines, and unit-level labels? And are you making sure all that data is tied together? When you’re doing it manually there could be errors. A certain amount of automation is needed to make sure all those linkages are there. Also, for communicating with the host, because the data needs to be pulled somewhere that’s accessible to multiple tools. That communication has to be established via standards.

SE: What else do we need to do more of?

Kirby: This seems trite and simplistic, but just more data – more TEM data, better data, with more repeatability of the data, and absolutely capturing all the data that you need from the sample. Faster time to data, of course. These are not new trends. These have been going on for the last few decades. And they’re not slowing down. They’re getting faster.

McIntyre: I come at this from a data standpoint insofar as we build metrology tools that are reliant on the physical measurements that come off of TEMs and SEMs. Today we’re data starved when it comes to our analytics and finding the root cause, because it isn’t an overstress of the part. There’s some other reason – a thin metal or small via, the actual physical conditions that create the overstress. That’s all data-linked, and you have to be able to get to the right data to draw your conclusion.

Read part one of this discussion: Streamlining Failure Analysis Of Chips.
