Metrology Sampling Plans Are Key For Device Analytics And Traceability

Determining the common factor shared by otherwise isolated events.


A mother steps on the brakes, bringing her car to a stop as she drops her kids off for dance lessons. She doesn't notice anything wrong at the time, but when she takes the car in for its regular service appointment, the mechanic runs a diagnostic check and discovers that the primary brake system had failed, unnoticed, because of a faulty braking controller. Fortunately, the vehicle's system redundancies allowed the car to stop successfully, and the dealer's diagnostic test confirms that since that first chip failure, another one has not occurred. The braking systems are behaving normally.

The dealership then forwards the information about the braking failure to the manufacturer, where an analyst notes that over the last 60 days, six other brake failures around the country, traced back to the same controller system, have been reported for the same make and model. In each of these situations, the backup system successfully brought the car to a complete stop. And, as with the mother dropping her kids off at dance class, the analyst reviews the reporting samples for these six other failures and determines that each is isolated and non-recurring.

So, what happened, and what may be the common factor shared by these otherwise isolated events?

To find those answers, the manufacturer begins executing a failure mode and effects analysis (FMEA) on the braking systems for the particular make and model, and they do so, in part, by using analytical software with genealogy and traceability capabilities. Depending on the findings, the manufacturer will determine whether a general recall is necessary.

After a thorough analysis, the manufacturer identifies that the faulty chips came from the same supplier. In addition, the chips are from a single delivery the company had received from the system parts provider. Once notified, the parts provider conducts its own analysis and determines that multiple elements in the six braking systems were made at roughly the same time in the same semiconductor factory overseas, and that the chips in each of the six appeared to be good when initially inspected. In fact, the performance and electrical results of the bad chips met specifications and were within expected distributions.

The factory is approached to determine if there are any unusual signals surrounding the materials used in the construction of the faulty braking systems. Beyond the normal 100% validation checks, outgoing visual QA, and electrical test structure wafer acceptance tests (WAT), no other in-line data is directly recorded on this material. This is where the analysis stops, due to a lack of direct data about the construction of the faulty chips. But stopping there does not correct the problem.

Beyond the electrical performance characteristics of the parts in question, semiconductor factories do not generally capture sufficient or representative in-line metrology data for all the parts they manufacture in their facilities.

In today’s fab environments, executing part-level analytics on a process-by-process basis is impossible given the most common measurement sampling strategies, which have been configured for process control and process sustaining purposes (figure 1). These monitoring and control strategies are woefully inadequate when applied to product analytics for parts-per-million traceability. Even with good data extrapolation and expansion packages, the error bars on any representative data created by these approaches would be too wide to support conclusions. To put a finer point on why data extrapolation does not work, consider the multiple sources of error that naturally occur in manufacturing. Extrapolated programs must account for measurement tool variation (gage repeatability and reproducibility) as well as film variation across the wafer, across the lot, lot to lot, and tool to tool. In practice, accounting for these errors means any extrapolated value could carry error bars consuming over 50% of the allowed process variation window.

Fig. 1: A high-reliability system requires knowledge of where each and every component originated. Easily accessed genealogy is a prerequisite.
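To make the error-budget argument concrete, here is a back-of-envelope sketch of how independent variance sources combine for an extrapolated metrology value. Every component magnitude and the process window below are illustrative assumptions, not figures from this article:

```python
import math

# Hypothetical 1-sigma error components (nm) for an extrapolated
# film-thickness value. All numbers are assumed for illustration.
sigma_nm = {
    "gage_rr":        0.30,  # measurement tool repeatability & reproducibility
    "within_wafer":   0.40,  # film variation across the wafer
    "wafer_to_wafer": 0.35,  # variation between wafers within a lot
    "lot_to_lot":     0.45,
    "tool_to_tool":   0.40,
}

# Independent error sources combine as a root sum of squares.
sigma_total = math.sqrt(sum(s ** 2 for s in sigma_nm.values()))

process_window_nm = 10.0           # assumed +/-5 nm allowed process variation
error_bar_width = 6 * sigma_total  # +/-3-sigma band on the extrapolated value
fraction = error_bar_width / process_window_nm

print(f"error bars consume {fraction:.0%} of the process window")
```

With these assumed components, even modest individual errors stack into a ±3-sigma band that eats roughly half the allowed window, which is why extrapolated values cannot stand in for direct part-level measurements.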

While the initial investigative event depicted above is fictitious, the outlined approach to isolating the problem is a likely scenario, as is the lack of necessary data to complete the comprehensive diagnostics. Moving forward, manufacturers, especially those providing parts for the automotive and medical industries, will have a choice. Collectively, they can continue to design solutions with inherent redundancies and create ever more restrictive guard-banding and part average test (PAT) systems to eliminate questionable parts and mitigate failures when they occur in the field. Or they can increase the frequency with which they perform metrology in their manufacturing operations, ensuring they have a sufficient amount of data for every single unit produced at their factories. In that case, gone would be the days where sampling 13 points on a wafer, two wafers in a lot, and one lot out of 20 would be sufficient.
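A quick calculation shows just how sparse that traditional sampling plan is at the part level. The die count per wafer and wafers per lot below are assumed values for illustration:

```python
# Back-of-envelope coverage of a typical process-control sampling plan:
# 13 sites on 2 wafers out of one lot in every 20.
sites_per_sampled_wafer = 13
wafers_sampled = 2
lots_between_samples = 20   # one lot measured out of every 20
wafers_per_lot = 25         # assumption: standard lot size
dies_per_wafer = 500        # assumption: illustrative die count

measurements = sites_per_sampled_wafer * wafers_sampled
dies_produced = dies_per_wafer * wafers_per_lot * lots_between_samples

coverage = measurements / dies_produced
print(f"{coverage:.4%} of dies receive a direct in-line measurement")
```

Under these assumptions, only about one die in ten thousand is anywhere near a measured site, which is orders of magnitude short of parts-per-million traceability.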

Today’s manufacturers need to create metrology sampling plans for device analytics, making this data available and traceable to downstream component manufacturers and customer-facing system providers. Furthermore, linking these more comprehensive metrology coverage plans to the available fault detection and classification (FDC) software results from process tools, along with maintenance records and process source gases and chemicals (GAC), will ultimately provide diagnostic analysts with a complete and actionable FMEA.
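The genealogy lookup this kind of traceability enables can be sketched in miniature. Given a failed module's serial number, an analyst walks back through the build chain and pulls whatever in-line data exists for the originating wafer. All record names, IDs, and fields here are hypothetical:

```python
# Minimal genealogy sketch: module serial -> chip -> wafer -> lot -> fab.
# With sparse sampling, most wafers have no in-line record at all.
genealogy = {
    "BRK-MOD-0042": {"chip": "IC-7731", "wafer": "W07",
                     "lot": "LOT-5512", "fab": "FAB-A"},
}

inline_data = {
    # keyed by (lot, wafer); only sampled wafers appear here
    ("LOT-5512", "W02"): {"wat": "pass", "film_thickness_nm": 101.3},
}

def trace(serial):
    """Walk the genealogy for a serial and fetch its in-line metrology, if any."""
    g = genealogy[serial]
    records = inline_data.get((g["lot"], g["wafer"]))  # None if never sampled
    return g, records

g, records = trace("BRK-MOD-0042")
print(g["lot"], g["wafer"], "->", records)
```

In this sketch the failed module traces cleanly back to its lot and fab, but the lookup comes back empty because wafer W07 was never sampled, which is exactly the dead end the fictional investigation above runs into.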

At the end of the day, it will be the end customer who drives the best solution.
