Are All Known Good Tested Devices Created Equal?

Post-mortems on RMAs, backed by Big Data Analytics, prove otherwise.


Your known good parts all passed their required wafer sort, final test, and system-level tests and were shipped to your customers. However, as we all know, a known good part or device does not always stay good; it may fail prematurely in the field and be flagged as an RMA (return material authorization) by your customer. But why is it that some good parts fail early while others last for their expected lifetime? Moreover, if your customer has little to zero tolerance for quality or reliability issues and requires strictly minimal DPPM (defective parts per million), how can you assure your customer that you have met their requirements?

As it turns out, it is no longer a big mystery why some good parts fail early and others do not. The key lies within the data captured throughout the manufacturing process, much like a collective DNA profile. In almost all cases, there is a strong correlation between a part's premature failure, even though it passed all the requisite tests, and how it performed during manufacturing compared with other known good parts. What if you could foresee the future, keep these borderline good devices from being shipped to your customer, and avoid an early field failure? That raises the question, “Wouldn’t you want to prevent this from happening if you could?”

The good news is that with today’s sophisticated and comprehensive Outlier Detection (OD) algorithms (also commonly referred to as Part Average Test or PAT), combined with Big Data Analytics engines tailored for the semiconductor and electronics industries, there are ways to ensure tighter quality measures. Together, they can be used to create a “quality firewall” that not only satisfies your customer’s DPPM requirements but also protects your yield requirements. The challenge thus becomes: which types of OD algorithms should you implement, and how can you automate this process for improved efficiency?

OD algorithms can be grouped into three categories: parametric, geographical, and a hybrid of the two. Parametric techniques analyze actual test data to determine whether a specific die or part should be flagged and binned out, even though it has passed all required tests, because it looks like an “outlier” when compared to similar die or parts. Geographical techniques, on the other hand, look at more systemic trends, where a good die becomes “suspect” because of its geographical proximity to failed die, which raises the risk of it eventually becoming an RMA. Hybrid techniques take both the parametric results and the geographical landscape into consideration when identifying outliers.

In its simplest form, a parametric-based OD algorithm performs univariate outlier detection, as I wrote about in my last blog: tighter, statistically calculated test limits are derived and used for each parametric test instead of the generic device specification limits found in the product datasheet. Device specification limits that are too wide may allow “outliers” to sneak through, resulting in poor quality parts being shipped into the supply chain. More sophisticated techniques, such as bivariate and multivariate outlier detection, check the consistency of results across multiple tests simultaneously, which exposes additional outliers that stay hidden when only univariate outlier detection is used. Bivariate refers to correlating two tests simultaneously, while multivariate refers to cross-correlating three or more tests. Multivariate analysis typically requires more complex, resource-intensive algorithms, but it offers the potential advantage of improving quality by identifying those very difficult-to-find outliers that univariate or bivariate methods cannot find.
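
To make the univariate case concrete, here is a minimal Python sketch of deriving statistical limits from the passing population and flagging parts that meet the datasheet limits but fall outside them. It is only an illustration under stated assumptions: the function names, the choice of the median as the center, the robust sigma estimate of (Q3 - Q1) / 1.35, and the multiplier k = 6 are assumptions made for this example, not settings taken from this article or from any particular tool.

```python
import statistics

def pat_limits(values, k=6.0):
    """Return (lower, upper) statistical limits for one parametric test.

    Assumption: center = median, robust sigma = IQR / 1.35, width = k sigma.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    center = statistics.median(values)
    robust_sigma = (q3 - q1) / 1.35
    return center - k * robust_sigma, center + k * robust_sigma

def univariate_outliers(readings, spec_lo, spec_hi, k=6.0):
    """readings: dict of die_id -> measured value for a single test.

    Returns the ids of dice that pass the datasheet spec limits
    but fall outside the statistically derived limits.
    """
    # Only parts that already passed the spec limits are considered.
    passing = {d: v for d, v in readings.items() if spec_lo <= v <= spec_hi}
    lo, hi = pat_limits(list(passing.values()), k)
    return sorted(d for d, v in passing.items() if not lo <= v <= hi)
```

Calling `univariate_outliers(readings, spec_lo, spec_hi)` once per parametric test gives a per-test list of candidates to re-bin; how tight to make `k` is a trade-off between DPPM and yield that the product owner has to decide.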

Figure 1 below shows a real-world bivariate example where the combined results of two tests are highly correlated for every die on the wafer except two. Even though these two dice individually passed all their respective tests, combining their test results highlights the fact that they operate differently from the rest of the good die on this wafer; these differences may manifest themselves later, after the parts have been shipped to the customer.

Figure 1. Bivariate Outlier Example
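
One way to sketch the bivariate idea in Python is to fit a straight line through the two tests' results and flag dice whose distance from that line is unusually large. This is an assumed approach chosen for illustration (least-squares fit plus a robust residual spread based on the median absolute deviation); the function name, the threshold k = 6, and the input layout are hypothetical and are not taken from the tool that produced Figure 1.

```python
import statistics

def bivariate_outliers(points, k=6.0):
    """points: dict of die_id -> (test_a_value, test_b_value).

    Returns ids of dice whose residual from the fitted A-vs-B trend
    is more than k robust sigmas away from the typical residual.
    """
    ids = list(points)
    xs = [points[d][0] for d in ids]
    ys = [points[d][1] for d in ids]

    # Ordinary least-squares line y = slope * x + intercept.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx

    # Robust spread of the residuals (MAD scaled to approximate sigma).
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    med = statistics.median(residuals)
    mad = statistics.median(abs(r - med) for r in residuals)
    sigma = 1.4826 * mad

    return sorted(d for d, r in zip(ids, residuals) if abs(r - med) > k * sigma)
```

Note that a die flagged this way can sit comfortably inside the limits of each test taken alone; only its position relative to the trend gives it away. A production implementation would also need to handle the degenerate cases this sketch ignores, such as a constant test A or a zero residual spread.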

Geographical-based OD algorithms, on the other hand, tend to be more intuitive because they look at the overall landscape of good and bad die in proximity to each other when determining whether a presumed known good die or device should be flagged and binned out as a precautionary measure. There are several geographical-based techniques, but the most common ones are cluster or GDBN (good die in bad neighborhood) and Z-Axis PAT (ZPAT).

Figure 2 below shows another real-world example after running a cluster/GDBN OD algorithm during wafer sort testing. Note that the blue shaded die had previously passed wafer sort tests but were now flagged as outliers and thus recategorized as bad die since they were neighboring a cluster of bad die. For devices targeted to high-quality end market segments such as automotive, this type of outlier detection is essential to maintaining very low levels of DPPM.

Figure 2. Geographical Cluster/GDBN Outlier Example
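
The GDBN idea can be sketched in a few lines of Python: for every passing die, count the failing die among its eight immediate neighbors and flag it when the count reaches a threshold. The wafer-map layout, the function name, and the default threshold of three bad neighbors are assumptions made for this example; real rules may weight neighbors differently or look at larger neighborhoods.

```python
def gdbn_outliers(wafer_map, min_bad_neighbors=3):
    """wafer_map: dict of (x, y) -> True if the die passed, False if it failed.

    Returns the sorted (x, y) locations of passing dice that have at
    least min_bad_neighbors failing dice among their 8 neighbors.
    """
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
               if (dx, dy) != (0, 0)]
    flagged = []
    for (x, y), passed in wafer_map.items():
        if not passed:
            continue  # already a bad die, nothing to re-bin
        # Locations missing from the map (off-wafer) are not counted.
        bad = sum(1 for dx, dy in offsets
                  if wafer_map.get((x + dx, y + dy)) is False)
        if bad >= min_bad_neighbors:
            flagged.append((x, y))
    return sorted(flagged)
```

The neighbor counts are taken from the original map, so a die re-binned by this rule does not in turn make its own neighbors look worse. Lowering `min_bad_neighbors` widens the guard band around each cluster at the cost of yield.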

Figure 3 below shows a real-world example of Z-Axis PAT (or ZPAT), which is used to identify subtle systemic faults resulting from a process issue or a design/process interaction. Outliers are detected by stacking the test results for all wafers in a lot and comparing the good/bad status of every die at the same X-Y location across the entire lot. In this example, at X-Y coordinates 48-34, the die tested as good on only two of the 25 wafers in the lot. As a result, the ZPAT algorithm identifies those two dice as outliers and re-bins them as bad dice.

Figure 3. Geographical ZPAT Outlier Example
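
The stacking step can be sketched in Python as follows. For each X-Y location, the sketch counts how many wafers in the lot passed there, and if the passing fraction is at or below a cutoff, every remaining good die at that location is flagged. The data layout, the function name, and the 10% cutoff are assumptions for this example only.

```python
def zpat_outliers(lot, max_pass_fraction=0.1):
    """lot: dict of wafer_id -> {(x, y): True if passed, False if failed}.

    Returns sorted (wafer_id, (x, y)) pairs for passing dice that sit at
    a location where most wafers in the lot failed.
    """
    pass_count, total = {}, {}
    # Stack the lot: tally passes and tested dice per X-Y location.
    for wafer_map in lot.values():
        for xy, passed in wafer_map.items():
            total[xy] = total.get(xy, 0) + 1
            pass_count[xy] = pass_count.get(xy, 0) + (1 if passed else 0)

    # A location is suspect if some, but very few, wafers passed there.
    suspect = {xy for xy in total
               if 0 < pass_count[xy] / total[xy] <= max_pass_fraction}

    return sorted((wafer_id, xy)
                  for wafer_id, wafer_map in lot.items()
                  for xy, passed in wafer_map.items()
                  if passed and xy in suspect)
```

With a 25-wafer lot in which one location passes on only two wafers, the passing fraction there is 2/25 = 0.08, which is below the assumed cutoff, so both passing dice would be returned for re-binning.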

Once you have identified all of the OD algorithms you would like to use, it is desirable to create an executable flow that automatically sequences through each algorithm you want to run, as shown in the example in Figure 4 below.

Figure 4. Automated Outlier Detection PAT Rule Flow

When using multiple OD algorithms in sequence, the order of execution is important. Once a die is flagged as an outlier and automatically re-binned, it is no longer a “known good die”, so subsequent algorithms will not analyze it. Therefore, there is a benefit in strategically placing certain algorithms ahead of others. The decision of which algorithms to use depends on the type of product and the preferences of the product owner.
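
The effect of ordering can be illustrated with a small Python sketch of a rule flow. Each rule only sees the dice that are still considered good, and whatever it flags is removed before the next rule runs. The runner and the two toy rules below are hypothetical and exist only to show the mechanism; they do not represent the rule flow in Figure 4.

```python
def run_od_flow(good_dice, rules):
    """good_dice: dict of die_id -> value. rules: list of (name, function).

    Each rule function receives the dice still considered good and returns
    the ids it flags. Returns (remaining_good, rebinned), where rebinned
    maps each flagged die_id to the name of the rule that caught it.
    """
    remaining = dict(good_dice)
    rebinned = {}
    for name, rule in rules:
        # Materialize the result before modifying `remaining`.
        for die_id in list(rule(remaining)):
            if die_id in remaining:
                rebinned[die_id] = name
                del remaining[die_id]
    return remaining, rebinned

# Hypothetical toy rules, for illustration only.
def absolute_limit(dice, limit=50.0):
    """Flag dice whose value exceeds a fixed limit."""
    return [d for d, v in dice.items() if v > limit]

def relative_to_mean(dice, factor=2.0):
    """Flag dice whose value exceeds factor times the current mean."""
    if not dice:
        return []
    mean = sum(dice.values()) / len(dice)
    return [d for d, v in dice.items() if v > factor * mean]
```

Because `relative_to_mean` computes its threshold from whatever population it is handed, running it after `absolute_limit` has already removed an extreme part can flag dice that it would have missed if it had run first. That is the practical reason to think carefully about where each algorithm sits in the sequence.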

If quality is a significant concern for you or your end customer, then there is no better way to ensure minimal DPPMs in your shipped devices than by using Outlier Detection techniques as your last line of defense for catching potentially bad devices. It is the most effective and proven approach to creating a virtual “quality firewall” and achieving your quality goals.

To learn more about how semiconductor companies are leveraging these types of automated technologies to aid in achieving high quality standards in their devices, click here.
