Next Steps For Improving Yield

Engineering teams are drowning in data at most advanced nodes; defining what is sufficient quality can vary by domain.


Chipmakers are ramping new tools and methodologies to achieve sufficient yield faster, despite smaller device dimensions, a growing number of systematic defects, immense data volumes, and massive competitive pressure.

Whether a 3nm process is ramping, or a 28nm process is being tuned, the focus is on reducing defectivity. The challenge is to rapidly identify indicators that can improve yield, and those indicators need to be tracked from design all the way through to test. New solutions include applying monitoring data to minimize downstream testing, optimizing excursion strategies, and relying more heavily on software to manage yield issues.

There is interplay between yield improvement and process control.

“Yes, there are new advances in process control. But the more near-term challenge is, ‘How do you monitor for end unit quality?’” said Mike McIntyre, director of software product management at Onto Innovation. “Inline factory control is adequate for process monitoring, but if you’re sampling 2 wafers and 13 sites per wafer, is that sufficient to comprehend why 3 units in a million failed? The fab can under-utilize the functionality (use a looser design rule). Another alternative is screening techniques to take out any device that is even remotely suspicious. But then you’re going to be throwing away potentially good material. Who’s going to bear the cost of that? And there are other options, like building redundancy into systems, which may be necessary in particular for something like autonomous vehicles and braking systems.”
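The sampling concern in that quote can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes, as a deliberate simplification not made in the article, that each sampled site is an independent chance of observing the failure mechanism at the quoted rate of 3 per million. The function names are hypothetical.

```python
import math

def detection_probability(fail_rate: float, samples: int) -> float:
    """Chance that at least one of `samples` independent looks hits a failure."""
    return 1.0 - (1.0 - fail_rate) ** samples

def samples_needed(fail_rate: float, confidence: float) -> int:
    """Smallest number of independent looks that reaches the given detection confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-fail_rate))

fail_rate = 3 / 1_000_000   # 3 failing units per million, from the quote
samples = 2 * 13            # 2 wafers x 13 sites per wafer, from the quote

p_detect = detection_probability(fail_rate, samples)
n_for_95 = samples_needed(fail_rate, 0.95)

print(f"Chance of seeing the failure with {samples} samples: {p_detect:.4%}")
print(f"Samples needed for 95% detection confidence: {n_for_95:,}")
```

Under these assumptions the 26 sampled sites have well under a 0.01% chance of catching the mechanism, and roughly a million independent looks would be needed for 95% confidence, which is why sparse inline sampling alone cannot explain part-per-million failures.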

Yield is extremely complex. It can vary by design, by manufacturing lot, or even by use domain. What is considered acceptable in one application may be very different from what is acceptable for the same device in another. And the more complex the design, the more data needs to be analyzed.

“People are really looking at end-to-end system-level yield, because customers are drowning in data, from the equipment, from failure analysis, from design to packaging,” said Matt Knowles, director of product management for Siemens EDA’s Tessent Group. “Yield isn’t so much of a product engineering problem. It’s a data management problem. I see a huge macro shift toward the software vendors, the people who do analytics and machine learning, to try and fill in some of the gaps.”

To perform data analytics between design and test, or more ambitiously end-to-end, traceability using on-chip sensing and monitoring is essential. “Industries such as automotive and data centers have chips for which they need to trace the full history across the entire supply chain,” said Yervant Zorian, chief architect and fellow at Synopsys. “In the past, only a few chip companies used unique IDs to trace high-end chips. However, today an increasing number of in-field operating companies require silicon lifecycle management (SLM) of their chips to understand what’s happened with them, going backwards through manufacturing to out in the field, tracking quality, yield, RAS, including aging, etc. This will require integrating sensors, monitors and an on-chip SLM highway during chip design. We see this being realized in data center and automotive chips because they are often designed directly by these companies.”

Design for monitoring, testing
At 3nm and beyond, addressing systematic defects and parametric variation is becoming increasingly challenging. “The tolerances are basically disappearing on these advanced nodes,” said Andrzej Strojwas, CTO at PDF Solutions. “And because leading fabless companies like to design early to get access to new technology, the foundry issues PDKs — or refined PDKs — that cover corner cases. But even then, there’s a mismatch between what was assumed in the SPICE parameters and actual device performance in the real layout. So foundries are issuing design-for-manufacturing rules.”

Strojwas outlined how logic diagnosis is performed on advanced SoCs for new products. “For logic, the issue is how do you find the location of the defect based upon the processing of the scan chain information,” he said. “We take the product wafer, and at metal 1 for instance, add masks and routing to pads, and then do massively parallel testing of all transistor characteristics.”

This process automatically reveals several parametric outliers that may or may not be critical to yield. The platform then determines which ones are critical for the device. Some issues call for retargeting or optical proximity correction (OPC), some will not impact yield or reliability, and others require a mask respin.

“We’ve heard that there are pattern-based systematic defects that can exist well into volume manufacturing,” added Siemens’ Knowles. “Some people thought pattern complexity would get better with EUV, and it did for a hot minute. So instead of going through quad-patterning, now you’re down to single-patterning, but it will quickly go back to double-patterning.” He noted that engineers are tackling these defectivity issues by running more volume diagnosis in production.

“To achieve faster yield learning and new product introductions, our customers have been asking for tighter integration between different platforms across the semiconductor product lifecycle, including EDA, manufacturing analytics, and test operations,” said John Kibarian, president and CEO of PDF Solutions.

In a joint product offering, PDF’s Fire data and layout pattern analysis combines with Siemens YieldInsight in the Exensio analytics platform to better monitor systematic yield loss (see figure 1). “The platform identifies where the yield losses are, and then it performs data processing to identify where in that chip has the highest probability of a defect,” Strojwas said. “After they identify the suspects, we bring this information back to Exensio and try to correlate to inline defect inspection data, for instance. But the really novel part, in addition to identifying random defects — so shorts and opens — is it identifies layout patterns that are most vulnerable. The root cause convolution actually improves the accuracy of finding the true failing locations, to the tune of 90%.”

Fig. 1: Process flow for analyzing yield, test and design data. Source: PDF

The high confidence defect Pareto is then confirmed by root cause physical failure analysis (PFA) using a focused ion beam (FIB) cross-section and SEM imaging. Such a closed-loop approach to defect isolation is preferred over traditional fault isolation, which can take weeks and lead to significant lost productivity by the fab.

What gets monitored on the chip can be classified physically or structurally, as with PVT monitors, or parametrically, as with the on-chip Agents from proteanTecs. The company leverages “deep data analytics,” based on chip telemetry, using multi-dimensional Agents that operate both at test and in mission mode to monitor performance degradation due to aging and latent defects that were not caught during manufacturing. In addition, those monitors can be used for operational, environmental and application analysis, measuring workload and software stress on the hardware, and the reliability of interconnects in advanced packages.

Design for test (DFT) activities are part of what is tracked in supply chain management. “More and more, embedded monitors for the purpose of monitoring key environmental conditions within the silicon such as process, voltage and temperature are being placed within devices during the design phase,” said Guy Cortez, senior staff product marketing manager of Silicon Lifecycle Management at Synopsys. “The valuable data analytics collected from these monitors in production can help improve operational metrics such as performance and power not just during manufacturing of the device but also while the device is in use in the field.”

Data from these monitors enable targeted yield improvements, adds Cortez. “Increased use of monitor data is starting to be realized especially in the quality screening process of silicon devices throughout manufacturing, enabling higher quality devices. Traditional quality analytics targeting outlier detection and escape prevention methods such as geo-spatial techniques — like good-die-bad-neighborhood, failure clusters, and Z-axis part average testing (ZPAT) as well as univariate parametric analysis like dynamic part average testing (DPAT) — are tried and true techniques and are still heavily used today. But now, with the advent of monitor data becoming more readily available, correlating this valuable data with other parametric data such as Vdd consumption tests, you have a more accurate way of identifying true outliers compared to traditional methods since monitor data inherently captures a more accurate representation of what is happening within the silicon.”
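As a rough illustration of the univariate screening mentioned in that quote, the sketch below shows one common formulation of dynamic part average testing: limits are recomputed per wafer or lot from the passing population as median ± 6 robust sigma, with robust sigma estimated as the interquartile range divided by 1.35. The formulation, the multiplier, the function names, and the sample readings are all assumptions for illustration and are not taken from the article or from any specific vendor tool.

```python
import statistics

def dpat_limits(values, k=6.0):
    """Dynamic limits: median +/- k * robust sigma, with robust sigma = IQR / 1.35."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    center = statistics.median(values)
    robust_sigma = (q3 - q1) / 1.35
    return center - k * robust_sigma, center + k * robust_sigma

def dpat_outliers(values, spec_lo, spec_hi, k=6.0):
    """Indices of dies that pass the static spec but fall outside the dynamic limits."""
    passing = [v for v in values if spec_lo <= v <= spec_hi]
    lo, hi = dpat_limits(passing, k)
    return [
        i for i, v in enumerate(values)
        if spec_lo <= v <= spec_hi and not (lo <= v <= hi)
    ]

# Hypothetical parametric readings for one wafer (arbitrary units).
readings = [1.00, 1.02, 0.98, 1.01, 0.99, 1.03, 0.97, 1.00, 1.01, 0.99, 1.60]

# The last die is inside the static spec window but far from its peers.
flagged = dpat_outliers(readings, spec_lo=0.5, spec_hi=2.0)
print(flagged)
```

The point of the technique is visible in the data: a die can sit comfortably inside the fixed specification window and still be flagged because it is unlike the rest of its population. Correlating such flags with on-chip monitor data, as Cortez describes, would be an additional step that this sketch does not attempt.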

Better use of inline monitor data
One way engineers are accelerating yield is by making more productive use of inline inspection data. For automotive chips, for instance, critical layers may receive 100% inspection. The resulting defect maps then can be ranked for failure potential, which may be used to decide when a downstream test such as accelerated temperature and voltage stressing (burn-in) is warranted.

KLA and NXP recently collaborated on a project to use inline inspection data to determine an optimal stress test (burn-in) for microprocessor and analog products. [1] Using the inline part average testing (I-PAT) methodology, [2] the necessity of burn-in to weed out latent reliability failures can be eliminated for all but a very small percentage of devices. Case studies on 50,000 microcontrollers and 76,000 high-voltage analog products showed that early production implementation of defect-directed stress testing “enables us to achieve the best tradeoffs among stress/test coverage, reliability, and cost,” reported Chen He, fellow at NXP, and John Robinson, senior principal scientist of KLA. “Only the dies with higher defectivity will receive additional BI stress – reducing cost and reducing over-stress risk associated with burn-in.”

Fig. 2: Burn-in testing is run on edge dies, non-volatile memory repaired dies, and parametric margin outlier dies, as well as dies with high I-PAT scores. Source: IEEE ITC

Rather than relying on wafer and lot sampling, all dies are characterized using the KLA 8 Series brightfield/darkfield inspector. Each defect is classified and weighted based on the specific layer, wafer location, defect type, defect size, polarity, etc., using a machine learning training algorithm. “The I-PAT score is used as a leading indicator of the defectivity of each die, which will help us to determine what will be the best subsequent stress testing to achieve the target quality level and maintain a good balance between the cost and quality,” said He. The I-PAT score aggregates defectivity data from all layers, resulting in a die-specific reliability metric.
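The idea of rolling per-defect weights up into one score per die can be sketched as follows. This is not the I-PAT algorithm: the article says the weights come from a machine learning training step, whereas the hand-written lookup tables, the multiplicative weighting, and every name below are hypothetical stand-ins chosen only to show the aggregation across layers.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Defect:
    die: tuple          # (x, y) die coordinate on the wafer
    layer: str          # inspection layer where the defect was found
    defect_type: str    # classification assigned at inspection
    size_um: float      # defect size in micrometers

# Hypothetical weights; a trained model would supply these in practice.
LAYER_WEIGHT = {"metal1": 3.0, "via1": 2.0, "poly": 1.0}
TYPE_WEIGHT = {"bridge": 4.0, "scratch": 2.0, "particle": 1.5}

def defect_weight(d: Defect) -> float:
    """Weight one defect by its layer, type, and size."""
    return LAYER_WEIGHT.get(d.layer, 1.0) * TYPE_WEIGHT.get(d.defect_type, 1.0) * d.size_um

def die_scores(defects) -> dict:
    """Sum defect weights over all layers to get one score per die."""
    scores = defaultdict(float)
    for d in defects:
        scores[d.die] += defect_weight(d)
    return dict(scores)

defects = [
    Defect((3, 4), "metal1", "bridge", 0.5),
    Defect((3, 4), "poly", "particle", 0.2),
    Defect((5, 1), "via1", "particle", 1.0),
]
scores = die_scores(defects)
for die, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(die, round(score, 2))
```

Dies with no detected defects never appear in the result, which corresponds to a score of zero; a production flow would also fold in wafer location and polarity, which the article lists as inputs.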

Fig. 3: Microcontroller dies, ranked from highest to lowest defectivity, are separated into high risk (scrap), burn-in stress test required (1% of die), and low risk (99%, no burn-in). Source: IEEE ITC

The distribution of scores from low to high, along with wafer location, provides the selection rules for which dies will receive burn-in (see figure 2). Dies with high I-PAT scores were validated using failure analysis. Wafer probe and packaged device testing helped separate dies in the high reliability risk category (candidates for scrap) from those at medium risk (which should undergo burn-in testing), a boundary known as the outlier threshold. Below the point where I-PAT scores dropped off significantly, dies did not need burn-in (99% of die for the microcontrollers). Importantly, the outlier threshold was 10 for the microcontrollers but 100 for the HV analog devices, so it is product-specific. Good correlation between test failure rates and I-PAT scores provided further confidence in the methodology.
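The three-way disposition described above, combined with the extra burn-in categories named in the Fig. 2 caption (edge dies, non-volatile memory repaired dies, parametric margin outliers), can be expressed as a small decision function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and the numeric cutoffs in the example are placeholders rather than the product-specific values from the study.

```python
from enum import Enum

class Disposition(Enum):
    SCRAP = "scrap"
    BURN_IN = "burn-in"
    NO_BURN_IN = "no burn-in"

def disposition_for(score, high_risk_cutoff, burn_in_cutoff,
                    edge_die=False, nvm_repaired=False, parametric_outlier=False):
    """Map a die's defectivity score and risk flags to a stress-test decision."""
    if burn_in_cutoff > high_risk_cutoff:
        raise ValueError("burn_in_cutoff must not exceed high_risk_cutoff")
    if score >= high_risk_cutoff:
        return Disposition.SCRAP          # highest defectivity: candidate for scrap
    if score >= burn_in_cutoff or edge_die or nvm_repaired or parametric_outlier:
        return Disposition.BURN_IN        # medium risk or flagged: stress the die
    return Disposition.NO_BURN_IN         # low risk: skip burn-in

# Placeholder cutoffs for illustration only; real values are product-specific.
HIGH_RISK_CUTOFF = 50.0
BURN_IN_CUTOFF = 5.0

dies = [
    {"id": "A", "score": 72.0},
    {"id": "B", "score": 8.5},
    {"id": "C", "score": 0.4},
    {"id": "D", "score": 0.0, "edge_die": True},
]
for die in dies:
    flags = {k: v for k, v in die.items() if k not in ("id", "score")}
    result = disposition_for(die["score"], HIGH_RISK_CUTOFF, BURN_IN_CUTOFF, **flags)
    print(die["id"], result.value)
```

Keeping the cutoffs as parameters reflects the article's observation that the thresholds differ between the microcontroller and high-voltage analog products, so they would be tuned per product from probe, package test, and failure analysis results.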

The NXP/KLA team found that certain stress tests would best bring out particular defect types, such as metal layer defects responding to elevated voltage and current testing (Iddq testing). The team added that methods like I-PAT increasingly are recommended for automotive devices by the Automotive Electronics Council (AEC) and OEM global standard documents.

Next-gen transistors and interconnects
Intel recently announced it will integrate nanosheet transistors and backside power delivery at its 20A (2nm) node. TSMC and Samsung too will make these transitions, both of which will enable future scaling, but also bring yield challenges. “Gate-all-around transistors and backside power distribution are definitely necessary. And there are other reasons for doing backside power, like doing decoupling capacitors on the back,” said PDF’s Strojwas.

But there are drawbacks to the new approach too. “With backside power, now you don’t have access to the backside of the wafer, which is useful for electrical fault isolation in FA and debug,” said Siemens’ Knowles. “Those two changes are really throwing a wrench into a lot of the yield learning flows.”

At a recent FA conference, a speaker at one leading fabless company told the audience to expect that failure analysis in the future will mostly be done on the ATE. “That was a huge statement,” said Knowles. “He’s basically saying that the physical techniques are going to be so challenged that their use is becoming tenuous based on the application. So for the most advanced nodes, there’s more and more emphasis on working upstream, getting the designers to put more design for test, design for diagnosis, and design for yield into these systems, and then leveraging ATE platforms to apply stimulus patterns, adaptive test, and things like that to debug the issues.”

“We see people becoming more proactive with respect to yield issues and the processing of failure data using data analytics with machine learning,” said Knowles. He expects this trend to continue, with leading-edge OEMs processing greater volumes of failure data on a regular basis. While the use of sensors on process tools was once viewed largely as an added expense, it appears that today the high risk of yield excursions justifies the engineering, sensor and analytics investment. These efforts, along with more DFT, are enabling faster yield ramps.


[1] A. Meixner, “Auto Chipmakers Dig Down to 10ppb,” Semiconductor Engineering, March 8, 2022.

[2] C. He et al., “Defect-Directed Stress Testing Based on Inline Inspection Results,” 2022 IEEE International Test Conference, doi: 10.1109/ITC50671.2022.00050.
