Pressure Builds On Failure Analysis Labs

Goal is to find the causes of failures faster and much earlier — preferably before first silicon.


Failure analysis labs are becoming more fab-like, offering higher accuracy in locating failures and accelerating time-to-market of new devices.

These labs historically have been used for deconstructing devices that failed during field use, known as return material authorizations (RMAs), but their role is expanding. They now are becoming instrumental in achieving first silicon and ramping yield, prompting them to automate and speed up existing processes, as well as introduce some new ones.

There is massive pressure on FA labs to perform analyses faster in a more automated, repeatable manner, just as fabs do, to meet the growing reliability demands of automotive, server, and other mission-critical applications. “The speed at which our customers are able to align test results from different stages of the workflow will speed the rate at which they’re able to introduce their product to market,” said Robert Manion, vice president and general manager of the Semiconductor and Electronics Business Unit at NI Test & Measurement, an Emerson company.

With chips used increasingly in mission- and safety-critical applications, and for longer periods of time, this kind of analysis is essential to ensuring reliability. It also ties into the yield learning flow (see figure 1). “Silicon lifecycle management is really relevant today because fabs need to be able to collect and analyze and provide actionable insights from the data, from design all the way through to in-field product use,” said Guy Cortez, staff product marketing manager for silicon lifecycle management in Synopsys’ Digital Design Group. “The analysis of RMAs is part of that.”

Fig. 1: Data flow that enables identification of the most critical defects and selection of chips for failure analysis. Source: Synopsys

Arriving at actionable insights is non-trivial. It requires very precise metrology and inspection. “It comes down to two factors,” said Juliette van der Meer, X-ray product marketing manager at Bruker. “Customers want faster, higher productivity from the tool, but also higher sensitivity. Generally, you can go faster at the cost of precision or vice versa, but customers always want to have both. So that really means you need to improve precision by two to four times to meet requirements.”

Just like fabs, FA labs are moving from a siloed culture to a more collaborative culture for data sharing and problem solving. This extends from design to manufacturing, testing, and in-field use. And just like the fabs, FA operations are increasingly adopting AI- or machine learning-based data analytics to meet the stringent reliability requirements for integrated circuits.

“We went all-in on semiconductors because of the accuracy, control, and precision that’s needed,” said David Park, vice president of marketing at Tignis. “It’s incredibly critical to the proper functioning of the devices.”

Park noted a degree of confusion in the industry regarding the capabilities of AI in manufacturing. “A lot of people think that AI solutions, not just AI for process control but AI in general, are a fancier version of yield management systems (YMS) or fault detection and classification (FDC), which is 100% not the case. They may have some similar rules, but the purpose is different among APC, FDC and YMS. The value-add with AI is knowing what to do, and the dispositioning of solutions when problems are detected. This is where domain knowledge is paramount, because people need to guide the decision-making. AI can help them do that in a faster and more efficient way.”

From design to different metrology and test insertions, failure analysis might be performed at any particular step to reconcile differences, such as between package test and system-level test.

Fig. 2: Filled (top) and partially filled (bottom) structures of a 3D NAND device. Source: Thermo Fisher Scientific

Data analytics using AI or ML also improves an engineer’s chances of locating failures in the device, such as partially filled contact holes in 3D NAND structures (see figure 2), or of measuring critical dimensions such as hole diameter, hole depth, and the degree of tilt or twisting along high-aspect-ratio holes.

The failure analysis process also provides what the industry refers to as ground truth: actual CD measurements that are used to correlate scatterometry or other modeled metrology methods.
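As a rough illustration of that correlation step, the sketch below fits a straight line between modeled CD values (for example, from scatterometry) and reference CD values measured on the same sites, and reports how strongly the two agree. The function name, the sample numbers, and the choice of a simple least-squares fit are assumptions made for illustration, not a method described by the sources quoted here.

```python
from math import sqrt

def fit_to_ground_truth(modeled, reference):
    """Least-squares line reference ~ slope * modeled + offset, plus Pearson r."""
    n = len(modeled)
    if n != len(reference) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_m = sum(modeled) / n
    mean_r = sum(reference) / n
    sxx = sum((m - mean_m) ** 2 for m in modeled)
    syy = sum((r - mean_r) ** 2 for r in reference)
    sxy = sum((m - mean_m) * (r - mean_r) for m, r in zip(modeled, reference))
    if sxx == 0 or syy == 0:
        raise ValueError("series must not be constant")
    slope = sxy / sxx
    offset = mean_r - slope * mean_m
    r = sxy / sqrt(sxx * syy)
    return slope, offset, r

# Hypothetical hole diameters in nm, measured on the same five sites.
modeled_nm = [98.0, 100.0, 102.0, 104.0, 106.0]    # modeled metrology
reference_nm = [99.1, 101.0, 103.2, 104.9, 107.1]  # physical cross-section
slope, offset, r = fit_to_ground_truth(modeled_nm, reference_nm)
print(f"slope={slope:.3f} offset={offset:.2f} nm r={r:.4f}")
```

A slope near 1 and a correlation near 1 would suggest the modeled method tracks the physical measurement; a real calibration would use many more sites and account for measurement uncertainty.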

Failure modes and tooling
The most common tools in the FA lab include optical CD (OCD), CD-SEM, overlay tools, IR reflectometry, transmission electron microscopes (TEMs), focused ion-beam SEMs (FIB-SEMs), and X-ray systems. As the industry moves to 3D structures, like gate-all-around transistors and 3D NAND, system upgrades and in some cases full retooling have been required.

Dual beam FIB-SEM systems can provide diagonal milling into the device and direct measurements to reveal CD variation, incomplete etching, bowing, tilting and twisting in 3D NAND contact holes. “These systems are capable of layer-by-layer removal of material and CD-SEM measurement during new product development or yield ramp,” said Dave Akerson, senior global market development manager at Thermo Fisher Scientific. TEMs can provide atomic-level imaging, and when a scanning TEM is combined with energy dispersive spectroscopy (EDS), it provides composition for defects or films, such as in SiGe/Si stacks.

Another atomic-level metrology or inspection tool is the atomic force microscope (AFM). “AFMs can identify nano contaminants — basically contaminants that are too small to identify with conventional techniques,” said Peter de Wolf, director of technology and application development at Bruker. “We do spectroscopy and can relate that back to maybe what caused that contamination. Then, of course, there is the base capability of atomic force microscopy of measuring 2D or 3D surface morphology with extreme high resolution that is still used. And post-CMP have a little bit different requirements in terms of sizes and flatness. AFM is applied in hybrid bonding to determine roughness and dishing depth.”

At the other end of the spectrum are multi-chip assembly and packaging analyses, which might use optical profilers, X-ray or acoustic inspection, and metrology systems. “With these systems, there is the challenge of throughput, because the density of interconnections becomes incredibly high with up to half a billion bumps or pads per die,” said Samuel Lesko, head of application development at Bruker. “If one single connection fails, then the whole IC fails. So there is also the question of price tag for the metrology, because back-end operations are not used to multi-million dollar investments. That is starting to change, but the combination of throughput, capabilities, and pricing for HVM requires a transformation in the industry.”

Nanoindentation is another materials characterization method that can be used inside of a SEM or TEM to see how a device or material behaves when a small load is applied, giving insight into weak spots and crack formation.

Data retrieval and analytics
Working with failure analysis data poses challenges similar to those in the fab. But the traditional barrier between lab and fab adds yet another logistical hurdle in data processing, packaging, and analysis, as well as in integration with fab yield and reliability analyses.

Several experts pointed to the need for centralized data storage. “Our perspective comes as a group that sees parts come in after they’ve seen failures — all the way back to the validation and first silicon bring-up into quality and reliability groups, production test, and failure analysis,” said NI’s Manion. “By having a common data repository and set of tools to search and gather and view that data, they’re able to look at when they see defects and failures in the field. Then they can tie back some of the manners in which they failed — or the data they’ve gathered on those failed parts — to things like the exact lots that part might have been a part of when it went through production test, or the specific design and some of the validation or characterization data they saw when it was going through initial silicon bring-up. Hopefully, what that does is inform smarter testing to address failures earlier.”
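The trace-back Manion describes can be pictured as a lookup across stages in one shared store. The following is a minimal sketch under stated assumptions: the `PartRecord` fields, the stage names, and the in-memory list standing in for a repository are all hypothetical, and a production system would sit on a real database.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PartRecord:
    chip_id: str
    stage: str   # hypothetical stage names: "bring-up", "production", "field"
    lot: str
    result: str  # "pass" or "fail"

def history_for(chip_id, repository):
    """Return every record for one part, in repository order, across all stages."""
    return [rec for rec in repository if rec.chip_id == chip_id]

def lots_with_field_failures(repository):
    """Map each lot to the chip IDs from that lot that failed in the field."""
    lots = {}
    for rec in repository:
        if rec.stage == "field" and rec.result == "fail":
            lots.setdefault(rec.lot, []).append(rec.chip_id)
    return lots

# Hypothetical contents of the common repository.
repository = [
    PartRecord("A1", "bring-up", "LOT7", "pass"),
    PartRecord("A1", "production", "LOT7", "pass"),
    PartRecord("B2", "production", "LOT7", "pass"),
    PartRecord("A1", "field", "LOT7", "fail"),
]
for rec in history_for("A1", repository):
    print(rec.stage, rec.lot, rec.result)
print(lots_with_field_failures(repository))
```

Because every stage writes to the same store with the same chip ID, the field failure can be tied to its production lot without a manual hunt through separate databases.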

Solve problems faster
The pressure on failure analysis and yield learning is ratcheting up because of increasing device complexity at each node, the advanced functionality of devices, and the incorporation of multiple devices in an advanced package. This helps explain why more companies are collaborating on root cause analyses. Their collective challenge is to figure out which is the best failing die to use in physical failure analysis.

“We are collaborating with Siemens EDA to bring together volume scan test diagnosis with yield manufacturing data to better select failing die that will undergo physical failure analysis,” said Thomas Zanon, engagement director at PDF Solutions. “The quality of the results you get out of this flow strongly depends on the quality of the failing die population that goes into physical failure analysis.”
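To make the selection problem concrete, here is a simplified sketch of one possible heuristic: find the suspect location that diagnosis reports most often across the failing population, then prefer die that implicate it with the fewest competing suspects. This is an assumption-laden illustration, not the flow used by PDF Solutions or Siemens EDA. The function name and suspect labels are hypothetical, and the yield and manufacturing data that a real flow would fold in are left out.

```python
from collections import Counter

def select_die_for_pfa(diagnoses, max_die=2):
    """Pick failing die whose diagnosis points at the most common suspect.

    diagnoses maps die ID -> list of suspect locations reported by diagnosis.
    Die with fewer suspects are preferred because the defect is better localized.
    """
    # Count each suspect once per die so a repeated entry cannot dominate.
    counts = Counter(s for suspects in diagnoses.values() for s in set(suspects))
    if not counts:
        return None, []
    top_suspect, _ = counts.most_common(1)[0]
    candidates = [die for die, suspects in diagnoses.items() if top_suspect in suspects]
    candidates.sort(key=lambda die: (len(diagnoses[die]), die))
    return top_suspect, candidates[:max_die]

# Hypothetical diagnosis results for four failing die.
diagnoses = {
    "die_01": ["via3_net12", "m2_net7"],
    "die_02": ["via3_net12"],
    "die_03": ["m4_net9"],
    "die_04": ["via3_net12", "m2_net7", "m1_net3"],
}
suspect, chosen = select_die_for_pfa(diagnoses)
print(suspect, chosen)
```

The point of the sketch is Zanon's: what goes into physical failure analysis determines what comes out, so the ranking step deserves as much attention as the lab work itself.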

RMA analysis also can guide the chipmaker in terms of locating other potentially defective devices that passed device-level testing. “If you can identify the source of the problem during the failure analysis phase, then the engineer may go back further to say, ‘Okay, tell me other devices that I may have the similar problem with,’ and proactively recall them before they fail,” said Synopsys’ Cortez. “From a practical standpoint, a company can recall chips before they fail. That’s always better. And as far as this traceability back to manufacturing, those connectivity points are available today because most chips contain an electronic ID. So we have that ability in analytics to home in on similar devices, which is a feature we can do automatically in our analytics solution.”
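A minimal sketch of that kind of lookup follows. It assumes, purely for illustration, that each electronic ID can be resolved to a lot, wafer, and die position, and that "similar" means nearby die on the same wafer. The `DieOrigin` fields, the function name, and the neighborhood rule are hypothetical and are not taken from the Synopsys product.

```python
from typing import NamedTuple

class DieOrigin(NamedTuple):
    lot: str
    wafer: int
    x: int  # die column on the wafer
    y: int  # die row on the wafer

def similar_devices(failed_id, id_to_origin, radius=1):
    """Chip IDs from the same lot and wafer within `radius` die of the failed part."""
    origin = id_to_origin[failed_id]
    matches = []
    for chip_id, other in id_to_origin.items():
        if chip_id == failed_id:
            continue
        same_wafer = (other.lot, other.wafer) == (origin.lot, origin.wafer)
        close = max(abs(other.x - origin.x), abs(other.y - origin.y)) <= radius
        if same_wafer and close:
            matches.append(chip_id)
    return sorted(matches)

# Hypothetical mapping from electronic ID to manufacturing origin.
id_to_origin = {
    "ECID-001": DieOrigin("LOT7", 3, 10, 10),
    "ECID-002": DieOrigin("LOT7", 3, 11, 10),
    "ECID-003": DieOrigin("LOT7", 3, 14, 10),
    "ECID-004": DieOrigin("LOT7", 4, 10, 10),
}
print(similar_devices("ECID-001", id_to_origin))
```

In practice the similarity rule could just as well key on a shared process tool, test signature, or time window; the neighborhood test here is only one simple choice.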

Context is essential. “It’s paramount to give engineers access to the whole context of the data whenever an anomaly is flagged,” said Dieter Rathei, CEO of DR Yield. “While we have been a trailblazer in providing technology to control all parameters — not only key parameters — to stay within control limits, our approach is becoming the standard practice.”

Rathei noted that root cause analysis of field failures is faster when detailed information about the part can be rapidly examined at wafer acceptance test, final test, and system-level test, along with process conditions at a given deposition or etch step. “Valuable time can be lost when yield engineers have to independently gather data from different databases, rather than having all data available within a few mouse clicks,” he said.
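The idea of checking every parameter against control limits, and returning context with each flag, can be sketched as follows. The sketch assumes simple limits at the baseline mean plus or minus three standard deviations; the parameter names and values are hypothetical, and this is not a description of DR Yield's software.

```python
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Return (lower, upper) limits at mean +/- k sigma of the baseline data."""
    center = mean(baseline)
    sigma = stdev(baseline)  # needs at least two baseline points
    return center - k * sigma, center + k * sigma

def flag_anomalies(baselines, new_readings, k=3.0):
    """Check every parameter and return those outside limits, with context."""
    flagged = {}
    for name, value in new_readings.items():
        lower, upper = control_limits(baselines[name], k)
        if not lower <= value <= upper:
            flagged[name] = {"value": value, "lower": lower, "upper": upper}
    return flagged

# Hypothetical baseline history for two parameters, then one new reading each.
baselines = {
    "etch_depth_nm": [10.0, 10.1, 9.9, 10.0, 10.05, 9.95],
    "film_thickness_nm": [5.0, 5.1, 4.9, 5.0, 5.05, 4.95],
}
new_readings = {"etch_depth_nm": 10.02, "film_thickness_nm": 6.0}
for name, context in flag_anomalies(baselines, new_readings).items():
    print(name, context)
```

Returning the limits alongside the offending value is a small nod to the point about context: the engineer sees not just that something was flagged, but how far out of range it was.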

Others agree that consideration of all data in the fab is needed to truly drive toward ppb-level defective parts. “The original thought behind moving to smart manufacturing/Industry 4.0 was to harness all that data to improve manufacturing productivity,” said Tignis’ Park. “We work with customers to reveal correlations between inputs and outputs. We analyze historical data and go through and look at different signals. People say, ‘Wow, that explains a lot of what we thought were unexplainable excursions that we had in the past.’ Fabs can see whether they are finding all the errors that the AI finds. So finding things they didn’t realize were affecting yield is the really cool part.”

Failure analysis capabilities are stepping up to meet the needs of 3D devices, SiC and GaN power semiconductors, and photonics. But despite efforts to reduce a week-long analysis to one that takes hours, FA labs and fabs are still separate silos that need to be breached. Data analytics from FA must become part of the fab’s overall data collection and disposition.

One way that fabs can think about such investments is that AI and ML analytics add value in the long run. “A really key aspect of AI is that it is one of two appreciating assets in the fab, the first being its people,” said Tignis’ Park. “The AI machine learning models that we put in place are scalable and can be deployed fab-wide to learn and compensate for things like equipment drift or process drift. It is not a static thing that you put in, set it, forget it, and then you pay any license fee. It really is learning what’s happening in your fab and what’s happening on actual devices.”

