Better tools, automation, and analytics improve FA, but determining the cause of failure is still difficult.
When a chip malfunctions, it's the job of the failure analysis engineer to determine why it failed or why it significantly deviated from its key performance metrics.
The cost of failure in the field can be huge in terms of downtime, recalls, damage to a company’s reputation, and more. For these reasons, chipmakers take customer returns very seriously, focusing resources to quickly get to the bottom of the failure’s root cause in dedicated FA labs. Those labs also play essential roles in new product introductions and in resolving yield excursions throughout manufacturing production lines. For instance, FA might be performed to reconcile differences between package test and system-level test.
Despite all of that, the process of finding killer defects in chips is getting tougher. "Performing root cause analysis is far more complex than it used to be," said Jayant D'Souza, technical product director at Siemens EDA. "And it's a fine line, because in the process of eliminating the root cause you might change, say, a small mask dimension, but changing it in one place can have an effect in another part of the mask. So it's not as straightforward as it used to be, and the pressure to do this very quickly to keep up with demand has been increasing for years."
As with analyzing field returns, new product introductions often require use of both destructive and non-destructive tools. “Process development work happens every time there is a node change,” said Samuel Lesko, head of applications development at Bruker. “For example, the customer may run a CMP topography optimization to ensure die flatness or field flatness.”
A second approach digs deep to improve the process window from multiple angles while balancing yield, efficiency, and cost. “For instance, a collaboration between us, the CMP tool maker, slurry and pad manufacturers, and the factories themselves might explore how different slurries or pads alter the physics of friction coefficients to influence topography,” Lesko said. “We’ve seen a lot of traction on these optimizations in the past two years.”
At the same time, some new technologies, such as backside power delivery, force changes in how FA is performed. In the past, debug was performed using laser voltage imaging through the backside of the silicon wafer. "On the advanced logic side, backside power delivery is causing some additional challenges, because what you used to be able to see through, now you can't see through because there's metal there," said Trevan Landin, senior manager product marketing at Thermo Fisher Scientific. "So you have to carefully remove the metal and dielectrics and use e-beam probing instead of light to perform fault isolation."
Bonded interfaces are a likely location for contamination in assembly and packaging. "In flip chip or bonded wafers, there is a pressing need for quick, non-destructive inspection to detect voids and particles between bonded surfaces," said Melvin Lee Wei Heng, senior manager applications engineering at Onto Innovation. "High-speed infrared (IR) imaging addresses this need. As device features become smaller, the importance of inspection speed and accuracy increases, providing real-time feedback to enhance throughput and enable prompt intervention during inline inspections if abnormalities are detected."
Fig. 1: A combination of oblique-angle illumination, image processing, and an AI algorithm reveals an organic residue on a bond pad (R) that is missed by conventional infrared defect inspection tools (L). Source: Onto Innovation
Where to start?
Failure analysis is a systematic process that uses electrical characterization, physical analysis, and electrical testing tools. Some of the most common tools in the FA lab include optical CD (OCD), CD-SEM, overlay tools, IR reflectometry, transmission electron microscopes (TEMs), focused ion-beam SEMs (FIB-SEMs), and X-ray systems.
FA begins with confirming the actual failure mechanism using electrical characterization. Non-destructive methods are applied to failed devices first, using techniques such as e-beam voltage contrast testing, thermal imaging cameras, and parameter analyzers.
Time-domain reflectometry (TDR) is an excellent, non-destructive method for detecting opens and shorts. It works by injecting a high-frequency signal into a conductor and measuring the change in impedance from the reflected signal. TDR can identify hard faults such as a short, but also soft failures such as weak connections that may precipitate long-term reliability failures.
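The arithmetic behind TDR localization is straightforward: the round-trip delay to the reflection, scaled by the signal's propagation velocity, gives the distance to the fault, and the polarity of the reflection hints at open versus short. Below is a minimal sketch of that calculation in Python, assuming a hypothetical sampled reflection trace and velocity factor; real instruments add launch calibration, de-embedding, and averaging.

```python
# Minimal sketch of the distance-to-fault arithmetic behind TDR.
# The trace, sampling rate, and velocity factor below are hypothetical;
# real instruments handle launch calibration, de-embedding, and averaging.
import numpy as np

C = 3.0e8  # speed of light in vacuum, m/s

def locate_fault(trace, sample_rate_hz, velocity_factor=0.5, threshold=0.1):
    """Return (distance_m, fault_type) from a normalized TDR reflection trace.

    trace: reflection coefficient (rho) vs. time, ~0 where the line is matched.
    A large positive rho suggests an open; a large negative rho suggests a short.
    """
    idx = np.argmax(np.abs(trace) > threshold)        # first significant reflection
    if not np.abs(trace[idx]) > threshold:
        return None, "no fault detected"
    t_round_trip = idx / sample_rate_hz
    distance = 0.5 * velocity_factor * C * t_round_trip  # one-way distance to fault
    fault = "open (rho > 0)" if trace[idx] > 0 else "short (rho < 0)"
    return distance, fault

# Example: synthetic trace with an open ~3 mm down the line (hypothetical values)
fs = 1.0e12                      # 1 ps sampling interval
trace = np.zeros(200)
trace[40:] = 0.8                 # step reflection starting at sample 40
print(locate_fault(trace, fs))   # -> (~0.003 m, "open (rho > 0)")
```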
Once a particular failure mode and location are suspected, the engineer may turn to destructive testing. "Traditionally, if it's a detrimental failure mode such as a metal continuity open failure, delayering might be used to determine if there is an open fail mode in the serpentine metal continuity electrical structure," Onto's Lee Wei Heng said. "Alternatively, if it's a high resistivity failure mode, such as in the via resistance electrical test, TIVA might be used to identify which vias are contributing to the high resistivity results from the electrical test."
Thermally induced voltage alteration (TIVA) is commonly used to localize faults. "TIVA is the quickest method for fault localization," said Lee Wei Heng. "Scan chain diagnostics and bitmapping are other methods for fault defect isolation. For bitmapping, once a bit-to-defect relationship is determined, a physical cross-section or delayering needs to be performed to ensure that the failure mode is accurately identified for that particular bit failure."
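The bit-to-defect step Lee Wei Heng describes amounts to translating a failing bit address into approximate die coordinates that the physical tools can navigate to. The sketch below illustrates that translation under assumed, purely hypothetical array origin, cell pitch, and address-mirroring parameters; real memory layouts add redundancy, scrambling, and banked segments.

```python
# Illustrative sketch of the bit-to-physical mapping step in bitmapping.
# The array origin, cell pitches, and mirroring scheme are hypothetical;
# real memory layouts involve redundancy, scrambling, and segmented banks.
def bit_to_physical(row, col, origin_um=(120.0, 85.0),
                    cell_pitch_um=(0.045, 0.030), mirror_even_rows=True,
                    cells_per_row=4096):
    """Map a failing (row, col) bit address to approximate die coordinates in um."""
    if mirror_even_rows and row % 2 == 0:
        col = cells_per_row - 1 - col      # assumed address mirroring on even rows
    x = origin_um[0] + col * cell_pitch_um[0]
    y = origin_um[1] + row * cell_pitch_um[1]
    return x, y

# A failing bit reported by memory BIST (hypothetical address)
print(bit_to_physical(row=517, col=1023))
```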
Indeed, full scan diagnosis using ATPG vectors for logic, or BiST for memory, gives engineers a starting point for software analysis before physical failure analysis begins. When multiple die fail, it can narrow down the candidates and determine the best die on which to perform failure analysis. "We are starting to see more integration of diagnosis data being used to perform FA, and use of that diagnosis data in some of the yield management software to correlate the defect data with the FA data to try to identify the potential yield limiters or potential candidates for failure analysis," said Arshdeep Singh, staff applications engineer at Synopsys.
So where to begin? “For root cause analysis, you really want to begin using a diagnosis tool,” said Guy Cortez, principal product manager at Synopsys. “Once these vectors come off the tester, we scan the logic back into the design from the failed output pin to come up with the list of candidates for physical FA. Then that’s tied with ATPG engine, because they have to be in lock step together. And then our tool, Silicon.da, does the volume diagnostics but takes it to the next level by helping to prioritize which of those candidates is the one that caused that failure. That gets fed into the failure analysis tool like Avalon, which does the CAD-NAV alignment with the failure analysis equipment, to align the defect location from the mask and layout to the actual silicon that’s under test within the FA lab.”
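Conceptually, the prioritization step takes the diagnosis callouts for a failing die and orders them by how likely each suspect is to be the real defect, so the FA lab spends its first cross-section on the best candidate. The snippet below is a generic illustration of that idea, not the Silicon.da or Avalon implementation; the callout fields and scoring weights are hypothetical.

```python
# Generic sketch of ranking physical-FA candidates from diagnosis callouts.
# The callout fields (suspect net, score, layer, bounding box) and the
# layer weighting are hypothetical, not any vendor's actual data model.
from dataclasses import dataclass

@dataclass
class Callout:
    net: str
    diag_score: float   # confidence from the diagnosis engine, 0..1
    layer: str
    bbox_um: tuple      # (x0, y0, x1, y1) in layout coordinates

def rank_candidates(callouts, layer_hit_rate):
    """Order suspects for physical FA, boosting layers with known yield issues."""
    def score(c):
        return c.diag_score * layer_hit_rate.get(c.layer, 0.5)
    return sorted(callouts, key=score, reverse=True)

callouts = [
    Callout("u_core/n1234", 0.62, "M2", (310.0, 120.5, 310.8, 121.1)),
    Callout("u_core/n98",   0.88, "V1", (295.2, 118.0, 295.6, 118.3)),
]
# Historical FA confirmation rate per layer (hypothetical)
best = rank_candidates(callouts, {"V1": 0.7, "M2": 0.4})[0]
print(f"First candidate for FA: {best.net} on {best.layer} near {best.bbox_um}")
```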
Continuous improvement is a long-term goal in failure analysis. “When we talk about the link of diagnosis to FA, there is a two-way link that we are exploring, which has the potential to improve FA in the future,” Singh said. “When the failure analysis is completed on a candidate, there is also a feedback loop going back to the yield management software that improves the processing of candidates in the future. That pipeline also helps improve the FA workflow, in general.”
Selecting the right FA candidate is especially critical at advanced nodes. "In these latest process nodes, there's very high skill and expertise levels involved in processing failures and understanding which tools and combination of tools are to be used," said Andras Vass-Varnai, 3D-IC solution engineer at Siemens EDA. "But more importantly, a very critical piece involves selecting the right part to do the failure analysis on."
AI plays an increasingly important role here. "What we're doing in diagnosis is really trying to analyze what actually happened in true silicon, and we use machine learning in our tools to perform the diagnosis," said Siemens EDA's D'Souza. "So when a part gets tested on a wafer and it fails, data is collected off the tester and it tells you the cycle passed or the cycle failed. But that information, along with our simulation information that was used to create test patterns to test the device on the tester, is used together to kind of figure out what failed and why. We use this machine learning to narrow down the logical area and the physical area, so where to look inside the die. If you do this over many parts, like hundreds of failing parts, we can build this Pareto to be able to tell, 'This is a behavior that explains why this particular population of failing parts exists today, and the root cause of it might be something in the chemical process that's not quite right.' The diagnosis provides a clue."
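The population-level Pareto D'Souza describes can be thought of as a roll-up of per-die diagnosis results into counts per suspected mechanism. The following sketch shows that aggregation with hypothetical records and field names; it is not Siemens EDA's data model.

```python
# Sketch of rolling per-die diagnosis results into a Pareto of suspected
# root causes across a failing population. Field names and the example
# records are hypothetical.
from collections import Counter

def build_pareto(diagnosis_results):
    """Count failing die per suspected mechanism and return them in rank order."""
    counts = Counter(r["suspect_mechanism"] for r in diagnosis_results)
    total = sum(counts.values())
    return [(mech, n, 100.0 * n / total) for mech, n in counts.most_common()]

results = [
    {"die": "W07_D113", "suspect_mechanism": "via open, V2"},
    {"die": "W07_D244", "suspect_mechanism": "bridge, M3"},
    {"die": "W09_D031", "suspect_mechanism": "via open, V2"},
]
for mech, n, pct in build_pareto(results):
    print(f"{mech}: {n} die ({pct:.0f}%)")
```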
Smaller needle in a larger haystack
As the complexity of SoC devices and multi-chiplet packaged parts escalates, FA engineers are charged with locating and analyzing ever-smaller defects in an ever-larger haystack of silicon. To do so, they need better FA instruments, such as optical/e-beam inspectors, FIB-SEMs, TEMs, and IR reflectometers, as well as better data analytics.
“What’s really interesting is that previously failures were oftentimes rather gross,” said Thermo Fisher’s Landin. “But now they can be very small and they can cause problems that are just as big. So localization really means that you are looking for something relatively small in a fairly large space.”
Others agree. “Historically, failure analysis has been more manual analysis, and it depended on the engineers to manually map and then localize those defects,” said Synopsys’ Singh. “Nowadays, we do see more of the automation coming into the tools as well as to the whole FA flow. There have been multiple developments both on the tool side to automate some of the workflow that the engineers do, and on the software side to help users automate their workflows. Also, we are looking into linking more of the yield data to FA to identify some of the correlations and some of the yield issues from diagnosis directly, instead of an engineer having to go through the whole set of data to find them.”
Tools of the trade
In failure analysis labs, trained detectives use multiple physical, electrical, and chemical means of investigation to pinpoint a defect's exact location and determine the degree to which it causes functional failure in a semiconductor device.
“Whether it is for process monitoring or for failure analysis, fabs need ground truth data, which is only available through TEM,” said Landin. “We’ve heavily invested in automation for sample preparation on the FIB side as well as the TEM side. Those samples, for example, might be on the order of 20nm thick. That used to be something that was not so easy, and now it is a day-in, day-out requirement. It’s about the performance of the instrument and keeping that to within the right tolerances.”
Another technology that benefits significantly from automation is nanoprobing, which helps localize faults. "It's like the difference between having a Google street view and having Google map directions to your doorstep of where your failure is," said Landin.
There are others, as well. X-ray imaging has become the go-to technology for elucidating voids in packaging, relying on differences in density to achieve contrast.
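The contrast mechanism follows Beer-Lambert attenuation: a void displaces dense material along the beam path, so more X-ray intensity reaches the detector there. The back-of-the-envelope sketch below illustrates this with placeholder attenuation values, not calibrated ones.

```python
# Back-of-the-envelope sketch of why density differences give X-ray contrast
# (Beer-Lambert attenuation). The mass attenuation coefficient and thicknesses
# are illustrative placeholders, not calibrated values.
import math

def transmitted_fraction(mu_mass_cm2_per_g, density_g_cm3, thickness_cm):
    """Fraction of incident X-ray intensity transmitted through a material layer."""
    return math.exp(-mu_mass_cm2_per_g * density_g_cm3 * thickness_cm)

# A 50 um solder joint vs. the same joint with a 20 um void along the beam path
mu_solder = 5.0     # cm^2/g at some tube energy (hypothetical)
rho_solder = 7.3    # g/cm^3, roughly SAC solder
solid = transmitted_fraction(mu_solder, rho_solder, 50e-4)
voided = transmitted_fraction(mu_solder, rho_solder, 30e-4)  # void attenuates ~nothing
print(f"extra transmitted intensity at the void: {voided - solid:.3f} of incident")
```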
Lock-in thermography, meanwhile, is a method that takes advantage of the heat signature in a device, particularly in multilayer stacks such as high bandwidth memory. "In the past, you might have been looking at a top contact or bottom contact that was very easy to deal with, whereas now there are multiple memory chips that stack together, so it's massive in terms of scale," Landin explained. "So something like lock-in thermography is definitely not new but it's being used more frequently to accurately localize and see those defects."
Lock-in thermography is a non-destructive infrared imaging method that uses the heat signature in the device to establish rough coordinates for a defect, from which a FIB-SEM can destructively remove material. "You can get a relative depth, and then start to remove material away slowly from one side of the defect and be able to then visualize with the SEM," he said.
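Under the hood, lock-in thermography demodulates a stack of thermal camera frames at the power-modulation frequency, producing per-pixel amplitude and phase maps that localize the hot spot and hint at its depth. The sketch below shows that demodulation on synthetic data; the frame rate, lock-in frequency, and signal levels are hypothetical.

```python
# Minimal sketch of lock-in demodulation on a stack of thermal frames.
# Frame rate, lock-in frequency, and the synthetic data are hypothetical;
# real systems synchronize the camera to the power-supply modulation.
import numpy as np

def lock_in(frames, frame_rate_hz, f_lock_hz):
    """Return per-pixel amplitude and phase of the thermal response at f_lock_hz."""
    n = frames.shape[0]
    t = np.arange(n) / frame_rate_hz
    ref_i = np.cos(2 * np.pi * f_lock_hz * t)[:, None, None]
    ref_q = np.sin(2 * np.pi * f_lock_hz * t)[:, None, None]
    i = 2.0 * np.mean(frames * ref_i, axis=0)   # in-phase component per pixel
    q = 2.0 * np.mean(frames * ref_q, axis=0)   # quadrature component per pixel
    return np.hypot(i, q), np.arctan2(q, i)     # amplitude map, phase map (~depth)

# Synthetic 64x64 stack with one pixel responding at the 5 Hz excitation
fs, f0, n = 100.0, 5.0, 400
frames = 0.01 * np.random.randn(n, 64, 64)
frames[:, 32, 32] += 0.2 * np.cos(2 * np.pi * f0 * np.arange(n) / fs + 1.0)
amp, phase = lock_in(frames, fs, f0)
print(np.unravel_index(np.argmax(amp), amp.shape))   # -> (32, 32)
```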
Conclusion
Scan diagnosis plays an important role in helping chipmakers select the best failed device on which to perform failure analysis. From there, FA engineers use a variety of electrical and physical tools to find the causes of failures. But the exact causes of failures are still hard to determine for new technologies such as backside power delivery and die stacking.
This is less of an issue around tools than methodologies. Failure analysis labs are more automated than ever before, saving valuable engineering time for value-added analysis and collaboration with other parts of the fab including design, test, process and product engineering. At the same time, data sharing — which is considered essential to building systems capable of higher orders of learning — is spotty.
“Only a select few leading chipmakers are collaborating in this fashion, but the ecosystem is being built to enable such data sharing,” said Synopsys’ Singh. “In some cases, you have to rely on your foundry partner for some of the data. So there are some limitations on how you can share the data and what data can be shared. The whole ecosystem is not completely built, but it’s on a good path.”
— Gregory Haley contributed to this report.
Related Reading
Pressure Builds On Failure Analysis Labs
Goal is to find the causes of failures faster and much earlier — preferably before first silicon.
Why Chips Fail, And What To Do About It
Improving reliability in semiconductors is critical for automotive, data centers, and AI systems.
Yield Management Embraces Expanding Role
From wafer maps to lifecycle management, yield strategies go wide and deep with big data.