Rebalancing Test And Yield In IC Manufacturing

Functional testing is gaining traction in the pursuit of known good die.

Balancing yield and test is essential to semiconductor manufacturing, but it’s becoming harder to determine how much weight to give one versus the other as chips become more specialized for different applications.

Yield focuses on maximizing the number of functional chips from a production batch, while test aims to ensure that each chip meets rigorous quality and performance standards. And while effective testing protocols are necessary to maintain high standards of quality and reliability — especially for critical applications — they must be managed in a way that does not unnecessarily diminish yield.

This dynamic can lead to conflicts. Extensive testing can reduce yield by identifying more chips as non-functional or marginal. At the same time, efforts to maximize yield might lead to less stringent testing, potentially allowing sub-par chips to pass.

“The real question is around the tradeoffs between test cost and time and device quality,” says Nir Sever, senior director of business development at proteanTecs. “The more you test, the better quality you get, and vice versa.”

Fig. 1: Outliers for a particular device are flagged when measured and estimated Iddq values are compared. Source: proteanTecs
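
Fig. 1 captures the general principle: an expected Iddq is estimated for each device, compared against the measured value, and devices with unusually large residuals are flagged. Below is a minimal Python sketch of that idea, using invented numbers and a simple linear estimate; it illustrates the concept only, not proteanTecs’ actual models.

    import numpy as np

    # Invented numbers: one in-chip monitor reading per device and its measured
    # Iddq (uA) for devices already judged good, used to fit the estimator.
    train_monitor = np.array([0.95, 0.98, 1.00, 1.02, 1.05, 1.08])
    train_iddq    = np.array([11.1, 11.5, 12.1, 12.3, 13.0, 13.5])

    # Simple linear estimate of expected Iddq from the monitor reading (assumption).
    slope, intercept = np.polyfit(train_monitor, train_iddq, 1)
    fit_sigma = np.std(train_iddq - (slope * train_monitor + intercept))

    # Devices under test: compare measured Iddq against the per-device estimate.
    test_monitor = np.array([0.99, 1.01, 1.03, 1.00])
    test_iddq    = np.array([11.8, 12.2, 12.6, 18.5])   # last device draws far too much current
    residual     = test_iddq - (slope * test_monitor + intercept)

    # Flag outliers whose residual exceeds an illustrative threshold.
    threshold = max(3 * fit_sigma, 0.5)                 # 0.5 uA floor, purely illustrative
    outliers  = np.where(np.abs(residual) > threshold)[0]
    print("Outlier device indices:", outliers)          # -> [3]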

Yield is essentially a success gauge, measuring the rate of devices passing all tests. Fault models traditionally have been used to predict and simulate potential defects and their impact on a semiconductor device. But inherent limitations in these fault models, combined with increasingly complex process requirements at advanced nodes, mean that yield no longer can be completely assured solely through design-phase optimizations. Extensive functional testing is increasingly required to identify and mitigate unforeseen defects.

“This is a big topic that is sorely misunderstood by the vast majority of the industry,” says Dave Armstrong, principal test strategist at Advantest. “The conventional thought is that test drives yield, and they’re actually somewhat different. They work together, certainly. The more tests you do, the lower your yield is going to be, by definition. However, it’s important to understand that more testing doesn’t necessarily mean you’re going to get more good parts at the end of the day.”

No single set of tests can identify all possible failures. Faulty dies sometimes evade detection during test, leading to latent failures in the field, and those faults can be the result of a variety of errors, including design errors, process variation, contaminants, and even packaging. Failures also can develop in the field if a device is used in a way for which it was not designed, such as AI/ML, which utilizes more compute resources for longer stretches of time than other types of processing.

“The amount that’s tested is driven by the fault models,” says Armstrong. “But there are some recent papers that say those fault models are not very good. As a consequence, even if we test 100% of everything in the fault models, we’re still not going to be getting very good yield.”

With fault models, expectations are established upfront for what the tests are trying to find. Test engineers establish parameters for what they’re concerned about, and sometimes they find issues, which is what the test is supposed to highlight. But that is of limited use for defects that only show up once the device is already in the field.

“The testing community is well aware of this notion of fortuitous defect detection, where what we model and what we use to guide test generation does not very well match what’s actually happening in the silicon,” says Shawn Blanton, associate department head for research in electrical and computer engineering at Carnegie Mellon.

Blanton and his colleagues published a paper last year based on an analysis of 30,000 failed chips. Their research shows that existing fault models and test metrics can miss up to 95% of timing-independent combinational (TIC) defects. [1]

“That’s a stunning statistic — and scary, because everything we’ve been doing for years has been based on these faults models,” says Armstrong. “You have good parts that pass these fault tests, and that can give you good yield but still not cover these faults.”
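
A back-of-the-envelope calculation shows why that gap matters. The defect rate below is an invented placeholder, and the miss rate is simply the worst case reported in the study, but the shape of the result holds: coverage measured against a fault model can look perfect while real escapes remain high.

    # Back-of-the-envelope: perfect coverage of the fault model does not imply
    # low escapes if real defects fall outside the model. Numbers are assumptions,
    # with the miss rate taken from the worst case reported in [1].
    defective_die_rate = 0.02   # assume 2% of dies carry a real defect
    model_miss_rate    = 0.95   # share of those defects the fault model cannot represent

    caught  = defective_die_rate * (1 - model_miss_rate)   # found at test and scrapped
    shipped = 1 - caught                                    # everything that passes ships
    escapes = defective_die_rate * model_miss_rate          # defective dies that still pass
    print(f"Apparent yield: {shipped:.2%}, field escapes: {escapes / shipped * 1e6:,.0f} DPPM")
    # -> Apparent yield: 99.90%, field escapes: 19,019 DPPM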

In addition, as circuits age, various stresses reveal latent defects and failures stemming from process variation and errors. This unpredictability underscores the need for comprehensive functional testing, a process essential for quality assurance, as well as for the safety and reliability of critical applications.

“Test time reduction also plays a role in test development to help reduce overall costs,” says Guy Cortez, senior staff product marketing manager at Synopsys. “But techniques to reduce test time are not typically done for quality-sensitive devices, as there may be a risk of not fully testing a device if you are trying to also reduce the test time.”

For example, as automobiles increasingly are packed with complex computing capabilities, there is greater pressure to weed out potentially defective dies — both immediate and latent — while keeping costs in check.

“The amount of testing depends on the application,” says proteanTecs’ Sever. “Mission-critical applications will require more quality and will bear the cost of more testing. In other applications, where cost is the most important factor and some DPPM levels are allowed, you may decide to spend less time on test.”

This balancing act presents a ripe challenge for device engineers, calling for a more nuanced understanding of the interplay between exhaustive testing, yield optimization, and device quality.

“You always want some later-stage test to validate what you’re doing,” says Marc Jacobs, product management advisor at PDF Solutions. “Depending on the product, you can do system-level test, either as a qualification or new product introduction step — or, if it was important enough, you run system-level test in mass production. But now that’s kind of like a turtles-all-the-way-down problem. You say, let’s do a system level test, and if you do system level test, then you can check that your functional test is good, but only as long as your system level test is also good. It’s a challenge.”

More functional testing
The trend of packing increasingly numerous functions onto a single chip, coupled with escalating structural complexity, necessitates a corresponding increase in both the variety and quantity of testing steps. It requires more functional testing prior to packaging, more system-level test, and more thorough testing of every die. This escalation in testing requirements invariably leads to higher test costs and longer test times, which in turn negatively impact yield.

“One of the biggest challenges with functional testing is that you don’t know what the quality of that functional test is,” says Blanton. “At the structural level, you get a sense of how well you’re doing. Of the things I’m worried about that can go wrong, I get 98% or 99% of them. With functional tests, there’s no equivalent to that. You don’t know when enough is enough. What else is out there? That’s sort of the holy grail, to tie functional tests to some kind of metric that allows you to say this functional test is better than that functional test, or we’ve done enough, or we’re at 99%. That doesn’t exist at all.”

“Customers need to run a dramatic number of tests in less and less test time, and this means they have to be really thoughtful about how they organize their test workflow,” said Robert Manion, vice president and general manager of the Semiconductor and Electronics Business Unit at NI, an Emerson company. “That’s true in the validation space as a time-to-market item, and it’s also true in the production test space as a cost-of-test item.”

The tests themselves are becoming more challenging, too. “Test complexity is increasing exponentially,” says Thomas Uhrmann, director of business development at EV Group. “You can have everything in place to find what a known good die actually is, but you still need to probe every die. You still have testbeds, you still can do electrical testing, you still can burn-in everything, but now you have to deal with functionality.”

So at what point does increased functional testing become unviable? While it’s theoretically possible to conduct functional tests at any point along the line, that much testing would be cost- and time-prohibitive, which would impact yield. Plus, at every point along the line, there’s the opportunity for more failure.
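
A rough cost model makes the point. The per-insertion costs and pass rates below are placeholders rather than industry figures, but they show how each added test insertion compounds both the cost carried by every die and the cumulative yield loss, so the cost per good die climbs quickly.

    # Rough model: each test insertion adds cost and multiplies in its own yield loss.
    # Cost and pass-rate figures are placeholder assumptions.
    insertions = [
        ("wafer sort",        0.50, 0.97),    # (step, test cost per die in $, pass rate)
        ("post-dicing check", 0.30, 0.995),
        ("package test",      0.80, 0.98),
        ("system-level test", 2.00, 0.99),
    ]

    cumulative_cost, cumulative_yield = 1.00, 1.00      # assume a $1 base cost per die
    for step, cost, pass_rate in insertions:
        cumulative_cost  += cost
        cumulative_yield *= pass_rate
        print(f"after {step:<17} yield={cumulative_yield:.3f}  "
              f"cost per good die=${cumulative_cost / cumulative_yield:.2f}")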

“What do you actually want to include in these tests? It strongly depends on the devices,” said Uhrmann. “That’s what makes it so complicated. The whole testing strategy is inherently linked to the application complexity and how you build it, and it’s why you’re doing the integration. If you really want to do functional tests of different dies, it has to be on an assembly level later on. Otherwise, from a cost perspective, it’s not going to work.”

There is also a need to identify potential physical defects in chips to preserve quality over time, and those defects cannot be found in fault models.

“You may have a device that will pass functional tests, but when you look at it and the expected lifetime of 5 to 20 years, now you’re going, ‘Oh, this passed fine, but I had a big hole and it didn’t flow properly at the conductive layer, and now I’m going to run into a failure problem with it,’” said Brad Perkins, product line director at Nordson Test & Inspection. “On the design end you get design for test and all that process, but none of that can account for process errors. You can’t use design for test for process errors. That’s a really critical piece of looking at device functionality.”

Quality vs. yield
A key point of contention in semiconductor manufacturing is the tradeoff between achieving high yield and ensuring high quality. The lines on both sides are less than clear, and they may vary by application and use case. Ensuring quality in semiconductors requires an array of tests to verify device integrity. But it’s no longer a simple assessment of “good units” versus “bad units.” The big shift is toward a more sophisticated, data science-driven investigation that seeks to identify and address a broad spectrum of factors.

Engineers must constantly weigh the tradeoffs between test cost, time, and quality. The choice often hinges on the application’s criticality. Mission-critical applications justify extensive testing despite higher costs, while cost-sensitive applications might allow for lesser testing, accepting a certain degree of defective parts per million (DPPM).

“Test development and yield are somewhat complementary in that yield tracks the success rate in all of the passing tests,” says Synopsys’ Cortez. “For example, if all tests performed on devices on the tester pass, then you have 100% yield. However, high yield does not necessarily mean high quality. The more robust and comprehensive the tests are, the more likely the device is exhaustively tested. The tradeoff becomes exhaustively testing a device, which increases your confidence in the quality of the device at the cost of test time.”
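
A toy calculation illustrates the tradeoff Cortez describes. All counts and times below are invented: the trimmed suite saves test time and even reports slightly higher yield, but the defects it no longer catches show up directly as DPPM.

    # Toy illustration of the yield/quality/test-time tradeoff. Invented numbers;
    # assumes every test failure corresponds to a genuinely defective die.
    total_dies     = 1_000_000
    defective_dies = 20_000                                # real defect rate of 2%

    suites = [
        ("full suite",    12.0, 19_600),   # (name, test time in s, defects caught)
        ("trimmed suite",  7.0, 18_000),
    ]

    for name, test_time, caught in suites:
        shipped = total_dies - caught
        escapes = defective_dies - caught
        print(f"{name:13s} time {test_time:4.1f}s  "
              f"yield {shipped / total_dies:.2%}  "
              f"quality {escapes / shipped * 1e6:,.0f} DPPM")
    # -> full suite: 98.04% yield at ~408 DPPM; trimmed suite: 98.20% yield at ~2,037 DPPM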

A fundamental challenge in semiconductor manufacturing is balancing exhaustive device testing and test duration. While comprehensive tests boost confidence in device quality, they also prolong the testing process. Industries where high-quality devices are non-negotiable face a big challenge here. Test time reduction becomes a strategic objective to manage costs without compromising quality.

“There is a constant tension between engineers who are pushing for greater test and HVM managers who are trying to achieve yield,” says Armstrong. “Where does that tension lie? The bottom line.”

Cortez agrees. “While high yield may be a predictor of high quality, quality measures taken during test often run counter to achieving the highest yield possible,” he said.

New test techniques
Several nuanced techniques have been developed to reduce test times and identify defects while preserving device quality, without slowing production or unnecessarily sacrificing yield. Each has its strengths and weaknesses.

The Portable Stimulus Standard (PSS) is a relatively recent innovation in the field of semiconductor design and verification. Developed and maintained by Accellera, PSS allows engineers to describe test scenarios at a high level of abstraction. This means the test scenarios are not tightly coupled with the specifics of any particular test platform or environment. They can be written once and then used across various platforms and stages of the semiconductor design and verification process.

A major advantage of PSS is the ability to re-use test scenarios and to automatically generate test cases from them. But it has a steep learning curve, particularly when defining complex test scenarios. There also are integration challenges with existing flows, which EDA vendors and their customers are working through today, as well as potentially high adoption costs. Still, PSS is gaining traction, albeit quietly.

Deep data-based testing is another new approach that seeks to balance test quality and speed. In-chip monitors collect data during tests, feeding it into a machine learning-driven data analytics platform. This approach aims to reduce test times while maintaining or even enhancing quality levels, and represents a significant advancement over traditional methods.

“This is done by training models on the cloud to correlate in-chip measurements of ‘good devices’ and deploy those models on the ATE,” adds Sever. “There, a whole set of personalized outlier detection and smart prediction algorithms are deployed, which reduce test time and detect faults that cannot be observed by traditional scan-based testing.”

Deep data-based testing offers significant benefits in terms of insight and efficiency in semiconductor manufacturing. The major drawback is the need to manage and analyze large volumes of data.
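
As a rough sketch of the general shape of such a flow (and not proteanTecs’ actual algorithms), a model of “good” behavior can be fit offline from in-chip monitor readings, then applied on the tester to flag devices whose monitor signature sits far outside the good population. The multivariate distance check, the synthetic monitor data, and the threshold below are all illustrative assumptions.

    import numpy as np

    # Generic multivariate outlier screen as a stand-in for "deep data" methods.
    # The "good" population below is synthetic: in practice it would be in-chip
    # monitor readings from devices already judged good.
    rng = np.random.default_rng(0)
    good_monitors = rng.normal(loc=[1.0, 0.8, 1.2], scale=0.02, size=(500, 3))

    mean = good_monitors.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(good_monitors, rowvar=False))

    def mahalanobis_sq(x):
        """Squared Mahalanobis distance of a monitor vector from the good population."""
        d = x - mean
        return float(d @ cov_inv @ d)

    # On the tester: flag devices whose monitor signature is far from the good population.
    lot = np.array([[1.01, 0.79, 1.21],
                    [0.99, 0.81, 1.19],
                    [1.10, 0.70, 1.35]])   # last device looks anomalous
    threshold = 16.27                      # ~99.9th percentile of chi-square, 3 degrees of freedom
    flagged = [i for i, device in enumerate(lot) if mahalanobis_sq(device) > threshold]
    print("Devices flagged for outlier review:", flagged)   # -> [2]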

Another widely used technique is known as “good die in bad neighborhood.” It looks for groupings or clusters of failed die and purposely removes, or bins out, the passing devices that neighbor those failures, on the grounds that something likely is wrong in that part of the wafer. Although the neighboring devices may have passed the requisite tests, discarding them trades a small amount of yield for higher quality in the devices that remain.
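
A simplified sketch of that screening policy, assuming a tiny pass/fail wafer map and a hypothetical two-failed-neighbors threshold:

    import numpy as np

    # Tiny invented wafer map: 1 = die passed, 0 = die failed.
    wafer = np.array([
        [1, 1, 1, 1, 1],
        [1, 0, 0, 1, 1],
        [1, 0, 1, 1, 1],
        [1, 1, 1, 1, 1],
    ])

    MIN_FAILED_NEIGHBORS = 2   # policy knob: how "bad" the neighborhood must be

    rows, cols = wafer.shape
    binned_out = []
    for r in range(rows):
        for c in range(cols):
            if wafer[r, c] != 1:
                continue                 # only passing dies are candidates for binning
            # Count failed dies among the surrounding positions (the center die passed,
            # so it never counts as a failed neighbor).
            neighborhood = wafer[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            failed_neighbors = neighborhood.size - np.count_nonzero(neighborhood)
            if failed_neighbors >= MIN_FAILED_NEIGHBORS:
                binned_out.append((r, c))

    # The passing die at (2, 2), surrounded by the failure cluster, is among those removed.
    print("Passing dies binned out:", binned_out)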

Known good die
Yield is defined both by test results and by a fab’s ability to deliver product to a goal. But there are two different yields to consider. One is yield against DFT, while the other is yield of known good die. As advanced packaging becomes more popular, and as chiplets become common — especially commercially available chiplets — known good die will become the standard for determining yield. There is no way around that.

“We should stop thinking of this tension as test versus yield, because you don’t get one or the other,” says Armstrong. “The real problem is yield versus known good die. Yield is implicit with test, because today we do all this testing to a certain yield. But we’re not necessarily identifying parts that are known good die, because that’s not based on faults. It’s based on actually meeting the application needs. This is a revolutionary change to digital device testing. People are seeing that structural test is fine, but it’s just the beginning. In order to really provide value-add, we need a bigger castle, which means more test time and more cost to the bottom line, but the value-add is constrained by the inputs we get. If the inputs don’t go far enough, either in the DFT, the functional, or whatever domain, then we can’t provide value. It’s like a fundamental premise of what we’re doing is broken, and we need to go back to square one.”

Conclusion
The semiconductor industry continues to grapple with the intertwined complexities of test development, device testing, and yield optimization. These challenges are intensifying with the increasing complexity of products. As the industry evolves, a holistic and innovative approach to testing and quality assurance will become increasingly central to its success. The stakes are high, and addressing these challenges head-on will be crucial for the continued advancement of semiconductor technology.

References

1. W. Li, C. Nigh, D. Duvalsaint, S. Mitra, and R.D. Blanton, “PEPR: Pseudo-Exhaustive Physically-Aware Region Testing,” Proc. IEEE International Test Conference (ITC), 2022, pp. 314-323, doi: 10.1109/ITC50671.2022.00083.

Related Reading
Optimizing Scan Test For Complex ICs
New techniques for improving coverage throughout a chip’s lifetime.
Mission-Critical Devices Drive System-Level Test Expansion
SLT walks a fine line between preventing more failures and rising test costs.


