Timely changes to test programs will speed throughput, but one big hurdle remains.
Widely available and nearly unlimited compute resources, coupled with the availability of sophisticated algorithms, are opening the door to adaptive testing. But the speed at which this testing approach is adopted will continue to vary due to persistent concerns about data sharing and the potential for IP theft and data leakage.
Adaptive testing is all about making timely changes to a test program, using test data plus other inputs, to improve the quality or reduce the cost of testing each device under test (DUT). At its core are various methods that change manufacturing test conditions, test content, or test limits to increase the outgoing quality and reliability of semiconductor devices. The basic idea is to apply only the right test content to each device, using data generated by the tester, by on-die sensors, or at previous manufacturing steps to predict its testing needs. Tests can be added to ensure risky parts meet reliability requirements, or eliminated when no failures are found.
“Outlier screening for reliability screening, which first arose in the 2000s for automotive devices, is still considered to be the base driver for adaptive testing,” said John Carulli, fellow in the PostFab Development Center of GlobalFoundries. “Wafer-level adaptive testing is the most leveraged and simplest to deploy in the context of post-processing. With the latest software and data systems, there is more opportunity for feeding data for decisions across wafer and module operation insertions, as well as system test.”
While adaptive testing is being used in test facilities today, leveraging machine learning-based algorithms and data analytics to improve device quality, it’s happening largely in an offline manner.
“Adaptive test decisions are made based on a population of data,” said Greg Prewitt, director of Exensio Solutions at PDF Solutions. “Historically, people went back through the characterization data and through a wealth of production data they had collected to date, looked at the tests that never failed, and, using good engineering judgment, said, ‘I’m comfortable taking this test out for this device.’ The test programs would get revised, and they would simply omit certain tests to achieve test time reduction (TTR). Adaptive test differs from this historic approach by automating test coverage decisions made inline and in real-time, based on a rule-based or machine-learning-driven dynamic test plan.”
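To make that distinction concrete, here is a minimal sketch of what an inline, rule-based dynamic test plan can look like. The test names, the per-die risk score fed forward from earlier insertions, and the thresholds are all illustrative assumptions, not anything from PDF's products:

```python
# Illustrative sketch of an inline, rule-based dynamic test plan.
# All names and thresholds are hypothetical.
FULL_TEST_LIST = ["continuity", "idd_static", "vmin_search", "fmax_scan", "rf_evm"]

def plan_tests(die_risk_score: float, fail_rates: dict[str, float],
               min_fail_rate: float = 1e-6, risk_threshold: float = 0.8) -> list[str]:
    """Return the subset of tests to apply to this DUT.

    fail_rates maps each test to its observed production fail rate;
    die_risk_score is a model prediction fed forward from earlier steps.
    """
    if die_risk_score >= risk_threshold:
        return list(FULL_TEST_LIST)        # risky part: run full coverage
    # Low-risk part: drop tests that have effectively never failed.
    return [t for t in FULL_TEST_LIST if fail_rates.get(t, 1.0) > min_fail_rate]

rates = {"continuity": 0.002, "idd_static": 0.0, "vmin_search": 0.0004,
         "fmax_scan": 0.0, "rf_evm": 0.008}
print(plan_tests(0.1, rates))   # low-risk die: idd_static and fmax_scan skipped
print(plan_tests(0.9, rates))   # high-risk die: all five tests applied
```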
Fig. 1: 3D-IC packaging optimization using adaptive test. Source: PDF Solutions
This is a non-trivial process. “State-of-the-art adaptive test requires adopters to choreograph the movement of data through a complex ecosystem where data is collected from geographically diverse test operations, multiple test steps, and potentially multiple devices,” said Prewitt. “PDF’s Exensio platform enables customers to automate the collection, transformation, and delivery of this data for use in subsequent test operations.”
The technical hurdles of implementing adaptive testing seem surmountable. The real problem is the inherent complexity of the logistics. “Much of the complexity of adaptive test is the orchestration and management of the process of delivering data to the right place at the right time,” said Michael Schuldenfrei, NI fellow at Emerson Test & Measurement. “For example, test data from wafer sort can be leveraged at final test to identify parametric drift across a range of parameters, assuming parts have an electronic chip identifier (ECID) or another method of traceability. This requires historical data be made available to the test program in real-time, without taking a test-time hit.”
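A minimal sketch of that feed-forward pattern appears below, using a plain dictionary as a stand-in for a real manufacturing datastore. The ECIDs, parameter names, and drift gate are hypothetical, not any vendor's actual API:

```python
# Hypothetical feed-forward cache: wafer-sort results keyed by ECID, prefetched
# per lot before final test starts so per-DUT lookups add no test time.
WAFER_SORT_DB = {  # stand-in for a real manufacturing datastore
    "W12-X04-Y17": {"vmin": 0.642, "iddq": 1.03},
    "W12-X05-Y17": {"vmin": 0.655, "iddq": 0.98},
}

def drift_check(ecid: str, ft_vmin: float, max_drift: float = 0.05) -> bool:
    """Flag a part whose Vmin moved more than max_drift volts since wafer sort."""
    ws = WAFER_SORT_DB.get(ecid)
    if ws is None:
        return False                       # no traceability record; use default flow
    return abs(ft_vmin - ws["vmin"]) > max_drift

print(drift_check("W12-X04-Y17", 0.71))    # True: 68 mV drift exceeds the 50 mV gate
```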
Schuldenfrei noted that the industry’s toughest challenges are associated with its highly disaggregated infrastructure. “This is particularly challenging when wafer sort and final test happen in different facilities, requiring secure and reliable orchestration of data movement between facilities.”
Sharing is caring
Making the necessary data available where and when it is needed is an enormous hurdle today, and it’s primarily a result of the fabless-foundry model. “Data security is a big concern for adaptive testing, particularly in disaggregated manufacturing and test flows in which the device owner and manufacturing partner are different companies,” said Ken Butler, strategic content business manager at Advantest America. “When data and applications must be shared across company boundaries, the security of that information is paramount. Advantest’s ACS Real-Time Data Infrastructure (RTDI) solution has numerous features to ensure that data can be safely shared to accomplish adaptive test flows and real-time inferencing without exposing proprietary data to unauthorized entities.”
Fig. 2: Data infrastructure for real-time adaptive test at any test insertion secures developed test data, the test program, and third-party data analytics. Source: Advantest
Safe data sharing begins with strong encryption. “We use a lot of encryption to move information around, but the architecture of the system itself is physically secure in the sense that there’s no keyboard attached to the compute platform, no USB sticks are allowed, and it’s in a locked box to prevent access,” said Butler. “And at the end of the test process, everything is wiped clean so that the data is gone, and there’s no retention for people to fish data through from the back.”
However, when it comes to data sharing across the design-to-manufacturing test flow, or even with field test data, more work is needed to put the data into context for various users. “Data availability is still probably a key piece that we have to agree on,” said Eli Roth, product manager of smart manufacturing at Teradyne. Engineers need the context of a wafer to drive lower-cost testing efficiently. “In particular, the context of the test data sometimes doesn’t make sense to upstream and downstream people.”
Settling on data sharing protocols is a key goal of SEMI’s Smart AI Industry Advisory Council. “That’s the challenge we’re getting into now,” Roth said. “How can we make that data not only available, but also contextual?”
Teradyne has invested in its own parallel computing platform, with recent emphasis on feedback and feed-forward, or bi-directional, data streaming. “Knowing that real-time adaptive test is coming, we’re focused on ensuring that the data coming out of a tester is genuine, that it hasn’t been manipulated by anybody, and that you don’t need to run some other piece of software on the tester to extract the data properly,” said Roth. “If you’re fabless, your devices are running on the same process line as your competitor, so where’s your competitive advantage? It lies in the data. So our thoughts are around packaging up the data into a standard. That’s the same for all our testers. The data is structured in the same way, piped into whatever data source you want. Then you can translate that data into whatever your data model is, via a standard, which is more efficient than trying to natively build all the different solutions for adaptive test.”
Companies are using the existing SEMI A4 standard, a specification for automated test equipment (ATE) tester event messaging for semiconductors (TEMS), which describes this data streaming protocol and the structure of the data. [1] However, the standard does not extend to how data is stored or packaged. It uses a publish-subscribe model to expose the available data streams that users can subscribe to.
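Purely as an illustration, the publish-subscribe pattern the standard relies on can be sketched as follows. The topic names, payload fields, and Python API are assumptions; SEMI A4 defines message semantics, not a programming interface:

```python
# Illustrative publish/subscribe flow in the spirit of TEMS-style tester event
# messaging. Topic names and payload fields are hypothetical.
from collections import defaultdict
from typing import Callable

class EventBus:
    def __init__(self):
        self._subs: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._subs[topic]:
            handler(event)

bus = EventBus()
# An analytics service subscribes only to the event streams it needs.
bus.subscribe("test.result", lambda e: print("got result:", e))
# The tester publishes a structured event as each test completes.
bus.publish("test.result", {"ecid": "W12-X04-Y17", "test": "vmin_search",
                            "value": 0.643, "units": "V", "bin": 1})
```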
Best insertions for adaptive test
Most industry experts agree that adaptive test can be, and is being, used at multiple test insertions. In all cases, the sooner potential failures are identified, the better for device quality and manufacturing effectiveness.
“We have implementations at wafer sort, final test, burn-in, and system-level test,” said NI’s Schuldenfrei. “At wafer sort, test time reduction is often used to increase test efficiency and lower cost while minimizing the risk of test escapes. Final (package) test also shares the same benefit, but there are even more compelling reasons to use adaptive test at final test. At wafer sort, post-processing by statistical or AI-based algorithms that re-bin devices (e.g., outlier detection) can be performed offline after the wafer completes testing. The algorithm’s updated binning results can then be applied via the inkless (digital) wafer map. In contrast, at final test, the final binning decision for the DUT needs to happen before it is removed from the socket, necessitating real-time decision making.”
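A rough sketch of that offline re-binning step follows, using a simple robust z-score rule as a stand-in for whatever statistical or ML model is actually deployed. The bin codes, window, and threshold are illustrative:

```python
# Sketch of offline post-processing at wafer sort: re-bin outliers on the
# inkless (digital) wafer map after the wafer finishes testing.
import numpy as np

def rebin_outliers(vmin_map: np.ndarray, bin_map: np.ndarray,
                   outlier_bin: int = 7, k: float = 4.0) -> np.ndarray:
    """Move passing dies whose Vmin sits more than k robust sigmas from the
    wafer median into an outlier bin. NaN marks untested die sites."""
    passing = bin_map == 1
    vals = vmin_map[passing]
    med = np.nanmedian(vals)
    sigma = 1.4826 * np.nanmedian(np.abs(vals - med))   # robust sigma from MAD
    new_bins = bin_map.copy()
    new_bins[passing & (np.abs(vmin_map - med) > k * sigma)] = outlier_bin
    return new_bins

vmin = np.array([[0.64, 0.65, 0.64],
                 [0.66, 0.90, 0.65],     # center die is a suspect outlier
                 [0.64, 0.65, 0.66]])
bins = np.ones((3, 3), dtype=int)
print(rebin_outliers(vmin, bins))        # center die moves to bin 7
```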
The earliest versions of adaptive testing were all about test time reduction, including less burn-in stress testing. But to progress from simple test time reduction to offline ML-based modeling and large-scale quality improvements, the ecosystem and its data-sharing practices need to change.
“Most of our customers no longer talk about DPPM (defective parts per million) or even DPPB (defective parts per billion),” said Schuldenfrei. “Any test escape is considered extremely problematic. There are several prevalent applications of adaptive testing today, including adaptive test-time reduction (ATTR), adaptive test augmentation, adaptive outlier detection, and various adjacent applications such as drift detection that rely on accurate and timely data interchange between the test program and an external service.”
The most common examples of adaptive test limit adjustment are part average testing (PAT) and dynamic part average testing (DPAT). PAT is a statistical process that dates back to the 1990s and was updated in 2011 by the Automotive Electronics Council (AEC). [2] In this process, the specification limits for one or more tests are adjusted (typically tightened) based on the behavior of the parts in the lot being tested. This adjustment is done to better screen potential outlier devices that are technically within the operating specifications but may be at risk of early failure in the end application.
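A common formulation of DPAT recomputes limits per lot or wafer as a robust mean plus or minus six robust standard deviations, clamped so they can only tighten the original spec limits. A minimal sketch, with placeholder data and parameter values:

```python
# Minimal DPAT sketch: per-lot dynamic limits from robust statistics.
import numpy as np

def dpat_limits(measurements: np.ndarray, spec_lo: float, spec_hi: float,
                k: float = 6.0) -> tuple[float, float]:
    """Robust mean +/- k robust sigma, never widening the original spec limits."""
    med = float(np.median(measurements))
    robust_sigma = 1.4826 * float(np.median(np.abs(measurements - med)))  # from MAD
    return max(med - k * robust_sigma, spec_lo), min(med + k * robust_sigma, spec_hi)

# Example: tighten an Iddq screen around this lot's actual behavior.
rng = np.random.default_rng(1)
lot_iddq = rng.normal(1.0, 0.05, 500)                   # placeholder lot data, in mA
print(dpat_limits(lot_iddq, spec_lo=0.2, spec_hi=2.0))  # roughly (0.7, 1.3)
```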
“While PAT and DPAT have serviced the industry well for years, with today’s advanced compute capabilities and analytics, there are far better ways to identify at-risk devices while simultaneously reducing the amount of yield loss associated with this form of screening,” said Advantest’s Butler.
Fig. 3: (Top) Distribution with outliers. Source: YieldHUB. (Bottom) The ML-based algorithm is better at identifying true outliers while preserving yielding devices. Source: Synopsys
An advanced mixed-signal chip or SoC might employ any number of on-chip sensors or monitors, offered as IP by Synopsys, proteanTecs, and other firms. It is common practice to embed sensors all over the die to monitor its health and adjust its performance, including ring oscillators, temperature sensors, aging sensors, and many others. proteanTecs, for example, provides on-chip monitors, called Agents, along with cloud-based software that applies analytics to the monitor data.
“Common sensor and measurement data include Vmin, Fmax, Iddq, Idd, process ring oscillators, IR drop detectors, jitter detectors, thermal sensors — any analog measurements in analog/mixed-signal/RF designs,” said GF’s Carulli. “A typical outlier case at the wafer level may be with Vmin used with a nearest-neighbor-residual algorithm. A typical case at the module level may be using a bivariate model for Iddq vs. Fmax for detecting outlier behavior.
“A more advanced case from Advantest involved using its ACS system to improve a digital pre-distortion test,” Carulli said. “Key inputs were offloaded to the adjacent server system for modeling and optimization, then the optimized conditions were sent back to the tester for improved results.”
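The nearest-neighbor-residual (NNR) screen Carulli mentions can be sketched as follows: each die's value is predicted from the median of its spatial neighbors, and dies whose residuals are far from typical get flagged. The window size, threshold, and median predictor are illustrative choices, not GlobalFoundries' specific algorithm:

```python
# Sketch of a nearest-neighbor-residual (NNR) screen for Vmin at wafer level.
import numpy as np

def nnr_outliers(vmin_map: np.ndarray, window: int = 1, k: float = 4.0) -> np.ndarray:
    """Boolean map of dies whose Vmin differs sharply from the median of
    their spatial neighbors (NaN marks untested sites)."""
    rows, cols = vmin_map.shape
    resid = np.full(vmin_map.shape, np.nan)
    for r in range(rows):
        for c in range(cols):
            if np.isnan(vmin_map[r, c]):
                continue
            r0, c0 = max(r - window, 0), max(c - window, 0)
            patch = vmin_map[r0:r + window + 1, c0:c + window + 1].astype(float)
            patch[r - r0, c - c0] = np.nan                      # exclude the die itself
            resid[r, c] = vmin_map[r, c] - np.nanmedian(patch)  # residual vs. neighbors
    med = np.nanmedian(resid)
    sigma = 1.4826 * np.nanmedian(np.abs(resid - med))          # robust sigma from MAD
    return np.abs(resid - med) > k * sigma

wafer = 0.65 + 0.002 * np.arange(25, dtype=float).reshape(5, 5)  # mild spatial gradient
wafer[2, 2] += 0.10                        # one die far from its local neighborhood
print(np.argwhere(nnr_outliers(wafer)))    # -> [[2 2]]
```

Because each die is compared with its own neighborhood, the screen tolerates normal across-wafer gradients that would confuse a single wafer-wide limit.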
Another example involves adapting test limits to improve device quality. “When you create a semiconductor device, you’re going to run several process splits, where the engineer intentionally varies the process to identify the worst-case and best-case performance of that device,” said Butler. “I’m going to set my limits based on the process width, but the reality is that the fab is going to try to tightly control the material as much as possible. So you need to collect information that tells you how much to tighten up those specs.”
Others point to similar evolutions. “For example, Vdd consumption test results (see figure 3) can be correlated with embedded in-chip monitor or sensor data. The resulting bivariate correlation can be captured in an algorithm that derives adaptive test limits for each individual die on-the-fly, running in real-time on the tester,” said Guy Cortez, senior staff product manager for SLM analytics solutions at Synopsys. “This improved approach to DPAT provides superior identification of true outliers for better quality, while improving yield rather than sacrificing it.”
Cortez noted that using in-chip monitor or sensor data, as in this example, requires instantiating monitor or sensor IP into the chip during the design phase. “Synopsys provides this monitor and sensor IP, but the analytics shown in our Silicon.da solution can ingest monitor and sensor data from Synopsys or any other third-party IP provider.”
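A minimal sketch of the bivariate approach Cortez describes: fit the relationship between a monitor reading and the Vdd-consumption measurement, then screen each die against a band around its own predicted value rather than one lot-wide limit. The linear model, variable names, and synthetic data are assumptions, not the Silicon.da implementation:

```python
# Sketch of sensor-correlated adaptive limits: judge each die against its own
# predicted Idd rather than a single lot-wide limit.
import numpy as np

def fit_bivariate(sensor: np.ndarray, idd: np.ndarray):
    """Fit idd ~ slope * sensor + intercept; return model plus residual sigma."""
    slope, intercept = np.polyfit(sensor, idd, 1)   # simplest usable model: a line
    resid = idd - (slope * sensor + intercept)
    return slope, intercept, float(np.std(resid))

def per_die_pass(sensor_val: float, idd_val: float, model, k: float = 4.0) -> bool:
    """Pass if the die's Idd lies within k sigma of its own predicted value."""
    slope, intercept, sigma = model
    return abs(idd_val - (slope * sensor_val + intercept)) <= k * sigma

rng = np.random.default_rng(0)
sensor = rng.normal(1.0, 0.1, 300)               # e.g., process ring-oscillator reading
idd = 2.0 * sensor + rng.normal(0.0, 0.02, 300)  # synthetic, correlated Idd data
model = fit_bivariate(sensor, idd)
print(per_die_pass(1.05, 2.11, model))           # True: close to its predicted value
print(per_die_pass(1.05, 2.35, model))           # False: outlier vs. its own prediction
```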
Conclusion
ATE providers are building the infrastructure to support real-time adaptive testing, incorporating advanced outlier detection methods and tightened test limits to improve device quality. On-die sensors are proving they can capture the behavior of individual die, and ML-based algorithms are providing the advanced modeling. But the logistics of adaptive testing depend on the industry developing standard methods for encrypting and processing critical data while protecting the intellectual property of both chipmakers and fabless companies.
References
[1] SEMI A4, Specification for Automated Test Equipment Tester Event Messaging for Semiconductors (TEMS), SEMI.
[2] AEC-Q001, Guidelines for Part Average Testing, Automotive Electronics Council, 2011.
Related Story
Testing ICs Faster, Sooner, And Better
Why test cells could become the critical information hub of the fab.