Identify early indicators of risk by analyzing timing margin data from within the chip.
Chipmakers worldwide consider Automatic Test Pattern Generation (ATPG) their go-to method for achieving high test coverage in production. ATPG generates test patterns designed to detect faults in the silicon and ensures they are applied effectively using the chip’s Design-for-Test (DFT) infrastructure. This combination enhances fault detection while optimizing test efficiency.
These patterns are injected by Automatic Test Equipment (ATE) into each die during high-volume manufacturing (HVM), enabling solid quality control through large-scale testing of all chips.
ATPG at-speed tests target different kinds of faults (e.g., transition faults, small delay faults) and have earned their spot in the semiconductor testing hall of fame. But what about their limitations? This article explores the risks posed by ATPG’s drawbacks, and their remedies, to help you create a robust test program that cuts defects without affecting yield.
If you’re worried about your test patterns letting defects slip through, you’re not alone. Despite its advantages, conventional ATPG may miss small, latent, and marginal defects, and can even produce false positives or false negatives:
Latent/marginal defects: A threat to product reliability
One of the major concerns is defects that are too subtle for the pass/fail granularity of ATPG results. The marginal performance of such chips is just enough to pass all patterns on ATE, yet they are “walking-wounded” devices.
These issues often escape detection until customers discover them in the field. For example, undetected defects that potentially cause Silent Data Corruption (SDC) might lead to costly post-release issues that jeopardize product reliability and customer trust. They can also cost as much as $50,000 per RMA, not counting reputational damage and the resources diverted from other projects to investigate. You can read more about such faults and their remedies in this whitepaper.
Misalignment between ATPG and real-life conditions
Another inherent limitation is the potential misalignment between test patterns and real-world scenarios, raising doubts about whether ATPG truly reflects the conditions a chip will face during lifetime operation.
To compensate for this limitation, chipmakers may tighten test thresholds, but this can lead to two risks. Overly stringent testing (overkill) may generate unrealistic patterns that cause unnecessary failures at ATE, reducing yield without real benefit. On the other hand, insufficiently representative patterns (underkill) may overlook defects that could emerge under actual workloads, leading to field failures.
Striking the right balance is critical to ensuring both high yield and long-term reliability.
Many latent faults in the field exhibit abnormal behavior that can evolve into future timing violations. These defects often escape detection due to ATPG’s limitations in capturing subtleties. Thankfully, by analyzing timing margin data from within the chip, it’s possible to identify early indicators of risk, addressing blind spots and strengthening confidence in the test program.
Table 1: Parametric margin data from within the chip mitigates ATPG limitations by tackling their causes.
The result? Imagine a robust test program that catches all those marginal issues in advance. Thanks to powerful machine learning (ML) algorithms, you could analyze high-coverage timing margin data with unprecedented visibility into every die. The ML model can be loaded onto the ATE to eliminate the blind spots of your ATPG patterns automatically.
proteanTecs’ Margin Agents (MA) are designed to boost quality without compromising yield during structural tests. They measure the minimum margin to operating frequency across millions of paths and pinpoint critical issues per die. By analyzing parametric timing data, these Margin Agents tackle the inherent limitations of ATPG head-on.
The solution includes a cloud-based deep data analytics platform and edge software deployed on the ATE. It leverages advanced machine learning algorithms in the cloud to analyze timing margin measurements, training on extensive data to profile normal behavior across different operating conditions and the process distribution. The trained models are then deployed to the edge for inline decisions on the test floor. By generating highly accurate predicted timing margin values across the chip, the solution can detect subtle deviations that ATPG would miss. If the measured timing margin deviates from the predicted value, the chip is flagged as an outlier, allowing preventive action before it reaches the field.
Fig. 1: Combining on-chip agent reading with precise Machine Learning models deployed at ATE.
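The measured-versus-predicted check described above can be illustrated with a minimal sketch. This is not the proteanTecs algorithm: the function name, the picosecond units, the sample values, and the robust z-score rule (median and median absolute deviation) are all assumptions chosen for illustration. It only shows the general idea of flagging a die whose measured margin departs from its predicted value by more than the population normally does.

```python
from statistics import median

def flag_outliers(measured, predicted, k=3.0):
    """Return indices of dies whose measured margin deviates from the prediction.

    Hypothetical sketch: residuals are scored with a robust z-score
    (median / MAD). The threshold k is an assumed tuning knob.
    """
    residuals = [m - p for m, p in zip(measured, predicted)]
    center = median(residuals)
    mad = median([abs(r - center) for r in residuals])
    # 1.4826 scales MAD to be comparable to a standard deviation;
    # fall back to a tiny value so identical residuals do not divide by zero.
    scale = (1.4826 * mad) or 1e-9
    return [i for i, r in enumerate(residuals) if abs(r - center) / scale > k]

# Assumed example data: timing margins in picoseconds for five dies.
measured = [52.0, 49.5, 51.0, 38.0, 50.5]
predicted = [51.0, 50.0, 50.5, 50.0, 50.0]
outliers = flag_outliers(measured, predicted)
print(outliers)  # die index 3 measures far below its predicted margin
```

A robust statistic is used here so that the outlier itself does not inflate the spread estimate; a production model would presumably account for operating conditions and process distribution, as the text describes.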
The solution integrates seamlessly with your workflow:
This powerful combination ensures unprecedented visibility into every die, reducing DPPM, preventing costly RMAs, and driving confidence in your test program.
proteanTecs MA-based outlier detection can prevent the escapes of marginal and latent defects characteristic of complex designs and advanced nodes. Such issues might pass conventional ATPG tests as they are too subtle to detect, yet they can cause hardware failures in the field. The shift left that timing margin measurements enable directly reduces DPPM and RMA-related costs, by moving detection from the field to production testing.
As depicted below, the new data can help you make informed quality decisions. A close examination of the wafer-level test results on the left reveals a faulty die that had enough margin to pass all ATPG patterns, including at-speed patterns, yet deviates from its expected behavior. Once the outlier die is detected, the software pinpoints the location in the chip where the problem occurred.
Fig. 2: Reducing DPPM while simplifying defect investigation: proteanTecs MA-based outlier detection uses ML to identify faulty outliers undetectable by ATPG and then pinpoints the exact location of the problem in the chip.
Customers report a significant DPPM reduction thanks to proteanTecs MA-based outlier detection. For one datacenter chipmaker, some devices passed all ATPG tests, and in fact all production tests, despite their high risk of failure, because their performance was marginal rather than unacceptable. After integrating proteanTecs’ solution, the same chips showed lower-than-expected timing margin measurements, leading to their disqualification. Had they gone undetected, these units would likely have suffered timing violations that could cause Silent Data Errors after some in-field usage.
During New Product Introduction (NPI), it is essential to establish a solid test program for High Volume Manufacturing (HVM) testing with ATPG patterns and functional system-level tests (SLT), or even System tests. As explained above, the ATPG patterns might not reflect real workloads, unlike functional tests, potentially hurting yield and DPPM.
To mitigate this misalignment, proteanTecs helps to correlate ATPG patterns and functional workloads by comparing their timing margin measurements, provided by the Margin Agents, on the same devices. There are two options for the alignment process depending on the comparison results:
Fig. 3: The proteanTecs solution correlates margin agent data of wafer-level chip probing (left bar) and system-level test (right bar) to help reflect real-life conditions in ATPG patterns.
For example, the Margin Agent measurements above show that wafer-level ATPG timing margins are much higher than functional ones on average. These results imply that ATPG patterns fail to reflect real workloads, potentially leading to systematic failures in the field. When the chipmaker noticed, the test engineering team worked to extend ATPG patterns until their margins were aligned with functional ones.
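The comparison in this example can be sketched in a few lines. The function names, the tolerance, and the sample margins below are hypothetical, and the overkill reading of a negative gap is an inference from the earlier discussion of overly stringent patterns, not something the measurements above demonstrate.

```python
from statistics import mean

def margin_gap(atpg_margins, functional_margins):
    """Mean per-device difference: ATPG margin minus functional margin."""
    diffs = [a - f for a, f in zip(atpg_margins, functional_margins)]
    return mean(diffs)

def alignment_hint(gap, tolerance):
    """Classify the gap; tolerance is an assumed engineering threshold."""
    if gap > tolerance:
        # ATPG leaves more margin than real workloads: patterns are too easy.
        return "underkill_risk"
    if gap < -tolerance:
        # ATPG leaves less margin than real workloads: patterns are too harsh.
        return "overkill_risk"
    return "aligned"

# Assumed margins (picoseconds) for the same three devices under both tests.
atpg = [60.0, 58.0, 62.0]
functional = [45.0, 47.0, 44.0]
hint = alignment_hint(margin_gap(atpg, functional), tolerance=5.0)
print(hint)  # underkill_risk
```

Pairing the measurements per device, rather than comparing two population averages, keeps process variation between dies from masking the difference between the two test types.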
You can also use timing margin monitoring when the chip is in the field beyond NPI and HVM. This approach is aligned with the trend of running ATPG in the field at some pre-defined testing cycles or during SLT. This is called “In-System Test.”
With In-System Test in the field, the timing margin information provided by the Margin Agents can once again show how close to failure a device is, even if it passes the test. Because the Margin Agents can measure while the device is running real workloads, timing margin monitoring is available both during normal operation and when running deterministic ATPG tests during In-System Test cycles.
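As a rough illustration of "how close to failure" a passing device can be, the sketch below classifies successive margin readings against a guard band. The guard band value, the status labels, and the readings are assumptions made for this example and do not describe a proteanTecs interface.

```python
def margin_status(margin_ps, guard_band_ps=5.0):
    """Classify one timing margin reading (picoseconds, assumed unit)."""
    if margin_ps <= 0:
        return "violation"  # no margin left: a timing failure
    if margin_ps < guard_band_ps:
        return "at_risk"    # still passes, but close to failure
    return "healthy"

def first_at_risk(readings, guard_band_ps=5.0):
    """Index of the first reading that is not healthy, or None if all are."""
    for i, margin in enumerate(readings):
        if margin_status(margin, guard_band_ps) != "healthy":
            return i
    return None

# Assumed readings from successive in-field test cycles of one device.
readings = [12.0, 10.5, 8.0, 4.5, 3.0]
print(first_at_risk(readings))  # 3: the device still passes but is inside the guard band
```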
In case a malfunctioning chip returns as RMA, you can compare its timing margins across three different measurements:
This approach can accelerate root-cause analysis, supporting test program improvements and design optimizations.
Ready to cut your DPPM and shift left defect detection? Download our exclusive whitepaper or contact our team today at this link.