SPONSOR BLOG

Can Your ATPG Do This? Cut Defects Escaping Detection With ML

Identify early indicators of risk by analyzing timing margin data from within the chip.

June 10th, 2025 - By: Alex Burlak

Chipmakers worldwide consider Automatic Test Pattern Generation (ATPG) their go-to method for achieving high test coverage in production. ATPG generates test patterns designed to detect faults in the silicon and ensures they are applied effectively using the chip’s Design-for-Test (DFT) infrastructure. This combination enhances fault detection while optimizing test efficiency.

These patterns are injected by Automatic Test Equipment (ATE) into each die during high-volume manufacturing (HVM), enabling solid quality control through large-scale testing of all chips.

ATPG at-speed tests target different kinds of faults (e.g., transition faults, small delay faults) and have earned their spot in the semiconductor testing hall of fame. But what about their limitations? This article explores the risks that ATPG’s drawbacks create and the remedies for them, to help you build a robust test program that cuts defect escapes without affecting yield.

Understanding ATPG’s limitations and their impact

If you’re worried about your test patterns letting defects slip through, you’re not alone. Despite its advantages, conventional ATPG may miss small, latent, and marginal defects, and it can also produce false failures (overkill) and false passes (underkill):

Latent/marginal defects: A threat to product reliability
One of the major concerns is defects that are too subtle for the pass/fail granularity of ATPG results. The marginal performance of such chips is just enough to pass all patterns on ATE, yet they are “walking-wounded” devices.

These issues often escape detection until customers discover them in the field. For example, undetected defects that can cause Silent Data Corruption (SDC) may lead to costly post-release issues that jeopardize product reliability and customer trust. Each RMA can also cost as much as $50,000, not counting reputational damage and the engineering resources diverted from other projects to investigate. You can read more about such faults and their remedies in this whitepaper.

Misalignment between ATPG and real-life conditions
Another inherent limitation is the potential misalignment between test patterns and real-world scenarios, raising doubts about whether ATPG truly reflects the conditions a chip will face during lifetime operation.

Chipmakers may tighten test thresholds to compensate, but the misalignment creates risk in both directions. Overly stringent testing (overkill) may rely on unrealistic patterns that cause unnecessary failures at ATE, reducing yield without real benefit. On the other hand, insufficiently representative patterns (underkill) may overlook defects that emerge only under actual workloads, leading to field failures.
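The trade-off can be illustrated with a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the `Die` fields, the five data points, and the idea of a single margin limit. A real test program uses many limits and far more devices, and the "truly defective" label is only known in hindsight.

```python
from dataclasses import dataclass


@dataclass
class Die:
    die_id: str
    test_margin_ps: float   # hypothetical margin observed under the test pattern
    truly_defective: bool   # ground truth, known only in hindsight (e.g., from RMAs)


def count_overkill_underkill(dies, limit_ps):
    """Fail every die whose test margin is below limit_ps, then count both mistakes."""
    # Overkill: a good die that the limit rejects (yield loss).
    overkill = sum(
        1 for d in dies if d.test_margin_ps < limit_ps and not d.truly_defective
    )
    # Underkill: a defective die that the limit accepts (test escape).
    underkill = sum(
        1 for d in dies if d.test_margin_ps >= limit_ps and d.truly_defective
    )
    return overkill, underkill


# Made-up population of five dies.
dies = [
    Die("A", 42.0, False),
    Die("B", 35.0, False),
    Die("C", 28.0, True),
    Die("D", 31.0, False),
    Die("E", 18.0, True),
]

for limit in (20.0, 30.0, 40.0):
    over, under = count_overkill_underkill(dies, limit)
    print(f"limit={limit} ps  overkill={over}  underkill={under}")
```

With this toy data, a loose limit of 20 ps lets one defective die escape, a tight limit of 40 ps rejects two good dies, and 30 ps happens to separate the two groups. Real populations rarely separate this cleanly.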

Striking the right balance is critical to ensuring both high yield and long-term reliability.

What if you had high-coverage timing margin data from within the chip?

Many latent faults in the field exhibit abnormal behavior that can evolve into future timing violations. These defects often escape detection due to ATPG’s limitations in capturing subtleties. Thankfully, by analyzing timing margin data from within the chip, it’s possible to identify early indicators of risk, addressing blind spots and strengthening confidence in the test program.

Table 1: Parametric margin data from within the chip mitigates ATPG limitations by tackling their causes.

The result? Imagine a robust test program that catches all those marginal issues in advance. Thanks to powerful machine learning (ML) algorithms, you could analyze high-coverage timing margin data with unprecedented visibility into every die. The ML model can be loaded onto the ATE to eliminate the blind spots of your ATPG patterns automatically.

Timing margin visibility: Enhancing quality with ML precision

proteanTecs’ Margin Agents (MA) are designed to boost quality without compromising yield. During structural tests, they measure the minimum timing margin, relative to the operating frequency, across millions of paths, so that critical issues can be pinpointed per die. By analyzing parametric timing data, these Margin Agents tackle the inherent limitations of ATPG head-on.

The solution includes a cloud-based deep data analytics platform and edge software deployed on the ATE. In the cloud, machine learning algorithms analyze timing margin measurements and train on extensive data to profile normal behavior across different operating conditions and the process distribution. The trained models are then deployed to the edge for inline decisions on the test floor. By generating highly accurate predicted timing margin values across the chip, the models can detect subtle deviations that ATPG would miss. If the measured timing margin deviates from the predicted value, the chip is flagged as an outlier, allowing preventive action before it reaches the field.
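The predict-then-compare idea can be sketched in a few lines of Python. This is not the proteanTecs algorithm: it assumes a toy linear model with one hypothetical input (`speed`, standing in for a die's position in the process distribution), made-up reference data, and a 3-sigma rule chosen for illustration.

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def train(reference):
    """'Cloud' step: learn normal behavior from (speed, margin_ps) pairs."""
    xs = [speed for speed, _ in reference]
    ys = [margin for _, margin in reference]
    slope, intercept = fit_line(xs, ys)
    residuals = [m - (slope * s + intercept) for s, m in reference]
    # Sample standard deviation of the residuals sets the outlier scale.
    return {"slope": slope, "intercept": intercept, "sigma": stdev(residuals)}


def is_outlier(model, speed, measured_margin_ps, k=3.0):
    """'Edge' step: flag a die whose margin is far below its own prediction."""
    predicted = model["slope"] * speed + model["intercept"]
    return (measured_margin_ps - predicted) < -k * model["sigma"]


# Made-up reference population: faster silicon shows more margin.
reference = [
    (0.90, 30.5), (0.95, 33.0), (1.00, 35.2),
    (1.05, 37.9), (1.10, 40.4), (1.15, 43.1),
]
model = train(reference)

print(is_outlier(model, 1.05, 37.8))  # close to its prediction
print(is_outlier(model, 1.05, 34.0))  # well below its prediction
```

The second die has 34.0 ps of margin, more than the slowest healthy die in the reference set (30.5 ps), so a single fixed limit would pass it. It is flagged only because its margin is compared against what is expected for that particular die.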

Fig. 1: Combining on-chip agent reading with precise Machine Learning models deployed at ATE.

The solution integrates seamlessly with your workflow:

On-chip timing margin monitors: proteanTecs Margin Agents capture real-time timing margin data from millions of logic paths, which serves as a baseline for ML model creation.
Cloud-based deep data analytics platform: Processes massive datasets with ML to train a model that learns the normal behavior, enabling the detection of anomalies beyond the scope of ATPG’s pass/fail metrics.
Edge software on the ATE: Automates the detection and classification of faulty dies on the ATE by combining real-time margin measurements with a trained model. This enables identification of latent defects and eliminates ATPG blind spots during high-volume manufacturing.

This powerful combination ensures unprecedented visibility into every die, reducing DPPM, preventing costly RMAs, and driving confidence in your test program.

Eliminating your ATPG blind spots to reduce DPPM and RMA-related costs

proteanTecs MA-based outlier detection can prevent the escapes of marginal and latent defects characteristic of complex designs and advanced nodes. Such issues might pass conventional ATPG tests as they are too subtle to detect, yet they can cause hardware failures in the field. The shift left that timing margin measurements enable directly reduces DPPM and RMA-related costs, by moving detection from the field to production testing.

As depicted below, the new data can help chipmakers make informed decisions about quality. A close examination of the wafer-level test results on the left reveals a faulty die that had enough margin to pass all ATPG patterns, including at-speed patterns, yet deviates clearly from its expected behavior. Once the outlier die is detected, the software pinpoints the location in the chip where the problem occurred.

Fig. 2: Reducing DPPM while simplifying defect investigation: proteanTecs MA-based outlier detection uses ML to identify faulty outliers undetectable by ATPG and then pinpoints the exact location of the problem in the chip.

Customers report a significant DPPM reduction thanks to proteanTecs MA-based outlier detection. At one datacenter chipmaker, some devices with a high risk of failure passed all ATPG tests, and in fact all production tests, because their performance was marginal rather than unacceptable. After the chipmaker integrated proteanTecs’ solution, the same chips showed lower-than-expected timing margin measurements, leading to their disqualification. Had they gone undetected, these units were likely to suffer timing violations that could cause Silent Data Errors after some in-field usage.

Correlating your ATPG and functional tests to reflect real-life conditions

During New Product Introduction (NPI), it is essential to establish a solid test program for High Volume Manufacturing (HVM) testing with ATPG patterns and functional system-level tests (SLT), or even System tests. As explained above, the ATPG patterns might not reflect real workloads, unlike functional tests, potentially hurting yield and DPPM.

To mitigate this misalignment, proteanTecs helps to correlate ATPG patterns and functional workloads by comparing their timing margin measurements, provided by the Margin Agents, on the same devices. There are two options for the alignment process depending on the comparison results:

ATPG timing margins are worse than functional test ones: In this case, the ATPG patterns may be overstressing the chip (from a performance point of view). For example, running ATPG at-speed patterns on the entire chip can cause unnatural IR drops that won’t occur in functional tests. To fix the problem, the patterns can be adjusted to reduce false fallout without compromising quality.
Functional timing margins are worse than ATPG ones: This case is dangerously misleading, making it seem like the chip is doing well, as it passed all test patterns successfully. However, timing margin measurements would reveal insufficient ATPG at-speed coverage instead, calling for additional test patterns that reflect actual functionality.
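The two cases above can be expressed as a small Python sketch. It assumes paired margin readings from the same devices; the function name, the 2 ps tolerance, and the sample values are hypothetical choices made for illustration.

```python
from statistics import mean


def classify_alignment(atpg_ps, functional_ps, tolerance_ps=2.0):
    """Compare margins measured on the same devices under ATPG and functional tests.

    Returns (label, mean_gap_ps), where the gap is ATPG margin minus
    functional margin, averaged over the devices.
    """
    if len(atpg_ps) != len(functional_ps):
        raise ValueError("need paired measurements from the same devices")
    gap = mean(a - f for a, f in zip(atpg_ps, functional_ps))
    if gap < -tolerance_ps:
        # ATPG margins are worse: patterns may be overstressing the chip.
        return "overstress", gap
    if gap > tolerance_ps:
        # Functional margins are worse: at-speed coverage may be insufficient.
        return "undercoverage", gap
    return "aligned", gap


# Made-up readings (ps) for four devices measured under both test types.
atpg = [40.0, 42.0, 39.0, 41.0]
functional = [31.0, 33.0, 30.0, 34.0]
print(classify_alignment(atpg, functional))
```

A mean gap is the simplest possible comparison. Comparing per path or per region of the chip would show where the patterns and the workloads disagree, at the cost of more bookkeeping.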

Fig. 3: The proteanTecs solution correlates margin agent data of wafer-level chip probing (left bar) and system-level test (right bar) to help reflect real-life conditions in ATPG patterns.

For example, the Margin Agent measurements above show that wafer-level ATPG timing margins are much higher than functional ones on average. These results imply that ATPG patterns fail to reflect real workloads, potentially leading to systematic failures in the field. When the chipmaker noticed, the test engineering team worked to extend ATPG patterns until their margins were aligned with functional ones.

Taking ATPG to the field

You can also use timing margin monitoring beyond NPI and HVM, when the chip is in the field. This approach is aligned with the trend of running ATPG in the field at pre-defined testing cycles or during SLT, known as “In-System Test.”

With In-System Test in the field, the timing margin information provided by the Margin Agents can once again show how close to failure a device is, even if it passes the test. Because the Margin Agents can measure during normal operation, timing margin monitoring is available both while the device executes real workloads and while it runs deterministic ATPG tests during In-System Test cycles.

If a malfunctioning chip is returned as an RMA, you can compare its timing margins across three different measurements:

Original ATPG results during HVM
Functional mode in the field
Post-RMA ATPG results

This approach can accelerate root-cause analysis, supporting test program improvements and design optimizations.
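As a rough illustration of such a comparison, the Python sketch below looks for the on-chip location whose ATPG margin dropped the most between HVM and the post-RMA retest. The agent names and all readings are hypothetical, and a real investigation would also weigh the in-field functional readings rather than just carrying them along.

```python
def largest_degradation(readings):
    """Return (agent, drop_ps) for the agent whose ATPG margin fell most since HVM.

    readings maps an agent name to a tuple of margins in ps:
    (HVM ATPG, in-field functional, post-RMA ATPG).
    """
    drops = {
        agent: hvm_atpg - post_rma_atpg
        for agent, (hvm_atpg, _field_functional, post_rma_atpg) in readings.items()
    }
    worst = max(drops, key=drops.get)
    return worst, drops[worst]


# Made-up margins (ps) for three hypothetical agent locations on one returned chip.
readings = {
    "agent_core0": (38.0, 36.5, 37.2),
    "agent_core1": (37.5, 29.0, 24.1),
    "agent_io": (41.0, 40.2, 40.6),
}

agent, drop = largest_degradation(readings)
print(f"{agent}: ATPG margin dropped {drop:.1f} ps since HVM")
```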

Ready to cut your DPPM and shift left defect detection? Download our exclusive whitepaper or contact our team today at this link.

Alex Burlak

Alex Burlak is vice president of test and analytics at proteanTecs. Before joining the company, Burlak held senior director of interconnect and silicon photonics product engineering positions at Mellanox. He holds a B.Sc. in Electrical Engineering from The Israel Institute of Technology, Technion.
