Predictive modeling, strategic sampling, and embedded monitors help accelerate testing for yield-limiting defects.
As packaging complexity increases and nodes shrink, defect detection becomes significantly more difficult. Engineers must contend with subtle variations introduced during fabrication and assembly without sacrificing throughput.
New material stacks degrade signal-to-noise ratios, which makes metrology more difficult. At the same time, inspection systems face a more nuanced challenge — how to detect critical yield-limiting defects without overwhelming engineers with benign false positives that inflate costs and delay product release.
“Dimensional shrink challenges both metrology and inspection,” says Al Gamble, vice president of product marketing strategy at Onto Innovation. “Sampling requirements increase, while the need to manage and ultimately reduce cycle times remains a key performance indicator. The ability to reliably detect yield-critical defects while minimizing impact on throughput has become a defining factor for tool effectiveness.”
To address these challenges, the industry is shifting toward inspection strategies that are smarter and more targeted. One example is vector-based e-beam inspection, which navigates directly to suspected weak points rather than scanning an entire wafer in raster mode. These systems rely heavily on layout-aware predictive modeling to anticipate where failures are most likely to occur, an approach that is particularly vital for modern architectures like nanosheets or backside power delivery networks.
“If you have design information, you can simulate which areas have enough capacitance to act as a virtual ground,” says Michael Yu, vice president of Advanced Solutions at PDF Solutions. “That’s essential for making e-beam inspection viable before the backside power rail is built.”
With these advanced structures, traditional imaging tools can fall short. Without an electrical discharge path, floating or incomplete structures produce little contrast under standard inspection conditions. By integrating simulation data and structural context into targeting strategies, engineers can reduce the risk of misclassification and avoid damaging structures during inspection.
“The beam has to know where to go and how to land,” says Yu. “Using layout data to inform that targeting not only saves time, it also ensures we’re not damaging the structure or getting meaningless results.”
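As a rough illustration of how layout-aware targeting might work, the sketch below ranks hypothetical candidate sites by a simulated failure likelihood and skips structures that lack a discharge path. The data structures, fields, and thresholds are assumptions for illustration, not any vendor's actual implementation.

```python
# Minimal sketch of layout-aware e-beam targeting: rank candidate sites by a
# predicted failure likelihood so the beam visits likely weak points first.
# Feature names and the site cap are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class CandidateSite:
    x_um: float               # die coordinates of the suspected weak point
    y_um: float
    sim_risk: float           # failure likelihood from layout-aware simulation (0..1)
    has_discharge_path: bool  # whether the structure can dissipate beam charge

def plan_scan(sites, max_sites=500):
    """Return an ordered visit list, skipping sites that risk charging damage."""
    safe = [s for s in sites if s.has_discharge_path]
    ranked = sorted(safe, key=lambda s: s.sim_risk, reverse=True)
    return [(s.x_um, s.y_um) for s in ranked[:max_sites]]

sites = [CandidateSite(10.2, 44.1, 0.91, True),
         CandidateSite(73.5, 12.8, 0.42, True),
         CandidateSite(55.0, 90.3, 0.97, False)]  # floating structure, skipped
print(plan_scan(sites))
```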
As the industry leans further into data-driven test flows, improving defect classification accuracy becomes just as important as finding defects in the first place. False positives slow down production and result in unnecessary rework and wasted engineering hours. To mitigate this, AI-enabled classification models are being trained on historical defect libraries and real-time feedback loops, significantly improving both sensitivity and selectivity.
“AI-based analytics and classification improve efficiency and reduce false positives most effectively when they use feed-forward and feed-back data loops,” says Onto’s Gamble. “The best results come from metrology solutions sharing optical and algorithmic synergies.”
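A minimal sketch of that kind of loop, assuming a generic feature set and the widely used scikit-learn library rather than any specific inspection platform, might look like this:

```python
# Hedged sketch of a classifier trained on a historical defect library and
# refreshed with reviewed results (a feed-forward/feed-back loop). Feature
# columns and labels are placeholders, not a specific tool's schema.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_library = rng.normal(size=(500, 6))     # e.g. size, contrast, location features
y_library = rng.integers(0, 2, size=500)  # 1 = yield-critical, 0 = benign nuisance

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_library, y_library)

def incorporate_review(X_new, y_reviewed):
    """Fold operator-reviewed detections back into the training set and retrain."""
    global X_library, y_library, model
    X_library = np.vstack([X_library, X_new])
    y_library = np.concatenate([y_library, y_reviewed])
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X_library, y_library)

# New detections get scored; low-confidence ones are routed for human review.
scores = model.predict_proba(rng.normal(size=(10, 6)))[:, 1]
needs_review = (scores > 0.3) & (scores < 0.7)
```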
In addition to refining classification, AI also is used to reduce the volume of physical measurements needed. Rather than testing every wafer exhaustively, engineers are building statistical models that represent entire wafer populations through strategically sampled data. By aggregating measurements across multiple wafers, they can build virtual representations of process behavior that highlight subtle, repeatable deviations.
This strategy is essential for managing the balance between sensitivity and throughput. As design rules tighten and process windows shrink, metrology must be more precise — but not at the cost of cycle time. Engineers increasingly are using AI not just for analysis, but to guide where to apply deeper scrutiny and where standard monitoring suffices.
“One cannot accurately control what is inaccurately measured,” says Gamble. “Analytics on aggregated datasets allow us to catch trends that would be missed on a single wafer.”
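The aggregation idea can be illustrated with a small synthetic example. The sketch below stacks per-site measurements from many wafers and flags a site whose mean deviates from the population, a shift too small to stand out on any single wafer. Dimensions, units, and thresholds are placeholders.

```python
# Illustrative wafer-level aggregation: combine sparse site measurements from
# many wafers and flag sites whose mean deviates from the population.
import numpy as np

rng = np.random.default_rng(1)
n_wafers, n_sites = 25, 49                 # e.g. 49 measurement sites per wafer
data = rng.normal(loc=50.0, scale=0.5, size=(n_wafers, n_sites))  # nm
data[:, 17] += 0.8                         # a subtle, repeatable site-level shift

site_mean = data.mean(axis=0)
site_se = data.std(axis=0, ddof=1) / np.sqrt(n_wafers)
z = (site_mean - site_mean.mean()) / site_se

flagged = np.where(np.abs(z) > 4)[0]       # deviation invisible on one wafer
print("Sites with repeatable deviation:", flagged)
```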
Advanced packaging adds challenges
Alongside front-end scaling, the rise of advanced packaging introduces new stress points — thermal gradients, warpage, signal distortion, and cross-die interactions — which traditional test flows were not built to detect. A die that passes every electrical test in isolation still may cause system failures once integrated into a high-density 3D stack. To get ahead of these issues, simulation and modeling must move upstream in the design process.
“Adding multiple chiplets and other materials manufactured with a lot of heat can produce warping,” says John Ferguson, senior director of product management at Siemens EDA. “It’s critical to understand those impacts in advance, because once the stack is built, it’s often too late to correct.”
Thermal deformation is just one of many issues. As interconnects shrink and power delivery becomes more complex, IR drop, crosstalk, and electromigration become more problematic. To identify these potential weak spots early, engineers are using multi-physics modeling for electrical, thermal, and mechanical domains.
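As a first-order illustration of the thermo-mechanical piece, the sketch below applies the textbook Stoney approximation to estimate the curvature a stressed layer induces on a substrate through CTE mismatch. Material values are generic assumptions; production flows rely on full multi-physics finite-element analysis of the complete stack.

```python
# First-order, textbook-style estimate of warpage risk from CTE mismatch,
# using Stoney's approximation. Numbers are generic assumptions, not a recipe.
def mismatch_stress_pa(e_film_pa, nu_film, d_alpha_per_k, delta_t_k):
    """Biaxial film stress from a CTE mismatch over a temperature excursion."""
    return e_film_pa / (1.0 - nu_film) * d_alpha_per_k * delta_t_k

def stoney_curvature_per_m(stress_pa, t_film_m, e_sub_pa, nu_sub, t_sub_m):
    """Stoney's formula: substrate curvature caused by a thin stressed film."""
    return 6.0 * stress_pa * t_film_m * (1.0 - nu_sub) / (e_sub_pa * t_sub_m**2)

sigma = mismatch_stress_pa(130e9, 0.28, 12e-6, 200.0)   # e.g. thick layer vs. Si
kappa = stoney_curvature_per_m(sigma, 20e-6, 170e9, 0.28, 750e-6)
print(f"stress ~{sigma/1e6:.0f} MPa, curvature ~{kappa:.3f} 1/m")
```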
But even the most rigorous simulations can’t predict every real-world failure. That’s where embedded diagnostics come in. On-die monitors integrated during the design phase can track critical parameters in real time under normal operation, such as timing margin, thermal loading, workload stress, and voltage and clock anomalies, providing visibility into failure modes that traditional in-line tests might miss.
“Some issues on the board as well, like the power delivery path or even the heatsink mounting, can impact chip performance,” says Alex Burlak, vice president of test and analytics at proteanTecs. “We’re now able to detect conditions that were completely invisible to test in the past.”
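A simplified sketch of how monitor telemetry might expose such a board-level condition, assuming a hypothetical voltage monitor, baseline, and droop limit, is shown below.

```python
# Minimal sketch: compare observed supply droop under load against a per-device
# baseline to flag a weak power-delivery path. Names and limits are illustrative.
from statistics import mean

baseline_vmin_mv = 742          # idle-characterized minimum supply at the monitor
droop_limit_mv = 35             # assumed allowable droop under the test workload

def check_power_delivery(vmin_samples_mv):
    """Flag excessive droop that electrical test of the bare die would not see."""
    observed = mean(vmin_samples_mv)
    droop = baseline_vmin_mv - observed
    return {"droop_mv": droop, "suspect_pdn_or_heatsink": droop > droop_limit_mv}

print(check_power_delivery([700, 698, 705, 702]))  # flags the assembly, not the die
```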
This shift, from detecting isolated defects to understanding broader reliability trends, has fundamentally transformed test. Instead of acting as a gatekeeper between the fab and the field, test is evolving into a continuous learning engine that feeds back into simulation, design improvement, and process refinement.
Fig. 1: End-to-end analytics using on-chip agents across production and field. Source: proteanTecs
Maintaining that loop requires an infrastructure that can unify data streams from disparate vendors, facilities, and tools. In particular, chiplet-based packaging makes traceability non-negotiable. A single product may integrate dies fabricated in different countries, assembled by separate OSATs, and tested on disjointed platforms. Without shared visibility and data governance, failure analysis and predictive modeling fall apart.
“You have to make sure all the data flows into the same place before making a decision, and know that everything is aligned and traceable,” says Yu. “The challenge isn’t building the AI model. It’s getting clean, synchronized data from multiple sources, with security and governance in place.”
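One way to picture that alignment problem is as a genealogy record keyed by die identity and merged from per-source data. The field names and sources below are hypothetical; real flows follow industry traceability standards and far richer schemas.

```python
# Hypothetical unification of fab, OSAT, and test records into one traceable
# genealogy keyed by a die identifier.
fab_records = {"DIE-0001": {"fab": "Fab A", "lot": "L123", "wafer": 7, "x": 12, "y": 4}}
osat_records = {"DIE-0001": {"osat": "OSAT B", "package_id": "PKG-88", "stack_pos": 2}}
test_records = {"DIE-0001": {"platform": "Tester C", "bin": 1, "vmin_mv": 712}}

def build_genealogy(die_id):
    """Merge per-source records so failure analysis can trace across the chain."""
    merged = {"die_id": die_id}
    for source in (fab_records, osat_records, test_records):
        merged.update(source.get(die_id, {}))
    return merged

print(build_genealogy("DIE-0001"))
```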
As fabs, equipment vendors, fabless companies, and other partners look to enable this kind of collaboration without compromising IP, privacy-preserving techniques are gaining ground. “There is a way to do AI without decrypting the information, using various zero-trust approaches, including potentially homomorphic encryption,” says Yu. “This allows all parties to collaborate without giving up IP or compromising security.”
The more granular and predictive test becomes, the more it shifts away from binary decision-making and toward nuanced risk assessment. Engineers are not just trying to find whether a chip fails, but why, and whether it will fail later. Detecting early signs of margin loss or degradation helps prevent field failures, but only if tools, models, and data infrastructure can keep up.
From gatekeeping to continuous improvement
Traditional views of test as a one-time filter are no longer viable. A chip may pass wafer-level and package-level tests, only to exhibit failures after integration or under field conditions. This is particularly true in chiplet-based architectures, where tight thermal coupling and electrical interactions can create edge-case scenarios that are invisible to standard inspection methods.
“Parametric issues, like signal degradation across the interposer, are much harder to detect,” says Yu. “These aren’t opens or shorts you can catch optically. They require a combination of characterization, simulation, and predictive modeling.”
To address this, test strategies now emphasize lifecycle modeling and feedback. For example, multi-physics simulation is used to anticipate stress-driven effects like electromigration or thermal fatigue, while embedded monitors provide runtime visibility into evolving behavior. What matters increasingly is not whether a die passes at time zero, but how it performs after hours, weeks, or months of operation.
This evolution makes test a continuous function that spans every phase of the product lifecycle. Field data, simulation results, and in-line measurements all contribute to a shared understanding of risk. In this new model, yield is no longer a static metric. It has evolved into a dynamic, traceable outcome influenced by design, materials, environment, and assembly decisions.
“Test is increasingly moving from a gate function to a continuous improvement function,” says Burlak. “It’s about understanding marginalities, not just identifying defects.”
This approach necessitates smarter tools. Adaptive test programs, informed by prior wafer data and environmental telemetry, can adjust pass/fail thresholds or skip redundant steps. Meanwhile, integration between simulation and inspection allows real-world results to refine predictive models in near real time. The goal has shifted from just coverage to actionable insight.
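As a hedged sketch of adaptive limit-setting, in the spirit of dynamic part-average testing, the example below tightens a pass window around the behavior of prior parts while never exceeding the static limits. The 6-sigma policy and values are assumptions.

```python
# Adaptive pass/fail window derived from prior readings, bounded by static limits.
import statistics

def adaptive_limits(prior_readings, k=6.0, static_lo=0.0, static_hi=100.0):
    """Center a k-sigma window on the prior population, clipped to static limits."""
    mu = statistics.fmean(prior_readings)
    sigma = statistics.stdev(prior_readings)
    lo = max(static_lo, mu - k * sigma)     # never looser than the static limits
    hi = min(static_hi, mu + k * sigma)
    return lo, hi

prior = [48.9, 50.1, 49.5, 50.4, 49.8, 50.0, 49.3, 50.2]
lo, hi = adaptive_limits(prior)
print(f"pass window: {lo:.2f}..{hi:.2f}")   # outliers inside static limits still fail
```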
AI plays a pivotal role in orchestrating this complexity. Where conventional test flows applied the same procedures to every unit, AI models help determine when, where, and how to test based on risk profiles. These insights are drawn from accumulated data across wafers, lots, and test stages, and they allow engineers to focus efforts precisely where failure is most likely.
“We’re now able to use AI to prioritize where to inspect, not just how to classify,” says Gamble. “That helps us scale sensitivity without losing efficiency.”
Toward predictive and adaptive test
Even with AI and adaptive flows, test engineers still face fundamental tradeoffs. Advanced nodes demand atomic-scale precision, yet production cycles allow only limited time for inspection. The solution isn’t always to measure more. It’s also to measure smarter by using virtual metrology and hybrid approaches to infer critical information from correlated indicators.
One such emerging technique involves data fusion — combining outputs from different tools to estimate values that would otherwise require invasive or costly steps. The quality of these inferences depends on tight correlation models and prior knowledge of how processes influence measurable characteristics.
“We’re seeing greater emphasis on correlation-based metrology models, especially where direct measurement is slow or invasive,” says Gamble. “When you know your upstream processes and their signature effects, you can make confident predictions downstream.”
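A minimal example of such a correlation model, using synthetic data and a simple ridge regression in place of whatever estimator a production virtual-metrology system would employ, could look like this:

```python
# Correlation-based virtual metrology sketch: fit a regression from fast
# upstream indicators to a slow or destructive measurement, then predict it
# for wafers that were never physically measured. Data here is synthetic.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))                          # upstream indicators per wafer
true_coef = np.array([1.2, -0.7, 0.3, 0.0])
y = X @ true_coef + rng.normal(scale=0.1, size=200)    # reference metrology values

model = Ridge(alpha=1.0).fit(X[:150], y[:150])         # train on sampled wafers
pred = model.predict(X[150:])                          # infer the rest virtually
print("mean abs error:", np.mean(np.abs(pred - y[150:])))
```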
Confidence in these models allows fabs to reduce sampling rates, lower costs, and still maintain control. But this requires consistency and calibration across global operations. Equipment must remain in alignment, recipes must stay consistent, and tools must recognize drift before it becomes a problem.
Digital twins offer a way to model these interactions comprehensively. By combining simulation with real-world performance data, digital twins create a living profile of how a device behaves across different scenarios. These models don’t replicate every detail. Instead, they focus on key performance drivers, such as temperature response, signal distortion, or mechanical deformation, and they evolve as new data becomes available.
“Digital twins allow us to model how the device will behave in different environments, not just in the lab,” says Ferguson. “That gives us predictive insight we wouldn’t get from test alone.”
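The sketch below is a deliberately tiny stand-in for a digital twin: a single simulated thermal coefficient that is nudged toward observed field behavior as telemetry arrives. Real twins model far more, but the assimilation pattern is similar.

```python
# Toy digital-twin update loop: start from a simulated thermal coefficient and
# blend in observed field behavior. Model and blending factor are illustrative.
class ThermalTwin:
    def __init__(self, c_sim_deg_per_w=0.45):
        self.c = c_sim_deg_per_w            # deg C rise per watt, from simulation

    def predict_rise(self, power_w):
        return self.c * power_w

    def assimilate(self, power_w, observed_rise_c, alpha=0.2):
        """Nudge the coefficient toward what the fielded device actually shows."""
        observed_c = observed_rise_c / power_w
        self.c = (1 - alpha) * self.c + alpha * observed_c

twin = ThermalTwin()
twin.assimilate(power_w=12.0, observed_rise_c=6.3)     # field telemetry sample
print(round(twin.predict_rise(12.0), 2))               # prediction drifts toward reality
```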
This predictive capability is becoming increasingly essential for emerging technologies like RF, photonics, and mmWave communication. These domains are highly sensitive to subtle shifts in materials, process variation, and parasitics, each of which may have minimal effect on digital logic, but a significant impact on analog performance.
“RF, analog, and photonics add layers of complexity that demand different simulation and test tools,” says Ferguson. “Their sensitivity to material variation and parasitics makes them difficult to validate without multi-physics models.”
Packaging complexity demands integrated test solutions
The testing landscape becomes even more complicated in the context of heterogeneous integration. A small thermal deviation in one chiplet may shift the entire stack’s behavior, and performance may change dramatically based on the assembly’s physical configuration. In these systems, validating each die independently isn’t enough. Engineers must understand how those dies perform together under real workloads.
“We’re seeing a growing need for co-optimization between power, performance, and test,” says Marc Hutner, director of product management for Tessent Learning at Siemens EDA. “It’s no longer about validating one block at a time; it’s about understanding the system context in which those blocks will operate.”
That system-level perspective transforms test into an ecosystem-wide activity. Test must now encompass more than just silicon — interposers, substrates, thermal interfaces, and system enclosures. And as more of these components are sourced from different locations and vendors, the demand for secure, cross-enterprise data coordination becomes urgent.
Silicon lifecycle management software aims to provide the foundation for tracking this ecosystem-wide activity, which also includes failure analysis (FA) data. “One vision is to be able to do FA root-cause correlation throughout the entire life cycle,” says Guy Cortez, senior principal product manager at Synopsys. “With 2.5D and 3D you have interposers and TSVs, and these connecting vias have to be considered a potential source of failure.”
That creates other problems. “There’s still a lot of friction when it comes to interoperability,” says Yu. “Not everyone wants to share their process knobs or model internals. That’s where various zero-trust approaches, including potentially homomorphic encryption, are critical — doing the analysis without exposing the underlying data.”
The result is a complex balancing act. Companies must enable data-sharing without compromising trade secrets. Engineers must gather enough insight to make informed decisions, without drowning in unnecessary measurements. And test teams must adapt their methods to continuously improve process control and system robustness, not just to catch errors.
A new role for test
The transformation of semiconductor test is not simply one of scale or speed. It’s one of identity. What used to serve as the last line of defense is now a continuous, distributed process that links simulation, fabrication, and system-level performance. As device architectures evolve and integration deepens, testing must become more predictive, more adaptive, and more intelligent.
This change also is reshaping the role of the test engineer. Mastery of traditional methods is no longer sufficient. Engineers today must understand statistical modeling, data pipelines, AI integration, and system-level co-simulation. Tools and workflows are becoming increasingly software-defined, and success depends on the ability to interpret complex signals in context.
“There’s no longer a single phase where you ‘do the test,’” says Gamble. “It’s built into the entire lifecycle, from first silicon to final system performance.”
With embedded monitors capturing live telemetry and digital twins simulating system behavior, test strategies are now being used to anticipate, not just identify, potential failures. This approach helps to catch issues before they manifest, enable targeted process optimization, and support faster time-to-yield.
“Rather than trying to catch every defect in the traditional sense, we’re using embedded monitors and behavioral machine learning models to detect the effects of faults as they evolve,” explains Burlak. “The test doesn’t have to be exhaustive. It just has to catch the earliest signs of trouble, even if the defect hasn’t fully manifested yet.”
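In that spirit, the toy example below fits a trend to a timing-margin history from a hypothetical on-die monitor and flags the part if the margin is projected to cross an assumed floor well before functional failure.

```python
# Early-warning trend check on a timing-margin monitor: fit a slope and flag
# parts projected to cross a floor. Floor and horizon are assumed values.
import numpy as np

hours = np.array([0, 100, 200, 300, 400, 500], dtype=float)
margin_ps = np.array([62.0, 61.1, 60.3, 59.2, 58.4, 57.5])   # slow downward drift

slope, intercept = np.polyfit(hours, margin_ps, 1)            # ps per hour
projected = intercept + slope * 2000.0                        # margin at 2,000 h
print({"slope_ps_per_h": round(slope, 4),
       "flag_for_action": bool(projected < 45.0)})            # floor assumption
```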
Ultimately, this convergence of simulation, test, and AI creates a test infrastructure that is both proactive and reactive. With better alignment across stakeholders — from fab to field — semiconductor companies can build more resilient products, identify systemic risks early, and iterate faster in a fiercely competitive market.
Conclusion
The increasing complexity of advanced semiconductor nodes and heterogeneous packaging has redefined what test must accomplish. It is no longer enough to detect defects after they occur. The future of testing lies in prediction, adaptation, and integration across the entire lifecycle of the device.
Powered by AI, digital twins, and embedded diagnostics, test is becoming a collaborative engine of insight. And rather than acting as a barrier between design and production, it now serves as the connective fiber that enables faster cycles of learning, more precise manufacturing control, and ultimately, more reliable chips.