AI/ML Challenges In Test and Metrology

New tools are changing the game, but it will take time and collaboration for them to achieve their full potential.


The integration of artificial intelligence and machine learning (AI/ML) into semiconductor test and metrology is redefining the landscape for chip fabrication, which will be essential at advanced nodes and in increasingly dense advanced packages.

Fabs today are inundated by vast amounts of data collected across multiple manufacturing processes, and AI/ML solutions are viewed as essential for managing petabytes of data and extracting valuable insights from it. But fully leveraging the potential of this technology requires rigorous analysis, rapid interpretation, and decisive action — and overcoming a variety of hurdles.

The value of AI/ML is clear to everyone in the industry. “AI will be transformative in the test and measurement space,” says Charles Schroeder, an NI fellow. “It accelerates bringing quality products to markets safely and efficiently.”

There are limits, however. While these tools can efficiently pinpoint potential failures before they manifest, they cannot recognize and address anomalies they have never encountered. Traditional AI/ML models excel only at what can be anticipated in advance, particularly when production scenarios closely match the training dataset. With unexpected faults and outliers, it’s not clear how these systems will behave.

“Unexpected failures can be a big challenge for these algorithms,” says Prasad Bachiraju, director for sales and customer solutions at Onto Innovation. “Defect detection and classification works well based on the image library you train it on, but when there are changes in the production process or the tool recipe, or some other factor on the wafer substrate, the trained image library may no longer be sufficient for detecting a defect and classifying it.”

This shortfall highlights a growing challenge within AI/ML research, namely how to imbue machines with the ability to flag unexpected issues regardless of whether they have encountered similar problems in the past. That requires a model that can recall and compare, as well as predict and reason.

“With AI models, especially, you need to train them on a lot of data,” says Dieter Rathei, CEO of DR Yield. “Once you have all the data, you can experiment with different models. But unless you have the data, you can’t start playing around with new ideas about what you could model in the future or what you could catch.”

Hunting the unknown
Tackling the unknown in semiconductor test and metrology is as crucial as perfecting the known. It is not enough to build complex models. The solution must be adaptable to specific customer needs, while still identifying novel issues that appear in the analyses. That requires having the right support system in place to smoothly handle data from start to finish.

“The key is not just developing sophisticated algorithms,” says Frank Chen, director of applications and product management at Bruker Nano Surfaces & Metrology. “It also requires establishing a workflow that accommodates the varying needs of customers, and the unknowns that may arise during the scanning process.”

This can change from one design to the next. “One of the main challenges is ensuring the data is ready and accessible,” says Greg Prewitt, director of Exensio Solutions at PDF Solutions. “It involves collecting vast amounts of data, and training a capable ML model without overfitting it to a specific scenario.”

This setup allows predictive models to quickly adjust to changes, which is vital for keeping a manufacturing system running smoothly, especially considering all the complexities involved. But manufacturers are wary about integrating new AI/ML models that potentially can flag false positives or generate escapes.

“Customers are taking cautious steps with AI/ML, thoroughly testing models within known parameters before fully integrating them into their systems,” says Eli Roth, product manager for smart manufacturing at Teradyne. “There’s an understanding that false failures are presently more acceptable than missing defects.”

AI/ML must navigate these decision-making thresholds with precision, balancing multiple factors to accurately distinguish between pass and fail outcomes. That balance is essential to avoid the pitfalls of data overload, pinpointing genuine concerns without being sidetracked by inconsequential data variances. “It’s critical that the model can identify anomalies without dramatic disruptions,” adds Roth. “The models need to handle unknowns efficiently while maintaining system stability.”

This is easier said than done, however. “In the real-world application of AI/ML, setting the right thresholds, especially in contexts like pass/fail decisions, is pivotal,” says Ira Leventhal, vice president of Applied Research and Technology for Advantest. “It requires a thoughtful balance and integration of various factors and careful consideration of the objectives you are looking to optimize.”

Historically, anomaly detection has been accomplished through dynamic part average testing (DPAT), a univariate approach that examines individual parameters in isolation. But as chips increase in complexity, generating more data — and potentially more false positives or escapes — a univariate approach is no longer sufficient.

“Dynamic part average testing has been done for a long time, but DPAT is a univariate approach,” says Jeff David, vice president of AI solutions at PDF Solutions. “With machine learning, anomaly detection can be applied in a multivariate space, meaning you can find what is anomalous by looking at more than one parameter at a time.”

Such analyses enhance the detection process, allowing manufacturers to spot defects that a univariate system might miss, and to discern patterns indicating systemic issues that require broader operational adjustments. The ability of ML models to analyze an integrated spectrum of variables equips fabs with a more comprehensive, nuanced view of potential failures.
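The advantage of the multivariate approach can be sketched in a few lines. The example below is an illustrative toy, not any vendor's implementation: it constructs a die whose readings pass a per-parameter, DPAT-style z-score screen individually, yet are clearly anomalous once the strong correlation between two parameters is taken into account, using Mahalanobis distance as the multivariate score. The data, covariance, and thresholds are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated parametric test data: two strongly correlated test parameters
# per die (hypothetical values for illustration).
cov = [[1.0, 0.9], [0.9, 1.0]]
good = rng.multivariate_normal([0.0, 0.0], cov, size=500)

# An outlier that looks unremarkable on each axis alone (about 1.5 sigma),
# but violates the correlation between the two parameters.
outlier = np.array([1.5, -1.5])
data = np.vstack([good, outlier])

# Univariate, DPAT-style screen: z-score each parameter independently.
z = np.abs((data - data.mean(axis=0)) / data.std(axis=0))
univariate_flag = (z > 3.0).any(axis=1)

# Multivariate screen: Mahalanobis distance accounts for the correlation.
mu = data.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(data, rowvar=False))
delta = data - mu
d = np.sqrt(np.einsum("ij,jk,ik->i", delta, inv_cov, delta))
multivariate_flag = d > 4.0

print("univariate catches the outlier: ", bool(univariate_flag[-1]))
print("multivariate catches the outlier:", bool(multivariate_flag[-1]))
```

The univariate screen misses the outlier because each reading is within normal limits on its own; only the joint view reveals that the pair of readings is inconsistent with the population.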

Fig. 1: Multivariate analyses can improve accuracy by looking at multiple parameters simultaneously. Source: PDF Solutions

“Machine learning can comprehend and build models beyond the scope of the human mind,” says Prewitt. “This allows for an analysis that includes hundreds of parameters, rather than the few a person can typically handle.”

Integration challenges
Effectively integrating AI/ML into semiconductor fabs presents a unique set of hurdles. Fabs must contend with ensuring the quality and relevance of the data feeding into AI models, the computational demands of processing large datasets in real time, and the cultural shift required to fully embrace these advanced technologies, especially when it involves competing companies that may not want to share data.

“One of the key challenges in AI implementation is ensuring comprehensive data collection,” says Michael Yu, vice president of advanced solutions product group at PDF Solutions. “Without capturing all relevant factors, even the most sophisticated algorithms cannot build an accurate model.”

The heterogeneous nature of semiconductor manufacturing processes also means that data can come in disparate forms, from equipment performance metrics to silicon wafer inspection results.

“Having all the relevant building blocks integrated into the manufacturing flow is a challenge,” says Alex Burlak, vice president of test and analytics for proteanTecs, pointing to several requirements for solving these issues:

  • In-die monitoring IP that can provide an accurate fingerprint from within the chip;
  • A centralized analytics platform that can absorb all the relevant data and enable data preparation and data engineering capabilities; and
  • Analytics capabilities to train the AI/ML models and then streamline these models back into the test program to enable inference per chip in production and real-time decisions during the test insertion.

Gathering the necessary data is far from straightforward. It means overcoming data silos across departments, addressing compatibility issues between different equipment manufacturers, and making the non-trivial adjustments to sensors and machines needed to ensure accuracy and granularity.

“The success of AI relies heavily on sufficient training data,” adds Schroeder. “Engineering synthetic data is useful for efficiency, but it’s the measurement data of real-world nuances that is truly invaluable.”

Over-reliance on simulation data runs the risk of overlooking subtle but crucial real-world parameters that can skew model outputs, leading to decisions that might compromise the product quality or yield. Put simply, the integrity of AI’s predictive power hinges on the breadth and depth of data ingested.

“What happens if you try to deploy a model for yield prediction based on one set of tests? There are many factors that contribute to failures that are ignored or are not readily visible in the data set if, for example, you focus on wafer probe data alone, or final test data alone,” says Onto’s Bachiraju. “You can’t build an accurate model without looking at different data types with the right amount of production sampling.”

This becomes even more complex with constant changes to the data, due to technological upgrades and process optimization. “Whenever there is a process change, there could be a different baseline for the model,” says Bruker’s Chen. “Once that is known, you may need to update or re-train the AI model to the new baseline.”

A fab needs to collect extensive volumes of data, but it also must establish a system of traceability that ensures the provenance and quality of the data throughout the semiconductor’s lifecycle. “When you have the data from the entire supply chain integrated, you get better results in predicting the final test yield,” says Bachiraju. “Being able to learn from the back end to the front end and apply that knowledge is critical to AI/ML success, but only a few companies can do that.”

Traceability underpins the reliability of AI-driven analytics, enabling fabs to respond to issues along the production line, as well as to anticipate them. It facilitates a feedback loop where outcomes can be traced back to specific data inputs, forging a path to continuous improvement and learning for machine learning models. It is also a protective measure against potential faults and inconsistencies that might otherwise propagate unnoticed.

“Traceability is paramount,” adds PDF’s David. “Without it, options are limited, even with massive amounts of data. Especially as we advance into the realm of chiplets, ensuring interconnected data is critical for insightful analytics and operational efficiency.”

The AI/ML revolution
AI/ML technologies present a unique set of advantages within the fab environment, transforming data into actionable intelligence in ways that are both revolutionary and fundamental to the advancement of semiconductor manufacturing. In the midst of the data-dense environment of modern fabs, AI/ML leverages computational advancements to tackle problems at scales and speeds previously unattainable.

“Through AI, we are not only able to accelerate the identification of known issues, but also adapt to novel observations,” says Bruker’s Chen. “It allows us to predict and manage outliers more efficiently than ever.”

One of the most impactful advantages of AI/ML is its capability for predictive analytics. In a fab setting, where downtime can translate to significant financial loss, AI/ML models can forecast equipment failures before they occur. That enables proactive maintenance and scheduling, rather than maintaining all equipment on a fixed schedule.

“There is no doubt that AI/ML significantly contributes to cost reduction in testing while not risking the product’s quality/reliability,” says proteanTecs’ Burlak. “AI/ML provides the capabilities to weed out marginal or latent defects in production testing, VDDmin/Fmax path limiters, assign more accurate process binning, and more efficiently optimize power/performance.”

This predictive capacity mitigates the risk of unplanned operational halts, while also extending the life of valuable equipment, thereby optimizing investment and resource allocation.

“AI holds a promising future in predictive analytics, with the potential not only to identify but also to anticipate problems,” says Mark Kahwati, product marketing director in Teradyne’s semiconductor testing group. “This fosters proactive adaptations in manufacturing processes.”

Moreover, AI/ML applications in fabs enhance the depth and quality of process control. By continuously analyzing process data, these systems are able to detect deviations that might indicate a shift in process parameters or the onset of a defect trend. This allows for real-time adjustments and an agility that traditional methods — often hampered by latency in feedback loops — simply cannot match.

“AI/ML can be surprisingly adept at identifying unseen problems, even those previously unencountered,” says PDF’s David. “There are a number of different machine learning techniques that can be leveraged to do this.”

One often under-appreciated advantage is AI/ML’s role in democratizing data within the fab. These technologies can distill complex datasets into digestible insights, making the data accessible to stakeholders who may not have deep technical expertise. Consequently, decision-making is no longer confined to a silo of data scientists or process engineers. It becomes part of a collective, informed strategy.

In addition, the integration of AI/ML fosters innovation by eliminating the tedium of data sifting. Engineers and technicians can focus on creative problem solving and strategic planning, while AI/ML algorithms handle the repetitive and time-consuming tasks of data analysis. “Leveraging unsupervised learning in conjunction with traditional AI models holds significant, yet underutilized, potential to detect emerging patterns and bring about optimization of the overall semiconductor manufacturing processes,” says Advantest’s Leventhal.

This shift in focus can lead to process improvement, equipment design, and operational workflows that might otherwise remain obscured beneath the noise of raw data.

“Machine Learning also lets customers use upstream data to help determine if a failure is likely to occur downstream,” adds David. “An example already commonly deployed in the industry is if you see a lot of failures in a neighborhood of a die on the wafer at an earlier test insertion, then the probability that a good die in that bad neighborhood (GDBN) will fail at a later test insertion goes up. With machine learning, this type of methodology can be employed in a way that produces better failure prediction. Models can be trained around a set of failures downstream, so we know that the likelihood of finding failures caused by unseen root causes increases.”
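The GDBN heuristic David describes can be illustrated with a toy wafer map. This is a simplified sketch with made-up data: a die that passed an earlier test insertion, but whose immediate neighborhood is dominated by failures, is flagged as a downstream risk. The 3x3 neighborhood and the 50% failure threshold are arbitrary assumptions for illustration; a production model would learn such parameters from downstream failure data.

```python
import numpy as np

# Toy wafer map from an earlier test insertion: 1 = fail, 0 = pass.
# (Made-up pattern for illustration.)
wafer = np.zeros((7, 7), dtype=int)
wafer[0:3, 0:3] = 1      # a cluster of failures in one corner
wafer[1, 1] = 0          # one passing die inside the failure cluster

def neighborhood_fail_rate(w, r, c, radius=1):
    """Fraction of failing dies among the neighbors of die (r, c)."""
    r0, r1 = max(r - radius, 0), min(r + radius + 1, w.shape[0])
    c0, c1 = max(c - radius, 0), min(c + radius + 1, w.shape[1])
    block = w[r0:r1, c0:c1]
    neighbors = block.size - 1            # exclude the die itself
    return (block.sum() - w[r, c]) / neighbors

# Flag passing dies whose neighborhood is mostly failures (GDBN candidates).
gdbn = [(r, c) for r in range(wafer.shape[0]) for c in range(wafer.shape[1])
        if wafer[r, c] == 0 and neighborhood_fail_rate(wafer, r, c) > 0.5]

print(gdbn)  # the lone passing die inside the cluster: [(1, 1)]
```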

Furthermore, as fabs shift toward chiplets and heterogeneous integration, AI/ML adds scalability to manage increased complexity. The layering of data from multiple process steps and diverse equipment types requires a level of comprehensive analysis only feasible with AI/ML. These systems can integrate cross-domain data to provide a unified view of the manufacturing process, which is indispensable for the multi-faceted decision-making needed for multi-die/dielet devices.

“The emergence of multi-chip packages and heterogeneous integrated circuits points to the necessity of more sophisticated ML applications to maintain high quality,” says PDF’s Prewitt. “It’s not just a nice-to-have. It will soon be a requirement.”

The human element
While AI/ML models offer significant computational efficiency and data-processing capabilities, they are not autonomous arbiters of truth. Instead, they are tools that are crafted, honed, and overseen by humans. It is the synergy between human intuition and AI/ML’s analytical power that turns raw computational potential into practical, nuanced solutions.

“The human in the loop is important, especially when unsupervised learning techniques are leveraged,” says PDF’s David. “Users need to interact with the system to clarify which anomalies are truly indicative of a fault. This active learning approach refines the model to be more effective and reduce false positives. We refer to this as collaborative learning.”
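The collaborative-learning loop David describes, in which an engineer confirms or rejects borderline flags so the detector's operating point improves, can be sketched roughly as follows. Everything here is hypothetical: the anomaly scores, the review budget, and the threshold-adjustment rule stand in for whatever a production system would use, and an array of ground-truth labels plays the role of the human reviewer.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical anomaly scores from an unsupervised detector
# (higher = more anomalous): 950 normal dies plus 50 true faults.
scores = np.concatenate([rng.normal(0.0, 1.0, 950),
                         rng.normal(4.0, 1.0, 50)])
is_fault = np.arange(scores.size) >= 950   # stands in for human judgment

threshold = 1.0                            # deliberately aggressive start
for _ in range(5):                         # review rounds
    flagged = np.where(scores > threshold)[0]
    # Ask the engineer to review only the 20 most borderline flagged dies.
    borderline = flagged[np.argsort(scores[flagged])[:20]]
    verdicts = is_fault[borderline]        # "human" labels for this batch
    if verdicts.mean() < 0.5:
        threshold += 0.25                  # mostly false positives: tighten
    else:
        break                              # flags look real: stop adjusting

print("calibrated threshold:", threshold)
```

The point of the sketch is the feedback loop itself: each round of human review on a small, borderline sample nudges the model's operating point, reducing false positives without relabeling the full dataset.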

Human interaction with AI modeling is indispensable for several reasons. First, it provides the domain expertise necessary to guide the AI/ML algorithms towards relevant problem areas and meaningful interpretations of complex data sets. Humans, with their unique ability to understand context and nuance, are essential for defining the precise parameters within which AI operates and evolves.

“Overfitting and misunderstanding physical limitations within measurement data can lead to models that are precise but not predictive,” says PDF’s Yu. “There is a need for a deep understanding of the physical processes we’re modeling.”

Furthermore, while AI can sift through data and identify patterns at superhuman speeds, it is the human experts who inject an understanding of the manufacturing environment’s subtleties, the ‘whys’ behind the ‘whats,’ which AI on its own cannot decipher.

“Deploying ML models in production brings up crucial questions such as model explainability,” says Prewitt. “In sensitive applications, understanding the ‘why’ behind a model’s prediction is often as important as the prediction itself. Explainability is an active topic in the greater AI/ML community. In the meantime, it is advisable to implement monitoring, observation and control mechanisms around deployed models.”

This symbiotic relationship extends further. Humans are essential in the iterative process of model refinement, helping to tune AI/ML systems to align with evolving operational goals and the fab’s changing realities. While AI can propose correlations and predictions based on data, humans are required to make the critical judgments on their validity and practical implementation. It’s a continuous dialogue, with AI providing the insights and humans ensuring these insights align with production objectives.

“AI may not yet excel in handling the unexpected, but it excels in flagging anomalies that signal a problem, demanding human judgment to resolve these potential issues,” says Kahwati.

Bridging the data divide
“Embracing AI and ML is not just about staying current. It’s a strategic imperative,” says Advantest’s Leventhal. “Companies that integrate these technologies proactively into their processes are not merely participating in a trend. They are setting the stage for a competitive metamorphosis. Those delaying the embrace of AI/ML may well find themselves rapidly outpaced in innovation, efficiency, and ultimately market share.”

This AI/ML revolution hinges on both the algorithms and the appetite for broader collaboration across the semiconductor supply chain. Yet the full realization of this potential is entangled with the industry’s resistance to sharing proprietary data, creating bottlenecks that could stifle AI/ML’s transformative impact.

“The industry still lacks a way to implement AI/ML consistently across the supply chain,” adds Bachiraju. “Right now, models are different between different fabs. It’s not a true supply chain because people are protecting their data.”

This protective stance of data ownership is not without reason. Intellectual property is the lifeblood of semiconductor manufacturers, often dictating market competitiveness. But this fragmented landscape also leads to isolated data models that yield disparate results — far from the cohesive, intelligent supply chain that AI/ML promises.

“Historically defensive postures between semiconductor value chain partners, such as foundries and fabless companies, can impede progress in applying ML algorithms on the wealth of semiconductor-related data to achieve greater levels of correlation and understanding,” says Leventhal.

The potential for AI/ML is hampered by an industry culture that views data sharing with suspicion. This paradigm has led to a scenario where, rather than capitalizing on the compounded value of shared insights, companies are stuck reinventing the wheel with their data models. This is a costly redundancy that hampers efficiency and scale.

“There are situations where you don’t get the right quality of data, or get unlabeled data,” says Bachiraju. “Customers often expect the AI to automatically calculate what is good and what is bad, but the system has to be taught those boundaries by the user. The lack of proper labeling can be one of the biggest challenges we have. Most suppliers have to work with limited sets of data because companies don’t want to risk their IP.”

The transition to more collaborative efforts in the semiconductor industry is both advantageous and essential. The guarded nature of data sharing, stemming from valid concerns over intellectual property protection, is causing substantial roadblocks to the progression of AI/ML models. “There should be a way for companies to work together on common models,” says Bachiraju. “We have the technology. There is no storage problem. There is no performance problem. The GPUs are there. But we need to come together in partnerships to fully leverage the power of AI.”

What’s required is a pivot from isolated innovation to coordinated advancement. A change from ‘my data’ to ‘our insights’ could unlock significant levels of improvement in yields, process optimization, and defect reduction across the supply chain. “Creating a standardized environment where varied machine learning models can be employed is essential, because which machine learning model is most effective will depend on the data and its inherent distribution,” adds David. “Industry standards would allow for a smoother integration into production and widen the effective use of machine learning.”

This strategic shift necessitates the development of industry standards around AI/ML models. Such standards could serve as blueprints guiding the generation, sharing, and utilization of data without compromising proprietary algorithms or sensitive information. By operating within an agreed framework, companies can contribute to a collective intelligence pool, ensuring that each entity benefits from progress without exposing itself to competitive vulnerability or IP theft.

“In an industry where intellectual property is fiercely guarded, ensuring encrypted data streams is paramount for client confidence in the models,” says Teradyne’s Roth. “Standardization in model sharing could enhance AI and ML application, but proprietary concerns will likely maintain a status quo of individualized model creation.”

Solving this problem requires trust, compliance mechanisms, and a way of sharing data without exposing proprietary secrets. “The future of chiplets and integrated devices necessitates more open data sharing,” adds Roth. “Practical solutions must be found to balance IP protection with the need for collective process improvement.”

This is where federated learning fits into the picture. AI models can be trained collaboratively without ever exposing the actual data. Taking that one step further, homomorphic encryption allows computations to be performed on encrypted data.
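Federated learning can be sketched with a minimal federated-averaging loop. In this illustrative example (synthetic data and a deliberately simple linear model, not a real fab workload), each site runs gradient steps on its own private data and shares only the resulting parameters; the coordinator averages those parameters and never sees a single raw measurement.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two sites each hold private data relating a test parameter x to an
# outcome y. True relationship (unknown to the model): y = 2.0*x + 0.5.
def make_private_data(n):
    x = rng.uniform(-1.0, 1.0, n)
    y = 2.0 * x + 0.5 + rng.normal(0.0, 0.05, n)
    return x, y

sites = [make_private_data(200), make_private_data(200)]

def local_update(w, b, x, y, lr=0.1, steps=20):
    """Gradient steps on local data only; the data never leaves the site."""
    for _ in range(steps):
        err = (w * x + b) - y
        w -= lr * 2.0 * np.mean(err * x)
        b -= lr * 2.0 * np.mean(err)
    return w, b

# Federated averaging: the coordinator sees parameters, never raw data.
w, b = 0.0, 0.0
for _ in range(10):                       # communication rounds
    updates = [local_update(w, b, x, y) for x, y in sites]
    w = float(np.mean([u[0] for u in updates]))
    b = float(np.mean([u[1] for u in updates]))

print(round(w, 1), round(b, 1))           # converges toward 2.0 and 0.5
```

Homomorphic encryption goes a step further than this sketch: even the shared parameter updates can be aggregated while encrypted, so the coordinator learns only the averaged result.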

“There are tremendous opportunities to apply AI and ML more effectively — especially by not treating each case in isolation, but rather by understanding the interaction between different process steps and the relevance of design DNA from product tape-outs, which could lead to more generic and scalable models,” says Yu.

The creation of these standards could help foster cooperation among competitors, foundries, and third-party suppliers, as well as the development of models capable of learning from an integrated, industry-wide dataset that’s vastly more informative and predictive than any single company could compile on its own. According to SEMI, however, the organization is “not actively working on AI/ML standards.”

Forging a path forward also will require dialog and diplomacy. This is a complex challenge that spans technical, legal, and competitive domains. But if the semiconductor industry can successfully navigate these complexities with a commitment to collaboration, it would go a long way to improving reliability and predictability across the chip industry.

Embracing AI/ML in test and metrology can make a big difference for manufacturers, helping operations run smoother, surfacing problems faster, and producing better chips. But these tools are not magic. They need to learn and improve over time.

“It’s important to understand that these models aren’t perfect yet,” says Bachiraju. “You’re not going to have zero escapes yet, but the investments that you put into it will continue to improve that efficiency over time. So it’s worth the investment now, but don’t expect miracles from day one, because it’s a learning process.”

For chipmakers to get the most from AI and ML, they need to accept that these tools won’t be perfect from the start, keep refining them, and, most importantly, be ready to team up and share data. That is the way forward to making chips smarter, faster, and better for everyone.

“AI and ML don’t have to be perfect from the start, but should improve processes over time,” says Roth. “Educating the industry about realistic expectations and the need for collaborative efforts in ML application is essential for progress.”
