Fabs Begin Ramping Up Machine Learning

New models can debug processes and boost yield, but there are lots of caveats.


Fabs are beginning to deploy machine learning models to drill deep into complex processes, leveraging both vast compute power and significant advances in ML. All of this is necessary as dimensions shrink and complexity increases with new materials and structures, processes, and packaging options, and as demand for reliability increases.

Building robust models requires training the algorithms, and successful deployment requires monitoring the application and effectiveness of those models. But it is not as simple as pressing a button. It requires training, retraining, and cross-training engineers across disciplines (fab, equipment, and process engineering) to use these models for recipe pathfinding, process ramping, critical-dimension optimization, wafer yield improvement, and tool-to-tool and chamber-to-chamber matching.

“Domain expertise is absolutely essential for productive use of AI/ML approaches,” said Regina Freed, vice president of AIx solutions at Applied Materials. “In fact, hardware and process expertise — in tandem with hybrid models based on ML and physics — will be the only path for successful implementation of ML.”

Working in tandem with ML-trained models requires an understanding of how to use those models, as well as an ability to evaluate their effectiveness and robustness.

“Engineers need to believe that it’s going to work, and they need to understand what ML can and can’t do,” said Jon Herlocker, CEO of Tignis. “It’s not going to turn into a machine and decide to take over the fab. Engineers can set controls on these models to develop confidence. They use the same robust statistical methods to qualify the ML trained models that they would use to qualify a traditional process control system. They create specific sampling plans to measure variability within those plans.”

ML also opens the door to exploring more complex relationships between data across the manufacturing flow. But there are some caveats.

“The thing about the deep learning (DL) evolution of machine learning that is really amazing is the use of this incredible compute power, especially in GPUs,” observed Ajay Baranwal, director for the Center for Deep Learning in Electronics Manufacturing (CDLe). “In DL training, an incredible number of parameters and parameter combinations can be explored by using this vast compute power. In some ways, what’s happened is that brute force computing can win the day these days because so much computing power is now available. But there is a tradeoff between exploring large parameters to perform complex analysis and limiting it to simpler models. Overfitting, bias, and interpretability are a couple of downsides of keeping more parameters.”

Once qualified, an ML model used for fab control needs to be monitored to account for drift and other process changes that could affect the relationships between input and output data.

Effectively applying an ML-based model for equipment and process control requires more than just model training. It also requires validation, monitoring, and maintenance.

Reliable data and domain expertise
Successful ML-trained models are built on a foundation of reliable data and domain expertise. Reducing the hundreds to thousands of available input parameters to those that really matter requires exploring the relationships among them, often with physics-based models. Such models are particularly important for process recipe development. Checking results against physics-based models and engineering knowledge helps keep ML development on track. There also are tradeoffs when optimizing for multiple outcome specifications.

Fig. 1: Necessary components to train an ML model for fab processes. Source: A. Meixner/Semiconductor Engineering

Regardless of whether engineers apply computer vision, statistical models, or deep learning models, model creation always begins with data. That data needs to be clean, meaning error-free, and complete. For optimization across multiple fab process steps or feed-forward applications, merging data from multiple sources requires wafer and/or die-level traceability. As multiple industry experts noted, the heavy lifting is in the management of data. This is true for the initial model building and for the deployment in a factory setting.
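As a rough illustration of why wafer-level traceability matters when merging sources, the sketch below joins two small record sets on a shared key and reports the wafers that cannot be matched. The field names (lot, wafer, chamber, etch_time_s, cd_nm) and all values are hypothetical, and a real fab data pipeline would be far more involved.

```python
# Hypothetical records from two sources; field names and values are illustrative only.
etch_records = [
    {"lot": "A01", "wafer": 1, "chamber": "CH1", "etch_time_s": 41.2},
    {"lot": "A01", "wafer": 2, "chamber": "CH2", "etch_time_s": 40.7},
    {"lot": "A01", "wafer": 3, "chamber": "CH1", "etch_time_s": 41.0},
]
metrology_records = [
    {"lot": "A01", "wafer": 1, "cd_nm": 18.4},
    {"lot": "A01", "wafer": 2, "cd_nm": 19.1},
    {"lot": "A01", "wafer": 4, "cd_nm": 18.9},  # no matching etch record
]

def join_on_wafer(left, right, keys=("lot", "wafer")):
    """Inner-join two record lists on a traceability key and list unmatched keys."""
    index = {tuple(row[k] for k in keys): row for row in right}
    left_keys = {tuple(row[k] for k in keys) for row in left}
    merged, unmatched = [], []
    for row in left:
        key = tuple(row[k] for k in keys)
        if key in index:
            merged.append({**row, **index[key]})  # combine fields from both sources
        else:
            unmatched.append(key)                 # present on the left only
    # Keys present on the right only are also reported.
    unmatched += [key for key in index if key not in left_keys]
    return merged, unmatched

merged, unmatched = join_on_wafer(etch_records, metrology_records)
print(len(merged), "wafers merged; unmatched keys:", unmatched)
```

Reporting the unmatched keys, rather than silently dropping them, is one simple way to make gaps in traceability visible before a model is trained on the merged data.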

Along with a steady flow of reliable data, fab process engineers must steer the model in the right direction. This need is no different than in any other field in which ML derived models are created.

“In the earliest stages, having subject matter experts is very useful in ramping the ML in the fabs, especially in process development applications,” said Anjaneya Thakar, senior director of product line management at Synopsys. “As the model is being built using a limited amount of data, you could be optimizing around a wrong optimal point. Because they’ve seen the problem before, a subject matter expert can quickly guide it back. Once your model is tuned, and you’ve gotten control of your variability, then there is less need for subject matter experts to run it on a daily basis.”

Letting an algorithm train a model with no direction is like leaving a new hire unsupervised in the fab.

“In some respects, machine learning is like human learning,” said Tignis’ Herlocker. “If you have a junior process engineer, there is only so much they can do. But as the engineer learns, they can solve more complex issues. The same applies with ML. The more training the ML model has, the better it becomes. ML model training is the domain knowledge. The challenge moving forward is how to take all that domain knowledge and transform it so that it can be consumed by ML as ‘training.’ Ultimately, we need to pair human intuition and knowledge with ML to get the best outcomes.”

Lam Research explored that optimal pairing of engineers’ intuition and knowledge with ML outcomes in a simulation experiment [1]. “There’s an adage among engineers that data science without domain knowledge isn’t worth much,” said Keren Kanarik, technical managing director at Lam Research. “Likewise, domain knowledge without data science isn’t going to enable you to innovate fast enough in the competitive semiconductor world. Lam recently tested this idea by creating a virtual game that compared humans to AI in developing a semiconductor process at the lowest cost-to-target (i.e., the fewest number of experiments). The results delivered a prescriptive approach for how to combine domain expertise with data science.”

Reduce and prioritize
The big attraction of ML-based models is their ability to connect multiple input parameters into non-linear relationships with several outcome parameters. Yet for today’s models, most experts agree that the number of significant input parameters is often on the order of 10. There are several reasons for this limit, including the risk of optimizing for the wrong outcome, the difficulty of explaining the trained model, and the difficulty of seeing the relationships.

Fig. 2: Machine learning build pipeline for creating a model. Source: A. Meixner/Semiconductor Engineering

“The key to the successful implementation of complex algorithms is visualization. You have to provide a powerful visual representation that helps users instantly interpret and understand the results – and their boundaries – of the algorithms. When the visual image hits the viewer right between the eyes, yield engineers can jump into action,” stated Dieter Rathei, CEO of DR Yield.

ML training algorithms can more easily explore more parameters; however, parameter reduction represents a vital step in the model development pipeline.
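One very simple form of parameter reduction is to rank candidate inputs by the strength of their correlation with an outcome and keep only the top few. The sketch below does this with a plain Pearson correlation. It is an assumption-laden illustration, not the physics-based or proprietary methods described by the vendors quoted in this article, and the parameter names and values are invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def rank_parameters(inputs, outcome):
    """Return (name, |r|) pairs sorted from strongest to weakest correlation."""
    scores = [(name, abs(pearson(values, outcome))) for name, values in inputs.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical per-wafer data; all names and values are made up for illustration.
outcome = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
inputs = {
    "temp": [5.0, 3.0, 6.0, 2.0, 7.0, 4.0],
    "day_of_week": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "pressure": [2.0, 4.0, 6.0, 8.0, 10.0, 12.5],
}
ranked = rank_parameters(inputs, outcome)
for name, score in ranked:
    print(f"{name}: |r| = {score:.2f}")
```

With only six samples, the meaningless day_of_week column still scores about 0.48, which is a small reminder that a purely statistical screen needs review by someone who knows the process.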

“Due to the black-box nature of most ML models, there is sometimes a fear that users will find themselves correlating process performance with the day of the week,” said Meghali Chopra, CEO of Sandbox Semiconductor. “ML is designed to illuminate key data relationships, but garbage in equals garbage out. A good ML model approach and pipeline is designed to distinguish the signal from the noise. We use physics-enabled AI to constrain the parameter space. We also invest heavily in data pre-processing and dimensionality reduction techniques to ensure that the most important parameters are illuminated to our users.”

The significant parameters can be presented to engineers in a tiered manner: first one parameter, then two, and possibly three.

“We have two levels in our tool,” said Jeff David, vice president of AI solutions at PDF Solutions. “One is the univariate prediction. We start with univariate because that’s easy to understand and visualize. For example, if parameter A is about 2.7, and I see more failures when it’s below 2.7, then in the next step we have an interaction between two parameters. And then we show the drill-downs in the interaction for those two parameters. But we stop at two, because once you go to three and four it becomes very difficult to visualize and to explore.”
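The tiered drill-down described above can be sketched in a few lines: first compare failure rates on either side of a threshold for one parameter, then split the same data by two parameters at once. This is only an illustration of the idea, not PDF Solutions' tool. The die data, the second parameter, and the threshold of 13.0 are hypothetical, while the 2.7 threshold echoes the example in the quote.

```python
# Hypothetical die-level data: (param_a, param_b, failed). Values are invented.
dies = [
    (2.5, 10.0, True), (2.6, 12.0, True), (2.4, 11.0, False), (2.6, 15.0, True),
    (2.9, 10.0, False), (3.0, 15.0, True), (2.8, 11.0, False), (3.1, 12.0, False),
]

def fail_rate(rows):
    """Fraction of failing dies, or None when the subset is empty."""
    return sum(1 for row in rows if row[2]) / len(rows) if rows else None

def univariate_split(rows, index, threshold):
    """Failure rate below and at-or-above a threshold for one parameter."""
    low = [row for row in rows if row[index] < threshold]
    high = [row for row in rows if row[index] >= threshold]
    return fail_rate(low), fail_rate(high)

def two_way_split(rows, thr_a, thr_b):
    """Failure rate in each of the four cells of a two-parameter split."""
    cells = {}
    for a_side in ("low", "high"):
        for b_side in ("low", "high"):
            subset = [
                row for row in rows
                if (row[0] < thr_a) == (a_side == "low")
                and (row[1] < thr_b) == (b_side == "low")
            ]
            cells[(a_side, b_side)] = fail_rate(subset)
    return cells

below, above = univariate_split(dies, index=0, threshold=2.7)
cells = two_way_split(dies, thr_a=2.7, thr_b=13.0)
print("A below/above 2.7:", below, above)
print("A x B cells:", cells)
```

In this made-up data, parameter A alone suggests that low values are risky, but the two-way view shows that high values of parameter B fail regardless of A, which is the kind of interaction a univariate view hides.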

The other caveat with using a large number of parameters is the risk of incorrect correlations.

“For this type of application (etch recipe development), the processes are so complex that root-cause analysis can show why modeling that complexity is really important,” said Sandbox Semiconductor’s Chopra. “The danger of modeling so many parameters is you can find false correlations, i.e., the process outcome with the day of the week. That’s what you want to avoid. We invest heavily in dimensionality reduction. And then we use our anchoring physics-based model, so we’re never too worried about over complexifying the problem because a good modeling pipeline will essentially find all the important process parameters for you.”

Others concur on the need for a modeling pipeline that reduces parameters and finds significant relationships.

“There’s definitely a limit in the number of parameters,” said PDF’s David. “But you don’t know for a given use case until you train a model with the data you have and validate it. Our platform automatically allows our customers to do this with our training pipeline. Our training pipeline scales to allow our customers to see the metrics they need to see on their trained models in a few hours, from data ingestion all the way to trained models. Then, if the trained models show value, the user can automatically deploy the models with our ModelOps platform.”

There also are tradeoffs in balancing multiple outcomes.

“Fundamentally, it is true that there’s no way you can build these controllers without a process engineer participating, because there are decisions about tradeoffs that need to be made,” said Herlocker. “You’re trying to optimize for many things at once. For example, you’re trying to optimize for CD, but you’re also trying to minimize the amount of energy you’re using. There are inherent human decisions engineers make about, ‘How much do I care about process quality versus energy use versus chemical consumption versus chemical emissions?’ But ML training doesn’t know which one’s more important. Only the engineers do, so they need to be there.”
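One common way to encode such a tradeoff is a weighted sum of normalized penalties, where the weights are chosen by the engineer rather than learned. The sketch below assumes hypothetical recipes and penalty values, and it is not a description of any vendor's controller. It shows how the preferred recipe changes when the weights change.

```python
# Hypothetical candidate recipes with normalized penalties in [0, 1]; lower is better.
candidates = {
    "recipe_1": {"cd_error": 0.10, "energy": 0.80, "chemical_use": 0.50},
    "recipe_2": {"cd_error": 0.30, "energy": 0.20, "chemical_use": 0.40},
    "recipe_3": {"cd_error": 0.20, "energy": 0.50, "chemical_use": 0.10},
}

def weighted_cost(metrics, weights):
    """Scalar cost as a weighted sum of the penalties named in `weights`."""
    return sum(weights[name] * metrics[name] for name in weights)

def best_recipe(recipes, weights):
    """Name of the recipe with the lowest weighted cost."""
    return min(recipes, key=lambda name: weighted_cost(recipes[name], weights))

# Engineer-chosen priorities (assumed values): the optimizer cannot infer these.
quality_first = {"cd_error": 0.8, "energy": 0.1, "chemical_use": 0.1}
energy_first = {"cd_error": 0.2, "energy": 0.7, "chemical_use": 0.1}

print("quality first:", best_recipe(candidates, quality_first))
print("energy first: ", best_recipe(candidates, energy_first))
```

The same candidates produce different answers under the two weightings, which is why someone has to decide how much each outcome matters before optimization starts.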

Deploy and maintain
Once developed, the ML model needs to be validated, deployed into a production setting, and maintained. This is best done with a systematic and reliable process in semiconductor factories, which calls for a machine learning operations (ML Ops) platform and methodology. ML Ops has become standard for ML applications in other industries and is only now taking hold in the semiconductor industry.

Engineers develop ML models in isolation. The transition from development to the production environment involves a set of practices similar to DevOps for the deployment of software systems. When the model is launched into production, an ML Ops platform increases the automation of deployment and improves model quality [2]. ML Ops thus applies to the entire ML model lifecycle.

“For the last decade or so there have been lots of excitement around how ML can bring value to semiconductor manufacturing across many use cases,” said PDF’s David. “But when it comes time to implement it into production, people struggle. For example, how do you monitor if your trained model is doing what it’s supposed to do? If not, what is the recourse? What if it comes time to make predictions and your data is not there? That happens more often than we’d like. How do you detect a drift in the input data that your trained model relied upon? What actions do you take around that? What do you do? ML Ops is basically the implementation of getting all this stuff running in production so you can actually use it.”
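Two of the questions raised above, missing data at prediction time and drift in an input, can be illustrated with a minimal check. The sketch below compares the mean of a recent window against the baseline the model was trained on, measured in standard errors, and reports separately when no readings arrived. The 3-sigma limit, the sensor values, and the function name are assumptions for illustration. Production monitoring would typically use more robust tests.

```python
import math
import statistics

def input_drift(baseline, recent, z_limit=3.0):
    """Flag drift when the recent mean is more than z_limit standard errors
    from the baseline mean. Assumes the baseline has non-zero variation."""
    readings = [x for x in recent if x is not None]  # drop missing readings
    if not readings:
        return {"status": "no_data", "z": None}
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    z = (statistics.mean(readings) - mu) / (sigma / math.sqrt(len(readings)))
    return {"status": "drift" if abs(z) > z_limit else "ok", "z": z}

# Hypothetical sensor values used when the model was trained.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]

stable = input_drift(baseline, [10.1, 9.9, 10.0, None, 10.1])
shifted = input_drift(baseline, [10.6, 10.7, 10.5, 10.8])
missing = input_drift(baseline, [None, None])
print(stable["status"], shifted["status"], missing["status"])
```

Returning a distinct "no_data" status keeps a gap in the data feed from being mistaken for a healthy input, so each condition can trigger its own response.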

In addition, models should have retraining capability. “There have been lots of breakthroughs in new ML algorithms, many of which are deep learning-related. Deep learning is a big area of investment, and as an industry we are able to do many things that were not possible before, especially when it comes to working with large models. Many improvements in the area of model retraining automation are proving essential for the broad deployment of ML models in high-volume manufacturing environments,” said Tignis’ Herlocker.

Another aspect of ML Ops is quicker deployment of models for similar predictions, but on different products. Due to product characteristics, the same input parameters could result in different output data. This also can mean that a different training approach is a better match for a data set. And this is where the ML pipeline and ML Ops come into play.

“You want to train a different model per product A, B, and C and you want to do so in an automated way, as well as quickly deploy each one of those,” said PDF’s David. “Maybe random forest isn’t the best one for that data set. With ML Ops, you take your data set, you chop it up into pieces, and you deploy different types of algorithmic approaches. They’re hyper-parameters to that data set that can be validated in what’s called cross-validation. [3] Then you build out the model using your best algorithm approach, which then gets wrapped up into the model, and then you deploy that for production. That can be different from chip to chip. ML Ops is sorely needed in the industry, and I’m even hearing from customers that they want this ML Ops platform more than the ability to train a model with an algorithm. The reason, at the end of the day, is they want to actually deploy ML in production. Without a platform to do that, nothing else matters.”
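The idea of chopping a data set into pieces and comparing algorithmic approaches is k-fold cross-validation. The sketch below compares two deliberately simple candidates, a constant-mean predictor and a one-variable least-squares line, on invented data for a single product. It is an illustration under those assumptions, not PDF Solutions' pipeline, and in practice the loop would be repeated per product with richer model families and hyperparameter settings.

```python
import statistics

def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    mean_y = statistics.mean(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """One-variable least-squares line."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def cross_val_mse(fit, xs, ys, k=4):
    """Mean squared error over k interleaved folds; each point is held out once."""
    errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        held_out = [i for i in range(len(xs)) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors += [(model(xs[i]) - ys[i]) ** 2 for i in held_out]
    return statistics.mean(errors)

def select_model(candidates, xs, ys, k=4):
    """Score every candidate by cross-validation and return the best name and all scores."""
    scores = {name: cross_val_mse(fit, xs, ys, k) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

# Hypothetical data for one product: a linear trend with a small alternating offset.
xs = [float(i) for i in range(12)]
ys = [2.0 * x + 1.0 + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(xs)]

best, scores = select_model({"mean": fit_mean, "line": fit_line}, xs, ys)
print("selected:", best, scores)
```

Because each candidate is scored only on points it was not trained on, the comparison reflects how the model would behave on new wafers rather than how well it memorized the training set.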

With the pressures of economy, effectiveness and efficiency, semiconductor fab engineering teams will need to utilize ML models to support their work. ML has emerged as a means to accelerate recipe development, boost throughput and eke out a few percentage points of yield. With ML’s capability to address complex interactions that depend upon spatial properties and temporal states, future ML models will co-optimize across process steps, accelerating the understanding of new chemical mechanisms and so much more.

But as with any tool in their toolbox, engineers need to understand ML’s limitations. To do that effectively, they need a robust ML model pipeline that develops, validates, deploys and monitors. And process engineers are still required to direct and facilitate the application.

“One of the things that we found is that today’s ML solutions need a process engineer, software engineer, data scientists and an IT person,” said Herlocker. “As a part of our vision, the critical person is the process engineer. We can build a piece of software that lets the process engineer do this without everyone else. We’re getting close to achieving that goal and thus empowering the process engineer to effectively use ML.”


  1. Kanarik, K., et al. “Human–machine collaboration for improving semiconductor process development,” Nature 616, 707–711 (2023). https://doi.org/10.1038/s41586-023-05773-7
  2. https://en.wikipedia.org/wiki/MLOps
  3. https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)
