Balancing AI And Engineering Expertise In The Fab

Results show big improvements when both are deployed for new process development.


Modeling and simulation are playing increasingly critical roles in chip development due to tighter process specs, shrinking process windows, and fierce competition to bring technologies to market first.

Before a new device reaches high-volume manufacturing, countless engineering hours are spent developing the lithography, etching, deposition, CMP, and many other processes at high yield. Device fabrication requires as many as 1,000 individual tool steps. But what challenges engineers most is the need to meet process specifications with nanometer-level precision, which can lengthen development times and increase costs.

“It’s always been tough, but it’s getting increasingly more difficult with the moves to 3D and EUV lithography,” said Keren Kanarik, technical managing director at Lam Research. In one case study, Lam estimated a cost of more than $100,000 in process tool time and metrology work alone to develop a process to target.

This article highlights three computer-aided approaches that show how process development costs and time-to-target can be reduced by combining human talent with the right computations:

  • A competition between engineers and machine learning algorithms (inspired by Kasparov vs. Deep Blue) that finds the balance between engineering effort and computing efficiency;
  • Using multi-physics modeling to tune a wafer cleaning process to address watermark defects; and
  • Developing a multi-physics model for thermal dissipation in 3D-IC systems.

Process data is not ChatGPT
Most people are familiar with algorithms that take advantage of large and inexpensive data sets, such as those involving consumers. In contrast, clean semiconductor process data on each new wafer stack tends to be scarce.

As such, new process development and yield learning typically suffer from “the little data problem,” in which there is insufficient data to form a reliable model, making predictions difficult. In machine learning, small data analysis leads to overfitting, because the algorithm tends to memorize the data set rather than generalizing it.

“Humans are our benchmark,” said Kanarik, whose team devised a competition between a senior engineer’s cost-to-target versus the ML algorithm’s cost-to-target for a straightforward silicon dioxide etching process. [1] “We have our expert, our Garry Kasparov, and three algorithms. We get the best results when combining human guidance followed by computer decision making.” This approach resulted in process development cost being reduced from $105,000 to $52,000.

The team got its inspiration for the competition from the first computer that successfully beat a human at the game of chess — IBM’s Deep Blue and Garry Kasparov in 1997. Unlike board games, which have clear rules, wafer-reactor systems are governed by microscopic physical and chemical interactions between wafer material, plasma species, and reactor parts. The little data set makes it especially hard to build a computer model from scratch with atomic-scale accuracy.

Fig. 1: For each virtual etching process simulation, the recipe (flows, pressure, bias, etc.) produces output metrics that describe the etch profile. Source: Lam Research

To run the competition with humans and computer algorithms, the Lam Research engineers used SEMulator3D, part of the company’s Semiverse Solutions portfolio, to build a representative virtual process. The high-aspect-ratio etching process for a 200nm hole (see figure 1) contains 11 input parameters and 6 output parameters. “But what makes it difficult is not just the number of parameters but the specifications and tolerances — how straight is the profile, how much mask remains, and how much variability can the process tolerate? The bottleneck to building many devices is often the plasma steps,” said Kanarik.

The six process engineers who participated each submitted batches (one or more recipes) to the platform, received output metrics and profiles, then submitted the next batch until all target metrics of CD, CD uniformity, etc., were met. The winning senior engineer in the contest produced a recipe that met parameter targets at a cost of $105,000, whereas the computer programs with no human guidance exceeded $700,000. This high unaided cost was attributed to the experimental waste of algorithms exploring the full reaches of the vast process space to find the ideal recipe.

Fig. 2: Human first, computer last wins: The ideal handoff point from engineer to computer is at the “V” stage for three algorithms (left to right). Algo3 (right) attained the lowest cost-to-target, with the best handoff at point C. Past point C, the human expert’s efforts cost more than the computer calculations. Source: Lam Research

The team chose Bayesian optimization algorithms (see figure 2). Algo3, which uses a Gaussian process model, had the best results. “What I like about Bayesian optimizations is their inherent probabilistic nature, making them suitable for decision making with as little data as possible,” Kanarik said. “As humans, we are used to making decisions based on little data.” She added that computers tend to explore more process space and tolerate moves that may not immediately yield good results, while humans do not tolerate these as well.
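The human-first/computer-last idea can be illustrated with a minimal Bayesian optimization loop. This is a generic sketch using a Gaussian process surrogate and an expected-improvement acquisition function, not Lam's actual algorithm; the one-dimensional `cost_to_target` function and the seed recipes are invented for illustration (a real etch recipe has many inputs and its cost comes from metrology, not a formula).

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def cost_to_target(x):
    # Toy stand-in for the distance between an etch profile and its spec;
    # in practice this value would come from measuring a test wafer.
    return (x - 0.6) ** 2 + 0.05 * np.sin(12 * x)

# "Human first": a few expert-chosen starting recipes seed the surrogate model.
X = np.array([[0.1], [0.5], [0.9]])
y = np.array([cost_to_target(x[0]) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
grid = np.linspace(0.0, 1.0, 200).reshape(-1, 1)

# "Computer last": Bayesian optimization picks each next recipe to try.
for _ in range(10):
    gp.fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    best = y.min()
    z = (best - mu) / np.maximum(sigma, 1e-9)
    # Expected improvement trades off exploiting good regions against
    # exploring uncertain ones, the probabilistic behavior Kanarik describes.
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = grid[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, cost_to_target(x_next[0]))

best_recipe = X[np.argmin(y)][0]
```

Because the Gaussian process carries an uncertainty estimate at every untried point, the acquisition function can justify "wasteful-looking" experiments that a human would avoid, which is exactly the exploration behavior noted above.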

Next, the engineers applied the learning process to actual etch processes on silicon wafers. “We are applying the human-first/computer-last approach to real-life applications now, which further validates that we’re on the right path. And even though there is an optimal hand-off point, there still are cost savings even if you’re a little bit left or a little right of the V point,” noted Kanarik.

In addition to developing new recipes, process engineers also perform troubleshooting when a process deviation or yield hit occurs in production.

Wafer cleaning dynamics
Single-wafer cleaning processes are commonly used for critical cleaning steps because they provide superior within-wafer and wafer-to-wafer uniformity relative to batch cleaning approaches. Wafer cleaning accounts for about 20% of all wafer processing steps, according to Derek Bassett, principal research scientist and member of technical staff at TEL. As part of a COMSOL webcast, Bassett showed how multi-physics modeling helped explain how specific tool parameters led to watermark defects that appeared in a doughnut-like pattern on wafers, with defect-free areas at the center and edges. [2]

“The reason why we were seeing these watermarks was not immediately obvious,” said Bassett. “Whenever we’re drying after any kind of wet chemical process it can be potentially dangerous or critical, because if we have any water droplets left from a splash or a drip from somewhere, it will react with the silicon surface to actually form silicon dioxide, which gets left behind.”

TEL’s engineering team confirmed the spots were watermarks using SEM analysis. The engineers hypothesized that water vapor was getting absorbed into the isopropanol (used in drying) during processing where the IPA is spun on the wafer center and spins off the edge. The team then developed a model including two coupled differential equations to describe the changing amount of water in a 100µm droplet of IPA.

Solving the equations simultaneously from time zero in COMSOL’s ODE interface, using 23°C and 45% humidity to represent cleanroom conditions, the team saw that even within a small fraction of a second some water vapor was absorbed into the droplet. Calculations with different droplet sizes showed that evaporation time scaled with droplet size (i.e., more IPA, more water), but the ratio of absorbed water to IPA volume was constant. When analyzing dynamics inside the process chamber, the droplets evaporated more quickly as the fan velocity over the wafer increased.
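The coupled-equation idea can be sketched with a simplified pair of ODEs, solved here with SciPy rather than COMSOL. The rate constants and the constant-rate evaporation law below are invented for illustration; TEL's actual equations and coefficients are not public.

```python
from scipy.integrate import solve_ivp

# Hypothetical rate constants; the real model couples evaporation and
# absorption to local humidity, airflow, and droplet surface area.
K_EVAP = 2.0e-3    # IPA evaporation rate, mm^3/s
K_ABS = 5.0e-4     # water-vapor absorption coefficient, mm^3/s per unit RH
HUMIDITY = 0.45    # 45% relative humidity (cleanroom condition cited above)

def droplet_odes(t, state):
    v_ipa, v_water = state
    if v_ipa <= 0.0:              # droplet fully evaporated, nothing changes
        return [0.0, 0.0]
    dv_ipa = -K_EVAP              # IPA evaporates at a roughly constant rate
    dv_water = K_ABS * HUMIDITY   # ambient water vapor absorbs into the IPA
    return [dv_ipa, dv_water]

# A ~100um droplet is roughly 5e-4 mm^3 of IPA, with no water initially.
v0 = 5.0e-4
sol = solve_ivp(droplet_odes, (0.0, 1.0), [v0, 0.0], max_step=1e-3)

v_ipa_end, v_water_end = sol.y[:, -1]
```

Even this crude model reproduces the qualitative result: water begins accumulating in the droplet almost immediately, and whatever remains when the IPA is gone is what gets left behind as a watermark.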

The simulation showed that the higher the humidity, the longer the IPA and water took to evaporate, leaving ample time for watermarks to form. Clean dry air (CDA) is blown onto the wafer surface, but because of moisture in the ambient, humidity at the wafer varies from 0% at the center to 10% at the edge, Bassett said. The convective mass transfer rate increases gradually until the liquid reaches the wafer edge, where it accelerates dramatically.

Combining these findings, the engineers determined that the conditions for watermark formation — high humidity and long evaporation times — occurred in the middle ring of the wafer. Humidity at the center is so low that little water is absorbed into the droplet. At the edge, evaporation is so fast that watermarks do not form. Watermarks appeared in the ring area because the conditions there were ideal. TEL engineers then eliminated the problem by optimizing the airflow in the chamber.

Bassett provided tips for using such multi-physics tools. “Modeling and simulation really are a critical part of process development and troubleshooting for these steps because they allow us to understand what is happening in these nanoscale features when there is no way to directly obtain experimental data.” He added that simulations have the greatest chance of success when one includes the most important physics first, reduces the dimension of simulations (e.g., 3D to 2D), if possible, and keeps the model simple, adding layers of complexity as needed.

Thermal modeling for multichip packages
Concurrent multi-physics modeling is especially useful when designing 3D-IC systems, where thermal management has become a first-order concern. The interplay between power and temperature also is critical at the system level. As such, it must be considered early in the design flow with thermal resilience in mind. [3]

Thermal and mechanical issues multiply in advanced packages as engineers integrate chiplets or packaged die in close proximity. The differing coefficients of thermal expansion of the materials used in multi-die stacks also create mechanical stress and warpage. And because of the mismatch between the time constants governing electrical activity and thermal expansion, thermal profiling should be performed over longer time frames.
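The CTE-mismatch point can be quantified with a back-of-envelope biaxial stress estimate. All material values and the temperature swing below are textbook-level illustrative numbers chosen for this sketch, not figures from the article or from any specific package.

```python
# Rough thermal stress from a CTE mismatch between silicon and copper
# in a die stack (illustrative values only).
E_SI = 130e9       # Young's modulus of silicon, Pa
NU_SI = 0.28       # Poisson's ratio of silicon
CTE_SI = 2.6e-6    # silicon coefficient of thermal expansion, 1/K
CTE_CU = 16.5e-6   # copper coefficient of thermal expansion, 1/K
DELTA_T = 80.0     # operating temperature swing, K

# Biaxial stress approximation: sigma = E * delta_alpha * delta_T / (1 - nu)
stress_pa = E_SI * (CTE_CU - CTE_SI) * DELTA_T / (1 - NU_SI)
print(f"thermal stress ~ {stress_pa / 1e6:.0f} MPa")
```

Even with these round numbers the estimate lands in the hundreds of megapascals, which is why warpage and stress must be modeled rather than ignored in stacked-die designs.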

Multi-physics solvers can bring together chip thermal models, power models, signal models, and electrostatic models to accurately simulate system operation. “For example, Ansys’ CPM (chip power model) predicts how the temperature of a chip affects the power it produces, as well as the other way around,” said Marc Swinnen, director of marketing for Ansys’ Semiconductor Division.

Engineers have long used computational fluid dynamics (CFD) to model thermal dissipation in electronic systems. Still, while junction temperature between a monolithic chip, its package, and the environment is well understood, it’s a whole different ball game with multi-chip packages with interconnecting bumps, all cooled by a fan blowing on heat sinks. “The vital step relies on fluid modeling at the system level that interacts with the chip thermal model (CTM) to converge on a junction temperature for the 3D-IC assembly,” said Swinnen.

Simulation aims to mimic a dynamic operating environment. “Thermal analysis cannot assume static or average temperatures. If the activity levels in two chips in a stack are synchronized, such that their activity is causing them to get hot at the same place at the same time, averages don’t paint a reliable picture of how the device will perform,” he said. And because activity drives power, long activity vectors (RTL-level power estimates) using hardware emulators are needed to capture the thermal flow as a function of many activity vectors of the different chips. “It’s important to link emulators to thermal management tools in a way that can give early power estimates, runs quickly, and gives actionable feedback.”

For example, thermal analysis using the RedHawk-SC Electrothermal model of a chiplet on interposer can provide accurate current and temperature results from the chip through the interconnecting bumps between chips or between chips and board. Multi-physics solvers (see figure 3) also are needed to account for parameters with temperature dependence, such as electromigration or resistance.

Fig. 3: The Icepak CFD simulation includes fans blowing on heat sinks attached to multi-chip packages. Source: Ansys

Though much of process development still relies primarily on the expertise of process engineers, computer aids are increasingly being adopted to understand observed results. Meanwhile, the application of machine learning to process development is in its infancy.

Kanarik shared her thoughts on how the application of AI to process engineering will evolve. “There is the question of how to encode more domain knowledge (directly or indirectly) to enable faster transfers. The improved algorithms the data scientists and engineers come up with over the next few years will not just benefit Lam, but the entire industry to foundationally change the way processes are developed for semiconductor chips.”

One nice part about a human-expert-first/computer-last approach is that the most tedious aspects of development might be taken care of by computer algorithms, freeing up engineers to perform higher-level learning, process refinement, and yield improvements.

The wafer cleaning example illustrated how an age-old defectivity problem, watermarks on wafers, can be addressed using simple equations and commercial modeling tools.

Finally, packaging engineers can look to multi-physics modeling and computational fluid dynamics to build thermal resilience into 3D-IC systems, which will continue to be pushed to higher levels of performance, greater efficiency, and smaller form factors.


  1. Kanarik, K., et al. “Human–machine collaboration for improving semiconductor process development,” Nature 616, 707–711 (2023).
  2. Bassett, D. “Modeling and Simulation in Wet Chemical Processes,” COMSOL Day, April 2023.
  3. Swinnen, M. “Coming In Hot: Requirements For Successful Thermal Management In 3D-IC,” Semiconductor Engineering, Oct. 2022.


