Predicting And Preventing Process Drift

AI/ML are increasingly vital for good yield and reliability, but there are still plenty of pitfalls to avoid.


Increasingly tight tolerances and rigorous demands for quality are forcing chipmakers and equipment manufacturers to ferret out minor process variances, which can create significant anomalies in device behavior and render a device non-functional.

In the past, many of these variances were ignored. But for a growing number of applications, that’s no longer possible. Even minor fluctuations in deposition rates during a chemical vapor deposition (CVD) process, for example, can lead to inconsistencies in layer uniformity, which can impact the electrical isolation properties essential for reliable circuit operation. Similarly, slight variations in a photolithography step can cause alignment issues between layers, leading to shorts or open circuits in the final device.

Some of these variances can be attributed to process error, but more frequently they stem from process drift — the gradual deviation of process parameters from their set points. Drift can occur in any of the hundreds of process steps involved in manufacturing a single wafer, subtly altering the electrical properties of chips and leading to functional and reliability issues. In highly complex and sensitive ICs, even the slightest deviations can cause defects in the end product.

“All fabs already know drift. They understand drift. They would just like a better way to deal with drift,” said David Park, vice president of marketing at Tignis. “It doesn’t matter whether it’s lithography, CMP (chemical mechanical polishing), CVD or PVD (chemical/physical vapor deposition), they’re all going to have drift. And it’s all going to happen at various rates because they are different process steps.”

At advanced nodes and in dense advanced packages, where a nanometer can be critical, controlling process drift is vital for maintaining high yield and ensuring profitability. By rigorously monitoring and correcting for drift, engineers can ensure that production consistently meets quality standards, thereby maximizing yield and minimizing waste.

“Monitoring and controlling hundreds of thousands of sensors in a typical fab requires the ability to handle petabytes of real-time data from a large variety of tools,” said Vivek Jain, principal product manager, smart manufacturing at Synopsys. “Fabs can only control parameters or behaviors they can measure and analyze. They use statistical analysis and error budget breakdowns to define upper control limits (UCLs) and lower control limits (LCLs) to monitor the stability of measured process parameters and behaviors.”

Dialing in legacy fabs
In legacy fabs — primarily 200mm — most of the chips use 180nm or older process technology, so process drift does not need to be as precisely monitored as in the more advanced 300mm counterparts. Nonetheless, significant divergence can lead to disparities in device performance and reliability, creating a cascade of operational challenges.

Manufacturers operating at older technology nodes might lack the sophisticated, real-time monitoring and control methods that are standard in cutting-edge fabs. While the latter have embraced ML to predict and correct for drift, many legacy operations still rely heavily on periodic manual checks and adjustments. Thus, the management of process drift in these settings is reactive rather than proactive, making changes after problems are detected rather than preventing them.

“There is a separation between 300-millimeter and 200-millimeter fabs,” said Park. “The 300-millimeter guys are all doing some version of machine learning. Sometimes it’s called advanced process control, and sometimes it’s actually AI-powered process control. For some of the 200-millimeter fabs with more mature process nodes, they basically have a recipe they set and a bunch of technicians looking at machines and looking at the CDs. When the drift happens, they go through their process recipe and manually adjust for the out-of-control processes, and that’s just what they’ve always done. It works for them.”

For these older fabs, however, the repercussions of process drift can be substantial. Minor deviations in process parameters, such as temperature or pressure during the deposition or etching phases, gradually can lead to changes in the physical structure of the semiconductor devices. Over time, these minute alterations can compound, resulting in layers of materials that deviate from their intended characteristics. Such deviations affect critical dimensions and ultimately can compromise the electrical performance of the chip, leading to slower processing speeds, higher power consumption, or outright device failure.

The reliability equation is equally impacted by process drift. Chips are expected to operate consistently over extended periods, often under a range of environmental conditions. However, when process-induced variability can weaken the device’s resilience, precipitating early wear-out mechanisms and reducing its lifetime. In situations where dependability is non-negotiable, such as in automotive or medical applications, those variations can have dire consequences.

But with hundreds of process steps for a typical IC, eliminating all variability in fabs is simply not feasible.

“Process drift is never going to not happen, because the processes are going to have some sort of side effect,” Park said. “The machines go out of spec and things like pumps and valves and all sorts of things need to be replaced. You’re still going to have preventive maintenance (PM). But if the critical dimensions are being managed correctly, which is typically what triggers the drift, you can go a longer period of time between cleanings or the scheduled PMs and get more capacity.”

Process drift pitfalls
Managing process drift in semiconductor manufacturing presents several complex challenges. Hysteresis, for example, is a phenomenon where the output of a process varies not solely because of current input conditions, but also based on the history of the states through which the process already has passed. This memory effect can significantly complicate precision control, as materials and equipment might not reset to a baseline state after each operational cycle. Consequently, adjustments that were effective in previous cycles may not yield the same outcomes due to accumulated discrepancies.

One common cause of hysteresis is thermal cycling, where repeated heating and cooling create mechanical stresses. Those stresses can be additive, releasing inconsistently based on temperature history.  That, in turn, can lead to permanent changes in the output of a circuit, such as a voltage reference, which affects its precision and stability.

In many field-effect transistors (FETs), hysteresis also can occur due to charge trapping. This happens when charges are captured in ‘trap states’ within the semiconductor material or at the interface with another material, such as an oxide layer. The trapped charges then can modulate the threshold voltage of the device over time and under different electrical biases, potentially leading to operational instability and variability in device performance.

Human factors also play a critical role in process drift, with errors stemming from incorrect settings adjustments, mishandling of materials, misinterpretation of operational data, or delayed responses to process anomalies. Such errors, though often minor, can lead to substantial variations in manufacturing processes, impacting the consistency and reliability of semiconductor devices.

“Once in production, the biggest source of variability is human error or inconsistency during maintenance,” said Russell Dover, general manager of the Equipment Intelligence product line at Lam Research. “Wet clean optimization (WCO) and machine learning through equipment intelligence solutions can help address this.”

The integration of new equipment into existing production lines introduces additional complexities. New machinery often features increased speed, throughput, and tighter tolerances, but it must be integrated thoughtfully to maintain the stringent specifications required by existing product lines. This is primarily because the specifications and performance metrics of legacy chips have been long established and are deeply integrated into various applications with pre-existing datasheets.

“From an equipment supplier perspective, we focus on tool matching,” said Dover. “That includes manufacturing and installing tools to be identical within specification, ensuring they are set up and running identically — and then bringing to bear systems, tooling, software and domain knowledge to ensure they are maintained and remain as identical as possible.”

The inherent variability of new equipment, even those with advanced capabilities, requires careful calibration and standardization.

“Some equipment, like transmission electron microscopes, are incredibly powerful,” said Jian-Min Zuo, a materials science and engineering professor at the University of Illinois’ Grainger College of Engineering. “But they are also very finicky, depending on how you tune the machine. How you set it up under specific conditions may vary slightly every time. So there are a number of things that can be done when you try to standardize those procedures, and also standardize the equipment. One example is to generate a curate, like a certain type of test case, where you can collect data from different settings and make sure you’re taking into account the variability in the instruments.”

Process drift solutions
As semiconductor manufacturers grapple with the complexities of process drift, a diverse array of strategies and tools has emerged to address the problem. Advanced process control (APC) systems equipped with real-time monitoring capabilities can extract patterns and predictive insights from massive data sets gathered from various sensors throughout the manufacturing process.

By understanding the relationships between different process variables, APC can predict potential deviations before they result in defects. This predictive capability enables the system to make autonomous adjustments to process parameters in real-time, ensuring that each process step remains within the defined control limits. Essentially, APC acts as a dynamic feedback mechanism that continuously fine-tunes the production process.

Fig. 1: Reduced process drift with AI/ML advanced process control. Source: Tignis

Fig. 1: Reduced process drift with AI/ML advanced process control. Source: Tignis

While APC proactively manages and optimizes the process to prevent deviations, fault detection and classification (FDC) reacts to deviations by detecting and classifying any faults that still occur.

FDC data serves as an advanced early-warning system. This system monitors the myriad parameters and signals during the chip fabrication process, rapidly detecting any variances that could indicate a malfunction or defect in the production line. The classification component of FDC is particularly crucial, as it does more than just flag potential issues. It categorizes each detected fault based on its characteristics and probable causes, vastly simplifying the trouble-shooting process. This allows engineers to swiftly pinpoint the type of intervention needed, whether it’s recalibrating instruments, altering processing recipes, or conducting maintenance repairs.

Statistical process control (SPC) is primarily focused on monitoring and controlling process variations using statistical methods to ensure the process operates efficiently and produces output that meets quality standards. SPC involves plotting data in real-time against control limits on control charts, which are statistically determined to represent the expected normal process behavior. When process measurements stray outside these control limits, it signals that the process may be out of control due to special causes of variation, requiring investigation and correction. SPC is inherently proactive and preventive, aiming to detect potential problems before they result in product defects.

“Statistical process control (SPC) has been a fundamental methodology for the semiconductor industry almost from its very foundation, as there are two core factors supporting the need,” said Dover. “The first is the need for consistent quality, meaning every product needs to be as near identical as possible, and second, the very high manufacturing volume of chips produced creates an excellent workspace for statistical techniques.”

While SPC, FDC, and APC might seem to serve different purposes, they are deeply interconnected. SPC provides the baseline by monitoring process stability and quality over time, setting the stage for effective process control. FDC complements SPC by providing the tools to quickly detect and address anomalies and faults that occur despite the preventive measures put in place by SPC. APC takes insights from both SPC and FDC to adjust process parameters proactively, not just to correct deviations but also to optimize process performance continually.

Despite their benefits, integrating SPC, FDC and APC systems into existing semiconductor manufacturing environments can pose challenges. These systems require extensive configuration and tuning to adapt to specific manufacturing conditions and to interface effectively with other process control systems. Additionally, the success of these systems depends on the quality and granularity of the data they receive, necessitating high-fidelity sensors and a robust data management infrastructure.

“For SPC to be effective you need tight control limits,” adds Dover. “A common trap in the world of SPC is to keep adding control charts (by adding new signals or statistics) during a process ramp, or maybe inheriting old practices from prior nodes without validating their relevance. The result can be millions of control charts running in parallel. It is not a stretch to state that if you are managing a million control charts you are not really controlling much, as it is humanly impossible to synthesize and react to a million control charts on a daily basis.”

This is where AI/ML becomes invaluable, because it can monitor the performance and sustainability of the new equipment more efficiently than traditional methods. By analyzing data from the new machinery, AI/ML can confirm observations, such as reduced accumulation, allowing for adjustments to preventive maintenance schedules that differ from older equipment. This capability not only helps in maintaining the new equipment more effectively but also in optimizing the manufacturing process to take full advantage of the technological upgrades.

AI/ML also facilitate a smoother transition when integrating new equipment, particularly in scenarios involving ‘copy exact’ processes where the goal is to replicate production conditions across different equipment setups. AI and ML can analyze the specific outputs and performance variations of the new equipment compared to the established systems, reducing the time and effort required to achieve optimal settings while ensuring that the new machinery enhances production without compromising the quality and reliability of the legacy chips being produced.

Being more proactive in identifying drift and adjusting parameters in real-time is a necessity. With a very accurate model of the process, you can tune your recipe to minimize that variability and improve both quality and yield.

“The ability to quickly visualize a month’s worth of data in seconds, and be able to look at windows of time, is a huge cost savings because it’s a lot more involved to get data for the technicians or their process engineers to try and figure out what’s wrong,” said Park. “AI/ML has a twofold effect, where you have fewer false alarms, and just fewer alarms in general. So you’re not wasting time looking at things that you shouldn’t have to look at in the first place. But when you do find issues, AI/ML can help you get to the root cause in the diagnostics associated with that much more quickly.”

When there is a real alert, AI/ML offers the ability to correlate multiple parameters and inputs that are driving that alert.

“Traditional process control systems monitor each parameter separately or perform multivariate analysis for key parameters that require significant effort from fab engineers,” adds Jain. “With the amount of fab data scaling exponentially, it is becoming humanly impossible to extract all the actionable insights from the data. Machine learning and artificial intelligence can handle big data generated within a fab to provide effective process control with minimal oversight.”

AI/ML also can look for more other ways of predicting when the drift is going to take your process out of specification. Those correlations can be bivariate and multivariate, as well as univariate. And a machine learning engine that is able to sift through tremendous amounts of data and a larger number of variables than most humans also can turn up some interesting correlations.

“Another benefit of AI/ML is troubleshooting when something does trigger an alarm or alert,” adds Park. “You’ve got SPC and FDC that people are using, and a lot of them have false positives, or false alerts. In some cases, it’s as high as 40% of the alerts that you get are not relevant for what you’re doing. This is where AI/ML becomes vital. It’s never going to take false alerts to zero, but it can significantly reduce the amount of false alerts that you have.”

Engaging with these modern drift solutions, such as AI/ML-based systems, is not mere adherence to industry trends but an essential step towards sustainable semiconductor production. Going beyond the mere mitigation of process drift, these technologies empower manufacturers to optimize operations and maintain the consistency of critical dimensions, allowed by the intelligent analysis of extensive data and automation of complex control processes.

Monitoring process drift is essential for maintaining quality of the device being manufactured, but it also can ensure that the entire fabrication lifecycle operates at peak efficiency. Detecting and managing process drift is a significant challenge in volume production because these variables can be subtle and may compound over time. This makes identifying the root cause of any drift difficult, particularly when measurements are only taken at the end of the production process.

Combating these challenges requires a vigilant approach to process control, regular equipment servicing, and the implementation of AI/ML algorithms that can assist in predicting and correcting for drift. In addition, fostering a culture of continuous improvement and technological adaptation is crucial. Manufacturers must embrace a mindset that prioritizes not only reactive measures, but also proactive strategies to anticipate and mitigate process drift before it affects the production line. This includes training personnel to handle new technologies effectively and to understand the dynamics of process control deeply. Such education enables staff to better recognize early signs of drift and respond swiftly and accurately.

Moreover, the integration of comprehensive data analytics platforms can revolutionize how fabs monitor and analyze the vast amounts of data they generate. These platforms can aggregate data from multiple sources, providing a holistic view of the manufacturing process that is not possible with isolated measurements. With these insights, engineers can refine their process models, enhance predictive maintenance schedules, and optimize the entire production flow to reduce waste and improve yields.

Related Reading
Tackling Variability With AI-Based Process Control
How AI in advanced process control reduces equipment variability and corrects for process drift.

Leave a Reply

(Note: This name will be displayed publicly)