Using Predictive Maintenance To Boost IC Manufacturing Efficiency

Smarter tools enable smarter fabs, but the logistics of predictive maintenance make implementation challenging.


Predicting exactly how and when a process tool is going to fail is a complex task, but it’s getting a tad easier with the rollout of smart sensors, standard interfaces, and advanced data analytics.

The potential benefits of predictive maintenance are enormous. Higher tool uptime correlates with greater fab efficiency and lower operating costs, so engineers are pursuing multiple routes to boosting productivity, including making tools smarter while implementing faster recovery procedures when faults do occur.

Semiconductor fab engineers are enjoying a boost in equipment performance via software and technology-enabled maintenance practices that address unscheduled downtime, which is a particularly painful cause of lost revenue. By combining planned maintenance, spare part readiness, and faster tool recovery methods, process engineers can improve overall equipment effectiveness (OEE) and increase the time that equipment actually spends making chips.

Predictive maintenance is the identification and resolution of faults in semiconductor processing tools before any abnormal behavior results in an equipment failure. “In-situ metrology and sensor data are important factors in preventive maintenance,” said Russell Dover, general manager for the Service Product Line at Lam Research. “Artificial intelligence on a tool can drive proactive monitoring when parts need to be changed, and then enable the tool to change the parts itself. Fab tools that are self-aware, self-correcting, and self-healing should continue to become more common thanks to these benefits.”

The drive to self-aware and self-healing tools fits into the larger scheme of smart manufacturing and using advanced machine learning algorithms to enable continuous improvement. “When it comes to implementing the vision of smart factories, machine learning automation will have a massive impact on manufacturing in the future,” said Jon Herlocker, CEO of Tignis. “Process engineers will spend less time chasing issues and have more time to implement continuous improvement. Maintenance engineers will have time to do more preventive maintenance.”

Predictive maintenance is about fully comprehending the useful lifetime of all sensors and parts of wafer processing tools, something that can be modeled in digital twins. “Unscheduled tool downtime is one of the biggest issues in the fab. And what happens if they don’t have the replacement part in stock? That’s going to be a huge problem,” said Patrick Pannese, vice president of Strategy and Business Development at PDF Solutions. “The beautiful thing about digital twin technology, for instance, is five days before a part goes down, the program can show you whether you have that part in stock. So there are a lot of advantages you get with this technology.”

While technologies like digital twins are most often being applied in leading-edge 300mm fabs, the benefit of improved maintenance practices can boost the productivity of any fab, including legacy 150mm and 200mm factories. “Improving equipment reliability can help a fab enhance its tool availability (the share of time that a piece of equipment is ready to process incoming work) by more than 15%. When applied to bottlenecks, about 70% to 80% of this improvement is transformed into the overall equipment effectiveness (an overall measure of a manufacturing operation’s utilization relative to its full potential) of the fab,” stated a recent McKinsey report. [1]

That same study highlighted that every hour of planned tool maintenance typically can save three to four hours of unplanned maintenance.

Tying decision-making to ROI
Preventive maintenance schedules for semiconductor tools today are mostly based on qualitative field experience, which can lead to inconsistent results across different machines. For years, process engineers have explored predictive mathematical models to capture component lifecycles, but this is challenging because of the wide variation in process conditions inside chambers. In a plasma etching tool, for example, these include the operating ranges of RF power, temperature, pressure, and flow rates. Component failures also can stem from different root causes, which can change over time. As a result, any component lifetime estimate carries a certain amount of uncertainty.

Better maintenance of equipment also is tied to yield improvements. “There’s definitely a yield advantage to predictive maintenance,” said Dieter Rathei, CEO of DR Yield. “We have use cases where tools are flagged for maintenance, based on monitoring of end-of-the-line test data. So the ‘predictive’ maintenance is not only triggered by analytics of the tool data, or subsequent inline SPC monitoring, but based on yield-related data. In one instance, this has created an estimated savings of more than $500,000 within weeks of implementing the yield data feedback loop.”

ROI can be substantial when even one preventive maintenance (PM) routine a year is eliminated. “Let me give you an idea of how important preventive maintenance is. If you look at an industrial robot operating in a vacuum chamber, that robot is rated for 11 million cycles before it fails, which sounds like a big number,” said PDF’s Pannese. “But if you look at 24/7 operation, that robot does 86,000 moves a day, and that means a failure every 127 days. So they will schedule a PM every 90 days. If you use a conservative number of $3,000 per wafer in an advanced logic fab, and a tool that runs 300 wafers per hour, you’re needing one PM every 7,200 wafers, so that costs around $21 million to take that tool down. But if you have a technology on the tool that can ensure three PMs a year instead of four, that’s what they’re going to pay for.”
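The arithmetic in Pannese's example can be checked with a short sketch. The cycle rating, daily move count, wafer value, and throughput are taken from the quote. The 24-hour PM duration is an assumption made here, not a figure from the article. It is chosen because 300 wafers per hour over 24 hours gives the 7,200 wafers he mentions, and 7,200 wafers at $3,000 each gives roughly the $21 million he cites.

```python
# Figures quoted in the article.
RATED_CYCLES = 11_000_000   # robot rating before expected failure
MOVES_PER_DAY = 86_000      # moves per day in 24/7 operation
WAFER_VALUE_USD = 3_000     # "conservative" value per wafer
WAFERS_PER_HOUR = 300       # tool throughput

# Assumption (not stated in the article): one PM idles the tool for 24 hours.
PM_DOWNTIME_HOURS = 24


def days_to_rated_life(rated_cycles: int, moves_per_day: int) -> float:
    """Days of continuous operation before the rated cycle count is reached."""
    return rated_cycles / moves_per_day


def lost_output_usd(wafers_per_hour: int, downtime_hours: float,
                    wafer_value_usd: float) -> float:
    """Value of wafers not processed while the tool is down."""
    return wafers_per_hour * downtime_hours * wafer_value_usd


life_days = days_to_rated_life(RATED_CYCLES, MOVES_PER_DAY)
pm_cost = lost_output_usd(WAFERS_PER_HOUR, PM_DOWNTIME_HOURS, WAFER_VALUE_USD)

print(f"Rated life: {life_days:.1f} days")                 # about 127.9 days
print(f"Lost output per PM: ${pm_cost:,.0f}")              # $21,600,000
print(f"Saved by 3 PMs instead of 4: ${pm_cost:,.0f}")     # one PM avoided
```

Under these assumptions, the 90-day PM interval leaves a margin of about 38 days before the rated life is reached, and each PM avoided is worth about one day of tool output.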

Tool makers and fabs alike are anticipating greater adoption of machine learning-based automation to improve fab productivity. “We’re going to see much more of this technology, like digital twins and predictive maintenance, come into use in the next couple of years,” said Pannese.

Today, process engineers typically use scheduled tool maintenance, in which operational engineers prioritize the replacement of tool components and consumables on a regular basis. This may be done after several hundred wafers are processed, for example. And while this is effective from a prevention standpoint, it also can lead to longer overall tool downtime if maintenance is performed too often. The long-term goal is a balance between too little tool maintenance, which leads to unplanned tool crashes, and too-frequent preventive maintenance, which is more disruptive and costly. Most fabs fine-tune the frequency of their preventive maintenance procedures on an ongoing basis.

Beyond productivity, reducing the frequency of tool maintenance and kit replacements has other benefits, including reduced environmental impact, lower energy use, and less waste.

Leading equipment makers also are building greater intelligence into the tools. “Our latest wafer fabrication equipment is powered by Lam’s Equipment Intelligence solutions. For example, our Sense.i etch platform features automated detection, maintenance, and calibration systems that utilize machine learning algorithms for automated fault detection,” said Lam’s Dover. In a world where every fraction of a micron displacement can result in high yield losses, Dover emphasizes the importance of precise wafer alignment in single-wafer etch and deposition tools, where an advanced positioning system provides automated, high-precision, wafer-dynamic alignment and calibration.

Dover noted there are more than 2,000 sensors used to track and report on various parameters, such as radio frequency, gas flow, and pressure on the company’s etch and deposition tools. “Advanced analytics, machine learning, and AI interpret the resulting data to minimize unscheduled maintenance and improve productivity,” he said.

Key to understanding how to achieve a high level of control is knowing what to measure and how this metric impacts productivity. “What something like FDC does for you has not changed,” explained Mike McIntyre, director of software product management at Onto Innovation. “But there are many more sensors, so the data volume has grown immensely. When you put a new sensor on a tool, it captures a signal and a timestamp through the internet of things. But the data has no context. Context has to come from the tool or the MES operation that you’re relating the output to. If the sensor is in the exhaust manifolds of a factory, because you’re cleaning the exhaust, you don’t want to over-clean. You’d like to balance your cleaning of the exhaust to the output of the signal that monitors the flow conditions, but you want to be able to do that to start pushing that down to become more predictive in your range. So the biggest change is that real-time data enables a more predictive response.”
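McIntyre's point that a raw sensor reading is only a signal and a timestamp, with context supplied by the tool or the MES, can be illustrated with a small sketch. Everything here is hypothetical: the `MesStep` record, its fields, and the assumption that MES steps are non-overlapping and sorted by start time. It simply looks up which MES step, if any, was active when each reading was taken.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional, Tuple


class MesStep(NamedTuple):
    """Hypothetical MES record: one processing step on one tool."""
    start: float   # step start, seconds
    end: float     # step end, seconds (exclusive)
    lot_id: str
    recipe: str


def attach_context(
    readings: List[Tuple[float, float]],
    steps: List[MesStep],
) -> List[Tuple[float, float, Optional[MesStep]]]:
    """Pair each (timestamp, value) reading with the MES step active at that time.

    Assumes `steps` is sorted by start time and steps do not overlap.
    Readings taken while no step was running get None as their context.
    """
    starts = [s.start for s in steps]
    out = []
    for ts, value in readings:
        i = bisect_right(starts, ts) - 1   # last step starting at or before ts
        step = steps[i] if i >= 0 and ts < steps[i].end else None
        out.append((ts, value, step))
    return out


steps = [MesStep(0, 100, "LOT1", "ETCH_A"), MesStep(150, 250, "LOT2", "ETCH_B")]
readings = [(50, 1.0), (120, 1.1), (150, 1.2)]
for ts, value, step in attach_context(readings, steps):
    label = f"{step.lot_id}/{step.recipe}" if step else "idle"
    print(ts, value, label)
```

The reading at 120 seconds falls between steps, so it is labeled idle. Without that context it would be indistinguishable from a reading taken while a lot was processing.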

The journey to predictive action
Tool uptime first became a prominent concern nearly 30 years ago, when multi-chamber tools came on the scene in 200mm manufacturing for physical and chemical vapor deposition (PVD and CVD), etching, and ashing (photoresist removal). In these systems, it was not uncommon to have overall equipment effectiveness (OEE) in the 30% range.

OEE is the share of time a tool spends being productive and adding value to the wafer, which makes it a powerful key performance indicator (KPI) for how much of a system's time goes into producing valuable semiconductor chips. In the case of 30% OEE, the remaining 70% was spent on queuing (waiting to process lots of wafers), tool qualification time, and both scheduled and unscheduled maintenance. Lean Six Sigma productivity programs put a keen focus on defining, measuring, analyzing, improving, and controlling tool performance to boost OEE.
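The time-based view of OEE described above can be written as a simple ratio. The sketch below follows only the article's description, in which a tool's time is split into productive time, queuing, qualification, and scheduled and unscheduled maintenance. The class name and the hour values are hypothetical, and formal OEE definitions also factor in processing rate and quality, which are left out here.

```python
from dataclasses import dataclass


@dataclass
class ToolTimeLog:
    """Hours a tool spent in each state over some period (hypothetical buckets)."""
    productive_h: float
    queuing_h: float
    qualification_h: float
    scheduled_maint_h: float
    unscheduled_maint_h: float

    def total_h(self) -> float:
        return (self.productive_h + self.queuing_h + self.qualification_h
                + self.scheduled_maint_h + self.unscheduled_maint_h)

    def oee(self) -> float:
        """Fraction of total time spent processing wafers (time-based view only)."""
        total = self.total_h()
        if total <= 0:
            raise ValueError("time log is empty")
        return self.productive_h / total


# Hypothetical 1,000-hour period resembling the early multi-chamber tools.
log = ToolTimeLog(productive_h=300, queuing_h=350, qualification_h=100,
                  scheduled_maint_h=120, unscheduled_maint_h=130)
print(f"OEE: {log.oee():.0%}")   # OEE: 30%
```

Breaking the non-productive 70% into named buckets is what lets a productivity program see whether queuing, qualification, or maintenance is the largest loss.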

Since then, fabs have been introducing productivity improvement programs to optimize OEE, while extending those practices to cell-level (e.g., the lithography cell), fab-level, and enterprise-level productivity.

“Time is money in the semiconductor world and downtime is very expensive,” said Vidya Vijay, senior program manager at Nordson Test & Inspection. “We don’t want to have a tool down at all.”

One of the most common indicators of a problem in process chambers involves particle events. “There are a lot of mechanisms that can cause particles within a tool,” Vijay said. “And when we are talking about detecting particles, and our sensors travel just like the wafers do, it is detecting particles as small as 0.1 micron up to 5 micron. So we can tie a particle event to its timestamp during transport of this wafer over 15 minutes or as long as two hours.”

Fig. 1: A particle sensor can quickly identify the cause of a particle event. In this case, a two-chamber tool spent about 10 days at 50% output due to particle spikes in one of the chambers. The problem was solved in about 10 minutes with a particle sensor. Source: Nordson Test & Measurement

During tool maintenance processes, or in preparation for bringing a tool back online, wireless sensors in a wafer-like form are used to provide x, y, z leveling of the wafer, wafer-level resistance measurements for electroplating, or measurement of vibration or airborne particles.

The variety of process tools on which preventive maintenance is performed is expanding, as well. Companies have introduced new process tools for fabricating power devices that incorporate silicon carbide and gallium nitride films. Another hot market is silicon photonics.

“We see a lot of traction with all the optical waveguides where we apply ellipsometry and by reflectometry, we can measure the refractive index at the precision required, which is critical for silicon photonics applications,” said Samuel Lesko, general manager, TSOM Business Unit at Bruker.

The far-reaching impact of tool downtime
The maintenance ratio — scheduled downtime versus unscheduled downtime for a piece of equipment — is a useful metric for benchmarking maintenance efficiency. [1] During scheduled downtime, engineers resolve problems quickly using short-loop root cause analysis, implement ever-improving maintenance procedures and closely track spare part availability.

The form of in-situ measurement is often dictated by the wafer process itself. “The plating companies have been able to show that a typical process monitor step involves a test wafer or a set of test wafers, and then doing the metrology on the test wafers to measure the thickness uniformity. But there’s money involved in the test wafers and metrology,” explained Tim Skunes, vice president of Technology & Business Development at Nordson Test & Measurement. “They discovered that during the plating process, the contact fingers that the electrical current flows through will corrode over a period, and the corrosion causes a rise in resistance as uniformity gets worse. An auto-resistivity sensor is used to predict when that uniformity starts to degrade.”
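The idea of using rising contact resistance as an early warning can be sketched as a trend extrapolation. This is not Nordson's method, which the article does not describe. It is a plain least-squares line fit, with hypothetical function names, units, and limit value, that estimates how many more wafers can run before resistance reaches a chosen limit.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two matched points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def wafers_until_limit(wafer_counts: Sequence[float],
                       resistance_mohm: Sequence[float],
                       limit_mohm: float) -> Optional[float]:
    """Estimate wafers remaining before resistance reaches `limit_mohm`.

    Returns None when resistance is not trending upward, and 0.0 when the
    fitted line has already crossed the limit.
    """
    slope, intercept = fit_line(wafer_counts, resistance_mohm)
    if slope <= 0:
        return None
    crossing = (limit_mohm - intercept) / slope
    return max(0.0, crossing - wafer_counts[-1])


# Hypothetical readings: contact resistance (milliohms) vs. wafers plated.
counts = [0, 1000, 2000, 3000]
resistance = [10.0, 10.5, 11.0, 11.5]
print(wafers_until_limit(counts, resistance, limit_mohm=13.0))
```

A linear trend is the simplest possible assumption. Real corrosion may accelerate, in which case a straight-line estimate would be optimistic.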

The same is true for other types of sensors. “We can also look at a leveling sensor, which is going to show the end effector or chuck that is holding the wafer at the final position,” said Nordson’s Vijay. “This can definitely help in identifying early signs of mechanical failures or impending failures that could occur once the tool is brought online. And once the engineer has three or four cycles of vibration data, analysis can tell them what the schedule for preventive maintenance should be.”

Tying ROI to maintenance
The key is to identify patterns and outliers amid a mass of data, and this is where AI/ML starts fitting into the picture.

“We see a big shift in the industry right now with AI and ML toward connectivity and connected systems,” said PDF’s Pannese. “The next generation of process tools will incorporate connectivity products to meet standard protocols like GEM 300, EDA, etc. We have a hierarchical database where you’ll be able to connect many different types of systems in one product, which really helps with time to market because the developers have one standard database to connect to that meets multiple protocols that we see coming up in the future.”

Fig. 2: Tool fingerprinting captures the nuanced behavior of specific components such as a wafer handling robot. Source: PDF Solutions

Tight chamber matching — ensuring that the same film deposited in different chambers meets the same specifications for thickness, film stress, etc. — always has been a critical goal. But with each new technology node, specifications get tighter.

“Let’s take a robot that is picking and placing a wafer with a one-micron resolution, and it’s hitting the spec,” said Pannese. “But if you look at the three-dimensional trajectory of that robot over time, picking and placing that wafer, every robot would have a slightly different digital fingerprint. Once that starts to deviate, you can flag it so the engineer can see if maybe there’s a torque issue or a belt is loose before the component fails.”
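One way to picture the fingerprint comparison Pannese describes is to measure how far a robot's current trajectory has moved from its own recorded baseline. The sketch below is an illustration only, not PDF Solutions' algorithm. It assumes both trajectories are sampled at the same points along the move, and the function names, units, and limit are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]


def rms_deviation(baseline: Sequence[Point3D], observed: Sequence[Point3D]) -> float:
    """Root-mean-square distance between matching samples of two trajectories."""
    if not baseline or len(baseline) != len(observed):
        raise ValueError("trajectories must be non-empty and equally sampled")
    squared = sum(math.dist(b, o) ** 2 for b, o in zip(baseline, observed))
    return math.sqrt(squared / len(baseline))


def has_drifted(baseline: Sequence[Point3D], observed: Sequence[Point3D],
                limit_um: float) -> bool:
    """Flag the robot when its path departs from its own fingerprint."""
    return rms_deviation(baseline, observed) > limit_um


# Hypothetical pick-and-place path (microns) and a later run that sags in z.
baseline = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
later_run = [(0.0, 0.0, 3.0), (10.0, 0.0, 3.0), (20.0, 0.0, 3.0)]

print(rms_deviation(baseline, later_run))            # 3.0
print(has_drifted(baseline, later_run, limit_um=1.0))  # True
```

Because each robot is compared against its own baseline rather than a shared spec, a unit can be flagged while its final placement is still within tolerance, which matches the scenario in the quote.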

Finally, a new kind of predictive maintenance is being implemented on semiconductor chips in the field, especially for mission-critical applications like automotive and data centers. In data centers, power consumption has become an enormous problem. [2] On-die monitors that track power use under various workloads, such as those offered by proteanTecs and others, can proactively identify problems and make operational adjustments to the SoC to prevent failures in the field. That also could include chips used in the equipment that manufactures semiconductors.

More sensors, greater levels of control, and software analytics are enabling more preventive maintenance on wafer processing tools. The industry has come a long way from the days when overall equipment effectiveness was in the 30% range, with current levels for a state-of-the-art fab above 80%.

Predictive maintenance comes down to identifying real problems in any component or sensor before it fails. Engineers will continue to develop more advanced models to account for more types of failure modes in an ever-increasing number of manufacturing scenarios.


2. E. Sperling, The Rising Price Of Power In Chips, Semiconductor Engineering, March 14, 2024.
