Getting Smarter About Tool Maintenance

Lowering costs and increasing yield using advanced analytics for critical processes, both at leading-edge and mature nodes.


Chipmakers have begun to shift to predictive maintenance for process tools, but the hefty investment in analytics and engineering efforts means it will take some time for smart maintenance to become a widespread practice.

Semiconductor manufacturers need to maintain a diverse set of equipment to process the flow of wafers, dies, packaged parts, and boards running through factories. OSAT and fab factory equipment values range from ~$1.5 million (OSAT wirebond facility) to $15 billion (300mm wafers for 5nm devices). Any anomaly in tool performance can lower product yield and quality, while also affecting factory performance. Timely equipment maintenance can make a substantial difference in these metrics.

Over the decades, semiconductor wafer fabs and assembly factories have shifted from waiting until equipment breaks down to scheduling maintenance, where components are replaced or cleaned based on the number of wafers processed or according to a fixed time interval. Scheduled maintenance (a.k.a. preventive maintenance) is similar to bringing your car in for an oil change based upon time or number of miles driven. In predictive maintenance, an engineer feeds selected equipment data into a model. When the model’s output reaches a set threshold, the component is replaced.
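
The difference between the two triggers can be sketched in a few lines of Python. This is an illustration only: the `ComponentState` fields, the interval limits, and the 0.8 risk threshold are hypothetical values chosen for the example, not figures from any fab or vendor.

```python
from dataclasses import dataclass


@dataclass
class ComponentState:
    wafers_since_service: int  # wafers processed since the last maintenance
    days_since_service: float  # elapsed time since the last maintenance
    risk_score: float          # output of a hypothetical predictive model, 0 to 1


def scheduled_due(state, max_wafers=10_000, max_days=30.0):
    """Scheduled (preventive) rule: service on a fixed wafer count or time interval."""
    return (state.wafers_since_service >= max_wafers
            or state.days_since_service >= max_days)


def predictive_due(state, threshold=0.8):
    """Predictive rule: service when the model output reaches its threshold."""
    return state.risk_score >= threshold


# A component well inside its scheduled interval, but flagged by the model.
state = ComponentState(wafers_since_service=4_200,
                       days_since_service=12.0,
                       risk_score=0.86)
print(scheduled_due(state), predictive_due(state))
```

In this made-up case the scheduled rule would leave the part in service, while the predictive rule would call for replacement now.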

Predictive maintenance is underway on selected equipment. Data engineering teams use predictive models to schedule a maintenance activity. But it’s not being implemented on all equipment at the same time because of the financial and engineering effort required. Consequently, it’s being prioritized for the process steps where it will most benefit factory and device productivity.

As an engineering concept, predictive maintenance has been around for close to a hundred years. In “The Innovation Delusion,” authors Lee Vinsel and Andrew Russell wrote, “The roots of predictive maintenance lie in T. C. Rathbone’s 1939 paper ‘Vibrational Tolerance,’ in which he asserted that machines vibrated more as their conditions deteriorated. If engineers and managers could measure vibrations, he reasoned, they’d be more likely to spot problems before machines broke and halted production.”[1]

With a predictive maintenance strategy, the subsequent economic impact can be significant. As highlighted in a U.S. Department of Energy report, research shows that predictive maintenance could result in as much as a 30% to 40% savings in operating costs for businesses that implement it. It’s not simply the cost of the maintenance, though. This approach benefits factory operations, product yield, and quality. In fact, the DOE study cites a potential 20% to 25% increase in production levels, and ROI of 10 months, for predictive maintenance implementation. That, in turn, helps reduce costs and increase profit margins.

Semiconductor manufacturing equipment presents a diverse set of parameters: vibration, sound, visual, and pressure data, as well as current, voltage, and power. There is no shortage of data that engineers can use to build predictive maintenance models.

“A key component of Lam’s strategy is turning unscheduled maintenance into scheduled maintenance,” said Wojtek Osowiecki, product marketing engineer in Lam Research’s etch product group. “Thanks to our sensor-driven analytics solution, we are improving our ability to minimize downtime by combining required service, forecasting parts performance, and even adjusting it on-the-fly when needed.”

It’s not just the most expensive advanced node processes that benefit from predictive maintenance. IC makers using mature semiconductor/assembly processes can benefit, as well. This is particularly true in the automotive sector, where the quality target has moved from 10 DPPM to 10 DPPB. Early detection of anomalous equipment performance can reduce process variability and defectivity rates, boosting yield and quality while reducing cost.

Economic considerations
Moving to a smarter maintenance strategy requires an investment in IT infrastructure to connect data to an analytics platform, and to automate the maintenance business processes. Such investment needs to be financially justified.

Connecting the impending failure of a piece of equipment, or of one of its components, to a cost-benefit analysis requires an understanding of the impact on product yield and quality, as well as the impact on the overall factory performance metrics that industrial engineers track. These two factors drive all Smart Manufacturing/Industry 4.0 activities, with a focus on asset utilization.

In his 2022 presentation on Amkor Technology’s I4.0 efforts, Elton He, vice president of customer satisfaction and operations planning, highlighted six key performance indicators (KPIs):

  1. Quality, measured in yield and customer feedback;
  2. Productivity, measured in employee efficiency;
  3. Manufacturing cycle time;
  4. Speed and quality of decision making, notably time saved in engineering data analysis and in-line decision making;
  5. Asset utilization for productive use, and
  6. Cost, which enables competitive pricing. [3]

Of these KPIs, timely equipment maintenance impacts quality, cycle time, and asset utilization. This is true for wafer fabs, assembly factories, and test facilities.

If a tool requires maintenance sooner than its scheduled maintenance date, it can adversely impact the yield and/or quality level of shipped product. If a tool’s health is such that a factory can postpone maintenance, it boosts asset utilization and shortens cycle time. Equipment maintenance includes equipment components, but also consumables such as seals, photoresist filters, probe needles, and load boards.

Industrial engineers measure a number of equipment-based metrics for factory operations. Unplanned maintenance events can adversely impact equipment uptime, which is the time a system is not offline; overall equipment availability, which impacts time for processing a product; and overall equipment effectiveness, which impacts time spent producing a good product.

Overall equipment effectiveness (OEE) measures the percentage of time a tool spends producing semiconductor wafers or packaged devices. “Wafer fab OEE is typically based on the equation below, and ranges from ~65% to ~77%,” said Lam Research’s Russell Dover, senior director of service product marketing. “These ranges are general to a fab, not a specific benchmark related to Lam tools.”
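
The equation Dover refers to is not reproduced in this text. A widely used general formulation defines OEE as the product of availability, performance, and quality, and the sketch below assumes that form. It may differ in detail from the one Lam uses, and the input numbers are invented for illustration.

```python
def oee(planned_time_h, downtime_h, ideal_rate_per_h, units_processed, good_units):
    """OEE as availability x performance x quality (a common general formulation)."""
    run_time_h = planned_time_h - downtime_h
    availability = run_time_h / planned_time_h                       # share of planned time the tool ran
    performance = units_processed / (ideal_rate_per_h * run_time_h)  # actual vs. ideal throughput
    quality = good_units / units_processed                           # share of output that is good
    return availability * performance * quality


# Hypothetical week: 168 planned hours, 18 hours down, ideal rate of 100 wafers/hour,
# 13,500 wafers processed, 13,230 of them good.
weekly_oee = oee(planned_time_h=168, downtime_h=18, ideal_rate_per_h=100,
                 units_processed=13_500, good_units=13_230)
print(f"OEE = {weekly_oee:.1%}")
```

With these inputs, recovering part of the 18 hours of downtime raises availability directly, which is the lever that timely maintenance acts on.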

The OEE for metrology tools is generally lower than the OEE of process tools, because factories prefer to limit any material queues at metrology stations. One industry source estimated the OEE of metrology tools is in the sub-75% range. On the other end of the spectrum is a factory’s constraint tool set, in which the OEE is ideally >95%. Process tools likely have OEEs between 75% and 90%, depending on operational criticality, incoming queue times, and tool redundancy.

From conversations with industry sources, OSAT OEEs of 60% are typical for assembly and test. This is partly due to the fact that the ebb and flow of die and packaged units is more variable than in a fab. In addition, this variability results from a more frequent changeover for workstation set-up than is seen in fabs.

The factory OEE will be greatly influenced by the tools that represent bottlenecks in terms of throughput (number of wafers, die, or units per hour), uptime, and criticality. OEE can be measured across the whole factory, for the equipment set specific to a processing step, or for a single piece of equipment. Such a hierarchy in measurements assists factory managers in identifying trouble areas that most benefit from improvements. Proactive maintenance is one of those improvements.

Fig. 1: Using big data analytics for OEE and yield. Source: Amkor

Amkor’s He highlighted that big data analytics can assist with understanding process and tool performance using the following approaches:

  • SPC, APC, FDC — real-time monitoring, analysis, and control for select equipment
  • Big data with advanced analytics for OEE — using Spotfire, SAS, Big Query ML
  • Predictive maintenance capability for major bottleneck equipment

Because of the investment needed for predictive maintenance, a factory team typically focuses its efforts on bottleneck tools or systems supporting critical manufacturing process steps. Engineers can apply a reliability-centered maintenance approach to assist in identifying the equipment that benefits from predictive or condition-based maintenance.

Fig 2: Hierarchy for applying predictive maintenance. Source: U.S. Department of Energy [2]

Cost of lost productivity
So what does this all mean in real dollars? A few numbers have been published or presented recently. One estimate states the cost of unplanned equipment downtime can be as high as $100k per hour. At Semicon West 2022 a presenter from Edwards Vacuum noted that an unplanned vacuum failure will cost a medium-sized fab about $150k.[4] In their 2022 ASMC paper, Edwards Vacuum authors listed the cost variables for an unplanned vacuum service. Among them:

  • Repair cost — always more parts need to be replaced than planned;
  • Wafer cost — $5,000-plus per wafer at advanced nodes;
  • Tool requalification — time and extra metrology costs, and
  • Unplanned downtime — longer to recover from than planned.
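
To show how these cost variables add up, here is a rough back-of-the-envelope sketch. The $100k-per-hour downtime rate and $5,000 wafer cost come from the figures above. Everything else (two hours down, 25 wafers scrapped, the repair and requalification amounts) is a hypothetical input chosen only to make the arithmetic concrete.

```python
def unplanned_event_cost(downtime_h, cost_per_hour, wafers_scrapped,
                         cost_per_wafer, repair_cost, requal_cost):
    """Sum the main cost variables of one unplanned maintenance event."""
    return (downtime_h * cost_per_hour          # lost production time
            + wafers_scrapped * cost_per_wafer  # wafers lost in the tool
            + repair_cost                       # parts and labor
            + requal_cost)                      # tool requalification and extra metrology


# Hypothetical event, not a published figure.
total = unplanned_event_cost(downtime_h=2, cost_per_hour=100_000,
                             wafers_scrapped=25, cost_per_wafer=5_000,
                             repair_cost=20_000, requal_cost=10_000)
print(f"${total:,}")
```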

Other benefits exist in terms of quality and overall fab capacity, with the identification of misprocessed product caused by tools that have degraded.

“Predictive algorithms can also flag potential misprocessed product, which provides an opportunity to shift left, removing defective product from a capacity constrained manufacturing process,” said Wes Smith, CEO of Galaxy Semiconductor. “There is also the potential to identify defective product that might not otherwise be contained during the downstream QA processes, resulting in a field return, the cost of which easily runs into the millions.”

Multivariate algorithms
Historically, engineers have relied upon statistical process control (SPC) charts to manage process and tool performance. On every tool, there are multiple metrics to monitor, and if any one of them appears to be out of control, the cause is investigated and a remedy is applied. The maintenance procedure is one such remedy. For decades this approach has served tool owners in wafer, assembly, and test facilities. But SPC charts are no longer sufficient. Modern analytic approaches enable engineers to combine different equipment data sources to flag problems earlier.
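
For readers less familiar with SPC, a single-parameter control check can be sketched as follows. This is a simplified illustration: it sets limits at three sample standard deviations around the baseline mean, whereas production SPC systems often estimate sigma differently and apply additional run rules. The readings are made up.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Lower and upper control limits at k standard deviations around the mean."""
    center = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation (a simplification)
    return center - k * sigma, center + k * sigma


def out_of_control(values, limits):
    """Indices of readings that fall outside the control limits."""
    lower, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical readings of one tool parameter (for example, a chamber pressure).
baseline = [50.1, 49.8, 50.3, 49.9, 50.0, 50.2, 49.7, 50.1, 49.9, 50.0]
limits = control_limits(baseline)
new_readings = [50.1, 49.9, 51.2, 50.0]
print(out_of_control(new_readings, limits))
```

Each monitored metric gets its own chart of this kind, which is why the approach looks at one parameter at a time.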

Today, a factory generates terabytes of data. The quantity of data is driven by increases in factory size, number of mask levels and process steps, variety of device types, and sensor data from equipment and workstations.

Fig. 3: Where data is generated in a 300mm fab over the course of a minute. Source: PDF Solutions

“Factories today are gigantic AI engines,” said Gregg Bartlett, senior vice president of technology, engineering, and quality at GlobalFoundries. “One factory can have at least 1 million sensors. But with all those sensors, it really comes down to the efficiency of analysis. You have to use an intelligent engine to figure out which signals to pay attention to and which ones are less predictive for maintenance. A lot of it started as feed-forward and feedback controls 20 years ago, which is routine today. Now, we look to analytics to address some of the variability and non-uniformities that correlate with changes in device behavior. And we need to learn from these correlations to predict, for instance, when a chamber component might need replacing based on the data analysis.”

A wide variety of sensor data can be used to detect fluctuations — anything from higher current draw on a motor and vibration levels to audio noise and particle measurements.

“Finding particles in chambers and loadlocks is very important in finding failing parts or dirty chambers in real time,” said Vidya Vijay, senior program manager in the CyberOptics Division at Nordson Test and Inspection. “Our automatic particle sensors enable particle measurements in real-time. And it helps find particles during the pump down and venting process, dirty loadlocks, and to check tube status after and before preventive maintenance for cleanliness. This sensor also compares particles under atmospheric pressure and low pressure, which can aid in slit valve optimization, optimize gate valves by studying particles, and check the cleanliness of the carrier cassette or FOUP. Learning where the particles are coming from in real-time enables process engineers to tune or clean a specific area instead of breaking vacuum or spending days in troubleshooting causes of failure.”

With each advance in assembly and fab processes comes more complicated equipment and its associated sensors. There’s an opportunity to use multiple data sources for models to watch for equipment abnormalities that indicate the need for maintenance.

“Lam’s systems typically have more than 2,000 sensors that track radio frequency, voltage, current, power, gas flow, pressure, temperature, etc.,” said Dover. “A data collection plan (DCP) is defined by the customer based on experience or following Lam’s best-known methods. The DCP defines what is recorded in the tool logs or broadcast to the customer’s statistical process control or fault detection and classification (FDC) systems. FDC is used as a simple form of predictive maintenance, in which a tool will be scheduled for preventive maintenance based on a known fault type. However, FDC is based on univariate rules-based models, and many customers have invested heavily in multivariate ML, beyond FDC, to enable more sophisticated predictive models. Lam Research partners with the customer to both deliver multivariate predictive models and to assist the customer in developing its own predictive models.”

Others concur on this move to multivariate models for predictive maintenance.

“More complex models can characterize the build-up of films on chamber walls based on the plasma gas composition, vacuum levels, plasma, and RF sensors that can predict the likelihood of film delamination that will cause particle shedding and recommend a chamber clean or replacement,” said Jon Holt, volume manufacturing solutions worldwide fab applications solutions manager at PDF Solutions. “These are typically multivariate models that use AI algorithm(s) trained (supervised) over PM lifecycles. So basically, it is a comparative model based on an expected value.”

Predictive models can foresee a failing tool or component weeks in advance of actual failure.

“We have a case study demonstrating how multivariate monitoring of dozens of tool parameters gave an early warning about the failure of a tool component,” said Dieter Rathei, CEO of DR YIELD. “While all the individual parameters remained within their respective control limits, the multivariate signal preceded the component failure by several weeks. This makes a strong case for using innovative, computation-intensive data algorithms to gain insight from already existing data.”
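
How can a multivariate signal fire while every individual parameter stays inside its control limits? One standard technique that shows this behavior is Hotelling’s T² statistic, which accounts for the correlation between parameters. The sketch below is a generic two-parameter illustration on synthetic data. It is not DR YIELD’s algorithm, and the data, correlation, and limits are all assumptions made for the example.

```python
import math
import random

# Synthetic baseline: two tool parameters that normally move together (correlation ~0.9).
rng = random.Random(0)
baseline = []
for _ in range(500):
    a = rng.gauss(0.0, 1.0)
    b = 0.9 * a + math.sqrt(1 - 0.9 ** 2) * rng.gauss(0.0, 1.0)
    baseline.append((a, b))

n = len(baseline)
mean_a = sum(a for a, _ in baseline) / n
mean_b = sum(b for _, b in baseline) / n
var_a = sum((a - mean_a) ** 2 for a, _ in baseline) / (n - 1)
var_b = sum((b - mean_b) ** 2 for _, b in baseline) / (n - 1)
cov_ab = sum((a - mean_a) * (b - mean_b) for a, b in baseline) / (n - 1)


def within_univariate_limits(a, b, k=3.0):
    """True if each parameter is inside its own k-sigma control limits."""
    return (abs(a - mean_a) <= k * math.sqrt(var_a)
            and abs(b - mean_b) <= k * math.sqrt(var_b))


def t_squared(a, b):
    """Hotelling's T^2 for two parameters, using the inverse of the 2x2 covariance."""
    da, db = a - mean_a, b - mean_b
    det = var_a * var_b - cov_ab ** 2
    return (var_b * da * da - 2 * cov_ab * da * db + var_a * db * db) / det


# Approximate chi-square limit with 2 degrees of freedom at a 0.27% false-alarm rate.
# For 2 degrees of freedom the quantile has the closed form -2 * ln(alpha).
t2_limit = -2 * math.log(0.0027)

# A reading where the two parameters move in opposite directions.
reading = (2.0, -2.0)
print(within_univariate_limits(*reading), t_squared(*reading) > t2_limit)
```

Each value is only about two standard deviations from its mean, so neither single-parameter chart alarms. The combination breaks the usual relationship between the two parameters, and T² exceeds its limit by a wide margin. With dozens of parameters the same idea applies, using a full covariance matrix and a chi-square or F-distribution limit for the corresponding number of dimensions.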

By shifting to smarter equipment maintenance regimens, factory and equipment owners can reap the benefits on all the key performance indicators of quality, yield, cycle time, and equipment utilization. In factories, the increase in data and its usage in multivariate models enable this shift to predictive tool maintenance. As more success stories are shared, the industry can expect increased adoption even as factory managers remain cautious about changes in factory operations.

“There are two key challenges to enabling more predictive maintenance approaches. The first is increased investment in enabling multivariate analytics based on the full tool data logs,” said Lam’s Dover. “The second is an increased risk tolerance to allow such models, which by their nature are imperfect, to be used for tool control. Our industry is leading-edge, but we are very cautious when it comes to adopting new technologies, even when the mathematics and statistics are solid.”


  1. L. Vinsel and A.L. Russell, “The Innovation Delusion: How Our Obsession with the New has Disrupted the Work that Matters Most,” Currency, a division of Penguin Random House, Sept. 8, 2020.
  2. Operations & Maintenance Best Practices Guide, Chapter 5, US Department of Energy, 2010.
  3. E. He, Amkor I4.0 Smart Manufacturing Initiatives, Semicon China, Nov. 1, 2022.
  4. E. Collart, A. Longley, D. Gordon, J. Nordquist and P. Matthews, “Predictive Maintenance Practices for Cryogenic Pumps in Semiconductor Manufacturing,” 2022 33rd Annual SEMI Advanced Semiconductor Manufacturing Conference (ASMC).
