The Drive Toward More Predictive Maintenance

Using data for just-in-time maintenance for factories and ICs.


Maintenance is a critical behind-the-scenes activity that keeps manufacturing facilities running and data centers humming. But when not performed in a timely manner, it can result in damaged products or equipment, or significant system/equipment downtime.

By shifting from scheduled maintenance to predictive maintenance, factories and electronic system owners can reap substantial benefits, including reduced total cost of ownership, improved factory performance and increased yield and quality.

Today, equipment sensors and IC monitors, also known as telemetry circuits, produce a wealth of valuable data that engineers are leveraging to perform equipment/system maintenance proactively and to signal impending component failure. At least an order of magnitude more data is generated than 10 years ago. The challenge is adding the engineering resources needed for data analytics and new maintenance procedures, and justifying the cost of implementation.

Scheduled maintenance is typically performed based on usage, which can be measured by the number of wafers processed or the number of months equipment was used. It can entail anything from replacing a component or changing a fluid to cleaning equipment or a component in the tool or system. This approach assumes that all components have identical reliability profiles and experience equivalent usage, also known as the mission profile.

But component reliability varies, and so do in-tool usage conditions. That has spurred interest in just-in-time maintenance, which can provide fundamental benefits such as earlier detection of failures, increased component lifetime, and better operational control.

As evidenced by presentations at multiple recent industry conferences, semiconductor manufacturers continue to pursue just-in-time maintenance schedules. The trend is to move predictive maintenance into the sub-fab and test operations, and to adopt machine learning (ML) algorithms to manage complex equipment or process scenarios.

“Most modern manufacturing semiconductor factories purchase OEM equipment with built-in sensors designed to monitor the manufacturing process to ensure defect free parts and ensure certification and compliance to the ISO standards (ISO-9001, ISO-9002, and related TS standards), ensuring quality and reliability in the manufacturing of devices,” said Jon Holt, volume manufacturing solutions worldwide fab applications solutions manager at PDF Solutions. “With appropriate data collection, analysis, and the application of monitoring and control, this information can be used to identify early wear-out of components and detect and predict when maintenance should be performed on the equipment.”

This type of maintenance has not been the practice for semiconductors and electronics. But a case can be made for complex SoCs, especially for those found in data centers and automobiles. With the now-prevalent use of circuits for internal telemetry measurements, many engineers foresee using these internal measurements for predictive maintenance.

“Today, we collect parametric data from the IC. By measuring physical parameters and using knowledge of how changes in those parameters affect reliability, IC vendors can set normal operating limits,” said Richard Oxland, product manager for Tessent embedded analytics at Siemens EDA. “And chips that are found to be operating outside those limits can self-report as non-functional or requiring repair.”

Predictive maintenance benefits
Industry 4.0 and smart manufacturing guidelines promote the use of data generated during manufacturing to improve factory performance, increase production agility, and reduce costs. Predictive maintenance in a semiconductor facility (fab, assembly or test) contributes to all three areas. Industry experts cited the following specific benefits:

  • Increased overall equipment effectiveness (OEE);
  • Improved yield and quality;
  • Reduced operating costs, and
  • Higher return on capital investment.

“The goal of predictive maintenance is to predict events and to minimize their negative impact on the target equipment’s performance and availability at an early stage, allowing corrective action to be taken before an event happens,” said Don Ong, head of innovation at Advantest. “Hence, it targets OEE (overall equipment effectiveness). This translates into tangible returns for the customer.”
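For reference, OEE is conventionally defined as the product of availability, performance, and quality rates. The Python sketch below illustrates that arithmetic only. The function name, argument names, and sample numbers are hypothetical and do not come from any of the companies quoted here.

```python
def oee(planned_minutes, downtime_minutes, ideal_cycle_minutes,
        total_units, good_units):
    """Return (availability, performance, quality, oee) as fractions.

    All inputs are illustrative: planned production time, unplanned
    downtime, ideal time per unit, units processed, and units that passed.
    """
    run_minutes = planned_minutes - downtime_minutes
    if planned_minutes <= 0 or run_minutes <= 0 or total_units <= 0:
        raise ValueError("need positive planned time, run time, and unit count")

    availability = run_minutes / planned_minutes                      # uptime share
    performance = (ideal_cycle_minutes * total_units) / run_minutes   # speed vs. ideal
    quality = good_units / total_units                                # good-unit share
    return availability, performance, quality, availability * performance * quality


# Hypothetical shift: 480 planned minutes, 60 minutes of unplanned downtime.
a, p, q, score = oee(480, 60, 1.0, 378, 360)
print(f"availability={a:.3f} performance={p:.3f} quality={q:.3f} OEE={score:.3f}")
```

In this framing, predictive maintenance acts mainly on the availability term by reducing unplanned downtime, and on the quality term by catching drift before it damages product.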

It also represents a significant shift. “Historically, the monitoring and control of the process and equipment have been the responsibility of either process or equipment engineers,” noted Holt. “However, with Industry 4.0 and the ability to apply advanced models like AI/ML across the factory floor, many companies have formed ‘new’ factory automation teams to deploy solutions that link data across the factory and supply chain.”

For IC lifecycle management, engineers are in the exploratory phase of applying data for predictive maintenance use cases. For differing reasons, data center owners and automakers are the most eager to shift from scheduled to predictive maintenance.

“I’d highlight two industry sectors. The first is high-performance, hyperscale computing,” said Siemens’ Oxland. “They’re very interested because of the potential savings they can realize in both capital and operating expenses. Even very small percentage improvements yield substantial results. And here, I’m not only talking in terms of the cost of maintenance itself. Predictive maintenance also makes it easier to ‘right-size’ both the system and the chip design itself, and to identify situations where the system becomes less performant, with an impact on key operating parameters such as power consumption. The second, as you might expect, is the automotive industry. They are facing a perfect storm of financial pressure, changes in technology, changes in business models, and changes in legal and regulatory requirements. Obviously, the dollar cost of a vehicle recall is astronomical. But so is the potential reputational cost. Preventive maintenance can help with both of those issues.”

Test facilities typically have relied upon scheduled maintenance for cleaning probe card needles and sockets/contactors. Debris on wafer probe needles and test sockets/contactors affects product yield and quality, as well as test cell uptime.

“Often the question is asked ‘how often should I clean?’ The best answer is to clean only when necessary to maintain contact reliability and high yield,” said Jerry Broz, president and director of technology at Advanced Probing Systems. “The primary wear-out mechanism of wafer probe needles is attributable to the cleaning process, which can result in as much as ~95% of the probe life. In most cases, cleaning is simply procedural, using a fixed recipe and not dynamically optimized. A cleaning recipe will be applied based on past experiences, or will be implemented to address the requirements of a device with the highest sensitivity to contact. Too little cleaning results in reduced yields and increases wafer test process instability requiring frequent operator intervention. Too much cleaning reduces throughput and increases production costs without providing additional yield benefits. Ideally, cleaning execution should be triggered based on some metric, such as repeated opens, bin-out failures, yield drop, etc.”
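A metric-triggered cleaning policy of the kind Broz describes could be sketched as follows. This is a minimal illustration, not any vendor's implementation: the class name, the thresholds, and the choice of two triggers (consecutive opens and a rolling yield drop against a baseline) are all assumptions.

```python
from collections import deque


class CleaningTrigger:
    """Decide when to clean a probe card from test results (hypothetical)."""

    def __init__(self, baseline_yield, max_yield_drop=0.02,
                 max_consecutive_opens=3, window=50):
        self.baseline_yield = baseline_yield
        self.max_yield_drop = max_yield_drop
        self.max_consecutive_opens = max_consecutive_opens
        self.results = deque(maxlen=window)  # most recent pass/fail results
        self.consecutive_opens = 0

    def record(self, passed, open_failure=False):
        """Record one test result; return True if cleaning should run now."""
        self.results.append(bool(passed))
        self.consecutive_opens = self.consecutive_opens + 1 if open_failure else 0

        # Trigger 1: repeated opens suggest contamination on the contacts.
        if self.consecutive_opens >= self.max_consecutive_opens:
            return True

        # Trigger 2: rolling yield fell too far below baseline.
        # Only evaluated once the window is full, to avoid noisy early alarms.
        if len(self.results) == self.results.maxlen:
            rolling_yield = sum(self.results) / len(self.results)
            if self.baseline_yield - rolling_yield > self.max_yield_drop:
                return True
        return False

    def reset(self):
        """Call after a cleaning cycle so old results do not re-trigger."""
        self.results.clear()
        self.consecutive_opens = 0


trigger = CleaningTrigger(baseline_yield=0.95, max_yield_drop=0.10, window=20)
for _ in range(5):
    trigger.record(passed=True)
for _ in range(3):
    clean_now = trigger.record(passed=False, open_failure=True)
print(clean_now)  # third consecutive open requests a cleaning
```

Compared with a fixed recipe, this skips cleaning while contact is healthy, which matters because cleaning itself is the main source of needle wear.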

Predictive maintenance in action
Equipment vendors base scheduled maintenance either on hours of operation or on usage counts: wafers or wafer boats in the fab, lots or units in assembly, touchdowns for wafer probe cards, and insertions for load board sockets. Switching to an adaptive maintenance strategy requires determining which data would be a good predictor. From there, engineers need to select the type of algorithm on which to base the prediction, pilot the prediction and modify it as needed, implement it in the production environment, and monitor the predictive indicator in case it shifts.

Data from sensors within the equipment and IC, sensors outside the equipment/system, and product test data all can provide inputs to this development flow.

“We provide sensors that help with maintenance by using data to anticipate the need to replace components in semiconductor tools. For example, we expect some vibration as the robot arm moves through the tool, and we baseline this vibration as a standard with our AVLS3 or AMS sensors,” said Vidya Vijay, senior program manager at CyberOptics. “Process engineers use the trend analysis available while studying the vibration over a period of time, which aids in proper maintenance of the tools. This allows customers to intelligently decide how much life is remaining in that specific component.”
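One simple way to turn a vibration trend into a remaining-life estimate is to fit a straight line to the readings and extrapolate to an alarm limit. The sketch below assumes linear degradation, which real components may not follow. The function name, units, and limit are hypothetical and are not taken from CyberOptics' products.

```python
def estimate_remaining_hours(hours, vibration, limit):
    """Extrapolate a least-squares line through (hours, vibration) to `limit`.

    Returns the estimated hours left after the last sample, 0.0 if the
    fitted trend is already at or above the limit, or None if the trend
    is flat or falling (no crossing can be predicted).
    """
    n = len(hours)
    if n < 2 or n != len(vibration):
        raise ValueError("need at least two paired samples")

    mean_t = sum(hours) / n
    mean_v = sum(vibration) / n
    sxx = sum((t - mean_t) ** 2 for t in hours)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")

    # Ordinary least-squares slope and intercept.
    slope = sum((t - mean_t) * (v - mean_v)
                for t, v in zip(hours, vibration)) / sxx
    intercept = mean_v - slope * mean_t

    fitted_now = intercept + slope * hours[-1]
    if fitted_now >= limit:
        return 0.0
    if slope <= 0:
        return None
    return (limit - intercept) / slope - hours[-1]


# Hypothetical robot-arm readings: vibration amplitude vs. operating hours.
remaining = estimate_remaining_hours(
    hours=[0, 100, 200, 300],
    vibration=[1.0, 1.1, 1.2, 1.3],
    limit=2.0,
)
print(f"estimated hours until limit: {remaining:.0f}")
```

The baseline measurement matters here: the limit only has meaning relative to the vibration level recorded when the component was known to be healthy.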

IC lifecycle management solutions that use circuit monitors for maintenance are still in the early phase. Both the variety of internal measurements and their increased spatial density promise a rich data set, including:

• Physical parameters, such as temperature;
• Circuit performance, such as path delays;
• Internal test data, like memory BIST results, and
• Functional performance, such as bus transactions.

“Some of the use cases are around predictive maintenance,” said Randy Fish, director of silicon lifecycle management at Synopsys. “We’re providing environmental monitors — process, voltage, temperature (PVT) — and now structural monitors, such as path margin monitors, which measure the margin of setup time on functional paths. We see that one of the use cases for path margin data, in the context of thermal and voltage data, is around predictive maintenance. We do have customers that are developing solutions around that use case.”

For a variety of tools in manufacturing facilities, scheduled maintenance already has given way to predictive maintenance. The trend is to convert more equipment while also investigating more complicated algorithms. Today, most engineers use simple rules based on one or two parameters.

“Data is analyzed both comparatively to an average and to an expected value. It is also looked at in comparison to set and dynamic limits,” said Mike McIntyre, director of software product management at Onto Innovation. “Activities can be highly automated once characterized. The challenge here is to spend the needed time to characterize signals lest you react without complete knowledge.”
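The distinction between set and dynamic limits can be illustrated with a short Python sketch. It checks a new reading against fixed engineering limits and against a band derived from the signal's own recent history. The function name, the 3-sigma band, and the sample values are assumptions, not a description of Onto Innovation's software.

```python
from statistics import mean, stdev


def check_limits(history, value, set_low, set_high, k=3.0, min_samples=5):
    """Return the list of limits that `value` violates.

    "set_limit": outside the fixed range [set_low, set_high].
    "dynamic_limit": more than k standard deviations from the mean of
    `history`, evaluated only when enough history exists.
    """
    violations = []
    if not (set_low <= value <= set_high):
        violations.append("set_limit")

    if len(history) >= min_samples:
        mu = mean(history)
        sigma = stdev(history)  # sample standard deviation
        if sigma > 0 and abs(value - mu) > k * sigma:
            violations.append("dynamic_limit")
    return violations


# Hypothetical chamber-pressure readings in arbitrary units.
recent = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8]
print(check_limits(recent, 10.05, set_low=8.0, set_high=12.0))
print(check_limits(recent, 11.0, set_low=8.0, set_high=12.0))
print(check_limits(recent, 13.0, set_low=8.0, set_high=12.0))
```

The middle case is the interesting one: a reading that is still inside the fixed limits but well outside the signal's normal behavior, which is the kind of early warning a dynamic limit is meant to catch. How wide to make the band is exactly the characterization work McIntyre refers to.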

Simple algorithms work most of the time, but in some situations advanced algorithms work better.

“It has been my experience that simple rule-based predictive maintenance has been adopted by most 300mm fabs (80% to 90%),” noted Holt. “However, advanced predictive maintenance practices that utilize advanced multi-variate models and algorithms have only been applied on a limited number of tools, and then only at leading-edge foundries and IDMs.”

Once a parameter crosses a limit, the signal for a maintenance activity is sent.

“Predictive maintenance can be tied into MES systems, and is then very automated. The equipment can be taken down automatically,” said Holt. “The actual performing of the maintenance activity in some cases is fully automated (like chamber plasma cleans), but is more often reliant on equipment technicians to perform the maintenance (another source of variability).”

Shifting from scheduled to predictive maintenance in test facilities is in the early stages. ATE, wafer probers, and unit handlers have long mean times between failures, so the focus has been on product-specific test collateral, namely probe cards and load boards.

“In the real world, I see several scenarios that would benefit — for instance, sockets, device interface boards, probe needles, or probe cards. Today, there are solutions for preventive maintenance where we monitor electrical parametric results from test readouts. We can use this data at minimal cost,” noted Daniel Mu, value-added solutions manager at Teradyne. “We do not need complex calculations to estimate aging. We can look for trend changes that raise alarms, and even perform corrective actions, such as cleaning.”
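A low-cost way to look for trend changes in electrical parametric readouts is a one-sided cumulative sum (CUSUM) check, sketched below in Python. CUSUM is named here as one possible technique, not as Teradyne's method. The function name, the contact-resistance example, and all numeric settings are hypothetical.

```python
def cusum_alarm_index(readings, target, slack, threshold):
    """Return the index of the first upward-shift alarm, or None.

    Accumulates how far each reading exceeds `target + slack`; the sum
    is clamped at zero so that normal noise does not build up. An alarm
    fires when the accumulated excess is greater than `threshold`.
    """
    excess = 0.0
    for i, x in enumerate(readings):
        excess = max(0.0, excess + (x - target - slack))
        if excess > threshold:
            return i
    return None


# Hypothetical contact resistance (ohms) per touchdown: stable, then drifting up.
resistance = [1.0] * 5 + [1.5] * 5
alarm_at = cusum_alarm_index(resistance, target=1.0, slack=0.1, threshold=1.0)
print(alarm_at)  # index of the reading that raised the alarm
```

Because the check accumulates small excesses rather than reacting to a single outlier, it tends to flag sustained drift, such as gradual contamination of a socket, while ignoring isolated noisy readings.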

Overcoming reluctance to implement
As with any change in a process, activation energy is needed to overcome a reluctance to do something new. For wafer fab engineers, this means using more complicated algorithms. For test facilities, it means trying predictive maintenance at all. For owners of systems with complex SoCs, it is still one possibility among many.

Common hurdles to adopting a just-in-time maintenance approach include a lack of standards, the need to convince management to change, and the ever-present data silos between engineering functions.

So what will it take for more facilities to adopt predictive maintenance applications? Economics ultimately drives decision-making.

“Most advanced fabs have expressed interest in predictive maintenance,” said Anjaneya Thakar, director of product marketing at Synopsys. “Though the adoption currently is low due to unavailability of compelling solutions, it is on the increase. With the availability of new technology in analytics (ML, compute performance), predictive maintenance solutions will get better — faster, more accurate — and this will drive wider adoption.”

“In reality, there are very few activities in the predictive maintenance area right now but there are a lot of people talking about it. In the factory, we see that either it’s not a high priority, or it’s still viewed as a science experiment,” said Teradyne’s Mu. “We need to show a real benefit over scheduled maintenance.”

Advantest’s Ong concurs. “In my opinion, the biggest obstacle to adopting predictive maintenance is whether the ROI is significant enough to make such a change.”

A second barrier involves data access. “In today’s world of Industry 4.0, not all fabs are truly smart in terms of maintenance. Sometimes, there is fear in using third-party sensors in OEM tools, and the approval process can be really long,” said CyberOptics’ Vijay. “We make it easy for the fabs by providing self-contained solutions with our wireless WaferSense sensors.”

For IC telemetry, the technology is all new and the emerging SLM applications are still in development.

“There is still a bunch of learning to be done. In general, semiconductor suppliers or EDA companies haven’t had access to all this data. And suddenly, now we have the monitor IPs, the software to insert these monitors, and the mechanisms to gather the data from the field,” notes Fish. “So there’s an immense amount of learning that we can do now. Some of it is really basic. For instance, we view mission profile as low-hanging fruit.”

Across the whole semiconductor supply chain – from fab to field – engineering teams are driving toward a just-in-time maintenance strategy. Wafer fabs lead the way and will continue to seek more sophisticated algorithms as needed. With lower profit margins, test and assembly factories look to these predictive methods for maintenance to reduce costs while improving yield and quality. For owners of large and complex systems with multiple SoCs, the possibility of predictive maintenance is within their reach due to the extensive use of IC telemetry circuits.

Still, the shift from scheduled to dynamic to predictive maintenance requires a demonstrable gain to support the required engineering effort. With the continued cost, yield, and quality pressures, one can expect to see more companies moving to just-in-time maintenance.
