Test Connections Clean Up With Real-Time Maintenance

How to improve yield and prolong the life of probe cards and load boards.


Test facilities are beginning to implement real-time maintenance, rather than scheduled maintenance, to reduce manufacturing costs and boost product yield.

Adaptive cleaning of probe needles and test sockets can extend equipment lifetimes and reduce yield excursions. The same is true for load board repair, which is moving toward predictive maintenance. But this change is much more complicated than it might appear. It requires determining which parameters correlate directly with the need to clean, and having sufficient fault coverage in the board diagnostic programs.

Probe cards and load boards provide the electrical interface between the device under test (DUT) and the ATE. Rising IC complexity and an increasing number of board pins drive up both the purchase cost and the cost of ownership of these test boards. Timely maintenance extends board lifetime and improves equipment efficiency.

Until recently, maintenance activities were performed at fixed intervals, typically during ATE configuration changes or when a yield excursion indicated a potential board issue. Motivated by a reduction in total cost of ownership and increased yield, the industry is shifting to real-time or adaptive cleaning procedures and general board maintenance. In some cases, teams are adding a machine vision capability to help detect issues.

Rising cost per board
Cost is the key driver for all of this. Test boards are becoming more expensive, and an increase in multi-site testing leads to a greater number of probes per wafer contact, or contacts per unit. More transistors per SoC also increase the number of power pins and I/O pins required. The trends are similar for load board design in package test.

With preventive maintenance, engineers rely on maintenance schedules. Cleaning is typically performed after a set number of contacts, and board maintenance often occurs during a test configuration change or in response to a significant drop in yield.

The problem is that not all parts need to be maintained on the same schedule. Failing to clean some parts often enough can reduce yield, but not everything needs to be cleaned when it is scheduled.

“The key to changing any test process is showing a real economic value for the customer,” said Daniel Mu, manager of production value-added solutions for customers at Teradyne. “The customer will not do it for science. They do it to receive the benefit from the preventive or predictive maintenance. There is a high value in maintenance of consumables in the test cell, load board, needles, and sockets.”

Debris and probe wear
Each scrub of a probe needle on a die pad, and each socket connection with a DUT, creates friction between metal surfaces. Friction generates debris and wears the probe tips, pogo-pins, and sockets. In fact, the abrasive cleaning process contributes up to 95% of needle and socket tip wear. For probe tips, engineers must carefully plan cleaning events to maximize product yield while limiting tip wear. So on one hand, the less maintenance, the better.

But there’s a balance, because multi-site testing and more chip pins increase the contact force, friction, and debris levels. That, in turn, can cause mis-testing, lower yield, and occasionally, device damage. Regularly scheduled cleaning of probe card needles and load board sockets is designed to prevent these occurrences. High levels of debris are the most prevalent cause of failures in probe cards and load boards.

“Most common failures are caused by accumulated debris (solder, copper, foreign material) on the probe tips. Other failure modes are probe wear, physical crashes, electrical discharge, or arcing,” said Darren James, technical account manager at Onto Innovation. Physical crashes can result from mishandling or prober/operator error.

As debris accumulates, the contact resistance rises. CRES is a key measure of wafer test effectiveness, and any increase subsequently can reduce yield or cause a faulty measurement of device performance parameters. The sudden appearance of foreign material on a socket can damage packaged units, as well, which is a particularly costly form of yield loss because it involves multiple chips. Cleaning removes the built-up debris on probe tips, contactors, and sockets.

To manage overall health of test interface boards, select control parameters are measured before, during, and after test. Typically, statistical process control (SPC) limits are set, and when those limits are exceeded, an alarm is sent from the test cell to the engineer to begin corrective action.
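The SPC check described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the function names, the use of mean plus or minus three standard deviations as control limits, and the contact-resistance values are all assumptions made for the example.

```python
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Return (lower, upper) limits as mean +/- k sample standard deviations."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return mu - k * sigma, mu + k * sigma

def out_of_control(readings, limits):
    """Return the indices of readings that fall outside the control limits."""
    lower, upper = limits
    return [i for i, r in enumerate(readings) if r < lower or r > upper]

# Hypothetical contact-resistance readings in ohms (illustrative values only).
baseline_cres = [0.50, 0.52, 0.49, 0.51, 0.50, 0.48, 0.51, 0.50]
limits = control_limits(baseline_cres)

new_readings = [0.51, 0.50, 0.53, 0.75, 0.49]
alarms = out_of_control(new_readings, limits)
print("readings that would raise an alarm:", alarms)
```

In a production test cell the alarm would go to an engineer rather than to the console, and the limits would come from the characterization work described in the next paragraphs.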

“These action plans are based on historical learning for product performance degradation characterization, such as material abrasive wear-out extrapolation,” said George Harris, vice president of global test services at Amkor. “These control limits are learned, studied, and characterized by the vendors and the users. They are both proactive, based on vendor product characterization, and reactionary, based on DUT metrics.”

Characterization to produce control limits and scheduled cleaning intervals provides an established process for all boards. But characterized limits are often based upon average or worst-case performance.

“When we established an AI team, we reviewed our customer’s processes looking for areas to apply these kinds of methodologies,” said Don Ong, head of innovation at Advantest. “At wafer probe, most of our customers follow a fixed cycle for cleaning, for example, every 5,000 touch downs. We considered that this is not very effective because the probe tips may not be that dirty, so we started looking into how we could make this adaptive.”

Additionally, characterized limit monitoring does not imply an immediate response to an issue. IC suppliers of large SoCs bound for data centers would prefer rapid detection of potential device damage before it is tested. Damaged pins typically cannot be recovered, and the bumps on surface-mount devices may be irreparably damaged.

A dynamic cleaning approach requires engineers to determine the inputs and outputs of an algorithm. Adding a new test cell hardware module to assist with detection represents an even greater investment. As with any change in manufacturing, the roles of effectiveness, efficiency, and economy must be weighed.

Dynamic probe tip cleaning
Changing to a dynamic wafer probe cleaning routine is challenging because engineers need to identify parameters that reliably indicate that cleaning is needed. Several parameters may correlate with dirty probe tips, including contact resistance, DC voltage and current measurements, and yield.

But which parameters provide the best balance of low false negatives and false positives is not always obvious, given the complex interactions between the probing process and product yield variation. Engineers need to validate the process before shifting to a dynamic cleaning regimen in a production environment. The process also needs to be robust with respect to variations in a product’s characteristics.
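One way to compare candidate parameters is to score each proposed trigger against historical data in which the need for cleaning is already known. The sketch below assumes such labeled history exists and uses a simple "clean when the signal exceeds a threshold" rule. The function name, the data, and the threshold are hypothetical.

```python
def trigger_error_rates(signal, needed_cleaning, threshold):
    """Score a 'clean when signal > threshold' rule against labeled history.

    signal: measured values of the candidate parameter.
    needed_cleaning: booleans, True where cleaning really was needed.
    Returns (false_positive_rate, false_negative_rate).
    """
    fp = fn = pos = neg = 0
    for value, dirty in zip(signal, needed_cleaning):
        fired = value > threshold
        if dirty:
            pos += 1
            fn += 0 if fired else 1   # missed a needed cleaning
        else:
            neg += 1
            fp += 1 if fired else 0   # cleaned when it was not needed
    fpr = fp / neg if neg else 0.0
    fnr = fn / pos if pos else 0.0
    return fpr, fnr

# Hypothetical history for one candidate parameter (illustrative values only).
signal = [0.50, 0.55, 0.80, 0.52, 0.90, 0.58, 0.49, 0.85]
needed_cleaning = [False, False, True, False, True, True, False, False]

fpr, fnr = trigger_error_rates(signal, needed_cleaning, threshold=0.7)
print(f"false positive rate: {fpr:.2f}, false negative rate: {fnr:.2f}")
```

Running the same scoring over several parameters and thresholds shows which trigger misses the fewest needed cleanings without causing many unnecessary ones.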

Contact resistance appears to be an obvious choice, as this parameter indicates a good electrical contact prior to performing a test. Both DC and AC measurements change with a higher contact resistance. Yield also can decrease as debris accumulates at contact sites, in particular when comparing the yield on different test sites on a multi-site board.

“To anticipate optimal needle maintenance, we derive signals that are a combination of different measurements available in the ATE,” said Teradyne’s Mu. “We use parametric test data like contact resistance, voltage, and current to assess the contact quality, which is fairly straightforward. There are cases where the needles’ impact is ambiguous, and thus to reach a conclusion, more data and/or complex algorithms are required. The algorithm used depends on how complex the needle impact correlates with the data, and how fast we need to analyze the data and trigger the corrective actions. The best algorithm is chosen after experimentation and balancing these two factors — accuracy and speed.”

Measured parameters and changes in values or binning trends can indicate an issue. But the ideal parameter needs to be evaluated and characterized on a product-by-product basis. With more engineering teams turning to machine learning for analysis, engineers are investigating ML approaches for cleaning frequency.

“Our goal in applying AI/ML was to implement a simple solution to do adaptive cleaning with machine learning,” said Advantest’s Ong. “We first looked into using contact resistance measured on each pin, because a pin with accumulated debris results in higher contact resistance. When we tried to implement it with one of our customers, we couldn’t get good results. In fact, the results we observed were actually pretty bad.”

What seemed obvious didn’t track well. Many factors contribute to CRES measurements, including prober overdrive.

“Next, we investigated binning trend, just simple pass/fail,” Ong said. “We adopted the random forest algorithm to look at the past few binning trends, and this approach correlated very well with when the probe card gets dirty. From there onward, we refined the solution.”
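The article does not describe which features Advantest fed to its random forest. As a rough sketch of what "looking at the past few binning trends" could mean, the code below turns a pass/fail sequence into sliding-window feature rows. The window length and the two features (fail fraction and trailing fail run) are assumptions. Rows like these could then be labeled and passed to a classifier such as scikit-learn's `RandomForestClassifier`, which is left out here so the example runs on the standard library alone.

```python
def binning_windows(pass_fail, window=5):
    """Turn a pass/fail sequence (1 = pass, 0 = fail) into feature rows.

    Each row summarizes the most recent `window` results as
    (fail fraction, length of the trailing run of consecutive fails).
    """
    rows = []
    for end in range(window, len(pass_fail) + 1):
        recent = pass_fail[end - window:end]
        fail_fraction = recent.count(0) / window
        run = 0
        for result in reversed(recent):
            if result != 0:
                break
            run += 1
        rows.append((fail_fraction, run))
    return rows

# Hypothetical binning history for one site (illustrative values only).
history = [1, 1, 1, 0, 1, 1, 0, 0, 0]
features = binning_windows(history, window=5)
for row in features:
    print(row)
```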

Fig. 1: Comparison between Fixed and Adaptive probe card cleaning. Source: Advantest 

In both wafer and package testing, engineers use yield trend analysis as a trigger for multiple issues. Cleaning can be one of those. An ML approach appears to provide a trigger that is highly correlated with the need for cleaning.

Advantest’s solution relies on comparing the pass/fail results between sites, which can only be done with multi-site probing. The methodology is applied on a wafer lot basis, and the first wafer provides the baseline of the pass/fail ratio for each site. So unlike determining a fixed cleaning schedule, which requires characterizing a large volume of data over multiple lots, it bases its determination just on the first wafer in the lot. As a result, wafer lot processing variation is accounted for during the initial analysis of the first wafer. This very localized ML solution relies upon a small data set to determine the limits.
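The idea of a first-wafer baseline per site can be illustrated with a simple comparison. This is not Advantest's algorithm, which uses machine learning. It is a simplified stand-in that flags a site when its pass ratio drops too far below its own first-wafer baseline. The function names and the 10-point drop threshold are assumptions.

```python
def site_pass_ratios(results):
    """results: iterable of (site, passed) pairs -> {site: pass ratio}."""
    totals, passes = {}, {}
    for site, passed in results:
        totals[site] = totals.get(site, 0) + 1
        passes[site] = passes.get(site, 0) + (1 if passed else 0)
    return {site: passes[site] / totals[site] for site in totals}

def sites_needing_cleaning(baseline, current, max_drop=0.10):
    """Flag sites whose pass ratio fell more than max_drop below baseline."""
    return sorted(
        site for site, ratio in current.items()
        if site in baseline and baseline[site] - ratio > max_drop
    )

# Hypothetical two-site results (illustrative values only).
first_wafer = ([(0, True)] * 9 + [(0, False)] +
               [(1, True)] * 9 + [(1, False)])
later_wafer = ([(0, True)] * 9 + [(0, False)] +
               [(1, True)] * 6 + [(1, False)] * 4)

baseline = site_pass_ratios(first_wafer)   # taken once per lot
current = site_pass_ratios(later_wafer)
flagged = sites_needing_cleaning(baseline, current)
print("sites flagged for cleaning:", flagged)
```

Because the baseline is recomputed from the first wafer of each lot, a lot with lower yield overall does not by itself trigger cleaning. Only a site that falls behind its own baseline does.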

Customers that switched to this adaptive cleaning approach were able to substantially reduce the cleaning frequency, which prolonged probe card life by 2X (see Figure 2). Advantest showed results for four products and a variety of probe needle types at the 2022 SW Test Symposium.

Fig. 2: Adaptive Probe Cleaning (APC) results for four actual production products. Reduction ratio is the decrease in the number of on-line cleanings relative to the number of fixed-cycle cleanings performed by a prober. Source: Advantest

In addition to improved needle lifetime, it reduces the disruption of cleaning. Any cleaning activity can impact the test process, so less cleaning time translates to more time spent on testing. That, in turn, improves overall equipment efficiency.
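The arithmetic behind the reduction ratio and the recovered test time is simple. The sketch below reads the Fig. 2 caption as "cleanings avoided divided by fixed-cycle cleanings," which is an interpretation rather than a definition confirmed by the source. The cleaning counts and the time per cleaning are hypothetical.

```python
def reduction_ratio(fixed_cleanings, adaptive_cleanings):
    """Fraction of fixed-cycle cleanings avoided by adaptive cleaning."""
    if fixed_cleanings <= 0:
        raise ValueError("fixed_cleanings must be positive")
    return (fixed_cleanings - adaptive_cleanings) / fixed_cleanings

def time_recovered(fixed_cleanings, adaptive_cleanings, seconds_per_cleaning):
    """Seconds returned to testing because fewer cleanings were run."""
    return (fixed_cleanings - adaptive_cleanings) * seconds_per_cleaning

# Hypothetical lot: 40 fixed-cycle cleanings vs. 10 adaptive cleanings,
# with an assumed 60 seconds per cleaning.
ratio = reduction_ratio(40, 10)
saved = time_recovered(40, 10, seconds_per_cleaning=60)
print(f"reduction ratio: {ratio:.0%}, time recovered: {saved} s")
```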

Socket cleaning
Just like probe tips, debris builds up on sockets and contactors in load boards, which adversely affects test results. Foreign material — also a product of metals rubbing — can cause shorts between pins. If the pins are located between the power and ground terminals, damage to the product and socket can occur. Foreign material also damages package pins or surface mount bumps. Such occurrences, if not caught immediately, can damage multiple units, increasing revenue loss.

A natural extension of adaptive probe cleaning is adaptive socket cleaning. However, because load boards are not as expensive as probe cards, the ROI has not prompted most IC manufacturers to do so. But for some IC suppliers, a socket contaminated with debris may have a significant enough impact that it motivates the company to invest in innovative detection techniques.

At the 2022 Advanced Semiconductor Manufacturing Conference, Intel engineers shared their work on implementing a real-time socket inspection system. “Defects or loose debris accumulated inside the socket can damage all subsequent units placed in socket until socket is cleaned/replaced,” the authors wrote. “To resolve this critical issue, we equipped each pick-and-place arm with a new machine vision system designed to fit inside the existing tool. The limited footprint constraints required a highly compact imaging system, which resulted in a variety of image artifacts, creating several unique challenges for the inspection system.”

These image artifacts required advanced algorithms to process the data. “We developed an inspection algorithm that utilizes a variety of advanced computer vision and machine learning techniques to normalize and match the images, remove artifacts, and detect defects. The flagged socket images can be manually dispositioned by the user and the socket can be sent for repair or cleaning as needed.”

The system searched for several socket issues including loose caps, loose caps with stain, loose pins, and foreign material. To be successful in a real-time environment they needed to complete the decision in less than 30 seconds, and successfully distinguish between real defects and image artifacts. The development of inspection image processing used a number of methods to ensure good image quality prior to the defect detection step. Qualification data for three products showed false positives to occur at a frequency of <0.1%. With the new inspection system, the Intel engineers reported they could quickly identify issues, prevent excursions, and ultimately improve production yield.

Test board maintenance
Cleaning is one form of ongoing maintenance. Test boards for wafer probe and package test also require maintenance of their components. Electro-mechanical relays wear out, capacitors’ values degrade, and the vias between a board’s layers can become open. When these issues occur, the board can be either repaired or fully replaced. Typically, engineers and technicians track yield for specific boards, and as yield decreases, they flag the board for repair.

Several industry experts noted that board complexity increases on a four-year cadence with pins and performance increasing by 2X. This results in more initial board failures, and more frequent repairs.

“There’s a natural defect rate to produce good working boards, such that as you continue to scale up the number of traces and the number of components, you’re going to see an increasing rate of failure as measured by first pass rate,” said Steve Ledford, general manager of device interface solutions at Teradyne. “Naturally, this results in a higher decay rate of failures to these boards over time.”

Waiting for yield to decrease before detecting a failing board risks revenue loss. Teradyne developed improved diagnostic techniques and programs to ensure only good boards were shipped, and boards on the brink of failure could be detected (see Figure 3). The methods can be used throughout the life of the board. The ability to predict when a board requires maintenance or retirement benefits product yield, quality, and total cost of ownership.


Fig. 3: A board maintenance and diagnostic program can achieve 95% fault coverage. Source: Teradyne

“We researched how we could use the tester to be a diagnostics tool,” Ledford said. “This is not a new concept. For decades, engineers have written board diagnostic programs. But we improved upon it by increasing our fault coverage rate up to 95%, compared to most engineers achieving between 70% to 80%. We have a very robust fault model library that is combined with a test library. A second major element is that we’ve automated the test program creation.”
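Fault coverage, as used here, is the share of modeled faults that at least one diagnostic test can detect. The sketch below computes it from a fault model and a test library represented as plain sets. The fault names, test names, and data layout are hypothetical and are not taken from Teradyne's tooling.

```python
def fault_coverage(fault_model, tests):
    """Return (coverage fraction, sorted list of undetected faults).

    fault_model: set of fault identifiers the board could exhibit.
    tests: mapping of test name -> set of faults that test can detect.
    """
    detected = set()
    for faults in tests.values():
        detected |= faults & fault_model   # ignore faults outside the model
    undetected = sorted(fault_model - detected)
    coverage = len(detected) / len(fault_model) if fault_model else 0.0
    return coverage, undetected

# Hypothetical fault model and diagnostic test library.
fault_model = {
    "relay_K1_stuck_open",
    "relay_K1_stuck_closed",
    "cap_C7_low",
    "via_V12_open",
    "trace_T3_short",
}
tests = {
    "relay_toggle": {"relay_K1_stuck_open", "relay_K1_stuck_closed"},
    "continuity": {"via_V12_open", "trace_T3_short"},
}

coverage, undetected = fault_coverage(fault_model, tests)
print(f"fault coverage: {coverage:.0%}, undetected: {undetected}")
```

Listing the undetected faults shows where a new diagnostic test would raise coverage, which is the gap between the 70% to 80% range and the 95% figure quoted above.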

The same board screening program can be used during test production to flag degradation in board characteristics. In effect, it becomes a continual monitor of board health that minimizes impact to product.

Trends in multi-site testing and increasing product pin counts greatly impact the total cost of ownership for probe cards and load boards. Responding faster and more precisely to maintenance needs lowers the ownership cost and improves yield.

In wafer and package test facilities, real-time maintenance approaches can effectively detect the need for cleaning and predict the need for general board maintenance. Chipmakers use different approaches to correlate a test parameter with the need for cleaning or board maintenance. In one instance, engineers added cameras to image debris problems. While all of this requires engineering effort, the benefits of adaptive cleaning processes and real-time action outweigh the cost of implementation.


  1. Edwards, A. Kumar, A. Vaske, N. McDaniel, D. Pradhan and D. Panda, “Real-Time Automated Socket Inspection using Advanced Computer Vision and Machine Learning : DI: Defect Inspection and Reduction,” 2022 33rd Annual SEMI Advanced Semiconductor Manufacturing Conference (ASMC), 2022, https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9792494&isnumber=9792473
