Coping With Parallel Test Site-to-Site Variation

Why this is a growing problem, and how it’s being addressed.


Testing multiple devices in parallel using the same automatic test equipment (ATE) results in reduced test time and lower costs, but it requires engineering finesse to make it so.

Minimizing test measurement variation for each device under test (DUT) is a multi-physics problem, and it’s one that is becoming more essential to resolve at each new process node and in multi-chip packages. It requires synchronization of electrical, mechanical and thermal aspects of the whole test cell so that chipmakers can ensure variation is confined to the DUTs. This assumption is vital when applying statistically determined test limits that appropriately adapt to local process variation.

The test world is not perfect, which necessitates accounting for differences in the test measurement environment. Getting this right can have a big impact on an IC product's outgoing quality and reliability.

Fortunately, the test data from each test site can be used to determine these differences. Armed with this knowledge, engineers can adjust their statistically based pass/fail criteria algorithms accordingly. As a result, both product yield and quality improve. This is essential because parallel device testing continues to increase at wafer and unit level testing. And the end products are being used across a variety of markets, including data centers and safety-critical applications such as automotive, which continue to demand escape rates on the order of 10 ppm and lower.

Unit-level test includes final test, burn-in modules, and system-level test (SLT). But it’s wafer and final test that pose the more daunting technical challenges due to the smaller test interface boards for probe cards and loadboards, respectively.

“Our customers’ wafer probe cards are growing in their usage of multi-site test,” said Keith Schaub, vice president of technology and strategy at Advantest America. “Combine this with an increase in DUT pins for some products (large digital devices), and the common concerns of probe card planarity and the probe tip damage, due to burning when too much current is applied, become an even greater concern.”


Fig. 1: Progression from 1X to 16X site testing. Source: Anne Meixner/Semiconductor Engineering

“You could probably drive a truck through this range,” said Mark Kahwati, product marketing director for semiconductor testing group at Teradyne. “There are some applications where it remains single-site. Then consider controllers used in automotive safety, airbag controllers, and ABS controllers. You see anywhere from 4 sites to maybe 8 to 12 sites. Then, with relatively low pin count devices in automotive, the number of sites approaches 64 sites in parallel, if not more.”

While the same economics drive the increase in the number of devices tested in parallel, those numbers can vary greatly by industry sector and device type (see figure 2 below).

Industry | Range of parallelism | Comments
RF consumer | 8 to 16+ |
RF mmWave | 2 to 4 | At wafer sort; probe head limited
Digital: microcontroller | 16 to 4,000 |
Digital: advanced mobile devices | 6 to 16 |
Digital: advanced large devices | 1 to 4 |
Automotive | 2 to 32 | At wafer sort
Automotive: large devices | 8 to 12 | Package test
Automotive: smaller pin count devices | 4 to 64 | Package test

Fig. 2: Number of sites per test insertion, by industry and device type. Source: Teradyne

In parallel test, every effort is made to minimize ATE and associated test hardware differences between test sites. With their latest ATE models, vendors provide new capabilities to support multi-site testing, with increased attention to reducing the test hardware's contribution. Analog test measurements require more care in the design of the path from the ATE hardware to the DUT, but ATE instrumentation can be calibrated to account for differences along these paths.

Nevertheless, differences persist. And when applying statistically based outlier detection techniques, these differences matter. Engineering teams at Texas Instruments, AMS AG and Skyworks Solutions have documented the impact of site-to-site differences. In their 2015 DATA workshop paper, engineers from Skyworks Solutions and Galaxy Semiconductor articulated why it matters:

“It would therefore be logical to assume that adjacent columns or rows of devices should show nearly identical data distributions. However,” they wrote, “a tester with multiple test site hardware components will show systematic variation from one test site to the other…In spite of the best efforts to ensure that test hardware is consistent from one piece to another, measurable biases often emerge. These biases can and do contribute to variation in the statistics behind NNR values. These biases, because they are consistent and predictable, can be managed with a linear offset applied to the measurements.”

Test limits based on statistical techniques have become a common tool in a product engineer's toolbox. Such techniques inherently assume that all die/units see the same measurement environment. As a result, when testing devices in parallel, engineering teams first focus on making that assumption hold.

Reducing test cell site-to-site variation
Any measurement system has sources of error. For semiconductor device testing, both the signal paths and the power paths between ATE and DUT need to be considered. At each hardware device and connection there exists a tolerance for each measurement parameter. For instance, edge placement accuracy represents a timing tolerance for pin electronics cards. These tolerances add up along the path between the DUT pin/pad and the ATE instrument.
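As a first-order sketch of how such tolerances add up, independent contributions along a path are often combined into an error budget by root-sum-square. The contribution names and millivolt values below are illustrative assumptions, not vendor specifications:

```python
import math

# Hypothetical tolerance contributions along one signal path, in millivolts.
# Names and values are illustrative assumptions, not vendor specifications.
contributions_mv = {
    "pin_electronics_accuracy": 2.0,
    "calibration_residual": 0.5,
    "interface_board_trace": 1.0,
    "contactor_or_probe": 1.5,
}

# The worst-case budget simply sums the tolerances; the statistical budget
# combines independent contributions by root-sum-square (RSS).
worst_case = sum(contributions_mv.values())
rss = math.sqrt(sum(v * v for v in contributions_mv.values()))

print(f"worst case: {worst_case:.2f} mV, RSS: {rss:.2f} mV")
```

The gap between the straight-sum and RSS numbers is why error budgets assume independent contributions; correlated errors (e.g., shared power rails) push the real budget toward the worst case.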


Fig. 3: Contributions to measurement errors in the test path. Source: Anne Meixner/Semiconductor Engineering

For first-order understanding, the physical area of a device board/probe head combined with a device’s pin count factors into the amount of parallelism that is physically possible. Next, the mechanical, thermal, and electrical attributes of the test cell need to be understood, as all of them can contribute to errors.

Reducing these contributions to measurement error serves the overall goal of a high-accuracy test set-up. Multiple sites bring a few unique challenges in achieving equivalence between sites. Engineering teams need to:

  • Balance the thermo-electrical challenges across the multiple sites;
  • Manage the test cell resources to deliver identical voltage and currents;
  • Design ATE instrumentation to stringent specifications to reduce tester channel differences from site to site;
  • Include calibration techniques to reduce signal path variation, and
  • Design test interface boards — also known as probe cards and loadboards — to assure equal transmission line lengths and environment, such as coupling to nearby signals.

“At wafer test there are a number of items impactful to site variation — mechanically, how the probes contact the pads, contamination on the pad or the probe, and temperature variation across the wafer/chuck,” said Darren James, technical account manager and product specialist at Onto Innovation. “Electrically, design and layout of the interface and the probe card to provide good impedance matching of the sites/pins is especially important if resources are shared between sites. Interface design also will impact the amount of cross-talk and leakage.”

From a package test perspective, George Harris, vice president of global test services at Amkor Technology, noted several commonly observed causes of site-to-site test variation:

  • Routing on the board impacting resistance, capacitance, inductance, coupling and crosstalk variations between sites;
  • Thermal differences across the test boards, both on top side and backside;
  • Differences in tester resources between channels.

“It’s always best to design and characterize the production test environment relative to the product specification requirements,” Harris said. “Even fairly simple products pushing the test environment with many sites tested or stressed in parallel may have power distribution differences, as will a complex SoC.”

Identifying and dealing with site-to-site variation
Testing cuts across multiple processes as it shifts both left and right. As a result, variation needs to be dealt with in the context of other processes. For example, during test, engineering teams need to identify excursion-based site-to-site variation to which they can respond. In contrast, product engineering teams may need to account for site-to-site variation when applying their pass/fail criteria.

“Engineering teams need to have a test process control system in place, with analytics to assist with the root cause of variance when issues like site-to-site variation are detected,” said Greg Prewitt, director of Exensio solutions at PDF Solutions. “The control system needs to be able to alarm/alert quickly so the team can take action to resolve the situation before material needs to be scrapped. Some of the best practices include automated responses, such as clean probe needles, or activation of an Out of Control Action Plan (OCAP) processes, which in turn needs to be integrated with manufacturing execution systems (MES) for automated holds on suspect lots.”
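As an illustrative sketch of such an alert (the threshold, data layout, and function name are assumptions, not PDF Solutions' implementation), a site whose median drifts away from the pooled median can be flagged for OCAP follow-up:

```python
import statistics

def flag_site_excursions(site_data, k=3.0):
    """Flag sites whose median deviates from the pooled median by more
    than k robust sigma. The k=3 threshold is an illustrative assumption;
    a production system would tie the flag to an OCAP/MES response."""
    pooled = [x for values in site_data.values() for x in values]
    pooled_median = statistics.median(pooled)
    # Robust spread estimate: scaled median absolute deviation (MAD).
    mad = statistics.median(abs(x - pooled_median) for x in pooled)
    robust_sigma = 1.4826 * mad
    return [site for site, values in site_data.items()
            if abs(statistics.median(values) - pooled_median) > k * robust_sigma]
```

Using medians and MAD rather than means keeps the alarm itself from being skewed by the very outliers it is looking for.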

When parallelism equals a whole-wafer touchdown, engineers need to consider more advanced statistics. Consider, for example, smart card devices pushing 4K sites at wafer test.

“Big probe heads for this many sites raise challenges of temperature variation across the chuck, which could impact device temp sensor measurements if not managed,” said Ed Seng, product manager of digital segments at Teradyne. “Site-to-site correlation at these high counts has to be done more statistically, relying on a higher volume of data as compared to single stepping a single die across a wafer.”

With 4K site comparisons, the correlation analysis becomes far more complicated than with 4 or 8 sites.

So exactly how is site-to-site variation analyzed? Engineers can use gauge R&R techniques to assess repeatability and reproducibility across multiple sites. For 2X to 16X parallelism, analyzing site-to-site variation can easily be handled by most statistical software packages (e.g., JMP, R).
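For a small site count, a one-way ANOVA is one common way to quantify whether between-site variation stands out against within-site variation. This is a minimal stdlib-only sketch (illustrative, not a substitute for a full gauge R&R study):

```python
import statistics

def site_anova_f(site_data):
    """One-way ANOVA across test sites: partitions total variation into
    between-site and within-site sums of squares and returns the F
    statistic. A large F suggests systematic site-to-site differences."""
    groups = list(site_data.values())
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-site variation, k - 1 degrees of freedom.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    # Within-site variation, n - k degrees of freedom.
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g)
                    for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

The F statistic would then be compared against an F distribution with (k-1, n-k) degrees of freedom to judge significance; at 4K sites, this per-test, per-site bookkeeping is what pushes teams toward automated analytics platforms.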

Factories can respond to tester hardware differences that require preventive maintenance. Yet the subtle differences in that signal path, from instrument to DUT pad/pin, add up. The latest ATE models have been designed to minimize such differences. Also, test interface boards — such as probe cards and loadboards — must be designed with expert knowledge in PCB technology in order to minimize differences.

But in both wafer and unit-level test factories, the reality is that many older ATE systems remain in use. As a result, the latest products may be tested on older equipment, which in turn can result in site-to-site differences in test results. If the differences are minor and the test process is well controlled (i.e., a Cpk greater than or equal to 1.33), the impact on device yield and quality will be negligible.
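For reference, Cpk measures how far the process mean sits from the nearest spec limit in units of 3 sigma. A minimal sketch (the sample usage is illustrative):

```python
import statistics

def cpk(measurements, lsl, usl):
    """Process capability index: the distance from the process mean to the
    nearest spec limit (LSL or USL), in units of 3 sigma. Cpk >= 1.33 is
    the commonly cited threshold for a well-controlled test process."""
    mu = statistics.mean(measurements)
    sigma = statistics.stdev(measurements)
    return min(usl - mu, mu - lsl) / (3 * sigma)
```

Because Cpk uses the nearest limit, a site with a systematic offset lowers Cpk even when its spread matches the other sites, which is one way hardware bias shows up in process-control dashboards.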

The definition of negligible changes, though, when sensitive analog measurements are coupled with statistically based outlier detection test criteria.

Outlier detection tests range from simple part-average testing (PAT) to sophisticated Near Neighborhood Residual (NNR) testing. When required, these data analytics-based test techniques can accommodate the observed site-to-site variation. In fact, doing so becomes a necessity, as illustrated by two examples of how engineers accommodate it. The first example looks at RF test and PAT. The second looks at IDDQ wafer test and NNR.

“For an RF device, we ran into a similar problem where devices were tested in quad sites. We had one site that gave statistically different tests results from the others. With RF, it’s very difficult to match four sites very well. The RF performance characteristics on four sockets, four contactors, and four sets of components are going to be different,” said Jeff Roehr, IEEE senior member and 40-year veteran of test. “If we didn’t account for that, we would have a very wide distribution in the test data, which made it hard to see the outliers. We learned over time that we had to analyze test data on a per-site basis. In effect, we had four sets of software running simultaneously doing PAT.”

With device populations on the order of hundreds to thousands, engineers establish PAT and dynamic PAT limits. On smaller statistical populations of about 25 to 40, like those used for Z-PAT and NNR, the impact of site-to-site test hardware differences becomes more noticeable. Especially with sensitive analog measurements, neglecting that impact can result in failing good die, as well as passing bad die.
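One common formulation of dynamic PAT limits is a robust center plus or minus six robust sigma, computed per site so that one biased site does not widen the limits for all, as in the quad-site RF example above. This sketch assumes a median/MAD recipe; exact recipes vary by company:

```python
import statistics

def dynamic_pat_limits(values, k=6.0):
    """Robust dynamic PAT limits: median +/- k * (scaled MAD).
    The k = 6 default mirrors the widely used +/-6 sigma choice;
    the exact recipe is an assumption and varies by company."""
    med = statistics.median(values)
    mad = statistics.median(abs(x - med) for x in values)
    robust_sigma = 1.4826 * mad
    return med - k * robust_sigma, med + k * robust_sigma

# Computing limits per site keeps each site's distribution tight so
# outliers stay visible, e.g.:
# site_limits = {site: dynamic_pat_limits(vals) for site, vals in data.items()}
```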

Over the past decade, several papers describing outlier detection techniques have stated that test hardware site-to-site variation impacts the ability to precisely discriminate between good and bad die. A 2016 Texas Instruments paper noted that site-to-site variations need to be accounted for when applying NNR techniques. And a 2018 AMS AG paper on adaptive test for mixed-signal ICs included site-to-site variation in its dynamic PAT limits.

In a 2015 DATA workshop paper, engineers from Skyworks Solutions and Galaxy Semiconductor presented a method to offset site bias when applying NNR. For each test measurement, they shared a technique for calculating each site's bias. To illustrate the technique, assume 4X testing and a test called ACB22. The calculation follows:

  1. Calculate the median of test ACB22 for each of sites 1, 2, 3, and 4: ACB22Med(site 1) through ACB22Med(site 4).
  2. Calculate the mean of these four medians: Mean of ACB22Med.
  3. The bias for site 1 equals Mean of ACB22Med minus ACB22Med(site 1).

Applying the resulting site bias to the NNR limits more precisely discriminates between good and bad die.
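The three-step bias calculation above can be sketched directly (the site data in the example is illustrative):

```python
import statistics

def site_bias(site_measurements):
    """Per-site bias for one test, following the paper's recipe:
    bias(site) = mean of all site medians - median(site).
    Adding bias(site) to that site's measurements centers the sites
    on a common value before NNR limits are applied."""
    medians = {s: statistics.median(v) for s, v in site_measurements.items()}
    mean_of_medians = statistics.mean(medians.values())
    return {s: mean_of_medians - m for s, m in medians.items()}
```

For example, a site whose median reads 0.2 units low receives a +0.2 offset, so its residuals are no longer dominated by hardware bias.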

With the continuing cost pressures to test semiconductor devices in parallel comes the engineering effort to create the measurement environment so each site is equivalent.

“The economic motivation for higher multi-site test still holds,” said Teradyne’s Seng. “The same type of multi-site challenges exist as they have in recent generations, but continue to grow into the next degrees of technical complexity. Most of the challenges are in the device interface area, from the tester instrument device interface board (DIB) connection through to the device connections. The best test systems will take care of all the other multi-site factors and make it fast and easy to implement high multi-site test solutions.”

Still, not all engineers get to test their products on the best test systems. With products that are tested in parallel, they need to manage current products with the test equipment already in their factories. This requires them to reduce site-to-site variation in test processes as much as possible through design, and to respond to the excursions that are part of the realities of a factory floor. In addition, inherent site-to-site variation needs to be considered when product engineers use statistically based pass/fail test limits. Fortunately, the test data can be used to discern the test hardware contributions from the DUT contributions.

Parallel test execution reduces overall test cost. Yet the simplicity of a diagram showing four units tested at once belies the engineering effort behind it.

Related Stories
Geo-Spatial Outlier Detection
Using position to find defects on wafers.

Part Average Tests For Auto ICs Not Good Enough
Advanced node chips and packages require additional inspection, analysis and time, all of which adds cost.

Chasing Test Escapes In IC Manufacturing
Data analytics can greatly improve reliability, but cost tradeoffs are complicated.

Adaptive Test Gains Ground
Demand for improved quality at a reasonable cost is driving big changes in test processes.

