IC Test And Quality Requirements Drive New Collaboration

Tight integration of test equipment, monitors, and analytics is beyond the scope of any one company, accelerating data sharing and the breakdown of silos.


Rapidly increasing chip and package complexity, coupled with an incessant demand for more reliability, has triggered a frenzy of alliances and working relationships that are starting to redefine how chips are tested and monitored.

At the core of this shift is a growing recognition that no company can do everything, and that working together will require much tighter integration of flows, methodologies, and most important, data from multiple processes and sources. The IC manufacturing industry always has aimed for 100% yield, zero scrap, and net-zero carbon emissions, and it’s well understood that these goals are becoming tougher to reach with each new technology node or multi-chip package.

What’s changed is that day-to-day improvements, such as reduced escape rates, lower cost, and higher throughput, also are becoming more difficult to achieve. And to resolve those issues, analyses must pull in data from multiple sources including fab measurements, multiple test insertions, as well as in-field monitoring. The result is a growing number of strategic collaborations among companies with different expertise, enabling engineers to cross-correlate data from multiple sources more effectively.

While these partners are not sharing raw data, they are sharing metadata, as well as algorithm and modeling results. That provides the motivation to develop methods for broader data sharing, which is necessary to trace the sources of yield and reliability issues throughout a device’s expected lifetime.

Paradigm shifts in testing challenges
Underpinning these alliances are multiple changes in the way chips are fabricated. Those, in turn, are shaping approaches to testing and analysis of test results, while also encouraging partnerships to address the complexity and cost constraints.

“Throughout the history of the semiconductor industry, it’s had an amazing ability to continually innovate to deliver performance improvement at lower cost,” said Regan Mills, vice president of SoC marketing at Teradyne. “We’re getting into an era where things are changing on multiple vectors at the same time. You get new construction of the semiconductor using chiplets and 3D packaging. Just the process itself is moving from planar devices to finFETs to gate-all-around, with really complex 3D devices that are packed together. And the tolerance for defects is going down. That complexity creates a lot of cost pressure.”

Teradyne recently struck up a collaboration with National Instruments, combining their respective systems for customers (see figure 1).

Fig. 1: New testing and analytics flow, including two types of edge processing. Source: NI/Emerson

“There are several reasons we decided to start striking a partnership with ATE vendors, including Teradyne, but it is especially a result of the demands on quality,” said Eran Rousseau, vice president of enterprise software product marketing and business development at NI, an Emerson company. “At some time in the past you might have said, ‘I’m focusing on quality primarily for this market segment, and another might be at a reduced level.’ That’s just not the case anymore. Yes, there is still a notion of bin segregation and product segmentation, but even the lower end of the market has very stringent requirements in terms of quality. And among the challenges of lowering the cost, having good quality, and optimizing the efficiency, we need to strike the right balance. Obviously, we cannot do this by ourselves.”

To continuously improve quality and yield while reducing test cost, chipmakers and their suppliers are significantly changing the way the test cell operates, for instance, by combining test and on-die measurements with data analytics. At the same time, there is a significant move toward more adaptive testing, which optimizes the amount of testing and the test program for incoming lots of wafers. That, in turn, is likely to drive real-time modifications at the device level at some point in the future.

But part of delivering more advanced testing solutions also involves tearing down silos between different providers. “In the past, we had a tester vendor, we had a products analytics supplier, we had an MES vendor, etc. Now, I don’t want to say customers want a one-stop shop, but they do want to see things orchestrated by somebody who is driving equipment efficiency, equipment control, and ATE performance,” said Rousseau.

Chipmakers are looking for tight integration of test equipment, monitors, and analytics, and the insights that combination can deliver. “The next evolution that’s really going on is around getting device insights — what customers can infer from their own devices,” said Eli Roth, product manager for smart manufacturing at Teradyne.

To that end, Teradyne is setting up strategic partnerships with PDF Solutions and proteanTecs to enhance semiconductor test and debug processes for advanced SoCs. “OptimalPlus (NI) and PDF have platforms that gather data from multiple insertions and have many algorithms to analyze gathered data and help move critical information to the edge, giving the opportunity to make decisions on the fly,” said Roth.

Indeed, real-time decision-making to select the most efficient test per device or per wafer is one of the goals of the strategic collaboration among Advantest, PDF Solutions, and proteanTecs, captured in a recent Advantest podcast.[1] Companies are applying different machine learning algorithms that build on dynamic part average testing (DPAT), called ML-DPAT, to drive to lower sub-ppm levels in mission-critical devices. Collaborative forces are especially needed to better root out die-to-die interface issues in 2.5D and 3D packages.
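As a rough sketch of the DPAT idea underlying ML-DPAT, the snippet below recomputes outlier limits from each incoming lot’s own distribution, using the median and MAD so the outlier itself does not inflate the estimated spread. The readings and the k value are illustrative only, not from any real test program.

```python
import statistics

def robust_dpat_limits(readings, k=6.0):
    """Dynamic part-average-test (DPAT) limits for one lot.

    Uses median/MAD rather than mean/stdev so a single extreme part
    does not widen the limits that are supposed to catch it. The k
    value is illustrative; real flows tune it per test and market.
    """
    med = statistics.median(readings)
    mad = statistics.median(abs(r - med) for r in readings)
    sigma = 1.4826 * mad  # MAD-to-sigma factor for a normal distribution
    return med - k * sigma, med + k * sigma

def flag_outliers(readings, k=6.0):
    """Return indices of parts outside the lot's dynamic limits."""
    lo, hi = robust_dpat_limits(readings, k)
    return [i for i, r in enumerate(readings) if not (lo <= r <= hi)]

# A part can pass fixed spec limits yet sit far from its lot's population:
lot = [1.00, 1.01, 0.99, 1.02, 0.98, 1.00, 1.01, 0.99, 1.35]
print(flag_outliers(lot))  # -> [8]
```

ML-DPAT flows build on this by letting trained models, rather than a fixed k, decide which deviations matter for a given device and insertion.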

At the chiplet integration level, complexity kicks up a notch because of lack of access to internal device I/O pins. “We’re integrating chiplets into large diverse die and the access mechanisms will become increasingly limited,” said Ken Butler, senior director of business development for ACS at Advantest during that podcast. “This boils down to increasingly subtle defect mechanisms that are harder and harder to detect. Are they even visible at time zero? Or is it more of a reliability mechanism that is going to fail over time with stress? Or is it a rare instance of something like silent data corruption, which was talked about a lot at the recent International Test Conference, that is hard to find? We need all the tricks we can come up with, including many agents in the part and really advanced analytics, in order to be able to process the data. Now we can do a lot better job of trying to track down these very subtle defect mechanisms that we otherwise might miss just using conventional methods.”

Others agree. “In terms of the kinds of outliers [we’re investigating] in the die-to-die interface, take something like an HBM interface where there’s say a thousand pins or so,” said Marc Hutner, senior director of product marketing at proteanTecs at the time of this interview (now at Siemens EDA). [1] “Is it just a single pin that is sort of out, or is it a group of pins? Is it only happening on a subset of parts that are produced? So we can start to get those trends because we’re looking at it as a per-die per-interface analysis. You know, it [could be] a layout problem leading to that, as well. There are lots of ways that you can, by taking the measurement, come to some level of understanding of ‘what is the design problem,’ or ‘what is the manufacturing problem.’”

A key goal in partnering is detecting yield and reliability issues sooner. “A lot of the value in doing machine learning is to ultimately drive an earlier decision. In other words, don’t wait until final package test to find out something has gone bad. You want to know that earlier upstream,” said Vishnu Rajan, engagement director at PDF Solutions. [1] “I look at this in terms of making the best use of your available testing capacity. So if you think of the three things that you’re typically trying to optimize — throughput, quality, and yield — that is perfect for a multivariate analysis where you say, ‘Hey, at least include these three things. And if you’re going to turn certain tests off, well then this is what I’m trying to optimize for, and this is my available test capacity.’ This is where advanced analytics being able to make those decisions really comes into play.”
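Rajan’s point about optimizing against available test capacity can be illustrated with a toy selection problem: given a per-insertion time budget, keep the tests with the best detection value per second of test time. The test names, detection rates, and durations below are invented for illustration; real programs weigh far more variables.

```python
def select_tests(tests, time_budget):
    """Greedy sketch of test selection under a time budget.

    `tests` is a list of (name, detect_rate, seconds) tuples. Rank
    tests by detection value per second, then keep them in that order
    until the insertion's time budget would be exceeded.
    """
    ranked = sorted(tests, key=lambda t: t[1] / t[2], reverse=True)
    kept, used = [], 0.0
    for name, rate, secs in ranked:
        if used + secs <= time_budget:
            kept.append(name)
            used += secs
    return kept, used

# Hypothetical test catalog: (name, relative detection value, seconds).
catalog = [
    ("scan_stuck_at", 0.60, 2.0),
    ("scan_transition", 0.25, 3.0),
    ("iddq", 0.10, 1.0),
    ("sys_func", 0.05, 4.0),
]
kept, used = select_tests(catalog, time_budget=6.0)
print(kept, used)
```

A production flow would treat this as a multivariate problem, as Rajan notes, trading throughput, quality, and yield jointly rather than ranking on a single ratio.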

Synopsys is also partnering with Advantest to expand the reach of its silicon lifecycle management analytics software into the test cell. “One of the megatrends we’re seeing is in order to have defect data more readily available and actionable, people are running more volume diagnosis in production,” said Matthew Knowles, senior director of product management at Synopsys. “In the past, only a couple of big companies collected and analyzed all their failure data. Other customers did it on an ad hoc basis, for new product introductions or to identify certain yield issues. But now we’re seeing all customers be more proactive, because they have to be, because they cannot suffer excursions going on for a week or longer.”

Another collaboration between PDF Solutions and proteanTecs directly connects PDF’s Exensio analytics platform with proteanTecs’ on-chip performance agents and analytics platform. Complementary expertise and streamlined integration of the platforms enable data-driven insights into yield, performance, and reliability issues, with visibility into most stages of manufacturing — from device qualification to simulations to test and assembly operations. For example, engineers can review timing-margin measurements of logic paths made by the margin agents using the proteanTecs platform and correlate them with signatures revealed on specific wafers by the PDF platform (see figure 2).

Fig. 2: By sharing select model results, users get around the lack of a universal data format. Source: PDF and proteanTecs

Importantly, the two companies do not share their raw data, but instead independently enrich their datasets, validate the data, and derive different analytics values. Modeled and analyzed results can be mutually shared and correlated.

In addition to agents that monitor design process sensitivity, proteanTecs offers monitors for interconnect performance and device degradation. Burn-in, which requires testing, stressing the devices, and then retesting, is an expensive and time-consuming step. As an alternative, several companies are looking to enhanced stress testing on ATE, or to on-chip reliability structures, to identify failures.

All of these efforts are the result of the relentless pressure on chipmakers to identify true faults, including outliers, sooner in high-volume production. That involves addressing the sources of random and systematic defects that arise during new product introductions, production ramp, and high-volume manufacturing. The drive to near-zero defective parts per million can benefit further from integrated lifecycle management systems.

Lifecycle analytics and device traceability
In an ideal world, silicon lifecycle management (SLM) tracks devices from cradle to grave including device design, verification, validation, manufacturing test, assembly, and field use. RMAs (field returns of devices that fail) and recalls require sorting and analysis of all the test data and fab data to identify why a device failed sooner than expected.

A standard identifier for each die is considered essential to SLM. The closest the industry has to a standard is the electronic chip ID (ECID). The ECID contains the chip’s x and y location on the wafer, the lot number, the wafer number, and the test program used. ECIDs are written during functional sort and are read through the chip’s I/O pins using a specific test program and code.
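As a rough illustration of how those fields can be stored, the sketch below packs and unpacks a die identifier as a single bitfield. The field names and bit widths are assumptions made for the example; real ECID layouts are vendor-specific.

```python
# Illustrative ECID bit layout, MSB-first -- widths are assumptions,
# not any vendor's actual format.
FIELDS = [("lot", 24), ("wafer", 5), ("x", 8), ("y", 8)]

def pack_ecid(values):
    """Pack the named fields into a single integer."""
    ecid = 0
    for name, width in FIELDS:
        v = values[name]
        assert 0 <= v < (1 << width), f"{name} out of range"
        ecid = (ecid << width) | v
    return ecid

def unpack_ecid(ecid):
    """Recover the field values from a packed ECID."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = ecid & ((1 << width) - 1)
        ecid >>= width
    return out

die = {"lot": 0x1A2B3C, "wafer": 17, "x": 42, "y": 7}
assert unpack_ecid(pack_ecid(die)) == die
```

Because every field round-trips losslessly, any downstream test insertion or field-return analysis can recover the die’s wafer coordinates and lot from the single stored value.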

While ECIDs are widely used for larger components, such as CPUs, GPUs, and memory devices, they generally are not used for many analog or discrete devices, such as capacitors.

“The proportion of devices without ECID has remained the same for probably the last six or seven years,” said Rousseau. Instead, NI developed its own time-based classification system using virtual chip IDs. “Based on data from pick-and-place machines and other tools, we can trace back that die’s ‘DNA’ to, for instance, how many times that specific device got mounted or de-mounted from a substrate. We use it in our analytics to eventually identify the wafer lot, because when we’re dealing with preventive recalls, for example, you can narrow the scope significantly. Rather than [recalling] all of a certain device made at TSMC in January, we can tell you now it’s January 2, after 8 p.m., so let’s recall 800 devices rather than hundreds or thousands of devices.”
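The recall-narrowing idea Rousseau describes can be sketched as a simple filter over assembly timestamps. The virtual chip IDs, log records, and time window below are invented for illustration; NI’s actual system draws on much richer pick-and-place and tool data.

```python
from datetime import datetime

# Hypothetical assembly log: (virtual_chip_id, mount_timestamp).
log = [
    ("VC-0001", datetime(2024, 1, 2, 19, 30)),
    ("VC-0002", datetime(2024, 1, 2, 20, 15)),
    ("VC-0003", datetime(2024, 1, 2, 23, 5)),
    ("VC-0004", datetime(2024, 1, 3, 1, 40)),
]

def recall_scope(log, start, end):
    """Narrow a preventive recall to devices assembled in a time window,
    instead of recalling every device from the whole production run."""
    return [cid for cid, ts in log if start <= ts < end]

# Suspect window: Jan. 2 after 8 p.m. until midnight.
suspect = recall_scope(log,
                       start=datetime(2024, 1, 2, 20, 0),
                       end=datetime(2024, 1, 3, 0, 0))
print(suspect)  # -> ['VC-0002', 'VC-0003']
```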

The suppliers of testing platforms and data analytics are collaborating to deliver better test coverage and detect and analyze failures at the test cell despite the hurdles of sharing data. There is a flurry of activity behind combining various sources of in-line test data with multiple algorithms and analytics engines. Everyone is looking for best-of-breed solutions in this relatively immature AI/ML space.

As these tester-analytics platforms become more mainstream, chipmakers will find innovative ways to pull in even more critical information. “There is virtually no limit to the type of data that can be included in adaptive test, starting with test data, but including things like data from temperature or pressure sensors in the tester or images from cameras and inspection systems,” said Michael Schuldenfrei, an NI fellow. “This is particularly true for AI-based algorithms, where environmental indicators can be key inputs to a model and are sometimes critical in understanding the true cause of a phenomenon. For example, in burn-in, different chambers can experience different temperatures, requiring real-time changes to the test program to compensate.”

Over the next several years, all industry participants expect rapid progress, particularly as ML becomes more widely implemented. “What we’re seeing is the opportunity to address a class of problems that really benefit from real time data and real-time processing,” said Teradyne’s Mills. “It’s about using the data more effectively. And we’re at a point now where you have sufficient computing horsepower, sufficient progress in machine learning in an AI domain, and essentially problems big enough that convergence in a collaborative test and analytics environment makes a lot of sense going forward.”


1. “Advantest, proteanTecs, and PDF Solutions harness AI power for yield, quality, and reliability,” Advantest Talks Semi podcast, May 24, 2023. https://www.buzzsprout.com/1607350/12911337

Related Reading
Adaptive Test Ramps For Data Intelligence Era
Timely changes to test programs will speed throughput, but one big hurdle remains.
Fingerprinting Chips For Traceability
Diverse identifier technologies enable fingerprinting for all device types.
Hidden Costs And Tradeoffs In IC Quality
Why balancing the costs of semiconductor test and reliability is increasingly difficult.