Customizing IC Test To Improve Yield And Reliability

Identifying chip performance specs earlier can shorten the time it takes for processes to mature and lower overall test costs in manufacturing.


Testing the performance and power of semiconductors as they come off the production line is beginning to shift left in the fab, reversing a long-standing trend of assessing chips just prior to shipping. While this may sound straightforward, it’s a difficult challenge. If the effort succeeds, it will have broad implications for the entire design-through-manufacturing flow.

Manufacturers typically grade chips just before shipment. Not all chips are created equal, and that disparity typically widens with the introduction of new manufacturing processes. It takes time to reduce variation and for processes to mature. In the past, foundries guard-banded new processes to take that into account. But because the number of new functions being added into advanced chips already has pushed them beyond the size of a reticle, there is no room left for that margin. As a result, it’s essential to identify any potential defects or irregularities earlier, because they could impact the lifecycle of these increasingly complex and costly chips.

At the same time, the data gathered from earlier testing can be used to fine-tune new processes to improve yield faster. It also can be used to improve the efficiency of testing, which has been under pressure from fabs and end customers. In the past, test was firmly fixed at 2% to 2.5% of the total cost of manufacturing chips. Those numbers have been steadily increasing for several reasons:

  • Chips are becoming more complex, and at advanced nodes they are more difficult to test. They require more time for test, inspection, and metrology at multiple insertion points throughout the flow. In addition, the test probes need to be customized, and testing requires more of them for each new design, which further increases the cost;
  • Demand for reliability, particularly in mission- and safety-critical applications, requires higher test coverage;
  • The test equipment itself is increasing in complexity, and more equipment is required to get the job done. There are more sensors, AI/ML to analyze the data generated by those sensors, and more testing overall because there are more features per die and more die per package.

“What we’re looking at is how to optimize across the value chain, especially when you have more complex problems to tackle with 2.5D and 3D devices, heterogeneous integration with more in a package, and you don’t have pin access to everything that’s going into that package,” said Ira Leventhal, vice president of applied research and technology at Advantest America. “For HPC devices you see test costs going up from the classical 2.5% of revenue up to maybe 3% or 4%. So how do you get that back down? You can’t just keep adding tests. Rather than looking at every insertion, you have to ask, ‘What am I doing here and how do I optimize them?’ Doing more in parallel is the only way to tackle that in the longer term. You pay a little more money now to get yourself over the hump, but then that has to settle down to the historical level.”

Earlier data and better feedback can help offset those costs in other ways, both directly and indirectly.

“We’re seeing an increasing shift from the classical performance-based binning to application-based binning,” Leventhal said. “In the past, devices were a lot simpler. They were mapped into very specific applications. Now, with AI chips, you may have a multitude of applications. They’ve got many different cells in the device and multiple cores, and so the binning of that device to an application becomes a much more complicated problem than just speed binning of a processor chip. This is where you need to bring into play models that can solve complex binning. There’s also a time-based aspect to it. More predictive models will tell you realistically how long you can expect these devices to last.”

Much of this is increasingly domain-specific, and that only adds to the complexity. What works well enough in a single-chip application may not be sufficient when it’s combined with other chips or chiplets in a package. And what is good enough for one application or use model may be very different in another, where the life expectancy might be longer or shorter, and specs may be looser or tighter.

“This really goes beyond just looking at the parametric data and pass/fail criteria,” said James Guilmart, principal solutions marketer at NI. “It’s more about complex dynamics and multi-variable analysis and saying, ‘Okay, this passed all these thresholds, but this unique combination is close to the edge here and it actually means it may fail in three months. So we probably don’t want to put that in a pacemaker.’ As we look at these more advanced application-specific bins, it’s going to take a lot more multi-variable analytics to classify these die on a more granular level to know how they’re going to perform in the field. And then, you need to make sure that aligns with the expected use case for that device.”
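The kind of screening Guilmart describes can be illustrated with a minimal Python sketch. It is not NI's method. The parameter names, limits, and the `edge` and `max_edge_params` thresholds are all hypothetical, and a production flow would likely use a trained multi-variable model rather than a simple count. The idea is only that a die can pass every individual limit and still be flagged because several parameters sit close to their limits at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    """Pass window for one parameter (assumes low < high)."""
    low: float
    high: float


def margin(value, limit):
    """Normalized margin: 1.0 at the window center, 0.0 at a limit, negative outside."""
    half_width = (limit.high - limit.low) / 2.0
    center = (limit.high + limit.low) / 2.0
    return 1.0 - abs(value - center) / half_width


def classify(measurements, limits, edge=0.2, max_edge_params=1):
    """Return 'fail', 'pass_restricted', or 'pass' for one die.

    edge and max_edge_params are illustrative thresholds, not industry values.
    """
    margins = {name: margin(measurements[name], limits[name]) for name in limits}
    if any(m < 0 for m in margins.values()):
        return "fail"  # outside at least one limit
    near_edge = [name for name, m in margins.items() if m < edge]
    if len(near_edge) > max_edge_params:
        # Every test passed, but the combination is marginal:
        # keep this die out of long-life, safety-critical bins.
        return "pass_restricted"
    return "pass"


# Hypothetical limits and one die that passes every test individually.
LIMITS = {
    "vmin_mv": Limit(500.0, 700.0),
    "leakage_ua": Limit(0.0, 100.0),
    "fmax_mhz": Limit(2000.0, 3000.0),
}
marginal_die = {"vmin_mv": 690.0, "leakage_ua": 95.0, "fmax_mhz": 2500.0}
print(classify(marginal_die, LIMITS))  # pass_restricted
```

Here both Vmin and leakage are within limits but close to them, so the die is passed with a restriction rather than sent to the most demanding application bin.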

Customizing the test process
All of these steps cost money, too, whether that’s a result of more time spent in test, more insertion points throughout the manufacturing process, or new equipment required to probe more chips and assess the results.

“It’s always about what’s the goal of the insertion,” said Michelle Evans, strategic applications software solutions manager at Teradyne. “Depending on the class of device, you can do intermediate binning throughout one insertion. And you can make conditional branching. For mobile applications processors you can bin them out at different stages. I know at this stage it’s going to be a ‘good’ bin, and that can be a ‘one, two, or three.’ It doesn’t always have to be a one. Mobile applications processors do that today. When you get to medical, though, that’s a different kind of binning. It either works and meets requirements or it doesn’t.”

This becomes more challenging as chips are combined in a package. Leads are not always exposed, which is why there is so much interest in telemetry types of data from on-chip monitoring. The challenge is first to extract the necessary data, and then to use it at appropriate points throughout the flow.

“On the tester, you need the right information to help you decide and execute your binning strategy,” said Alex Burlak, vice president of test and analytics at proteanTecs. “What are the physical measured parameters you are basing your binning on? This is where our portfolio of agents come in. You need to have granular visibility into the parametric data within the die from multiple locations within the die, transistor process grading, leakage signature, RC delay, path delays of actual logic paths and more. This approach provides accurate performance binning at a much earlier stage than today, even at wafer sort. This is done online per chip. This means that you can now keep a die-bank instead of an inventory of finished goods. You can bin prior to package selection, you can pair dies of similar performance based on this data.”

And just because chips/chiplets passed tests prior to integration in a package doesn’t mean they will remain good once the package is sealed up.

“The chips can be tested individually, but you have this packaging and interconnect portion that wasn’t ever tested, but it’s in the final product,” said Teradyne’s Evans. “The first step is to weed out the gross defects, and then you go into an area where it’s mission-critical. If it’s at the highest speed, highest power, or whatever the criteria is for that part that makes it mission-critical, you test that first. That allows you to create what I call a ‘truth table.’ You don’t bin there. But you’ve created an intermediate flag that allows you to continue on or go a different route in the flow, all within one insertion. There’s always some conditional branching.”
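The flow Evans outlines (gross-defect screen, then the most demanding test recorded as a flag rather than a bin, then a conditional branch, all in one insertion) can be sketched roughly as follows. This is an illustration only, not Teradyne's test program. The field names, thresholds, and bin names are hypothetical.

```python
# Hypothetical spec values for illustration only.
TOP_SPEC_MHZ = 3000.0
MIN_SPEC_MHZ = 2400.0
LOW_LEAKAGE_UA = 50.0


def run_insertion(die):
    """Return (bin_name, flags) for one die within a single insertion."""
    flags = {}

    # Step 1: weed out gross defects before spending any more test time.
    if die["shorts"] or die["opens"]:
        return "reject", flags

    # Step 2: run the most demanding condition first.
    # Record the result as an intermediate flag; do not bin yet.
    flags["passes_top_spec"] = die["fmax_mhz"] >= TOP_SPEC_MHZ

    # Step 3: conditional branching on the flag.
    if flags["passes_top_spec"]:
        # Top-spec path: grade further on leakage.
        bin_name = "bin1" if die["leakage_ua"] <= LOW_LEAKAGE_UA else "bin2"
    else:
        # Fallback path: check whether the die still meets a lower spec.
        bin_name = "bin3" if die["fmax_mhz"] >= MIN_SPEC_MHZ else "reject"

    return bin_name, flags


die = {"shorts": False, "opens": False, "fmax_mhz": 2600.0, "leakage_ua": 40.0}
print(run_insertion(die))  # ('bin3', {'passes_top_spec': False})
```

For a device class such as the medical parts Evans mentions, the fallback path would presumably collapse to a single pass/fail decision, with no graded "good" bins.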

Testing in context
Not everything has to run at top performance. As advanced packaging becomes the norm for leading-edge designs, granularity is leading to some new approaches.

“The conventional way to think about this is that you don’t want to put a high-speed part and a low-speed part in a two-chip module because you’re bounded by the worst part,” said Mark Laird, senior staff application engineer at Synopsys. “But I’ve also seen a use case that is more interesting, where they care about total power. So in their case, you’re trying to balance high power and low power on the same module. That’s a different optimization.”
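The two pairing strategies Laird contrasts can be compared in a short sketch. It assumes a two-die module and a die bank in which each die already has a measured maximum frequency and power. The die IDs and values are made up, and real die matching would also have to account for constraints this sketch ignores, such as thermal limits, inventory, and yield.

```python
def pair_similar(dies, key):
    """Pair neighbors in sorted order so each module's dies are closely matched."""
    ordered = sorted(dies, key=key)
    return [(ordered[i], ordered[i + 1]) for i in range(0, len(ordered) - 1, 2)]


def pair_balanced(dies, key):
    """Pair lowest with highest so the per-module totals are roughly equal."""
    ordered = sorted(dies, key=key)
    return [(ordered[i], ordered[-1 - i]) for i in range(len(ordered) // 2)]


# Hypothetical die bank: (die_id, fmax_mhz, power_w)
bank = [("A", 3000, 12.0), ("B", 2400, 7.0), ("C", 2900, 11.0), ("D", 2500, 8.0)]

# Speed matching: module speed is bounded by the slower die in each pair.
for a, b in pair_similar(bank, key=lambda d: d[1]):
    print(a[0], b[0], "module fmax:", min(a[1], b[1]))
# B D module fmax: 2400
# C A module fmax: 2900

# Power balancing: pair high-power with low-power dies.
for a, b in pair_balanced(bank, key=lambda d: d[2]):
    print(a[0], b[0], "module power:", a[2] + b[2])
# B A module power: 19.0
# D C module power: 19.0
```

With an odd number of dies, both functions leave one die unpaired. The same bank yields different modules depending on whether the objective is matched speed or a capped total power, which is the "different optimization" in the quote.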

The shift here is to a system-level view of performance and power, rather than a view of an individual die. “We’re working with a Vmin data set, but not just predicting Vmin on an ATE. We’re also predicting it in a system, because the Vmins on an ATE versus an SLT tester can have pretty wide variation. Sometimes the ATE will have lower Vmin than SLT, but I’ve also seen higher Vmin on SLT, so it’s product/variant-dependent. You could have the same part for desktop and server, but they’re going to have different packages with different pin-outs for those two products. You want to be able to predict final binning at wafer sort and then feed that data forward to say, ‘Not only do I think this is going to be a server part, but it’s this classification of server part: a high-speed server part, for example,’” said Laird.

Automation with better data
Underlying all of this is a shift toward better data, and better ways to utilize that data. EDA vendors have been working with foundries and equipment makers to unify data from concept all the way through to the field. This end-to-end data flow is now beginning to incorporate data from various process steps, basically moving DFT and other processes from lab to fab to field. The big shift, from a test standpoint, is developing a good test plan from the outset, because so many of the leading-edge designs are highly customized and extremely complex. A timing mismatch in any element may have broad impacts on the performance of other elements in a package, for example. However, once all that is resolved and the parameters are set, then much of the rest of the testing process can be automated.

“Even though it’s complex as you look at those variables, once you understand it, add in the historical context, and know what the data is pointing to, then it becomes automated,” said NI’s Guilmart. “One of the key enablers is the data model. How are we getting this data and how are we storing it? What is the structure of our data schema? We’re really prioritizing the lifecycle of a product. For each die, you want to associate all the data flows and the data and the test outcomes that it went through. You can think of that almost like a digital twin for a die. You have its whole production process journey. When we get that final test data and we start to analyze it and look for deviations or drift, we can look at all of this together and start to cluster them and start to figure out what caused anomalies. It’s a broader picture of the data.”
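One way to picture the per-die data model Guilmart describes is a record that accumulates every measurement a die sees across insertions. The sketch below is an assumption about what such a schema might look like, not NI's actual data model. The class names, fields, and insertion labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Measurement:
    insertion: str   # hypothetical labels, e.g. "wafer_sort", "final_test", "slt"
    parameter: str   # e.g. "vmin_mv"
    value: float
    passed: bool


@dataclass
class DieRecord:
    """All test history for one die, keyed by its position on a wafer."""
    lot: str
    wafer: int
    x: int
    y: int
    events: list = field(default_factory=list)

    def add(self, measurement):
        self.events.append(measurement)

    def history(self, parameter):
        """(insertion, value) pairs for one parameter, in the order recorded."""
        return [(e.insertion, e.value) for e in self.events if e.parameter == parameter]

    def drift(self, parameter):
        """Change between the first and last recorded value, or None if fewer than two."""
        values = [value for _, value in self.history(parameter)]
        return values[-1] - values[0] if len(values) >= 2 else None


die = DieRecord(lot="LOT123", wafer=7, x=12, y=34)
die.add(Measurement("wafer_sort", "vmin_mv", 620.0, True))
die.add(Measurement("final_test", "vmin_mv", 635.0, True))
print(die.drift("vmin_mv"))  # 15.0
```

Keeping the full history on one record is what makes it possible to look across insertions for drift, or to cluster dies that passed every test but moved in the same direction.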

That data, in turn, is integrated with on-chip/in-package data to provide visibility into how chips are behaving in the fab, inside a package, and ultimately in the field. Some of this is tied to what proteanTecs’ Burlak called an “edge library,” which is software that can be integrated into the test program. “All the modeling and analysis is done in the analytics platform. Our edge library, deployed on the ATE, processes all the information from the agents and additional customer data, runs the models, and predicts what is the right bin for a chip while it’s still on the tester.”

Binning earlier during manufacturing is an important step toward improving reliability and reducing the cost of test. But it’s just one facet of the reliability equation, which also includes metrology and inspection. The ability to assess how a chip will behave over time, in specific applications and under well-understood use cases, greatly increases the chances for successful implementations and reliable performance.

“One of the classic problems we’ve seen when failing devices come back is that unless you’re seeing a number of failures that are pretty similar in nature — enough so that someone is going to take a closer look into these things — you don’t know what went wrong with a device and what you can trace it back to,” said Advantest’s Leventhal. “Now, we’re going down the path of being more predictive about what’s going to impact reliability. It’s not an easy problem to solve. You need to do the groundwork up front, to collect that data and bring it all together and find needle-in-the-haystack relationships, and then be able to use that information to your advantage and be predictive about it.”

For equipment makers and chip companies, this opens a slew of new opportunities. With rising complexity everywhere, the ability to comprehend interactions and correctly interpret large volumes of data is forcing more automation and equipment upgrades. And for chipmakers and the entire manufacturing and packaging ecosystem, successful data integration could open the door to significant improvement in device quality, stabilization of costs across multiple processes, and far fewer returns due to failures in the field.

