Test Challenges Mount As Demands For Reliability Increase

New approaches, from AI to telemetry, extend well beyond yield.


An emphasis on improving semiconductor quality is beginning to spread well beyond just data centers and automotive applications, where ICs play a role in mission- and safety-critical applications.

But this focus on improved reliability is ratcheting up pressure throughout the test community, from lab to fab and into the field, in products where transistor density continues to grow and where many devices are optimized for different market segments. There are more devices to test, and some test processes themselves are becoming more complex. Rather than just traditional testing of electrical signals and ensuring sufficient power is delivered to various components in a system, testing now must consider mechanical handling of bare die, simultaneous management of thermal and power profiles, and the materials engineering needed to address the challenges of probing and contacting.

To make matters worse, with each new technology node and growing heterogeneity, testing is taking more time, from initial planning for how to test to the actual test time. And it can vary greatly from one design to the next, and from one foundry to the next, even within the same market segment. So a chip/chiplet used in one heterogeneous package may be expected to behave very differently from that same chip/chiplet in another package. One may include silicon photonics, while another uses standard electrical junctions. And even among power chips, use cases may vary greatly, for example between a chip used in a power grid and one used in an electric vehicle.

Test is a requirement for all of these applications and use cases. What has changed is that component variety and circuit density have increased, and components often are tested differently than in the past. Put in perspective, development of various components, and updates to firmware or even manufacturing processes, occur on different schedules. But quality needs to be consistent across all devices, even when there are multiple options for putting these devices together in a heterogeneous integrated product.

“Today, there are about 15 different packaging techniques that are available,” said Yervant Zorian, chief architect at Synopsys. “But in each case, the interconnects are made of different material, and they can have various challenges in manufacturing, like weak contacts. That can result in static or more dynamic faults, like in the very high-speed domain. How do you test them? How do you monitor them? How do we ensure their speed is running properly, while you can neither probe them nor come in between them. So, a DFx solution is needed.”

Technology and sector drivers for the next 10 years
Semiconductor test challenges can vary greatly, depending upon the technology being tested and the market in which it will ultimately be used. It may be based on the most advanced process node technology, or it may include a combination of leading-edge and mature process technologies running at different voltages and speeds, and with varying life expectancies. These also can vary significantly by industry sector, which determines the yield-quality-cost triangle decisions. But in all cases, the number of options and possible permutations is increasing rapidly.

“Automotive industry parts are expected to drive development for the next decade, if not more,” said Vineet Pancholi, senior director of test technology at Amkor. “Higher-speed digital logic, mixed-signal in-vehicle infotainment, power discrete switches, and MEMS and sensor products are the top application products in the automotive segment. In the hand-held and wearable market, NB-IoT with low power RF applications are expected to continue to explode in volumes, as well. And in advanced 2.5D and 3D packaged ICs, commoditization with chiplets will continue to drive development, keeping up with Moore’s Law.”

How to test these different devices often depends upon volume and quality expectations, and which end market they are targeted for.

“There are lots of different leading-edge customers, and they lead in different areas,” said Rick Burns, president of semiconductors at Teradyne. “There are customers in the mobile market with enormous volumes. Economics dominates everything there because the volumes are so high. Customers at this leading-edge look at their test solution as a cost and yield optimization opportunity. But in the same breath, for cloud customers it’s not about the economics. For them, it’s about capability. They want to have complexity levels that used to be unobtainable, and that’s where they need to go.”

The real value for cloud providers is uptime, and the cost of design, manufacturing, and test is less significant than it is for a consumer electronics application. Hence, a cloud provider can justify the cost of long test processes and investments in more customized solutions. In contrast, products with cost pressures drive a whole different set of innovations.

“There continues to be an increased focus on quality that everyone is driving toward,” said Keith Schaub, vice president of technology and strategy at Advantest America. “This mantra of zero defects is what everyone wants. There’s no such thing, but we need to use all available capabilities to get as close as possible to that. At the same time, there’s always this emphasis on cost reduction.”

Achieving these seemingly incompatible goals requires thinking about quality over time — pre- and post-manufacturing — as well as during various insertion points in the manufacturing and/or packaging process.

“For me, the number one challenge is changing fault models to really account for what’s happening in the newest CMOS technologies. The next is using more in-die monitors and BiST,” said Andrzej Strojwas, CTO at PDF Solutions. “Third, is addressing heterogeneous integration test challenges, for which there are many — for example, matching chiplet performance prior to assembly. Design companies pioneering the use of this technology, like AMD, have noted this requirement.”

So rather than just relying on traditional pass/fail decisions, test is becoming more nuanced and more pervasive over time. This is essential, because the world has an insatiable appetite for dies with 4 billion-plus transistors. Along with that come challenges in meeting quality levels at various steps, managing test data analysis, and moving huge amounts of data in and out of a device throughout its projected lifetime with ever-smaller sets of pins.

Improving product quality
As larger computational SoCs move beyond data centers and into cars and other safety-critical applications, the mantra of ‘zero defects’ is being repeated in fabs and in wafer test, assembly, and package test facilities. Recent reports from Meta and Google engineers regarding silent data errors (a.k.a. corrupt execution errors) have raised a flag about the subtle nature of faulty behavior caused by what used to be inconsequential manufacturing anomalies. Now a slight increase in contact resistance or a minor shift in a transistor parameter can result in a minor path delay that causes a failure only with specific inputs and under specific electrical and thermal conditions. These behaviors require advanced fault models.

Over the past 10 years, subtle manufacturing defects have been described as either marginal defects or systematic defects. In general, these are attributed to aggressive design rules that do not fully account for the interactions between lithography, etch, and fill process steps. Even following design-for-manufacturability guidelines does not eliminate these problems. Certain layout patterns have a higher sensitivity to these interactions, resulting in a higher probability of a defect. That, in turn, may be combined with process variation, which can ultimately impact transistor behavior under specific electrical and/or thermal conditions observed in a customer’s system.

“Contrary to the previous technologies, the defects that are more of a systematic nature play a much more important role,” said PDF’s Strojwas. “You will not be able to eliminate the systematic defects (yield limiters). And what we observe in volume production is these systematic defects occur and require screening. Layout-pattern-specific defect models need to be included in the models that ATPG uses.”

These advanced fault models require higher test pattern counts. Several experts noted that system-level test has distinct advantages here because engineers can afford a half-hour of scan testing. In addition, this test insertion closely represents an end customer’s production environment.

“To maintain a high-test quality, more advanced fault models will be required to be run in-system,” said Lee Harrison, director of Tessent product marketing at Siemens EDA. “These address new defects that we are only just starting to see at the very latest nodes, but which will become commonplace by 2033. We also need to target manufacturing test quality in-system.”

Test data has value well beyond just reducing field failures. It also has increasing value for the final product being delivered. Building from embedded DFT solutions like memory BiST and logic BiST, there has been steady growth in on-die circuitry to monitor internal behavior, leveraging telemetry to deliver more targeted test content over more test conditions. This data can also drive design resiliency usage in the field.

“BiST has been so widely adopted for good reason,” said Marc Hutner, senior director of product marketing at proteanTecs. “SoC complexities and the rising costs of testing warranted a new approach, and BiST has taken the industry forward a great deal. But we need to take it even further. BiST still has several disadvantages, for example, the silicon area it takes for isolation when it’s required to run in mission. Or the fact that it is pass/fail and does not include application context (like similar operating conditions), so it doesn’t represent defects that occur during usage. It finds the failures while they already are affecting the logic functionality of the device. This is where deep data that’s based on chip telemetry comes into play. You’re really getting all this visibility, but without paying in resources. It is predictive and finds faults before they become real logical failures. It targets precursors of failure and their changes over time.”
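The contrast Hutner draws between a pass/fail result and tracking precursors of failure over time can be illustrated with a minimal sketch. It assumes a hypothetical on-die monitor that periodically reports a timing margin in picoseconds; the function names, the sample values, and the 20 ps limit are illustrative assumptions, and this is not a description of any vendor's actual algorithm.

```python
from typing import Optional, Sequence


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_limit(hours: Sequence[float],
                      margins_ps: Sequence[float],
                      limit_ps: float) -> Optional[float]:
    """Extrapolate the margin trend linearly.

    Returns the estimated hours remaining after the last sample until the
    margin reaches limit_ps, or None if the margin is not degrading.
    """
    slope, intercept = fit_line(hours, margins_ps)
    if slope >= 0:
        return None
    crossing_hour = (limit_ps - intercept) / slope
    return max(0.0, crossing_hour - hours[-1])


# Hypothetical telemetry: operating hours and reported timing margin (ps).
hours = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
margins = [50.0, 48.0, 46.0, 44.0, 42.0]
LIMIT_PS = 20.0  # assumed minimum acceptable margin

# A pass/fail check sees nothing wrong: every sample is above the limit.
all_pass = all(m > LIMIT_PS for m in margins)
remaining = hours_until_limit(hours, margins, LIMIT_PS)
print(f"all samples pass: {all_pass}, estimated hours to limit: {remaining:.0f}")
```

In this made-up data set every reading passes, yet the steady loss of margin is visible long before a logical failure would occur, which is the kind of information a pass/fail result discards. A linear fit is only the simplest possible trend model; real degradation is rarely linear.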

ATE improvements
The automated test equipment (ATE) industry has been working to keep up with this rising complexity, as well. Taking note of the machine learning capabilities of tensor processors and GPUs, ATE companies developed computational engines that sit alongside test equipment to identify subtle defects in real time across a tsunami of test data.

“Today’s smaller geometries and increased device complexity require more AI/ML power to enhance data analytics,” wrote Shinji Hioki, strategic business development director at Advantest America, in a 2022 SEMICON Japan paper. “Data analysis used to be done in the cloud or on an on-premise server. The tester would send data to the cloud or server and wait for the analysis results to judge defects, losing a full second of test time or more – a large deficit in high-volume manufacturing operations. Edge computing, on the other hand, takes only milliseconds, delivering a huge benefit in test time savings.”
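To put Hioki's comparison in perspective, the following back-of-the-envelope sketch totals the tester time spent waiting on analysis results. The one-second cloud round trip comes from the quote; the 5 ms edge latency, the one-million-device lot, and the single-site assumption are illustrative values, not figures from Advantest.

```python
def waiting_tester_hours(devices: int, latency_s: float, sites: int = 1) -> float:
    """Tester hours spent waiting for analysis results.

    Assumes each device incurs one blocking analysis round trip and that
    `sites` devices are tested in parallel.
    """
    if devices < 0 or latency_s < 0 or sites < 1:
        raise ValueError("devices and latency must be >= 0, sites >= 1")
    return devices * latency_s / sites / 3600.0


DEVICES = 1_000_000        # assumed production volume
CLOUD_LATENCY_S = 1.0      # "a full second of test time or more" (from the quote)
EDGE_LATENCY_S = 0.005     # assumed value for "only milliseconds"

cloud_hours = waiting_tester_hours(DEVICES, CLOUD_LATENCY_S)
edge_hours = waiting_tester_hours(DEVICES, EDGE_LATENCY_S)
print(f"cloud: {cloud_hours:.1f} h, edge: {edge_hours:.1f} h")
```

Under these assumptions the cloud round trip costs roughly 278 tester-hours per million devices, versus about 1.4 hours at the edge. Multi-site testing divides both numbers by the site count but leaves the ratio unchanged.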

Fig. 1: The ML model development retraining cycle feeds data into ACS Edge, which communicates with the V93000 for concurrent test and data analysis. Source: Advantest


Both Advantest and Teradyne offer a separate computing resource their customers can use without ATE programs knowing what’s going on.

“Most of our customers are much better at the analytics for their products, so we’ve taken a different approach, which is to make it easy to access that data to provide fat pipes of data on and off of our system in both directions,” said Teradyne’s Burns. “Then customers can be looking at a data feed, and they can be making adjustments to the activity in the test cell based on the information they’re gleaning from their data feed. We’ve provided a local inference server that lives essentially connected into the high-speed network of the tester itself. So if you want to do real-time processing, a customer can put their own encrypted algorithms down into the test cell to manipulate and observe the data, perform their own algorithmic understanding, and then apply some controls based on what they observe.”

The pin problem
One of the big challenges in test today is extracting the data out of a die or packaged part. Years ago, microprocessor vendors abandoned test-only pins as pin availability became constrained, opting instead for re-using existing pins. But those low-speed pins are no longer sufficient to move the volume of data needed for scan-based testing and now telemetry data. ATE and EDA companies have worked together to use SerDes interfaces, such as PCIe, in their native protocol.
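A rough calculation shows why low-speed pins become the bottleneck. The sketch below compares the time to move a fixed volume of scan data over a handful of reused low-speed pins with the time over a PCIe Gen3 x4 link. The 10 Gbit data volume, the eight pins at 100 Mbps, and the link width are assumed values chosen for illustration; the 8 GT/s lane rate and 128b/130b encoding are standard PCIe Gen3 figures, and packet-level protocol overhead is ignored.

```python
def transfer_seconds(data_bits: float, lanes: int, bits_per_s_per_lane: float,
                     encoding_efficiency: float = 1.0) -> float:
    """Time to move data_bits over `lanes` parallel lanes or pins."""
    if lanes < 1 or bits_per_s_per_lane <= 0 or not 0 < encoding_efficiency <= 1:
        raise ValueError("invalid link parameters")
    return data_bits / (lanes * bits_per_s_per_lane * encoding_efficiency)


SCAN_DATA_BITS = 10e9  # assumed scan plus telemetry volume: 10 Gbit

# Assumed: 8 reused general-purpose pins at 100 Mbps each.
gpio_s = transfer_seconds(SCAN_DATA_BITS, lanes=8, bits_per_s_per_lane=100e6)

# PCIe Gen3: 8 GT/s per lane with 128b/130b line encoding, x4 link assumed.
pcie_s = transfer_seconds(SCAN_DATA_BITS, lanes=4, bits_per_s_per_lane=8e9,
                          encoding_efficiency=128 / 130)

print(f"low-speed pins: {gpio_s:.2f} s, PCIe Gen3 x4: {pcie_s:.3f} s")
```

With these numbers the low-speed pins need 12.5 seconds, while the serial link needs about a third of a second, a gap of roughly 40x before any protocol overhead is counted.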

“Test is no longer just this tax that is required to achieve a ramp to yield,” said Rob Knoth, product management director in Cadence’s Digital & Signoff Group. “It is now, suddenly, a high-bandwidth pipe of data on and off the chip, from the smallest level to the largest level, pre-silicon manufacturing to post-silicon. A lot of the infrastructure we put in place to make test possible is uniquely positioned to enable this kind of telemetry that we’re talking about. This is high-bandwidth I/O, on and off the device. And if you start having tests using high-speed functional I/O, you unlock a tremendous potential both for manufacturing test cost reduction, as well as for in-system test availability. That’s the next big frontier.”

Fig. 2: Increasing serial protocol and ATE data rates. Source: Cadence


Others agree. “The battle over the I/Os available for these devices and the ability to have dedicated general-purpose I/O pins dedicated for test is always shrinking,” said Siemens’ Harrison. “Embracing the functional high-speed interfaces enables the reuse of existing interfaces along with the ability to stream data faster.”

Packaging issues
This is critical in 2.5D and 3D ICs, as well. “We can lose our ability to talk to the chip in the traditional sense,” said Advantest’s Schaub. “You begin with known good die, you’ve tested it, and now you integrate it into something else and it becomes a sort of sub-assembly. You still need to test again, but you no longer have any access to it. So you have to somehow communicate through one or multiple chips in order to get the data that you want or that you need.”

Industry experts highlighted multiple test challenges that must be addressed to support the expected increase in heterogeneous integrated systems, particularly when it comes to chiplets. While the ubiquitous known good die (i.e., zero circuit defects) problem is ongoing in terms of die-level testing, there are well thought-out strategies for meeting this expectation. These range from increasing outgoing quality from die-level test across all semiconductor technologies to increasing design resiliency in complex logic devices.

“Heterogeneous integration has many challenges for wide adoption, many of which were discussed at the Chiplet Summit in January 2023,” said proteanTecs’ Hutner. “There is a major blind spot when testing die-to-die interfaces and knowing how well they are working. Today most interfaces include a PRBS test mode that says that the interface is working at a particular condition. It does not say how close to failure, just that it works at a particular voltage and frequency. Over time and under stress, the performance of the interface will change and may result in faulty lanes and early life failures. The chiplet economy will require further monitoring capabilities that enable evaluation of the interconnect health in mission mode and provide guidance on when to repair or replace a unit.”
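For readers unfamiliar with the PRBS test mode Hutner mentions, here is a minimal software sketch of the idea: a transmitter sends a pseudo-random bit sequence, and the receiver regenerates the same sequence and counts mismatches. PRBS7 (polynomial x^7 + x^6 + 1) is used because it is the shortest common pattern; actual die-to-die interfaces implement this in hardware and may use longer patterns, so the choice of pattern and the function names here are assumptions for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator


def prbs7(seed: int = 0x7F) -> Iterator[int]:
    """Yield PRBS7 bits (x^7 + x^6 + 1) from a 7-bit shift register."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, or the register never changes")
    while True:
        # Feedback is the XOR of the two oldest bits (stages 7 and 6).
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def count_bit_errors(received: Iterable[int], seed: int = 0x7F) -> int:
    """Compare received bits against a locally regenerated PRBS7 stream."""
    return sum(rx != expected for rx, expected in zip(received, prbs7(seed)))


# Simulate a clean lane and a lane with two flipped bits.
sent = list(islice(prbs7(), 1000))
corrupted = sent.copy()
corrupted[10] ^= 1
corrupted[500] ^= 1

print(count_bit_errors(sent))       # 0
print(count_bit_errors(corrupted))  # 2
```

An error count of zero at one voltage and frequency is exactly the binary answer the quote describes. It confirms the lane works at that operating point, but it carries no information about how much margin remains, which is why Hutner argues for additional monitoring in mission mode.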

Testing complex chips and advanced packages is becoming more challenging, but the test world has been actively developing solutions that reach well beyond just final test. The test process starts early in the design cycle with DFx and telemetry architecture, and now it extends all the way into the field, where devices can be monitored for everything from aging to total failures. Also, because 0 DPPM can only be approached asymptotically, manufacturing test alone cannot detect every defect. But much more data is now being analyzed in real time to strike the right balance in a product’s yield-quality-cost triangle.
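The asymptotic nature of 0 DPPM can be made concrete with the classic Williams-Brown defect-level model, DL = 1 - Y^(1 - T), where Y is process yield and T is fault coverage. This model is not cited in the article. It is a long-standing textbook approximation that assumes the fault model represents real defects, which is precisely the assumption that systematic and marginal defects undermine. The yield and coverage values below are illustrative.

```python
def defect_level(yield_fraction: float, fault_coverage: float) -> float:
    """Williams-Brown estimate of the fraction of shipped parts that are defective."""
    if not 0.0 < yield_fraction <= 1.0:
        raise ValueError("yield_fraction must be in (0, 1]")
    if not 0.0 <= fault_coverage <= 1.0:
        raise ValueError("fault_coverage must be in [0, 1]")
    return 1.0 - yield_fraction ** (1.0 - fault_coverage)


def dppm(yield_fraction: float, fault_coverage: float) -> float:
    """Defective parts per million shipped, under the same model."""
    return defect_level(yield_fraction, fault_coverage) * 1e6


ASSUMED_YIELD = 0.90  # illustrative process yield
for coverage in (0.95, 0.99, 0.999, 0.9999):
    print(f"coverage {coverage:.2%}: {dppm(ASSUMED_YIELD, coverage):8.1f} DPPM")
```

Each additional "nine" of coverage cuts the estimated DPPM by roughly a factor of ten, but the estimate reaches zero only at 100% coverage of every real defect mechanism. That gap is what in-system test, telemetry, and real-time analytics are being asked to close.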

Test takes longer at advanced nodes than in the past, but it’s also being used to track and analyze much more. Chips are denser, package interconnects are more complicated and fragile, and the number of packaging options utilized by chip architects continues to grow. Yet test remains viable, essential, and remarkably resilient in the face of all of these changes.
