Gaps remain, but the broad integration of data from design through manufacturing, test, and assembly holds promise for improving reliability.
Silicon lifecycle management (SLM) is gaining significant traction, driven by increasingly stringent reliability requirements for safety-critical devices in aerospace, medical, and automotive applications.
Improving reliability has been a discussion point for years, but it has become especially important with the use of chips designed at leading-edge nodes in both mission- and safety-critical applications. Even marginal defects at those geometries can lead to device failure at any point during use in the field. Moreover, the proliferation of multi-chiplet packages makes failures in any part of these stacks hard to locate and test, requiring more complete visibility into trends in chip behavior.
The priority is preventing failures in the first place. But when failures do occur and devices are returned, more robust solutions are needed to quickly find the root causes. This is where SLM fits in. It can trace performance and variation in devices from design through manufacturing, test, assembly, and system use. It is of greatest interest in critical domains, including large server farms where intermittent, rare phenomena such as silent data errors must be detected. In those cases, integration of device data from design to use is transitioning from a “nice to have” capability to a priority.
“SLM is the ability to insert monitors and gather data throughout the life of the chip or connected devices across a fleet of vehicles, data center servers, or mobile devices. It’s about an optimization that never stops,” said Randy Fish, director of product line management for the Silicon Lifecycle Management family at Synopsys, pointing to the expansion of existing design for reliability methods to catch potential failures throughout the supply chain. “Design for reliability has already been done with a lot of BiST, for instance, driven mainly by automotive. Anytime you turn your key on and off, it goes through self-tests. But you can be much more rigorous, and we’re seeing that now.”
Fig. 1: Yield ramp showing the relationship between process and product analytics & control. Source: Synopsys
Part of this expansion relies on data sharing among fab, design, test, and packaging engineers so that machine learning-based analytics can deliver higher overall yield at lower test cost. Methods for anonymizing and encrypting data are progressing rapidly, but sharing requires a sea change in operating methodologies. Select companies are partnering to make this happen, but it will take time to determine which approaches deliver the best results.
Silicon lifecycle management represents a paradigm shift from siloed data management practices to data sharing on an as-needed basis. In addition to horizontal data sharing, chipmakers are interested in sharing information from the bottom up to make better decisions at the organizational level with regard to quality and reliability.
“Together with customers, we are creating a way to integrate data from the equipment to the enterprise level or ERP (enterprise resource planning) system. There are not many companies that can instrument a robot or a piece of equipment, integrate the data efficiently, then move it up to the application level, integrate again, then move it up to an ERP system,” said Ranjan Chatterjee, vice president of smart factory solutions at PDF Solutions. “So depending on if you are the CEO or the factory manager or an operator, there is a specific dashboard for you.”
There is no shortage of possible applications. “We see a lot of opportunities for AI models, and there are a lot more to come that we don’t even think about, because of the ingenuity of engineers combined with the exponential progression of the new tools,” said Dieter Rathei, CEO of DR Yield. Rathei emphasized that having access to data from all sources is foundational to using AI models because training requires a large cache of data. Once the data is cleaned and structured, AI or ML models can help detect systematic defects, such as those caused by lithography patterning variation, that may have eluded inline operators and engineers.
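As a simple illustration of that kind of model, consider the following sketch, which uses synthetic data and hypothetical thresholds rather than any vendor's flow. Each wafer's cleaned binary fail map is collapsed into center/mid/edge fail rates, and an isolation forest flags wafers whose spatial signature looks systematic rather than random.

```python
# A minimal sketch with synthetic data: summarize each wafer's fail map into zone fail
# rates, then flag wafers whose spatial signature deviates from the population.
import numpy as np
from sklearn.ensemble import IsolationForest

def zone_signature(fail_map):
    """Collapse a binary die fail map (True = failing die) into center/mid/edge fail rates."""
    rows, cols = fail_map.shape
    yy, xx = np.mgrid[0:rows, 0:cols]
    r = np.hypot(yy - (rows - 1) / 2, xx - (cols - 1) / 2) / (min(rows, cols) / 2)
    zones = [r < 0.4, (r >= 0.4) & (r < 0.8), r >= 0.8]
    return np.array([fail_map[z].mean() for z in zones])

rng = np.random.default_rng(0)
fail_maps = [rng.random((40, 40)) < 0.02 for _ in range(50)]   # baseline random defectivity
edge_ring = np.hypot(*(np.mgrid[0:40, 0:40] - 19.5)) > 17      # synthetic edge-ring signature
fail_maps[7] = fail_maps[7] | edge_ring                        # wafer 7 gets a systematic edge problem

X = np.vstack([zone_signature(m) for m in fail_maps])
flags = IsolationForest(contamination=0.05, random_state=0).fit_predict(X)
print("Wafers with an unusual spatial signature:", np.where(flags == -1)[0])   # wafer 7 should appear here
```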
More data, less time
The amount of test data being analyzed for yield learning is growing rapidly, fueled by the increasing complexity of SoCs. “We are gathering a lot of data for yield learning, and part of that focuses on test and diagnosis,” said Marc Hutner, director of product management for Yield Learning Solutions at Siemens EDA. “For a number of our customers we collect all of the logic test data — so ATPG scan data — and we analyze it. They have goals, such as getting that data analyzed within a certain number of hours, or a day, from when the packaged devices get tested. Then they can take a look at the trends in that data. That can reveal a systematic problem across that material. And then they can start to determine whether there is a problem in the line or with the die itself. It also can be a test instrumentation or contact issue, for example, caused by debris buildup on the tester pins.”
Scan diagnostics also can help provide greater test coverage and reduce test time by incorporating edge or near-edge machine learning models, which are readily available in the industry. Edge AI/ML methods are actively being deployed as part of the expansion of highly automated test floor capabilities.
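The rollup Hutner describes can be sketched in a few lines. In the hypothetical example below, with made-up lot names and failure signatures, volume scan-diagnosis results are grouped per lot, and any signature that accounts for an outsized share of a lot's failures is flagged as potentially systematic and queued for deeper diagnosis.

```python
# Hypothetical scan-diagnosis rollup: column names, lots, and the 50% threshold are
# illustrative, not taken from any specific yield-learning tool.
import pandas as pd

diag = pd.DataFrame({
    "lot": ["LOT01"] * 6 + ["LOT02"] * 6,
    "signature": ["net_a", "pin_contact", "net_b", "net_c", "pin_contact", "net_d",
                  "net_u1_u7", "net_u1_u7", "net_u1_u7", "net_u1_u7", "net_u1_u7", "net_b"],
})

# Share of each failure signature within each lot
share = (diag.groupby(["lot", "signature"]).size()
             .div(diag.groupby("lot").size(), level="lot")
             .rename("share")
             .reset_index())

# Flag signatures that explain an unusually large share of a lot's failures
print(share[share["share"] >= 0.5])   # expect LOT02 / net_u1_u7 to be flagged for follow-up
```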
In the face of ever-shrinking process windows and sub-ppm defectivity targets, chipmakers are continually improving design-to-test processes to ensure maximum efficiency during new product introduction, yield ramp, and high-volume manufacturing.
Tracing electrical failures to the process tool level has tremendous advantages in terms of excursion detection and preventing additional failed die. “One of our customers experienced a few lots failing at wafer electrical test,” said Melvin Lee Wei Heng, senior manager for enterprise software applications engineering at Onto Innovation. “We performed the equipment commonality analysis, a one-way analysis of variance (ANOVA), and were able to pinpoint the responsible process step and then a specific etch tool. Analysis of FDC signals on the etch tool uncovered a chuck temperature trend that affected lots being processed.”
Wafer analysis of these lots pointed to an abnormal backside coating that wasn’t detected by defect scan monitoring upstream. “Once we identified the issue, we worked on implementing the solution,” Wei Heng said. “We tightened the FDC chuck temperature parameter to monitor wafers with abnormal backside coating. By doing so, the customer was able to prevent that particular test failure on multiple process tools. The tool interdicts when an abnormal chuck temperature is detected.”
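The commonality analysis itself can be as simple as grouping per-lot electrical test yield by the tool that processed each lot at a given step and running a one-way ANOVA, as in the sketch below. The yields, tool names, and significance threshold are made up for illustration.

```python
# Illustrative equipment commonality check: per-lot yield grouped by etch tool,
# tested with a one-way ANOVA. All numbers and tool names are synthetic.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
yield_by_tool = {
    "ETCH_A": rng.normal(97.5, 0.6, 12),   # % yield of lots processed on each tool
    "ETCH_B": rng.normal(97.4, 0.6, 12),
    "ETCH_C": rng.normal(93.0, 0.8, 12),   # the tool with the chuck-temperature drift
}

f_stat, p_value = f_oneway(*yield_by_tool.values())
print(f"F = {f_stat:.1f}, p = {p_value:.2e}")
if p_value < 0.01:
    worst = min(yield_by_tool, key=lambda tool: np.mean(yield_by_tool[tool]))
    print("Tool commonality is significant; lowest-yielding tool:", worst)
```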
In semiconductor test, the changes can be even more dramatic, driven by moves to smaller device geometries, advanced packaging, and chiplet-based designs.
“Those requirements are driving us to change the nature of the type of analytics that we do, both in terms of the software and the hardware infrastructure,” said Ken Butler, senior director of business development in the ACS data analytics platform group at Advantest. “We’re getting to the point where people really want to do very advanced analytics in the context of production test. So I’m going to apply device test, collect a bunch of data, and run this AI or ML algorithm that’s going to make some determination about the health of the device. Maybe I should apply more test content to understand the data better. Or maybe I need to do diagnostics because it looks like it’s a failure and I need to understand why that happened. Or maybe I’m going to declare this part an outlier, because it looks different than all the parts that just finished testing, so I have to treat it differently based on the analytics.”
Others agree. “Where we see opportunity is a class of problems that really benefit from real-time data and real-time processing,” said Regan Mills, vice president of marketing and general manager of the SoC unit at Teradyne. “As you go through the test flow, you get to a point in time where you need to make some decisions about an individual device. Those decisions could be that you have one-time programming of the device that you want to do as fast as possible to reach the best performance. And obviously, you make better decisions about whether the device is good or bad. Or maybe you’re speed-grading. The key is that you want to use not only the information you’re getting from that device, but the information that you’ve collected from its peers over time. So you’re making a broader decision. And you’re using aggregated data in a way that hasn’t typically been done before.”
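One common form of that peer-based decision is dynamic part average testing, in which outlier limits for a parametric measurement are computed from the devices already tested rather than from fixed datasheet limits. The sketch below is a minimal, hypothetical version of that idea; the parameter, limits, and routing decision are illustrative only.

```python
# Minimal dynamic-outlier sketch: derive robust limits from peer readings, then
# judge the next device against them. All values are synthetic.
import numpy as np

def dynamic_limits(peer_values, k=6.0):
    """Robust outlier limits from peers: median +/- k * scaled MAD."""
    peers = np.asarray(peer_values, dtype=float)
    center = np.median(peers)
    spread = 1.4826 * np.median(np.abs(peers - center))   # MAD scaled to approximate sigma
    return center - k * spread, center + k * spread

peers = np.random.default_rng(2).normal(1.20, 0.01, 500)   # e.g., supply-current readings from the lot so far
lo, hi = dynamic_limits(peers)

new_reading = 1.29
verdict = "outlier - apply extra test content" if not (lo <= new_reading <= hi) else "pass"
print(f"limits = ({lo:.3f}, {hi:.3f}), reading = {new_reading} -> {verdict}")
```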
Both Teradyne and Advantest have developed open architectures for their data and analytics solutions. As a result, design data can be brought into the equation, as well as equipment-based monitoring data, which can include in-die monitoring analytics or proprietary data analytics programs. Chipmakers then can integrate the tools they want to use on an ongoing basis.
Fig. 2: Open analytics solution can provide local test optimization and rapid data analysis in a secure, bi-directional data stream. Source: Teradyne
On-die monitors or sensors provide a critical piece of the silicon lifecycle management puzzle, as well. To facilitate SLM, sensors are placed on the chip to monitor performance characteristics such as PVT (process, voltage, and temperature), along with timing margin and noise fluctuations in complex digital SoCs.
“Your eventual goal is to make sure your chip will operate properly given its intrinsic and operational conditions,” said Noam Brousard, vice president of solutions engineering at proteanTecs. “To really get a clear picture of this, it is necessary to enhance your visibility beyond monitoring just temperature and look at the actual circuit logic under these circumstances — during testing, but also in actual field functionality. We’re going to monitor the margin to failure of the logic paths that most limit performance scaling, because those are the most susceptible to failing. For example, temperature changes the character of the propagation in the chip, and this can cause a low slack path to fail. During testing and verification, it is important to not only capture failures, but also devices with such low marginality that might fail early on in the field, despite passing testing.”
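Conceptually, the screening step works something like the sketch below, which is a generic illustration rather than the vendor's actual interface: devices that pass functional test but report a worst-path timing margin below a guard band are held for review instead of being shipped.

```python
# Generic illustration of margin-based screening; field names and the guard band
# are hypothetical, chosen only to show the decision flow.
from dataclasses import dataclass

@dataclass
class MonitorReading:
    device_id: str
    worst_path_margin_ps: float   # slack reported by on-die margin monitors, in picoseconds
    passed_functional: bool

GUARD_BAND_PS = 25.0              # illustrative threshold, set from characterization data

def disposition(r: MonitorReading) -> str:
    if not r.passed_functional:
        return "fail"
    if r.worst_path_margin_ps < GUARD_BAND_PS:
        return "marginal - hold for review"   # passes today, but little margin for field conditions
    return "ship"

readings = [
    MonitorReading("D001", 80.0, True),
    MonitorReading("D002", 12.0, True),   # passes test, but marginal
    MonitorReading("D003", 40.0, False),
]
for r in readings:
    print(r.device_id, "->", disposition(r))
```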
Conclusion
Pressure is growing to deliver nearly defect-free devices despite shrinking features, process variability, systematic failures, and the move to 3D-ICs, all of which make the benefits of silicon lifecycle management more attractive. Those benefits include optimizing device performance, catching more marginal failures, and gaining a fuller picture of root causes. As tester companies build out their real-time analytics capabilities, critical test results are being integrated to the left with IC design, and to the right with assembly operations, final system testing, and in-field use.
Now that AI and high-performance computing capabilities are widely available, the goal of connecting all the data from design through manufacturing and test, and on to the system level, is more feasible than ever. Nonetheless, combining data from disparate sources means building infrastructure, properly extracting the data, and verifying the data to be crunched by machine learning models. Those models then need to be maintained, updated, and optimized. Chipmakers, along with the equipment and software communities, need to perform these tasks rapidly to meet the time-to-market goals of the devices. Adoption of SLM may very well hinge on its ability to boost reliability and yield in ways that are not currently accessible.
Meanwhile, changes are happening up and down the supply chain. “You can see the foundries are becoming a little bit more like the EMS firms. EMS companies are now looking at buying OSATs to move up the value chain, and the foundries are starting to do some back-end tasks,” said PDF’s Chatterjee. “So there’s a big shift in the industry. That is why the platform is important, because most people do not want to build these integrations every time. They can re-use what they have, and they can share data from the semi front end to the semi back end to the OSATs to the system — what I call sand to landfill or recycling — and the data will travel with the wafer or die.”
Related Reading
AI/ML’s Role In Design And Test Expands
But it’s not always clear where it works best or how it will impact design-to-test time.
3.5D: The Great Compromise
Pros and cons of a middle-ground chiplet assembly that combines 2.5D and 3D-IC.