Driven by a plethora of benefits, data sharing is gradually becoming a “must have” for advanced device nodes and multi-die assemblies.
Semiconductor companies increasingly need to share data to solve problems faster, boost yield, and trace the root cause of failed devices. But to make that work, companies need assurances that their data will be secure, free from data leaks that could result in the loss of valuable IP.
Data sharing is becoming critical at leading device nodes, where process variability is starting to consume ever-greater portions of the process margin. For instance, the behavior of nanosheet transistors is increasingly affected by issues such as line-edge roughness and EUV stochastic defects, as well as backside power parasitics.
Multi-die assemblies add their own sense of urgency. Chiplets may be manufactured by different foundries or developed by different vendors. Systems companies need to know more about each individual chiplet because advanced packaging introduces new failure modes as it couples front-end variation with assembly yield.
“We can predict the yield of a device downstream, and that’s becoming very interesting in heterogeneous packaging,” said Eli Roth, product manager of Smart Manufacturing at Teradyne. “If you can predict that a die is likely to fail before you package it, you can save all those other dies as well. I see that becoming a bigger trend going forward.”
Sharing data is not a new problem. Fabless companies and EDA vendors have been lobbying for more access to foundry data for decades. But there is a growing sense of urgency at the leading edge, where tolerances are shrinking and the cost of failure is rising, and there is a growing acceptance across the ecosystem that this is no longer optional. Recent solutions revolve around anonymizing data at the process, test, or metrology level, so that cleaned and structured data can be managed by a data governance specialist who controls what data is shared, when, how, and with what level of security.
Failure analysis, predictive analytics, and controlling test cost
Data sharing is an essential vehicle for rapid analysis of field failures or RMAs. “Data concerning failure candidates, including layout data, can be shared securely between a fabless company and a foundry to facilitate failure analysis,” said Randy Fish, product line director for SLM at Synopsys. The level of data sharing amongst different parties is typically decided at the contractual stage, for instance, between a fabless firm and a foundry. Fabless firms typically want access to fine-grained test data and failure analysis data, including the voltage and temperature conditions of failure. Fish noted that foundries may share Wafer Acceptance Test (WAT) data with their users on a per-wafer basis.
In the absence of data sharing, new products can miss their PPA targets or take longer to reach production. Once a device is in production, there’s often a desire to transfer electrical test data from the IDM or OSAT to the cloud for cloud-based analytics. “One clear example of predictive capability in production today is per-chip Vddmin prediction for static Vddmin setting,” said Eidan Mendelsohn, vice president of product engineering at proteanTecs. “Models trained in the cloud are deployed at the tester to predict the minimum safe operating voltage for each device, based on data from agents that are highly sensitive to process variance. This has enabled customers to reduce test time and lower operating voltage, delivering measurable power savings without sacrificing reliability.”
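The train-in-the-cloud, predict-at-the-tester flow Mendelsohn describes can be illustrated with a deliberately simple sketch. It is not proteanTecs' method: the single-feature linear model, the agent readings, the voltages, and the 15 mV guard band are all hypothetical values chosen for illustration.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_vddmin_mv(reading: float, model: Tuple[float, float],
                      guard_band_mv: float = 15.0) -> float:
    """Predicted minimum safe voltage plus a (hypothetical) guard band."""
    slope, intercept = model
    return slope * reading + intercept + guard_band_mv


# "Cloud" step: train on historical parts (made-up numbers).
agent_readings = [1.0, 2.0, 3.0, 4.0]              # on-die agent output, arbitrary units
measured_vddmin_mv = [700.0, 680.0, 660.0, 640.0]  # characterized Vddmin per part
model = fit_line(agent_readings, measured_vddmin_mv)

# "Tester" step: apply the deployed model to a new device.
setting_mv = predict_vddmin_mv(2.5, model)
```

A production model would use many agent readings per die and would be validated against reliability data before any voltage is lowered.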
Other predictive techniques play a critical role in controlling test costs by preventing expensive failures further down the production line. “Predictive techniques aim to fail or bin parts as early as possible to avoid expensive failures at FT [final test] or SLT [system-level test],” said Fish. Other binning decisions become clear following high-temperature operating lifetime (HTOL) tests and wafer maps. “Reliability prediction can be made based on data like HTOL and spatial indicators, such as ‘Good Die, Bad Neighborhood,’ that identify higher failure risk dies. Predictions of parametric values are important, as well.”
“At wafer sort, speed, power, and Vmin predictions are made to potentially eliminate certain tests later,” said Fish.
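The “Good Die, Bad Neighborhood” indicator Fish mentions can be sketched as a neighborhood check on a wafer map. This is a minimal illustration under assumed conventions: the 8-neighbor window, the 0.5 threshold, and the function names are hypothetical, and production implementations typically weight neighbors and account for wafer-edge effects.

```python
from typing import Dict, List, Tuple

Die = Tuple[int, int]  # (column, row) position on the wafer map


def bad_neighbor_fraction(wafer: Dict[Die, bool], die: Die) -> float:
    """Fraction of the (up to 8) adjacent tested dies that failed."""
    x, y = die
    neighbors = [
        (x + dx, y + dy)
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0) and (x + dx, y + dy) in wafer
    ]
    if not neighbors:
        return 0.0
    failed = sum(1 for n in neighbors if not wafer[n])
    return failed / len(neighbors)


def flag_gdbn(wafer: Dict[Die, bool], threshold: float = 0.5) -> List[Die]:
    """Passing dies whose neighborhood failure fraction meets the threshold."""
    return sorted(
        die for die, passed in wafer.items()
        if passed and bad_neighbor_fraction(wafer, die) >= threshold
    )


# Toy 3x3 map: True = passed wafer sort, False = failed.
wafer_map = {
    (0, 0): False, (1, 0): False, (2, 0): False,
    (0, 1): False, (1, 1): True,  (2, 1): True,
    (0, 2): True,  (1, 2): True,  (2, 2): True,
}
risky = flag_gdbn(wafer_map)
```

In this toy map only the center die is flagged, because four of its eight neighbors failed even though the die itself passed.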
Data centers are already reaping specific benefits of data sharing. “In the field, data sharing enables fast RMA root cause analysis by comparing system-level behavior against production fingerprints,” said Mendelsohn. “Data centers use this approach to trace field failures back to latent defects that were present, but undetectable, at production test, significantly reducing investigation time and preventing repeat escapes.”
Indeed, latent defects are becoming a huge problem at leading device nodes. “During mass production, shared data analytics enables actionable insights,” Mendelsohn said. “We’ve had a customer use wafer-level and lot-level analytics to identify spatial timing margin anomalies that would otherwise pass traditional tests, preventing the shipment of marginal material.”
In general, the benefits of data sharing involve yield improvement and producing more good wafers in the shortest production time (cycle time). “Probably the most prevalent [benefit] is improving yield, improving efficiency, and improving process flows,” said Roth. “We’re going to start getting into yield protection in the future, and hopefully someday we’re going to be able to connect process and test results.”
Roth noted that the priority depends on the customer. “You’ve got IDMs that care about everything. Foundries care more about operational efficiency and throughput than process issues. And the fabless prioritize yield.”
Yield protection comes into play when, for instance, a tester probe needle gets dirty or overheats. “So I can actually lose yield through the tester if I’m not monitoring the condition of the probe needles,” he said. “I can end up scrapping some parts that are otherwise good.”
Finding data correlations that may not otherwise be apparent to product and process engineers requires free-flowing communications, not the protection of data in silos. “The primary benefit of data sharing is the ability to establish true end-to-end correlation across the semiconductor lifecycle,” said Mendelsohn. “When data is shared from design through test, production, and field operation, it becomes possible to see how early indicators translate into later outcomes. This stage-to-stage correlation is fundamental for identifying issues early and acting on them before they become yield loss, quality escapes, performance waste, or field failures.”
Data sharing feeds models, which can include digital twins, and enables predictive capabilities that go well beyond predictive maintenance on tools. Those capabilities rely on machine learning-based analytics.
“The adoption of AI machine learning in semiconductor testing has opened the door to a wide range of new prediction capabilities that are not limited to maintenance,” said Roberto Colecchia, ACS product marketing manager at Advantest, who detailed a few use scenarios.
Advanced analytics and AI play a major role in driving the need for data sharing because they rely on large data sets and information across the value chain — design, manufacturing, test, and assembly — to optimize yield faster, improve preventive maintenance routines, and drill down to the root cause of field failures.
Why share data?
The semiconductor industry has been slow to share data primarily due to concerns over IP theft of test, yield or other sensitive manufacturing data. The industry is working around these concerns by turning to third-party platforms that form a manufacturing connectivity network. Examples include PDF Solutions’ Sapien/secureWISE platform, NI’s SystemLink, and the Athinia data collaboration hub (a partnership between Merck and Palantir), which provide secure environments for sharing data while protecting confidential information.

Fig. 1: A configurable manufacturing platform is designed to connect any data source, application, device, or tool, regardless of location or operating system. Source: PDF Solutions
“One reason data sharing remains limited today is that organizations are only now recognizing the opportunity to modernize and connect their data ecosystems,” said Advantest’s Colecchia. “By sharing data across different test insertions, manufacturers unlock deeper visibility into their test processes, enabling them to spot trends, anticipate failures, and streamline operations. This level of data integration not only boosts yield and lowers test costs but also elevates overall device quality while fueling innovation and strengthening competitiveness across the semiconductor industry.”
Colecchia highlighted applications where data sharing is taking off. One is adaptive test, where test routines are altered either to reduce cost or improve device quality. Data sharing is also fairly common as a means of optimizing design-for-test. And with the emergence of multi-die assemblies, data sharing is becoming essential among fabs, fabless companies, and OSATs.
Clean, structured data
The first step on the path to sharing data involves securing and “cleaning” the data.
“The first part is securing the data, so making sure that the connection and the pipe is all secure,” said Ranjan Chatterjee, vice president and general manager for Smart Factory at PDF Solutions. “The other part is making sure that clean data, for example, being sent to an OSAT from a fabless company, has the right context behind it. That secure data has to be in the right format to be sent to the other person, and this involves moving from systems of record to systems of action, which means that the data can be acted on. Unfortunately, right now people might send data across facilities over email, and then you don’t know if the person received it or not or whether it is in the right format to open it. So when they’re connected together in our manufacturing hub, you know that the person saw it and they did something with it. That’s the important part.”
At its core, a manufacturing hub is designed to deliver actionable insights. “These systems are enterprise systems with rules-based access control. That means that rather than asking for permission from another party, the system knows who has access to what and what level of security is guarding the data,” said Chatterjee.
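The rules-based access control Chatterjee describes can be sketched as a lookup against stored rules instead of a per-request permission exchange. This is an illustrative sketch only, not PDF Solutions' implementation: the roles, data classes, and numeric sensitivity levels are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Rule:
    role: str             # who is asking (hypothetical role name)
    data_class: str       # what kind of data the rule covers
    max_sensitivity: int  # highest sensitivity level this role may read


def can_access(rules: Iterable[Rule], role: str,
               data_class: str, sensitivity: int) -> bool:
    """True if any stored rule grants this role the data at this sensitivity."""
    return any(
        r.role == role
        and r.data_class == data_class
        and sensitivity <= r.max_sensitivity
        for r in rules
    )


# Hypothetical rule set agreed at the contractual stage.
rules = [
    Rule("foundry_fa", "layout_clip", 2),
    Rule("osat_test", "equipment_health", 1),
]

allowed = can_access(rules, "foundry_fa", "layout_clip", 2)
denied = can_access(rules, "osat_test", "layout_clip", 1)
```

Because the decision is made from the rule table, every request can also be logged, which supports the receipt tracking that email-based transfers lack.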
Some people refer to this data preparation as a process of making data AI-ready. “It’s essential to structure the collected data in a way that the machine learning ingestion pipeline can readily consume. The data needs to be cleaned, prepared, and filtered. Only the relevant data should be utilized,” said Colecchia. He pointed to one report that describes data cleaning as the most time-consuming and least enjoyable task for data scientists, often taking 60% to 80% of the total time spent on data-related tasks. [1]
Clean data does not contain duplications, missing values, or irrelevant data, according to Fish. Oftentimes, the engineering team has insufficient data to run an ML or statistical model. In such cases, synthetic data can be created from simulation results to supplement existing data while updating the model as production progresses.
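Fish's three criteria (no duplicates, no missing values, no irrelevant data) map onto a short filtering pass. The sketch below assumes records arrive as dictionaries; the field names and sample values are hypothetical.

```python
from typing import Dict, List, Sequence


def clean_records(records: List[Dict[str, object]],
                  relevant_fields: Sequence[str]) -> List[Dict[str, object]]:
    """Keep only relevant fields, then drop incomplete and duplicate rows."""
    seen = set()
    cleaned = []
    for rec in records:
        # Irrelevant data: project each record onto the fields the model needs.
        row = {f: rec.get(f) for f in relevant_fields}
        # Missing values: skip rows with an absent or empty field.
        if any(v is None or v == "" for v in row.values()):
            continue
        # Duplicates: skip rows already seen after projection.
        key = tuple(row[f] for f in relevant_fields)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


raw = [
    {"die_id": "D1", "test": "vmin", "value": 0.71, "operator": "A"},
    {"die_id": "D1", "test": "vmin", "value": 0.71, "operator": "B"},  # duplicate
    {"die_id": "D2", "test": "vmin", "value": None, "operator": "A"},  # missing
    {"die_id": "D3", "test": "vmin", "value": 0.69, "operator": "A"},
]
clean = clean_records(raw, ["die_id", "test", "value"])
```

Dropping incomplete rows is the simplest policy; whether to impute missing values instead is a modeling decision this sketch leaves open.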
Anonymized data cannot be traced back to sensitive raw data. “Our testers need to be able to take an action (like any piece of instrumentation or sensor that generates data), and then securely provide that data to wherever your data storage is going to be, for instance, in a text output file,” said Teradyne’s Roth. “It then needs to get transformed, usually, into some sort of standard data structure without being device-specific or piece of equipment-specific. Then it is made accessible by your statistical model or machine learning model in that structure. Once the model runs, the output can be sent back along the same path to the test equipment that can execute that action. If you’re doing feedforward/feedback, you might even want to do process improvement. So you need a data system that can ingest tabular data and transform it into the data model that makes sense for that platform. That could get into your MES system, material routing, and other data systems.”
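One way to picture the transformation Roth describes is a function that maps a raw tester row into an equipment-neutral record while replacing identifiers with keyed tokens. This is a sketch under stated assumptions, not any vendor's pipeline: the field names are hypothetical, and keyed hashing (HMAC-SHA256) is pseudonymization, which is weaker than full anonymization because whoever holds the key can regenerate the tokens.

```python
import hashlib
import hmac
from typing import Dict


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Stable keyed token for an identifier; not reversible without the key."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def to_shared_record(raw: Dict[str, object], secret_key: bytes) -> Dict[str, object]:
    """Map a raw tester row to a neutral record; the tester ID is dropped."""
    lot = str(raw["lot_id"])
    wafer = str(raw["wafer_id"])
    return {
        "lot": pseudonymize(lot, secret_key),
        "wafer": pseudonymize(lot + "/" + wafer, secret_key),
        "test_name": raw["test_name"],
        "value": raw["value"],
        "units": raw["units"],
    }


key = b"example-key-held-by-data-owner"  # hypothetical; keep real keys in a vault
row = {"lot_id": "LOT123", "wafer_id": "07", "tester_id": "T-42",
       "test_name": "idd_standby", "value": 1.8, "units": "mA"}
shared = to_shared_record(row, key)
```

The same lot always maps to the same token, so the receiving party can still correlate records across insertions without seeing the original identifiers.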
Security measures, including encryption, occur during the cleaning process. “We are actually cleaning the data, and that is where we anonymize it,” said Chatterjee. Deployment of data sharing is headed up by a governance planner. “There is a hierarchy from the factory floor to the top floor. But there is also a hierarchy within the enterprise level, with a PLM system, an ERP system, and an MES system. That is also where you clean the data, correlate the data, and then make it available to the other systems, so this hierarchy goes away. The governance planner ensures different parties have the data that they need when they need it and it’s in the correct format.”
One of the impediments to data sharing is the wide variety of data formats among process tools, metrology systems, testers, etc. While SEMI groups actively work on compatibility, standards typically have a hard time keeping pace with industry needs. “IP protection is ensured through stringent security measures and industry agreements on standardizing information/data formats for metrology and test data,” said Synopsys’ Fish.
Probably the most protected data contains device specifics. “It’s tough to get device and process data out,” Roth said. “I’ve seen some process data that’s getting translated into a health score. For example, a tester could be at 95%, but then decreases, indicating an impending maintenance operation.” He noted that device data, such as test data, is the most fiercely protected, while process data for wafers is the second most guarded data, and equipment health data might be the least protected, relatively speaking.
On the test side, preventive maintenance is becoming more responsive. For example, probe needles in the past would be replaced following a set number of touchdowns. Now there is more monitoring of environmental conditions to determine exactly when a PM should occur rather than following a periodic schedule. The same is happening at the wafer processing tool level.
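The shift from count-based to condition-based maintenance can be sketched as a rolling-average monitor. The monitored signal (probe contact resistance), the limit, and the window size below are assumptions chosen for illustration; the article does not specify which conditions are monitored.

```python
from collections import deque


class ProbeMonitor:
    """Flags maintenance when the rolling mean of a reading exceeds a limit."""

    def __init__(self, limit_ohms: float, window: int = 3) -> None:
        self.limit_ohms = limit_ohms
        self.readings = deque(maxlen=window)

    def add(self, resistance_ohms: float) -> bool:
        """Record one reading; return True if maintenance is now due."""
        self.readings.append(resistance_ohms)
        if len(self.readings) < self.readings.maxlen:
            return False  # not enough history yet
        return sum(self.readings) / len(self.readings) > self.limit_ohms


# Made-up contact-resistance trace: stable at first, then degrading.
trace = [1.0, 1.1, 1.0, 1.2, 1.1, 2.5, 2.6, 2.7, 2.8]
monitor = ProbeMonitor(limit_ohms=2.0, window=3)
first_trigger = next(
    (i for i, r in enumerate(trace) if monitor.add(r)), None
)
```

Averaging over a window keeps a single noisy touchdown from triggering maintenance, at the cost of reacting a few readings later than a raw threshold would.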
Some advanced predictive capabilities are in the works. “We’re also looking at the field return data,” said Roth. “What do their system-level tests look like preceding that factory return, and can we predict that failure would happen? We have developed some models predicting yield on devices, looking at wafer sort results and labeling them as ground truth values, then examining the full data set for a bunch of parts, and making a prediction of the SLT yield. I haven’t seen that in volume production yet.”
Roth added that ML models are not necessarily complex to implement. “A lot of other people are treating ML models as trade secrets, but the concept is not hard. If you’ve got a ground truth set of data — a full production lot of devices that yield all the way through — you can train your model and make predictions that way. The rub is going to be on parts that need a lot of ground truth, like, how can you train these models quickly?”
Do I want a digital twin?
In some cases, engineering teams are turning to digital twins to better understand semiconductor processes. “Digital twins are experiencing a resurgence of interest, along with other industry efforts to use data to make decisions across the entire value chain. Digital twins are being applied to manufacturing execution systems (MES) to optimize and model complex manufacturing environments, VR models that visualize tools in real-time, and physical models that simulate processes and predict results, among other applications,” said Sean King, product manager for enterprise software at Onto Innovation, in a blog. [2] “AI is a driving force, as many manufacturers seek to inject advanced AI into their digital twins to enable more dynamic modelling.”
King outlined important considerations when starting a digital twin project. “First, determine the problem at hand and the outcome you seek. Are you trying to capture how a process is working or how it should work? If there are conflicts between these two things, how can they be merged?”
King further suggests not adding any unnecessary components, such as detailed visualizations. “You need to determine if you have the right infrastructure and supporting systems to obtain, organize, and distribute data in your processes. When it comes to organizing data, you should make sure that the appropriate amount of time is spent ensuring initial data cleanliness and organization, while also making sure that data is programmatically stored to continually feed the models. As for model security and traceability, once developed, how can you be sure which model is in use?”
The holy grail in the semiconductor industry involves a digital twin of the devices themselves. “Where it gets more sensitive is if you’re starting to think about, how do I digital twin a device? Because if I can test and manufacture an entire device virtually before it goes to silicon, it would collapse the market in terms of time and costs. I’m part of a group at SEMI that’s working on just that,” said Roth.
What’s next?
Companies have made significant strides in cleaning and structuring data in preparation for data analytics. Because the fab manages so much data at many levels, the integration and flow of secure data are best managed by an enterprise-wide system capable of handling massive amounts of data each day.
While secure access to test, metrology, and process data is a work in progress, people can envision the next step, which makes better use of AI algorithms to simplify user interfaces and speed problem solving. “Once all these systems are tied together (ERP, MES, etc.), they are secure across companies and within the company,” said PDF Solutions’ Chatterjee. “Then you can use agentic AI to actually automate these workflows so that it doesn’t require the level of skill set that is required today. So as semiconductor manufacturing becomes more complex, you will not need the domain expertise that is required today. But at the same time, you can operate the various front-end paths, testing, and packaging.”
References