Data from on-chip monitors can help predict and prevent failures, as well as improve design, manufacturing, and testing processes.
The ability to capture, process, and analyze data in the field is transforming semiconductor metrology and testing, providing invaluable insight into a product’s performance in real time, under real-world conditions and use cases.
Historically, data that encapsulates parameters such as power consumption, temperature, voltages, currents, timing, and other characteristics was confined to diagnostics, testing, and verification in the lab and in the fab. However, recent advancements in AI/ML, combined with increasing needs for product reliability and performance in critical applications – such as autonomous driving, aerospace, and medical implants – are pushing IC manufacturers to integrate data from on-chip monitors into the end devices themselves, where that data can be leveraged to prevent failures and improve reliability over a device’s projected lifetime.
“If you’re patterning a device today, you’re looking at the physical attributes of the CD and the thickness of the films that you’re placing down, but you have no idea what that device is eventually going to do,” says Eli Roth, smart manufacturing product manager at Teradyne. “Teradyne might test a final device, but we have no idea what’s going on in the fourth layer deposition, for example, and how that could affect the quality of that device in its real-world application. I can easily envision a case where, by using ‘edge data,’ we discover deposition layer seven copper really matters in a particular application. It’s a key parameter of that device. So how does that feed back to your manufacturing flow? It’s not hard to envision how that would be valuable.”
By incorporating this so-called edge data into the design phase, engineers can make data-driven adjustments based on real-world and real-time performance. For example, by examining power consumption patterns from edge monitors, engineers can pinpoint inefficiencies, paving the way for the creation of more energy-efficient designs. In addition, constant monitoring of parameters like temperature and voltage can provide insights into potential thermal management and power supply challenges, prompting proactive software/firmware updates and interventions to avoid failure.
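In practice, the first layer of such monitoring can be quite simple: compare each new reading against a trailing baseline and flag statistically unusual excursions. The following Python sketch is a minimal, hypothetical illustration of that idea; the field names, window size, and sigma threshold are assumptions chosen for illustration, not a description of any vendor's implementation.

```python
from dataclasses import dataclass
from statistics import mean, stdev

@dataclass
class MonitorSample:
    timestamp_s: float       # seconds since power-on
    temperature_c: float     # on-die temperature sensor reading
    supply_v: float          # supply voltage monitor reading

def flag_excursions(samples: list[MonitorSample], window: int = 50, sigma: float = 4.0):
    """Flag samples whose temperature or supply voltage deviates strongly
    from the trailing window of recent readings (simple z-score rule)."""
    flagged = []
    for i in range(window, len(samples)):
        recent = samples[i - window:i]
        for field in ("temperature_c", "supply_v"):
            values = [getattr(s, field) for s in recent]
            mu, sd = mean(values), stdev(values)
            current = getattr(samples[i], field)
            if sd > 0 and abs(current - mu) > sigma * sd:
                flagged.append((samples[i].timestamp_s, field, current))
    return flagged
```

A rule this crude would only be a starting point. Production flows would layer trained models and cross-parameter correlation on top, but the feedback loop, from field readings to a flagged condition to a software or firmware intervention, is the same.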
That’s the vision, at least. But synchronizing and integrating all of these capabilities will require changes in processes, industry infrastructure, standards, and new protocols for capturing this data and delivering it back to semiconductor manufacturers. In addition, it requires sorting, processing, and analyzing an enormous volume of data.
“If you want to use data better, then you have to know what your analytics strategy is,” says Ken Butler, senior director of business development at Advantest. “Manufacturers are gunning for extremely high quality and reliability with extremely low parts-per-billion error rates, and they are more and more turning to analytics to solve that problem. You can only throw so much test time at a device and still be able to compete economically. So you use analytics to better optimize your test resources and be able to detect failures before they become an issue for your downstream customer.”
Good data, better results
Improving end-device reliability is fundamentally anchored to the trustworthiness of the data received by the manufacturer. The quality of data is paramount, because implementing changes based on this data can be costly. Inaccuracies can lead to decisions that result in expensive mistakes, potentially costing millions of dollars. Figuring out which data is good, and how best to utilize it, becomes more difficult as the volume of data increases.
“The challenge is integrating all the data, including defects from metrology, tool conditions, FDC data, test data, etc. Once that is available, then you have a really interesting playground for modeling and developing algorithms,” says Dieter Rathei, CEO of DR Yield. “The sad part about this is that the first task is the hardest one — to get all the high-quality data with the right metadata into the system — clean data free of measurement errors that can send you the wrong direction or make your data less valuable. Unfortunately, maybe 90% of the work is getting the data into the systems. And then comes the fun part, to correlate the data and develop new algorithms. We are doing all of this work.”
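A large share of that 90% is mundane validation: rejecting records that arrive without the metadata needed to correlate them, or with physically implausible values. The sketch below is a hypothetical illustration of that gatekeeping step in Python; the required metadata fields and plausibility limits are assumptions for illustration, not any vendor's actual schema.

```python
# Hypothetical record schema: a measurement plus the metadata needed to
# correlate it with other data sources (lot, wafer, die, process step).
REQUIRED_META = {"lot_id", "wafer_id", "die_xy", "step", "units"}

PLAUSIBLE_RANGES = {            # assumed sanity limits, per parameter
    "iddq_ua": (0.0, 50_000.0),
    "temp_c": (-55.0, 175.0),
}

def validate_record(record: dict) -> list[str]:
    """Return reasons to reject the record; an empty list means it is clean
    enough to load into the analytics system."""
    problems = []
    missing = REQUIRED_META - record.keys()
    if missing:
        problems.append(f"missing metadata: {sorted(missing)}")
    param, value = record.get("parameter"), record.get("value")
    limits = PLAUSIBLE_RANGES.get(param)
    if limits is not None:
        if not isinstance(value, (int, float)):
            problems.append(f"{param} has no numeric value")
        elif not (limits[0] <= value <= limits[1]):
            problems.append(f"{param}={value} outside plausible range {limits}")
    return problems
```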
The ultimate vision is end-to-end analysis, spanning from early in the design process all the way through manufacturing and into the field. “Data and AI are two sides of the same coin,” says Shankar Krishnamoorthy, general manager of the EDA Group at Synopsys. “The expectation is really around rapid reduction in root cause analysis time. If it is a manufacturing yield issue or a parametric yield issue, you would use the data analytics. You would zoom into the outliers. You would essentially tie that back to where in the design these failing paths are showing up and then be able to quickly find the root cause, whether it’s a library characterization issue, a design robustness issue, or a process variation issue.”
At the heart of this shift is data, and that data needs to be cleaned and sorted quickly. “You now have these big chips that do a lot of processing, and that can be reconfigured remotely, but that requires reliable data,” says Christophe Bianchi, chief technologist at Ansys. “Automotive applications require under one defect per billion parts, but it’s also true with autonomous driving. Those safety guidelines call for one event per billion kilometers. So you have to simulate a billion kilometers every time you upgrade the code — five or six times per year over-the-air on those cars — and you have to guarantee it. That’s not going to happen without a way to feed back information about what’s really happening at the system level.”
Automotive OEMs and suppliers are particularly focused on reliability, for obvious reasons. “The automotive industry is pushing for a one-part-per-billion defect rate,” says Teradyne’s Roth. “Everybody’s scrambling for any way to find a defect sooner, but we all know that additional testing is going to cost more money. If you’re adding complexity, adding test coverage, adding quality requirements, that generally is going to mean more testing, more screening, and that’s eating into margin, it’s increasing the cost. It’s oftentimes a space limitation, as well. You’re trying to compress as much functionality into the assigned physical area of the die as you can. But if you add a bunch of instrumentation that’s not delivering functionality, you’re not gaining any value. There’s definitely a tradeoff.”
In automotive and other safety-critical applications, failures can have legal consequences. “The problem is probably broader than the data,” says Ansys’ Bianchi. “There is also the liability attached to it. If I use edge data to make a decision that has an impact on the safety or on the risk assessment of the car, how does that liability transfer from one entity to another? When we look at autonomous driving, for instance, guaranteeing that it’s a one-per-billion defect is the transfer of liability. If I can prove one defect per billion from the analysis of edge data, then if the chip fails, it’s the use of it that is at fault, not the chip.”
Fig. 1: An example of how edge data can be used to predict remaining useful life (RUL) in a PCB Health Monitor. Source: Ansys
Data overload
While the utilization of edge data in design and manufacturing is promising, multiple challenges still need to be addressed, such as managing the vast and continuous stream of data produced by edge devices and the overall telemetry process that delivers the data securely and reliably.
There is also the issue of bandwidth. Generating data is one thing, but collecting it, packetizing it, and transmitting it back up the value chain is another. A new automobile may have more than a thousand ICs and sensors, each generating its own telemetry data. The U.S. auto industry alone sells nearly 14 million vehicles a year, and more than 67 million are sold worldwide.
“It’s way too much data,” says Bianchi. “It makes more sense to process the data locally with some little AI to extract the right information in a manageable form at a system level or at the decision taking place. I would not look at a sensor, for example, as just a single entity measuring a given physics like temperature or vibration or voltage. It will most likely contain its own edge compute core to transform the physics it measures into pre-compressed and pre-processed relevant data.”
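What that on-sensor pre-processing might look like, in the simplest case, is reducing a burst of raw samples to a handful of summary features before anything leaves the sensor. The sketch below is a hypothetical illustration; the feature set and record format are assumptions chosen for brevity.

```python
from statistics import mean

def summarize_window(raw_samples: list[float], window_id: int) -> dict:
    """Reduce a burst of raw readings (e.g., thousands of vibration or
    voltage samples) to a compact record that is cheap to transmit."""
    return {
        "window": window_id,
        "n": len(raw_samples),
        "mean": round(mean(raw_samples), 3),
        "min": round(min(raw_samples), 3),
        "max": round(max(raw_samples), 3),
        "peak_to_peak": round(max(raw_samples) - min(raw_samples), 3),
    }
```

Even a reduction like this cuts the transmitted volume by orders of magnitude; the local AI Bianchi describes would go further, sending only the features a model at the edge considers relevant.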
Analysis of edge data can optimize test time and resources by targeting where a given IC or device needs the most testing. It also can run test analyses to turn the large amounts of data into useful bits of information, which can reduce bandwidth requirements, as well as power and memory/storage requirements. But edge analysis in test is not just about collecting data. It’s also about capturing performance and identifying problems right at the edge.
Then there are the telemetry circuits, or process monitors, that keep track of timing, resistance, or other parameters in real time. The insight those monitors provide ensures that any deviations from the norm can be promptly detected and addressed, enhancing the reliability and performance of the IC.
Moreover, these sensors play a pivotal role in predictive maintenance. By continuously monitoring the chip’s parameters, they can predict potential failures or malfunctions, enabling proactive measures to be taken before any significant damage occurs. This not only extends the lifespan of the IC but also reduces downtime and associated costs.
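Fig. 1 above shows a commercial example of this kind of prediction. As a much simpler illustration of the underlying concept, a remaining-useful-life estimate can be as basic as fitting a trend line to a degrading margin metric and extrapolating to a failure threshold. The Python sketch below is a toy model under that assumption; the choice of metric, the linear degradation model, and the threshold are all hypothetical, and real RUL models are considerably more sophisticated.

```python
def estimate_rul_hours(hours: list[float], margin: list[float], fail_at: float = 0.0):
    """Fit a straight line to a degrading margin metric (e.g., timing slack
    reported by an on-chip monitor) and extrapolate when it would cross the
    failure threshold. Returns remaining hours, or None if there is no
    downward trend to extrapolate."""
    n = len(hours)
    mean_x, mean_y = sum(hours) / n, sum(margin) / n
    sxx = sum((x - mean_x) ** 2 for x in hours)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, margin))
    if sxx == 0:
        return None
    slope = sxy / sxx
    if slope >= 0:                      # margin is not shrinking
        return None
    intercept = mean_y - slope * mean_x
    crossing_hour = (fail_at - intercept) / slope
    return max(0.0, crossing_hour - hours[-1])
```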
The industry is already seeing notable improvements from combining on-chip edge device measurements, margin modeling, testing, and AI-guided analytics (see figure 2). In this example, classic parts average testing shows all parts are within the specification limit, but on-chip monitors reveal dynamic differences in actual quiescent current (IDDQ) relative to the estimated value, which are caught by the combination of proper data, models, and test protocols.
Fig. 2: Leakage outliers that were returned as device failures using proteanTecs’ agents and applied ML models, deployed on Teradyne testers for IDDQ testing. Source: Teradyne/proteanTecs
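The screening idea behind figure 2 can be illustrated with a residual-based outlier rule: instead of comparing each die's IDDQ only against a fixed spec limit, compare it against a per-die estimate and flag dies whose deviation is abnormally large for the population. The sketch below is a minimal, hypothetical Python illustration of that general approach, not proteanTecs' or Teradyne's actual method; the median/MAD statistics and the k threshold are assumptions.

```python
from statistics import median

def iddq_outliers(measured: list[float], estimated: list[float],
                  spec_limit: float, k: float = 6.0) -> list[int]:
    """Flag die indices whose measured IDDQ deviates from the model-estimated
    value far more than is typical for the population, even when the absolute
    reading is still inside the fixed spec limit."""
    residuals = [m - e for m, e in zip(measured, estimated)]
    med = median(residuals)
    mad = median(abs(r - med) for r in residuals) or 1e-9   # robust spread
    outliers = []
    for i, (m, r) in enumerate(zip(measured, residuals)):
        if m > spec_limit or abs(r - med) > k * mad:
            outliers.append(i)
    return outliers
```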
“It’s lots of data from lots of places,” adds Teradyne’s Roth. “That’s where an AI/ML model is really handy — to try to analyze all these data pieces that come in and produce the specific data that actually matters for design and manufacturing processes. There’s still a lot of sophistication going into when you’re going to apply that model or not, but it’s making strides all the time.”
“In addition to high sampling, traceability across multiple process steps is critical to identify the root cause,” says Frank Chen, director of applications and product management at Bruker. “For example, the symptom may be non-wet issues from die warpage that appear in strip-form during die-attach. With traceability, it was possible to link the failed dies to the edge of the wafer where the dies were thinner and more susceptible to warpage. Providing all this data gives valuable insight to quickly debug new processes and monitor stable ones for excursions.”
Security and integrity
One of the big challenges with edge data is security. Silicon data often contains sensitive information about device capabilities, layout, performance characteristics, and operating conditions.
“You need lots of data to build interesting models and interesting applications,” says Synopsys’ Krishnamoorthy. “The challenge here will be managing the data and understanding who gets access to what, and that becomes a bigger challenge as you go forward.”
The entire design-through-manufacturing ecosystem has been struggling with how to ensure proper data anonymization, encryption, and data governance to protect information from unauthorized access or misuse.
“Security is a key area in pretty much all the products that we’re offering,” says Advantest’s Butler. “We have it first and foremost in mind. The data from the edge is critical, but you don’t want it getting in the wrong hands. In many cases, even the analytics are considered proprietary and part of our customers’ competitive advantage, and they don’t want that to escape either.”
Beyond the immediate concerns of data encryption and anonymization, the integrity of edge data is also paramount. It’s not just about protecting data from external threats, but also ensuring that the data is structured appropriately and remains unaltered throughout its lifecycle. Data corruption, whether intentional or accidental, can compromise an entire system.
To address this, manufacturers are implementing robust mechanisms for data validation and verification, ensuring that the data they are working with remains genuine and hasn’t been manipulated. Manufacturers also work with device manufacturers to prioritize the most important data. “We spend a lot of time making sure that not only is the data clean, but also is it correlated and connected to the right data sets, and identifying data as low, medium or high value,” says Danielle Baptiste, vice president and general manager of Enterprise Software at Onto Innovation. She added that for fabless companies, data traceability is becoming a high priority.
Data security extends from the metrology or tester tool level to the chip in the car or server or medical device, including over-the-air software updates. “In the semiconductor space, we look at security from our own very narrow view of the world about preventing hacking into the system,” says Ansys’ Bianchi. “But there’s a lot of effort going into preventing automation attacks, where I have an encryption system in my chip, or have a communication access to the car, which unlocks the car, or have a wireless access that can change the software uploaded onto my navigation system, or my autonomous driving system, that takes control of the car. All of that is extremely complicated.”
Finally, the transmission of edge data presents its own set of security challenges. Data in motion is vulnerable to interception, and not every player in the semiconductor ecosystem addresses security with the same level of intensity or focus.
Standards
Currently, the lack of standardization in edge data telemetry, storage, and security poses challenges for companies and researchers alike. Different manufacturers and platforms often employ varied data formats, retrieval methods, and sharing protocols. This inconsistency not only hampers interoperability between systems but also complicates the process of safely retrieving useful data from diverse edge devices while securing valuable IP from competitors whose dies may be sharing the same heterogeneous platform.
“Data management and security is simple when we are on the test floor,” says Nir Sever, senior director of business development at proteanTecs. “It gets way more interesting when we are running in the field. There are some connected edge applications, like cars and phones, that do transmit telemetry information, but it is still uncharted territory. There is no standardized method for doing that, so we are still in the infancy of telemetry from the edge.”
Standardization is further complicated by the increasing complexity of chip designs and the development of heterogeneous packaging and chiplets. Today’s chips are not just about packing more transistors into a smaller space. They involve integrating diverse components like CPUs, GPUs, memory, and specialized accelerators. But there will likely be a limited number of dedicated I/O interfaces for edge data telemetry, and data pulled from different components may share interfaces, which makes preventing high-value data leakage a challenge.
“Eventually the market will settle on a set of standards,” says Teradyne’s Roth. “But I’m seeing that customers, or third-party agents, or anybody in this space, are finding value in their particular data models and data strategies. Companies see so much value in their model that there’s not a lot of inertia to give that away, because they think that’s their value proposition. There’s just so many different players and so many disparate types of data they want to look at that getting to a standard will be a challenge.”
Brad Perkins, product line director at Nordson Test & Inspection, believes edge data telemetry will likely remain a proprietary format. “If we look at SEMI or NIST standards, every company does it a little bit their own way. They reference the standard, but they’re only 80% adhered to it. With this sensitivity of data, I don’t believe there’ll be a convergence to a universal standard. Companies will have their own unique platforms, and everybody will have their own cryptocurrency equivalent of communication protocols, so that only a TSMC protocol, for example, could be unencrypted by TSMC.”
Whether a common standard will be developed, or whether edge data gathering, encryption, and transmission will remain in proprietary silos, isn’t clear. For a robust edge data ecosystem to develop, however, companies will need to access data from other companies in the value chain, and transmit it without compromising proprietary information. This will be a challenge. It demands the establishment of robust permissions and authentication systems at the chip level. These systems must be integrated throughout the design-to-manufacturing-to-application lifecycle, monitored for the device’s lifespan, and validated at multiple yet-to-be-determined checkpoints.
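At its most basic, chip-level authentication of telemetry could amount to keyed hashing of each record so that downstream consumers can verify its origin and integrity. The sketch below uses Python's standard hmac module purely to illustrate the concept; it is not a description of any actual chip-level scheme, which in practice would involve hardware roots of trust, secure key provisioning, and certificate chains rather than a shared software key.

```python
import hashlib
import hmac
import json

def sign_telemetry(record: dict, device_key: bytes) -> dict:
    """Attach a keyed hash so a receiver can verify the record was not
    altered in transit and was produced by a holder of the device key."""
    payload = json.dumps(record, sort_keys=True).encode()
    tag = hmac.new(device_key, payload, hashlib.sha256).hexdigest()
    return {"payload": record, "hmac": tag}

def verify_telemetry(message: dict, device_key: bytes) -> bool:
    """Recompute the tag over the received payload and compare in constant time."""
    payload = json.dumps(message["payload"], sort_keys=True).encode()
    expected = hmac.new(device_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["hmac"])
```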
“Over time, there might evolve standards around what type of data is allowed to be shared and what’s not,” adds proteanTecs’ Sever. “In the next few years, there will be more and more usage of telemetry information coming from connected edge devices, such as cars, and some of that data might be shared. But for now, every data language is different. So even if there was a protocol to transmit data, the data itself cannot be understood unless you generate the data.”
Data ownership
While there’s a pressing need to address the challenges of ensuring the protection and facilitation of proprietary data streams from various customers, this raises another pivotal question for which the industry has yet to find a unified answer — who owns the data? This is a contentious issue, especially given the collaborative nature of modern chip design and manufacturing. There is an increasingly high level of interdependency between companies, and the number of those interactions is growing as chip complexity increases, and as chip designs become increasingly heterogeneous.
“If I integrate the AI from a memory manufacturer, or from somebody who provides analog chiplets, for example, what information do I gain access to?” asks Butler. “And how do I test it in the context of my system that I’m integrating with all these chips from various sources? And are they willing to allow me to have adequate access to all that data to be able to fully debug a part to chase down any issues? Those are big problems that the industry is going to have to wrestle with.”
As companies work together, sharing resources, tools, and expertise, the lines of data ownership can become blurred. Is the data owned by the consumer who owns the edge device, the company that generated the data, the one that processed it, or the one that utilizes it in their design and production facilities? The stakes are high, as this data often contains intellectual property, design secrets, and proprietary methodologies that can give a competitive edge in the market.
Furthermore, with the advent of cloud-based platforms and third-party analytics tools, the data landscape becomes even more intricate. When data is stored or processed on external platforms, the question of ownership extends to these third-party entities. Clear contractual agreements, transparent data handling policies, and robust encryption methods are essential to ensure that data ownership rights are preserved.
“One way to answer the question of who owns the data is to ask, what is the data?” says Perkins. “Well, the data is what’s used to train the AI system that’s interpreting the data. So our customer that generates the data owns the raw data, but we own the IP to the models that we create for exporting the results from that. We can have agreements with customers that say, ‘You can be part of a big data pool and contribute your data to that and get the benefit of larger sets of data, or you can have a private model that’s just based on your data and not commingled with anyone else’s.’”
But there also is widespread uncertainty about what data can be shared. “A big challenge that we’re looking at now is that concern of data,” adds Roth. “There’s a lot of data hoarding going on simply because people don’t know yet what the value of that data is, and they’re afraid to let it go.”
Conclusion
Navigating the emerging ecosystem of edge data for IC manufacturing requires a delicate balance of innovation, collaboration, and regulation. As companies work together, sharing resources, tools, and expertise, establishing clear boundaries for data ownership and usage becomes paramount.
Without a unified approach, there’s a risk of data mismanagement and leakage, potential intellectual property breaches, and missed opportunities for optimization. The semiconductor industry must prioritize the development of comprehensive standards and protocols that not only facilitate seamless collaboration, but also ensure the protection and integrity of proprietary data. Only then can the full potential of edge data metrology be realized.
—Ed Sperling and Laura Peters contributed to this report.
Related Reading
Using AI To Improve Metrology Tooling
Virtual metrology shows benefits in limited trials, but much work still needs to be done.
Using Generative AI To Connect Lab To Fab Test
NI’s CTO looks toward an intelligent and unified data model as a critical element in future test.