Gaps Emerge In Test And Analytics

Comparison data is required for understanding drift and AI changes, but that’s not so simple.


Sensor and process drift, increased design complexity, and continued optimization of circuitry throughout its lifetime are driving test and analytics in new directions, requiring a series of base comparisons against which equipment and processes can be measured.

In the design world this type of platform is called a digital twin, but in the test world there is no equivalent today. And as more AI/machine learning is added into processes, and as devices adapt over extended lifecycles, having a reference for data is becoming more important.

“For every AI tool, you at least need a basic database comparison,” said Subodh Kulkarni, CEO of CyberOptics. “One of the things we do is what we call autonomous image interpretation, or AI². Essentially what it is doing is taking every picture we are collecting and storing it based upon the features. Then we can see if a picture we captured is similar to a picture we captured earlier in the database.”
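The comparison Kulkarni describes, storing each captured image as a feature vector and matching new captures against earlier ones, can be sketched as a nearest-neighbor lookup. This is a minimal illustration, not CyberOptics' actual AI² implementation; the feature encoding and the cosine-similarity metric here are assumptions:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class FeatureDatabase:
    """Stores captured images as feature vectors, keyed by an ID,
    and finds the most similar previously captured image."""

    def __init__(self):
        self.entries = []  # list of (image_id, feature_vector) pairs

    def add(self, image_id, vector):
        self.entries.append((image_id, list(vector)))

    def most_similar(self, vector):
        """Return (image_id, similarity) of the closest stored image, or None."""
        best = None
        for image_id, stored in self.entries:
            sim = cosine_similarity(vector, stored)
            if best is None or sim > best[1]:
                best = (image_id, sim)
        return best

# Hypothetical feature vectors extracted from two earlier inspection images.
db = FeatureDatabase()
db.add("wafer_001", [0.9, 0.1, 0.0])
db.add("wafer_002", [0.1, 0.8, 0.3])

# A new capture is matched against the database.
match = db.most_similar([0.85, 0.15, 0.05])  # closest to "wafer_001"
```

In practice the feature vectors would come from an image-analysis pipeline and the database would be indexed for scale, but the principle is the same: every new capture is scored against what has been seen before.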

This is easier said than done for testing purposes, however. One big challenge is figuring out what is an acceptable starting point. In the past that reference point was the design specification, so as tools and sensors drifted they could be recalibrated to account for that drift. But processes also can drift, and so can data generated by circuits as they age. And as devices get more complex and increasingly heterogeneous, understanding the impact of drift becomes more convoluted. Even though individual chips or IP still may be within spec, multiple chips in a package or system collectively may be out of spec.
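The recalibration problem can be made concrete: each individual reading may look acceptable on its own, while the trend of recent readings shows the tool or process has drifted away from its reference point. A minimal sketch, with hypothetical voltage readings and tolerance values:

```python
def check_drift(measurements, nominal, tolerance, window=5):
    """Flag drift when the rolling mean of the last `window` measurements
    leaves the nominal +/- tolerance band, even if each individual
    reading still looks acceptable on its own."""
    if len(measurements) < window:
        return False  # not enough history to judge a trend
    rolling_mean = sum(measurements[-window:]) / window
    return abs(rolling_mean - nominal) > tolerance

# A sensor nominally reading 1.00 V, with a 0.05 V band on the rolling mean.
steady = [1.00, 1.01, 0.99, 1.00, 1.02, 1.01]
drifting = [1.00, 1.01, 1.03, 1.05, 1.06, 1.07, 1.08]

check_drift(steady, nominal=1.00, tolerance=0.05)    # False
check_drift(drifting, nominal=1.00, tolerance=0.05)  # True: mean is ~1.058 V
```

The catch the article raises is choosing `nominal` in the first place. When the design spec is no longer a stable reference, the baseline itself has to be maintained and versioned.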

AI, machine learning and deep learning add another level of complexity into the mix, because one of the main goals of these technologies is adaptation and optimization. As of today, most AI inferencing is opaque, so isolating what has changed over time often isn’t possible. Even where changes can be identified, it takes a lot of work to understand the data because those changes need to be mapped over time and in the context of other systems.

Having a point of reference in all these cases would be a big help. This is especially important in safety-critical or mission-critical applications, where testing needs to be done throughout the projected lifetime of a device. But if that device is used for a decade or more, those circuits will degrade in unique ways, depending upon use models, ambient conditions, and software upgrades that can impact both overall usage and system behavior.

“With AI, if changes are not explainable, today you need to retrain it,” said Michael Schuldenfrei, corporate technology fellow at OptimalPlus. “For many models, there is no way to explain what is happening, especially throughout the lifecycle of a chip or device. This is a good reason to bring AI into the analytics process. Even with issues like this, it’s still better than without AI because you can do things like separate environmental factors from the core of a machine. There also are ways to combine data to break down problems, so there is more than one model.”

Cloud vs. edge
Another problem is that not all test data will be analyzed in the same way or the same place. Traditionally, chips were tested for such things as power, voltage and temperature, then binned in production according to how they performed. While this type of testing will continue, in complex systems it’s not sufficient to ensure there are no failures in the field.

As chips are used for longer periods of time in automotive, industrial and medical applications, and even inside of data centers, it’s imperative to understand what causes defects after months or years of use in the field. It could be a flaw in the design, but it also can be a latent defect caused by a manufacturing process that goes undetected in inspection and manufacturing test.

“Analytics is at the core,” said David Park, vice president of marketing at PDF Solutions. “You need to test in context. Some of this can happen in the cloud, where you can do all sorts of tests and check for signatures. Some of it also can happen at the edge. For both of these you can apply machine learning to find the signature of the problem and take action. In the past, we’ve seen machine learning done in a sandbox. That’s a lot different than in production. You can have bad materials that throw everything out of whack.”

Some of this can be done in real time in production, as well, using a feedback loop. In that case, the data analysis needs to be done at the edge, because even with the fastest cloud model it takes time for data to travel to and from the cloud. But each has its strengths. The cloud is very good for analyzing huge amounts of data to determine the exact batch, lot and wafer where problems showed up, and what was different about that run versus the processing of another wafer. The key is to be able to span both worlds, because in analytics each has a role to play.

The big problem here has been access to data. This has improved somewhat over the past decade, particularly as the foundry business has consolidated and manufacturing processes have become more differentiated from one foundry to the next. But there still is a limit on how much data the foundries will give up, and on how to trace that data back to the source if something goes wrong.

“All fabs are protective of their data,” said CyberOptics’ Kulkarni. “Our customers give us enough data to improve sensors. But you also can allow the user to fine-tune the weighting so the tool gets better with usage.”

Margin call
Until recently, adding margin into chips was the most common way to offset problems. Extra circuitry could be included in case problems crept in over a chip's lifetime, basically adding a level of redundancy so that signals could be rerouted as needed.

Margin began running out of steam for many of the same reasons as Moore's Law. Extra circuitry means signals must travel further through increasingly skinny wires, slowing performance and increasing heat. For a simple von Neumann architecture, this is relatively manageable. But in large, heterogeneous chips, understanding how different components interact can add enormous complexity. And at the most advanced nodes, the amount of power management infrastructure required to avoid thermal effects and various types of noise only exacerbates those concerns.

“A lot of people are concerned about guard-banding,” said Raanan Gewirtzman, chief business officer at proteanTecs. “In the past, you could just add more margin to fix a problem. Now there are more variables and tighter tolerances, but you can’t add more margin to chips and you can’t sufficiently test everything. So you need a better understanding of the way it’s supposed to be used, and you need to look at variability over time because it keeps morphing.”

This becomes even more difficult in automotive applications, where some of the AI logic is being developed at the most advanced process nodes. What is required is in-circuit monitoring.

“In-circuit monitoring and machine learning interpretation of the data generated allows you to continuously track parameters,” said Gewirtzman. “You do this periodically so you know in advance if something is going to shorten the life of a device. This is very important in a car, where there are more electronics and the electronics are based on advanced process nodes. More and more companies are developing semiconductors for automotive, including many startups, and for the whole industry this is a big transition. You need this new way of looking at the electronics from inside and guaranteeing that everything is operating as planned.”
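The kind of advance warning Gewirtzman describes, periodic in-circuit measurements used to flag a shortened device lifetime before it happens, can be sketched with a simple linear trend fit. The path-delay parameter, units and threshold below are hypothetical, and real monitoring IP would use far more sophisticated models:

```python
def predict_crossing(times, values, threshold):
    """Least-squares linear fit through periodic measurements, returning
    the estimated time the parameter crosses `threshold`, or None if the
    fitted trend never reaches it."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
             / sum((t - mean_t) ** 2 for t in times))
    intercept = mean_v - slope * mean_t
    if slope <= 0:
        return None  # parameter is flat or improving, no predicted crossing
    return (threshold - intercept) / slope

# Hypothetical path-delay readings (ps) taken every 1,000 operating hours.
hours = [0, 1000, 2000, 3000]
delay_ps = [100.0, 102.0, 104.0, 106.0]  # degrading ~2 ps per 1,000 h

predict_crossing(hours, delay_ps, threshold=120.0)  # 10,000 hours
```

Even this crude extrapolation shows why the periodic measurements matter: the prediction only exists because there is a time series to fit, not a single pass/fail result.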

But it’s also not just about margin. It’s about margin in the context of an increasingly complex device.

“In automotive, there are two big factors, lifespan and size,” said Ron Press, technology enablement director at Mentor, a Siemens Business. “Some of these systems have thousands of duplicate blocks, and today you have to allow test to identify the block that went bad and figure out how you’re going to use a spare one. But there are so many blocks and the chips are so big that this becomes a problem. The processor companies have been doing this for years. They decide what to spec it for, and maybe you have the same die with four cores and one doesn’t perform well. But now you have a thousand cores and there is no standard way companies can reconfigure them.”
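The reconfiguration Press describes, where test identifies a bad block and a spare is swapped in, can be illustrated with a minimal sketch. This is not a standard industry mechanism (the article's point is that no standard way exists); the function and its parameters are hypothetical:

```python
def remap_blocks(block_ok, num_active):
    """Given pass/fail test results for identical duplicate blocks,
    select `num_active` good blocks, letting spares stand in for any
    block that failed. Returns None if too few good blocks remain."""
    good = [i for i, ok in enumerate(block_ok) if ok]
    if len(good) < num_active:
        return None  # not enough spares to reconfigure the part
    return good[:num_active]

# Six identical blocks, four needed active; blocks 1 and 4 failed test.
remap_blocks([True, False, True, True, False, True], num_active=4)  # [0, 2, 3, 5]
```

With four cores, this bookkeeping is trivial. With a thousand, deciding which blocks to activate, how to route around the bad ones, and how to re-test the result is exactly the unstandardized problem Press points to.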

The designs themselves are becoming so large, in fact, that in many cases they are bigger than a single reticle and need to be stitched together. “So now you have very precise testing, where you test everything before you put it in a car,” Press said. “You use a structural manufacturing test before the system test, and then you use BiST (built-in self-test). For AI applications, we’re seeing more and more of this. The designs are getting so big that there is a growing concern about being aware of what is happening. If it’s 10 times bigger, that’s even harder.”

And that monitoring needs to continue throughout the lifetime of the device, which adds yet another set of issues because all of that data needs to be compared to something.

Better data
Alongside all of this, there needs to be a consistent way to assess data. Not all data is useful, and not all data is consistent over time.

“We need better methods for parsing and cleaning data, applying machine learning as early as characterization, enriching yield databases with MES (manufacturing execution system) data and other data sources including environmental and genealogical,” said John O’Donnell, CEO of yieldHUB. “We also need far greater scale in volume analysis, as well as a platform allowing closer collaboration along the supply chain. And we need to integrate all the tools required for new product introduction into a common platform.”

There are potential pitfalls in each of these steps, O’Donnell explained. For example, data from subcontractors varies, which can impact the parsing and cleaning of that data. Using ML earlier in the flow requires better standardization for naming tests and for how conditions are stored. And multi-die packaging makes enriching yield data significantly more difficult.

Multi-die packaging can create problems across the supply chain because there are so many touch points in the manufacturing process. So even a good die may be damaged in shipping, packaging or handling. And finding a problem in a multi-die package is much more difficult.

“With multi-chip packaging, you know the device failed, but was that because of the silicon, or a wirebond, or some other way of connecting the chips together?” said PDF’s Park. “A lot of companies are moving to a ‘More than Moore’ approach, whether it’s a system in package or multi-chip module, because you get performance and power scaling, but without monolithic silicon. The challenge is that if you have 10 devices in a package, any one can cause a problem. Quality is only as strong as the weakest component.”

This provides multiple degrees of freedom for design teams, but it makes the job much harder on the back end.
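Park's weakest-component point follows directly from the arithmetic of a multi-die package: every component must be good for the package to be good, so the individual yields multiply. The yield figures below are illustrative:

```python
def package_yield(component_yields):
    """Yield of a multi-die package. Every component must be good,
    so individual yields multiply and the weakest one dominates."""
    total = 1.0
    for y in component_yields:
        total *= y
    return total

# Ten dies at 99% yield each: roughly 1 in 10 packages still fails.
package_yield([0.99] * 10)          # ~0.904
# Swap one die for a 90%-yield part and package yield drops further.
package_yield([0.99] * 9 + [0.90])  # ~0.822
```

This is why known-good-die testing before assembly matters so much: the package-level loss compounds with every component added.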

“There’s a big challenge in manufacturing,” said CyberOptics’ Kulkarni. “You need tools that can do more than just analyze raw data. You need more intelligence in the different layers.”

This works in two directions. While most of the data is about identifying nano-level aberrations, it’s also important to use that data to develop a big-picture view. But that also requires the data to be scrubbed.

“Simple interfaces are needed that also allow commenting and uploading,” said yieldHUB’s O’Donnell. “You always will have people who fancy their own methods, including analysis scripts, so you need to make sure any system accommodates extraction to a third-party system.”

And finally, all of this can vary greatly by market, by application, and by design. There are more variables in every complex design, and with new technologies such as 5G, there are so many moving parts that even a reference model isn’t good for long periods of time. In that case, innovation and stop-gap approaches are still necessary.

“In many cases, depending on the frequency, it becomes hard to get a proper sense of ultimately what the bandwidth of the device would be just because you have tiny probes and that can change the impedance,” said Alejandro Buritica, senior solutions marketing manager at National Instruments. “It’s not quite the same as testing it at the package level, but at least you know that the control signals are working to turn on the amplifiers controlling dynamic behavior of gain states and phase states and things like that. You know that once you package it, now you’re going to measure the proper performance of those going to the actual RF pins that you would have exposed. So in that case, you could potentially identify improperly manufactured wafers simply by their parametric behavior, knowing full well that you’re still not even coming close to assessing the full RF performance of the device.”

What all of this points to is more customization and granularity of the analytics around data. No two devices are exactly the same, and data developed for one isn’t necessarily optimized for another. But that data also can help define the parameters within which chips should operate, and help predict how they will work over time.

“If you look at EDA tools, usually those tools represent knowledge that has been acquired over time, so the tools are the same for everyone,” said proteanTecs’ Gewirtzman. “Now, because everything is more data-driven, you can perfect tools based on data. So if you can improve a tool based on silicon correlation, for example, you can improve the libraries.”

—Susan Rambo contributed to this report.
