Chip Monitoring And Test Collaborate

What’s driving tighter interaction between processes, and what’s next.


As on-chip monitoring becomes more prevalent in complex advanced-node ICs, it’s easy to question whether it conflicts with conventional silicon testing. It might even supplant such testing in the future. Alternatively, the two could interact, with each supporting the other.

“On-chip monitors provide fine-grained observability into effects and issues that are otherwise difficult or impossible to identify from conventional product test data, and therefore offer new opportunities not only for performance, yield and quality improvement, but also test optimization,” said Dennis Ciplickas, vice president of advanced solutions at PDF Solutions.

Interaction with test becomes the next logical step. “The industry is moving from static monitoring and siloed testing to continuous-loop and feedback monitoring, with testing that improves upon itself using a combination of rules and AI,” noted Keith Schaub, vice president of technology and strategy at Advantest.

It may be tempting simply to throw away devices that fail a test, but monitoring provides more insights into the possible causes of the failure. “There’s still a cost to collect data even from a defective chip,” said Robert Ruiz, product marketing director, functional safety practitioner, digital design group at Synopsys. “But in many cases, it’s worth the cost to improve the yield or to alter the design to improve the overall yield.”

Functional test and on-chip monitoring are learning not only to live with each other, but to complement each other and to work together. While tests confirm the structural integrity of a chip, concurrent monitoring can provide context, depth, and color to the test results. Together, they can provide a stronger basis for the ongoing field monitoring and in-system testing.

“You can’t really have the analytics without the tests because you still need to do structural testing,” said Lee Harrison, automotive IC test solutions manager at Siemens EDA. “And likewise, doing structural testing is great, but the analytics is giving you so much more data on the system than you would ever get from a pure structural test.”

Fusing these worlds together isn’t easy, though. “It’s definitely two big players that are learning how to play together,” observed Gal Carmel, general manager for automotive at proteanTecs. “But test has a lot to gain from monitoring, and vice versa.”

Testing and monitoring as separate things… to an extent
Testing and monitoring have completely separate origins. The notion of testing is inherent in any manufacturing process. The goal is to ensure that whatever was built was built correctly. That goes for semiconductors as well as any other manufacturing output.

The goal of testing is to define ahead of time what critical aspects of the product must be verified. Semiconductors are no different. But defining those tests can be extremely complex. The whole notion of coverage reflects the fact that it’s very hard to test absolutely everything. Add economics into the equation, balancing the cost of testing against the cost of failure, and it becomes even harder.

The result of testing is typically either a pass/fail determination, or perhaps a binning decision. Once the testing is complete, it’s over for the life of the product. Or at least, that’s usually the case — with one evolving new exception.

Automotive systems are now required to execute tests “in-system” — at power-on, at various intervals while in operation, and at power-off. This is to conform to safety requirements and to ensure that no latent damage or aging semiconductors put any of the people in the vehicle or nearby vehicles in danger.

Monitoring, by contrast, originated from a few different use cases. That’s because there are multiple types of monitoring, and they may serve different real-time purposes while also providing layered analytics.

Fig. 1: Monitoring (called “embedded analytics” by Siemens) ultimately sends its results to the cloud where most analytics engines reside. Source: Siemens EDA

Some forms of monitoring help to adjust real-time performance parameters as conditions like temperature or supply voltage change. Others are intended more for providing visibility into what’s happening on the chip as it executes its main mission.

For chips assembled within a system, monitoring can assess chip operation in the context of the full assembly. “The trends toward huge (full-reticle) dies, extremely thin device packages, and heterogeneous integration are leading to complex thermo-physical interactions that cause failures, such as SRAM bit flips at Vmin conditions,” said Ciplickas. “Those are difficult to stimulate during chip test and therefore may evade conventional quality screens.”

The kinds of information that monitoring can provide include software issues, security breaches, safety issues, bad bus transactions, and boot sequence problems. Extracting this kind of information requires normal operation rather than a test mode.

“In order to truly optimize chip performance and reliability during its lifetime, it’s imperative to understand the dynamic conditions and the parametric properties of the device,” said Amit Sanghani, vice president of engineering, digital design group at Synopsys. “This is achieved through the deployment of distributed sensing and monitoring fabrics throughout the die. However, equally important is the visibility, control and interaction with various key technologies and analytic engines across the various stages of the lifecycle in order to have a real impact on the chip’s performance, sustainability and health.”

That very fact distinguishes it from test, because test — almost by definition — utilizes special test modes for efficiency. Even in-system test may involve small parts of the system being taken offline briefly so that a test can be performed.

So monitoring provides a view not during some special mode (although it could do that too), but mainly during operation. Said another way, test is disruptive. Monitoring is not. “When you compare to traditional BiST, it (BiST) is more intrusive,” said Carmel. “Monitoring runs in the background, while the system is in operation, so there is no need for downtime.”

While test and monitoring might appear to be coming closer to each other in the context of in-system test, this very distinction keeps them as separate functions.

Monitoring is closely tied to analytics, which can include test data
The principal use of monitoring is to send data to the cloud, where it becomes part of a large corpus of data related to a specific device. And by “specific,” that doesn’t mean just a specific part number, but a specific serial number. That body of data contains the history of all of the monitoring data that has flowed up to the cloud since the part was first powered up.

Monitoring shows only so much, however. While different monitoring layers may successively add useful information when drilling down to find the root cause of a problem, the overall analytics benefit from any other data related to that specific part. “The more context you provide, the more levels of analysis and dissection we can provide,” explained Noam Brousard, vice president, systems at proteanTecs.

That can include fab inspection data, metadata from the equipment used to build the chip, wafer sort data, final test data, burn-in data — anything that someone might turn to in order to figure out what’s going on in a given situation.

“Much of the information that we upload is the test name, the test type, the stage in which it’s at,” said Brousard. “And when we get all that metadata together with a measurement, then we know when we can compare apples to apples and how we can propagate our insights throughout the different test stages.”

“We connect up from the OSAT [outsourced semiconductor assembly and test] during the testing,” said Guy Cortez, staff product marketing manager for silicon lifecycle management in Synopsys’ Digital Design Group. “Data can flow to the customer, and the customer can forward it to us into a designated server. Or it can come right from the OSAT directly to us.”

Exactly where the data is stored is up to the chipmaker. “We will access the data from the cloud for running analytics,” Cortez said. “If the customer decides they want to host all their data on the premises, that’s an option, too. They own the hardware in that case, and we still are able to get access to it.”

That means that test results need to be shipped up to a data center — cloud or on-premise — to become part of that body of data. This is starting to happen, although two scenarios suggest very different timing for the data transfer.

The simplest purpose for the data is to reside with all of the other data for reference when needed. It becomes part of the big-data trove that analytics engines can rummage through. As such, timing isn’t critical. So test results may be sent up a couple of times a day, daily, or even every few days. In these situations, they’re sent as a large batch.

Many monitoring sites can interpret test result files, such as those in STDF format, so they’re prepared to deal with both the syntax and semantics of those results even though they come from a very different world than the monitoring results. “We know how to accommodate many different frame formats or data formats,” noted Brousard.

Monitoring and testing working together
The second possible use for monitoring data is for making adaptive test decisions. For example, a passing device (according to the tester) that is marginal (according to the monitor) may be binned differently from a passing device that’s in the center of the intended distribution.

Fig. 2: Monitoring may be used for finer-grained binning than may be possible with test alone. Source: proteanTecs
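
The finer-grained binning idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `timing_margin_pct` reading, the 5% threshold, and the bin names are hypothetical and do not come from any vendor's product.

```python
from dataclasses import dataclass

# Hypothetical threshold: a passing part with less timing margin than this
# (as a percentage of the clock period) is treated as marginal.
MARGINAL_SLACK_PCT = 5.0


@dataclass
class DeviceResult:
    serial: str                # per-device serial number
    tester_pass: bool          # pass/fail verdict from the conventional test program
    timing_margin_pct: float   # hypothetical on-chip monitor reading


def assign_bin(result: DeviceResult) -> str:
    """Combine the tester verdict with a monitor reading to choose a bin."""
    if not result.tester_pass:
        return "FAIL"
    if result.timing_margin_pct < MARGINAL_SLACK_PCT:
        # Passed the tester, but the monitor says it sits near the edge.
        return "PASS_MARGINAL"
    return "PASS_NOMINAL"
```

In this sketch the tester verdict always takes priority, and the monitor reading only subdivides the passing population. A real flow would likely use several monitor readings and limits chosen by the chipmaker.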

“Process-corner sensors can shed light on performance variations that correlate with excessive current flow and power consumption,” explained Ciplickas. “Similar steps can be taken with on-chip voltage and temperature sensors. They can be used to drive dynamic test conditions and test-flow optimizations to avoid test-related yield losses.”

Test time, however, is money. So any decisions must come quickly, and a trip to the cloud for that decision is far too slow. “Test houses have reasons, such as latency, why they may not want to have decisions going up to the cloud, run some kind of compute, and come back with a result,” said Brousard.

That means that some small version of the monitoring analytics engine may reside on the tester for local decision-making. In such a situation, the data also would be sent to the cloud to become part of the data corpus, but the decisions won’t have to wait.

Automated test equipment (ATE) has limited computing capabilities, however. “There’s only so much you can compute on ATE equipment,” said Randy Fish, director of marketing for silicon lifecycle management in Synopsys’ Digital Design Group. If the required computation exceeds what the tester can handle and a local decision is still needed, a local server may be required to support the real-time test-flow decisions. AI operations may require such a server.

Given local AI capabilities, it even may be possible to correlate early monitoring measurements with later test results. That could provide another decision point to shorten the test flow.

“On-chip sensor data is another valuable data source that can improve an overall test operation’s flow when included in a semantic data model and used to drive machine learning algorithms,” explained Ciplickas. “Sensor data can be collected early in the test process and used to dynamically drive the tests performed for a given chip, wafer or lot. This data can be used with machine learning to implement ‘virtual test steps’ that predict the performance or results of downstream tests, saving time and reducing the cost of test.”

For instance, a certain test might be needed only on some chips. Monitoring data could identify which chips, with the other ones bypassing that test. Similarly, if those results indicate the device is likely to fail, then the testing can be discontinued — again shortening the test time and lowering test cost.
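
A rough Python sketch of such a monitoring-driven flow follows. It is illustrative only. The `TestStep` structure, the sensor names used in the usage note, and the `likely_to_fail` predicate (which stands in for whatever rule or trained model a chipmaker might use) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Readings = Dict[str, float]  # hypothetical sensor name -> value


@dataclass
class TestStep:
    name: str
    run: Callable[[], bool]             # executes the test, returns True on pass
    needed: Callable[[Readings], bool]  # decides from monitor data whether to run it


def run_adaptive_flow(
    readings: Readings,
    steps: List[TestStep],
    likely_to_fail: Callable[[Readings], bool],
) -> Dict[str, str]:
    """Run only the tests the monitor data calls for, stopping early when possible."""
    log: Dict[str, str] = {}
    if likely_to_fail(readings):
        # Early monitor data predicts a failure, so no test time is spent.
        log["flow"] = "DISCONTINUED"
        return log
    for step in steps:
        if not step.needed(readings):
            log[step.name] = "SKIPPED"
            continue
        passed = step.run()
        log[step.name] = "PASS" if passed else "FAIL"
        if not passed:
            break  # stop on first fail
    return log
```

For example, a step defined as `TestStep("leakage_screen", run_leakage, lambda r: r["leakage_ua"] > 50.0)` would run only on chips whose early sensor reading exceeds the (made-up) 50 µA limit. Every other chip bypasses it.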

Ultimately, the chipmaker decides how to implement this. “It’s up to the customer’s use model how quickly they want to adapt the test environment,” noted Fish.

This is a situation where monitoring is augmenting test even as the test program is running, with the test program accessing monitoring data. The test program, however, may be capable of controlling the monitoring, as well.

Test can control monitoring
A test result, while useful, is relatively opaque. There’s a lot of context that may be missing. What else was going on during that test? Did some condition affect the test result in some way? The need for that information may not come immediately, but in the case of triaging a failure, it could be very useful.

For that, monitoring companies may work with test engineers to have them turn on certain monitors during specific tests, perhaps turning them off after. Throughout the test, accompanying monitoring data that can be correlated to the test can provide context and color during any post-analysis.

“What we’re doing is expanding monitoring into test,” said Brousard. “We are monitoring while you run your tests to provide much more new data than before. This allows you to detect more issues, understand the context and source, and, of course, cut test times significantly.”

ProteanTecs isn’t alone in this. “We provide numerous high-fidelity characterizations using on-chip sensors that interact and combine with test data to create a richer picture of device health and diagnostic depth for DUTs (devices under test),” said Ciplickas.

In order for the test program to be able to read monitoring data or control the monitors themselves, there must be commands available to the test program.

APIs provide that access, meaning that monitoring instructions may become part of the test program itself. The test and monitoring results may be in separate files, or they may be communicated in separate packets, but metadata then helps to correlate which monitoring results came during which tests.
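
One way to picture that correlation is a join on timestamps, sketched below in Python. The record layouts and field names are hypothetical. Real test logs (for example STDF files) and monitor data streams carry their own metadata, which this sketch does not attempt to parse.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TestRecord:
    test_name: str
    start_s: float   # test start time, in seconds
    end_s: float     # test end time, in seconds
    passed: bool


@dataclass
class MonitorSample:
    timestamp_s: float
    sensor: str      # hypothetical sensor identifier
    value: float


def correlate(
    tests: List[TestRecord], samples: List[MonitorSample]
) -> Dict[str, List[MonitorSample]]:
    """Group monitor samples under the test that was running when they were taken."""
    by_test: Dict[str, List[MonitorSample]] = {t.test_name: [] for t in tests}
    for sample in samples:
        for test in tests:
            if test.start_s <= sample.timestamp_s <= test.end_s:
                by_test[test.test_name].append(sample)
                break  # assume test windows do not overlap
    return by_test
```

Samples that fall outside every test window are dropped here. An analytics engine might instead keep them as idle-state context.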

While the device is in operation, monitoring and test may interact less. At that time, any in-system testing is run not by some test program, but by using self-test functions that are self-sufficient. Those BiST operations don’t interact with any monitoring, and any concurrent monitoring may not have access to the BiST results.

In this case, however, both the test and monitoring are running at the behest of the system firmware. There’s nothing stopping that firmware from invoking both tests and monitoring at the same time with the intent of having the monitoring enrich the test result. “When you run your testing during the operational lifetime, it’s just another opportunity to use our visibility,” said Brousard. “This is done without disruption to system operation.”

BiST results tend to be pass/fail — often without specifying which specific component failed, so usually no specific measurement is associated with the test run. But the firmware could elect to upload the test results and accompanying monitoring data, both to color the result and to identify which chip failed. In this case, then, the firmware is acting as the test program to an extent.
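
As a loose sketch of that firmware role, written in Python for readability rather than as real firmware, the snippet below bundles a pass/fail BiST verdict with monitor readings taken around the test. The function names, the report fields, and the before/after sampling (a simplification of concurrent monitoring) are all assumptions.

```python
import json
from typing import Callable, Dict


def run_bist_with_context(
    chip_id: str,
    run_bist: Callable[[], bool],                 # hypothetical self-test hook
    read_monitors: Callable[[], Dict[str, float]],  # hypothetical monitor readout
) -> str:
    """Run a self-test and package its verdict with surrounding monitor data."""
    before = read_monitors()
    passed = run_bist()
    after = read_monitors()
    report = {
        "chip_id": chip_id,         # identifies which chip the result belongs to
        "bist_pass": passed,        # the bare pass/fail verdict
        "monitors_before": before,  # context captured just before the test
        "monitors_after": after,    # context captured just after the test
    }
    # Serialized report that the firmware could queue for upload.
    return json.dumps(report, sort_keys=True)
```

The upload path itself is left out, since how and when such a report reaches the cloud or an on-premises server depends on the system.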

Far from competing, it would appear that testing and monitoring are merging not to become one thing, but to weave around each other, extracting value from both. “On-chip sensor data should not be seen as an ‘island’ of data, but as another ‘arrow in the quiver’ in the overall test process,” said Ciplickas. “Collecting and organizing on-chip sensor data to ‘feed forward’ for test flow optimizations and to ‘feed backward’ for manufacturing, test, and even design improvements can be a big challenge, but one that comes with commensurate gains in performance, yield, and reliability.”

Others agree. “You need to have all of the critical components in place that provide the links spanning domains such as design, security, safety, manufacturing, debug and finally, to in-field operation of the chip. True predictive maintenance and optimization opportunities can only be realized if there is a tight feedback loop between the on-chip instruments and the various lifecycle data analytics engines,” said Synopsys’ Sanghani.
