
Designing Chips For Test Data

Getting the data out is only part of the problem. Making sure it’s right is another challenge altogether.


Collecting data to determine the health of a chip throughout its lifecycle is becoming necessary as chips are used in more critical applications, but being able to access that data isn’t always so simple. It requires moving signals through a complex, sometimes unpredictable, and often hostile environment, which is a daunting challenge under the best of conditions.

There is a growing sense of urgency about improving the reliability of chips before they leave the fab or packaging house, and a need for monitoring IC behavior and performance over longer lifetimes. In the past, this was a relatively straightforward exercise. A planar chip could be probed as necessary, and external leads could be hooked up to testing equipment to run a barrage of tests. But as chips become more heterogeneous and three-dimensional, and as density continues to increase, these kinds of approaches are no longer applicable. Leads are often buried, there are more elements to be tested, and more data from those tests needs to be analyzed. In some cases, data may need to be collected a decade or more after the chip leaves the fab.

Driving this shift are automotive OEMs, which are demanding improvements in reliability as more automated features are controlled by advanced-node chips, and large data centers, where a mix of processing elements inside of custom and off-the-shelf designs makes the chips more energy-efficient but harder to test. And in the future, others will likely join that chorus from a number of nascent safety- and mission-critical domains.

“Customers want to monitor and track the performance and health of the device across its lifespan,” said Keith Schaub, vice president of technology and strategy at Advantest America. “They want to embed sensors inside the design itself, just like you put sensors in an automobile, and those sensors can generate data throughout the lifetime of the product — even during the testing. You can have those sensors generate various data at wafer, at package, at system, and then in the end device when consumers are using it. This is similar to what the mobile operators do with your cell phones, where that generates a bunch of data and they always ask if it’s okay to send the data back to the provider.”

In some cases, this requires extra circuitry to act as a conduit for data from a variety of sensors that are used to measure and monitor the behavior of the various elements. That circuitry needs to be baked into the design early enough to avoid problems later in the design flow, but not so early that the design changes significantly. The main concerns are ensuring the integrity of diagnostic data and ensuring the monitoring/test data can move without impediments, which can be anything from noise to electromagnetic interference.

“It’s important to be able to enable and access the monitors’ telemetry data without interference at any point,” said Noam Brousard, vice president of product at proteanTecs. “We can connect to a variety of digital interfaces that allow data to be extracted at any stage, in production or in the field, by various hardware and software. But it’s critical to be able to collect this data at the right time and place. Otherwise, you may get data that doesn’t reflect the actual system performance.”

Today’s floor-planning and design-for-test (DFT) tools aren’t set up specifically to determine the best way to move test data. The floor-planning tools can help optimize layouts, and the DFT tools can help develop a strategy for how devices will be tested. But as chips become larger and more complex, these two worlds are converging.

“The one thing you really have to think about when you connect things up from the I/O pins — and really, from the tester to the scan chains — is throughput,” said Geir Eide, director for product management of DFT and Tessent Silicon Lifecycle Solutions at Siemens Digital Industries Software. “If you’ve got a 100-MHz limitation at the core level through these internal test structures, when they’re just wiring things up directly like this, you now also have the same frequency for the data coming in from the tester and being distributed across the chip. If we just focus on manufacturing tests, even though most instruments on the tester can tackle gigahertz, you’re leaving a lot of that capability on the table because you can’t really ship data through the chip fast enough.”
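To make the throughput point concrete, here is a back-of-the-envelope sketch of how long it takes to shift a fixed volume of test data at a 100-MHz internal limit compared with a faster link. The data volume, the channel count, and the 1-GHz comparison point are illustrative assumptions, not figures from Eide or Siemens, and the model ignores protocol overhead and compression.

```python
def shift_time_seconds(total_bits: int, channels: int, freq_hz: float) -> float:
    """Time to move total_bits when each channel carries one bit per clock cycle."""
    if channels <= 0 or freq_hz <= 0:
        raise ValueError("channels and freq_hz must be positive")
    return total_bits / (channels * freq_hz)


# Hypothetical workload: 8 Gbit of test data spread over 8 channels.
TOTAL_BITS = 8 * 10**9
CHANNELS = 8

slow = shift_time_seconds(TOTAL_BITS, CHANNELS, 100e6)  # core-level 100-MHz limit
fast = shift_time_seconds(TOTAL_BITS, CHANNELS, 1e9)    # assumed 1-GHz capable path

print(f"100 MHz path: {slow:.1f} s")
print(f"1 GHz path:   {fast:.1f} s")
print(f"Tester capability left unused: {slow / fast:.0f}x")
```

With these assumed numbers the same data takes 10 seconds at 100 MHz and 1 second at 1 GHz, which is the kind of gap Eide describes when the on-chip path, rather than the tester, sets the pace.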

In advanced automotive chips, where the quality requirements are especially stringent and the need for timely data is essential, that can be a big problem. “You have a very short time window to deal with this,” Eide said. “Especially for many of the in-field requirements, traditionally you have to sacrifice quality for time. The test quality targets for manufacturing tests are much higher than what they are for in-field tests, but this gap is shrinking. Dealing with larger amounts of test data efficiently is a problem that is spreading from manufacturing test to in-system test. So having that highway across the chip to be able to send massive amounts of test data is becoming a requirement for in-system tests, as well.”

There are a couple of approaches being utilized here. One is to add dedicated wiring for this data, which may be necessary depending upon where various in-chip or in-package sensors are located. The other uses existing circuitry to carry this data, even though it wasn’t specifically designed for this purpose. There also can be a combination of both, but that requires domain-specific knowledge to partition and prioritize data.

But in all cases, moving the data through a complex chip or package is a challenge. “There are two main scenarios, one where you have different accesses to the chip (and you may have multiple accesses for some) and others where you have an in-field test, which is a different scenario,” said Steve Pateras, senior director of marketing for test products at Synopsys. “In one, you have a high-speed I/O-based access. There’s IEEE 1149.10, but there’s also this whole concept of re-using functional I/O for high-speed bandwidth access to test and other instrumentation. PCIe and USB seem to be the most popular now, and you can piggyback on these functional interfaces to extract large amounts of data from the chip, at multiple tens of gigabits per second of bandwidth. We’ve been working with the ATE companies for using that during manufacturing test. You can use the exact same interface for in-field because these are functional interfaces that exist in the system.”

Once that data is accessible, then all sorts of information can be gleaned from it. “The raw data is uploaded to our software platform where knowledge-based algorithms are applied to provide actionable insights back to the user,” said proteanTecs’ Brousard. “This can be in either test or functional mode.”

What’s being measured?
Getting this kind of detail from a functioning chip or system has been discussed for years, and it’s finally starting to gain traction. But it’s also just one more element in a broader shift toward data-driven architectures, where the goal is to move data through a chip with less effort and at higher speeds. What’s different is this monitoring and test data needs to be completely accurate and exactly the same at the point of creation and during extraction and analysis. Any flaw in that data can result in a field failure.

“The tolerance levels that people are expecting out of a 7nm or 5nm design are very tiny, and no one’s going to collect enough metrology or inspection data because you can’t afford to inspect every wafer,” said John Kibarian, CEO of PDF Solutions. “There are some steps you can take with metrology and inspection, but you can’t do that across the board. So you need to be able to work with the equipment data directly — and the consumable data, which is photo resists and slurries — and you need to be able to track the relationship between those various materials variations and what that means to the product. The great arbiter is what impact did those tests have when you tested the product. Does the product behave differently or not? Ultimately, that data is very valuable to understand. Test data is a very, very important part of that chain.”

It’s also easy to disrupt and corrupt that data at multiple points in the test cycle (see figure 1), and that needs to be considered throughout the flow.

“The chuck temperature goes up to 300° and down to -60°, so there’s a huge temperature difference you can achieve,” said Jens Klattenhoff, vice president and general manager of FormFactor’s Systems Business Unit. “Usually, you can be pretty precise. But with the very small pad sizes, there’s no chance to be as precise. This also involves probe-CuP (copper pillar) positioning. All the little drifts are occurring due to temperature. But with RF measurements, for example, particularly with 5G at higher frequencies, there’s a totally different drift caused by the instruments. For DC measurements, it’s mostly the position of the needles needed to hit the small pads even under this temperature variance. On the RF side, there’s a different drift, and this is the frequency drift.”

Fig. 1. Potential impacts to data collection. Source: FormFactor


This is particularly relevant as designs become more heterogeneous, because they include more analog/mixed-signal elements. “Traditionally in data and testing, SoC data is not very interesting,” said Mark Roos, CEO of Roos Instruments. “You know it’s scan data or functional test data, it’s pass/fail data. It’s used to yield your parts but it doesn’t yield much information, whereas in an analog test, every possible number tells you something.”

Test isn’t the only process being stretched out over more process steps. As an increasing number of chips are added into the same package, metrology is being used at more insertion points, as well. As with other processes, the emphasis is on good data.

“We are asked to provide more stringent reproducibility and repeatability criteria, so that we are absolutely certain how this system has been produced versus the process specifications,” said Samuel Lesko, product manager in Bruker’s Nano Surfaces Division. “So rather than pressure on the data, we see pressure on absolute accuracy, tool-to-tool matching, and more frequent QC wafer turnaround to assess a real baseline, so we are sure we know the exact value of whatever data we output on the process wafer. It’s all about accuracy and long-term reproducibility.”

This concern for data accuracy is spreading throughout the fab. Some types of inspection, for example, are being repeated to make sure the data is accurate, particularly where AI is involved. “Even though we don’t tell our customers, we actually do multiple passes,” said Subhodh Kulkarni, CEO of CyberOptics. “When we see data that looks marginal to our AI technology, if we have time and a part is still there, we quickly go back and collect high resolution data at that point, while the rest of the data is still being churned. We are taking advantage of any slack time in our existing technology. But certainly, for the sophisticated advanced packaging type customers, they are okay to sacrifice the throughput to improve reliability.”

Along with better data, another challenge is to combine different types of data from a variety of tools and processes.

“You have different structures of data coming in, and we have engagements with customers on trying to commonize that for them,” said Ben Miehack, product manager for defect inspection and metrology at Onto Innovation. “And they really want to have us output the data in a structured format so that they can consume it before it gets passed on. So, typically before we used something like a KLARF (KLA Results File) — an industry standard from KLA — or certain wafer map E142 or some SEMI-type communications format, but that’s now changing where the volume of data needs to be restructured.”

And finally, companies are trying to figure out ways to share all of this data across multiple entities in order to improve quality.

“The cumulative data about a die’s fitness needs to move through multiple hands,” said Jay Rathert, senior director of strategic collaborations at KLA. “That could be inside an IDM fab, where everything — design, manufacturing, and test — is under one roof. Or maybe it’s a fabless designer using a foundry or an OSAT, which is the most complicated use case. We’re trying to find a way where people are comfortable routinely sharing a certain level of granularity in the data, but where you don’t have to expose sensitive process IP. So how can you skim off just the safest layer of that screening data to say, ‘This die doesn’t look like the others.’ You want to let the fab protect their process IP, and any other type of confidential information they want to retain, but feel confident in passing along just a nugget of information that this die on this wafer should be tested a little differently, or burned in a little differently, or run through system-level test. We need to get people accustomed to this idea because the whole industry needs to take quality to the next level.”
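One way to picture the "safest layer" Rathert describes is a screen in which the fab keeps its raw measurements private and passes along only die coordinates and a flag. The sketch below is purely illustrative: the field names, the robust z-score screen (median and median absolute deviation), and the 3.5 threshold are assumptions chosen for the example, not KLA's method or any standard data format.

```python
from statistics import median


def flag_outlier_dies(
    measurements: dict[tuple[int, int], float], threshold: float = 3.5
) -> list[dict]:
    """Return a shareable report: die coordinates plus a flag, no raw values."""
    values = list(measurements.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])  # median absolute deviation

    report = []
    for (x, y), value in sorted(measurements.items()):
        if mad == 0:
            is_outlier = False  # no spread at all, nothing stands out
        else:
            robust_z = 0.6745 * (value - med) / mad
            is_outlier = abs(robust_z) > threshold
        # Only coordinates and the flag leave the fab; the measurement stays behind.
        report.append({"die_x": x, "die_y": y, "screen_differently": is_outlier})
    return report


# Hypothetical normalized parametric readings for six dies on one wafer.
fab_private_data = {
    (0, 0): 1.00, (0, 1): 1.02, (1, 0): 0.98,
    (1, 1): 1.01, (2, 0): 0.99, (2, 1): 1.60,
}

report = flag_outlier_dies(fab_private_data)
flagged = [(r["die_x"], r["die_y"]) for r in report if r["screen_differently"]]
print(flagged)
```

In this toy data set only the die at (2, 1) is flagged, and the downstream test house learns that it should be handled differently without seeing the underlying process measurement.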

What’s next?
Companies are now focusing on how to ensure that accurate data stays accurate. Synopsys, for example, has started to packetize test data from different cores in a chip. In the past, this has largely focused on scan data, but that approach is being broadened to include other kinds of test data. The packetization helps because it allows the data to be moved in discrete bundles.
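The article does not describe the packet format Synopsys uses, so the following is only a generic sketch of the idea of moving test data in discrete bundles. The header layout (core ID, sequence number, payload length), the 64-byte payload size, and the CRC-32 trailer are all assumptions added for illustration; the CRC simply shows one common way a receiver can check that a bundle arrived intact.

```python
import struct
import zlib

HEADER = struct.Struct(">HHH")  # hypothetical layout: core_id, sequence, payload_length
TRAILER = struct.Struct(">I")   # CRC-32 computed over header + payload


def packetize(core_id: int, data: bytes, max_payload: int = 64) -> list[bytes]:
    """Split one core's test data into self-describing, checksummed packets."""
    packets = []
    for seq, start in enumerate(range(0, len(data), max_payload)):
        payload = data[start:start + max_payload]
        header = HEADER.pack(core_id, seq, len(payload))
        crc = zlib.crc32(header + payload)
        packets.append(header + payload + TRAILER.pack(crc))
    return packets


def unpack(packet: bytes) -> tuple[int, int, bytes]:
    """Validate one packet and return (core_id, sequence, payload)."""
    header = packet[:HEADER.size]
    payload = packet[HEADER.size:-TRAILER.size]
    core_id, seq, length = HEADER.unpack(header)
    (crc,) = TRAILER.unpack(packet[-TRAILER.size:])
    if length != len(payload) or zlib.crc32(header + payload) != crc:
        raise ValueError("corrupted packet")
    return core_id, seq, payload


def reassemble(packets: list[bytes]) -> dict[int, bytes]:
    """Rebuild each core's data stream from packets that may be interleaved."""
    per_core: dict[int, dict[int, bytes]] = {}
    for packet in packets:
        core_id, seq, payload = unpack(packet)
        per_core.setdefault(core_id, {})[seq] = payload
    return {
        core: b"".join(chunks[s] for s in sorted(chunks))
        for core, chunks in per_core.items()
    }


# Two hypothetical cores share one transport; their packets are interleaved.
core1_data = bytes(range(200))
core2_data = b"scan" * 50
stream = [p for pair in zip(packetize(1, core1_data), packetize(2, core2_data)) for p in pair]
recovered = reassemble(stream)
print(recovered[1] == core1_data, recovered[2] == core2_data)
```

Because each bundle carries its own source and sequence information, data from several cores can share one path and still be separated and checked at the far end.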

Another challenge is understanding what data needs to be collected up front or later in the design-through-manufacturing flow, and how that can be affected by other elements that may not be implemented at the same time. For example, there are several power issues that arise later in the design cycle that involve test. “DFT might be inserted later in your design cycle,” said Renu Mehra, R&D group director for the Digital Design Group at Synopsys. “And if you’ve specified your intent in a very generic way, saying that every time I have a border between power domain A and power domain B and I need an isolation cell, then if there are DFT signals that are then getting introduced later in the design phase, they also will see this power intent and those DFT signals also will get appropriately isolated, for example.”

In fact, DFT has a very complex interaction with UPF. “That’s a whole can of worms because DFT introduces not just scan signals, so scan signals might be going from one power domain to another,” Mehra said. “They have DFT wrapper cells, the compressor that compresses pieces that are introduced, and as a result, even though we don’t realize it, DFT introduces significant new logic in the design. And based on the power of MV design, we need to be very careful that when we are introducing a new cell, we are not breaking the power. So if there was no communication between Domain X and Domain Y, and no isolation was specified in between them, and one of them happened to be shut down, DFT doesn’t realize this. It needs to add wrapper cells if it adds a connection between these two domains. There is no isolation specified. It is going to transfer the corruption from the dead part of the circuit to the live part of the circuit, and you have to be very careful on how DFT is introduced. Sometimes for DFT, we have to add new UPF intent right after the DFT is inserted to kind of fix it.”
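The Domain X and Domain Y scenario Mehra describes can be sketched as a simple consistency check. This is not UPF syntax or any EDA tool's API; it is a hypothetical Python model in which the domain names, signal names, and the `missing_isolation` helper are invented for illustration. It flags DFT-added connections that leave a domain that can be shut down and enter another domain with no isolation rule covering that crossing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    signal: str
    source_domain: str
    sink_domain: str


def missing_isolation(
    connections: list[Connection],
    switchable_domains: set[str],
    isolated_pairs: set[tuple[str, str]],
) -> list[Connection]:
    """Return connections that could carry corruption out of a powered-down domain."""
    problems = []
    for conn in connections:
        crosses_domains = conn.source_domain != conn.sink_domain
        source_can_power_down = conn.source_domain in switchable_domains
        covered = (conn.source_domain, conn.sink_domain) in isolated_pairs
        if crosses_domains and source_can_power_down and not covered:
            problems.append(conn)
    return problems


# Hypothetical power intent written before DFT insertion:
# A and X can be shut down, but only the A -> B crossing has isolation specified.
switchable = {"A", "X"}
isolation_rules = {("A", "B")}

# Hypothetical connections introduced later by DFT insertion.
dft_connections = [
    Connection("scan_chain_1", "A", "B"),  # crossing already covered by isolation
    Connection("scan_chain_3", "X", "Y"),  # new crossing nobody planned for
    Connection("wrapper_en", "Y", "Y"),    # stays inside one domain
]

for conn in missing_isolation(dft_connections, switchable, isolation_rules):
    print(f"{conn.signal}: {conn.source_domain} -> {conn.sink_domain} needs isolation")
```

In this toy model only the new X-to-Y scan connection is reported, which mirrors the case where extra power intent has to be added after DFT insertion. A power intent written generically for every crossing between two domains, as Mehra describes earlier, would correspond to the rule set already containing that pair.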

The chip industry is starting to embrace on-chip, in-system monitoring as a crucial tool for identifying everything from the health of a chip to potential unwanted activity, and it has begun to figure out ways to build this into tools and processes. The challenges now include getting the data out of the chip in the first place, and then ensuring that the data being collected from inside the chip or package is accurate and unaffected by a variety of other activity on and off the chip or package.

There’s a clear business opportunity in being able to provide this kind of data to industries that demand reliability. Some of it is being done at the behest of customers, such as automakers, which see this kind of data as vital to avoiding expensive recalls and liability lawsuits. But what’s clear is the chip itself is being used as another source of data metrics, either corroborating existing test results or sounding an alarm when something slips by. Having extra resources can only help, and having those resources in places that aren’t accessible using traditional equipment or methods can provide a big improvement in overall reliability.

—Ann Steffora Mutschler contributed to this report.

