
Using Manufacturing Data To Boost Reliability

Correlation becomes big challenge as data sources and volume increase.


As chipmakers turn to increasingly customized and complex heterogeneous designs to boost performance per watt, they also are demanding lower defectivity and higher yields to help offset the rising design and manufacturing costs.

Solving those issues is a mammoth multi-vendor effort. There can be hundreds of process steps in fabs and packaging houses. And as feature sizes continue to shrink, and more elements are connected into a package, some of those steps are taking longer and becoming even more complex. That, in turn, makes it imperative to sift through a growing volume of manufacturing and packaging data, and to correlate various types of data from all of these process steps, as well as others.

“There are a bunch of sources of data,” said Andrzej Strojwas, chief technology officer at PDF Solutions. “We are dealing with a broad spectrum of data, from in-situ, real-time data from the equipment, to the information from inline metrology, inline defect inspection, test chip data, and then test data at wafer sort, at final test, and at burn-in. There are also new types of data from structures inside the chips that address reliability. Everybody is using PVT sensors — process, voltage, temperature — and they are crucial for predicting how these dies will behave in the field. There are also new types of sensors which monitor reliability risks due to accelerated aging, and mechanical stress changes throughout the assembly process and in the field.”

On top of that, inside any fab or packaging house is a mix of equipment, some new and some less so. As a result, that equipment collectively generates data of uneven quality and quantity. And to make matters even more complicated, frequently not all of it is under one roof. That makes sharing data in a timely manner both more essential and more difficult, compounding a problem the foundry and packaging industries have never completely solved.

“This data about a die’s fitness has to move through multiple hands,” said Jay Rathert, senior director of strategic collaborations at KLA. “Maybe it’s an IDM, in which case everything is under one roof. Or maybe it’s a fabless design in a foundry or an OSAT, which is the most complicated use case. We’re trying to find a way that people are comfortable sharing a certain level of granularity in the data, but where you don’t have to expose your entire process. In aggregate, there’s a deep understanding of what’s going on, but we don’t normally have full access to all that data because there’s a lot of deeply proprietary information in there. The challenge is to skim off just the safest layer of that while still being able to say, ‘This die doesn’t look like the others.’ So you’re letting the fab protect their process IP and any other type of confidential information they want to retain, but you also feel confident passing along just that nugget of information that this die on this wafer should be tested a little differently, or burned in a little differently, or run through system-level test. People need to get comfortable with the idea that the whole industry needs this.”

Others agree. “Disjointed and disconnected data is a long-term problem, and it’s been exacerbated by the movement away from a true OEM,” said Mike McIntyre, director of software product management at Onto Innovation. “Most manufacturers have a model where at least some critical pieces are done outside of their manufacturing environment. Historically, if you were an OEM, you at least had the ability to see all the data, and access all the data that was available to you. Now, we need to create an infrastructure that will bring this data together from disparate activities. And by providing content expertise, we can begin to join disparate data such that you can form those relationships for analytics.”

This disconnect in data is made worse as designs continue to grow in size. As a result, devices are tested on a more piecemeal basis, extending the divide-and-conquer approach used to develop chips. Supplementing data from various processes with internal data from a working chip can provide a more complete view.

“Test engineers, for the most part, are just looking at a stuck-at test or a very localized problem,” said Marc Hutner, senior director of product marketing at proteanTecs. “The exciting part is when you can take data from inside the chip and get alerts and insights from the whole chip or package. And as more and more integration happens, you can start to find those kinds of relationships. You can still look at it locally within a section of the die, but you can back up to the die level and even start to look at it across entire lots, as well. So it’s multiple levels, and it works for advanced packaging, as well.”
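The multi-level rollup Hutner describes, from a section of the die up to the die and then the lot, can be sketched in a few lines. The monitor readings, names, and threshold below are entirely hypothetical, chosen only to illustrate the idea of aggregating in-chip data upward and flagging a die that stands apart from its lot:

```python
from statistics import mean, median

# Hypothetical on-chip monitor readings, keyed by (lot, wafer, die, section).
# Values are illustrative delay-degradation figures in arbitrary units.
readings = [
    ("lot1", 1, "d1", "a", 1.00), ("lot1", 1, "d1", "b", 1.02),
    ("lot1", 1, "d2", "a", 0.98), ("lot1", 1, "d2", "b", 1.01),
    ("lot1", 2, "d3", "a", 1.60), ("lot1", 2, "d3", "b", 1.55),  # suspect die
]

# Roll section-level readings up to a per-die mean.
by_die = {}
for lot, wafer, die, _section, val in readings:
    by_die.setdefault((lot, wafer, die), []).append(val)
die_means = {k: mean(v) for k, v in by_die.items()}

# Back up to the lot level: flag dies well above the lot's median reading.
# The 1.5x cutoff is an arbitrary illustrative threshold, not an industry rule.
lot_median = median(die_means.values())
suspects = [k for k, v in die_means.items() if v > 1.5 * lot_median]
print(suspects)
```

The same grouping step repeats at each level of the hierarchy, which is why the approach extends naturally from die sections to lots, and to dies within an advanced package.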

Packaging challenges
One of the big changes in chip manufacturing has been the growing complexity of the packaging. As the benefits of Moore’s Law diminish at each new node, an increasing number of chipmakers have opted for more customized and increasingly heterogeneous designs and architectures. Delivering reliable chips on schedule is hard enough, but putting multiple components into an advanced package complicates it further.

“In the automotive industry, more than 50% of the failures are due to packaging,” Strojwas said. “The assembly process itself is very complex and important for many industries, especially for automotive and military applications but also for the data centers. So we need to track where a particular chip came from, what part of the wafer, but that’s just the beginning of the story. You also need to understand what’s happening with substrates when you attach everything. In addition to the sensors that already exist, you now need to add an additional type of sensors and full traceability.”

Complex packaging creates challenges across the board, from inspection and metrology, to data collected in the field that is looped back into design and manufacturing.

“There are more patterns and shapes and complexities being brought in at the package level,” said Subodh Kulkarni, CEO of CyberOptics. “Some of those are very sophisticated packages. There are gaps between chips that are down to tens of microns, and within that there are some passive components. Now, somehow, we are expected to do 100% inspection with good accuracy, which is not trivial. It’s like trying to identify a few people in New York from a long distance away. So it is getting quite complex because of the dimensions and how tightly things are being packaged together. Extremely sophisticated front-end processes are now being used in the back end to process those packages and put them together.”

New uses for data
That, in turn, requires even deeper inspection and metrology, which generally lengthens the time it takes to perform the related process steps and creates even more data. And depending on what are considered acceptable yield and reliability metrics for any particular application or use case, it can have a significant impact on the overall process flow.

At the same time, deeper inspection opens up new opportunities for using that data in areas like atomic force microscopy (AFM). “Traditionally with AFM, we were working with step height roughness, but we expanded to CD (critical dimensions), where we were chasing smaller and smaller line/trench geometries,” said Ingo Schmitz, technical marketer at Bruker Nano Surfaces. “Now we’re moving toward measuring things that are more specific to EUV, like top-line roughness because of the off-axis illumination. Another really big area involves the reticle. A reticle can cost hundreds of thousands of dollars, so reticle repair and maintenance has become a really big deal. As reticles get used in the exposure systems, they become contaminated. So AFM is now used to identify defects and repair structures.”

Fig. 1: Contact holes in dual damascene trenches. Source: Bruker

EUV lithography is being used by Samsung, TSMC, and Intel at the most advanced nodes, but the dimensions are so tiny that some irregularities, which could be ignored in the past, now can have a significant impact on performance and power over time.

“An EUV blank should have zero defects, but there is no such thing,” said Schmitz. “What companies do is essentially place patterns onto the EUV blank so they have the least amount of defects. Defect size becomes more critical as you go down to the 10 to 20 nanometer arena, and AFM is the only system that remains at that dimension.”

Nearly all designs at the leading edge are customized in some way, and many are utilizing options they never would have considered in the past, such as multiple accelerators and different types of memory.

“We’re moving away from a homogeneous single piece of silicon that has a system-on-a-chip into the new paradigm of, ‘I’m going to build application-specific chips and integrate them into a system on a module and be able to get the benefits of unique processing, and put all that information together,'” said Onto’s McIntyre. “In the past, you could test it all homogeneously before it exited the wafer. Now, you’ve got to put all this extra assembly information together and bring that data to further integrate with it. Just because everything is passing spec doesn’t mean the integrated solution passes spec. That’s the challenge.”

Materials issues
The potential for defectivity extends well beyond the surface inspection and measurement. Materials in everything from the substrate and RDL to dielectric thin films need to be both pure and applied with atomic-level precision.

“Permanent materials are subject to harsh conditions and must survive the device’s expected lifetime without any chemical or mechanical changes,” said Cristina Matos, technology director for WLP materials at Brewer Science. “We have focused on designing new materials and balancing structure-property relationships to ensure materials meet expectations of accelerated aging, temperature cycling, and harsh storage conditions.”

This has broad implications for the entire supply chain. “There are several ways these demands have impacted manufacturing processes,” said Tom Brown, executive director of manufacturing, quality, and logistics at Brewer Science. “First, as end user expectations increase, customers’ performance requirements become more extensive. For example, a typical product certificate of acceptance may have once required only 10 items as criteria. Now, some customers have over 200 requirements listed on the COA (certificate of analysis). Additionally, customers’ production lines are realizing increased sensitivity to minor variations in their products, resulting in a tightening of existing expectations. Specification limits and control limits tighten with each generation. A final consideration is that the customer products, and therefore Brewer Science products, become more costly to make to meet the ever-increasing demands, further emphasizing the need for robust scrap/waste reduction and prevention programs.”

Purity and defectivity are rising concerns across a variety of new application areas, as well. This is evident with power electronics, which largely flew under the radar until very recently because they use older process technologies. But as they make their way into more safety-critical applications such as automotive, they are being subjected to the same kinds of scrutiny.

“Silicon carbide by its nature can be the substrate, and there are a lot of applications now where gallium nitride is mounted on silicon carbide substrates,” said PDF’s Strojwas. “Because that’s mostly for power electronics, or at least higher-than-usual power requirements, you have to deal with heat propagation.”

Strojwas noted that heat sinks are common for high-performance system packages, and that liquid cooling is now offered for really high-power mainframes, such as IBM’s Z series. Integration with the overall packaging is crucial, and all of this data has to be collected and checked, not just during wafer manufacturing, assembly, and test, but also in the field.

Those kinds of concerns extend across other substrates, as well, particularly in advanced packages where there may be a variety of materials being used. “On the substrate side, there is a lot of creative stuff going on,” said CyberOptics’ Kulkarni. “They are using different kinds of low-k materials, which provide better electrical performance. We have a sample of one of the more advanced packages right now in our lab, and they’re asking us to inspect it. On one hand, you have this extremely shiny copper bump on a pillar that is about 20 microns, and you have this perfect hemisphere on top. And we are using optical technology, so literally you get one pixel that we can look at in the camera, and we are trying to infer the height of that bump with that one pixel. That’s one extreme, where everything is perfectly reflective. But the whole thing is sitting on a very diffused substrate where there’s no reflectivity whatsoever. So we’re dealing with these issues of how do you design or project a dynamic range where you have a low-k, highly diffusive substrate, versus a highly shiny, perfectly curved copper mirror. You need to design the dynamic range of a projection scheme so the detectors either don’t get oversaturated or don’t see any signal at all. This is becoming much more of an issue in the substrate world than in the wafer world. This is a new area before the wafer even comes into the picture. The substrates are becoming more sophisticated and dimensions are getting smaller, and this is changing the dynamics for optical inspection.”
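Kulkarni’s dynamic-range problem can be made concrete with a back-of-the-envelope calculation. The reflectance values below are assumptions chosen purely to illustrate the scale of the problem, not measured figures from any real system:

```python
import math

# Assumed fractions of projected light returned to the detector.
r_bump = 0.6         # shiny specular copper bump (illustrative)
r_substrate = 0.001  # diffuse low-k substrate (illustrative)

# Intra-scene dynamic range the detector must span so the bump does not
# saturate while the substrate still registers above the noise floor.
ratio = r_bump / r_substrate
dr_db = 10 * math.log10(ratio)             # optical power ratio in decibels
bits_needed = math.ceil(math.log2(ratio))  # ADC bits to cover that span

print(f"{ratio:.0f}:1 -> {dr_db:.1f} dB, ~{bits_needed} ADC bits")
```

Even with these modest assumptions, a single camera frame has to span a 600:1 intensity ratio, which is why projection schemes with tailored dynamic range, rather than a single fixed exposure, become necessary.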

More choices, more data, more problems
In the past, when the leading edge was dominated by billion-unit designs that were updated every couple of years, there was enough time and manufacturing volume to be able to solve reliability issues with relative ease. But as designs become more customized, volumes are lower and the devices themselves are unique.

“One of the things that contributes to reliability is how many times you’re doing something,” said Brett Murdock, product marketing manager for memory interface IP at Synopsys. “The fact that we’re doing the same thing for every customer, or almost the same thing, means we’re really good at it, and it’s tried and true. You know it’s worked millions of units, so why would it be any different for this new AI customer that we’re selling HBM to for the first time? We’re not going to have to reinvent anything.”

Multiple choices of which processing elements and memories can be used are great for optimization, but they add unknowns and lots more data to sift through and correlate. This is evident in the choice of DRAM, for example, where HBM sometimes is chosen because there are fewer connectivity options, and less to go wrong.

“You’re going to have lower power with an HBM device, and fewer physical interfaces to deal with,” Murdock said. “With DDR/GDDR/LPDDR, with any of those interfaces, how you implement them physically on the SoC is the Wild West. It’s whatever you want. You can put a full linear PHY on the side of the die, you can wrap around a corner, you can fold it in on itself. There’s just an untold number of ways you can implement that physical interface.”

That raises questions about reliability, which become more challenging to answer the longer these devices are in use in the field.

“There’s a lot of monitors and sensors becoming part of circuit functionalities to give real-time data,” said Sathishkumar Balasubramanian, head of products for AMS verification at Siemens EDA. “And interestingly, we’ve been seeing reliability not just in automotive. We’re seeing it even in the mobile industry. We are dealing with a customer case where the storage they’re using in mobile was supposed to be designed for a certain lifetime, 2.5 to 3 years, but people are holding onto their phones for 4 or 5 years. They haven’t taken that into account, which is mainly on the aging side. In addition, especially on the memory side, the amount of read and write cycles that they go through is beyond what they were designed for, and that impacts reliability, as well.”

In addition to meeting specs for reliability, design teams are finding they often have to exceed those specs. “Designers, both on the fab side and the design side, are learning that we really need to design, not over-design, taking into account how much the use model is going to change and how we can make that more reliable,” said Balasubramanian. “So there are aging effects, and you also have to look at variation. How do you minimize variation? That starts right down from the building block of any circuit design. On a variation of a design, we are seeing a lot of customers making this a requirement for high-sigma requirements. People are starting early, all through the design flow, right from the library components. They want to make sure those components are robust. For example, they want to make sure that for a given standard cell library the process satisfies all different PVTs and an even wider range, even though there’s no way you can verify it.”
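Balasubramanian’s remark that “there’s no way you can verify it” by direct simulation is easy to quantify. The sketch below assumes a standard normal distribution of the varying parameter and shows how the one-sided Gaussian tail probability collapses as the sigma target rises, which is why brute-force Monte Carlo cannot reach high-sigma verification:

```python
from math import erf, sqrt

def tail_prob(sigma):
    """One-sided standard-normal tail probability beyond `sigma`."""
    return 0.5 * (1 - erf(sigma / sqrt(2)))

# Expected number of random samples needed to observe a single failure.
for s in (3, 4.5, 6):
    p = tail_prob(s)
    print(f"{s} sigma: fail prob ~{p:.2e}, ~{1/p:.0f} samples per failure")
```

At 3 sigma a few thousand simulations suffice; at 6 sigma roughly a billion samples are needed per observed failure, which is why specialized high-sigma techniques (importance sampling and the like) are used instead of naive sampling.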

Aging adds a whole new level of concern, particularly for safety-critical applications such as cars. “We have a bunch of applications around HTOL (high-temperature operating life) testing, where you can start to see how things age from the inside,” said proteanTecs’ Hutner. “That extends into the field. So as your new chip ages, you’ll start to see those kinds of effects. And if it starts to wear out — and you can predict when that will occur, because you have an aging model and you can see if it’s deteriorating slowly or quickly — then you can say, ‘In three months, or when you bring your car in, you need to replace that module.’ This was never possible before.”
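The kind of wear-out prediction Hutner describes can be illustrated with a simple extrapolation. Everything below is hypothetical: the monitor values, the failure threshold, and the linear-aging assumption are stand-ins for what would, in practice, be a far more sophisticated aging model:

```python
# Hypothetical monthly field readings of remaining timing margin (ps)
# from an in-chip monitor, degrading roughly linearly with age.
months = [0, 1, 2, 3, 4]
margin_ps = [50.0, 48.0, 46.1, 44.0, 42.0]

# Ordinary least-squares line fit: margin ≈ slope * month + intercept.
n = len(months)
mx = sum(months) / n
my = sum(margin_ps) / n
slope = sum((x - mx) * (y - my) for x, y in zip(months, margin_ps)) / \
        sum((x - mx) ** 2 for x in months)
intercept = my - slope * mx

# Extrapolate to an assumed minimum safe margin to predict replacement time.
threshold_ps = 30.0
months_to_threshold = (threshold_ps - intercept) / slope
print(f"Predicted to reach {threshold_ps} ps in ~{months_to_threshold:.1f} months")
```

Fitting the trend also distinguishes slow from fast deterioration, which is what turns raw monitor data into an actionable maintenance schedule.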

The slowdown in Moore’s Law scaling, coupled with a dramatic rise in the amount of processing required everywhere, has pushed chipmakers and systems companies to develop customized devices. Now they are demanding these systems be extremely reliable over longer lifetimes.

These are factors that typically don’t mesh well. As a result, the chip industry has been scrambling to figure out ways to put together more pieces, in smaller volumes, and still turn out increasingly complex solutions. That requires utilizing data gathered from more sources and more process steps.

Put simply, the industry is once again being shaken out of its comfort zone, challenging the old way of doing things, including proprietary attitudes toward data that needs to be shared across a disaggregated supply chain. None of this will happen easily, but it will have to change nonetheless as demand for reliability continues to grow — along with the growing consequences for making mistakes.
