Reliability Challenges Grow For 5/3nm

New transistors, materials and higher density are changing the testing paradigm.


Ensuring that chips will be reliable at 5nm and 3nm is becoming more difficult due to the introduction of new materials, new transistor structures, and the projected use of these chips in safety- and mission-critical applications.

Each of these elements adds its own set of challenges, but they are being compounded by the fact that many of these chips will end up in advanced packages or modules, in applications where stress based on heat, vibration and other physical effects is no longer completely predictable. Add to that greatly increased dynamic power density, more physical effects that can affect aging models, and the fact that 5nm and 3nm digital circuits are starting to behave much more like analog circuits, subjecting them to a host of analog problems such as electromagnetic interference and drift that were never considered major issues in digital design.

Beyond that, increased density, smaller features and advanced packaging are making it more difficult to inspect, measure and test chips with sufficient confidence that full coverage has been achieved.

“When finFETs were introduced, there were some very specific differences in behavior,” said Geir Eide, product marketing director for test at Mentor, a Siemens Business. “For gate-all-around, more data needs to be collected to see the full effect of that. Where we are at now, we have the capability to model and generate data patterns at a transistor level. That’s the infrastructure needed to tackle not just finFETs, but also new transistor types. We have the right framework to address a wide range of defect types that can happen also in gate-all-around and other types of transistors. But there’s still more development required as more data and more actual results from that type of technology become available. But just looking at what has happened in the past, in all likelihood we are going to see some amount of unique and new defects. That always happens at each node. The bulk of the defects are similar to what was there in the past, but there’s always some set of unique things that happen.”

And with more unique designs, rather than billion-unit volumes for a single chip, that complexity becomes even greater. Many of these designs are heterogeneous, as well, and include some version of AI/ML/DL, which adds to the uncertainty about what kind of impact a latent defect will have on the functionality of a chip.

“Heterogeneous integration processing is still early in the cycle,” said Amy Leong, chief marketing officer at FormFactor. “Right now, that means more measurements have to be taken. Over time, that number has to come down with more measurement intelligence. We’ve seen that with the start of every technology node. What’s different now is there is no uniform roadmap, and there are quite a few different technologies. There also is an array of ways to put LEGO blocks together.”

That puts a much bigger emphasis on customized probe products — analytical probes for engineering and probe cards for high-volume manufacturing. It also requires deeper inspection and metrology because the devices are more diverse, they are smaller, and they are packed together much more tightly.

“Testing and inspection at advanced nodes presents challenges because sizes are getting so small,” said Subodh Kulkarni, president and CEO of CyberOptics. “The semi industry has historically been able to deal with those challenges and will continue to do so. In addition, the third dimension is coming into play in advanced packaging with processes like stacking and heterogeneous integration with a higher degree of complexity, so the need for 100% inspection is increasing rapidly. That’s where some of the more exciting technology developments are happening.”

5nm issues
At 5nm, finFETs begin to run out of steam for a number of reasons, including the difficulty of moving signals through skinny wires, the thinner dielectrics used to insulate various components, and the decreasing ability to control current leakage through elongated gates. Even manufacturing becomes a bigger issue because with finFETs, the fins need to be taller at each new node.

“One problem is that the fin itself has to be stronger,” said Tomasz Brozek, senior fellow at PDF Solutions. “But with a taller fin, the gate also has to be taller, because the gate sits on the fin on three sides — two sidewalls and the top. That means material has to be removed, replaced, metal has to be deposited inside of the trenches, and the contacts have to be built on a source/drain epitaxial region. So the epi growth needs to be deeper and taller, and the contact itself also would be built around the source and drain to reduce the contact resistance. That adds to the complexity. You can make the fin even taller, but then the current would be flowing more at the top part of the fin. The current distribution will be from the contact at the source and drain, and then through the fin. Increasing the fin height doesn’t buy you the same amount of performance increase as you would expect from channel width increase coming from the fin height growth.”

This becomes particularly important in the automotive market, where the most advanced processes are being used for the AI portion of an increasingly autonomous vehicle. No advanced-node processes ever have been used on a mass scale in chips that operate in harsh environments such as a car. In server racks, for example, individual servers can shift loads to other servers if the internal temperature is too high. Likewise, if a smartphone is left in the sun and exceeds a certain temperature, it will shut down until the temperature drops below a pre-set limit. This isn’t possible in a car, and heat can have a big impact on everything from latency in memory to acceleration of circuit aging.
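
The link between heat and circuit aging is often approximated with the Arrhenius model, in which the acceleration factor between a use temperature and a hotter stress temperature is exp((Ea/k)(1/T_use - 1/T_stress)). The Python sketch below is illustrative only. The 0.7 eV activation energy and the two temperatures are assumed values chosen for the example, not figures from any company quoted here, and real failure mechanisms each have their own activation energy.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_use_c, t_stress_c, activation_energy_ev=0.7):
    """Return how many times faster a thermally activated mechanism
    proceeds at t_stress_c than at t_use_c (both in degrees Celsius).

    activation_energy_ev defaults to 0.7 eV, an assumed illustrative value.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_use_k - 1.0 / t_stress_k
    )
    return math.exp(exponent)


if __name__ == "__main__":
    # Hypothetical comparison: a 55 C junction versus a 125 C junction.
    factor = arrhenius_acceleration(55.0, 125.0)
    print(f"Aging at 125 C is roughly {factor:.0f}x faster than at 55 C")
```

Under these assumptions, a few tens of degrees of extra junction temperature translates into aging that is faster by well over an order of magnitude, which is why a chip that cannot throttle or shut down is a harder reliability problem.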

“We’ve run for a long time on really leading-edge nodes with consumer-oriented mindsets regarding failure rates in the field,” said Chet Lenox, senior director of industry and customer collaboration at KLA. “And so from our perspective, we’ve got a set of tools that are primarily used for process control. And we’re developing techniques that allow us to utilize data in the fab more for predicting reliability failures in the field. Machine learning is a big part of this. In the past inspection and metrology were used for process control only in the fab. That included control excursions, figuring out what your defect paretos look like to drive down the leading defects, and keeping CDs in control so parametrics at the end of the line are good. The final arbiter was sort/yield and final package test. If that was green, you should have been good. What we’re finding now is those things don’t catch latent defects and reliability failures. So a part can pass, a defect mode can activate later in the car. That’s hole number one. Hole number two is that you have huge test coverage gaps in those portions of the die that you don’t test. With modern digital devices, you can’t test the entire device. So the combination of unactivated latent defects and test coverage gaps is the big hole that needs to be filled for automotive. For systems that are going in cars, that hole is being filled by lots of redundancy. You don’t want to have to do that.”

Alongside of that, there are issues with collecting enough data to provide sufficient coverage.

“At 5nm, defect and metrology data is becoming much harder to collect,” said Doug Elder, vice president and general manager of OptimalPlus. “Therefore, the ability to take action on that is going to be delayed until further down in the process. That affects your ability to get to wafer-level data that allows you to feed that back to determine whether you have a recipe problem or an etch problem, and where that problem is occurring. On one side, visibility, both electrically and what you can actually see, is reduced. On the other side, there may be too much data. If you look at the size of the chips, they’re generating so much data that people aren’t sure what to do with all of that. We’ve been taking machine-vision images and running algorithms on those to determine whether something is good or bad. We can digitize a very sophisticated image, and with frame grabbers and machine vision you can take a very well-defined image, run machine learning on it to determine what are the points that are of interest to the manufacturing process, and then determine if it is good or bad. There are more and more issues in terms of either getting access to that data, or in some cases figuring out what to do with so much data.”

3nm issues
5nm and 3nm both add significant new reliability challenges. But at 3nm, the number and magnitude of those issues are less well understood for a number of reasons. Among them:

  • Gate-all-around transistors will replace finFETs. Samsung already has announced that it will move to nanosheet FETs at 3nm. TSMC plans to use finFETs, at least at first, although it could well change over to some type of gate-all-around transistor, as well.
  • The introduction of new materials, such as cobalt and ruthenium, as well as new thin films, each of which will be thinner, smaller and more difficult to clean, polish, measure and inspect.
  • A subtle shift in digital logic at each new node toward increasingly analog-like behavior, forcing chip engineers to begin wrestling with issues such as signal drift, noise and different stresses.

“Every new material causes problems,” said Andrzej Strojwas, PDF’s chief technologist. “For several generations, shorts between gates and contacts has been the dominant cause of failure. The foundries are responsible for yield, so they will eliminate the shorts, or at least try to. But there also could be leakages, and those leakages will lead to reliability failures. It could be the result of overlay or of some spacer weakness, so now you have those dangers of latent defects. And in markets like automotive, we have to correct hard shorts as well as leakages. This is why we’ve been focused on characterizing these leakages.”

As digital logic begins to become more analog-like, this becomes even more troublesome because analog signals are subject to drift over time as circuits age.

“It’s beyond just drift,” said Strojwas. “You have to look at stresses, which could cause reliability failures. You have to put monitors into the actual die to let you know if you are running into trouble. You have to look at mechanical stresses after you thin the die, which you put into the package or interposer. So now you have to see what’s changing from when you do the wafer sort and burn-in to what’s happening out in the field. You have to produce monitoring data throughout the lifetime of the product.”
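
The kind of lifetime monitoring Strojwas describes can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation. It assumes a hypothetical on-die monitor that reports a frequency in MHz, a baseline captured at wafer sort, and an arbitrary 5% slowdown threshold. The names `MonitorReading`, `relative_drift` and `first_alarm` are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class MonitorReading:
    """One sample from a hypothetical on-die monitor."""
    hours_in_field: float
    frequency_mhz: float


def relative_drift(baseline_mhz, reading_mhz):
    """Fractional change versus the baseline (negative means slower)."""
    return (reading_mhz - baseline_mhz) / baseline_mhz


def first_alarm(baseline_mhz, readings, max_slowdown=0.05):
    """Return the earliest reading whose slowdown exceeds max_slowdown,
    or None if the part stays within the allowed drift.

    max_slowdown=0.05 (5%) is an assumed threshold for illustration.
    """
    for reading in sorted(readings, key=lambda r: r.hours_in_field):
        slowdown = -relative_drift(baseline_mhz, reading.frequency_mhz)
        if slowdown > max_slowdown:
            return reading
    return None


if __name__ == "__main__":
    # Made-up field data for a part that measured 1000 MHz at wafer sort.
    history = [
        MonitorReading(0.0, 1000.0),
        MonitorReading(5000.0, 980.0),
        MonitorReading(20000.0, 940.0),
    ]
    alarm = first_alarm(1000.0, history)
    print("No alarm" if alarm is None else f"Alarm at {alarm.hours_in_field} hours")
```

The point of the comparison against a sort-time baseline is the one made in the quote: what matters is the change between wafer sort and burn-in and what the part is doing out in the field.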

At 3nm, this becomes even more critical because second- and third-order effects become first-order effects. This is exactly what is happening with inductance at each new node.

“You can ignore inductance if resistance is low,” said João Geada, chief technologist at Ansys. “But when resistance starts climbing slowly because you’re using more exotic metals at the lower levels of metallization, then inductance starts to be something you need to pay a lot more attention to — and not just for resonators and antennas. Your clock starts acting like an EM emitter and couples to any nearby wire that has the correct orientation. Inductance is not an effect that can be ignored on chip anymore.”

In-circuit testing
As these advanced-node chips are used in more mission- and safety-critical applications, there has been much more focus on in-circuit data. Nearly all of the companies working in the data analytics space offer some version of in-circuit sensors for different phases of development. Increasingly, they are being used well beyond manufacturing, though, to map how these chips perform under real-world conditions. As a result, data collection is now shifting both left and right, effectively setting up a data loop that relays data back to the manufacturer as well as chip design teams to correct defects and make improvements in future devices.

“You cannot test everything,” said John O’Donnell, CEO of yieldHUB. “Generally you test for one parameter or you test for another to get something out of the data you couldn’t do otherwise. But you also can check the database for trends and go beyond what is available at test, and this is becoming very important to customers. In multi-chip packages, you can detect what’s going on in individual die from the data and determine which die is which. And you can trace multiple chips right back to the right fabs.”
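
One common way to look beyond fixed test limits is a statistical outlier screen that flags parts which pass the specification but sit far from the rest of the population. The Python sketch below uses the median and the median absolute deviation for that purpose. It is a generic illustration, not a description of yieldHUB's product. The measurements, the 6-sigma multiplier and the function name `robust_outliers` are assumptions made for the example.

```python
import statistics


def robust_outliers(values, k=6.0):
    """Return indices of values more than k robust sigmas from the median.

    The robust sigma is 1.4826 times the median absolute deviation, which
    matches the standard deviation for normally distributed data.
    """
    median = statistics.median(values)
    deviations = [abs(v - median) for v in values]
    robust_sigma = 1.4826 * statistics.median(deviations)
    if robust_sigma == 0:
        return []  # no spread to measure against, so flag nothing
    return [i for i, d in enumerate(deviations) if d > k * robust_sigma]


if __name__ == "__main__":
    # Made-up leakage readings (arbitrary units) with spec limits of 0.5 to 1.5.
    # Every die passes the spec, but the last one is far from its neighbors.
    leakage = [1.00, 1.01, 0.99, 1.02, 0.98, 1.00, 1.01, 0.99, 1.30]
    print("Suspect die indices:", robust_outliers(leakage))
```

A die flagged this way is not necessarily defective, but it is a candidate for the latent-defect risk discussed above, and the same idea extends to trends across lots, fabs or die positions within a multi-chip package.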

This is important for other reasons, as well. As chips become denser and are used for critical applications, it’s important that they continue to work, even if that performance isn’t optimal, until they can be replaced. This is one of the basic principles of ISO 26262 in automotive, where devices need to fail over gracefully, whether that’s to a fully redundant circuit or another device or module within a vehicle.

“That’s the underlying thing that happens each time there is a new advancement in manufacturing,” said Mentor’s Eide. “When something doesn’t work, the way it doesn’t work can be for different reasons and it behaves differently when it’s defective. For example, on old processes if a transistor is broken, it simply stops working. If a finFET transistor doesn’t work, it can still work. Think about finFET as a switch that has three connections and these three connections are supposed to work in parallel. If one of them doesn’t work, the switch still works, but now it’s slower. How you have to test for that now is a little bit different. And so for each kind of increment in manufacturing, there’s always a chance for new types of defects that you have to first understand that they’re there and then test in different ways. One of the things that we have done in our industry is to target defects more at the transistor level. ICs are designed with different building blocks, where the transistor is the lowest level. In the past, it simplified the problem to address the slightly bigger building blocks. Now there’s a shift toward looking more at the transistor-level behavior of things to target. So when we’re creating tests to target the transistor-level behavior, what types of defects we’re chasing and how we chase them, over time that becomes more and more complex.”
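
Eide's switch analogy can be made concrete with a first-order model in which drive current scales with the number of working fins and gate delay is roughly C·V/I. The Python sketch below is purely illustrative. The per-fin current, load capacitance, supply voltage and 11 ps timing limit are made-up values, and the function names are hypothetical. It shows why a static pass/fail check can miss a broken fin while a timing check catches it.

```python
def gate_delay_s(working_fins, i_per_fin_a=50e-6, load_c_f=2e-15, vdd_v=0.7):
    """First-order gate delay (seconds): load charge divided by drive current.

    All default values are assumed for illustration, not process data.
    """
    if working_fins <= 0:
        raise ValueError("a transistor with no working fins cannot switch")
    return load_c_f * vdd_v / (working_fins * i_per_fin_a)


def passes_static_test(working_fins):
    """A static test only checks that the output eventually switches."""
    return working_fins >= 1


def passes_at_speed_test(working_fins, max_delay_s):
    """A timing test also checks that the output switches fast enough."""
    return working_fins >= 1 and gate_delay_s(working_fins) <= max_delay_s


if __name__ == "__main__":
    limit_s = 11e-12  # hypothetical timing budget for this gate
    for fins in (3, 2, 1):
        delay_ps = gate_delay_s(fins) * 1e12
        print(
            f"{fins} working fins: delay {delay_ps:.1f} ps, "
            f"static pass={passes_static_test(fins)}, "
            f"at-speed pass={passes_at_speed_test(fins, limit_s)}"
        )
```

In this toy model a transistor with one of three fins broken still switches, so it passes the static check, but its delay grows by half and it misses the timing limit.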

Test, inspection and metrology are changing at a fundamental level as new nodes and new packaging approaches render previous approaches insufficient to guarantee coverage and ensure quality over time under a wide range of use models and across many different applications. This is transforming test into a continuum of operations that are conducted regularly throughout a product’s lifetime, and it has made good data invaluable to a number of players across the supply chain.

The result is that a number of companies — proteanTecs, PDF, OptimalPlus, yieldHUB, Mentor, Moortec and UltraSoC, among others — are racing to get their sensor technology designed into chips and packages to detect everything from temperature variation to voltage changes and data traffic anomalies. Over the next few years, this could have a big impact on everything from margin in manufacturing processes to predictive analytics and automatic fixes based upon a continuing stream of real-time data, which will add a level of precision never seen before in inspection, metrology and testing for latent and obscured defects.
