Making Random Variation Less Random

Limits on margin and a growing emphasis on reliability are changing the economics of unknowns.


The economics of random variation are changing, particularly at advanced nodes and in complex packaging schemes.

Random variation always will exist in semiconductor manufacturing processes, but much of what is called random has a traceable root cause. It is classified as random because tracking down all of the quirks in a complex manufacturing process, in materials, or in unusual use cases is expensive. In the past, most of these quirks did not impact yield, but the equation is beginning to change for a number of reasons:

  • The cost of developing a chip is going up at each new node, while tolerances are going down. Adding extra margin to offset random variation can have a big impact on power, performance and area (PPA), which limits the benefits of scaling.
  • The number of possible interactions in a heterogeneous design is growing, and classifying some behaviors as random can impact functionality, particularly in AI chips that are supposed to self-optimize over time.
  • Reliability concerns in safety-critical applications require much more attention to potential unknowns, especially in harsh conditions over longer expected lifetimes of chips.
  • Truly random variation becomes more prominent as features shrink and films become thinner, and a difference of a few atoms can have a big impact on functionality and yield. As a result, it becomes economically imperative to identify more sources of variation than in the past.

“You always design as much variation out of a process as possible,” said Suk Lee, senior director of TSMC’s Design Infrastructure Management Division. “Random variation has been a big problem throughout manufacturing. So you do things like channel placement through TCAD, and attack it everywhere. And you want to design out as much random variation as possible, especially for ultra-low voltage and all the different corners. But as you move into these areas, you’re dealing with more natural variation, too.”

In absolute numbers, the semiconductor industry has been reducing what it considers random variation for some time. But at each new node the tolerances become significantly tighter, particularly with finFETs, gate-all-around FETs, and multi-chip packages.

“The margin for random variation goes down more than the amount of random variation goes up, whether those are stoichiometry changes in materials or radiation effects in the electrical performance of the devices,” said Chet Lenox, director of process control solutions for new technology and R&D at KLA. “One of the classic examples involves the stochastics in EUV. Because we are running that at tighter and tighter margins, previous random variation that we used to be able to tolerate, we no longer can tolerate as well. That’s not really new, though. We’ve been working in an environment like that for the past 30 years, where successive process generations had tighter process windows. Process development is just as much about reducing variation as it is about finding the right integration method and tuning your transistors. We have been doing this a long time. Reducing variation is key to our business model.”

This is important because random variations also can interact with one another in unexpected ways.

“In fixed amounts, variation has come way down,” said David Fried, vice president of computational products at Lam Research/Coventor. “If you were to compare total variation in channel lengths at 5nm CMOS versus 60nm CMOS, the more advanced technology has much less absolute variation. But if you talk about variation as a function of nominal, we’ve stayed the same or gotten worse. Random variation is a larger percentage of the overall value than it used to be, even though in absolute terms it has gone way down. And there are combinations of all sorts of variation that we have never had before. Each one may have been tightened up dramatically, but there are so many variations interacting with each other in complex ways, and the specifications are so tight, that your reliability requirements can break in ways that you never previously considered.”
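Fried's distinction between absolute and relative variation can be made concrete with a simple model. The sketch below assumes independent, normally distributed sources combined by root-sum-square (RSS), and uses hypothetical channel lengths and sigmas, not figures from Lam or from this article. It compares a process with less absolute variation against one with more, measured as a share of each nominal dimension.

```python
import math


def rss(sigmas):
    """Combine independent variation sources by root-sum-square."""
    return math.sqrt(sum(s * s for s in sigmas))


def relative_variation(nominal_nm, sigmas_nm, k=3.0):
    """Total k-sigma variation as a fraction of the nominal dimension."""
    return k * rss(sigmas_nm) / nominal_nm


# Hypothetical values for illustration only: three independent sources per process.
older = {"nominal": 60.0, "sigmas": [1.2, 1.0, 1.2]}
newer = {"nominal": 16.0, "sigmas": [0.5, 0.4, 0.5]}

for name, p in (("older", older), ("newer", newer)):
    abs_3sigma = 3.0 * rss(p["sigmas"])
    rel = relative_variation(p["nominal"], p["sigmas"])
    print(f"{name}: 3-sigma = {abs_3sigma:.2f}nm, {rel:.1%} of nominal")
```

Independence is the simplest case. Correlated or interacting sources, which Fried describes, are not captured by an RSS sum.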

One of the reasons for the tighter tolerances is the thinness of dielectric insulation, which in some cases is now measured in single-digit numbers of atoms.

“New integration schemes and scaling methods require precise deposition of ultra-thin film stacks,” said Niranjan Khasgiwale, vice president for imaging and process control at Applied Materials. “For example, in the case of MRAM some of the layers must be only 8 to 12 atoms high, with film uniformity variances of less than the height of a single atom. A few missing atoms, which can be random fluctuation, can result in a significant change in electrical properties.”
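The arithmetic behind Khasgiwale's example is worth spelling out. If a film is only 8 to 12 atomic layers thick, one layer is 1/8 to 1/12 of the total, so a uniformity budget smaller than one atom's height means holding thickness variation under about 8.3% to 12.5% of the film. The sketch assumes every layer has the same height, a simplification; how much a given thickness change shifts electrical properties depends on the film and is not modeled here.

```python
def one_layer_fraction(n_layers):
    """Fraction of total film thickness represented by a single atomic layer.

    Assumes every layer has the same height (a simplification).
    """
    return 1.0 / n_layers


for n_layers in (8, 10, 12):
    print(f"{n_layers}-layer film: one layer = {one_layer_fraction(n_layers):.1%} of thickness")
```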

Fig. 1: The goal is linear behavior, in this case involving an ultra-thin tantalum film. Source: Applied Materials

Economies of scaling
The business case for finding these kinds of problems varies greatly. It depends on the application, the process node, and whether there is a single chip or multiple chips in a package.

“This is why you’re seeing a lot more metrology in higher-end designs, because you want to collect data in fabs for everything from overlay between process controls and CD control,” said Warren Flack, vice president of worldwide applications in Veeco’s Ultratech Division. “But when you add steps, that requires you to buy more equipment and it takes more time, so you need to get higher yield and better reliability to help offset those costs. In lower-end chips there is not as much money being spent on metrology, and there is a lot more error that looks random but which is generally correctable. At some point, it’s not worth solving and it’s justifiable to lose a little yield.”
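Flack's trade-off can be framed as a per-wafer break-even. The sketch below is a hypothetical model, not a Veeco costing method: an added control step pays for itself only if the value of the die it recovers exceeds what the step costs. The `Product` type and all numbers are invented for illustration, and the model ignores the reliability benefit Flack also mentions.

```python
from dataclasses import dataclass


@dataclass
class Product:
    gross_die_per_wafer: int
    value_per_good_die: float  # hypothetical selling value of one good die


def net_benefit_per_wafer(product, yield_gain, step_cost_per_wafer):
    """Value of die recovered by an extra control step, minus the step's cost.

    yield_gain: absolute yield recovered (0.01 = one percentage point).
    step_cost_per_wafer: equipment, cycle time and labor for the step, per wafer.
    """
    recovered_die = product.gross_die_per_wafer * yield_gain
    return recovered_die * product.value_per_good_die - step_cost_per_wafer


# Hypothetical products with the same die count but very different die values.
high_end = Product(gross_die_per_wafer=600, value_per_good_die=50.0)
low_end = Product(gross_die_per_wafer=600, value_per_good_die=0.50)

for name, p in (("high-end", high_end), ("low-end", low_end)):
    net = net_benefit_per_wafer(p, yield_gain=0.01, step_cost_per_wafer=100.0)
    verdict = "add the step" if net > 0 else "accept the yield loss"
    print(f"{name}: net ${net:,.2f} per wafer -> {verdict}")
```

The same one-point yield gain is worth adding a step for on the high-value part and not on the low-value one, which is the "not worth solving" case Flack describes.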

This is particularly evident with new technologies, where reliability metrics are highly dependent on the end application. This is true even with printed electronics, for which there are no standardized testing procedures yet. As a result, domain expertise needs to be applied to determine what can remain “random” and which variation needs to be traced to a root cause.

“Every time we go off-the-shelf we find deficiencies in products, and then it requires expertise to get us to something that’s more acceptable,” said Will Stone, director of printed electronics integrations and operations at Brewer Science. “We are highly vertically integrated. That allows us to go straight to the designer of a material and say, ‘This is good, and this is what needs to be fixed.’ That can be a real challenge for non-vertically integrated companies.”

Some of what is considered random variation can be explained through more rigorous testing and inspection and either addressed or avoided, but that also takes time to show up in real-world applications.

“There is still a lot of physical examination required,” said Stone. “In the sensor world, we’ll make thousands of samples, but there is still a lot of physical examination and testing. We’re looking at automating that more to make sure that an FHE (flexible hybrid electronics) circuit is assembled correctly and that the epoxy adheres correctly.”

Advanced packaging adds another element to this equation, because even in lower-end applications the cost of variation across multiple chips is significantly higher than in a single chip.

“As we expand into advanced packaging, the bar shifts for when you spend too much money and time,” said Ajit Paranjpe, CTO of Veeco. “You may have randomness due to a factor you cannot understand or control, and every factor has natural variation. So random variation may be improved by better heat control or it may be layout-dependent. The question is how much you’re willing to invest to solve these issues.”
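One reason the bar shifts in packaging is that failures compound. Under the simplifying assumptions that die and assembly failures are independent and that one bad die scraps the whole package (no known-good-die screening), package yield is the product of the individual yields, and the value at risk grows with every die added. The yields and costs below are hypothetical.

```python
import math


def package_yield(die_yields, assembly_yield=1.0):
    """Probability that every die and the assembly step are good (independent failures)."""
    return assembly_yield * math.prod(die_yields)


def expected_scrap_cost(die_costs, die_yields, assembly_yield=1.0, assembly_cost=0.0):
    """Expected value lost per package if any single failure scraps the whole package."""
    package_cost = sum(die_costs) + assembly_cost
    return (1.0 - package_yield(die_yields, assembly_yield)) * package_cost


# Hypothetical: each die costs $100 and has 99% yield; assembly assumed perfect.
single = expected_scrap_cost([100.0], [0.99])
quad = expected_scrap_cost([100.0] * 4, [0.99] * 4)
print(f"1 die: yield {package_yield([0.99]):.2%}, expected scrap ${single:.2f}")
print(f"4 die: yield {package_yield([0.99] * 4):.2%}, expected scrap ${quad:.2f}")
```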

Reliability concerns
The use of complex chips in safety-critical applications, such as the logic for autonomous or assisted driving, adds new concerns about random variation, particularly for advanced-node designs. In the past, chips developed at leading-edge nodes never were used under extreme conditions. All of that has changed in the effort to build increasingly autonomous vehicles because AI logic chips require as much die area as possible to process data quickly. On top of that, liability concerns are especially high in this market, which is why automakers are demanding these chips perform to spec for at least 18 years.

“You must find issues before product qualification,” said Lam’s Fried. “You can’t wait 10 years to discover an issue. This is what elevated voltage testing is for. If a chip is designed to run at 1 volt at 85°C, that’s its normal condition. So you run that part at 1.6 volts at 125°C for two months in an oven, and that’s going to simulate 15 years of actual in-use testing. That’s how you bring out these defects before you qualify the product. The problem is that it’s becoming very difficult to simulate those acceleration conditions and not do instantaneous damage. As the nodes get more and more advanced, figuring out those conditions becomes increasingly difficult. The application requirements are going up. The technology is getting smaller and more complex. So it makes it more difficult to accelerate defects. There is more data and more processing needed to bring out defects early.”
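Fried's numbers imply an acceleration factor of roughly 90, since two months stands in for about 180 months of use. A common way to estimate such a factor multiplies an Arrhenius temperature term by a voltage term. The sketch below uses that approach with a simple exponential voltage model. The activation energy and voltage constant are assumptions, picked here so the result lands near Fried's figure; real values are mechanism-specific and come from characterization.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant, eV/K


def arrhenius_af(t_use_c, t_stress_c, ea_ev):
    """Thermal acceleration factor (Arrhenius model), temperatures in Celsius."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))


def voltage_af(v_use, v_stress, gamma_per_v):
    """Voltage acceleration factor from a simple exponential model."""
    return math.exp(gamma_per_v * (v_stress - v_use))


EA_EV = 0.7        # assumed activation energy; real values depend on the failure mechanism
GAMMA_PER_V = 3.7  # assumed voltage constant, chosen for illustration

af = arrhenius_af(85.0, 125.0, EA_EV) * voltage_af(1.0, 1.6, GAMMA_PER_V)
stress_months = 2.0
equivalent_years = stress_months * af / 12.0
print(f"combined acceleration ~{af:.0f}x -> ~{equivalent_years:.1f} years of use")
```

Raising stress temperature or voltage shortens the test, but as Fried notes, pushing too hard can cause damage that would never occur in the field. The extrapolation holds only while the same failure mechanism dominates at both stress and use conditions.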

The crossover between random variation and reliability is not new, but reducing random variation in the context of reliability often has been brushed aside in the past.

“Reliability is the tail of the yield curve,” said Fried. “Think about two conductors separated by a small bit of dielectric, and the fact that the separation of those conductors has some variation. Maybe the nominal thickness of the dielectric is 15nm, and you know that you have a potential reliability issue if the thickness ever goes down to 7nm. You probably have to set a yield limit somewhere around 9nm at time zero to cover yourself. With reliability requirements in critical applications such as autonomous driving systems, you need to build in a larger yield buffer because the impact of reliability failure is so significant.”
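Fried's dielectric example can be sketched with a normal distribution. The 15nm nominal, 7nm failure point and 9nm screen come from his quote; the 2nm process sigma and the 10nm alternative screen are hypothetical. The sketch compares the at-risk tail with the share of parts each time-zero screen rejects, which is the yield cost of a larger buffer.

```python
from statistics import NormalDist

NOMINAL_NM = 15.0  # nominal dielectric thickness (from the quote)
FAIL_NM = 7.0      # thickness at which a reliability issue is expected (from the quote)
SIGMA_NM = 2.0     # assumed process sigma (hypothetical)

thickness = NormalDist(mu=NOMINAL_NM, sigma=SIGMA_NM)
print(f"below {FAIL_NM:.0f}nm at time zero: {thickness.cdf(FAIL_NM) * 1e6:.1f} ppm")

# 9nm is the screen from the quote; 10nm is a hypothetical larger buffer.
for screen_nm in (9.0, 10.0):
    rejected = thickness.cdf(screen_nm)  # fraction thinner than the screen limit
    guard_band = screen_nm - FAIL_NM     # margin left for drift and measurement error
    print(f"screen at {screen_nm:.0f}nm: {rejected * 1e6:,.0f} ppm rejected, "
          f"{guard_band:.0f}nm guard band")
```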

Addressing this becomes tougher at each new node, and as more chips from those advanced nodes are used in safety-critical applications.

“You can trace failures back to all of their individual root causes and how they drive the reliability mechanism,” he noted. “You also can aggressively screen with testing at the output end and accept that there is variation that you won’t be able to trace back into individual process control. So you do things like elevated voltage testing, where you try to drive out the defects early. Increasingly, you have to take both tacks. You trace back to individual variations and tighten them up, but you recognize you’ll never get all of them. You will have to take an aggressive stance on pre-screen, elevated voltage tests, and schemes like that.”

In automotive applications, this is something of a double whammy. Process margin generally is dictated by device performance and scaling, but automakers want to trim costs and reduce power for these systems. The emphasis is on less margin, less redundancy, and ultimately less weight.

“Understanding the sources of random variation is important in order to implement the right strategy to detect and correct for deviations that result in narrow distribution so that the process capability index (Cpk) can be achieved,” said Applied’s Khasgiwale. “Process systems generate lots of data, and in the past most of it went unused either because the windows were larger or there were limited knobs on the process system to react to the information. However, a lot of attention is now being given to reducing the unknowns, and equipment providers are dealing with narrowing process windows in many ways.”
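Cpk measures how far the process mean sits from the nearer specification limit, in units of three standard deviations: Cpk = min(USL - μ, μ - LSL) / 3σ. The sketch below computes it from hypothetical measurements against hypothetical limits. It shows why a narrow, well-centered distribution matters: Cpk improves by shrinking σ or by re-centering the mean, and a drift toward one limit lowers it even when σ is unchanged.

```python
from statistics import mean, stdev


def cpk(samples, lsl, usl):
    """Process capability index from sample data and lower/upper spec limits."""
    mu = mean(samples)
    sigma = stdev(samples)  # sample standard deviation
    return min(usl - mu, mu - lsl) / (3.0 * sigma)


# Hypothetical measurements (arbitrary units) and spec limits.
LSL, USL = 9.5, 10.5
centered = [10.0, 10.1, 9.9, 10.05, 9.95]
drifted = [x + 0.3 for x in centered]  # same spread, mean shifted toward USL

print(f"centered Cpk = {cpk(centered, LSL, USL):.2f}")
print(f"drifted Cpk  = {cpk(drifted, LSL, USL):.2f}")
```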

Among the solutions Khasgiwale cited were using first-principles models and new materials and chemistries to achieve a tighter, repeatable process; developing best-known methods to maintain chamber and process conditions within narrow windows; and adding more sensors into equipment, as well as more process knobs, to correlate for finer control of the process.

Machine learning helps in this area, as well, because it can be used to identify patterns in data that are not discernible by humans.

“As the number of sensors and knobs increase on each system, mapping out process space becomes difficult using a simple design of experiments (DOE),” Khasgiwale said. “OEMs are implementing new AI/ML methods to map out multidimensional space and enable the process system to stay within the target sweet spot.”
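Khasgiwale's point about design of experiments is a combinatorial one. A full-factorial DOE runs every combination of knob settings, so the number of runs is the product of the levels per knob and grows exponentially as knobs are added. The sketch assumes three levels per knob, a hypothetical choice; fractional designs and model-based methods exist to avoid enumerating this full grid.

```python
from itertools import product
from math import prod


def full_factorial_runs(levels_per_knob):
    """Runs in a full-factorial design: the product of the level counts."""
    return prod(levels_per_knob)


for n_knobs in (3, 6, 10, 20):
    runs = full_factorial_runs([3] * n_knobs)
    print(f"{n_knobs:>2} knobs at 3 levels -> {runs:,} runs")

# The actual settings for a tiny design: 2 knobs, coded levels -1/0/+1.
grid = list(product([-1, 0, 1], repeat=2))
print(grid)
```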

Cost concerns
All of this comes at a cost, though, and what is acceptable may vary greatly from one vendor to the next, and from one product to the next.

“There is systematic yield loss cause and effect, and parametric limited yield loss, and then there is random yield loss, which is a systematic loss that you haven’t figured out yet,” said John Kibarian, CEO of PDF Solutions. “That may be due to contamination in the chamber or some other element, but there is always a certain amount of yield loss for which the cost to ameliorate that problem is greater than the benefit. There are intrinsically random things. If you start squeezing down the size of the fin, there are a handful of atoms for doping in that fin. If you add one or more atoms into that fin you’re going to draw a certain level of variation. But generally speaking, you start looking at ways of removing the need for whatever that factor is. It may be shot noise randomness in photoresists. But when you start counting atoms or photons and you get down to a small number of those, you’re going to get a certain level of randomness in the system, and that has to be taken out with materials, process or circuit innovation. So you will live with it at a certain level, and you remove whatever the source of variability is for future nodes.”
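The point about counting atoms or photons follows from Poisson statistics. When a quantity is a count of independent, randomly placed items with mean N, its standard deviation is √N, so relative variation scales as 1/√N and grows sharply as N shrinks. The dopant counts below are hypothetical, and real dopants also vary in position, which this sketch ignores. The same arithmetic applies to photon shot noise in a resist.

```python
import math


def relative_sigma(mean_count):
    """Poisson counts: sigma = sqrt(mean), so sigma / mean = 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(mean_count)


def poisson_pmf(k, mean_count):
    """Probability of observing exactly k items when the expected count is mean_count."""
    return math.exp(-mean_count) * mean_count**k / math.factorial(k)


# Hypothetical mean dopant counts per device region.
for n in (10_000, 1_000, 100, 10):
    print(f"mean {n:>6,} atoms -> {relative_sigma(n):.1%} one-sigma variation")

# With a "handful" (assumed mean of 5), some regions receive none at all.
print(f"P(0 dopants | mean 5) = {poisson_pmf(0, 5.0):.2%}")
```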

And this is where being able to tweak the process is critical.

“Process systems are continually working to reduce variability with sensors, metrology, software and process knobs,” said Khasgiwale. “The random nature of variability means more sampling is needed. New inspection and metrology systems need to be developed, for example, to provision for massive sampling. Similarly, measurement often needs to happen close to the process and in a vacuum, as many of the new films and material stacks quickly degrade when exposed to impurities in the atmosphere.”
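The sampling burden Khasgiwale describes can be quantified two ways. Estimating a mean to within ±E needs roughly (zσ/E)² samples, so halving the tolerance quadruples the sample count. Catching a rare random defect at all needs about n ≥ ln(1 - confidence) / ln(1 - p) sites, roughly 3/p at 95% confidence. The sketch assumes independent samples; the rates and tolerances are hypothetical.

```python
import math
from statistics import NormalDist


def samples_for_mean(sigma, half_width, confidence=0.95):
    """Samples needed to estimate a mean to +/- half_width (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil((z * sigma / half_width) ** 2)


def samples_to_see_one(defect_rate, confidence=0.95):
    """Smallest n with P(at least one defect among n independent sites) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-defect_rate))


print(samples_for_mean(sigma=1.0, half_width=0.1))   # tolerance of 0.1 sigma
print(samples_for_mean(sigma=1.0, half_width=0.05))  # half the tolerance
for rate in (1e-3, 1e-6):
    print(f"defect rate {rate:g}: {samples_to_see_one(rate):,} sites")
```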

And all of this becomes more important at each new node.

“As we move from 10nm to 7nm to 5nm to 3nm, the process variation specs are getting smaller, and the sensors we put into the tools have to support higher accuracy and resolution,” said Fried. “Things that you would typically do with a single-frequency optical detector now have to be done using full-spectrum optical detection. Each of these sensors has to move in a more advanced way to detect at finer resolution. That increases the amount and complexity of the data that you need to find potential reliability issues. There are more data points and more dimensions to the data, as well as different structures and formats for that data. Each increment of sensor technology adds layers of complexity to the decoding of that information.”

Random variation exists in all devices, materials and manufacturing processes. Not everything behaves in ways that are expected. An alpha particle can flip a bit in memory, and different ambient conditions may expose unanticipated behavior, such as the performance boost caused by on-chip heat at 28nm.

But much of what is classified as random is unknown rather than truly random, and there is a cost for identifying the causes of unexpected behavior. In an increasing number of cases, it’s becoming essential to explore those causes in more depth, a trend that is likely to continue as the cost of getting something wrong continues to rise.
