ML will augment existing manufacturing processes, but it won’t replace them.
Amid the shift towards more complex chips at advanced nodes, many chipmakers are exploring or turning to advanced forms of machine learning to help solve some big challenges in IC production.
A subset of artificial intelligence (AI), machine learning uses advanced algorithms in systems to recognize patterns in data, as well as to learn and make predictions about the information. In the fab, machine learning promises to provide faster and more accurate results in select areas, such as finding and classifying defects in chips. Machine learning also is used in other process steps, but there are still some challenges to deploying it.
Machine learning isn’t new. It has been used in computing and other fields for decades. It first appeared in semiconductor production in the 1990s. Some saw it as a way to help automate the steps for some manually driven fab equipment.
Over time, machine learning has made staggering progress in computing and elsewhere. Recently, many chipmakers have deployed the technology for select applications in inspection, lithography and metrology. But it doesn’t solve all problems in the fab, and it will not replace the traditional methods. So far, more advanced forms of machine learning are not widely deployed throughout the fab, and some gaps remain. In machine learning, for example, a system requires large data sets. If the data sets are insufficient, a system can generate questionable results.
Still, some chipmakers are now either exploring or starting to use more advanced forms of machine learning algorithms in more parts of the fab. In one form or another, machine learning has the potential to help boost some fab processes.
“The majority of what’s deployed in fabs is still single-level machine learning algorithms. Most of the industry is just beginning to make the transition to deep learning with a big gap between leaders and followers,” said Dan Hutcheson, chief executive of VLSI Research. “It varies quite a bit, but I’d say the applications that have reaped the most rewards are preventive maintenance, defect classification, improved sub-wavelength resolution, design-for-manufacturing, optimizing design rule constraints, managing yield and quality, and predicting die yields.”
With that in mind, some chipmakers are looking at machine learning to gain a competitive edge. Potentially, it may accelerate the cycles of learning in the fab and speed up product development times. “Everybody’s looking at it — some people have been looking at it for two or three years, and some people are just starting to look at it today,” said Ted Doros, senior member of the technical staff at Micron Technology. “We know that our competitors are dabbling, or more than dabbling with it, and so we are in a race. We know that everybody is running a similar kind of race.”
Nonetheless, chip customers need to keep an eye on the technology and how it might impact current and future IC designs. Here are just some of the places where machine learning is being used in the fab today:
What is machine learning?
Semiconductor fabs are automated facilities that process wafers using a variety of equipment in a cleanroom. In operation, a batch of wafers is transported to one piece of equipment and then processed based on a given manufacturing flow. Then the wafers are transported to the next piece of equipment and processed, and so on.
In the early days of the IC industry, semiconductor manufacturing was relatively primitive. “The tools themselves, for the most part, were mechanically turned on and off. Someone programmed different parameters for them to function,” VLSI Research’s Hutcheson said. “The first computer controls didn’t arrive until sometime in the 1990s. What I mean is real-time computer process control.”
Generally, the fab equipment was, and still is, designed using mathematical models and conventional programming methods. The mathematical models describe how a system is supposed to operate.
Fab tools were also manually driven in the early days. In semiconductor production, machine learning emerged in the 1990s, when some chipmakers began to look at it as a means to automate some fab tool processes.
Machine learning is different from the traditional math models and programming techniques. It utilizes a neural network to crunch data and identify patterns. It matches certain patterns and learns which of those attributes are important.
There are two main types of machine learning, supervised and unsupervised. In supervised machine learning, “an algorithm uses training data and feedback from humans to learn the relationship of given inputs to a given output,” according to one definition from McKinsey & Co. In unsupervised machine learning, “an algorithm explores input data without being given an explicit output variable.”
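In code terms, the distinction looks roughly like the following minimal scikit-learn sketch, where the features, labels, and model choices are purely illustrative rather than anything used in a fab:

```python
# Illustrative sketch only -- not from any fab's actual pipeline.
# Supervised: learn a mapping from labeled examples to known output classes.
# Unsupervised: group unlabeled data into clusters without target labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 8))      # e.g., per-defect measurements (hypothetical)
labels = rng.integers(0, 3, size=200)     # defect classes supplied by human reviewers

# Supervised: training data plus human-provided labels -> predict classes for new inputs.
clf = RandomForestClassifier().fit(features, labels)
print(clf.predict(features[:5]))

# Unsupervised: no labels given; the algorithm finds structure on its own.
clusters = KMeans(n_clusters=3, n_init=10).fit_predict(features)
print(clusters[:5])
```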
In the beginning, though, machine learning was limited. For it to work, the technology requires compute power and large data sets. In the 1990s, the industry lacked both.
That’s all changed. “Machine learning started to get useful in the 1990s,” said Aki Fujimura, chief executive of D2S. “But that has changed in the last six years with the advent of GPUs. GPUs enabled deep learning to happen because there is so much more compute power available.”
Machine learning augments the math models rather than displacing them. Today, equipment makers and fabs use both technologies to automate and speed up a given process.
“A human devises a model, usually through some understanding of physics, chemistry or math,” Fujimura explained. “Machine learning can automate choosing the parameters of the model to make it much faster for the human to explore various model forms.”
Over time, the IC industry has implemented the technology, at least in some apps. “Machine learning includes the traditional linear regression types of curve fitting for modeling,” Fujimura said. “In this sense, machine learning has been in use for all types of model generation, including mask models, wafer lithography and processing models. Fabs and mask shops also use classical machine learning in the ‘big data’ analysis of all the operation data available to look for ways to improve yield and prevent downtime.”
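A minimal example of that kind of classical curve fitting: a human devises a simple model form, and the fit only chooses its parameters from measured data. The model and all numbers below are hypothetical:

```python
# Minimal sketch: fitting the parameters of a human-devised model to measured data.
# The linear etch-depth-vs.-time model and the synthetic "metrology" data are made up.
import numpy as np
from scipy.optimize import curve_fit

def etch_model(t, rate, offset):
    # Human-devised model form; the fit only selects rate and offset.
    return rate * t + offset

t = np.linspace(0, 60, 20)                                     # process time, seconds
measured = 2.5 * t + 3.0 + np.random.normal(0, 1.0, t.size)    # synthetic measurements

params, _ = curve_fit(etch_model, t, measured)
print("fitted rate and offset:", params)
```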
Now, some are exploring or using advanced forms of machine learning in the fab. “I first found chipmakers applying deep learning in the sub-fab about four years ago. When I looked further, I realized it was being heavily used in metrology, inspection and lithography tools,” VLSI Research’s Hutcheson said. “About two years ago, I found it being used by fabless companies for quality control and in EDA tools. But that’s what I have seen, as many companies have kept their work super-secret.”
It isn’t used everywhere. It’s simply one of many technologies that chipmakers have at their disposal. “It’s a tool in the toolbox,” Hutcheson said.
Classifying defects
In semiconductor manufacturing, the goal is to produce chips with no defects. But defects may crop up in chips due to a glitch in the process.
To find those defects, chipmakers use inspection systems in the fab. The inspection system takes images of the chips and compares them to a database in the system to determine if there is a defect.
Then, the images are fed into a separate defect review system, which classifies the defects into pre-defined categories. The goal is to find the cause of the defects.
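In its simplest form, the compare-to-reference idea behind inspection can be sketched as an image difference with a threshold. This is only an illustrative toy; the array sizes and threshold are invented:

```python
# Simplified sketch of die-to-reference comparison, assuming the images are
# already aligned grayscale arrays; shapes and threshold are made up.
import numpy as np

def find_defect_candidates(test_img: np.ndarray, reference_img: np.ndarray,
                           threshold: float = 30.0) -> np.ndarray:
    """Return coordinates of pixels that differ strongly from the reference."""
    diff = np.abs(test_img.astype(float) - reference_img.astype(float))
    return np.argwhere(diff > threshold)   # each row is a (y, x) candidate location

reference = np.full((512, 512), 100.0)
test = reference.copy()
test[200:203, 310:313] += 80.0             # inject a synthetic "defect"
print(len(find_defect_candidates(test, reference)), "candidate pixels flagged")
```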
For years, defect review was a manual process. Using a microscope, human operators located and classified the defects in the system. This was a time-consuming, error-prone process.
That began to change in the 1990s. In 1997, for example, IBM devised an automated defect classification (ADC) technology using third-party inspection tools and arguably the early forms of machine learning.
IBM integrated a defect review system with a machine vision camera. In operation, the camera automatically looked at the images. Then, the software classified the defects into pre-defined categories, which were learned from training samples, according to IBM.
The system, however, wasn’t fast enough, and it did not have enough data to work with. Used for IBM’s 16Mbit DRAMs, the system classified simple defects with an accuracy of 80%. Operators were still required to classify the complex defects.
The world has changed since then. Today, the chips are more complex with tiny features. In logic, chipmakers are ramping up 10nm/7nm, with 5nm in R&D. “Then, there are some new materials. You have cobalt, ruthenium and other things. They come with their own challenges,” said Mohan Iyer, head of marketing for the E-beam division at KLA.
Finding the defects in devices is a challenging task. For this, chipmakers use more powerful inspection tools and ADC review systems today.
ADC with machine learning is also part of the mix. Without machine learning, it can take 6 to 9 months to train people to classify the defects manually with 90% accuracy, according to a recent white paper from Intel. Even after training, a human operator is only 75% to 80% accurate over time, according to Intel.
In the paper, Intel describes how it uses ADC with machine learning in the fab. Using a scanning electron microscope, Intel takes a multitude of images of wafers during the fabrication process, according to the paper.
From a PC, the images are sent to a classification server. Then, the images are automatically labeled and the defects are categorized, according to Intel.
The labeled images are sent back to the PC, where they are moved to a storage unit for analysis. Labeled images are also sent to a separate model building server, where the models are retrained with new information, according to Intel.
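A rough sketch of that classify-then-retrain loop might look like the following. The scikit-learn model, feature sizes, and category names are stand-ins for illustration, not Intel's actual stack:

```python
# Sketch of the classify-then-retrain loop described above; class names,
# features and the scikit-learn model are placeholders, not Intel's system.
import numpy as np
from sklearn.linear_model import LogisticRegression

DEFECT_CLASSES = ["particle", "scratch", "bridge", "residue"]   # hypothetical categories

rng = np.random.default_rng(1)
train_imgs = rng.normal(size=(400, 64))        # stand-in for SEM image features
train_labels = rng.integers(0, 4, size=400)    # indices into DEFECT_CLASSES

model = LogisticRegression(max_iter=500).fit(train_imgs, train_labels)

def classify(image_features: np.ndarray) -> str:
    """Classification-server step: return the most likely defect category."""
    idx = int(model.predict(image_features.reshape(1, -1))[0])
    return DEFECT_CLASSES[idx]

def retrain(new_imgs: np.ndarray, new_labels: np.ndarray):
    """Model-building-server step: refit on the accumulated labeled images."""
    return LogisticRegression(max_iter=500).fit(new_imgs, new_labels)

print(classify(train_imgs[0]))
```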
On the memory front, meanwhile, Micron also has deployed ADC with machine learning in one fab. Previously, Micron relied on human operators who reviewed and manually classified defects at a rate of millions per year.
Recently, Micron has automated the process by deploying what it calls AI-ADC within its Fab 6 facility. Based in Manassas, Va., the fab produces DRAM, NAND, and NOR products. Micron is looking to use AI-ADC in other fabs.
As part of the effort, Micron has moved millions of saved labeled defect images into a Hadoop cluster. Then, the company uses Nvidia’s DGX-1 to train a deep neural network on the large data set. The DGX-1 is a GPU-based deep learning platform.
“We pull the stored images from the Hadoop cluster and run the training on an image training set,” Micron’s Doros said. “We then run model testing on a related but different image set, and score the model. Once a model scores high enough, it is frozen to be used for the active labeling.”
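In outline, that train/test/score/freeze flow might look something like the sketch below. The classifier, acceptance threshold, and file name are assumptions for illustration rather than Micron's production code:

```python
# Rough sketch of the train / test-on-held-out-set / score / freeze flow described
# above; the classifier, threshold, and file name are placeholders.
import joblib
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
train_X, train_y = rng.normal(size=(1000, 128)), rng.integers(0, 5, 1000)
test_X, test_y = rng.normal(size=(200, 128)), rng.integers(0, 5, 200)  # related but different set

model = MLPClassifier(hidden_layer_sizes=(256, 64), max_iter=200).fit(train_X, train_y)
score = model.score(test_X, test_y)        # score the model on unseen images

SCORE_THRESHOLD = 0.90                     # acceptance bar is hypothetical
if score >= SCORE_THRESHOLD:
    joblib.dump(model, "adc_model_frozen.pkl")  # "freeze" the model for active labeling
```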
Fig. 1: A neural network is a function that can learn. Source: Micron
In operation, Micron processes the wafers in the fab. The inspection tool may find a defect. Then, the separate AI-ADC system reviews and auto-classifies most defects. Less common defects are still classified by operators.
So what’s the bottom line here? “Today, the deep neural net we built now takes the place of the humans for most of the defect classifications or labeling,” Doros said. “What AI brings is consistency and speed. It’s only a fraction of a second for it to determine what it is and it is also very consistent.”
Finding and classifying the defects is only part of the battle. The goal is to find the root cause of defects and then eliminate them.
Advanced ADC is helping here. For example, Applied Materials has developed an ADC technology that finds defects and classifies them. Then, the system uses different imaging techniques to help pinpoint the problem.
“We’ve had automatic defect classification in the field for many years now. In the last few years, we’ve added capabilities using machine learning,” said Rafi Benami, vice president of the Inspection and Review Division at Applied Materials. “We are also using what we call automated defect analysis. For example, we see a defect and say: ‘Let’s try to utilize a different imaging technique.’ Then we might be able to say what is the root cause of the defect. All of this is based on machine learning.”
More ML
Besides ADC, machine learning is appearing in other parts of the fab, such as lithography, metrology and others — at least in select cases.
It’s also appearing in test. In the flow, the wafer is processed in the fab, and then moves into a test phase called wafer sort. In wafer sort, a system called a wafer prober is used to conduct an electrical test on each device on the wafer. The goal is to weed out the bad dies.
Some are putting a new spin on wafer sort. In Fab 6, for example, Micron is using electrical characterization data from probe test to classify wafer maps using supervised and unsupervised machine learning.
For this, Micron takes a probe reading of the wafer. “Then, we will turn it to a black and white image. And we look at the basic macro kind of signatures,” Micron’s Doros said. “Broadly speaking, we ask, ‘Is that a center wafer defect? Is it the edge of the wafer defect? Is it a crescent defect?’”
The next step is to classify the macro patterns using a neural network. “It’s an unsupervised first pass. In other words, you haven’t given it any labels yet,” Doros said. “The neural net is going to start to separate and put them into their own clusters based on the features or the patterns.”
Then, AI experts will classify and label the defect types. At that point, the data is fed into a separate convolutional neural network. “Once you have that, it’s like you have a bunch of mug shots in a catalog,” he said. “Now, here comes a new wafer. Then, you say: ‘Can you find this one in the catalog?’ The system will walk through and look for similarities between the new pattern coming in and the known catalog.”
If the system determines there is a match, it will tag it. If there is no match, the neural network will put it into an anomaly bucket. The more prevalent anomaly patterns may end up in the final catalog.
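A much-condensed sketch of that flow follows. Micron describes an unsupervised first pass plus a convolutional neural network for the catalog matching; this toy version substitutes k-means clustering and nearest-centroid matching on synthetic wafer maps, with invented catalog labels:

```python
# Condensed sketch of the wafer-map flow above; all data, labels, and the
# distance threshold are synthetic placeholders, not Micron's system.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
wafer_maps = (rng.random((300, 32 * 32)) > 0.7).astype(float)  # binarized pass/fail maps

# Unsupervised first pass: group the maps into clusters with no labels given.
km = KMeans(n_clusters=6, n_init=10).fit(wafer_maps)

# Experts then label each cluster; these labels are invented for illustration.
catalog_labels = {0: "center", 1: "edge", 2: "crescent", 3: "scratch", 4: "random", 5: "donut"}

def classify_new_map(new_map: np.ndarray, max_distance: float = 14.0) -> str:
    """Match a new wafer map against the catalog, or send it to the anomaly bucket."""
    distances = np.linalg.norm(km.cluster_centers_ - new_map, axis=1)
    nearest = int(np.argmin(distances))
    if distances[nearest] > max_distance:    # no good match in the catalog
        return "anomaly"
    return catalog_labels[nearest]

print(classify_new_map(wafer_maps[0]))
```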
At times, the AI experts will review the performance of the system. “The expert team will look at it and say, ‘The AI system is doing well, or the AI system needs a little bit more in this one area.’ But overall, it does a very good job of doing the classification,” he added.
Meanwhile, for years, predictive maintenance has been a critical technology in the fab. The idea is to monitor and predict the performance of the equipment to reduce failures and downtimes. For this, chipmakers use conventional fab detection methods to locate and predict failures.
There are some novel approaches, as well. For example, in Fab 6, Micron has deployed what it calls acoustic anomaly detection. For this, Micron places a microphone on a robot near the equipment. Using machine learning, a system is trained to detect acoustic anomalies to identify potential failures and variations in the equipment.
In the fab, the technology is used to alert the maintenance teams, helping prevent unscheduled downtime.
Acoustic technology has evolved, making it useful in the fab, at least for some. “What used to be just too noisy is now a rich pool of data to be extracted,” Micron’s Doros said. “What helps it is just the sheer compute power that we have today, and then adding AI on top of it to help filter these things out.”
Initially, Micron has placed microphones on the robots near the lithography scanners and track systems. Micron plans to insert the technology in other parts of the fab.
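Conceptually, acoustic anomaly detection of this kind can be sketched as learning what a healthy tool sounds like and then flagging clips that deviate. The feature extraction, isolation-forest detector, and data below are assumptions for illustration, not Micron's implementation:

```python
# Highly simplified sketch of acoustic anomaly detection: learn a baseline of
# "normal" tool sounds, then flag clips that deviate. All details are assumed.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(4)

def spectral_features(audio_clip: np.ndarray) -> np.ndarray:
    """Crude features: magnitude spectrum of the clip, averaged into 16 bands."""
    spectrum = np.abs(np.fft.rfft(audio_clip))
    return spectrum[1:].reshape(16, -1).mean(axis=1)  # drop DC, keep band energies

normal_clips = rng.normal(size=(500, 4096))           # recordings during healthy operation
detector = IsolationForest(random_state=0).fit(
    np.stack([spectral_features(c) for c in normal_clips]))

def is_anomalous(audio_clip: np.ndarray) -> bool:
    """Alert maintenance if the clip looks unlike the training-time baseline."""
    return detector.predict(spectral_features(audio_clip).reshape(1, -1))[0] == -1

print(is_anomalous(rng.normal(size=4096)))
```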
Meanwhile, Lam Research has developed another technology in the arena — self-maintaining etch equipment.
Typically, etch process modules are cleaned weekly or monthly to maintain a stable performance. The parts are often replaced due to erosion. This impacts system productivity.
In response, Lam has developed self-maintaining etch systems using machine learning and self-aware hardware. With the technology, Lam’s etch platforms have demonstrated one year of uninterrupted production. This reduces tool downtime and boosts productivity.
“This involves the features of the tool. These are things like self-maintenance and not having to open up the tool. It’s a significant cost benefit to the customer. Being able to adapt the process without the same amount of overhead and cleaning the chambers is another big opportunity,” said Richard Gottscho, CTO at Lam.
Clearly, machine learning is no longer a novelty in the fab. Chipmakers will continue to leverage the technology as it matures. It’s the answer for some problems. But it won’t solve all problems, at least in the near term.