Deep learning and digital twins can help identify patterns and ultimately may be able to fix problems as they arise, but it will take a while.
Experts at the Table: Semiconductor Engineering sat down to discuss chip scaling, transistors, new architectures, and packaging with Jerry Chen, head of global business development for manufacturing & industrials at Nvidia; David Fried, vice president of computational products at Lam Research; Mark Shirey, vice president of marketing and applications at KLA; and Aki Fujimura, CEO of D2S. What follows are excerpts of that conversation. To view part one of this discussion, click here.
SE: Machine learning is a hot topic. This technology uses a neural network to crunch data and identify patterns, matching certain patterns and learning which of their attributes are important. Why is it so important for chip manufacturing?
Fujimura: I differentiate the recent advances as deep learning, a subset of machine learning. Machine learning, in turn, is typically characterized as a subset of artificial intelligence. Machine learning techniques started to be used back in the 1990s, or even earlier. Some of these techniques were AI. Some of them were precursors of deep learning. Some of them were software algorithms that I personally think are not really AI. I see deep learning as being different. When I was going to school as an AI student in the 1970s and 1980s, top computer scientists in AI were already trying to write a program that could beat a chess master. They kept trying, but it only became possible in the last decade through deep learning.
SE: What’s the impact of that?
Fujimura: There are things that weren't practically possible with non-deep learning techniques, which are now possible with deep learning. With GPU-accelerated deep learning, we are able to see a program beat a chess master regularly. That's the key difference. Meanwhile, at the Center for Deep Learning in Electronics Manufacturing (CDLe) at D2S, we're exploring how we can make the same kind of difference in the semiconductor manufacturing segment. While we haven't reached quite the same level of enablement, deep learning clearly will be able to do things that non-deep machine learning, or any other software technique, has not been able to do in the past. In that quest, we have conducted more than 20 projects in deep learning for semiconductor manufacturing, in particular mask manufacturing, wafer design, FPGA design and PCB assembly automation. In all of these areas we found very useful applications. Many of these 20 projects were feasibility studies. They got to a prototype stage and produced good results. Three projects, in particular, are now being productized. They will be announced later this year or early next year.

Deep learning, meanwhile, is a new compute model. I call it 'useful waste.' Basically, you choose not to be selective about what you compute and what you don't. You don't say, 'Let's first figure out which things are worth computing and which are not.' That used to be the kind of computing we did. There were AI techniques like alpha-beta pruning that were exactly that: how can you minimize the amount of wasteful computing? That used to be the focus.

What changed was computing with GPUs and other single-instruction, multiple-data (SIMD) processors. SIMD architectures, including GPUs, scale by bit-width rather than only by clock speed, and that's the key. What happened as a result is that computing, or processing, really did become free. We could say, 'Just let the machine do it.' The machine is tireless. Let it compute for a week and come up with a neural network that works. And then, when you actually run it, it doesn't take very long for what is called inferencing in deep learning terminology. Figuring out what your network looks like, or how it is programmed or parameterized, might take a long time, but who cares? Let it run with brute-force computing. Then, when you use that parameterized network, the inferencing is fast. That model of computing became practical by leveraging the bit-width scaling of SIMD, made economically feasible by GPUs. Deep learning techniques and neural network computing just happen to be particularly well suited to SIMD. That really is what broke the dam.

I would predict that other kinds of computing approaches will emerge that are not deep learning, but something else that also takes advantage of this 'useful waste' approach.
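To make the 'useful waste' point concrete, here is a minimal sketch, with plain NumPy standing in for a GPU and made-up layer sizes: once the long, brute-force training phase has settled on a set of weights, inference is just a few wide matrix multiplies, exactly the kind of regular, data-parallel arithmetic that SIMD hardware scales by bit-width.

```python
# Minimal sketch of the "useful waste" compute model (illustrative sizes).
# Training is the slow, brute-force part; once the weights W1 and W2 are
# fixed, inference is just batched matrix multiplies plus an activation,
# the kind of wide, regular arithmetic SIMD/GPU hardware executes cheaply.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((256, 64))   # stand-ins for trained weights
W2 = rng.standard_normal((64, 10))

def infer(x):
    """Forward pass: two matrix multiplies and a ReLU."""
    return np.maximum(x @ W1, 0.0) @ W2

batch = rng.standard_normal((1024, 256))   # 1,024 inputs scored at once
scores = infer(batch)                      # shape (1024, 10)
```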
Chen: AI can do astonishing things that were considered grand challenges less than 10 years ago. But in the end, it’s just another HPC (high-performance computing) workload, with unique computational characteristics and tools like any other. Just as we know how to build architectures that are great for graphics and HPC, we’ve also learned how to build great architectures for both AI training and inference. Turns out there’s a lot of architectural leverage for all of these workloads.
Figure 1. Simplified view of an artificial neural network. The connections are modeled as weights. In operation, all inputs are modified by a weight and summed. Then, an activation function controls the amplitude of the output. Source: Wikipedia
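For readers who want the caption above spelled out, here is a tiny illustrative sketch of a single artificial neuron; the sigmoid is just one common choice of activation function, and the weights and inputs are arbitrary.

```python
# One artificial neuron, as in Figure 1: scale each input by its weight,
# sum the results, and pass the total through an activation function.
import math

def neuron(inputs, weights, bias=0.0):
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))   # sigmoid activation bounds the output

print(neuron([0.5, -1.0, 2.0], [0.8, 0.1, 0.4]))   # prints roughly 0.75
```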
SE: At one time, the industry relied on pure physics to solve problems in manufacturing and elsewhere. Can we just continue using pure physics to solve these problems, or do we need technologies like machine learning here?
Chen: First-principles physics methods are always foundational. But sometimes the physics is not fully understood, or simulating it is not computationally possible or practical. In those cases, researchers are starting to infuse physics-based models with AI models, using a hybrid approach to get the best of both worlds. This hybrid approach combines the physics we understand with a data-driven representation of the physical behavior captured in measured data. In manufacturing operations, this approach helps bridge the gap between what your physics-based models predict and what your sensors actually perceive. Lithography is a great example of where this approach has become extremely valuable, since the process window is so sensitive to modeling errors. There's a lot of data on both sides of the equation. Both the physics-driven and the data-driven approaches seem to work really well on GPUs. This is the sweet spot where things get interesting.
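As a hedged sketch of the hybrid approach Chen describes, the toy example below keeps a first-principles model as the backbone and fits a small data-driven correction to the residual between the physics prediction and the measured data; the physics_model function, the dose-to-CD relationship, and the polynomial correction (standing in for a trained network) are all invented for illustration.

```python
# Hybrid modeling sketch: physics backbone plus a learned correction.
# Everything here (the model, the numbers, the "sensor" data) is illustrative.
import numpy as np

def physics_model(dose):
    """Hypothetical first-principles prediction of CD versus exposure dose."""
    return 45.0 - 0.8 * dose

dose = np.linspace(20.0, 30.0, 50)
measured_cd = physics_model(dose) + 0.05 * (dose - 25.0) ** 2   # toy measurements

# Learn only the part the physics misses: fit the residual to the data.
residual = measured_cd - physics_model(dose)
coeffs = np.polyfit(dose, residual, deg=2)

def hybrid_model(d):
    """Physics prediction corrected by the data-driven residual model."""
    return physics_model(d) + np.polyval(coeffs, d)
```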
SE: What are the main applications for machine learning in semiconductor manufacturing?
Shirey: One of the main applications for machine learning is defect detection and classification. The first step is using machine learning to detect actual defects and ignore noise. We are seeing many examples where machine learning is much better at extracting the actual killer defect signal from a noisy background of process and pattern variations. The second step is to leverage machine learning to classify defects. The challenge these days is that when optical inspectors run at high sensitivity to capture the most subtle, critical defects on the most critical layers, other anomalies are also detected. Machine learning is first applied to the inspection results to optimize the defect sample plan sent for review. Then, high-resolution SEM images are taken of those sites and additional machine learning is used to analyze and classify the defects to provide fab engineers with accurate information about the defect population – actionable data to drive process decisions. An emerging application is to make use of machine learning to be more predictive about where to inspect and measure. If you can aggregate more fab data and build correlations, then you can get smarter about where to inspect. That can be a very powerful solution for improving yields and stretching Moore’s Law economics.
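As a rough sketch of the SEM-review classification step described above, here is a minimal convolutional classifier in PyTorch; the patch size, class list, and random tensors standing in for SEM images are placeholders, and a production system would be trained on a large labeled defect library.

```python
# Minimal defect-classification sketch (illustrative shapes and classes).
import torch
from torch import nn

NUM_CLASSES = 4   # e.g. particle, bridge, scratch, nuisance (hypothetical list)

classifier = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, NUM_CLASSES),   # assumes 64x64 input patches
)

patches = torch.randn(8, 1, 64, 64)   # stand-in for a batch of SEM review patches
logits = classifier(patches)          # training on labeled examples omitted here
predicted_class = logits.argmax(dim=1)
```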
Fried: Most of these applications fall into a broad category of process control, but it’s essentially tuning of process recipes to meet targeted on-wafer specifications. It starts with improving wafer-to-wafer control and uniformity, and using applications like etch endpoint control or gas flow adjustments. Every wafer arrives at a controlling process in slightly different conditions, based on variations that occur at previous process operations, and the process equipment is in its current state based on its environment and all previous states. If the equipment can automatically adjust recipe parameters like endpoint time or gas flows for each wafer, uniformity of the post-processed wafers can be improved and pre-existing variations can be reduced. It’s a major win. After that, you can move on to improving cross-wafer uniformity by controlling parameters like the chuck temperature zones. To make these controlling recipe adjustments, you need to have data to help monitor the sensitivity of process results to control parameters, and then model those relationships and implement control schemes.
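As one hedged illustration of that kind of wafer-to-wafer adjustment, the toy controller below retargets an etch time from incoming metrology using an exponentially weighted moving average of the observed etch rate; the rate model, smoothing factor, and readings are all invented for the sketch and are not Lam's method.

```python
# Toy run-to-run control sketch: adjust etch time so the measured depth
# tracks its target. All numbers and the simple rate model are illustrative.
TARGET_DEPTH_NM = 120.0
rate_estimate_nm_per_s = 2.0     # current belief about the etch rate
LAMBDA = 0.3                     # EWMA smoothing factor

def next_etch_time():
    """Pick the recipe time implied by the current rate estimate."""
    return TARGET_DEPTH_NM / rate_estimate_nm_per_s

def update(etch_time_s, measured_depth_nm):
    """Blend the newly observed rate into the running estimate."""
    global rate_estimate_nm_per_s
    observed_rate = measured_depth_nm / etch_time_s
    rate_estimate_nm_per_s = (LAMBDA * observed_rate
                              + (1 - LAMBDA) * rate_estimate_nm_per_s)

for measured in [118.0, 123.5, 119.2]:           # stand-in metrology readings
    t = next_etch_time()
    update(t, measured)
    print(f"etch time {t:.1f} s, new rate estimate {rate_estimate_nm_per_s:.3f} nm/s")
```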
Fujimura: A recent paper by Canon shows there was at least a trial deployment in a fab for predictive machine maintenance. Accelerating all types of iterative optimization programs, including OPC/ILT (optical proximity correction/inverse lithography technology), has been widely discussed by all vendors. Automatic defect categorization is clearly a fit for deep learning. There also have been numerous papers from a variety of sources on aspects of SEM image processing, including de-noising, contour extraction, and digital twins. As a general trend, I am seeing deep learning applications enter the production deployment phase in 2021. Having digital twins that can generate training data at will, without having to actually manufacture every situation, is the key to productizing deep learning.
SE: What are the challenges for machine learning? Is it accuracy and being able to obtain enough data? What about noise?
Chen: Accuracy is obviously non-negotiable. As I mentioned earlier, process windows are very sensitive to modeling errors. And the best AI models work precisely because they are good at extracting true signals from noisy data. There’s plenty of data in our industry, and lots of technical talent. My view is that the biggest challenge ahead of us is in integrating these newer data-driven approaches into production tools and workflows. But the leaders in the industry clearly are investing aggressively in machine learning and AI, and the laggards are realizing that this trend is inevitable.
Fujimura: Deep learning is relatively new, so there's still a lot of opportunity for improvement in every aspect. That's partly responsible for helping buoy the entire semiconductor industry and for accelerating the use of supercomputing in every business, not just scientific computing. For production deployment of deep learning-based applications, particularly in semiconductor manufacturing, what we have found in our work at the CDLe is the need for digital twins. Deep learning needs vast amounts of data. In addition, a deep learning programmer improves a deep learning application mostly by manipulating the training data given to the network. For example, if a network trained to differentiate dogs from cats is confused by a rare breed of dog, the programmer needs to come up with more pictures of that type of dog, and of cats that look similar to it, and repeatedly train the network, saying 'this is a dog' and 'this is a cat.' The training process requires adding data at will as the programmer learns what the neural network is confused by. For semiconductor manufacturing applications, it is absolutely essential to have either a simulation-based or a deep learning-based digital twin that can generate synthetic pictures on demand, so the programmer is not hampered by the inability to produce training data at will. While a prototype of a promising deep learning application can be created with 10,000 or even 1,000 pictures, a production deployment takes millions. Having a digital twin is the only way to get what you need.
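In that spirit, here is a minimal sketch of a simulation-based digital twin used purely as a training-data generator: it renders a crude line/space pattern, optionally injects a bridge-like defect, adds noise, and returns the label alongside the image. The pattern style, noise model, and sizes are stand-ins for a real process and imaging simulation.

```python
# Sketch of a digital twin as a labeled-data generator (details illustrative).
import numpy as np

rng = np.random.default_rng(1)

def synthetic_sem_image(n=64, with_defect=False):
    img = np.zeros((n, n))
    for x in range(4, n, 8):                 # crude line/space pattern
        img[:, x:x + 4] = 1.0
    if with_defect:
        y, x = rng.integers(8, n - 8, size=2)
        img[y - 2:y + 2, x - 2:x + 2] = 1.0  # bridge-like blob between lines
    img += rng.normal(0.0, 0.15, img.shape)  # stand-in for imaging noise
    return img

# Generate as many labeled examples as training requires.
dataset = [(synthetic_sem_image(with_defect=(i % 2 == 0)), i % 2 == 0)
           for i in range(1000)]
```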
Shirey: Automation can be labor-intensive. Today, machine learning algorithms still need to be trained with labeled data. With respect to inspection, initially it takes an investment of time to build up classified defect libraries. However, once this is done, the algorithms work well in terms of accuracy and purity, and ultimately, by producing better quality data, they reduce the time needed to identify defect sources and take corrective action. It’s exciting to see today’s machine learning algorithms helping to build tomorrow’s AI chips, and we are optimistic that unsupervised machine learning applications will continue to grow throughout the semiconductor ecosystem.
Fried: Using four quadrants is exactly the right way to look at this challenge. In process control applications of machine learning, you have perception, followed by prediction, optimization, and control. What I focus on right now within Lam Research is trying to understand the output of sensors in an etch reactor or a deposition reactor. We are trying to predict the results on the wafer from the sensor data and the recipe information, optimize those results to satisfy a customer specification, and then control the reactor to produce those on-wafer results in high-volume manufacturing. So it's exactly those four steps.

Why can't I just do this the conventional way, using pure physics to predict what is happening in my reactor or on the wafer? We can use pure physics, and for pieces of the problem people have been doing that for many years. But if you think about an etch or deposition reactor, there's some pretty complex chemistry going on, plasma physics and thermal effects to consider, and temporal effects. For each one of those areas, you have 50 different equations to describe the physics, with a large number of parameters and many unknowns. You can try to code those effects starting from first-principles physics. Unfortunately, it becomes a basically intractable task, because those equations reference one another, and producing an accurate solution becomes computationally very expensive.

Then there is the other end of the spectrum. If I say, 'Forget the physics. Let me just use pure data and machine learning to model my reactor,' I will use every parameter I can control on my etch or deposition system and every recipe knob, and just use machine learning to predict what's happening on the wafer. Unfortunately, the problem now becomes intractable for a different reason. If you look at the number of recipe controls (we call them 'knobs') on an etch or deposition reactor, the number of possible recipe combinations is on the order of 10^14. The point is that you're never going to have enough data to predict the best recipe out of that many combinations. You're never going to have enough data to use a pure data solution to understand the sensor data, predict the result, optimize that result, and then control the result.

Fortunately, there is a realm in between pure physics and pure machine learning. That's where it's really interesting, and where you're coupling constraining physics with data science techniques like machine learning, neural networks and deep learning. The data science is still emerging, but the computing power needed to execute these techniques is now nearly free. We're starting to reach this valuable region in the middle, where you can combine physical governing equations with data science techniques. Using this approach, you can decrease the data required for an accurate solution, and you can increase the probability of solving these problems with sufficient accuracy at a reasonable computational cost. And that's where the gold rush is right now. It's where you join an understanding of the underlying physics with advanced computing techniques. We have the software, the computing power, and the data science techniques to reach that region in the middle of the spectrum between pure physics and pure data science. That's the most interesting part of our business today.
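A skeletal sketch of those four steps, perceive, predict, optimize, and control, is shown below; the one-knob linear 'model' is only a placeholder for the physics-constrained, data-assisted models described above, and every name and number is hypothetical.

```python
# Perceive -> predict -> optimize -> control, in skeleton form (all hypothetical).
def predict_cd(sensors, recipe):
    """Stand-in model: predicted critical dimension from one knob and one sensor."""
    return 50.0 - 0.5 * recipe["etch_time_s"] + 0.1 * sensors["chamber_temp_c"]

def optimize_recipe(sensors, target_cd, candidates):
    """Pick the candidate recipe whose predicted result is closest to spec."""
    return min(candidates, key=lambda r: abs(predict_cd(sensors, r) - target_cd))

sensors = {"chamber_temp_c": 60.0}                        # perception (stand-in reading)
candidates = [{"etch_time_s": t} for t in range(40, 61)]  # tiny sweep of one knob
best = optimize_recipe(sensors, target_cd=32.0, candidates=candidates)
print("recipe to apply on the tool:", best)               # control step
```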
SE: In a keynote at the recent SPIE Advanced Lithography conference, a speaker from Nvidia listed several applications for deep learning in the fab and mask shop, such as automatic defect classification, OPC repair, model accuracy, place-and-route, and lithography modeling. Where is all this heading?
Chen: We’re starting to apply these techniques to design better devices by integrating them into our own tools. We also believe these techniques can help us to optimize our designs for both performance and for manufacturability, which of course benefits us as well as our foundry partners. We’re already seeing some initiatives bearing fruit, both on the design side as well as on the fabrication side.