The impact of deep learning and new technology on scaling to future nodes.
Aki Fujimura, chief executive of D2S, sat down with Semiconductor Engineering to talk about AI and Moore’s Law, lithography, and photomask technologies. What follows are excerpts of that conversation.
SE: In the eBeam Initiative’s recent Luminary Survey, the participants had some interesting observations about the outlook for the photomask market. What were those observations?
Fujimura: In the last couple of years, mask revenues have been going up. Prior to that, mask revenues were fairly steady at around $3 billion per year. Recently, they have gone up beyond the $4 billion level, and they’re projected to keep going up. Luminaries believe a component of this increase is because of the shift in the industry toward EUV. One question in the survey asked participants, ‘What business impact will COVID have on the photomask market?’ Some people think it may be negative, but the majority of the people believe that it’s not going to have much of an effect — or it might have a positive effect. At a recent eBeam Initiative panel, the panelists commented that the reason for a positive outlook might be because of the demand picture in the semiconductor industry. The shelter-in-place and work-from-home environments are creating more need and opportunities for the electronics and semiconductor industries.
SE: How will extreme ultraviolet (EUV) lithography impact mask revenues?
Fujimura: In general, two thirds of the participants in the survey believe that it will have a positive impact. When you go to EUV, you have a fewer number of masks. This is because EUV brings the industry back to single patterning. 193nm immersion with multiple patterning requires more masks at advanced nodes. With EUV, you have fewer masks, but mask costs for each EUV layer is more expensive.
SE: For decades, the IC industry has followed the Moore’s Law axiom that transistor density in chips doubles every 18 to 24 months. At this cadence, chipmakers can pack more and smaller transistors on a die, but Moore’s Law appears to be slowing down. What comes next?
Fujimura: The definition of Moore’s Law is changing. It’s no longer looking at the trends in CPU clock speeds. That’s not changing much. It’s scaling more by bit width than by clock speed. A lot of that has to do with thermal properties and other things. We have some theories on where we can make that better over time. On the other hand, if you look at things like massively parallel computing using GPUs or having more CPU cores and how quickly you can access memory — or how much memory you can access — if you include those things, Moore’s Law is very much alive. For example, D2S supplies computing systems for the semiconductor manufacturing industry, so we are also a consumer of technology. We do heavy supercomputing, so it’s important for us to understand what’s happening on the computing capability side. What we see is that our ability to compute is continuing to improve at about the same rate as before. But as programmers we have to adapt how we take advantage of it. It’s not like you can take the same code and it automatically scales like it did 20 years ago. You have to understand how that scaling is different at any given point in time. You have to figure out how you can take advantage of the strength of the new generation of technology and then shift your code. So it’s definitely harder.
SE: What’s happening with the logic roadmap?
Fujimura: We’re at 5nm in terms of what people are starting to do now. They are starting to plan 3nm and 2nm. And in terms of getting to the 2nm node, people are pretty comfortable. The question is what happens beyond that. It wasn’t too long ago that people were saying: ‘There’s no way we’re going to have 2nm.’ That’s been the general pattern in the semiconductor industry. The industry is constantly re-inventing itself. It is extending things longer than people ever thought possible. For example, look how long 193nm optical lithography lasted at advanced nodes. At one time, people were waiting for EUV. There was once a lot of doom and gloom about EUV. But despite being late, companies developed new processes and patterning schemes to extend 193nm. It takes coordination by a lot of people to make this happen.
SE: How long can we extend the current technology?
Fujimura: There’s no question that there is a physical limit, but we are still good for the next 10 years.
SE: There’s a lot of activity around AI and machine learning. Where do you see deep learning fitting in?
Fujimura: Deep learning is a subset of machine learning. It’s the subset that’s made machine learning revolutionary. The general idea of deep learning is to mimic how the brain works with a network of neurons or nodes. The programmer first determines what kind of a network to use. The programmer then trains the network by presenting it with a whole bunch of data. Often, the network is trained by labeled data. Using defect classification as an example, a human or some other program labels each picture as being a defect or not, and may also label what kind of defect it is, or even how it should be repaired. The deep learning engine iteratively optimizes the weights in the network. It automatically finds a set of weights that would result in the network to best mimic the labels. Then, the network is tried on data that it wasn’t trained on to test to see if the network learned as intended.
SE: What can’t deep learning do?
Fujimura: Deep learning does not reason. Deep learning does pattern matching. Amazingly, it turns out that many of the world’s problems are solvable purely with pattern matching. What you can do with deep learning is a set of things that you just can’t do with conventional programming. I was an AI student in the early 1980s. Many of the best computer scientists in the world back then (and ever since) already were trying hard to create a chess program that could beat the chess masters. It wasn’t possible until deep learning came along. Applied to semiconductor manufacturing, or any field, there are classes of problems that had not been practically possible without deep learning.
SE: Years ago, there wasn’t enough compute power to make machine learning feasible. What changed?
Fujimura: The first publication describing convolutional neural networks was in 1975. The researcher, Dr. Kunihiko Fukushima, called it ‘neocognitron’ back then, but the paper basically describes deep learning. But computational capability simply wasn’t sufficient. Deep learning was enabled with what I call ‘useful waste’ in massive computations by cost-effective GPUs.
SE: What problems can deep learning solve?
Fujimura: Deep learning can be used for any data. For example, people use it for text-to-speech, speech-to-text, or automatic translation. Where deep learning is most evolved today is when we are talking about two-dimensional data and image processing. A GPU happens to be a good platform for deep learning because of its single instruction multiple data (SIMD) processing nature. The SIMD architecture is also good at image processing, so it makes sense that it’s applied in that way. So for any problem in which a human expert can look at a picture without any other background knowledge and tell something with high probability, deep learning is likely to be able to do well.
SE: What about machine learning in semiconductor manufacturing?
Fujimura: We have already started to see products incorporating deep learning both in software and equipment. Any tedious and error-prone process that human operators need to perform, particularly those involving visual inspection, are great candidates for deep learning. There are many opportunities in inspection and metrology. There are also many opportunities in software to produce more accurate results faster to help with the turnaround time issues in leading-edge mask shops. There are many opportunities in correlating big data in mask shops and machine log files with machine learning for predictive maintenance.
SE: What are the challenges?
Fujimura: Deep learning is only as good as the data that is being given, so caution is required in deploying deep learning. For example, if deep learning is used to screen resumes by learning from labels provided by prior hiring practices, deep learning learns the biases that are already built into the past practices, even if unintended. If operators tend to make a type of a mistake in categorizing an image, deep learning that learned from the data labeled by that operator’s past behavior would learn to make the same mistake. If deep learning is used to identify suspected criminal behavior in the street images captured by cameras on the street based on past history of arrests, deep learning will try the best it can to mimic the past behavior. If deep learning is used to identify what a social media user tends to want to see in order to maximize advertising revenues, deep learning will learn to be extremely good at showing the user exactly what the user tends to watch, even if it is highly biased, fake or inappropriate. If misused, deep learning can accentuate and accelerate human addiction and biases. Deep learning is a powerful weapon that relies on the humans wielding it to use it carefully.
SE: Is machine learning more accurate than a human in performing pattern recognition tasks?
Fujimura: In many cases, it’s found that a deep learning-based program can inference better with a higher percentage of accuracy than a human, particularly when you look at it over time. A human might be able to look at a picture and recognize it with a 99% accuracy. But if the same human has to look at a much larger data set, and do it eight hours a day for 200 days a year, the performance of the human is going to degrade. That’s not true for a computer-based algorithm, including deep learning. The learning algorithms process vast amounts of data. They go through small sections at a time and go through every single one without skipping anything. When you take that into account, deep learning programs can be useful for these error prone processes that are visually oriented or can be cast into being visually oriented.
SE: The industry is working on other technologies to replicate the functions of the brain. Neuromorphic computing is one example. How realistic is this?
Fujimura: The brain is amazing. It will take a long time to create a neural network of the actual brain. There are very interesting computing models in the future. Neuromorphic is not a different computing model. It’s a different architecture of how you do it. It’s unclear if neuromorphic computing will necessarily create new kinds of capabilities. It does make some of them more efficient and effective.
SE: What about quantum computing?
Fujimura: The big change is quantum computing. That takes a lot of technology, money and talent. It’s not an easy technology to develop. But you can bet that leading technology countries are working on it, and there is no question in my mind that it’s important. Take security, for example. 256-bit encryption is nothing in basic quantum computing. Security mechanisms would have to be significantly revamped in the world of quantum computing. Quantum computing used in a wrong way can be destructive. Staying ahead of that is a matter of national security. But quantum computing also can be very powerful in solving problems that were considered intractable. Many iterative optimization problems, including deep learning training, will see major discontinuities with quantum computing.
SE: Let’s move back to the photomask industry. Years ago, the mask was simple. Over time, masks have become more complex, right?
Fujimura: At 130nm or around there, you started to see decorations on the mask. If you wanted to draw a circle on the wafer using Manhattan or rectilinear shapes, you actually drew a square on the mask. Eventually, it would become a circle on the wafer. However, starting at around 130nm, that square on the mask had to be written with decorations in all four corners. Then, SRAFs (sub-resolution assist features) started to appear on the mask around 90nm. There might have been some at 130nm, but mostly at 90nm. By 22nm, you couldn’t find a critical layer mask that didn’t have SRAFs on them. SRAFs are features on the mask that are designed explicitly not to print on the wafer. Through an angle, SRAFs project light into the main features that you do want to print on a wafer — enough so that it helps to augment the amount of energy that’s being applied to the resist. Again, this makes the printing of the main features more resilient to manufacturing process variation.
SE: Then multiple patterning appeared around 16nm/14nm, right?
Fujimura: The feature sizes became smaller and more complex. When we reached the limit of resolution for 193i, there was no choice but to go to multiple patterning, where multiple masks printed one wafer layer. You divide the features that you want on a given wafer layer and you put them on different masks. This provided more space for SRAFs for each of the masks. EUV for some layers is projected to go to multiple patterning, too. It costs more to do multiple patterning, but it is a familiar and proven technique for extending lithography to smaller nodes.
SE: To pattern a photomask, mask makers use e-beam mask writer systems based on variable shaped beam (VSB) technology. Now, using thousands of tiny beams, multi-beam mask writers are in the market. How do you see this playing out?
Fujimura: Most semiconductor devices are being patterned using VSB writers for the critical layers. That’s working fine. The write times are increasing. If you look at the eBeam Initiative’s recent survey, the average write times are still around 8 hours. Going forward, we are moving toward more complex processes with EUV masks. Today, EUV masks are fairly simple. Rectangular writing is enough. But you need multi-beam mask writers because of the resist sensitivity. The resists are slow in order to be more accurate. We need to apply a lot of energy to make it work, and that is better with multi-beam mask writers.
SE: What’s next for EUV masks?
Fujimura: EUV masks will require SRAFs, too. They don’t today at 7nm. SRAFs are necessary for smaller features. And, for 193i as well as for EUV, curvilinear masks are being considered now for improvements in wafer quality, particularly in resilience to manufacturing variation. But for EUV in particular, because of the reflective optics, curvilinear SRAFs are needed even more. Because multi-beam mask writing enables curvilinear mask shapes without a write time penalty, the enhanced wafer quality in the same mask write time is attractive.
SE: What are the big mask challenges going forward?
Fujimura: There are still many. EUV pellicles, affordable defect-free EUV mask blanks, high- NA EUV, and actinic or e-beam-based mask inspection — both in the mask shop and in the wafer shop for requalification — are all important areas for advancement. Now, the need to adopt curvilinear mask shapes has been widely acknowledged. Data processing, including compact and lossless data representation that is fast to write and read, is an important challenge. Optical proximity correction (OPC) and inverse lithography technology (ILT), which are needed to produce these curvilinear mask shapes to maximize wafer performance, need to run fast enough to be practical.
SE: What are the challenges in developing curvilinear shapes on masks?
Fujimura: There are two issues. Without multi-beam mask writers, producing masks with curvilinear shapes can be too expensive or may practically take too long to write. Second, controlling the mask variation is challenging. Once again, the reason you want curvilinear shapes on the mask is because wafer quality improves substantially. That is even more important for EUV than in 193nm immersion lithography. EUV masks are reflective. So, there is also a 6-degree incidence angle on EUV masks. And that creates more desire to have curvilinear shapes or SRAFs. They don’t print on wafer. They are printed on the mask in order to help decrease process variation on the wafer.
SE: What about ILT?
Fujimura: ILT is an advanced form of OPC that computes the desired mask shapes in order to maximize the quality of wafer lithography. Studies have shown that ILT — in particular, unconstrained curvilinear ILT — can produce the best results in terms of resilience to manufacturing variation. D2S and Micron recently presented a paper on the benefits of full-chip, curvilinear stitchless ILT with mask-wafer co-optimization for memory applications. This approach enabled more than a 2X improvement in process windows.
SE: Will AI play a big role in mask making?
Fujimura: Yes. In particular, with deep learning, the gap between a promising prototype and a production-level inference engine is very wide. While there was quite a bit of initial excitement over deep learning, the world still has not seen very much in production adoption of deep learning. A large amount of this comes from the need for data. In semiconductor manufacturing, data security is extremely important. So while a given manufacturer would have plenty of data of its own kind, a vendor of any given tool, whether software or equipment, has a difficult time getting enough customer data. Even for a manufacturer, creating new data — say, a SEM picture of a defect — can be difficult and time-consuming. Yet deep learning programming is programming with data, instead of writing new code. If a deep learning programmer wants to improve the success rate of an inference engine from 92% to 95%, that programmer needs to analyze the engine to see what types of data it needs to be additionally trained to make that improvement, then acquire many instances of that type of data, and then iterate. The only way this can be done efficiently and effectively is to have digital twins, a simulated environment that generates data instead of relying only on physical real sample data. Getting to 80% success rate can be done with thousands of collected real data. But getting to 95% success rate requires digital twins. It is the lack of this understanding that is preventing production deployment of deep learning in many potential areas. It is clear to me that many of the tedious and error-prone processes can benefit from deep learning. And it is also clear to me that acceleration of many computing tasks using deep learning will benefit the deployment of new software capabilities in the mask shop.
Related Stories
Leave a Reply