How Do Machines Learn?

How neural networks perform defect classification and many other tasks.


We depend, or hope to depend, on machines, especially computers, to do many things, from organizing our photos to parking our cars. Machines are becoming less and less “mechanical” and more and more “intelligent.” Machine learning has become a familiar phrase to many people in advanced manufacturing. The next natural question people may ask is: How do machines learn?

Recognizing diverse objects is a clear indicator of intelligence. Specific to semiconductors, recognizing various types of defects and categorizing them is an important task that initially was carried out solely by humans. Gradually, this classification process was automated by using computer programs running ever-evolving algorithms. Today, most defects are detected and classified by such systems in advanced facilities.

Before machine learning was widely used, there was a period when system set-up was done purely by humans. After learning about a task through observation and experiments, engineers made rules and implemented them as programs for computers to run. In this scenario, the machine does not learn; it simply repeats the programmed process, making decisions based on the embedded rules. This is a very labor-intensive approach: rules must be extracted from human classifiers, implemented as programmatic logic, and the results verified. Sometimes it is very difficult, or impossible, to translate a decision-making process that humans perform, often subconsciously, into computer language.
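To make the rule-based era concrete, here is a hypothetical sketch of what such a hand-coded classifier might look like. The function name, features, and thresholds are invented for illustration; real systems encoded many more rules, but the fixed, non-learning character is the same.

```python
def classify_defect(area_um2, aspect_ratio):
    """Rule-based classifier: fixed logic written by an engineer, no learning.

    The thresholds below are illustrative assumptions, not values from
    any real inspection system.
    """
    if aspect_ratio > 5.0:          # long and thin -> likely a scratch
        return "scratch"
    if area_um2 > 100.0:            # big blob -> large particle
        return "large particle"
    return "small particle"         # everything else

print(classify_defect(4.0, 1.2))    # "small particle"
print(classify_defect(30.0, 8.0))   # "scratch"
```

However many samples flow through it, the logic never changes; improving it means an engineer editing the rules by hand.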

Machine learning

After machines started to really “learn,” two types of classification algorithms emerged: unsupervised and supervised.

  • Unsupervised classification analyzes the natural distribution of the data. Do the data form any distinctive clusters? If so, what are these clusters, and how are they related? Once an unsupervised routine is put together, it can be applied to incoming data sets as they are; human intervention is limited to adjusting the algorithm’s parameters.
  • Supervised classification requires labeled data in addition to the implemented algorithm. Through the labels, the program is shown the characteristics of each pre-defined class. Once the program has learned the classes properly, it can assign a new incoming sample to the right class, without an engineer programming the logic line by line.

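The contrast above can be sketched in a few lines of code. The example below is illustrative only, using two of the simplest possible methods: 1-D k-means for the unsupervised case and a nearest-centroid classifier for the supervised case. The data, class names, and function names are invented; they stand in for, say, defect-size measurements.

```python
def kmeans_1d(data, k=2, iters=20):
    """Unsupervised: discover k cluster centers with no labels at all."""
    centers = [min(data), max(data)]                  # crude initialization
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            # assign each point to its nearest current center
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]     # recompute centers
    return centers

def fit_centroids(samples, labels):
    """Supervised: the labels define the classes; compute one centroid each."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Assign a new sample to the class with the nearest centroid."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

data = [0.9, 1.1, 1.0, 5.0, 5.2, 4.8]                 # toy measurements

# Unsupervised: two clusters emerge, but they carry no names.
print(sorted(kmeans_1d(data)))                        # roughly [1.0, 5.0]

# Supervised: labels tell the model what each class *means*.
labels = ["particle"] * 3 + ["scratch"] * 3
model = fit_centroids(data, labels)
print(predict(model, 4.9))                            # "scratch"
```

Note the key difference: the unsupervised routine finds structure but cannot name it, while the supervised one inherits the human-defined class names from the labels.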
Class definition is typically a very subjective process. It is hard to expect a natural pattern in the data where each cluster extracted by an unsupervised method represents a class a human has in mind. As a result, between these two types of methods, supervised classification is the one predominantly used today. Among the plethora of such supervised methods, algorithms based on neural networks have become the indisputable star, brightening a wide spectrum of application areas.

How is a neural network trained to learn?

Most neural networks for classification have a multi-layer feed-forward structure that is trained by back-propagating the error observed at the output. In other words, training is an iterative trial-and-error process, with adjustments between iterations, until the desired results are achieved. A signal is fed to the input and pushed through the network to get the output. Because this is a supervised process, the corresponding expected output, or “label,” of the input is available and compared with the actual output. At the beginning of the training process, the output and the label are usually very different. This difference, or error, is “propagated” backward from the output layer of the network to the input layer. Along the way, adjustments are made to the parameters of the network, mostly in the form of “weights.” Ideally, after these adjustments, the same signal going through the network will produce a smaller difference at the output. After several iterative runs in which the same set of labeled samples is used to train the network, the errors at the output will, hopefully, be minimized or eliminated. When this happens, the network will make few or no errors in putting an incoming sample into its right class, at least for the samples in the training pool. Assuming the training samples are representative of the general image population, a properly trained machine learning model will produce the right label on samples from outside the training pool as well.
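The loop described above, forward pass, compare with the label, propagate the error backward, adjust the weights, repeat, can be sketched at its smallest scale: a single sigmoid neuron trained by gradient descent on a toy two-feature task. This is an illustrative minimal example, not the architecture of any real defect-classification network; the data, learning rate, and epoch count are arbitrary assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy labeled samples: two features per sample, label 0 or 1.
# Here the label simply follows the first feature.
samples = [([0.0, 0.0], 0), ([0.0, 1.0], 0),
           ([1.0, 0.0], 1), ([1.0, 1.0], 1)]

w = [0.0, 0.0]   # weights, the parameters adjusted during training
b = 0.0          # bias
lr = 1.0         # learning rate: how big each adjustment is

for epoch in range(2000):                              # iterative runs
    for x, label in samples:
        out = sigmoid(w[0] * x[0] + w[1] * x[1] + b)   # forward pass
        err = out - label                              # error at the output
        grad = err * out * (1 - out)                   # back-propagated signal
        w[0] -= lr * grad * x[0]                       # adjust each weight
        w[1] -= lr * grad * x[1]                       # against the error
        b    -= lr * grad

# After training, the outputs should sit close to the labels.
for x, label in samples:
    out = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
    print(x, label, round(out, 2))
```

A real network stacks many such units in layers and applies the same idea, with the chain rule carrying the error gradient backward through every layer, but the trial-and-error shape of the loop is identical.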


In summary, machines learn by being trained with high-quality labeled data that embody a good representation of the target classes. Consequently, what determines the accessibility of a machine learning system is how easy it is to organize data (defect images, for example), pre-process them, label them, present them in a suitable format to any of a diverse gamut of networks, and have those networks trained efficiently and used effectively. Like a string that connects all the jewels into a necklace, automatic defect classification (ADC) has been designed to facilitate all of these steps so that a user can go through the whole length of the defect classification process with ease.
