How Neural Networks Think (MIT)

General-purpose technique sheds light on inner workings of neural nets trained to process language.


Source: MIT’s Computer Science and Artificial Intelligence Laboratory, David Alvarez-Melis and Tommi S. Jaakkola

Technical paper link
MIT article

General-purpose neural net training
Artificial-intelligence research has been transformed by machine-learning systems called neural networks, which learn to perform tasks by analyzing huge volumes of training data, the MIT researchers note. During training, a neural net continually adjusts thousands of internal parameters until it can reliably perform some task, such as identifying objects in digital images or translating text from one language to another. But on their own, the final values of those parameters say very little about how the network does what it does. Understanding what neural networks are doing can help researchers improve their performance and transfer their insights to other applications, and computer scientists have recently developed clever techniques for divining the computations of particular networks.

But recently, at the 2017 Conference on Empirical Methods in Natural Language Processing, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory presented a new general-purpose technique for making sense of neural networks trained to perform natural-language-processing tasks, in which computers attempt to interpret freeform texts written in ordinary, or “natural,” language (as opposed to a structured language, such as a database-query language).

They said the technique applies to any system that takes text as input and produces strings of symbols as output, such as an automatic translator. Because its analysis relies only on varying inputs and examining the effects on outputs, it can work with online natural-language-processing services, without access to the underlying software.

In fact, the technique works with any black-box text-processing system, regardless of its internal machinery. In their experiments, the researchers show that the technique can identify idiosyncrasies in the work of human translators, too.

The team explained that the technique is analogous to one that has been used to analyze neural networks trained to perform computer vision tasks, such as object recognition. Software that systematically perturbs — or varies — different parts of an image and resubmits the image to an object recognizer can identify which image features lead to which classifications. But adapting that approach to natural language processing isn’t straightforward.
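The perturbation idea described above can be sketched in a few lines of code. The sketch below is illustrative only: `black_box` is a hypothetical stand-in (a toy word-by-word translator), not the researchers' system or any real service, and the deletion-based attribution is a simplified version of the general approach of varying inputs and watching which outputs change.

```python
def black_box(text):
    """Hypothetical stand-in for an opaque text-processing system:
    a toy word-by-word English-to-French dictionary lookup. Any
    function mapping text to text could be substituted here,
    including an online service queried over the network."""
    lexicon = {"the": "le", "cat": "chat", "eats": "mange", "fish": "poisson"}
    return " ".join(lexicon.get(word, word) for word in text.split())

def attribute(text):
    """Perturb the input by deleting each word in turn, re-query the
    black box, and record which output tokens disappear as a result.
    Returns {input_word: set of affected output tokens}. (A crude
    set difference; repeated words would need finer bookkeeping.)"""
    words = text.split()
    base_output = black_box(text).split()
    influence = {}
    for i, word in enumerate(words):
        perturbed = " ".join(words[:i] + words[i + 1:])
        perturbed_output = set(black_box(perturbed).split())
        influence[word] = {tok for tok in base_output
                           if tok not in perturbed_output}
    return influence

print(attribute("the cat eats fish"))
# → {'the': {'le'}, 'cat': {'chat'}, 'eats': {'mange'}, 'fish': {'poisson'}}
```

Because the loop only calls `black_box` as a function of its input text, the same scaffolding works whether the system underneath is a neural network, an online API, or even a human translator's output looked up from a corpus.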