Integrating Memristors For Neuromorphic Computing

The latest research on memory, variability, and compute architectures—and what comes next.

Much of the current research on neuromorphic computing focuses on the use of non-volatile memory arrays as a compute-in-memory component for artificial neural networks (ANNs).

By using Ohm’s Law to apply stored weights to incoming signals, and Kirchhoff’s Laws to sum up the results, memristor arrays can accelerate the many multiply-accumulate steps in ANN algorithms.
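The multiply-accumulate step can be sketched in a few lines of Python. This is only an idealized illustration: it assumes perfectly linear devices, no wire resistance, and no sneak-path currents, and the function name `crossbar_mac` and the conductance and voltage values are made up for the example.

```python
def crossbar_mac(conductances, voltages):
    """Return the column currents of an idealized crossbar array.

    conductances[i][j] is the conductance (siemens) of the device that joins
    input row i to output column j; voltages[i] is the voltage on row i.
    """
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for i, row in enumerate(conductances):
        for j, g in enumerate(row):
            # Ohm's law: the current through one device is I = G * V.
            device_current = g * voltages[i]
            # Kirchhoff's current law: currents on a shared column wire add up.
            currents[j] += device_current
    return currents


# Hypothetical 3x2 array: three inputs, two outputs.
G = [[1e-6, 2e-6],
     [3e-6, 4e-6],
     [5e-6, 6e-6]]
V = [0.1, 0.2, 0.3]
column_currents = crossbar_mac(G, V)
print(column_currents)
```

Each column current is the dot product of the input voltages with one column of stored weights, so the whole matrix-vector product is available in a single read.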

ANNs are being deployed commercially now. Any improvement in their performance is worth pursuing. But it’s important to remember that ANNs are significantly different from their biological antecedents. Improving the performance of a single step in a single type of algorithm does little to advance machine learning or neuromorphic computing generally.

A series of presentations at the recent Materials Research Society Spring Meeting looked at memristor devices, as well as the challenges of integrating them into larger systems.

At the device level, variability remains a significant issue for all memristor types. Scaling feature sizes down is likely to make variability worse. Geoffrey Burr, principal research staff member at IBM, said in a keynote address (and in previously published research) that neural networks can still give good results with significant numbers of failed devices, or with random variations in device characteristics.


Fig. 1: Neuro-inspired non-von Neumann computing, in which neurons activate each other through dense networks of programmable synaptic weights that can be implemented using dense crossbar arrays of NVM and 2-terminal or 3-terminal selector device-pairs. Source: IBM

In real circuits, though, variability introduces practical challenges. Memristors that encode the same state should have the same “read” current. A given “write” voltage should store the same state to all devices. Storing analog weights and compensating for variable device characteristics is possible, but greatly increases circuit complexity. Preliminary weights can be stored by the manufacturer, but it will still be necessary to update devices in the field and to reconfigure networks in light of improved algorithms. Effective updates require predictable device characteristics.
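One common way to compensate for device-to-device variation is a program-and-verify loop, which reads the device back after every write pulse. The sketch below is an assumption for illustration, not a method described in the presentations: the device model (a single random "gain" per device), the normalized conductance units, and all parameter values are hypothetical.

```python
import random


def program_with_verify(target, device_gain, step=0.5, tolerance=0.01,
                        max_pulses=100):
    """Pulse a device until its read-back value is within tolerance of target.

    device_gain models variability: the same nominal pulse moves different
    devices by different amounts. Values are in normalized conductance units.
    """
    conductance = 0.0
    pulses = 0
    while abs(target - conductance) > tolerance and pulses < max_pulses:
        # Each pulse is sized from the verified error, then scaled by the
        # unknown gain of this particular device.
        conductance += step * (target - conductance) * device_gain
        pulses += 1
    return conductance, pulses


rng = random.Random(0)  # fixed seed so the example is repeatable
gains = [rng.uniform(0.5, 1.5) for _ in range(8)]
results = [program_with_verify(1.0, g) for g in gains]
for gain, (value, pulses) in zip(gains, results):
    print(f"gain={gain:.2f} final={value:.3f} pulses={pulses}")
```

Every device reaches the target, but each needs a different number of pulses and a read between them, which is the added circuit complexity and latency.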

The need to control absolute weight values is an artifact of supervised learning applications, where there is a known “right” answer that the network is expected to produce on demand. Variability may be less important in unsupervised learning, where the algorithm is expected to define its own decision boundaries between classes.

Alexander Serb, a research fellow at the University of Southampton, and his colleagues demonstrated that in unsupervised learning, variability in the device characteristics can help stabilize the network. If the response to a signal pulse depends on previous pulses—for example, if the conductance change depends on a phase-change memory element’s prior state—then the network may converge to a stable value more quickly. It could be said to “remember” the association between the initial and final states.

Further complications ensue as individual devices are aggregated into circuits. “Crossbar” arrays are appealing in part because they can support a very high density of “synaptic” memristor connections. However, most proposed designs envision a much smaller number of CMOS “neurons” providing input to and reading results from the array.

IBM researchers have analyzed the engineering tradeoffs inherent in the massively parallel read and write operations that such designs will require. Writing individual memory elements sequentially would simply replicate the data transfer bottleneck that non-von Neumann architectures seek to avoid. Yet these parallel operations must remain within the circuit’s overall current and power dissipation specifications. Melting a conductive filament in a single phase-change memory element may not introduce excessive amounts of heat, but melting thousands of them at once is a different matter. The early stages of in situ training, in particular, are likely to change many values across the array in a single step. Applying large numbers of programming pulses at once can push the limits of available drive current.
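The tradeoff between parallelism and drive current can be made concrete with simple arithmetic. The sketch below is a back-of-the-envelope illustration only: the function name and the array size, pulse current, and current budget are hypothetical values, not figures from the IBM analysis.

```python
def plan_parallel_writes(devices_to_update, pulse_current_ua, drive_budget_ua):
    """Return (devices per batch, number of batches) under a current budget.

    Currents are integers in microamps to keep the arithmetic exact.
    """
    max_parallel = drive_budget_ua // pulse_current_ua
    if max_parallel < 1:
        raise ValueError("budget is too small to program even one device")
    # Ceiling division: a final partial batch still costs one write step.
    batches = (devices_to_update + max_parallel - 1) // max_parallel
    return max_parallel, batches


# Assumed numbers: a 1,000 x 1,000 array, 100 uA per pulse, 50 mA budget.
max_parallel, batches = plan_parallel_writes(1_000_000, 100, 50_000)
print(max_parallel, batches)
```

Under these assumed numbers, an update that touches every device cannot happen in one step; it has to be split into thousands of batches, which erodes the benefit of a parallel architecture.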

From neural networks to neuromorphic computing
Though computation in memory can improve ANN performance, Burr pointed out that making a better ANN is not the same as building a neuromorphic architecture.

Artificial neural networks, as currently implemented, are easily confused. In fact, they confidently draw nonsensical conclusions from previously unknown examples, or even from random noise. They require enormous amounts of data to make even limited inferences about the world, exactly the opposite of biological learning.

Dileep George, co-founder of Vicarious AI, described the behavior of artificial neural networks as more like stimulus-response conditioning than learning as humans understand it. In his MRS keynote, he noted that these limits will not be surpassed by better RRAM or PCM devices alone. They also require new algorithms and approaches to system design. For example, probabilistic graphical models seek to capture the high-level invariant characteristics that humans use to put lions and house cats in the same category, or to recognize that a chair made of ice and a chair made of wood are both chairs, with similar contours in spite of their very different surfaces.

Alice Mizrahi, a student at Universite Paris-Sud, observed that biological brains can tolerate very high levels of damage and still function, in part because the structure has built-in redundancy. Discussing work done at CNRS in France, she said that information is not stored in a single neuron or synapse, but in a population of neurons and their connections to each other and to the rest of the brain. In the visual cortex, a different group of neurons is optimized for each of several wavelength bands. Yet even though individual neurons may respond to only a narrow set of stimuli, the visual cortex as a whole has a very wide range.

These researchers demonstrated similar behavior in hardware, with an array of superparamagnetic tunnel junction memories. In these devices, two ferromagnets are separated by a non-magnetic tunneling barrier layer. One of the magnets is pinned, while the other can orient itself either parallel or anti-parallel to the first.

Because the ferromagnets are polycrystalline, devices do not have identical characteristics. Rather, the group identified a unique tuning curve for each device, with the array as a whole forming a basis set spanning a range of signals. Any individual signal can be encoded as a weighted sum of elements from the basis set. Even if one device or range of devices fails, its contributions can be extrapolated from its neighbors.
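A population code of this kind can be sketched in software. The example below is an assumption-laden illustration, not the group's model: it uses Gaussian tuning curves with deliberately mismatched widths to stand in for device variability, and a simple response-weighted average as the decoder. All names and values are hypothetical.

```python
import math


def responses(signal, centers, widths):
    """Response of each device: a Gaussian tuning curve around its center."""
    return [math.exp(-((signal - c) ** 2) / (2.0 * w ** 2))
            for c, w in zip(centers, widths)]


def decode(resp, centers, failed=()):
    """Estimate the signal as a response-weighted average of device centers.

    Devices whose index is in `failed` contribute nothing to the estimate.
    """
    alive = [(r, c) for k, (r, c) in enumerate(zip(resp, centers))
             if k not in failed]
    total = sum(r for r, _ in alive)
    return sum(r * c for r, c in alive) / total


centers = [k / 10 for k in range(11)]                        # preferred values
widths = [0.18 if k % 2 == 0 else 0.12 for k in range(11)]   # mismatched devices
signal = 0.42
resp = responses(signal, centers, widths)
estimate_all = decode(resp, centers)
estimate_one_failed = decode(resp, centers, failed={4})      # strongest device lost
print(estimate_all, estimate_one_failed)
```

Because the tuning curves overlap, removing the device that responds most strongly shifts the estimate only slightly; neighboring devices carry most of the same information.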

Angel Yanguas-Gil, principal materials scientist and fellow at Argonne National Laboratory, discussed how researchers drew inspiration from honey bees and other insects. Their brains are much smaller and simpler than human brains, with no neocortex and only about a million neurons, a number within the capabilities of current neuromorphic hardware. Yet a honey bee’s sophisticated pathfinding ability allows it to return to its hive from a pollen or nectar source several kilometers away and communicate the location to other bees. In the insect brain, only a small subset of neural units actually learn, while the other units modulate the strength of training signals. While spike-timing dependent plasticity (one model of human learning) only considers interactions between pairs of neurons, this kind of dynamic plasticity depends on a third neuron to control the interaction.
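The difference between a two-neuron rule and a three-neuron rule can be shown with a toy update. This sketch is a deliberate simplification: it replaces spike timing with plain activity levels (a Hebbian-style product), and the function names, learning rate, and values are hypothetical rather than taken from the Argonne work.

```python
def pairwise_update(weight, pre, post, lr=0.1):
    """Two-factor rule: the change depends only on the connected pair."""
    return weight + lr * pre * post


def modulated_update(weight, pre, post, modulator, lr=0.1):
    """Three-factor rule: a third unit gates or scales the same change."""
    return weight + lr * modulator * pre * post


w = 0.5
print(pairwise_update(w, pre=1.0, post=1.0))                   # always learns
print(modulated_update(w, pre=1.0, post=1.0, modulator=0.0))   # learning gated off
print(modulated_update(w, pre=1.0, post=1.0, modulator=1.0))   # learning enabled
```

With the modulator at zero the synapse keeps its weight no matter how active the pair is, so a small number of modulating units can decide when and where learning happens.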

Suhas Kumar, a researcher at Hewlett Packard Labs, observed that the brain is not random, but deterministic and chaotic. Neural oscillations ultimately converge to a long-term, deterministic attractor. This group used memristors to store synaptic weights, with chaotic NbO2 oscillators serving as neurons. Minimizing the energy of this assembly as a whole produced solutions to the traveling salesman problem, while the oscillations of individual neurons kept the array from being trapped in a local energy minimum. The technique resembles adiabatic optimization, also used in quantum annealing.
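The idea of using fluctuations to escape local energy minima can be illustrated in software with classical simulated annealing. This is a swapped-in technique, not the Hewlett Packard Labs hardware method: random acceptance of uphill moves stands in for the chaotic oscillators, and the city layout, step count, and temperature schedule are assumptions.

```python
import math
import random


def tour_length(tour, cities):
    """Total length of a closed tour; this plays the role of the energy."""
    return sum(math.dist(cities[tour[k]], cities[tour[(k + 1) % len(tour)]])
               for k in range(len(tour)))


def anneal_tour(cities, steps=5000, start_temp=1.0, seed=0):
    rng = random.Random(seed)
    tour = list(range(len(cities)))
    rng.shuffle(tour)
    current = tour_length(tour, cities)
    best_tour, best_length = tour[:], current
    for step in range(steps):
        # Linear cooling: fluctuations shrink as the search proceeds.
        temp = start_temp * (1.0 - step / steps) + 1e-9
        i, j = sorted(rng.sample(range(len(cities)), 2))
        # Candidate move: reverse one segment of the tour.
        candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        delta = tour_length(candidate, cities) - current
        # Occasionally accepting a longer tour lets the search climb out
        # of a local minimum.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            tour, current = candidate, current + delta
            if current < best_length:
                best_tour, best_length = tour[:], tour_length(tour, cities)
    return best_tour, best_length


# Hypothetical instance: eight cities evenly spaced on a unit circle.
cities = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
          for k in range(8)]
best_tour, best_length = anneal_tour(cities)
print(best_tour, round(best_length, 3))
```

A purely greedy search would accept only shorter tours and could stall; the temperature term is what allows temporary increases in energy early in the run.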

Conclusion
Important problems in non-volatile memory device performance and compute-in-memory array design remain unsolved. There is plenty of work for materials scientists to do. For the bigger challenge of neuromorphic computing, though, as Burr said, “We have enough RRAM papers.” The next step forward will come from algorithms and system designs.
