Neural Networks Without Matrix Math

A different approach to speeding up AI and improving efficiency.

Speeding up AI systems typically means adding more processing elements and pruning the algorithms, but those approaches aren’t the only path forward.

Almost all commercial machine learning applications depend on artificial neural networks, which are trained using large datasets with a back-propagation algorithm. The network first analyzes a training example, typically assigning it to a classification bin. This result is compared to the known “correct” answer, and the difference between the two is used to adjust the weights applied to the network nodes.

The process repeats for as many training examples as needed to (hopefully) converge to a stable set of weights that gives acceptable accuracy. This standard algorithm requires two distinct computational paths — a forward “inference” path to analyze the data, and a backward “gradient descent” path to correct node weights.
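
The two paths described above can be illustrated with a very small network. The following is a minimal sketch, not taken from the article: it assumes a toy task (logical OR), a two-unit tanh hidden layer, a sigmoid output with cross-entropy loss, and hand-picked starting weights. All names, such as `forward`, `backward`, and `net`, are hypothetical.

```python
import math

# Training set for logical OR: (inputs, known "correct" answer).
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(net, x):
    """Forward "inference" path: inputs -> hidden layer -> output probability."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["W1"], net["b1"])]
    p = sigmoid(sum(w * hj for w, hj in zip(net["W2"], h)) + net["b2"])
    return h, p

def backward(net, x, y, lr):
    """Backward path: propagate the output error and adjust every weight."""
    h, p = forward(net, x)
    d_out = p - y  # error signal for a sigmoid output with cross-entropy loss
    # Hidden-layer error, computed with the output weights before they change.
    d_hid = [d_out * w * (1.0 - hj * hj) for w, hj in zip(net["W2"], h)]
    for j, hj in enumerate(h):
        net["W2"][j] -= lr * d_out * hj
        net["b1"][j] -= lr * d_hid[j]
        for i, xi in enumerate(x):
            net["W1"][j][i] -= lr * d_hid[j] * xi
    net["b2"] -= lr * d_out

def mean_loss(net):
    """Average cross-entropy between network outputs and the known answers."""
    total = 0.0
    for x, y in DATA:
        _, p = forward(net, x)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() away from zero
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(DATA)

# Arbitrary, asymmetric starting weights (an assumption of this sketch).
net = {"W1": [[0.5, -0.3], [0.2, 0.4]], "b1": [0.0, 0.0],
       "W2": [0.3, -0.2], "b2": 0.0}

initial_loss = mean_loss(net)
for _ in range(2000):          # repeat over the training examples
    for x, y in DATA:
        backward(net, x, y, lr=0.5)
final_loss = mean_loss(net)
predictions = [round(forward(net, x)[1]) for x, _ in DATA]
print(initial_loss, final_loss, predictions)
```

Even at this scale, the backward pass needs its own bookkeeping (`d_out`, `d_hid`) that the forward pass never uses, which is the separate computational path the critics object to.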

In biological brains, the strength of synaptic connections increases and decreases as associated neurons fire (or fail to), but there is no evidence of a separate synaptic update process. Critics of back-propagation argue that it is biologically implausible for this reason. Jack Kendall, co-founder and CTO of Rain Neuromorphics, said that errors also accumulate during back-propagation, undermining overall performance.

Nonetheless, one thread of research and development in the AI community seeks to implement back-propagation algorithms more efficiently. This can be done using less precise weights, dedicated accelerator chips, or devices that allow more network nodes to fit in a given circuit footprint.
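
As one illustration of what "less precise weights" can mean, the sketch below maps floating-point weights onto 8-bit signed integers with a single shared scale factor. This is a generic symmetric quantization scheme chosen for illustration, not a method described in the article, and the function names are hypothetical.

```python
def quantize(weights, bits=8):
    """Map float weights to signed integers that share one scale factor."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits
    largest = max(abs(w) for w in weights)
    scale = largest / qmax if largest > 0 else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q_weights, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in q_weights]

weights = [0.5, -1.27, 0.003, 1.0]
q_weights, scale = quantize(weights)
restored = dequantize(q_weights, scale)
# Rounding error per weight is at most half of one quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q_weights, scale, max_error)
```

The small weight 0.003 rounds to zero here, which shows the trade-off: storage and arithmetic get cheaper, but fine distinctions between weights are lost.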

Another research thread argues that the back-propagation approach is inherently limited. Neural network training is time-consuming and expensive. The need for large training sets with pre-labeled data is especially problematic for applications such as autonomous vehicles, which need to be able to adapt to the environment in real time. In this view, further advances require new learning models and new training algorithms.

Spiking neural networks are one frequently discussed alternative, and spike-timing dependent plasticity is often proposed as a learning rule. Spike-based approaches seek to model the dynamics of learning in biological brains, with chains of signal spikes corresponding to incoming stimuli.
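
A common textbook form of spike-timing dependent plasticity can be sketched in a few lines. The version below assumes the pair-based rule with exponential time windows. The amplitudes, time constants, and weight limits are illustrative values rather than figures from the article, and the function names are hypothetical.

```python
import math

def stdp_delta_w(dt_ms, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one spike pair, where dt_ms = t_post - t_pre."""
    if dt_ms > 0:    # presynaptic spike came first: strengthen the synapse
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:    # postsynaptic spike came first: weaken the synapse
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0

def apply_stdp(weight, pre_spikes_ms, post_spikes_ms, w_min=0.0, w_max=1.0):
    """Sum the pairwise changes over two spike trains and clip the result."""
    for t_pre in pre_spikes_ms:
        for t_post in post_spikes_ms:
            weight += stdp_delta_w(t_post - t_pre)
    return min(w_max, max(w_min, weight))

print(apply_stdp(0.5, [10.0, 40.0], [15.0, 45.0]))
```

The update depends only on the relative timing of spikes at the two ends of a synapse, so no separate backward pass is needed.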

Finding answers in equilibrium
Electrical circuits are not biological neurons, though. They have different physics and face different engineering constraints. They also can draw on an existing library of well-characterized circuit elements, both analog and digital.

Kendall explained that his company’s new machine learning paradigm, equilibrium propagation, is based on a restatement of Kirchhoff’s Law. Equilibrium propagation defines an “energy” function in terms of the nodes of a neural network. Physically, this “total energy,” F, is a measure of the total pseudo-power of the network. It is the sum of two terms, E and C. E is a measure of internal interactions between nodes, while C measures the difference between the network’s target and actual output values, weighted by a parameter β.

The change in the total energy with time is defined by the evolution of a state variable, s. The output of the model is given by the components of E at the fixed point.
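
The relationships described in words above can be written compactly. The block below is a sketch that assumes the gradient-dynamics form commonly used in published descriptions of equilibrium propagation. The fixed-point symbol s* and the exact form of the dynamics are assumptions, not equations taken from this article.

```latex
% Total energy: internal energy plus the output cost, weighted by \beta
F(s) = E(s) + \beta \, C(s)

% Assumed dynamics: the state s relaxes "downhill" on F
\frac{ds}{dt} = -\frac{\partial F}{\partial s}

% Fixed point s^{*}: the state stops changing
\left. \frac{\partial F}{\partial s} \right|_{s = s^{*}} = 0
```

With β set to 0 the cost term drops out and the fixed point depends only on E. A nonzero β pulls the fixed point toward the target output.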

“Solving” the model means identifying the components of E — the network node values — that minimize F. A “good” solution is one in which this configuration of node values also produces the classification bin expected by the training data.

Yoshua Bengio, a Turing Award winner and founder of Mila, the Quebec Artificial Intelligence Institute, said equilibrium propagation does not depend on computation in the sense of the matrix operations that are the hallmark of conventional neural networks. Rather, the network “learns” through a series of Ising model-like annealing steps. A solution is found by first setting β to 0, allowing the network to relax to a fixed point, and measuring the resulting “free” output values.

Then, in the second “nudged” phase, a small change in β pushes the observed output values in the direction of the target values. Perturbing the outputs changes the dynamics of the system — it is no longer in equilibrium — and the network is allowed to relax to a new fixed point, with new values of E. A mathematically rigorous treatment shows that the network relaxation corresponds to the propagation of error derivatives in conventional back-propagation, and repeated adjustments give rise to stochastic gradient descent.
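
The two-phase procedure can be sketched on a deliberately tiny model. The example below is an illustration under stated assumptions, not Rain Neuromorphics' implementation: it uses a single output node s, a quadratic energy E = s²/2 − s·(w·x), a cost C = (s − y)²/2, and simple Euler steps in place of physical relaxation. The function names and the training data are hypothetical.

```python
def relax(s, drive, beta, target, dt=0.1, steps=200):
    """Let the state settle by following ds/dt = -dF/ds, with F = E + beta*C."""
    for _ in range(steps):
        dE_ds = s - drive       # from E = s^2/2 - s*drive
        dC_ds = s - target      # from C = (s - target)^2/2
        s -= dt * (dE_ds + beta * dC_ds)
    return s

def eqprop_step(w, x, y, beta=0.5, lr=0.1):
    """One training step: free phase, nudged phase, then a local weight update."""
    drive = sum(wi * xi for wi, xi in zip(w, x))
    s_free = relax(0.0, drive, 0.0, y)          # beta = 0: target has no influence
    s_nudged = relax(s_free, drive, beta, y)    # small beta pulls output toward y
    # dE/dw_i = -s * x_i, so the contrast between the two fixed points gives
    # the update -(1/beta) * (dE/dw at nudged - dE/dw at free).
    return [wi + lr * (s_nudged - s_free) * xi / beta for wi, xi in zip(w, x)]

# Hypothetical data generated by the rule y = 2*x1 - 1*x2.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0),
           ([1.0, 1.0], 1.0), ([0.5, -0.5], 1.5)]

w = [0.0, 0.0]
for _ in range(200):
    for x, y in samples:
        w = eqprop_step(w, x, y)
print(w)
```

For this toy energy the contrastive update works out to lr·(y − w·x)·x/(1 + β), a scaled gradient-descent step on the squared error. That is a small-scale instance of the correspondence with back-propagation described above, obtained without an explicit backward pass.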

Rather than providing an explicit prediction as conventional algorithms do, the model produces an implicit result defined by the components of E. Though the theory underlying equilibrium propagation is applicable to any non-linear resistive network, implementing it with digital hardware requires extra steps. To obtain an explicit solution, a digital architecture would need to numerically optimize the energy function.

Analog hardware for analog solutions
Instead, Gordon Wilson, CEO of Rain Neuromorphics, pointed to the development of the memristor as the key to implementing equilibrium propagation in commercially interesting analog networks. The company’s proposed architecture stores node values in arrays of memristor elements, whose conductances play the role of synaptic weights. After each “nudged” phase, voltage or current pulses modify the conductances.
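
The role of conductances as weights can be illustrated with an idealized crossbar model. The sketch below assumes ideal devices: each column current is the sum of voltage times conductance (Ohm's law plus Kirchhoff's current law), and each programming pulse shifts a conductance by a fixed step within a bounded range. The step size, the range, and the function names are hypothetical, and real memristors behave far less uniformly.

```python
def crossbar_currents(voltages, conductances):
    """Column currents I_j = sum_i V_i * G[i][j] for an ideal crossbar array."""
    n_cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances))
            for j in range(n_cols)]

def apply_pulses(conductances, pulse_counts, step=1e-6, g_min=1e-6, g_max=1e-4):
    """Shift each conductance by step per pulse, clipped to the device range.

    Positive counts increase a conductance, negative counts decrease it.
    """
    return [[min(g_max, max(g_min, g + step * n)) for g, n in zip(g_row, n_row)]
            for g_row, n_row in zip(conductances, pulse_counts)]

G = [[1e-5, 2e-5],     # conductances in siemens: rows are inputs, columns outputs
     [3e-5, 4e-5]]
currents = crossbar_currents([0.2, 0.1], G)       # input voltages in volts
G_updated = apply_pulses(G, [[1, -1], [0, 100]])  # last entry saturates at g_max
print(currents, G_updated)
```

In this picture the array computes every weighted sum at once as a physical current, with no stored matrix being multiplied step by step.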

Pairs of diodes, each followed by a linear amplifier, act as “neurons” to transfer values between layers. “Bidirectional amplifiers” use voltage sources to prevent signal decay between the input and output nodes, while current sources ensure propagation of the reverse error correction signals.

While simulation results are promising, implementing such a network in hardware poses additional challenges. In particular, device researchers are still learning how to achieve reliable conductance changes in memristor networks. Even so, Kendall said that the equilibrium propagation approach applies the mathematical techniques of electronics directly to neural network problems, simplifying both programming and circuit design.



