Spiking Neural Networks: Research Projects or Commercial Products?

Opinions differ widely, but in this space that isn’t unusual.


Spiking neural networks (SNNs) often are touted as a way to get close to the power efficiency of the brain, but there is widespread confusion about what exactly that means. In fact, there is disagreement about how the brain actually works.

Some SNN implementations are less brain-like than others. Depending on whom you talk to, SNNs are either a long way away or close to commercialization. The varying definitions of SNNs lead to differences in how the industry is seen.

“A few startups are doing their own SNNs,” said Ron Lowman, strategic marketing manager of IP at Synopsys. “It’s being driven by guys that have expertise in how to train, optimize, and write software for them.”

On the other hand, Flex Logix Inference Technical Marketing Manager Vinay Mehta said that, “SNNs are out further than reinforcement learning,” referring to a machine-learning concept that’s still largely in the research phase.

The entire notion of a “neural network” is motivated by attempts to model how the brain works. But current neural networks — like the convolutional neural networks (CNNs) that are so prevalent today — don’t follow the design of the brain. Instead, they rely on matrix multiplication for incorporating synaptic weights and gradient-descent algorithms for supervised training.

Those working on SNNs often refer to these as “classical” networks or “artificial” neural networks (ANNs). That said, Alexandre Valentian, head of advanced technologies and system-on-chip laboratory for CEA-Leti, noted that CNNs reflect more of an approach or type of application, while SNNs reflect an implementation. “CNNs can be implemented in spikes — it’s not CNN vs. SNN.”

Mimicking the brain
The notion of an SNN originates in the fact that the brain uses spikes to relay information. An important question, however, is how information is coded onto those spikes. Several coding schemes are in use, both in research and in development. This category of neural network is sometimes referred to as “neuromorphic,” in that it reflects the way the brain works. Classical networks are not neuromorphic, but some SNNs are more neuromorphic than others. As noted in a BrainChip paper, “… Today’s technology… is, at best, only loosely related to how the brain functions.”

Many of the SNN ideas are still in the exploration stage in academic institutions. Several papers at the 2019 IEDM conference dealt with implementations of SNNs that use novel circuit techniques to lower power. But there are also commercial companies working on SNNs. As identified at the recent Linley Spring Processor Conference, Intel has a serious research program going, while BrainChip and GrAI Matter Labs are readying commercial chips. This wide spread between early research and commercial viability reflects differing interpretations of how an SNN can be implemented.

Some of the projects underway involve literal spikes, which are an analog phenomenon. But others abstract the notion of a “spike” into that of an “event,” and they implement them digitally as packets traveling through a network from neuron to neuron. The high-level effect, then, is to move from measuring everything all the time, as in a classical CNN, to dealing only with events. The power savings expected from SNNs are often thought to relate to the spikes themselves, but part of the gain comes from dealing with events. In other words, work happens only when there’s an interesting event to work with. Otherwise, no work (or less work) is done, keeping power low.

“If you don’t achieve [a neuron’s] activation threshold, no event is generated,” said Roger Levinson, COO of BrainChip. This corresponds to a high level of sparsity, which is coveted in classical networks.

Another feature of SNNs is the fact that events can excite or suppress a neuron. Events then can compete with each other, with some having an excitatory effect while others have an inhibitory effect. With classical networks, negative weights can reduce the magnitude of the resulting activations, but that’s more of a static representation of a video frame (or other data set) being evaluated rather than events pushing and pulling on the outcomes.

Coding values in spikes
One of the major distinctions between SNN implementations relates to what is referred to as “coding” – how a value is transformed into a stream of spikes. While there are several ways to do this, two appear to dominate many of the discussions: rate coding and temporal coding.

Rate coding takes a value and transforms it into a constant spike frequency for the duration of that value. The benefit of this approach is that classical training techniques can be used, with the resulting values then being transcoded for an SNN inference engine. Classical networks use an enormous amount of multiplication, which is energy-intensive. Spikes, by contrast, are simply accumulated, with no multiplication necessary. That said, each spike results in a synaptic-weight lookup, which also burns power, prompting Valentian to caution that it’s not clear that this approach is lower in power.
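
To make the rate-coding idea concrete, here is a minimal sketch in Python. The function names, the 0-to-255 value range, and the 100 Hz maximum rate are illustrative assumptions, not taken from any vendor's design. It shows a value becoming a constant-frequency spike train, and a neuron accumulating one weight lookup per spike instead of performing a multiplication.

```python
def rate_encode(value, max_value=255, max_rate_hz=100.0, duration_s=1.0):
    """Return evenly spaced spike times (in seconds) at a rate proportional to value.

    The value range and maximum rate are assumptions chosen for illustration.
    """
    rate_hz = max_rate_hz * value / max_value
    n_spikes = int(round(rate_hz * duration_s))
    if n_spikes == 0:
        return []
    period = 1.0 / rate_hz
    return [i * period for i in range(n_spikes)]


def accumulate(spike_trains, weights):
    """Add a synapse's weight once per incoming spike (lookup and add, no multiply)."""
    total = 0.0
    for synapse, spikes in enumerate(spike_trains):
        for _ in spikes:
            total += weights[synapse]  # one synaptic-weight lookup per spike
    return total


if __name__ == "__main__":
    trains = [rate_encode(51), rate_encode(255)]
    print(len(trains[0]), len(trains[1]))
    print(accumulate(trains, [0.5, -1.0]))
```

The accumulated total equals the spike count times the weight, which is why rate-coded inference can reuse classically trained weights. The per-spike lookup in the inner loop is the cost Valentian refers to.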

Temporal coding is said by some to be closer to what happens in the brain, although opinions differ, with some saying that’s the case only for a small set of examples. “It’s actually not that common in the brain,” said Jonathan Tapson, GrAI Matter’s chief scientific officer. One example where it is used is in owls’ ears. “They use their hearing to hunt at night, so their directional sensitivity has to be very high.” Instead of representing a value by a frequency of spikes, temporal coding encodes the value as the delay between spikes. Spikes then represent events, and the goal is to identify meaningful patterns in a stream of spikes.
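
As a rough illustration of encoding a value as the delay between spikes, the sketch below maps each value to an inter-spike gap and recovers it again. The linear mapping, the 1 ms to 10 ms gap range, and the function names are all assumptions made for the example. Real temporal codes vary widely.

```python
def isi_encode(values, t_min=0.001, t_max=0.010, max_value=255):
    """Encode each value as the gap before the next spike.

    Assumed convention: larger values produce shorter gaps (t_min), smaller
    values produce longer gaps (t_max). Returns absolute spike times in seconds.
    """
    times = [0.0]  # a reference spike marks the start of the stream
    for v in values:
        gap = t_max - (t_max - t_min) * v / max_value
        times.append(times[-1] + gap)
    return times


def isi_decode(times, t_min=0.001, t_max=0.010, max_value=255):
    """Recover the values from the gaps between consecutive spikes."""
    values = []
    for earlier, later in zip(times, times[1:]):
        gap = later - earlier
        values.append(round((t_max - gap) * max_value / (t_max - t_min)))
    return values


if __name__ == "__main__":
    spikes = isi_encode([0, 128, 255])
    print(spikes)
    print(isi_decode(spikes))
```

Note that one spike per value suffices here, whereas rate coding needs many spikes per value. The trade-off is that the receiver must measure time precisely.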

A major challenge, however, is training, because classical training results cannot be transcoded into this type of SNN. There is no easily-obtained derivative of the spike train, making it impossible to use the gradient-descent approach to training. In general, Tapson said, “Temporal coding is horrible for electronics. It makes it hard to know if a calculation completes, and it is very slow.”

Temporally coded SNNs can be most effective when driven by sensors that generate temporal-coded data – that is, event-based sensors. Dynamic vision sensors (DVS) are examples. They don’t generate full frames of data on a frames-per-second basis. Instead, each pixel reports when its illumination changes by more than some threshold amount. This generates a “change” event, which then propagates through the network. Valentian said these also can be particularly useful in AR/VR applications for “visual odometry,” where inertial measurement units are too slow.
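
The per-pixel thresholding can be sketched as follows. This is a simplified emulation that compares two stored frames, which a real DVS does not do: actual sensors work asynchronously per pixel. The function name, the threshold value, and the event tuple layout are assumptions.

```python
def dvs_events(prev_frame, frame, threshold=15):
    """Return (row, col, polarity) for each pixel whose brightness changed
    by more than `threshold`. Polarity is +1 for brighter, -1 for darker.

    Frames are lists of rows of integer intensities. Unchanged pixels
    produce no output at all, which is where the sparsity comes from.
    """
    events = []
    for r, (prev_row, row) in enumerate(zip(prev_frame, frame)):
        for c, (old, new) in enumerate(zip(prev_row, row)):
            delta = new - old
            if abs(delta) > threshold:
                events.append((r, c, 1 if delta > 0 else -1))
    return events


if __name__ == "__main__":
    before = [[10, 10], [100, 10]]
    after = [[10, 40], [50, 12]]
    print(dvs_events(before, after))
```

In this example only two of the four pixels produce events, since the other two changed by less than the threshold.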

It’s possible that temporally-coded SNNs could work with shallower networks than the 50 to 100 (or more) layers we’re seeing with classical networks. “The visual cortex is only six layers deep, although that system isn’t purely feed-forward,” Valentian said. “There’s some feedback, as well.” Still, he noted that what’s lacking here is a killer application that will provide the energy and funding required to push temporal coding forward.

Meanwhile, BrainChip started with rate coding, but decided that wasn’t commercially viable. Instead, it uses rank coding (or rank-order coding), which uses the order of arrival of spikes (as opposed to literal timing) to a neuron as a code. This is a pattern-oriented approach, with arrivals in the prescribed order (along with synaptic weighting) stimulating the greatest response and arrivals in other orders providing less stimulation.
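
BrainChip has not published its coding details, so the following is only a sketch of the general rank-order idea from the research literature, not the company's method. The geometric attenuation factor and all names are assumptions. Each arriving spike contributes its synaptic weight, scaled down according to its position in the arrival order, so the response is largest when the strongest synapses fire first.

```python
def order_of_arrival(spike_times):
    """Reduce a mapping of source -> spike time to an ordered list of sources.

    Only the order survives; the literal timing is discarded.
    """
    return sorted(spike_times, key=spike_times.get)


def rank_order_response(arrival_order, weights, modulation=0.5):
    """Sum each source's weight, attenuated by modulation**rank.

    Rank 0 is the first arrival. The geometric attenuation is an assumed,
    commonly used choice.
    """
    return sum(weights[src] * modulation ** rank
               for rank, src in enumerate(arrival_order))


if __name__ == "__main__":
    weights = {"a": 3.0, "b": 2.0, "c": 1.0}
    print(rank_order_response(["a", "b", "c"], weights))  # preferred order
    print(rank_order_response(["c", "b", "a"], weights))  # reversed order
```

With these weights, the preferred order "a, b, c" yields a stronger response than the reversed order, matching the behavior described above.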

All of these coding approaches aside, GrAI Matter uses a more direct approach. “We encode values directly as numbers – 8- or 16-bit integers in GrAI One or Bfloat16 in our upcoming chip. This is a key departure from other neuromorphic architectures, which have to use rate or population or time or ensemble codes. We can use those, too, but they are not efficient,” said Tapson.

Neurons
SNN neurons typically are implemented in one of two ways. The approaches are motivated by analog implementations, although they can be abstracted into digital equivalents. Arteris IP fellow and chief architect Michael Frank refers to this as “emulation.” He points to several challenges for an analog implementation: “With analog, you would need to customize the model to the specific chip for inference. No two transistors are the same. And at 7 nm, you can’t do analog.”

Tapson concurs. “For a large circuit, you need to be digital,” he said.

The idea behind the two abstract neural approaches is that a neuron evaluates a signal by accumulating spikes. The simplest implementation is called “integrate-and-fire” (IF). Each spike is accumulated in the neuron until a threshold is reached, at which point the neuron fires an output spike – that is, it creates an event that propagates downstream in the network (at least for a feed-forward configuration). Many ongoing academic projects implement this as a literal analog circuit, and in operation it’s philosophically similar to sigma-delta modulation.

The challenge here, especially for temporal coding, is that patterns may inadvertently appear over a long time period. Two events that are widely separated in time may be interpreted as a single pattern, since the early accumulation remains in place as new spikes arrive.

In order to neutralize older “obsolete” results as newer ones arrive, a “leaky integrate-and-fire” (LIF) circuit can be used. This means that accumulations gradually dissipate over time so that, given enough time between events, accumulation restarts from a low level.

Another element that can reverse accumulation is an inhibitory event. Accumulation assumes excitatory events that add to the accumulation, but inhibitory events accumulate negative values, reducing the level of accumulation.
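
The IF and LIF behaviors, including inhibition, can be sketched in a few lines of discrete-time Python. This is an idealized model under stated assumptions: the leak is modeled as a fixed fraction lost per time step, the potential resets to zero after firing, and there is no refractory period. The names and constants are hypothetical.

```python
def run_neuron(inputs, threshold=1.0, leak=0.0):
    """Simulate a discrete-time integrate-and-fire neuron.

    inputs: weighted input per time step (positive = excitatory,
            negative = inhibitory, 0 = no spike).
    leak:   fraction of the potential lost each step. 0.0 gives plain IF,
            anything above 0 gives leaky IF (LIF).
    Returns the list of time steps at which the neuron fired.
    """
    potential = 0.0
    fired_at = []
    for step, x in enumerate(inputs):
        potential = potential * (1.0 - leak) + x
        if potential >= threshold:
            fired_at.append(step)
            potential = 0.0  # assumed reset-to-zero after an output spike
    return fired_at


if __name__ == "__main__":
    sparse = [0.5, 0, 0, 0, 0, 0.5, 0, 0, 0, 0, 0.5]
    print(run_neuron(sparse))              # IF: old accumulation persists
    print(run_neuron(sparse, leak=0.5))    # LIF: leakage prevents firing
    print(run_neuron([0.5, -0.5, 0.5, 0.5]))  # inhibition delays firing
```

With the widely spaced inputs, the IF neuron eventually fires because nothing is forgotten, while the LIF neuron never reaches threshold. The last call shows an inhibitory event cancelling earlier accumulation.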


Fig. 1: IF and LIF neuron behavior, idealized for illustration. Note that, in the second case, the threshold is never reached due to the leakage. Neurons may also have a refractory period during which they can accumulate but not fire. Source: Bryon Moyer/Semiconductor Engineering

Synapses
Synapse implementation will depend strongly on how a specific network is implemented. For analog implementations, a spike will result in a certain amount of current injected into or out of the neuron. The amount of current depends on the synaptic weight.

A team from CEA-Leti discussed an analog SNN using RRAM in a paper presented at the 2019 IEDM conference. While RRAM has been used in classical networks as a way of implementing in-memory computation of multiply-accumulate functions, its usage here is different. Eight cells are used, four each for excitation and inhibition, with anywhere from 0 to 4 of the resistors being programmed in a low-resistance state. Low resistance means more current and, hence, a stronger weight. The more cells in a low-resistance state, the greater the overall synaptic current. The following image shows the Leti synapse design.


Fig. 2: Leti’s synapse implementation. “HRS” stands for “high-resistance state”; “LRS” stands for “low-resistance state.” Source: CEA-Leti
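
A simple numerical model helps show how the eight-cell arrangement yields a signed weight. The conductance values, the read voltage, and the assumption that the net current is the excitatory current minus the inhibitory current are illustrative guesses, not figures from the Leti paper.

```python
CELLS_PER_SIDE = 4   # four excitatory and four inhibitory cells, per the paper
G_LRS = 100e-6       # assumed conductance of a low-resistance cell (siemens)
G_HRS = 1e-6         # assumed conductance of a high-resistance cell (siemens)


def _conductance(n_lrs):
    """Total conductance of one side with n_lrs cells in the low-resistance state."""
    if not 0 <= n_lrs <= CELLS_PER_SIDE:
        raise ValueError("n_lrs must be between 0 and 4")
    return n_lrs * G_LRS + (CELLS_PER_SIDE - n_lrs) * G_HRS


def synaptic_current(n_exc_lrs, n_inh_lrs, read_voltage=0.1):
    """Net current (amperes) for one input spike: positive excites, negative inhibits."""
    return read_voltage * (_conductance(n_exc_lrs) - _conductance(n_inh_lrs))


if __name__ == "__main__":
    for exc, inh in [(4, 0), (2, 0), (2, 2), (0, 4)]:
        print(exc, inh, synaptic_current(exc, inh))
```

Under these assumptions, programming more excitatory cells into the low-resistance state raises the net current, equal counts on both sides cancel, and more inhibitory cells drive the current negative.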

An array of these cells is shown in Figure 3. Each synapse gets its own word line; currents are sensed through the bit lines.


Fig. 3: Leti’s synaptic array. Source: CEA-Leti

The currents are summed into the neuron as shown in Figure 4. The capacitor acts as the accumulator as the membrane voltage varies with the injected currents. Note that there are both positive and negative thresholds, meaning that the neuron can fire an excitation spike or an inhibition spike.


Fig. 4: Neuron accumulation in the presence of excitatory and inhibitory spikes. Source: CEA-Leti

In a digital implementation, the notion of a spike is an abstraction, and multiplication is still required to scale an incoming spike by a synaptic weight. GrAI Matter’s approach is shown in Figure 5.


Fig. 5: GrAI Matter’s digital neuron core. Source: GrAI Matter Labs

NoCs in the Circuit
For digital SNN emulations, the routing of spikes often happens through a network-on-chip, or NoC. NoCs are common in sophisticated systems-on-chip (SoCs), but those networks often carry large payloads. By contrast, spike data is very small. In fact, Arteris IP’s Frank said the packet headers may be longer than the payload itself.

Packets can be broadcast to the destination neurons with an identifying tag. Then receiving neurons will know which tag to pay attention to, giving the effect of multi-cast. In this way, spikes arrive at the intended neurons for processing, while other neurons ignore them. This gives the input side of the neuron a many-to-one relationship, while the output has a one-to-many relationship.
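
The tag-based filtering can be sketched as below. The class and function names are hypothetical, and a real NoC would do this filtering in hardware rather than by iterating over neurons. The point is only that a broadcast plus per-neuron tag matching produces a multicast effect.

```python
class Neuron:
    """A neuron that reacts only to packets whose tag it subscribes to."""

    def __init__(self, name, weights_by_tag):
        self.name = name
        self.weights_by_tag = weights_by_tag  # tag -> synaptic weight
        self.potential = 0.0

    def receive(self, tag):
        weight = self.weights_by_tag.get(tag)
        if weight is None:
            return False  # unknown tag: ignore the packet
        self.potential += weight
        return True


def broadcast(tag, neurons):
    """Offer the packet to every neuron; return the names of those that accepted it."""
    return [n.name for n in neurons if n.receive(tag)]


if __name__ == "__main__":
    neurons = [
        Neuron("n1", {"A": 0.5, "B": 0.25}),  # many-to-one on the input side
        Neuron("n2", {"A": 1.0}),
        Neuron("n3", {"C": 1.0}),
    ]
    print(broadcast("A", neurons))  # one-to-many on the output side
    print(broadcast("B", neurons))
```

Here neuron `n1` listens to two sources, while the source tagged "A" reaches two neurons, giving the many-to-one and one-to-many relationships described above.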

Frank indicated there should not be issues with collisions on the network. Sensor data is generated at a rate of around 500 samples per second, while the network is clocked at hundreds of megahertz. This leaves plenty of room for time-sharing data so that individual spike deliveries can appear to be concurrent. If there is any issue with collisions, Frank noted that the network can be divided into domains to reduce their impact.
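
A quick back-of-the-envelope calculation shows how much headroom that implies. The 200 MHz clock is an assumed example of “hundreds of megahertz,” and the helper name is hypothetical.

```python
def cycles_per_sample(clock_hz, sample_rate_hz):
    """Number of whole network clock cycles available between two sensor samples."""
    return clock_hz // sample_rate_hz


if __name__ == "__main__":
    # Assumed 200 MHz clock against the 500 samples/s cited above.
    print(cycles_per_sample(200_000_000, 500))
```

Under that assumption there are 400,000 clock cycles between samples, which is why many spike packets can share the network and still appear concurrent.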

Timing also has a role here. Frank noted that Intel’s Loihi network is asynchronous. “If you use a synchronous approach, it’s probably too high power for a large network.”

A selection of projects
The range of approaches to SNNs is illustrated by reviewing several of the more prominent ones. There are many more projects underway at academic institutions and possibly at other commercial companies as well, so this list will by no means be exhaustive.

We’ve already seen some of what CEA-Leti has been working on. Their IEDM paper claims this is the first full network implementation using spikes, analog neurons, and RRAM synapses. It’s a single-layer, fully-connected network with 10 output neurons corresponding to the 10 classes used for MNIST image classification. Inference is considered complete when the difference between the highest-spiking output and the next-highest-spiking one exceeds a threshold. They’ve shown an equivalence between this and the classical tanh activation function.
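
The stopping rule can be illustrated with a small sketch that counts output spikes and stops once the leader is far enough ahead. The function name, the margin value, and the representation of the output as a stream of neuron indices are assumptions. The paper's actual circuit is analog.

```python
def classify(spike_stream, n_classes=10, margin=2):
    """Count output spikes until the top neuron leads the runner-up by more than `margin`.

    spike_stream: sequence of output-neuron indices, one per output spike.
    Returns (winning_class, spikes_consumed), or (None, spikes_consumed)
    if the stream ends before the margin is exceeded.
    """
    counts = [0] * n_classes
    for consumed, neuron in enumerate(spike_stream, start=1):
        counts[neuron] += 1
        ranked = sorted(counts, reverse=True)
        if ranked[0] - ranked[1] > margin:
            return counts.index(ranked[0]), consumed
    return None, len(spike_stream)


if __name__ == "__main__":
    print(classify([7, 3, 7, 7, 7, 7, 7]))  # stops early once class 7 is clearly ahead
    print(classify([1, 2, 1, 2]))           # never decisive
```

A consequence of this rule is that easy inputs finish after only a few spikes, while ambiguous ones take longer, so latency and energy vary with the input.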

BrainChip has an all-digital implementation, which allows it to be implemented on any CMOS process (unlike analog). A conceptual view of their architecture is shown in Figure 6.


Fig. 6: BrainChip’s architecture. The Akida array is conceptual. It does not reflect the true number and arrangement of NPUs. Source: BrainChip

The neural fabric is fully configurable for different applications. Each node in the array contains four neural processing units (NPUs), and each NPU can be configured for event-based convolution (supporting standard or depthwise convolution) or for other configurations, including fully connected. Events are carried as packets on the network.

While NPU details or images are not available, BrainChip did further explain that each NPU has digital logic and SRAM, providing something of a processing-in-memory capability, but not using an analog-memory approach. An NPU contains eight neural processing engines that implement the neurons and synapses. Each event is multiplied by a synaptic weight upon entering a neuron.

The company noted that its use of event-domain convolution allows it to use IF neurons rather than LIF, since this approach results in much simpler hardware. In order to deal with the issue of straggling spikes creating an inadvertent pattern, BrainChip frames the time so that, once that frame is completed, subsequent spikes will start afresh.
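
BrainChip has not disclosed how its framing works, so the following is only one plausible reading of the description: a plain IF neuron whose accumulated potential is cleared at each frame boundary. The function name, the fixed-length frames, and the reset-to-zero behavior are assumptions.

```python
def run_framed_if(spikes, frame_len, threshold=1.0):
    """Plain IF neuron whose potential is cleared at the start of each time frame.

    spikes: list of (time_step, weight) pairs.
    Returns the time steps at which the neuron fired.
    """
    potential = 0.0
    current_frame = 0
    fired_at = []
    for t, weight in sorted(spikes):
        frame = t // frame_len
        if frame != current_frame:
            potential = 0.0  # stragglers from earlier frames are discarded
            current_frame = frame
        potential += weight
        if potential >= threshold:
            fired_at.append(t)
            potential = 0.0
    return fired_at


if __name__ == "__main__":
    straggler = [(1, 0.5), (12, 0.5)]
    print(run_framed_if(straggler, frame_len=10))   # different frames: no firing
    print(run_framed_if(straggler, frame_len=100))  # same frame: fires
```

In this reading, framing plays the role that leakage plays in an LIF neuron, but with a simple reset instead of a continuous decay.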

Training is a topic the company does not talk much about. It refers to training as “semi-supervised.” BrainChip bases its proprietary learning algorithms on a training notion referred to as Spike Timing-Dependent Plasticity, or STDP, as well as some reinforcement learning concepts. It does the training with fully connected layers in a feed-forward manner that it says is orders of magnitude faster than what is typical with classical networks. The company also is working on unsupervised learning — that is, the ability to train a network without giving it pre-labeled samples — for its next generation architectures.

Unusually, BrainChip has the ability to do some further training in the field on a deployed device. It refers to this as “incremental training,” which leverages the existing training model but allows for the device to be further trained in the field. This is done by removing the last network layer (which does classification) and replacing it with a fully connected layer. The device can then “relearn” the existing classes (the last layer only, as prior layers remain unchanged) while adding new classes to the capabilities of the network. The company does this with labeled samples, but it can add new classes with a single image instead of hundreds or thousands of images.

GrAI Matter also is doing an all-digital implementation. It uses an on-chip packet-switched network to route the “spikes.” GrAI Matter’s overall architecture is shown below (the node implementation is shown above in Figure 5). The company trains its chip using classical techniques, converting the result to the GrAI Matter format for implementation.


Fig. 7: GrAI Matter’s architecture. Source: GrAI Matter Labs

Even though this is an event-based engine, the network has been optimized to deal with standard video streams instead of DVS event streams. In a manner similar to the ISSCC paper discussed in a prior article, these operate on the differences between frames rather than the full frames. That “diff” is taken both at the input and at each activation layer, creating an enormous amount of sparsity entering and flowing through the network.


Fig. 8: GrAI Matter processes only changed pixels in each successive layer. Source: GrAI Matter Labs
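
The benefit of working on differences can be sketched for a single linear layer. This is not GrAI Matter's implementation. The function names and the list-of-lists weight layout are assumptions, and the nonlinear activation, whose output would be diffed again before the next layer, is omitted for brevity.

```python
def delta_events(prev, curr, threshold=0):
    """Return (index, change) for each input whose value changed by more than threshold."""
    return [(i, c - p) for i, (p, c) in enumerate(zip(prev, curr))
            if abs(c - p) > threshold]


def apply_events(pre_activation, weights, events):
    """Update a linear layer's stored pre-activations from sparse input changes.

    weights[j][i] connects input i to output j. Only columns belonging to
    changed inputs are touched, so work scales with the number of events.
    """
    for i, delta in events:
        for j in range(len(pre_activation)):
            pre_activation[j] += weights[j][i] * delta
    return pre_activation


if __name__ == "__main__":
    weights = [[1, 2, 3], [0, -1, 1]]
    frame0, frame1 = [1, 1, 1], [1, 4, 1]
    # Full computation for the first frame.
    state = [sum(w * x for w, x in zip(row, frame0)) for row in weights]
    events = delta_events(frame0, frame1)
    print(events)
    print(apply_events(state, weights, events))
```

Because the layer is linear, the incremental update gives the same result as recomputing from the full second frame, while touching only one of the three inputs.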

Finally, Intel has a sizable research project underway under the direction of Mike Davies, director of their Neuromorphic Computing Lab. Intel called the chip Loihi (lo-EE-hee), and other players in this space appear to be paying close attention.

This is an advanced project, and it operates very differently from the prior projects, appearing to be truly neuromorphic. Details on the architecture aren’t available, but the chip currently has 128 cores, which can be scaled to 4,096. Systems also can be scaled out to a maximum of 16,384 chips. Intel uses LIF neurons, routing spikes as packets on a NoC.

“We are continuing to work on advancing neuromorphic software and hardware, with the goal of eventual commercialization,” Davies said. “Because neuromorphic technology is still at a basic research stage, it’s hard to make firm predictions on the time frame for mainstream use. We hope to have some initial niche applications providing business value in the next few years and would be happy if our neuromorphic systems were starting to be sold commercially to a broad range of customers within a five-year time frame.”

State of the industry
In general, SNNs generate divided opinions. The amount of ongoing research is indicative of the level of industry interest, but not everyone has been quite so enthusiastic. Yann LeCun, a Facebook AI researcher, noted in a 2019 ISSCC presentation, “I’m very skeptical of this [SNNs].”

Others expressed concern, as well. “[Research] papers are aimed at much simpler models [than what are implemented with classical networks],” said Geoff Tate, CEO of Flex Logix. “It’s far from commercialization.”

It’s also not necessarily an either-or situation: “You could have a network that’s partly classical and partly SNN. An example is sensor fusion, with video as classical and sound as SNN,” said Leti’s Valentian.

Arteris IP’s Frank sees a future for SNNs. “SNNs have their domain where they will outrun a standard network. Even a digital emulation of an SNN is better than a classical CNN,” he said.

The success of early commercial entrants, as well as Intel’s Loihi research project, will be indicators of whether SNNs eventually can bring their much-anticipated power savings into the market for good.

1 comment

Martijn says:

Thanks for this well-summarized article on the field of SNNs.

I miss the comparison between analog and digital SNNs. The article suggests that digital is better, but there are many more constraints that can actually make analog a better choice. The article’s reason for choosing digital is that you can go to 7 nm; is this really the main benefit? Also, no real comparison is made of how much area an analog spiking neuron needs compared with the digital version (big multipliers, higher frequencies).

Many AI models can be implemented as small models. If you want to use a big model, wouldn’t a bigger chip, or more chips, be better?
