Developers Turn To Analog For Neural Nets

Replacing digital with analog circuits and photonics can improve performance and power, but it’s not that simple.

Machine-learning (ML) solutions are proliferating across a wide variety of industries, but the overwhelming majority of commercial implementations still rely on digital logic.

With the exception of in-memory computing, analog solutions mostly have been restricted to universities and attempts at neuromorphic computing. However, that’s starting to change.

“Everyone’s looking at the fact that deep neural networks are so energy-intensive when you implement them in digital, because you’ve got all these multiply-and-accumulates, and they’re so deep, that they can suck up enormous amounts of power,” said Elias Fallon, software engineering group director for the Custom IC & PCB Group at Cadence.

Some suggest we’re reaching a limit with digital. “Digital architectural approaches have hit the wall to solve the deep neural network MAC (multiply-accumulate) operations,” said Sumit Vishwakarma, product manager at Siemens EDA. “As the size of the DNN increases, weight access operations result in huge energy consumption.”

The current analog approaches aren’t attempting to define an entirely new ML paradigm. Instead, they’re leveraging the existing infrastructure and ecosystems. But rather than doing a digital implementation, they’re doing an analog one.

“The last 50+ years have all been focused on digital processing, and for good reason,” said Thomas Doyle, CEO and co-founder of Aspinity. “Digital was easy to design, manufacture, and program. It’s very flexible. Analog has been mostly relegated to comms.”

That’s being re-evaluated. “It’s back to the good old days,” said Venki Venkatesh, director of R&D, AI & ML solutions, digital design group at Synopsys. “We moved from analog to digital, which does consume more power, yet it gives more stability. But now we are coming full circle.”

There’s no question about whether it’s possible to do things with analog. “As long as one can produce elements with programmable response, that’s all that’s needed,” said Michael Kotelyanski, senior member of technical staff at Onto Innovation. The question is really about whether it can be done more efficiently than is possible digitally.

Focus on ANNs
The vast majority of neural networks in commercial use are so-called “artificial neural networks,” or “ANNs.” These stand in contrast to neuromorphic networks, which attempt to mimic the brain. ANNs have no biological analog, but they present a computing paradigm that allows for effective machine learning.

By and large, ANN implementations have been digital. That allows scaling with technology, and it offers the simplification that digital abstraction brings to a lot of different areas. The one big challenge, however, is the amount of energy used both in training and in inference — particularly at the edge.

While efforts continue to develop miserly architectures better suited to battery-powered applications, the fact remains that further power reductions are still a goal. And one way of pushing power down is through the use of analog implementations instead of digital.

There are two areas where analog is already being explored — in-memory computing (IMC) and neuromorphic. But IMC involves an island of analog in an otherwise digital sea. Neuromorphic implementations, meanwhile, tend to diverge from the frameworks and ecosystems already in place, creating a bigger go-to-market challenge. In addition, spiking neural networks, like those offered by BrainChip and being researched by Intel, still use digital techniques.

Instead, what we’re talking about is a full-on analog implementation of what is otherwise a standard ANN, using many of the same model development tools as are in common use today. The difference is that the trained model is implemented in an analog fashion instead of digitally. “We’re doing traditional deep learning,” said Nick Harris, CEO of Lightmatter. “You could literally run the same algorithms that you run on an Nvidia GPU. And we do.”

“You build the analog ML model similarly to how you would build it for a digital solution,” said Aspinity’s Doyle. “We have hooks into PyTorch to help our machine learning teams build models that are then compiled onto our hardware. When you do design your model in PyTorch, it has our technology in mind.”

There are two analog approaches being taken. One is the more obvious electronic approach, where digital circuits are replaced by analog circuits. The other approach is photonic, where electronic circuits are replaced by photonic structures.

Both approaches reduce power. But that’s where the similarities end, as the implementations are wildly different.

Electronic analog ANNs
Analog implementations are particularly useful when the raw data on which they operate is also in the analog domain. That would apply to many sensors, like microphones with an analog output.

Aspinity has turned to analog for its always-on wake-word detection scheme. It’s predicated on the fact that, 80% to 90% of the time, there’s no speech, much less a wake-word being spoken. “80% to 90% of the data at the edge is digitized and looked at in a digital world,” said Doyle. “And it’s irrelevant for the task at hand.”

Fig. 1: Most of the sound presented to a voice-activation unit is non-voice. Source: Aspinity

Aspinity recently released a window-break detection algorithm. “It’s very rare that you actually have a window break,” he said. “So you’re wasting a lot of energy for something that rarely happens” if you digitize the input first.

While not replacing the entire audio chain, an analog block serves as the always-on circuit that scans for a wake word. Most of the time, the digital circuits can sleep. When the analog block detects a wake-word, it then can awaken the rest of the audio chain, which is digital, for processing the full contents of the command, either locally or in the cloud.

“The digital components continue to improve how much power they consume, but they’re still in the milliamp range — still too much for always-on processing,” said Doyle. “On a battery, we have to be near zero.”

Fig. 2: The top image shows a typical configuration, where all sounds are digitized. In the bottom image, all sounds are first run through the analog block. Only relevant data is sent to the digital domain, which remains asleep until summoned. Source: Aspinity

Separating feature extraction
That analog block contains an analog neural network, but there’s a significant difference from the more common neural nets in the digital domain. Most common networks do their own feature identification, and those features are fed down the network.

“We can do the feature extraction with the neural net,” explained David Graham, CSO and co-founder of Aspinity. “But we have found in a lot of instances that it’s a little bit faster and easier to get those features in a systematic way.”

In Aspinity’s case, they do the feature extraction through a non-neural analog block. “First thing we do is feature extraction within the analog domain explicitly,” said Doyle. “Then we feed that into a neural network as needed.”

Fig. 3: Features are extracted in non-neural analog circuitry before being fed into the analog neural network. The sounds can also be compressed and sent to memory as “pre-roll” — the sounds just prior to utterance of the wake-word. Source: Aspinity

The analog block appears to be more of a Field Programmable Analog Array (FPAA). “We have a reconfigurable analog and mixed-signal framework with which we can create any kind of circuitry that we want,” said Aspinity’s Graham.

It doesn’t consist of a uniform set of resources for implementation, however. Instead, the resources are laid out in something of a graduated approach, based on the nature of the computing to be done. So as one moves down the pipeline, the resources change to accommodate the changing nature of the processing further downstream.

“Instead of saying that, ‘We’re just going to have filters and amplifiers or op amps or capacitor circuits,’ we change up what we’re doing along the way so that we can progressively make more decisions as we go through the path,” Graham explained. “We have, in certain places along the route, small neural networks that we can put together to create larger neural nets.”

Configuration of these resources is done by design software. For the non-neural parts, it’s comparable to the design software used for the more common FPGAs. For the neural nets, the standard design paradigm is followed, where the network is developed in a framework like TensorFlow. When that network is ready for implementation, then, rather than it being adapted to some digital platform, it’s adapted into the analog platform.

Exactly how the multiply-accumulate function is implemented wasn’t disclosed, but it’s not the same as what’s done for IMC. It’s still based on Ohm’s Law, but involves op amps rather than memory cells. It is highly tunable through software commands.
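As a generic illustration of how Ohm’s Law and an op amp can combine into a multiply-accumulate, the sketch below models a textbook inverting summing amplifier. It is not Aspinity’s undisclosed circuit. The function names, the resistor values, and the idea of setting each weight through a resistor ratio are assumptions made purely for illustration.

```python
def summing_amplifier_mac(inputs_v, input_resistors_ohm, feedback_resistor_ohm):
    """Ideal inverting summing amplifier: Vout = -Rf * sum(Vi / Ri).

    Each input voltage drives a current Vi / Ri into the virtual-ground
    node (Ohm's Law). The currents add at that node (Kirchhoff's current
    law), and the feedback resistor turns the summed current into a voltage.
    """
    if len(inputs_v) != len(input_resistors_ohm):
        raise ValueError("need exactly one input resistor per input voltage")
    total_current_a = sum(v / r for v, r in zip(inputs_v, input_resistors_ohm))
    return -feedback_resistor_ohm * total_current_a


def digital_mac(inputs, weights):
    """Reference multiply-accumulate as a digital processor would compute it."""
    return sum(x * w for x, w in zip(inputs, weights))


# Hypothetical values: each weight is the ratio Rf / Ri.
inputs_v = [0.10, 0.20, 0.05]
feedback_ohm = 100_000.0
resistors_ohm = [100_000.0, 50_000.0, 200_000.0]   # weights 1.0, 2.0, 0.5
weights = [feedback_ohm / r for r in resistors_ohm]

analog_out = summing_amplifier_mac(inputs_v, resistors_ohm, feedback_ohm)
digital_out = digital_mac(inputs_v, weights)
print(f"analog output: {analog_out:.4f} V, digital MAC: {digital_out:.4f}")
```

The analog result is the digital dot product with the sign inverted, which is a property of this particular amplifier topology rather than of analog computing in general.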

Custom flash — but not for IMC
Aspinity is sensitive to the fact that their approach may be confused with IMC. “We’re doing the multiply-accumulate in the analog domain, but the difference is that our framework stays analog the entire time,” said David Graham, CSO and co-founder of Aspinity.

Possibly adding to the confusion, they have also developed their own proprietary variant of flash that allows them to place precise values in the floating gate. Flash-based IMC uses similar cells, but, unlike IMC, Aspinity’s cells aren’t directly involved in the computation. They’re used to tune the configuration of the analog circuits that do the computation.

“You can adjust for things like offsets, you can adjust for PVT variations, you can tune out mismatch from the get-go,” said Graham.

The cells have the standard 10-year data retention. Given that they’re configured either in manufacturing, to trim values, or when a new design is downloaded, endurance isn’t an important consideration. But the application can be updated in the field if necessary.

What makes this application compelling for analog is the fact that sound is a rapid periodic function. A sensor tracking something like ambient temperature or air pressure sees values that change slowly enough that a digital implementation can be duty-cycled, sleeping except when it wakes up to sample the current value. Sound and other acoustic and vibration signals change too quickly for this to be effective. That’s where it’s worth doing analog.
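The duty-cycling argument comes down to a time-weighted average of current. The sketch below works through it with hypothetical numbers: the 1 mA active current, 1 µA sleep current, and 5 ms wake time are assumptions chosen for illustration, not figures from any vendor.

```python
def average_current_ua(active_ua, sleep_ua, active_ms, period_ms):
    """Time-weighted average current of a node that is awake for
    active_ms out of every period_ms and asleep for the remainder."""
    if not 0 < active_ms <= period_ms:
        raise ValueError("active time must be positive and fit in the period")
    duty = active_ms / period_ms
    return duty * active_ua + (1.0 - duty) * sleep_ua


# Hypothetical digital front end: 1 mA awake, 1 uA asleep, 5 ms per wake-up.
ACTIVE_UA, SLEEP_UA, ACTIVE_MS = 1000.0, 1.0, 5.0

# Temperature sampled once a minute: the node is almost always asleep.
slow_sensor_ua = average_current_ua(ACTIVE_UA, SLEEP_UA, ACTIVE_MS, 60_000.0)

# Audio cannot tolerate gaps, so the node has to stay awake continuously.
audio_ua = average_current_ua(ACTIVE_UA, SLEEP_UA, ACTIVE_MS, ACTIVE_MS)

print(f"slow sensor: {slow_sensor_ua:.2f} uA average")
print(f"continuous audio: {audio_ua:.0f} uA average")
```

Under these assumptions the slowly sampled sensor averages close to its sleep current, while continuous audio pays the full active current all the time.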

As with many analog circuits, this lends itself to more mature nodes. But Aspinity has said that there isn’t a hard barrier to going down to more aggressive nodes if necessary. It hasn’t figured out yet whether the future would be finFET, FD-SOI, or something else, but it said there’s no fundamental limitation.

In its current version, the company has been able to do always-on voice detection with 10 µA, or 25 µA if storing audio for pre-roll.
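To put those currents in perspective, the sketch below converts them into idealized battery life. The 220 mAh capacity is an assumed coin-cell figure, and the 1 mA digital comparison is only a stand-in for the “milliamp range” Doyle mentions. Self-discharge and regulator losses are ignored.

```python
def battery_life_days(capacity_mah, current_ua):
    """Ideal battery life in days for a constant current draw."""
    if current_ua <= 0:
        raise ValueError("current must be positive")
    hours = capacity_mah * 1000.0 / current_ua   # mAh -> uAh, divided by uA
    return hours / 24.0


CAPACITY_MAH = 220.0   # assumed coin-cell capacity, not from the article

scenarios = [
    ("analog voice detection, 10 uA", 10.0),
    ("analog detection with pre-roll, 25 uA", 25.0),
    ("always-on digital, 1 mA (assumed)", 1000.0),
]

life_days = {label: battery_life_days(CAPACITY_MAH, ua) for label, ua in scenarios}
for label, days in life_days.items():
    print(f"{label}: about {days:.0f} days")
```

With these assumptions the difference is between roughly a week and a half for the milliamp case and a year or more for the microamp cases.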

Analog issues
Still, there is much work to be done on analog designs. Alric Althoff, senior hardware security engineer at Tortuga Logic, noted a number of concerns regarding how vulnerable analog circuits might be to side-channel and other attacks. While important considerations in general, Aspinity responded that these are concerns primarily in the radio-frequency range, which is far above the frequencies it uses. Aspinity claims there would be no radiation from its signals detectable outside the chip.

When asked why everyone doesn’t do this, Doyle answered with a simple statement: “It’s hard.”

It’s also hard to build tools that are significantly better than what a good analog engineer can do already. “The tooling and infrastructure that digital has brought is not easy to mimic with analog,” said Ashutosh Pandey, systems engineer and senior member of the technical staff at Infineon.

Ramesh Chettuvetty, senior director of applications engineering and product marketing for memory solutions in Infineon’s RAM product line, also noted that analog circuits are optimized for the application and data rate. If designed for a high rate, higher power is required, so there’s less efficiency if you want to run that circuit at a lower data rate. “If you are running the system at a lower data rate, then the analog is going to be burning that power,” he said.
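Chettuvetty’s point can be put as simple arithmetic: if the bias power is fixed by the highest data rate the circuit was designed for, the energy spent per sample rises as the actual rate falls. The sketch below assumes a constant bias power; the 100 µW figure and both sample rates are hypothetical.

```python
def energy_per_sample_nj(bias_power_uw, sample_rate_hz):
    """Energy per sample for a circuit that draws constant bias power.

    uW divided by Hz gives uJ per sample; multiply by 1000 for nJ.
    """
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    return bias_power_uw / sample_rate_hz * 1000.0


BIAS_POWER_UW = 100.0   # hypothetical bias power sized for the design rate

at_design_rate = energy_per_sample_nj(BIAS_POWER_UW, 1_000_000.0)  # 1 MHz
at_low_rate = energy_per_sample_nj(BIAS_POWER_UW, 10_000.0)        # 10 kHz

print(f"at design rate: {at_design_rate:.2f} nJ per sample")
print(f"at 1/100 of design rate: {at_low_rate:.2f} nJ per sample")
```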

Finally, there is some energy give-back when you inevitably transition back into the digital domain. “At some point, you will need power to convert the analog back to digital,” said Keith Schaub, vice president of technology and strategy at Advantest. “It’s not free.”

Photonic analog ANNs
While analog electronics may provide a power reduction, the circuits themselves still must be powered in order to run the active components and push electrons through the wires interconnecting the whole thing. This is not the case with photonics.

Given a strong enough laser and low-loss waveguides, computation can be done photonically with no energy required in the “circuits.” (With photonics, one doesn’t have true circuits, but it’s helpful to refer to them as such for comparisons to electronics.)

This is an approach being taken by at least three companies — Lightmatter, Lightelligence, and Luminous.

Silicon photonics is discussed most frequently for its value in high-speed interconnects. In that case, the only goal is that the signal input at one end of the waveguide makes it to the other end intact, which is easier with photonics.

“In photonics, there is no concept of resistance, inductance, or capacitance,” said Harris. “If you put a square wave into the waveguide, it will give you a square wave back.”

With ML, however, we’re doing computation, not just interconnect. “We’re trying to harness both the properties of efficient data transfer as well as doing computation,” said Maurice Steinman, vice president of engineering at Lightelligence.

Computation is more complicated, because light will be split off in different directions at different parts of the circuit. “You have to have fairly high optical power to supply the photons across chip,” said Anthony Yu, vice president of silicon photonics at GlobalFoundries.

That laser energy, however, makes up all of the energy that’s required. No further energy is required for the computing path that the light will take. That makes for a low-power system.

“You also need extremely low-loss waveguides,” added Yu. “You’re talking about [loss] requirements that are maybe one-tenth of what you would use for optical interconnects.” New materials are being evaluated for such applications in order to improve power performance.
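A rough optical power budget shows why both laser power and waveguide loss matter once the light is split many ways. The sketch below assumes an ideal 1-to-N splitter and a uniform loss per centimeter. The 10 dBm laser, 64 outputs, 2 cm path, and the two loss figures (1.0 and 0.1 dB/cm, chosen only to mirror the one-tenth ratio Yu describes) are all hypothetical.

```python
import math


def split_loss_db(num_outputs):
    """Ideal 1-to-N power split: each output receives 1/N of the light."""
    if num_outputs < 1:
        raise ValueError("need at least one output")
    return 10.0 * math.log10(num_outputs)


def power_per_output_dbm(laser_dbm, num_outputs, loss_db_per_cm, path_cm):
    """Optical power reaching one output after splitting and propagation loss."""
    return laser_dbm - split_loss_db(num_outputs) - loss_db_per_cm * path_cm


# Hypothetical budget: 10 dBm (10 mW) laser, 64 outputs, 2 cm of waveguide.
interconnect_grade = power_per_output_dbm(10.0, 64, 1.0, 2.0)
compute_grade = power_per_output_dbm(10.0, 64, 0.1, 2.0)

print(f"1.0 dB/cm waveguide: {interconnect_grade:.2f} dBm per output")
print(f"0.1 dB/cm waveguide: {compute_grade:.2f} dBm per output")
```

In this toy budget the split alone costs about 18 dB, so every decibel recovered in the waveguides is a decibel the laser does not have to supply.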

Configuring the network does require energy, but not a large amount. “The settings that control the modulation consume an insignificant amount of energy on an ongoing basis,” Harris said.

Parallelism without replicated hardware
Color can provide computing parallelism. “In each color, we encode an entirely separate data set,” said Harris. That mixed light will traverse the network, with each color getting its own result at the same time using the same circuits. “We can use a single multiply-accumulate array and amortize the energy and area cost of that across multiple data sets, simultaneously. If I have two colors, it’s twice as fast; three colors, three times as fast; 64 colors, 64 times as fast.”

This can serve to batch up large data sets or to serve multiple users at the same time. “Imagine I have three users that want to turn sentences from English to German,” continued Harris. “Each user will concurrently get the calculation done through the array.”

That adds huge efficiency since it’s highly parallel without needing parallel circuitry. Asked whether the color channels could ever interfere with each other, Harris responded that, “If you have three colors going through the same exact object, it doesn’t mix up the colors at all.”
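Functionally, the color multiplexing Harris describes behaves like applying one weight matrix to several independent input vectors at once. The sketch below models that behavior only. The wavelength labels, weights, and data are hypothetical, and the software evaluates the colors one after another, whereas in the hardware described they would travel through the array at the same time.

```python
def matvec(weights, vector):
    """Multiply-accumulate: one dot product per row of the weight matrix."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def wdm_pass(weights, data_by_color):
    """Model one pass through a shared array.

    Every color carries its own input vector and produces its own output
    vector. Results are kept separate per color, so nothing mixes.
    """
    return {color: matvec(weights, vec) for color, vec in data_by_color.items()}


# One set of weights, shared by all colors.
weights = [[1.0, 2.0],
           [0.5, -1.0]]

# Hypothetical wavelength labels, one independent data set per color.
data_by_color = {
    "1550nm": [1.0, 0.0],
    "1551nm": [0.0, 1.0],
    "1552nm": [2.0, 2.0],
}

results = wdm_pass(weights, data_by_color)
for color, output in results.items():
    print(color, output)
```

The weight matrix appears once, which is the sense in which the area and energy of the array are amortized across data sets.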

The multiply-accumulate in this case is done in a very different fashion from electronic versions. While there are many ways to implement a multiplier, Mach-Zehnder interferometers (MZIs) appear to be most popular at present. An MZI splits a light path into two, adding different phase shifts on each of the paths. When recombined, the interference can provide the multiplication.
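For an idealized, lossless, balanced MZI, the fraction of light leaving one output port is cos²(Δφ/2), where Δφ is the phase difference between the two arms. The sketch below uses that textbook relationship to scale an input intensity by a weight between 0 and 1. It is a simplified model rather than a description of any vendor’s device, and the function names are invented for illustration.

```python
import math


def mzi_transmission(phase_difference_rad):
    """Fraction of input intensity at one output port of an ideal MZI."""
    return math.cos(phase_difference_rad / 2.0) ** 2


def phase_for_weight(weight):
    """Phase difference that makes the ideal MZI transmit `weight` of the light."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("an intensity weight must lie between 0 and 1")
    return 2.0 * math.acos(math.sqrt(weight))


def mzi_multiply(input_intensity, weight):
    """Multiply by setting the phase, then letting the two arms interfere."""
    return input_intensity * mzi_transmission(phase_for_weight(weight))


# Equal phases pass everything; a pi difference cancels the output.
print(mzi_transmission(0.0))        # full transmission
print(mzi_transmission(math.pi))    # essentially zero

# Hypothetical example: scale an input of 0.8 by a weight of 0.25.
product = mzi_multiply(0.8, 0.25)
print(f"0.8 x 0.25 via interference: {product:.4f}")
```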

 

Fig. 4: A Mach-Zehnder interferometer provides the basic multiplication function based on the phase difference between the two paths. Source: Lightmatter

Varying the phase differences becomes a part of that multiplication. How that happens isn’t obvious, but Lightmatter has written an explanation of how it works. They implement the phase shift using “a standard, solid state device as the array compute element,” said Harris.

“This, along with other component upgrades and design optimizations, is what enables the commercial product to be 2 to 10 times faster (depending on workload), and as much as 15 times more energy efficient than the test chip we reported on at Hot Chips [2020],” he added.

Yu noted the speed along these paths. “When you do vector multiplication with Mach-Zehnder modulators, you’ve got an extremely low-power photonic computer with really low latency — like some number of picoseconds across the entire thing,” he said.

An ASIC accompaniment
Lightmatter’s full system requires an accompanying ASIC that operates in the digital domain. The two chips are packaged together, with the digital chip residing atop the photonic chip. “The 12nm ASIC’s job is to act like an orchestrator for the photonic computer,” said Harris. “The weights live right above the MAC that they go to, so you can very rapidly reset the weights to different values as you go through the layers.”

In addition, non-linear activation functions are performed in the ASIC. “There is a lot of academic work toward analog and optoelectronic activation units — things that look like a sigmoid function. But it’s not that useful. It represents a very small number of the operations,” said Harris. “So we take the MAC result, convert it back into a digital bitstream, and then we can operate on that with the digital circuit.”

Huaiyu Meng, co-founder and CTO at Lightelligence, agreed. “Optic non-linearity does exist,” Meng said. “But for a usable product, it’s better to do that part back in the digital electronic domain because different people use different non-linearities.”
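The division of labor described here (weights and inputs pushed through a DAC, the multiply-accumulate done in analog, the result digitized, and the activation applied digitally) can be sketched as a simple pipeline. The model below is a toy: the 8-bit converters, the uniform quantizer, the full-scale ranges, and all function names are assumptions, and noise is ignored.

```python
import math


def quantize(value, bits, full_scale):
    """Toy converter: clip to [-full_scale, full_scale], then snap to one
    of 2**bits evenly spaced levels."""
    step = 2.0 * full_scale / (2 ** bits - 1)
    clipped = max(-full_scale, min(full_scale, value))
    return round((clipped + full_scale) / step) * step - full_scale


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hybrid_layer(weights, inputs, activation=relu, bits=8, full_scale=1.0):
    """One layer: DAC -> analog MAC -> ADC -> digital activation."""
    analog_in = [quantize(x, bits, full_scale) for x in inputs]           # DAC
    analog_w = [[quantize(w, bits, full_scale) for w in row] for row in weights]
    mac = [sum(w * x for w, x in zip(row, analog_in)) for row in analog_w]
    adc_range = full_scale * len(inputs)          # worst-case sum magnitude
    digital = [quantize(m, bits, adc_range) for m in mac]                 # ADC
    return [activation(d) for d in digital]       # done in digital logic


weights = [[0.5, -0.25],
           [-0.5, -0.5]]
inputs = [1.0, 0.5]

relu_out = hybrid_layer(weights, inputs, activation=relu)
sigmoid_out = hybrid_layer(weights, inputs, activation=sigmoid)
print("relu:", relu_out)
print("sigmoid:", sigmoid_out)
```

Because the activation is an ordinary function argument, swapping one non-linearity for another leaves the analog part untouched, which is the flexibility Meng points to.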

 

Fig. 5: An ASIC is packaged atop Lightmatter’s photonic IC. On the left, the weights are in the ASIC and drive the photonic IC through a DAC. In the middle, the ASIC provides accumulation and other activation functions. The right shows the 3D packaging of the digital and optical chips. Source: Lightmatter

Photonics also can provide a stability benefit. “Optics have some dramatic advantages regarding mismatch and environmental fluctuations,” noted Harris. “It’s the principal reason why we’re able to achieve nearly 99% 32-bit floating-point accuracy across huge arrays of neural networks.”

The noise situation is also different. “If you build an analog electronic processor, you have to deal with thermal noise everywhere,” said Harris. “With optics, you don’t have that problem. You have to deal with photonic shot noise — the quantization of the electromagnetic field — but it’s really a lot better. You also don’t get crosstalk with photonics, because you’re not going to have capacitive coupling between photonic wires.”

In addition, it’s easier to dial up power if needed. “With photonics, you can just keep increasing the light intensity,” observed Harris. “With analog electronics or digital, you can’t go above one volt, or you’ll blow the transistors out.”
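Shot noise follows Poisson statistics, so a measurement that collects N photons has a signal-to-noise ratio of roughly the square root of N, which is why turning up the light helps. The sketch below works through one example. The 1 µW detected power, 1 ns sample window, and 1550 nm wavelength are assumptions, and every other noise source is ignored.

```python
import math

PLANCK_J_S = 6.62607015e-34      # Planck constant
LIGHT_SPEED_M_S = 299_792_458.0  # speed of light in vacuum


def photons_per_sample(optical_power_w, sample_time_s, wavelength_m):
    """Average number of photons arriving during one sample window."""
    photon_energy_j = PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m
    return optical_power_w * sample_time_s / photon_energy_j


def shot_noise_snr(num_photons):
    """Shot-noise-limited SNR: mean N over standard deviation sqrt(N)."""
    return math.sqrt(num_photons)


# Hypothetical operating point: 1 uW at the detector, 1 ns window, 1550 nm.
n_base = photons_per_sample(1e-6, 1e-9, 1550e-9)
n_boosted = photons_per_sample(4e-6, 1e-9, 1550e-9)   # four times the light

snr_base = shot_noise_snr(n_base)
snr_boosted = shot_noise_snr(n_boosted)
print(f"{n_base:.0f} photons per sample, SNR about {snr_base:.0f}")
print(f"4x the light, SNR about {snr_boosted:.0f}")
```

In this model, quadrupling the optical power doubles the shot-noise-limited SNR.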

Minimal process changes
One of the enabling factors for these photonic circuits is the fact that they’re not built on a new photonic platform developed from the ground up. Lightmatter worked with GlobalFoundries, requiring modest development of some new structures and possibly some new materials. The fact that they stayed close to the existing photonics PDK meant it was easier to bring out a chip that will be commercially viable quickly.

Fig. 6: The left shows the ASIC layout; the right shows the photonic IC layout. Source: Lightmatter

“We’re building a computer not out of some low-level pilot line,” said Yu. “We’re building it out of a 300mm factory that builds computing parts for our other heavy-duty industrialized computers. A lot of people want to invent a whole new photonic architecture from scratch. But the challenge is that we have to find compromises between a high-yielding part that we’re doing for optical interconnect, plus the introduction of new structures.”

While staying close to existing manufacturing flows, select improvements could be helpful. “There are a couple of new things that we want to see, like a higher level of integration, and more efficient phase shifters that require either new processing possibilities or new materials,” noted Meng.

Given that photonic features are measured in microns rather than nanometers, it might seem that a photonic chip would be much larger than an electronic chip. That isn’t necessarily the case. The color multiplexing ability on its own provides great compute density. But even circuit-for-circuit, it may not be that bad. “Our photonic multiply-accumulate is about the same area as a 7nm equivalent,” said Harris.

Meng agreed: “The actual comparison in size is really not 1,000 times.”

There are also some practical limits to circuit scaling. “There is a natural limitation where the wavelength of light is on the order of a single micron, depending on what kind of media you’re in,” cautioned Meng. “So it doesn’t make sense to make optic components much smaller than a single micron.”
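The dependence on the medium comes from the refractive index: light with vacuum wavelength λ0 has wavelength λ0/n inside a material of index n. The sketch below assumes a 1550 nm source and approximate textbook indices for silicon and silica; the exact values vary with wavelength and temperature, and waveguide geometry changes the effective index further.

```python
def wavelength_in_medium_nm(vacuum_wavelength_nm, refractive_index):
    """Wavelength inside a material: the vacuum wavelength divided by n."""
    if refractive_index < 1.0:
        raise ValueError("refractive index is assumed to be at least 1")
    return vacuum_wavelength_nm / refractive_index


VACUUM_NM = 1550.0                 # assumed telecom-band source
APPROX_INDEX = {                   # approximate bulk values near 1550 nm
    "vacuum": 1.0,
    "silica": 1.44,
    "silicon": 3.48,
}

in_medium = {name: wavelength_in_medium_nm(VACUUM_NM, n)
             for name, n in APPROX_INDEX.items()}
for name, nm in in_medium.items():
    print(f"{name}: about {nm:.0f} nm")
```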

To be clear, photonics is not a panacea for all circuits. “When our computation is based on logic, electrons are a better carrier,” said Meng. “However, most of the machine learning workload is not logic. It’s linear operations.”

All of this analog activity doesn’t mean the end of digital in AI, of course, even for a solution leveraging analog. “You can optimize your digital and analog portions in a way that overall power consumption will be lower,” said Infineon’s Pandey, noting that both will be present.

Still, it remains to be seen how much digital turf analog will claim.




1 comment

Kevin Cameron says:

It’s a shame that Verilog-AMS is such a neglected standard, left languishing at Accellera while the IEEE P1800 effort (SystemVerilog) continues to ignore all things analog – it might actually be useful if you want to build giant analog (AI) chips, or even use AI to design them for you…
