Challenges For Compute-In-Memory Accelerators

The key is to build these accelerators around real devices, because ultimately the application has to work.


A compute-in-memory (CIM) accelerator does not simply replace conventional logic. It’s a lot more complicated than that.

Regardless of the memory technology, the accelerator redefines the latency and energy consumption characteristics of the system as a whole. When the accelerator is built from noisy, low-precision computational elements, the situation becomes even more complex.

Tzu-Hsiang Hsu and colleagues at Taiwan’s National Tsing Hua University proposed a hierarchy of computing technologies to maximize energy efficiency in edge devices. For example, a security camera might have three modes — movement detection, object identification, and face recognition.

Movement detection is a low-power task, requiring only minimal computation. It can be used to “wake up” the system when more complex analysis is required. Once alerted, the system might use low-resolution object identification to decide whether the movement is due to a pet, a curtain blowing in the breeze, or a human. Still, this first level of analysis might be within the abilities of a neural network built into the camera, using low-precision weights.

Only if a human is detected is high-resolution facial recognition needed to identify the person as either authorized or not. Facial recognition might require a more capable computer, either locally or in a remote data center. In image-driven applications, though, the researchers found that transmitting data from the camera to a backend processor accounts for the majority of the system’s energy use.
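To make the tiered approach concrete, here is a minimal sketch of such a wake-up pipeline. The stage names, thresholds, and energy figures are illustrative assumptions, not numbers from the Tsing Hua work; the point is simply that each stage gates the next, more expensive one.

```python
import numpy as np

# Hypothetical energy cost per invocation, in arbitrary units. These are
# illustrative assumptions, not measurements from the article.
ENERGY_COST = {
    "motion_detect": 1,                  # always-on, minimal computation
    "object_id": 100,                    # low-precision on-camera network
    "transmit_and_recognize": 10_000,    # radio link + backend inference
}

def motion_detected(frame, prev_frame, threshold=12.0):
    """Cheap stage 1: mean absolute pixel difference between frames."""
    diff = np.abs(frame.astype(float) - prev_frame.astype(float))
    return np.mean(diff) > threshold

def classify_locally(frame):
    """Stage 2 stub: a low-precision on-camera classifier would return
    'human', 'pet', 'curtain', etc. Hard-wired here for illustration."""
    return "human"

def process(frame, prev_frame):
    energy = ENERGY_COST["motion_detect"]
    if not motion_detected(frame, prev_frame):
        return "idle", energy
    energy += ENERGY_COST["object_id"]
    if classify_locally(frame) != "human":
        return "ignored", energy          # stop before the expensive hop
    energy += ENERGY_COST["transmit_and_recognize"]
    return "sent for facial recognition", energy

frame, prev = np.full((8, 8), 20.0), np.zeros((8, 8))
print(process(frame, prev))   # -> ('sent for facial recognition', 10101)
```

Only frames that survive both cheap stages pay the transmission cost, which is where most of the energy goes.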

Conducting the initial object identification locally can substantially reduce the overall energy cost. Object identification tasks can use static weights, which can be developed using a training database and written when the device is initialized. Endurance and programming accuracy are significant issues for ReRAM-based accelerators, and static weights reduce both concerns. In particular, Qiwen Wang and colleagues at the University of Michigan noted that write-verify schemes to confirm the accuracy of programmed ReRAM values are easier to implement if the weights rarely change.

The facial recognition weights, on the other hand, might need to be updated, for instance to add or remove employees or to allow individual households to define their own authorized visitors. While dynamic weights can be used with CIM designs, they are more compatible with some device technologies than others. More mature memory elements, like DRAM and flash, generally have better endurance and store values more accurately.
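A back-of-the-envelope comparison suggests why write frequency matters so much for endurance. The cycle count below is an illustrative assumption; real ReRAM endurance varies by orders of magnitude across devices and processes.

```python
# Assumed endurance; real ReRAM figures vary widely by device and process.
RERAM_ENDURANCE_CYCLES = 1e6

static_writes = 1               # weights programmed once at initialization
dynamic_writes = 10 * 365 * 5   # e.g., ten weight updates a day for five years

for label, writes in (("static", static_writes), ("dynamic", dynamic_writes)):
    used = writes / RERAM_ENDURANCE_CYCLES
    print(f"{label}: {writes} writes, {used:.4%} of the endurance budget")
```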

Accelerators and precision
Using lower-precision weights gives efficiency advantages, but Tayfun Gokmen and colleagues at IBM Research observed that any accelerator necessarily reduces the precision of the calculation. Analog-to-digital converters, and even the memory array itself, use finite numbers of bits. Weights calculated to full floating-point precision by a training algorithm on a supercomputer may be reduced to only four or five bits, sometimes fewer, for a local inference task. Even single-bit binary weights can give excellent results for simple tasks.
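As a sketch, uniform symmetric quantization (one common scheme; the article does not specify which one any particular group used) reduces floating-point weights to a handful of discrete levels:

```python
import numpy as np

def quantize(weights, bits):
    """Uniform symmetric quantization: round to signed `bits`-bit codes,
    then map the codes back to real values.
    (1-bit binary weights are a special case: sign(w) times a scale.)"""
    qmax = 2 ** (bits - 1) - 1                # e.g., 7 for 4 bits, 127 for 8
    scale = np.max(np.abs(weights)) / qmax
    codes = np.clip(np.round(weights / scale), -qmax, qmax)
    return codes * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(6)
for b in (2, 4, 8):
    print(f"{b}-bit:", np.round(quantize(w, b), 3))
```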

However, it’s essential that the training procedure consider the number of bits that will actually be available in the target design. The IBM group found that even 9-bit ADCs introduced errors when models were trained with floating-point arithmetic. When lower-precision training was used, though, a 7-bit ADC gave results comparable to the floating-point model.
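One widely used way to fold the converter into training is to simulate its clipping and rounding in the forward pass, so the learned weights compensate for it; gradients are typically passed through the rounding step unchanged (a straight-through estimator). The sketch below models only the forward behavior, and the bit width and full-scale range are assumptions, not IBM’s published parameters.

```python
import numpy as np

def adc(x, bits, full_scale):
    """Model a `bits`-bit ADC: clip to the full-scale range, then round
    to one of 2**bits evenly spaced codes."""
    levels = 2 ** bits - 1
    x = np.clip(x, -full_scale, full_scale)
    codes = np.round((x + full_scale) / (2 * full_scale) * levels)
    return codes / levels * 2 * full_scale - full_scale

def cim_layer(x, W, bits=7, full_scale=8.0):
    """Forward pass of one CIM layer with the converter in the loop."""
    analog_sums = x @ W       # what the bitlines would accumulate
    return adc(analog_sums, bits, full_scale)

x = np.ones((1, 4))
W = np.full((4, 2), 0.5)
print(cim_layer(x, W))        # quantized version of the exact sums [2.0, 2.0]
```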

Similarly, accelerators based on non-ideal devices like ReRAMs must account for noise and programming errors. Tien-Ju Yang and Vivienne Sze of MIT modeled non-ideal memory behavior as noise and compared the responses of 13 different network designs. While most designs could tolerate a certain amount of noise, they did not find a clear link between a model’s accuracy under ideal conditions and its noise tolerance: the “best” and “worst” networks changed with the standard deviation of the applied noise. Designers must consider the real, non-ideal behavior of their target hardware when choosing a network design.
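The flavor of such a study can be reproduced in a few lines: perturb trained weights with Gaussian noise of increasing standard deviation and average accuracy over many noise draws. The toy linear classifier and data below are synthetic stand-ins, not the 13 networks from the MIT comparison.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 16))
true_w = rng.standard_normal(16)
y = (X @ true_w > 0).astype(int)           # synthetic binary labels

w = true_w.copy()                          # pretend this is the trained model

for sigma in (0.0, 0.1, 0.5, 1.0):
    accs = []
    for _ in range(20):                    # average over noise draws
        noisy_w = w + rng.normal(0, sigma, size=w.shape)
        accs.append(np.mean((X @ noisy_w > 0) == y))
    print(f"sigma={sigma}: accuracy={np.mean(accs):.3f}")
```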

Non-ideal devices introduce a number of potential issues, starting with the possibility that the calculated weight and the weight actually written to the memory element may not be the same. ReRAM devices tend to have variable switching characteristics. A given voltage pulse (or pulses) won’t necessarily store the same value in every device. Recalculating pre-determined weights once the characteristics of the actual devices are known is expensive. In some situations, the original training data may be proprietary or otherwise inaccessible, and adjusting the weights to accommodate new information about the characteristics of the network may not be possible. A mechanism to verify what weight value is actually stored can compensate for inaccuracies in the write step.
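A write-verify loop is conceptually simple: program, read back, and pulse again until the stored value is close enough to the target. The device model below is an invented stand-in, with random pulse gain mimicking variable switching behavior.

```python
import numpy as np

rng = np.random.default_rng(2)

def program_pulse(device_g, target_g):
    """Fake device: each pulse moves the conductance toward the target
    with a random gain, so identical pulses produce different results."""
    return device_g + (target_g - device_g) * rng.uniform(0.3, 1.2)

def write_verify(target_g, tol=0.02, max_pulses=20):
    """Pulse until the read-back value is within `tol` of the target."""
    g = 0.0
    for pulse in range(1, max_pulses + 1):
        g = program_pulse(g, target_g)
        if abs(g - target_g) <= tol * abs(target_g):   # read back and check
            return g, pulse
    return g, max_pulses

stored, pulses = write_verify(target_g=1.0)
print(f"stored {stored:.4f} after {pulses} pulses")
```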

Other sources of variation are inherent in the physics of ReRAM devices. For instance, the University of Michigan group noted that the “zero” value of a ReRAM memory element is typically assigned to the high-resistance state. A device in that state still passes current; its resistance is high, but it is not a true open circuit. A design that assumes true zero values will accumulate potentially large errors, but a ReRAM-aware design can avoid the issue altogether. The Michigan group used a pair of devices and mapped each weight to the difference between them, getting better results than defining weights relative to some “zero” point.
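Here is a sketch of that differential mapping, with an assumed conductance range. Each signed weight becomes the difference of two conductances, so a zero weight is two equal currents that cancel, rather than a single device that must behave as a true open circuit.

```python
import numpy as np

G_MIN, G_MAX = 0.1, 1.0          # assumed conductance range, arbitrary units

def to_pair(w, w_max):
    """Map w in [-w_max, w_max] to (g_plus, g_minus) so that the weight is
    proportional to g_plus - g_minus. A zero weight gives two equal,
    nonzero conductances whose currents cancel."""
    delta = w / w_max * (G_MAX - G_MIN)
    return G_MIN + max(delta, 0.0), G_MIN - min(delta, 0.0)

def dot(x, weights, w_max=1.0):
    """Column current: sum(x * g_plus) - sum(x * g_minus), rescaled."""
    pairs = [to_pair(w, w_max) for w in weights]
    i_plus = sum(xi * gp for xi, (gp, _) in zip(x, pairs))
    i_minus = sum(xi * gm for xi, (_, gm) in zip(x, pairs))
    return (i_plus - i_minus) * w_max / (G_MAX - G_MIN)

x, w = [1.0, 0.5, -0.2], [0.3, -0.7, 0.0]
print(dot(x, w), np.dot(x, w))   # the two results agree
```

The recovered dot product matches the ideal one without ever relying on a device passing zero current.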

Design for the hardware you have
Reduced precision is often suggested as a way to improve the computational efficiency of both conventional and CIM systems. It’s important to remember, though, that the application ultimately defines the required precision. Above all other considerations, the application has to work. To make sure it does, designers of practical systems need to design for real hardware, not idealized models.





Frederick Chen says:

ECC is used with flash to combat errors, and with DRAM, whose applications tolerate no errors. So it is expected to be used for ReRAM, MRAM, etc. Maybe it is expected to some degree even for CIM, e.g., https://projet.liris.cnrs.fr/imagine/pub/proceedings/ICPR-2010/data/4109e291.pdf
