While CIM can speed up multiplication operations, it comes with added risk and complexity.
Compute-in-memory (CIM) is not necessarily an Artificial Intelligence (AI) solution; rather, it is a memory management solution. CIM could bring advantages to AI processing by speeding up the multiplication operation at the heart of AI model execution. However, for that to be successful, an AI processing system would need to be explicitly architected to use CIM. The change would entail a shift from all-digital design workflows into a mixed-signal approach which would require deep design expertise and specialized semiconductor fabrication processes.
Compute-in-memory eliminates weight coefficient buffers and streamlines the primitive multiply operations, striving for increased AI inference throughput. However, it does not perform neural network processing by itself. Other functions like input data streaming, sequencing, accumulation buffering, activation buffering, and layer organization may become more vital factors in overall performance as model hardware mapping unfolds and complexity increases – more robust NPUs (Neural Processing Units) incorporate all those functions.
Fundamentally, compute-in-memory embeds a multiplier unit in a memory unit. A conventional digital multiplier takes two operands as digital words and produces a digital result, handling signing and scaling. Compute-in-memory uses a different approach, storing a weight coefficient as analog values in a specially designed transistor cell sub-array with rows and columns. The incoming digital data words enter the rows of the array, triggering analog voltage multiplies, then analog current summations occur along columns. An analog-to-digital converter creates the final digital word outputs from the summed analog values.
An individual memory cell can be straightforward in theory, such as these candidates:
Still, operating these cells presents mixed-signal challenges and a technology gap that is not closing anytime soon. So, why the intense interest in compute-in-memory for AI inference chips?
First, it can be fast. This is because analog multiplication happens quickly as part of the memory read cycle, transparent to the rest of the surrounding digital logic. It can also be lower power since fewer transistors switch at high frequencies. But there are some limitations from a system viewpoint. Additional steps needed for programming the analog values into the memory cells are a concern. Inaccuracy of the analog voltages, which may change over time, can inject bit errors into results – showing up as detection errors or false alarm rates.
Aside from its analog nature, the biggest concern for compute-in-memory may be bit precision and AI training requirements. Researchers seem confident in 4-bit implementations; however, more training cycles must be run for reliable inference at low precision. Raising the precision to 8-bit lowers training demands. It also increases the complexity of the arrays and the analog-to-digital converter for each array, offsetting area and power savings and worsening the chance for bit errors in the presence of system noise.
So is compute-in-memory worthy of consideration? There likely are niche applications where it could speed up AI inference. A more critical question: is the added risk and complexity of compute-in-memory worth the effort? A well-conceived NPU strategy and implementation may nullify any advantage of moving to compute-in-memory. We can contrast the tradeoffs for AI inference in four areas: power/performance/area (PPA), flexibility, quantization, and memory technology.
PPA
Flexibility
Quantization
Memory Technology
The answer to the original question might be that designers should consider CIM only if other, more established AI inference platforms (NPUs) cannot meet their requirements. Since CIM is riskier, costlier, and harder to implement, many should only consider it a last-resort solution.
Expedera explores this topic in much more depth in a recent white paper, which can be found at: https://www.expedera.com/architectural-considerations-for-compute-in-memory-in-ai-inference/
Leave a Reply