Knowledge Center

Near-Memory Computing

Moving compute closer to memory to reduce access costs.


Also called computational memory

Memory is used to store instructions and data within a computer. Most computers employ the von Neumann architecture, in which a single contiguous memory region is accessed via an index, called an address. There is a cost associated with moving memory contents back and forth to the processor, and that cost increases with the distance that separates them. Cost can be measured in both performance and power. The lowest-cost solution is to place memory on the same die as the processor, but this limits both the amount of memory that can exist and the types of memory that can be used.

As soon as you go off-chip, larger amounts of memory can be connected, but the transfer cost rises significantly. There is a general relationship between the amount of memory and transfer cost. In most computers, cache is generally less than 1 MByte and implemented as SRAM. DRAM, which exists as separate chips and boards connected through a socket, provides storage generally less than 1 TByte. Disk storage can be many times this size, and one could view the Cloud as being the largest pool of storage with the highest access costs.

Many computers combine several types of memory in a hierarchy so that the most frequently accessed data is kept closest to the processor. Cache is one example of this approach. The design becomes more complex when multiple processors share the same address space. This type of design could be described as moving memory closer to compute.
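The payoff of a hierarchy is that a small working set, reused repeatedly, is served at the low cost of the nearest level. A minimal sketch of this idea, using an assumed direct-mapped cache and illustrative hit/miss costs (not measurements from any real machine):

```python
# Toy model of a memory hierarchy: a small direct-mapped cache in front
# of a larger main memory. Sizes and cycle costs are assumptions chosen
# for illustration only.
CACHE_LINES = 8        # number of cache lines (assumed)
HIT_COST = 1           # cycles per cache hit (assumed)
MISS_COST = 100        # cycles per miss to main memory (assumed)

def access_cost(addresses):
    """Return (hits, misses, total_cycles) for an address trace."""
    cache = [None] * CACHE_LINES
    hits = misses = 0
    for addr in addresses:
        line = addr % CACHE_LINES          # direct-mapped placement
        if cache[line] == addr:
            hits += 1
        else:
            misses += 1
            cache[line] = addr             # evict and refill the line
    return hits, misses, hits * HIT_COST + misses * MISS_COST

# Reusing a small working set stays in the cache after it is warm...
local = access_cost([0, 1, 2, 3] * 25)     # (96, 4, 496)
# ...while striding across a large region misses on every access.
remote = access_cost(list(range(0, 800, 8)))   # (0, 100, 10000)
print(local, remote)
```

Both traces perform 100 accesses, but the locality-friendly trace costs roughly 20x fewer cycles under these assumed parameters, which is the effect a real cache hierarchy exploits.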

As systems have become more distributed and heterogeneous, it is becoming much more common to take the reverse strategy and move compute closer to memory. In the simplest terms, this could be a microcontroller processing a data stream before it even reaches the main compute system. It can also exist in a much larger context, such as Cloud applications, where instead of doing all computation in the Cloud, some of it is conducted at the edge. In this case, computing in place is cheaper than transmitting the data.

A specific application often knows more about its memory access patterns, or has different requirements, than general-purpose computing. In those cases, dedicated memory with an optimized interface and management of that data may provide significant advantages in performance and power. This is common for Graphics Processing Units (GPUs), which have a high-speed channel to a relatively small amount of memory where transfers are optimally organized for that application. Most of this memory is not accessible to the general computing environment it is connected to.

Today, we are seeing memory chips being created that have processing capabilities built into them. An example is Solid State Drives (SSDs), which contain flash memory. Flash has shortcomings that are overcome with a significant amount of on-device processing for tasks such as wear leveling and garbage collection. It is not much of a leap to either add a dedicated application processor to these devices, or to utilize the existing processing power when it is not performing a management function. In these cases, the application to be performed is often loaded into the memory device by the main processor. Similarly, other memory chips are being produced that add processing to DRAMs.
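The host-loads-an-application pattern can be sketched as a toy computational storage device: the host hands the device a small function (here, a filter predicate), the device runs it next to the data, and only the matching records cross the interface. The class name and interface below are illustrative assumptions, not a real SSD API:

```python
# Toy model of near-data processing in storage. "ComputationalStorage"
# and its methods are hypothetical names invented for this sketch.
class ComputationalStorage:
    def __init__(self, records):
        self._records = records            # data resident on the device

    def read_all(self):
        """Conventional path: ship every record to the host."""
        return list(self._records)

    def offload(self, predicate):
        """Near-data path: run the predicate on-device, move only hits."""
        return [r for r in self._records if predicate(r)]

dev = ComputationalStorage(range(1_000_000))

# Host-side filtering would move all 1,000,000 records across the
# interface; offloading the same predicate moves only the matches.
matches = dev.offload(lambda r: r % 1000 == 0)
print(len(matches))   # 1000 records cross the interface instead of 1,000,000
```

The compute itself is the same in both paths; what changes is where it runs, and therefore how much data has to move, which is the essence of near-memory computing.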

