High-Performance Memory At Low Cost Per Bit

Emerging applications drive the need for high memory capacity.


Developers of deep neural network (DNN) hardware have a universal complaint: they need ever more memory capacity with high performance, low cost and low power. As artificial intelligence (AI) techniques gain wider adoption, model complexity and training requirements also increase. Large and complex DNN models do not fit in the small on-chip SRAM caches near the processor, so off-chip DRAM is used to store the bulk of the data. The AI processors and GPUs in datacenters rely on fast access to off-chip memory to perform complex computations in real time. In short, memory is becoming a real bottleneck in these applications.

Meanwhile, data generation from edge devices is increasing exponentially, and the exploding data volumes place great pressure on datacenters. Big data analytics on user data collected from social media and other sources drives sales and marketing strategies in many industries. As usual, more data implies bigger DRAM capacity to store all of it for quick analysis.

In the finance industry, fraud detection is becoming increasingly important as customers adopt new forms of digital currency and payment. The Payments Forum report (“CNP Fraud around the world,” March 2017) predicts that smart chip card adoption will drive an increase in card-not-present (CNP) fraud in the U.S. from $3.1 billion in 2015 to $6.4 billion in 2018. When it comes to fraud detection and prevention, as you can well imagine, time is money. The memory hierarchy in datacenters typically consists of different types of memory, as shown in Figure 1.

Figure 1: Memory Hierarchy

The on-chip SRAM memory at the top of the pyramid is the fastest, but also the most expensive, and is hence provisioned in limited quantity. Flash and hard disk drives are much cheaper and offer plenty of storage capacity, but their latency is unacceptable for direct use as main memory. DRAM offers a good compromise between the two: fast access times at reasonable cost and high capacity. Fraud detection software typically relies on in-memory databases (IMDB) for faster decision making. The part of the database that fits within DRAM can be accessed with low latency, whereas the remainder ends up in cheaper storage with much slower access. This means that real-time fraud detection must rely on the limited user history that fits within the DRAM main memory. The bank behind a credit card must approve or reject a transaction from any user at any given moment, which requires it to quickly access large amounts of information for every transaction. Having decision-making information directly available in memory (versus disk storage) allows the bank to make quicker, more accurate decisions and could help prevent fraud from occurring. Such applications benefit tremendously from having large memory capacity in close proximity to the processor at low cost.
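The cost of spilling out of DRAM can be made concrete with a back-of-the-envelope model. The latencies and the uniform-access assumption below are illustrative, not figures from the article:

```python
# Hypothetical model: effective access latency of an in-memory database
# when only a fraction of it fits in DRAM. Latency values are assumptions.
DRAM_LATENCY_NS = 100        # assumed DRAM access latency (~100 ns)
FLASH_LATENCY_NS = 100_000   # assumed flash/SSD access latency (~100 us)

def effective_latency_ns(dram_fraction: float) -> float:
    """Average latency if accesses are spread uniformly over the dataset."""
    return (dram_fraction * DRAM_LATENCY_NS
            + (1.0 - dram_fraction) * FLASH_LATENCY_NS)

for frac in (1.0, 0.9, 0.5):
    print(f"{frac:.0%} in DRAM -> {effective_latency_ns(frac):,.0f} ns average")
```

Even with 90% of the database in DRAM, the average access time is dominated by the slow tier, which is why capacity close to the processor matters so much for real-time decisions.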

Reality: DRAM scaling is reaching physical limits, driving the price up
While applications continue clamoring for higher capacities of low-cost, high-performance memory, the reality is that the end of Moore’s law is slowing DRAM scaling. The basic storage element in DRAM is a capacitor that is hard to shrink much further without affecting its retention capability. This is a very different story from the past several decades, when DRAM technology scaling steadily delivered affordable capacity increases. As a direct consequence, DRAM prices have been rising in recent years. Meanwhile, vertical stacking has allowed Flash devices to scale capacity at relatively lower cost, so the cost gap between DRAM and Flash continues to increase, as shown in Figure 2.

Figure 2: DRAM versus Flash ASP (average selling price) (Source: IDC)

Solution: Hybrid memory that combines a small amount of DRAM with a large amount of cheap emerging memory
While DRAM technology is struggling to scale capacity, many emerging memories such as enhanced Fast Flash, RRAM (Resistive RAM), STT-MRAM (Spin Torque Transfer Magnetic RAM) and PCM (Phase Change Memory) offer the promise of affordable capacity scaling in the future, even though these emerging memories are not yet in high-volume production. To satisfy market needs, the practical solution is a hybrid memory architecture that combines a small amount of DRAM with a large proportion of emerging memory.

Challenges with emerging memory
These emerging memories come with their own set of challenges: high read and write latencies compared to DRAM, lower bandwidth, limited endurance, high write energy and, in some cases, large access granularity. Left unmanaged, these attributes result in performance degradation and high replacement costs due to wear-out.
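The endurance problem in particular is easy to quantify. The cycle counts and write rates below are illustrative assumptions, not vendor specifications:

```python
# Illustrative wear-out arithmetic (assumed numbers, not from the article):
# how quickly a limited-endurance memory fails without management.
ENDURANCE_CYCLES = 10**6     # assumed write endurance per memory line
WRITES_PER_SECOND = 10**5    # assumed write rate hammering one hot line

# Worst case: every write hits the same line, which dies after
# ENDURANCE_CYCLES writes.
hot_line_lifetime_s = ENDURANCE_CYCLES / WRITES_PER_SECOND
print(f"Unmanaged hot line worn out in {hot_line_lifetime_s:.0f} seconds")

# With ideal wear leveling spreading writes over N lines, lifetime scales by N.
N_LINES = 10**6
leveled_lifetime_days = hot_line_lifetime_s * N_LINES / 86_400
print(f"With ideal wear leveling over {N_LINES:,} lines: "
      f"{leveled_lifetime_days:.0f} days")
```

Under these assumptions an unmanaged hot line fails in seconds, while spreading the same writes across the whole device extends lifetime by orders of magnitude, which is why wear management cannot be left out.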

Smart management techniques
To achieve high performance at low cost per bit, we need smart management techniques. In the hybrid memory system, this management is handled entirely by hardware, without requiring software, operating system or application modifications. This approach is different from heterogeneous systems, where software manages data placement and movement between the memory tiers. Rambus is building a modular, flexible hardware platform to enable research on smart management techniques that deliver performance comparable to that of pure DRAM systems. In collaboration with IBM, the goal is to run real workloads on the POWER9 server and measure the performance of the hybrid memory subsystem, thus confirming the promising results observed in simulation.
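The core idea of transparent hardware management can be sketched as a small DRAM tier acting as a cache in front of a large, slower emerging-memory tier. The LRU policy, tier latencies and workload below are all assumptions for illustration, not Rambus's actual design:

```python
import random
from collections import OrderedDict

# Hypothetical sketch: DRAM as a transparent LRU cache over a slow tier.
# Latencies and sizes are illustrative assumptions, not measured values.
DRAM_NS, SLOW_NS = 100, 1_000

class HybridMemory:
    def __init__(self, dram_lines: int):
        self.dram = OrderedDict()      # cached line addresses, in LRU order
        self.capacity = dram_lines
        self.total_ns = 0
        self.accesses = 0

    def access(self, line: int) -> None:
        self.accesses += 1
        if line in self.dram:                      # DRAM hit: fast path
            self.dram.move_to_end(line)
            self.total_ns += DRAM_NS
        else:                                      # miss: fetch from slow tier
            self.total_ns += SLOW_NS
            self.dram[line] = True
            if len(self.dram) > self.capacity:
                self.dram.popitem(last=False)      # evict least-recently-used

    def avg_ns(self) -> float:
        return self.total_ns / self.accesses

# Skewed workload: 90% of accesses touch a hot set that fits in DRAM.
random.seed(0)
mem = HybridMemory(dram_lines=1_000)
for _ in range(100_000):
    if random.random() < 0.9:
        line = random.randrange(1_000)             # hot set
    else:
        line = random.randrange(1_000_000)         # cold accesses
    mem.access(line)
print(f"average latency: {mem.avg_ns():.0f} ns")   # well below the slow tier
```

Because the hot working set mostly stays in DRAM, the average latency lands far closer to DRAM speed than to the slow tier's, which is the effect the hardware-managed hybrid architecture aims for.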

In summary, applications demand increasing memory capacity at low cost and high performance. Hybrid memory solutions that combine DRAM with emerging memory offer the best balance of cost and performance. Smart management is the key to making this architecture successful in meeting market demands.