Pooling CPU Memory for LLM Inference With Lower Latency and Higher Throughput (UC Berkeley)


A new technical paper titled "Pie: Pooling CPU Memory for LLM Inference" was published by researchers at UC Berkeley. Abstract "The rapid growth of LLMs has revolutionized natural language processing and AI analysis, but their increasing size and memory demands present significant challenges. A common solution is to spill over to CPU memory; however, traditional GPU-CPU memory swapping ofte... » read more

Memory Fundamentals For Engineers


Memory is one of a select few electronic components essential to any electronic system. Modern electronics perform extraordinarily complex tasks that would be impossible without memory. Your computer obviously contains memory, but so do your car, your smartphone, your doorbell camera, your entertainment system, and any other gadget benefiting from digital electronics. This eBook prov... » read more

CXL: The Future Of Memory Interconnect?


Momentum for sharing memory resources between processor cores is growing inside data centers, where the explosion in data is driving the need to scale memory up and down in a way that roughly mirrors how processors are used today. A year after the CXL Consortium and JEDEC signed a memorandum of understanding (MOU) to formalize collaboration between the two organizations, suppor... » read more

Memory Disaggregation Research And Making It Practical With Hardware Trends (U. of Michigan)


A new technical paper titled "Memory Disaggregation: Advances and Open Challenges" was published by researchers at University of Michigan. Abstract "Compute and memory are tightly coupled within each server in traditional datacenters. Large-scale datacenter operators have identified this coupling as a root cause behind fleet-wide resource underutilization and increasing Total Cost of Owners... » read more

CXL-Based Memory Pooling System Meets Cloud Performance Goals And Significantly Reduces DRAM Cost


A technical paper titled "Pond: CXL-Based Memory Pooling Systems for Cloud Platforms" was published by researchers at Virginia Tech, Intel, Microsoft Azure, Google, and Stone Co. Abstract "Public cloud providers seek to meet stringent performance requirements and low hardware cost. A key driver of performance and cost is main memory. Memory pooling promises to improve DRAM utilization and t... » read more

CXL 3.0: From Expansion To Scaling


At the Flash Memory Summit in August, the CXL Consortium released the latest, and highly anticipated, version 3.0 of the Compute Express Link (CXL) specification. This new version of the specification builds on previous generations and introduces several compelling new features that promise to increase data center performance and scalability, while reducing the total cost of ownership (TCO). ... » read more

Rambus To Buy Hardent


Rambus inked a deal to buy Hardent, an engineering services company, to accelerate its push into the CXL arena. Compute Express Link (CXL), developed primarily by Intel before being turned into an open industry standard, allows memory to be disaggregated within a data center and shared across multiple servers. This, in turn, lets data centers control how critical resources are a... » read more

Improving Memory Efficiency And Performance


This is the second of two parts on CXL vs. OMI. Part one can be found here. Memory pooling and sharing are gaining traction as ways of optimizing existing resources to handle increasing data volumes. Using these approaches, memory can be accessed by a number of different machines or processing elements on an as-needed basis. Two protocols, CXL and OMI, are being leveraged to simplify thes... » read more
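
Whichever protocol carries the traffic, pooled or device-attached memory ultimately has to be surfaced to software as ordinary load/store-addressable space. The sketch below shows one common Linux path, mapping a device-DAX node directly into a process; the device path and mapping size are assumptions, and many deployments instead expose the capacity as regular system RAM behind a NUMA node.

/* One way pooled or device-attached memory can be surfaced to software on
 * Linux: mapping a device-DAX node directly into the address space. The path
 * and mapping size are assumptions; actual exposure depends on how the
 * platform and kernel are configured.
 * Build with: cc dax_map.c                                                   */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/dev/dax0.0";      /* hypothetical device-DAX node      */
    size_t      len  = 2UL << 20;          /* 2 MiB, aligned to the device's    */
                                           /* typical base page size            */
    int fd = open(path, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* The region now behaves like ordinary load/store-addressable memory. */
    strcpy((char *)mem, "hello from pooled memory");
    printf("%s\n", (char *)mem);

    munmap(mem, len);
    close(fd);
    return 0;
}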

Changing Server Architectures In The Data Center


Data centers are undergoing a fundamental shift to boost server utilization and improve efficiency, optimizing architectures so available compute resources can be leveraged wherever they are needed. Traditionally, data centers were built with racks of servers, each server providing computing, memory, interconnect, and possibly acceleration resources. But when a server is selected, some of th... » read more