Pooling CPU Memory for LLM Inference With Lower Latency and Higher Throughput (UC Berkeley)


A new technical paper titled "Pie: Pooling CPU Memory for LLM Inference" was published by researchers at UC Berkeley.

Abstract: "The rapid growth of LLMs has revolutionized natural language processing and AI analysis, but their increasing size and memory demands present significant challenges. A common solution is to spill over to CPU memory; however, traditional GPU-CPU memory swapping often results in higher latency and lower throughput."
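To make the abstract's point concrete, below is a minimal PyTorch sketch of the traditional GPU-CPU swapping approach it describes, not code from the Pie paper itself: a KV-cache block is kept in pinned CPU memory, prefetched to the GPU on a side CUDA stream so the copy overlaps with compute, and evicted back when done. The tensor shapes, block sizes, and helper names are illustrative assumptions.

```python
# Illustrative sketch of naive GPU-CPU KV-cache swapping; NOT the Pie system.
# Assumes PyTorch with a CUDA device; shapes and names are made up.
import torch

assert torch.cuda.is_available(), "this sketch requires a CUDA device"
device = torch.device("cuda")

# A "KV-cache block" in pinned CPU memory; pinning enables async H2D/D2H copies.
kv_block_cpu = torch.randn(16, 1024, 128, pin_memory=True)

copy_stream = torch.cuda.Stream()  # side stream for host<->device transfers

def prefetch_to_gpu(cpu_block: torch.Tensor) -> torch.Tensor:
    """Start an async host-to-device copy on the side stream."""
    with torch.cuda.stream(copy_stream):
        return cpu_block.to(device, non_blocking=True)

def evict_to_cpu(gpu_block: torch.Tensor, cpu_block: torch.Tensor) -> None:
    """Async device-to-host copy back into the pinned CPU buffer."""
    copy_stream.wait_stream(torch.cuda.current_stream())  # finish prior reads first
    with torch.cuda.stream(copy_stream):
        cpu_block.copy_(gpu_block, non_blocking=True)

# Kick off the prefetch, run unrelated compute to hide (some of) the copy,
# then synchronize the streams before consuming the block.
gpu_block = prefetch_to_gpu(kv_block_cpu)
other = torch.randn(4096, 4096, device=device)
other = other @ other                                    # overlapped compute
torch.cuda.current_stream().wait_stream(copy_stream)     # copy now visible
attn_proxy = gpu_block.mean()                            # safe to use the block
evict_to_cpu(gpu_block, kv_block_cpu)
copy_stream.synchronize()
print("prefetched block used; mean =", attn_proxy.item())
```

Even with this overlap, every on-demand fetch still pays PCIe bandwidth and stream-synchronization costs, which is the latency and throughput penalty the abstract refers to; per the paper's title, Pie's pooling of CPU memory aims to make this spillover come with lower latency and higher throughput.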

Tech Talk: HBM vs. GDDR6


Frank Ferro, senior director of product management at Rambus, talks about memory bottlenecks, why both GDDR6 and high-bandwidth memory are gaining steam, and which markets each serves. https://youtu.be/CPqdZZooS2g

Related video: GDDR6 – HBM2 Tradeoffs (2019), on what type of DRAM works best where.