Pooling CPU Memory for LLM Inference With Lower Latency and Higher Throughput (UC Berkeley)


A new technical paper titled "Pie: Pooling CPU Memory for LLM Inference" was published by researchers at UC Berkeley. Abstract "The rapid growth of LLMs has revolutionized natural language processing and AI analysis, but their increasing size and memory demands present significant challenges. A common solution is to spill over to CPU memory; however, traditional GPU-CPU memory swapping ofte... » read more

A Comprehensive Guide to Understanding AI Inference on the CPU


As AI continues to revolutionize industries and new workloads like generative AI inspire new use cases, the demand for efficient and scalable AI-based solutions has never been greater. While training often garners attention, inference—the process of applying trained models to new data—is essential for AI workloads, whether they are running in the cloud or enabling real-world applications at... » read more
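
As a minimal illustration of the inference step itself, the sketch below applies a small model (randomly initialized here as a stand-in for a trained one) to new data entirely on the CPU. The model shape and thread count are assumptions for the example, not taken from the guide.

```python
# Minimal sketch of CPU-only inference: applying an already-trained model to
# new data. The tiny model here is a stand-in; no GPU is used at any point.
import torch
import torch.nn as nn

torch.set_num_threads(4)  # CPU inference throughput is often tuned via threading.

model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 3),
)
model.eval()  # inference mode: disables dropout/batch-norm updates

new_data = torch.randn(8, 16)          # a batch of "new" inputs
with torch.no_grad():                  # no gradients needed for inference
    logits = model(new_data)
    predictions = logits.argmax(dim=1)

print(predictions)
```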

Lightweight, High-Performance CPU Extension for Protected Key Handles with CPU-Enforced Usage Policies (CISPA, Ruhr Univ. Bochum)


A new technical paper titled "KeyVisor -- A Lightweight ISA Extension for Protected Key Handles with CPU-enforced Usage Policies" was published by researchers at CISPA Helmholtz Center for Information Security and Ruhr University Bochum. Abstract "The confidentiality of cryptographic keys is essential for the security of protection schemes used for communication, file encryption, and outsou... » read more

New AI Processor Architectures Balance Speed With Efficiency


Leading AI system designs are migrating away from building the fastest AI processor possible and toward a more balanced approach that involves highly specialized, heterogeneous compute elements, faster data movement, and significantly lower power. Part of this shift revolves around the adoption of chiplets in 2.5D/3.5D packages, which enable greater customization for different workloads and ... » read more

CPU Performance Bottlenecks Limit Parallel Processing Speedups


Multi-core processors theoretically can run many threads of code in parallel, but some categories of operation currently bog down attempts to raise overall performance by parallelizing computation. Is it time to have accelerators for running highly parallel code? Standard processors have many CPUs, so it follows that cache coherency and synchronization can involve thousands of cycles of low-le... » read more
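
One way to see why such serialized operations matter is the classic Amdahl's-law arithmetic sketched below; the 5% serial fraction is an assumed number for illustration, not a figure from the article.

```python
# Illustrative Amdahl's-law arithmetic (the numbers are assumptions, not from
# the article): even a small serialized fraction of the work, for example time
# spent in cache-coherency traffic and synchronization, caps the achievable
# parallel speedup no matter how many cores are added.
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for cores in (4, 16, 64, 256):
    # Assume 5% of the work is serialized by locks and coherency traffic.
    print(f"{cores:4d} cores -> {amdahl_speedup(0.05, cores):5.1f}x speedup")
# The speedup approaches 1/0.05 = 20x regardless of core count.
```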

Fantastical Creatures


In my day job I work in the High-Level Synthesis group at Siemens EDA, specifically focusing on algorithm acceleration. But on the weekends, sometimes, I take on the role of amateur cryptozoologist. As many of you know, the main Siemens EDA campus sits in the shadow of Mt. Hood and the Cascade Mountain range. This is prime habitat for Sasquatch, also known as “Bigfoot”. This weekend, ar... » read more

Navigating The GPU Revolution


Experts at the Table: Semiconductor Engineering sat down to discuss the impact of GPU acceleration on mask design and production, as well as other process technologies, with Aki Fujimura, CEO of D2S; Youping Zhang, head of ASML Brion; Yalin Xiong, senior vice president and general manager of the BBP and reticle products division at KLA; and Kostas Adam, vice president of engineering at Synopsys. What f... » read more

Generative AI On Mobile Is Running On The Arm CPU


By Adnan Al-Sinan and Gian Marco Iodice. 2023 was the year that showcased an impressive number of use cases powered by generative AI. This disruptive form of artificial intelligence (AI) technology is at the heart of OpenAI's ChatGPT and Google’s Gemini AI model, demonstrating the opportunity to simplify work and advance education through generating text, images, or even audio content ... » read more

SoC Telemetry & Performance Analysis Using Statistical Profiling Extension


The Arm Statistical Profiling Extension (SPE) is an architectural feature designed for enhanced instruction execution profiling within Arm CPUs. This feature has been available since the introduction of the Neoverse N1 CPU platform in 2019, alongside the performance monitor units (PMUs) generally available in Arm CPUs. An important step in extracting value from capabilities like SPE and PMUs is th... » read more
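
As a rough sketch of that extraction step, assuming a Linux machine with `perf` installed, the snippet below gathers aggregate PMU counters for a workload. SPE sampling itself is typically collected with `perf record` against the SPE event and analyzed offline, which this example does not attempt.

```python
# Minimal sketch (an assumption, not the article's tooling) of collecting PMU
# counters for a workload with Linux `perf stat`.
import subprocess

def pmu_counts(cmd: list[str], events: str = "cycles,instructions") -> str:
    # `-x,` selects CSV output; perf stat writes the counter lines to stderr.
    result = subprocess.run(
        ["perf", "stat", "-x,", "-e", events, "--"] + cmd,
        capture_output=True,
        text=True,
    )
    return result.stderr

if __name__ == "__main__":
    print(pmu_counts(["sleep", "1"]))
```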

Flipping Processor Design On Its Head


AI is changing processor design in fundamental ways, combining customized processing elements for specific AI workloads with more traditional processors for other tasks. But the tradeoffs are increasingly confusing, complex, and challenging to manage. For example, workloads can change faster than the time it takes to churn out customized designs. In addition, the AI-specific processes may ex... » read more

← Older posts