Agentic AI Is Changing Data Center Architectures


Key Takeaways: The rise of agentic AI is shifting data centers from GPU-centric number crunching to CPU-driven orchestration, where managing long-running reasoning loops and context is just as important as raw compute. Integrating CPUs, GPUs, and stacked memory into tightly coupled multi-die architectures with varying workloads makes it much harder to ensure they will be reliable and ef... » read more

A New Era For Co-Processing


Key Takeaways: There is no single processor capable of executing everything efficiently, meaning that multiple processors are required. Maximum efficiency is gained by minimizing the movement of data. Architects must maximize efficiency for today's workloads, while also adding enough flexibility to handle tomorrow's. New processor architectures are rapidly evolving thanks to... » read more

Systematic Analysis of CPU-Induced Slowdowns in Multi-GPU LLM Inference (Georgia Tech)


A new technical paper, "Characterizing CPU-Induced Slowdowns in Multi-GPU LLM Inference," was published by the Georgia Institute of Technology. Abstract "Large-scale machine learning workloads increasingly rely on multi-GPU systems, yet their performance is often limited by an overlooked component: the CPU. Through a detailed study of modern large language model (LLM) inference and servin... » read more

Changes In Chip Architectures At The Edge


Edge computing is all about low latency, within a tight power budget, and with sufficient performance. This is very different from an AI data center, where the real focus is on data throughput between processor and memory. Achieving those goals requires a focus on what different processing elements bring to the table. Nigel Drego, co-founder and CTO of Quadric, talks about how these different c... » read more

Ultra-low-bit LLM Inference Allows AI-PC CPUs And Discrete Client GPUs To Approach High-end GPU-Level (Intel)


A new technical paper titled "Pushing the Envelope of LLM Inference on AI-PC and Intel GPUs" was published by researcher at Intel. Abstract "The advent of ultra-low-bit LLM models (1/1.58/2-bit), which match the perplexity and end-task performance of their full-precision counterparts using the same model size, is ushering in a new era of LLM inference for resource-constrained environments... » read more

Getting Real About AI Processors


There’s a lot of confusion and hype around AI. Nearly every service, product or subject area in the technology industry now has an AI label. A lot of this is valid and there’s no doubt that AI is opening up new capabilities and higher productivity across all industries. This white paper categorises AI and related hardware options, with a particular focus on on-device (i.e. edge) AI, givi... » read more

Pooling CPU Memory for LLM Inference With Lower Latency and Higher Throughput (UC Berkeley)


A new technical paper titled "Pie: Pooling CPU Memory for LLM Inference" was published by researchers at UC Berkeley. Abstract "The rapid growth of LLMs has revolutionized natural language processing and AI analysis, but their increasing size and memory demands present significant challenges. A common solution is to spill over to CPU memory; however, traditional GPU-CPU memory swapping ofte... » read more

A Comprehensive Guide to Understanding AI Inference on the CPU


As AI continues to revolutionize industries, new workloads, like generative AI, inspire new use cases, the demand for efficient and scalable AI-based solutions has never been greater. While training often garners attention, inference—the process of applying trained models to new data—is essential for AI workloads, whether they are running in the cloud, or enabling real-world applications at... » read more

Lightweight, High-Performance CPU Extension for Protected Key Handles with CPU-Enforced Usage (CISPA, Ruhr Univ. Bochum)


A new technical paper titled "KeyVisor -- A Lightweight ISA Extension for Protected Key Handles with CPU-enforced Usage Policies" was published by researchers at CISPA Helmholtz Center for Information Security and Ruhr University Bochum. Abstract "The confidentiality of cryptographic keys is essential for the security of protection schemes used for communication, file encryption, and outsou... » read more

New AI Processors Architectures Balance Speed With Efficiency


Leading AI systems designs are migrating away from building the fastest AI processor possible, adopting a more balanced approach that involves highly specialized, heterogeneous compute elements, faster data movement, and significantly lower power. Part of this shift revolves around the adoption of chiplets in 2.5D/3.5D packages, which enable greater customization for different workloads and ... » read more

← Older posts