The Rise Of AI Co-Processors


Figuring out the best kinds of processors to use for different AI workloads is a challenge. AI algorithms are undergoing rapid and frequent changes, and the workloads tied to them can vary by data type, by user, and sometimes because of software/firmware updates. On top of that, AI computations tend to require much higher utilization rates than traditional computing, and that will only become m... » read more

Analog IMC Attention Mechanism For Fast And Energy-Efficient LLMs (FZJ, RWTH Aachen)


A new technical paper titled "Analog in-memory computing attention mechanism for fast and energy-efficient large language models" was published by researchers at Forschungszentrum Jülich and RWTH Aachen. Abstract: "Transformer networks, driven by self-attention, are central to large language models. In generative transformers, self-attention uses cache memory to store token projec... » read more

Optimizing LLM Training Under GPU Memory Constraints (Argonne, RIT)


A new technical paper titled "MLP-Offload: Multi-Level, Multi-Path Offloading for LLM Pre-training to Break the GPU Memory Wall" was published by researchers at Argonne National Laboratory and Rochester Institute of Technology. Abstract: "Training LLMs larger than the aggregated memory of multiple GPUs is increasingly necessary due to the faster growth of LLM sizes compared to GPU memory. To... » read more

What Do LLMs Want from Hardware


Figure 1: Noam Shazeer, Google Gemini vice president, presented this in his Hot Chips 2025 talk.
Noam Shazeer is Google’s vice president of engineering for Gemini, the company’s LLM competitor to ChatGPT. He spoke recently at Hot Chips on “Predictions for the Next Phase of AI.” He has worked on LLMs for roughly a decade and co-invented the transformer model in 2017. As his slide says, LLMs can take adv... » read more

Power Stabilization To Allow Continued Scaling Of AI Training Workloads (Microsoft, OpenAI, NVIDIA)


A new technical paper titled "Power Stabilization for AI Training Datacenters" was published by researchers at Microsoft, OpenAI, and NVIDIA. Abstract: "Large Artificial Intelligence (AI) training workloads spanning several tens of thousands of GPUs present unique power management challenges. These arise due to the high variability in power consumption during the training. Given the synchron... » read more

LLM Inference: Core Bottlenecks Imposed By Memory, Compute Capacity, Synchronization Overheads (NVIDIA)


A new technical paper titled "Efficient LLM Inference: Bandwidth, Compute, Synchronization, and Capacity are all you need" was published by NVIDIA. Abstract: "This paper presents a limit study of transformer-based large language model (LLM) inference, focusing on the fundamental performance bottlenecks imposed by memory bandwidth, memory capacity, and synchronization overhead in distributed ... » read more

GPU Acceleration Of Rigorous Lithography Simulations


Producing modern semiconductor devices is an immensely challenging process. Successful execution entails advanced process nodes, novel device architectures, new materials, and many fabrication steps. One especially challenging area is lithography, in which light is sent through a photomask, passes through a projection system of lenses and mirrors, and strikes the substrate to create the device ... » read more

Data Center CPU Dominance Is Shifting To AMD And Arm


Fig. 1: Created by ChatGPT from a text prompt.
The data center processor market has seen two major tectonic shifts in the last decade. It used to be that all data center compute was x86, and well more than 90% of that was Intel. GPUs first appeared in the data center in 2016 (Pascal GPU). Now, the majority of computation is done on GPUs. AMD is looking to pass Intel in x86 share, and... » read more

Solving The Mixed Criticality Challenge In Automotive Controllers


Car users want immersive and interactive in-vehicle experiences, especially as autonomous driving technology takes over more of our driving responsibilities. To satisfy this demand, automotive manufacturers are deploying multiple display environments in their cars, from cockpits to heads-up displays and from central infotainment screens to rear-seat entertainment. All of these displays are powe... » read more

AI Pushes High-End Mobile From SoCs To Multi-Die


Advanced packaging is becoming a key differentiator for the high end of the mobile phone market, enabling higher performance, more flexibility, and faster time to market than systems on chip. Monolithic SoCs likely will remain the technology of choice for low-end and midrange mobile devices because of their form factor, proven record, and lower cost. But multi-die assemblies provide more fle... » read more
