GPUs: Bandit Based Framework To Dynamically Reduce Energy Consumption


A new technical paper titled "Online Energy Optimization in GPUs: A Multi-Armed Bandit Approach" was published by researchers at Illinois Institute of Technology, Argonne National Lab and Emory University. Abstract "Energy consumption has become a critical design metric and a limiting factor in the development of future computing architectures, from small wearable devices to large-scale lea... » read more

New AI Processors Architectures Balance Speed With Efficiency


Leading AI systems designs are migrating away from building the fastest AI processor possible, adopting a more balanced approach that involves highly specialized, heterogeneous compute elements, faster data movement, and significantly lower power. Part of this shift revolves around the adoption of chiplets in 2.5D/3.5D packages, which enable greater customization for different workloads and ... » read more

GPU Microarchitecture Integrating Dedicated Matrix Units At The Cluster Level (UC Berkeley)


A new technical paper titled "Virgo: Cluster-level Matrix Unit Integration in GPUs for Scalability and Energy Efficiency" was published by UC Berkeley. Abstract "Modern GPUs incorporate specialized matrix units such as Tensor Cores to accelerate GEMM operations central to deep learning workloads. However, existing matrix unit designs are tightly coupled to the SIMT core, limiting the size a... » read more

Characterizing Three Supercomputers: Multi-GPU Interconnect Performance


A new technical paper titled "Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects" was published by researchers at Sapienza University of Rome, University of Trento, Vrije Universiteit Amsterdam, ETH Zurich, CINECA, University of Antwerp, IBM Research Europe, HPE Cray, and NVIDIA. Abstract "Multi-GPU nodes are increasingly common in the rapidly evolving landscape... » read more

Fantastical Creatures


In my day job I work in the High-Level Synthesis group at Siemens EDA, specifically focusing on algorithm acceleration. But on the weekends, sometimes, I take on the role of amateur cryptozoologist. As many of you know, the main Siemens EDA campus sits in the shadow of Mt. Hood and the Cascade Mountain range. This is prime habitat for Sasquatch, also known as “Bigfoot”. This weekend, ar... » read more

Speed AND Accuracy: First-Of-Its-Kind Broad-Spectrum CFD Solver Built Natively on GPUs


Since the advent of computational methods to solve physics problems, especially in the realm of fluid dynamics, scientists and engineers have had to balance the need for accurate simulations with faster times to solution — with available computing resources affecting this balance. Now, we introduce to you a new broad-spectrum native GPU solver created by developers at Ansys. Find more i... » read more

Navigating The GPU Revolution


Experts at the Table: Semiconductor Engineering sat down to discuss the impact of GPU acceleration on mask design and production and other process technologies, with Aki Fujimura, CEO of D2S; Youping Zhang, head of ASML Brion; Yalin Xiong, senior vice president and general manager of the BBP and reticle products division at KLA; and Kostas Adam, vice president of engineering at Synopsys. What f... » read more

DRAM Cache for GPUs With SCM And High Bandwidth


A new technical paper titled "Bandwidth-Effective DRAM Cache for GPUs with Storage-Class Memory" was published by researchers at POSTECH and Songsil University. Abstract "We propose overcoming the memory capacity limitation of GPUs with high-capacity Storage-Class Memory (SCM) and DRAM cache. By significantly increasing the memory capacity with SCM, the GPU can capture a larger fraction o... » read more

LLM Inference on GPUs (Intel)


A technical paper titled “Efficient LLM inference solution on Intel GPU” was published by researchers at Intel Corporation. Abstract: "Transformer based Large Language Models (LLMs) have been widely used in many fields, and the efficiency of LLM inference becomes hot topic in real applications. However, LLMs are usually complicatedly designed in model structure with massive operations and... » read more

Analysis Of Accel-Sim GPGPU Simulator And Model Improvements


A technical paper titled “Analyzing and Improving Hardware Modeling of Accel-Sim” was published by researchers at Universitat Politècnica de Catalunya. Abstract: "GPU architectures have become popular for executing general-purpose programs. Their many-core architecture supports a large number of threads that run concurrently to hide the latency among dependent instructions. In modern GPU... » read more

← Older posts