RISC-V High-Performance Multicore And GPU SoC Platform For Safety-Critical Systems


A new technical paper titled "A RISC-V Multicore and GPU SoC Platform with a Qualifiable Software Stack for Safety Critical Systems" was published by researchers at Universitat Politecnica de Catalunya and Barcelona Supercomputing Center. Abstract "In the context of the Horizon Europe project, METASAT, a hardware platform was developed as a prototype of future space systems. The platform is bas... » read more

Speeding Up Computational Lithography With The Power And Parallelism Of GPUs


There are so many challenges in producing modern semiconductor devices that it’s amazing for the industry to pull it off at all. From the underlying physics to fabrication processes to the development flow, there is no shortage of tough issues to address. Some of the biggest arise in lithography for deep submicron chips. A recent post outlined the major trends in lithography and summarized a ... » read more

HW-Aligned Sparse Attention Architecture For Efficient Long-Context Modeling (DeepSeek et al.)


A new technical paper titled "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" was published by DeepSeek, Peking University and University of Washington. Abstract "Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses significant computational challenges. Sparse attention... » read more
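The teaser above describes sparse attention only in general terms. As a rough, hypothetical illustration of the underlying idea (a per-query top-k variant, not the paper's hardware-aligned, natively trainable design), a sparse attention step can be sketched in NumPy:

```python
import numpy as np

def topk_sparse_attention(q, k, v, topk=4):
    """Toy sparse attention: each query row attends only to its
    top-k highest-scoring keys rather than the full sequence,
    cutting the work per query from the full length down to topk."""
    scores = q @ k.T / np.sqrt(q.shape[-1])                 # (seq, seq) scaled scores
    kth = np.partition(scores, -topk, axis=-1)[:, [-topk]]  # k-th largest score per row
    masked = np.where(scores >= kth, scores, -np.inf)       # drop all but the top-k
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # softmax over surviving keys
    return weights @ v
```

Note that per-element top-k selection like this scatters memory accesses; hardware-aligned designs instead select contiguous blocks of keys sized to match the accelerator's tiles, which is the distinction the paper's title points at.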

Uncore Frequency Scaling For Energy Optimization In Heterogeneous Systems (UIC, Argonne)


A new technical paper titled "Exploring Uncore Frequency Scaling for Heterogeneous Computing" was published by researchers at University of Illinois Chicago and Argonne National Laboratory. Abstract "High-performance computing (HPC) systems are essential for scientific discovery and engineering innovation. However, their growing power demands pose significant challenges, particularly as sys... » read more

Transforming Industrial IoT With Edge AI And AR


The Internet of Things (IoT) has evolved significantly from its early days of centralized cloud processing. Initially, IoT applications relied heavily on cloud-based data processing, where data from various devices was collected, processed, and analyzed in the cloud before insights were sent back to the devices. While effective, this approach has limitations, particularly in environments requir... » read more

The Use Of GPU Compute In Automotive


The pace of innovation in automotive is accelerating. Electrification, advanced driver assistance systems (ADAS) and vehicle connectivity are revolutionizing the in-car experience, which is now largely determined by the capabilities of the car’s software and electronic hardware. When a vehicle can receive software upgrades while it is on the road, the electronic control units (ECUs) that a... » read more

GPUs: Bandit-Based Framework To Dynamically Reduce Energy Consumption


A new technical paper titled "Online Energy Optimization in GPUs: A Multi-Armed Bandit Approach" was published by researchers at Illinois Institute of Technology, Argonne National Lab and Emory University. Abstract "Energy consumption has become a critical design metric and a limiting factor in the development of future computing architectures, from small wearable devices to large-scale lea... » read more
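To make the multi-armed bandit framing concrete: treating each candidate GPU frequency as an arm and negative measured energy as the reward yields an online tuner. The sketch below is a generic epsilon-greedy bandit over a made-up energy model, not the paper's algorithm; `fake_energy` and its optimum are hypothetical.

```python
import random

def epsilon_greedy_frequency_tuner(measure_energy, freqs, steps=200, eps=0.1, seed=0):
    """Epsilon-greedy multi-armed bandit: each arm is a candidate
    frequency; reward is negative measured energy, so the tuner
    drifts toward the most energy-efficient setting online."""
    rng = random.Random(seed)
    counts = [0] * len(freqs)
    avg_reward = [0.0] * len(freqs)
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(len(freqs))                                # explore
        else:
            arm = max(range(len(freqs)), key=lambda i: avg_reward[i])      # exploit
        reward = -measure_energy(freqs[arm])       # lower energy => higher reward
        counts[arm] += 1
        avg_reward[arm] += (reward - avg_reward[arm]) / counts[arm]        # running mean
    return freqs[max(range(len(freqs)), key=lambda i: avg_reward[i])]

# Hypothetical noisy energy model with its optimum at 1200 MHz.
def fake_energy(freq, rng=random.Random(42)):
    return (freq - 1200) ** 2 / 1e5 + rng.gauss(0, 0.1)

best = epsilon_greedy_frequency_tuner(fake_energy, [800, 1000, 1200, 1400, 1600])
```

The appeal of the bandit formulation is exactly what the abstract hints at: it needs no offline power model, only online energy measurements per decision.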

New AI Processor Architectures Balance Speed With Efficiency


Leading AI system designs are migrating away from building the fastest AI processor possible, adopting a more balanced approach that involves highly specialized, heterogeneous compute elements, faster data movement, and significantly lower power. Part of this shift revolves around the adoption of chiplets in 2.5D/3.5D packages, which enable greater customization for different workloads and ... » read more

GPU Microarchitecture Integrating Dedicated Matrix Units At The Cluster Level (UC Berkeley)


A new technical paper titled "Virgo: Cluster-level Matrix Unit Integration in GPUs for Scalability and Energy Efficiency" was published by UC Berkeley. Abstract "Modern GPUs incorporate specialized matrix units such as Tensor Cores to accelerate GEMM operations central to deep learning workloads. However, existing matrix unit designs are tightly coupled to the SIMT core, limiting the size a... » read more
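The excerpt centers on matrix units that accelerate GEMM. As a rough illustration of the tiled access pattern such units exploit (a software sketch only, not Virgo's cluster-level hardware design), a blocked GEMM looks like:

```python
import numpy as np

def tiled_gemm(a, b, tile=4):
    """Blocked matrix multiply: computes C = A @ B one tile at a time.
    Each innermost tile product is the unit of work that a hardware
    matrix engine (e.g. a Tensor Core) executes as one fused operation."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n))
    for i in range(0, m, tile):          # tile rows of C
        for j in range(0, n, tile):      # tile columns of C
            for p in range(0, k, tile):  # accumulate along the inner dimension
                c[i:i+tile, j:j+tile] += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
    return c
```

The tile size is the lever the paper's question turns on: units coupled to a SIMT core are bounded by that core's register and operand bandwidth, which caps how large a tile they can consume per operation.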

Characterizing Three Supercomputers: Multi-GPU Interconnect Performance


A new technical paper titled "Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects" was published by researchers at Sapienza University of Rome, University of Trento, Vrije Universiteit Amsterdam, ETH Zurich, CINECA, University of Antwerp, IBM Research Europe, HPE Cray, and NVIDIA. Abstract "Multi-GPU nodes are increasingly common in the rapidly evolving landscape... » read more
