Bandwidth-Effective DRAM Cache For GPUs With SCM


A new technical paper titled "Bandwidth-Effective DRAM Cache for GPUs with Storage-Class Memory" was published by researchers at POSTECH and Songsil University. Abstract "We propose overcoming the memory capacity limitation of GPUs with high-capacity Storage-Class Memory (SCM) and DRAM cache. By significantly increasing the memory capacity with SCM, the GPU can capture a larger fraction o... » read more

LLM Inference on GPUs (Intel)


A technical paper titled “Efficient LLM inference solution on Intel GPU” was published by researchers at Intel Corporation. Abstract: "Transformer based Large Language Models (LLMs) have been widely used in many fields, and the efficiency of LLM inference becomes a hot topic in real applications. However, LLMs usually have complicated model structures with massive operations and... » read more

Analysis Of Accel-Sim GPGPU Simulator And Model Improvements


A technical paper titled “Analyzing and Improving Hardware Modeling of Accel-Sim” was published by researchers at Universitat Politècnica de Catalunya. Abstract: "GPU architectures have become popular for executing general-purpose programs. Their many-core architecture supports a large number of threads that run concurrently to hide the latency among dependent instructions. In modern GPU... » read more

Flipping Processor Design On Its Head


AI is changing processor design in fundamental ways, combining customized processing elements for specific AI workloads with more traditional processors for other tasks. But the tradeoffs are increasingly confusing, complex, and challenging to manage. For example, workloads can change faster than the time it takes to churn out customized designs. In addition, the AI-specific processes may ex... » read more

A Study Of LLMs On Multiple AI Accelerators And GPUs With A Performance Evaluation


A technical paper titled “A Comprehensive Performance Study of Large Language Models on Novel AI Accelerators” was published by researchers at Argonne National Laboratory, State University of New York, and University of Illinois. Abstract: "Artificial intelligence (AI) methods have become critical in scientific applications to help accelerate scientific discovery. Large language models (L... » read more

Scalable And Compact Multi-Bit CAM Designs Using FeFETs


A technical paper titled “SEE-MCAM: Scalable Multi-bit FeFET Content Addressable Memories for Energy Efficient Associative Search” was published by researchers at Zhejiang University, China, Georgia Institute of Technology, University of California Irvine, Rochester Institute of Technology, University of Notre Dame, and Laboratory of Collaborative Sensing and Autonomous Unmanned Systems of ... » read more
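
As background for readers unfamiliar with associative search, a CAM is queried with a data word and returns the matching addresses, the reverse of a RAM lookup. The sketch below is a plain software illustration of exact-match and ternary ("don't care") matching only; it does not model FeFET cells, multi-bit encoding, or the SEE-MCAM design itself.

    # Software illustration of associative search in a CAM: the query is a
    # data word and the result is every address whose stored word matches.
    # This does not model FeFET cells or the SEE-MCAM scheme.
    def cam_search(table, query):
        """Exact-match CAM: addresses whose stored word equals the query."""
        return [addr for addr, word in enumerate(table) if word == query]

    def tcam_search(table, query):
        """Ternary CAM: stored bits may be 'x' (don't care)."""
        def matches(stored, q):
            return all(s in ('x', qb) for s, qb in zip(stored, q))
        return [addr for addr, word in enumerate(table) if matches(word, query)]

    print(cam_search(["0110", "1010", "0000"], "0110"))   # -> [0]
    print(tcam_search(["0110", "1x10", "0000"], "1110"))  # -> [1]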

CXL’s Protection Mechanisms And How They Handle Real-World Security Problems


A technical paper titled “How Flexible is CXL's Memory Protection?: Replacing a sledgehammer with a scalpel” was published by researchers at University of Cambridge. Abstract: "CXL, a new interconnect standard for cache-coherent memory sharing, is becoming a reality - but its security leaves something to be desired. Decentralized capabilities are flexible and resilient against malicious a... » read more

Hyperscale HW Optimized Neural Architecture Search (Google)


A new technical paper titled "Hyperscale Hardware Optimized Neural Architecture Search" was published by researchers at Google, Apple, and Waymo. "This paper introduces the first Hyperscale Hardware Optimized Neural Architecture Search (H2O-NAS) to automatically design accurate and performant machine learning models tailored to the underlying hardware architecture. H2O-NAS consists of three ... » read more

Efficiently Processing Large Recommendation Model (RM) Datasets In A Memory Pool Disaggregated Over CXL (KAIST)


A technical paper titled "Failure Tolerant Training with Persistent Memory Disaggregation over CXL" was published (preprint) by researchers at KAIST and Panmnesia. "TRAININGCXL can efficiently process large-scale recommendation datasets in the pool of disaggregated memory while making training fault tolerant with low overhead," states the paper. Find the technical paper here. or here (IEE... » read more

Vulnerability Of Neural Networks Deployed As Black Boxes On Accelerated HW To Electromagnetic Side Channels


This technical paper titled "Can one hear the shape of a neural network?: Snooping the GPU via Magnetic Side Channel" was presented by researchers at Columbia University, Adobe Research and University of Toronto at the 31st USENIX Security Symposium in August 2022. Abstract: "Neural network applications have become popular in both enterprise and personal settings. Network solutions are tune... » read more
