Analysis Of Accel-Sim GPGPU Simulator And Model Improvements


A technical paper titled “Analyzing and Improving Hardware Modeling of Accel-Sim” was published by researchers at Universitat Politècnica de Catalunya. Abstract: "GPU architectures have become popular for executing general-purpose programs. Their many-core architecture supports a large number of threads that run concurrently to hide the latency among dependent instructions. In modern GPU...

Flipping Processor Design On Its Head


AI is changing processor design in fundamental ways, combining customized processing elements for specific AI workloads with more traditional processors for other tasks. But the tradeoffs are increasingly confusing, complex, and challenging to manage. For example, workloads can change faster than the time it takes to churn out customized designs. In addition, the AI-specific processes may ex...

A Study Of LLMs On Multiple AI Accelerators And GPUs With A Performance Evaluation


A technical paper titled “A Comprehensive Performance Study of Large Language Models on Novel AI Accelerators” was published by researchers at Argonne National Laboratory, State University of New York, and University of Illinois. Abstract: "Artificial intelligence (AI) methods have become critical in scientific applications to help accelerate scientific discovery. Large language models (L...

Scalable And Compact Multi-Bit CAM Designs Using FeFETs


A technical paper titled “SEE-MCAM: Scalable Multi-bit FeFET Content Addressable Memories for Energy Efficient Associative Search” was published by researchers at Zhejiang University, China, Georgia Institute of Technology, University of California Irvine, Rochester Institute of Technology, University of Notre Dame, and Laboratory of Collaborative Sensing and Autonomous Unmanned Systems of ...

CXL’s Protection Mechanisms And How They Handle Real-World Security Problems


A technical paper titled “How Flexible is CXL's Memory Protection?: Replacing a sledgehammer with a scalpel” was published by researchers at University of Cambridge. Abstract: "CXL, a new interconnect standard for cache-coherent memory sharing, is becoming a reality - but its security leaves something to be desired. Decentralized capabilities are flexible and resilient against malicious a...

Hyperscale HW Optimized Neural Architecture Search (Google)


A new technical paper titled "Hyperscale Hardware Optimized Neural Architecture Search" was published by researchers at Google, Apple, and Waymo. "This paper introduces the first Hyperscale Hardware Optimized Neural Architecture Search (H2O-NAS) to automatically design accurate and performant machine learning models tailored to the underlying hardware architecture. H2O-NAS consists of three ...

Efficiently Process Large RM Datasets In Underlying Memory Pool, Disaggregated Over CXL (KAIST)


A technical paper titled "Failure Tolerant Training with Persistent Memory Disaggregation over CXL" was published (preprint) by researchers at KAIST and Panmnesia. "TRAININGCXL can efficiently process large-scale recommendation datasets in the pool of disaggregated memory while making training fault tolerant with low overhead," states the paper. Find the technical paper here. or here (IEE...

Vulnerability of Neural Networks Deployed As Black Boxes Across Accelerated HW Through Electromagnetic Side Channels


This technical paper titled "Can one hear the shape of a neural network?: Snooping the GPU via Magnetic Side Channel" was presented by researchers at Columbia University, Adobe Research and University of Toronto at the 31st USENIX Security Symposium in August 2022. Abstract: "Neural network applications have become popular in both enterprise and personal settings. Network solutions are tune...

Techniques For Improving Energy Efficiency of Training/Inference for NLP Applications, Including Power Capping & Energy-Aware Scheduling


This new technical paper titled "Great Power, Great Responsibility: Recommendations for Reducing Energy for Training Language Models" is from researchers at MIT and Northeastern University. Abstract: "The energy requirements of current natural language processing models continue to grow at a rapid, unsustainable pace. Recent works highlighting this problem conclude there is an urgent need ...

Using GPUs to Speed Up DIFT Analysis


Researchers at National University of Singapore and an independent researcher presented a new technical paper titled "FlowMatrix: GPU-Assisted Information-Flow Analysis through Matrix-Based Representation" at the USENIX Security Symposium in Boston in August 2022. Abstract: "Dynamic Information Flow Tracking (DIFT) forms the foundation of a wide range of security and privacy analyses. The ...
