Flexible AI-MCU For Fast Inference of Transformer Models At The Ultra-Low-Power Edge (ETH Zurich, U. Bologna)


Researchers from ETH Zurich and University of Bologna have released “CHIMERA: A Flexible and Scalable 3.1 TOPS/W AI-MCU with Transformer Accelerator and 563 Gb/s Shared-L2 Memory Subsystem with QoS Guarantees”. Abstract “We present Chimera, a flexible and scalable Microcontroller Unit (MCU) designed to accelerate real-time inference of rapidly evolving transformer-based models a... » read more

HW-Based Image Generation Using FTJs (SNU, Sungkyunkwan U., SK hynix et al.)


A new technical paper, "CMOS-compatible ferroelectric tunnel junctions integrate stochastic sampling and deterministic computing for image generation," was published by researchers at Seoul National University, Sungkyunkwan University, Hanyang University, Sogang University, and SK Hynix. Abstract "Recent progress in generative modeling has intensified the need for compact, energy-efficien... » read more

Accelerator Architecture For In-Memory Computation of CNN Inferences Using Racetrack Memory


A new technical paper titled "Hardware-software co-exploration with racetrack memory based in-memory computing for CNN inference in embedded systems" was published by researchers at National University of Singapore, A*STAR, Chinese Academy of Sciences, and Hong Kong University of Science and Technology. Abstract "Deep neural networks generate and process large volumes of data, posing challe... » read more

All-In-One Analog AI Accelerator With CMO/HfOx ReRAM Integrated Into The BEOL (IBM Research-Europe)


A new technical paper titled "All-in-One Analog AI Hardware: On-Chip Training and Inference with Conductive-Metal-Oxide/HfOx ReRAM Devices" was published by researchers at IBM Research-Europe. Abstract "Analog in-memory computing is an emerging paradigm designed to efficiently accelerate deep neural network workloads. Recent advancements have focused on either inference or training accelera... » read more

RISC-V High Performance Multicore and GPU SoC Platform For Safety Critical System


A new technical paper titled "A RISC-V Multicore and GPU SoC Platform with a Qualifiable Software Stack for Safety Critical Systems" published by researchers at Universitat Politecnica de Catalunya and Barcelona Supercomputing Center. Abstract "In the context of the Horizon Europe project, METASAT, a hardware platform was developed as a prototype of future space systems. The platform is bas... » read more

Survey of Energy Efficient PIM Processors


A new technical paper titled "Survey of Deep Learning Accelerators for Edge and Emerging Computing" was published by researchers at University of Dayton and the Air Force Research Laboratory. Abstract "The unprecedented progress in artificial intelligence (AI), particularly in deep learning algorithms with ubiquitous internet connected smart devices, has created a high demand for AI compu... » read more

Scheduling Multi-Model AI Workloads On Heterogeneous MCM Accelerators (UC Irvine)


A technical paper titled “SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators” was published by researchers at University of California Irvine. Abstract: "Emerging multi-model workloads with heavy models like recent large language models significantly increased the compute and memory demands on hardware. To address such increasing demands, designin... » read more

New Ways To Optimize GEMM-Based Applications Targeting Two Leading AI-Optimized FPGA Architectures


A technical paper titled “Efficient Approaches for GEMM Acceleration on Leading AI-Optimized FPGAs” was published by researchers at The University of Texas at Austin and Arizona State University. Abstract: "FPGAs are a promising platform for accelerating Deep Learning (DL) applications, due to their high performance, low power consumption, and reconfigurability. Recently, the leading FPGA... » read more

Embrace The New!


The ResNet family of machine learning algorithms was introduced to the AI world in 2015. A slew of variations was rapidly discovered that at the time pushed the accuracy of ResNets close to the 80% threshold (78.57% Top 1 accuracy for ResNet-152 on ImageNet). This state-of-the-art performance at the time, coupled with the rather simple operator structure that was readily amenable to hardware ac... » read more

Hardware-Based Methodology To Protect AI Accelerators


A technical paper titled “A Unified Hardware-based Threat Detector for AI Accelerators” was published by researchers at Nanyang Technological University and Tsinghua University. Abstract: "The proliferation of AI technology gives rise to a variety of security threats, which significantly compromise the confidentiality and integrity of AI models and applications. Existing software-based so... » read more

← Older posts