Gemmini: Open-source, Full-Stack DNN Accelerator Generator (DAC Best Paper)


This technical paper, titled "Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration," was published by researchers at UC Berkeley with a co-author from MIT. The research was partially funded by DARPA and won the DAC 2021 Best Paper award. The paper presents Gemmini, "an open-source, full-stack DNN accelerator generator for DNN workloads, enabling end-to-e... » read more
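
As background, Gemmini-generated accelerators are organized around a systolic array that executes tiled matrix multiplications. Below is a minimal Python sketch of the tiled GEMM pattern such an array accelerates; the tile size and loop order are illustrative assumptions, not Gemmini's actual microarchitecture.

import numpy as np

def tiled_gemm(A, B, tile=16):
    """Tiled matrix multiply, the core kernel a systolic array accelerates.

    `tile` stands in for the accelerator's array dimension (illustrative).
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            for k in range(0, K, tile):
                # Each innermost block is roughly what one pass through
                # the systolic array would compute and accumulate.
                C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
    return C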

Edge-AI Hardware for Extended Reality


New technical paper titled "Memory-Oriented Design-Space Exploration of Edge-AI Hardware for XR Applications" from researchers at Indian Institute of Technology Delhi and Reality Labs Research, Meta. Abstract "Low-Power Edge-AI capabilities are essential for on-device extended reality (XR) applications to support the vision of Metaverse. In this work, we investigate two representative XR w... » read more

HW/SW Co-Design to Configure DNN Models On Energy Harvesting Devices


New technical paper titled "EVE: Environmental Adaptive Neural Network Models for Low-Power Energy Harvesting System" was published by researchers at UT San Antonio, University of Connecticut, and Lehigh University. According to the abstract: "This paper proposes EVE, an automated machine learning (autoML) co-exploration framework to search for desired multi-models with shared weights for... » read more

Novel H2H Mapping Algorithm With Both Computation and Communication Awareness


New research paper "H2H: Heterogeneous Model to Heterogeneous System Mapping with Computation and Communication Awareness" from University of Pittsburgh, Georgia Tech. Abstract: "The complex nature of real-world problems calls for heterogeneity in both machine learning (ML) models and hardware systems. The heterogeneity in ML models comes from multi-sensor perceiving and multi-task lear... » read more

Wavelength Multiplexed Ultralow-Power Photonic Edge Computing


Abstract "Advances in deep neural networks (DNNs) are transforming science and technology. However, the increasing computational demands of the most powerful DNNs limit deployment on low-power devices, such as smartphones and sensors -- and this trend is accelerated by the simultaneous move towards Internet-of-Things (IoT) devices. Numerous efforts are underway to lower power consumption, but ... » read more

Mapping Transformation Enabled High-Performance and Low-Energy Memristor-Based DNNs


Abstract: "When deep neural network (DNN) is extensively utilized for edge AI (Artificial Intelligence), for example, the Internet of things (IoT) and autonomous vehicles, it makes CMOS (Complementary Metal Oxide Semiconductor)-based conventional computers suffer from overly large computing loads. Memristor-based devices are emerging as an option to conduct computing in memory for DNNs to make... » read more

Toward Software-Equivalent Accuracy on Transformer-Based Deep Neural Networks With Analog Memory Devices


Abstract: "Recent advances in deep learning have been driven by ever-increasing model sizes, with networks growing to millions or even billions of parameters. Such enormous models call for fast and energy-efficient hardware accelerators. We study the potential of Analog AI accelerators based on Non-Volatile Memory, in particular Phase Change Memory (PCM), for software-equivalent accurate i... » read more
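
One way to see the core challenge is to simulate how conductance noise on analog weights perturbs a layer's output. The Python sketch below does this with an assumed noise level; it illustrates the problem space, not the paper's technique.

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.5, size=(128, 64))   # trained weights
x = rng.normal(0.0, 1.0, size=128)          # one input activation vector

# Assumed relative noise level for the analog weight programming.
noise_std = 0.03
W_analog = W + rng.normal(0.0, noise_std * np.abs(W).max(), size=W.shape)

y_ref, y_noisy = x @ W, x @ W_analog
print("relative output error:",
      np.linalg.norm(y_noisy - y_ref) / np.linalg.norm(y_ref))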

Enabling Training of Neural Networks on Noisy Hardware


Abstract: "Deep neural networks (DNNs) are typically trained using the conventional stochastic gradient descent (SGD) algorithm. However, SGD performs poorly when applied to train networks on non-ideal analog hardware composed of resistive device arrays with non-symmetric conductance modulation characteristics. Recently we proposed a new algorithm, the Tiki-Taka algorithm, that overcomes t... » read more
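
The failure mode the abstract describes can be reproduced in a few lines: if up and down conductance steps differ in size, even a zero-mean stream of updates drifts the weight. The step sizes below are invented, and the sketch shows only the problem, not the Tiki-Taka remedy.

import numpy as np

# Invented asymmetric step sizes: up pulses move the weight more than
# down pulses, modelling non-symmetric conductance modulation.
UP_STEP, DOWN_STEP = 0.012, 0.008

rng = np.random.default_rng(0)
w = 0.0
for g in rng.choice([-1.0, 1.0], size=10_000):  # zero-mean "gradients"
    w += UP_STEP if g < 0 else -DOWN_STEP       # SGD: step against the gradient
print("weight after zero-mean updates:", w)     # drifts away from 0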

FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator


Abstract: "Recent work demonstrated the promise of using resistive random access memory (ReRAM) as an emerging technology to perform inherently parallel analog domain in-situ matrix-vector multiplication—the intensive and key computation in deep neural networks (DNNs). One key problem is the weights that are signed values. However, in a ReRAM crossbar, weights are stored as conductance of... » read more

REDUCT: Keep It Close, Keep It Cool – Scaling DNN Inference on Multi-Core CPUs with Near-Cache Compute


Abstract—"Deep Neural Networks (DNN) are used in a variety of applications and services. With the evolving nature of DNNs, the race to build optimal hardware (both in datacenter and edge) continues. General purpose multi-core CPUs offer unique attractive advantages for DNN inference at both datacenter [60] and edge [71]. Most of the CPU pipeline design complexity is targeted towards optimizin... » read more
