Toward Software-Equivalent Accuracy on Transformer-Based Deep Neural Networks With Analog Memory Devices

Abstract:  "Recent advances in deep learning have been driven by ever-increasing model sizes, with networks growing to millions or even billions of parameters. Such enormous models call for fast and energy-efficient hardware accelerators. We study the potential of Analog AI accelerators based on Non-Volatile Memory, in particular Phase Change Memory (PCM), for software-equivalent accurate i... » read more

FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator

Abstract: "Recent work demonstrated the promise of using resistive random access memory (ReRAM) as an emerging technology to perform inherently parallel analog domain in-situ matrix-vector multiplication—the intensive and key computation in deep neural networks (DNNs). One key problem is the weights that are signed values. However, in a ReRAM crossbar, weights are stored as conductance of... » read more

REDUCT: Keep It Close, Keep It Cool – Scaling DNN Inference on Multi-Core CPUs with Near-Cache Compute

Abstract—"Deep Neural Networks (DNN) are used in a variety of applications and services. With the evolving nature of DNNs, the race to build optimal hardware (both in datacenter and edge) continues. General purpose multi-core CPUs offer unique attractive advantages for DNN inference at both datacenter [60] and edge [71]. Most of the CPU pipeline design complexity is targeted towards optimizin... » read more

RaPiD: AI Accelerator for Ultra-low Precision Training and Inference

Abstract—"The growing prevalence and computational demands of Artificial Intelligence (AI) workloads has led to widespread use of hardware accelerators in their execution. Scaling the performance of AI accelerators across generations is pivotal to their success in commercial deployments. The intrinsic error-resilient nature of AI workloads present a unique opportunity for performance/energy i... » read more

Ten Lessons From Three Generations Shaped Google’s TPUv4i

Source: Norman P. Jouppi, Doe Hyun Yoon, Matthew Ashcraft, Mark Gottscho, Thomas B. Jablin, George Kurian, James Laudon, Sheng Li, Peter Ma, Xiaoyu Ma, Nishant Patil, Sushma Prasad, Clifford Young, Zongwei Zhou (Google); David Patterson (Google / Berkeley) Find technical paper here. 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA) Abstract–"Google de... » read more

Developers Turn To Analog For Neural Nets

Machine-learning (ML) solutions are proliferating across a wide variety of industries, but the overwhelming majority of the commercial implementations still rely on digital logic for their solution. With the exception of in-memory computing, analog solutions mostly have been restricted to universities and attempts at neuromorphic computing. However, that’s starting to change. “Everyon... » read more

Improving the Performance Of Deep Neural Networks

Source: North Carolina State University. Authors: Xilai Li, Wei Sun, and Tianfu Wu Abstract: "In state-of-the-art deep neural networks, both feature normalization and feature attention have become ubiquitous. They are usually studied as separate modules, however. In this paper, we propose a light-weight integration between the two schema and present Attentive Normalization (AN). Instead of l... » read more

Power/Performance Bits: Nov. 17

NVMe controller for research Researchers at the Korea Advanced Institute of Science and Technology (KAIST) developed a non-volatile memory express (NVMe) controller for storage devices and made it freely available to universities and research institutions in a bid to reduce research costs. Poor accessibility of NVMe controller IP is hampering academic and industrial research, the team argue... » read more

System Bits: June 18

Another win for aUToronto Photo credit: University of Toronto The University of Toronto’s student-led self-driving car team racked up its second consecutive victory last month at the annual AutoDrive Challenge in Ann Arbor, Mich. The three-year challenge goes out to North American universities, offering a Chevrolet Bolt electric vehicle to outfit with autonomous driving technology.... » read more

Bridging Machine Learning’s Divide

There is a growing divide between those researching [getkc id="305" comment="machine learning"] (ML) in the cloud and those trying to perform inferencing using limited resources and power budgets. Researchers are using the most cost-effective hardware available to them, which happens to be GPUs filled with floating point arithmetic units. But this is an untenable solution for embedded infere... » read more

← Older posts Newer posts →