中文 English

Toward Software-Equivalent Accuracy on Transformer-Based Deep Neural Networks With Analog Memory Devices


Abstract:  "Recent advances in deep learning have been driven by ever-increasing model sizes, with networks growing to millions or even billions of parameters. Such enormous models call for fast and energy-efficient hardware accelerators. We study the potential of Analog AI accelerators based on Non-Volatile Memory, in particular Phase Change Memory (PCM), for software-equivalent accurate i... » read more

Enabling Training of Neural Networks on Noisy Hardware


Abstract:  "Deep neural networks (DNNs) are typically trained using the conventional stochastic gradient descent (SGD) algorithm. However, SGD performs poorly when applied to train networks on non-ideal analog hardware composed of resistive device arrays with non-symmetric conductance modulation characteristics. Recently we proposed a new algorithm, the Tiki-Taka algorithm, that overcomes t... » read more

FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator


Abstract: "Recent work demonstrated the promise of using resistive random access memory (ReRAM) as an emerging technology to perform inherently parallel analog domain in-situ matrix-vector multiplication—the intensive and key computation in deep neural networks (DNNs). One key problem is the weights that are signed values. However, in a ReRAM crossbar, weights are stored as conductance of... » read more

REDUCT: Keep It Close, Keep It Cool – Scaling DNN Inference on Multi-Core CPUs with Near-Cache Compute


Abstract—"Deep Neural Networks (DNN) are used in a variety of applications and services. With the evolving nature of DNNs, the race to build optimal hardware (both in datacenter and edge) continues. General purpose multi-core CPUs offer unique attractive advantages for DNN inference at both datacenter [60] and edge [71]. Most of the CPU pipeline design complexity is targeted towards optimizin... » read more

Xilinx AI Engines And Their Applications


This white paper explores the architecture, applications, and benefits of using Xilinx's new AI Engine for compute intensive applications like 5G cellular and machine learning DNN/CNN. 5G requires between five to 10 times higher compute density when compared with prior generations; AI Engines have been optimized for DSP, meeting both the throughput and compute requirements to deliver the hig... » read more

Taming Non-Predictable Systems


How predictable are semiconductor systems? The industry aims to create predictable systems and yet when a carrot is dangled, offering the possibility of faster, cheaper, or some other gain, decision makers invariably decide that some degree of uncertainty is warranted. Understanding uncertainty is at least the first step to making informed decisions, but new tooling is required to assess the im... » read more

Manufacturing Bits: April 23


Sorting nuclei CERN and GSI Darmstadt have begun testing the first of two giant magnets that will serve as part of one of the largest and most complex accelerator facilities in the world. CERN, the European Organization for Nuclear Research, recently obtained two magnets from GSI. The two magnets weigh a total of 27 tons. About 60 more magnets will follow over the next five years. These ... » read more

Deep Learning Hardware: FPGA vs. GPU


FPGAs or GPUs, that is the question. Since the popularity of using machine learning algorithms to extract and process the information from raw data, it has been a race between FPGA and GPU vendors to offer a HW platform that runs computationally intensive machine learning algorithms fast and efficiently. As Deep Learning has driven most of the advanced machine learning applications, it is r... » read more

Machine Learning Shifts More Work to FPGAs, SoCs


A wave of machine-learning-optimized chips is expected to begin shipping in the next few months, but it will take time before data centers decide whether these new accelerators are worth adopting and whether they actually live up to claims of big gains in performance. There are numerous reports that silicon custom-designed for machine learning will deliver 100X the performance of current opt... » read more

High-Performance Memory At Low Cost Per Bit


Hardware developers of deep learning neural networks (DNN) have a universal complaint – they need more and more memory capacity with high performance, low cost and low power. As artificial intelligence (AI) techniques gain wider adoption, their complexity and training requirements also increase. Large and complex DNN models do not fit on the small on-chip SRAM caches near the processor. This ... » read more

← Older posts