Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology


Abstract: "Emerging applications such as deep neural network demand high off-chip memory bandwidth. However, under stringent physical constraints of chip packages and system boards, it becomes very expensive to further increase the bandwidth of off-chip memory. Besides, transferring data across the memory hierarchy constitutes a large fraction of total energy consumption of systems, and the ... » read more

IChannels: Exploiting Current Management Mechanisms to Create Covert Channels in Modern Processors


Find technical paper link here. Abstract: "To operate efficiently across a wide range of workloads with varying power requirements, a modern processor applies different current management mechanisms, which briefly throttle instruction execution while they adjust voltage and frequency to accommodate for power-hungry instructions (PHIs) in the instruction stream. Doing so 1) reduces the pow... » read more

SpZip: Architectural Support for Effective Data Compression In Irregular Applications


Technical paper link is here. Published in: 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA) Yifan Yang (MIT); Joel Emer (MIT / NVIDIA); Daniel Sanchez (MIT) Abstract: "Irregular applications, such as graph analytics and sparse linear algebra, exhibit frequent indirect, data-dependent accesses to single or short sequences of elements that cause high ma... » read more

Communication Algorithm-Architecture Co-Design for Distributed Deep Learning


"Abstract—Large-scale distributed deep learning training has enabled developments of more complex deep neural network models to learn from larger datasets for sophisticated tasks. In particular, distributed stochastic gradient descent intensively invokes all-reduce operations for gradient update, which dominates communication time during iterative training epochs. In this work, we identify th... » read more

Ten Lessons From Three Generations Shaped Google’s TPUv4i


Source: Norman P. Jouppi, Doe Hyun Yoon, Matthew Ashcraft, Mark Gottscho, Thomas B. Jablin, George Kurian, James Laudon, Sheng Li, Peter Ma, Xiaoyu Ma, Nishant Patil, Sushma Prasad, Clifford Young, Zongwei Zhou (Google); David Patterson (Google / Berkeley) Find technical paper here. 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA) Abstract–"Google de... » read more

HyperRec: Efficient Recommender Systems with Hyperdimensional Computing


A group of researchers are taking a different approach to AI. The University of California at San Diego, the University of California at Irvine, San Diego State University and DGIST recently presented a paper on a new hardware algorithm based on hyperdimensional (HD) computing, which is a brain-inspired computing model. The new algorithm, called HyperRec, uses data that is modeled with bina... » read more

Enabling Efficient and Flexible FPGA Virtualization for Deep Learning in the Cloud


SOURCE: Shulin Zeng, Guohao Dai, Hanbo Sun, Kai Zhong, Guangjun Ge, Kaiyuan Guo, Yu Wang, Huazhong Yang(Tsinghua University, Beijing, China).  Published on arXiv:2003.12101 [cs.DC])   ABSTRACT: "FPGAs have shown great potential in providing low-latency and energy-efficient solutions for deep neural network (DNN) inference applications. Currently, the majority of FPGA-based DNN accel... » read more

Plasticine: A Reconfigurable Architecture For Parallel Patterns (Stanford)


Source: Stanford University Stanford University has been developing Plasticine, which allows parallel patterns to be reconfigured. "ABSTRACT Reconfigurable architectures have gained popularity in recent years as they allow the design of energy-efficient accelerators. Fine-grain fabrics (e.g. FPGAs) have traditionally suffered from performance and power inefficiencies due to bit-level ... » read more

Understanding the Interactions of Workloads and DRAM Types: A Comprehensive Experimental Study


Abstract "It has become increasingly difficult to understand the complex interaction between modern applications and main memory, composed of DRAM chips. Manufacturers are now selling and proposing many different types of DRAM, with each DRAM type catering to different needs (e.g., high throughput, low power, high memory density). At the same time, the memory access patterns of prevalent and... » read more

Silicon CMOS Architecture For A Spin-based Quantum Computer


Source: UNSW Sydney Authors: M. Veldhorst (1,2),  H.G.J. Eenink (2,3) , C.H. Yang (2), and A.S. Dzurak (2) 1 Qutech, TU Delft, The Netherlands 2 Centre for Quantum Computation and Communication Technology, School of Electrical Engineering and Telecommunications,UNSW, Sydney, Australia 3 NanoElectronics Group, MESA+ Institute for Nanotechnology,University of Twente, The Netherlands Te... » read more

← Older posts Newer posts →