Vector Runahead


Abstract: "The memory wall places a significant limit on performance for many modern workloads. These applications feature complex chains of dependent, indirect memory accesses, which cannot be picked up by even the most advanced microarchitectural prefetchers. The result is that current out-of-order superscalar processors spend the majority of their time stalled. While it is possible to bui... » read more

Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology


Abstract: "Emerging applications such as deep neural network demand high off-chip memory bandwidth. However, under stringent physical constraints of chip packages and system boards, it becomes very expensive to further increase the bandwidth of off-chip memory. Besides, transferring data across the memory hierarchy constitutes a large fraction of total energy consumption of systems, and the ... » read more

Efficient Multi-GPU Shared Memory via Automatic Optimization of Fine-Grained Transfers


Authors: Harini Muthukrishnan (U of Michigan); David Nellans, Daniel Lustig (NVIDIA); Jeffrey A. Fessler, Thomas Wenisch (U of Michigan). Abstract: "Despite continuing research into inter-GPU communication mechanisms, extracting performance from multiGPU systems remains a significant challenge. Inter-GPU communication via bulk DMA-based transfers exposes data transfer latency on the GPU’s critical... » read more

PF-DRAM: A Precharge-Free DRAM Structure


Authors: Nezam Rohbani † (IPM); Sina Darabi § (Sharif); Hamid Sarbazi-Azad †§ (Sharif / IPM). † School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran; § Department of Computer Engineering, Sharif University of Technology, Tehran, Iran. Abstract: "Although DRAM capacity and bandwidth have increased sharply by the advances in technology ... » read more

A Novel PUF Using Stochastic Short-Term Memory Time of Oxide-Based RRAM for Embedded Applications


Abstract: "RRAM suffers from poor retention with short-term memory time when using low compliance current for programing. However, the short-term memory time exhibits ideal randomness, which can be exploited as an entropy source for physically unclonable function (PUF). In this work, we demonstrated a novel PUF utilizing the stochastic short-term memory time of oxide-based RRAM. The proposed P... » read more

A Novel Complementary Architecture of One-time-programmable Memory and Its Applications as Physical Unclonable Function (PUF) and One-time Password


Abstract: "For the first time, we proposed a 2T complementary architecture of one-time-programmable memory (OTP) in a foundry logic CMOS chip. It was then used to realize the PUF (Physical unclonable function), and the combination with the AI technology to provide a one-time password capability. At first, an OTP was developed based on a novel 2T CMOS unit cell. The experimental results show t... » read more

A Machine-Learning-Resistant 3D PUF with 8-layer Stacking Vertical RRAM and 0.014% Bit Error Rate Using In-Cell Stabilization Scheme for IoT Security Applications


Abstract: "In this work, we propose and demonstrate a multi-layer 3-dimensional (3D) vertical RRAM (VRRAM) PUF with in-cell stabilization scheme to improve both cost efficiency and reliability. An 8-layer VRRAM array was manufactured with excellent uniformity and good endurance of >10^7. Apart from the variation in RRAM resistance, enhanced randomness is obtained thanks to the parasitic IR... » read more

Shared-Write-Channel-Based Device for High-Density Spin-Orbit-Torque Magnetic Random-Access Memory


Abstract: "Spin-orbit-torque (SOT) devices are promising candidates for the future magnetic memory landscape, as they promise high endurance, low read disturbance, and low read error, in comparison with spin-transfer torque devices. However, SOT memories are area intensive due to the requirement for two access transistors per bit. Here, we report a multibit SOT cell that has a single write chan... » read more

Revealing DRAM Operating GuardBands through Workload-Aware Error Predictive Modeling


Abstract: "Improving the energy efficiency of DRAMs becomes very challenging due to the growing demand for storage capacity and failures induced by the manufacturing process. To protect against failures, vendors adopt conservative margins in the refresh period and supply voltage. Previously, it was shown that these margins are too pessimistic and will become impractical due to high ... » read more

Efficient Spin-Orbit Torque Switching with Non-Epitaxial Chalcogenide Heterostructures


Abstract: "The spin–orbit torques (SOTs) generated from topological insulators (TIs) have gained increasing attention in recent years. These TIs, which are typically formed by epitaxially grown chalcogenides, possess extremely high SOT efficiencies and have great potential to be employed in next-generation spintronics devices. However, epitaxy of these chalcogenides is required to ensure the... » read more
