Scalable Chiplet System for LLM Training, Finetuning and Reduced DRAM Accesses (Tsinghua University)


A new technical paper titled "Hecaton: Training and Finetuning Large Language Models with Scalable Chiplet Systems" was published by researchers at Tsinghua University. Abstract "Large Language Models (LLMs) have achieved remarkable success in various fields, but their training and finetuning require massive computation and memory, necessitating parallelism which introduces heavy communicat... » read more

DL Compiler for Efficiently Utilizing Inter-Core Connected AI Chips (UIUC, Microsoft)


A new technical paper titled "Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor" was published by researchers at UIUC and Microsoft Research. Abstract "As AI chips incorporate numerous parallelized cores to scale deep learning (DL) computing, inter-core communication is enabled recently by employing high-bandwidth and low-latency interconnect links on th... » read more

A HW-Based Correct Execution Environment Supporting Virtual Memory (Korea U., KAIST)


A new technical paper titled "A Hardware-Based Correct Execution Environment Supporting Virtual Memory" was published by researchers at Korea University, Korea Advanced Institute of Science and Technology and other universities. Abstract "The rapid increase in data generation has led to outsourcing computation to cloud service providers, allowing clients to handle large tasks without inve... » read more

Data Memory-Dependent Prefetchers Pose SW Security Threat By Breaking Cryptographic Implementations


A technical paper titled "GoFetch: Breaking Constant-Time Cryptographic Implementations Using Data Memory-Dependent Prefetchers" was presented at the August 2024 USENIX Security Symposium by researchers at University of Illinois Urbana-Champaign, University of Texas at Austin, Georgia Institute of Technology, University of California Berkeley, University of Washington, and Carnegie Mellon University…
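
For context, "constant-time" code avoids secret-dependent branches and memory addresses, as in the branchless select below (an illustrative Python rendering; production constant-time code is written in C or assembly). GoFetch's insight is that a data memory-dependent prefetcher can still dereference secret values that merely look like pointers, leaking them through the cache despite this discipline.

MASK64 = (1 << 64) - 1

def ct_select(bit, a, b):
    """Return a if bit == 1 else b, with no secret-dependent branch and
    no secret-dependent memory address."""
    m = (-bit) & MASK64              # all-ones if bit == 1, all-zeros if 0
    return (a & m) | (b & ~m & MASK64)

assert ct_select(1, 0xAAAA, 0x5555) == 0xAAAA
assert ct_select(0, 0xAAAA, 0x5555) == 0x5555
# GoFetch shows the DMP can still dereference the *value* of a or b if it
# resembles a pointer, creating a secret-dependent cache footprint anyway.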

Freeing Up Near-Memory Capacity For Cache Using Compression Techniques In A Flat Hybrid-Memory Architecture


A technical paper titled "HMComp: Extending Near-Memory Capacity using Compression in Hybrid Memory" was published by researchers at Chalmers University of Technology and ZeroPoint Technologies. Abstract: "Hybrid memories, especially combining a first-tier near memory using High-Bandwidth Memory (HBM) and a second-tier far memory using DRAM, can realize a large and low-cost, high-bandwidth…"
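
The capacity arithmetic behind the idea is simple (my own toy accounting, not HMComp's mechanism): whatever near-memory space compression frees can be repurposed as a cache for far-memory traffic without shrinking the flat address space.

def freed_cache_bytes(near_mem_bytes, compression_ratio):
    """HBM bytes freed for caching when resident data shrinks by
    compression_ratio = uncompressed_size / compressed_size (>= 1)."""
    return near_mem_bytes * (1 - 1 / compression_ratio)

# Example: 16 GB of HBM and a 2x compression ratio free up 8 GB that can
# serve as a hardware-managed cache for far-memory (DRAM) accesses.
print(freed_cache_bytes(16e9, 2.0) / 1e9, "GB")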

Electrochemical RAM Cross-Point Arrays For An Analog DL Accelerator


A technical paper titled "Retention-aware zero-shifting technique for Tiki-Taka algorithm-based analog deep learning accelerator" was published by researchers at Pohang University of Science and Technology, Korea University, and Kyungpook National University. "We present the fabrication of 4K-scale electrochemical random-access memory (ECRAM) cross-point arrays for analog neural network…"
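
For readers unfamiliar with Tiki-Taka, the gist (a loose NumPy sketch of the two-array scheme described in the analog-training literature; the array names, transfer interval, and decay factor here are illustrative, and device noise and retention effects are omitted) is that a fast array A accumulates outer-product updates and is periodically transferred into the weight array C:

import numpy as np

def tiki_taka_train(A, C, batches, lr=0.1, transfer_every=10, beta=0.5):
    """Fast array A absorbs rank-1 (outer-product) gradient updates every
    step; periodically a fraction of A is transferred into the weight
    array C, mimicking the algorithm's slow transfer phase."""
    for t, (x, err) in enumerate(batches, start=1):
        A -= lr * np.outer(err, x)   # analog-friendly rank-1 update on A
        if t % transfer_every == 0:
            C += beta * A            # periodic transfer into C
            A *= 1 - beta            # A relaxes after the transfer
    return A, C

A, C = np.zeros((4, 3)), np.zeros((4, 3))
batches = [(np.random.rand(3), np.random.rand(4)) for _ in range(20)]
A, C = tiki_taka_train(A, C, batches)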

Data Filtering Directly Within A NAND Flash Memory Chip


A technical paper titled "Search-in-Memory (SiM): Reliable, Versatile, and Efficient Data Matching in SSD's NAND Flash Memory Chip for Data Indexing Acceleration" was published by researchers at TU Dortmund, Academia Sinica, and National Taiwan University. "This paper introduces the Search-in-Memory (SiM) chip, which demonstrates the feasibility of performing data filtering directly within…"
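
The payoff of in-flash matching is bandwidth: only match results cross the I/O interface instead of whole pages. Below is a host-side analogue of simple exact-match filtering (my illustration; the paper's in-flash matching circuits are far more involved):

def filter_in_memory(page, key):
    """Return indices of matching entries -- with in-flash matching these
    few integers, not the whole page, are what cross the I/O interface."""
    return [i for i, entry in enumerate(page) if entry == key]

page = [b"alice", b"bob", b"carol", b"bob"]
print(filter_in_memory(page, b"bob"))   # [1, 3]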

Secure Low-Cost In-DRAM Trackers For Mitigating Rowhammer (Georgia Tech, Google, Nvidia)


A new technical paper titled "MINT: Securely Mitigating Rowhammer with a Minimalist In-DRAM Tracker" was published by researchers at Georgia Tech, Google, and Nvidia. Abstract "This paper investigates secure low-cost in-DRAM trackers for mitigating Rowhammer (RH). In-DRAM solutions have the advantage that they can solve the RH problem within the DRAM chip, without relying on other parts of ... » read more

MTJ-Based CRAM Array


A new technical paper titled "Experimental demonstration of magnetic tunnel junction-based computational random-access memory" was published by researchers at University of Minnesota and University of Arizona, Tucson. Abstract "The conventional computing paradigm struggles to fulfill the rapidly growing demands from emerging applications, especially those for machine intelligence because ... » read more

Co-optimizing HW Architecture, Memory Footprint, Device Placement And Per-Chip Operator Scheduling (Georgia Tech, Microsoft)


A technical paper titled "Integrated Hardware Architecture and Device Placement Search" was published by researchers at Georgia Institute of Technology and Microsoft Research. Abstract: "Distributed execution of deep learning training involves a dynamic interplay between hardware accelerator architecture and device placement strategy. This is the first work to explore the co-optimization…"
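
A toy rendering of the joint search space (my illustration of the problem setup, not the paper's method): sample architecture parameters and a layer-to-device placement together and keep the pair that minimizes a stand-in cost model.

import random

def toy_cost(arch, placement):
    """Stand-in cost model: compute time shrinks with more cores; each
    pipeline cut between devices adds communication; small memory is
    penalized."""
    cores, mem_gb = arch
    cuts = sum(placement[i] != placement[i + 1] for i in range(len(placement) - 1))
    return 100.0 / cores + 0.5 * cuts + (5.0 if mem_gb < 16 else 0.0)

def joint_search(n_layers=8, n_devices=4, trials=1000, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        arch = (rng.choice([16, 32, 64]), rng.choice([8, 16, 32]))
        placement = sorted(rng.randrange(n_devices) for _ in range(n_layers))
        cost = toy_cost(arch, placement)
        if best is None or cost < best[0]:
            best = (cost, arch, placement)
    return best

print(joint_search())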
