Novel NorthPole Architecture Enables Low-Latency, High-Energy-Efficiency LLM inference (IBM Research)


A new technical paper titled "Breakthrough low-latency, high-energy-efficiency LLM inference performance using NorthPole" was published by researchers at IBM Research. At the IEEE High Performance Extreme Computing (HPEC) Virtual Conference in September 2024, new performance results for their AIU NorthPole AI inference accelerator chip were presented on a 3-billion-parameter Granite LLM. ... » read more

Designing AI Hardware To Deal With Increasingly Challenging Memory Wall (UC Berkeley)


A new technical paper titled "AI and Memory Wall" was published by researchers at UC Berkeley, ICSI, and LBNL. Abstract "The availability of unprecedented unsupervised training data, along with neural scaling laws, has resulted in an unprecedented surge in model size and compute requirements for serving/training LLMs. However, the main performance bottleneck is increasingly shifting to memo... » read more

Research Bits: Dec. 18


Stacking 2D layers for AI processing Researchers from Washington University in St. Louis, MIT, Yonsei University, Inha University, Georgia Institute of Technology, and the University of Notre Dame demonstrated monolithic 3D integration of layered 2D material, creating a novel AI processing hardware that integrates sensing, signal processing, and AI computing functions into a single chip. Th... » read more

Performance Of Analog In-Memory Computing On Imaging Problems


A technical paper titled "Accelerating AI Using Next-Generation Hardware: Possibilities and Challenges With Analog In-Memory Computing" was published by researchers at Lund University and Ericsson Research. Abstract "Future generations of computing systems need to continue increasing processing speed and energy efficiency in order to meet the growing workload requirements under stringent en... » read more

Co-Design View of Cross-Bar Based Compute-In-Memory


A new review paper titled "Compute in-Memory with Non-Volatile Elements for Neural Networks: A Review from a Co-Design Perspective" was published by researchers at Argonne National Lab, Purdue University, and Indian Institute of Technology Madras. "With an over-arching co-design viewpoint, this review assesses the use of cross-bar based CIM for neural networks, connecting the material proper... » read more

New Uses For AI In Chips


Artificial intelligence is being deployed across a number of new applications, from improving performance and reducing power in a wide range of end devices to spotting irregularities in data movement for security reasons. While most people are familiar with using machine learning and deep learning to distinguish between cats and dogs, emerging applications show how this capability can be use... » read more

AI At The Edge: Optimizing AI Algorithms Without Sacrificing Accuracy


The ultimate measure of success for AI will be how much it increases productivity in our daily lives. However, the industry has huge challenges in evaluating progress. The vast number of AI applications is in constant churn: finding the right algorithm, optimizing the algorithm, and finding the right tools. In addition, complex hardware engineering is rapidly being updated with many different s... » read more

MIT: Stackable AI Chip With Lego-style Design


New technical paper titled "Reconfigurable heterogeneous integration using stackable chips with embedded artificial intelligence" from researchers at MIT, along with Harvard University, Tsinghua University, Zhejiang University, and others. Partial Abstract: "Here we report stackable hetero-integrated chips that use optoelectronic device arrays for chip-to-chip communication and neuromorphic... » read more

Toward Software-Equivalent Accuracy on Transformer-Based Deep Neural Networks With Analog Memory Devices


Abstract:  "Recent advances in deep learning have been driven by ever-increasing model sizes, with networks growing to millions or even billions of parameters. Such enormous models call for fast and energy-efficient hardware accelerators. We study the potential of Analog AI accelerators based on Non-Volatile Memory, in particular Phase Change Memory (PCM), for software-equivalent accurate i... » read more

Accelerating Inference of Convolutional Neural Networks Using In-memory Computing


Abstract: "In-memory computing (IMC) is a non-von Neumann paradigm that has recently established itself as a promising approach for energy-efficient, high throughput hardware for deep learning applications. One prominent application of IMC is that of performing matrix-vector multiplication in (1) time complexity by mapping the synaptic weights of a neural-network layer to the devices of a... » read more

← Older posts