Novel NorthPole Architecture Enables Low-Latency, High-Energy-Efficiency LLM Inference (IBM Research)


A new technical paper titled "Breakthrough low-latency, high-energy-efficiency LLM inference performance using NorthPole" was published by researchers at IBM Research. At the IEEE High Performance Extreme Computing (HPEC) Virtual Conference in September 2024, the researchers presented new performance results for their AIU NorthPole AI inference accelerator chip on a 3-billion-parameter Granite LLM. ...

IBM’s Energy-Efficient NorthPole AI Unit


At this point it is well known that, from an energy efficiency standpoint, the biggest bang for the buck is to be found at the highest levels of abstraction. Fitting the right architecture to the task at hand, i.e., an application-specific architecture, will lead to benefits that are hard or impossible to claw back later in the design and implementation flow. With the huge increase in the inter...

Scalable, Shared-L1-Memory Manycore RISC-V System


A new technical paper titled "MemPool: A Scalable Manycore Architecture with a Low-Latency Shared L1 Memory" was published by researchers at ETH Zurich and the University of Bologna. Abstract: "Shared L1 memory clusters are a common architectural pattern (e.g., in GPGPUs) for building efficient and flexible multi-processing-element (PE) engines. However, it is a common belief that these tightly...
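To make the shared-L1 idea concrete, below is a minimal Python sketch (not from the MemPool paper; the PE and bank counts are arbitrary assumptions) that models several PEs each issuing one word access per cycle to a word-interleaved, banked L1 and counts how many requests stall on bank conflicts, the kind of contention behind the scalability concern the abstract raises.

# Hypothetical sketch of a shared, word-interleaved, banked L1 serving many PEs.
import random

NUM_PES = 16        # processing elements sharing the L1 (illustrative size)
NUM_BANKS = 64      # word-interleaved L1 banks (illustrative size)
CYCLES = 10_000

def bank_of(addr: int) -> int:
    """Word-interleaved mapping: consecutive words land in consecutive banks."""
    return (addr // 4) % NUM_BANKS

conflicts = 0
for _ in range(CYCLES):
    # Each PE issues one random word access this cycle.
    requests = [bank_of(random.randrange(1 << 20)) for _ in range(NUM_PES)]
    # A bank serves one request per cycle; extra requests to the same bank stall.
    conflicts += NUM_PES - len(set(requests))

print(f"average stalled requests per cycle: {conflicts / CYCLES:.2f}")

Scaling NUM_PES up while holding NUM_BANKS fixed in this toy model quickly raises the stall rate, which is why bank count and interconnect design matter when such clusters grow.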

HW Accelerator Architecture for Low-Latency, Energy-Efficient MI Computation (MIT)


A new technical paper titled "Efficient Computation of Map-scale Continuous Mutual Information on Chip in Real Time" was published by researchers at MIT. Find the technical paper here. "In this paper, we introduce a new hardware accelerator architecture for MI computation that features a low-latency, energy-efficient MI compute core and an optimized memory subsystem that provides sufficie...
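For context, the mutual information between two discrete variables is I(X;Y) = sum over x,y of p(x,y) * log2( p(x,y) / (p(x) p(y)) ). The short Python sketch below evaluates it for a toy joint distribution; it is only a generic illustration of the quantity an MI compute core evaluates, not the paper's map-scale continuous-MI algorithm, and the distribution values are made up.

# Generic mutual-information example for a small discrete joint distribution.
import numpy as np

# Hypothetical joint distribution p(x, y) over two binary variables.
p_xy = np.array([[0.20, 0.05],
                 [0.10, 0.65]])

p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)

# I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) )
mi = float(np.sum(p_xy * np.log2(p_xy / (p_x * p_y))))
print(f"I(X;Y) = {mi:.4f} bits")

Evaluating this sum once is cheap; the cost the accelerator targets comes from repeating such evaluations across an entire map at sensor rates.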

Memory and Energy-Efficient Batch Normalization Hardware


A new technical paper titled "LightNorm: Area and Energy-Efficient Batch Normalization Hardware for On-Device DNN Training" was published by researchers at DGIST (Daegu Gyeongbuk Institute of Science and Technology). The work was supported by Samsung Research Funding Incubation Center. Abstract: "When training early-stage deep neural networks (DNNs), generating intermediate features via con...
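As background, the sketch below shows the standard batch-normalization forward pass in NumPy, the per-channel normalize, scale, and shift applied to intermediate features that on-device training hardware of this kind must compute; the function name, shapes, and values are illustrative assumptions, not the LightNorm design.

# Minimal batch-normalization forward pass over a toy batch of features.
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a batch of features per channel, then scale and shift.

    x: (batch, channels) intermediate features produced during training.
    gamma, beta: learnable per-channel scale and shift parameters.
    """
    mean = x.mean(axis=0)                    # per-channel batch mean
    var = x.var(axis=0)                      # per-channel batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalize
    return gamma * x_hat + beta              # scale and shift

x = np.random.randn(8, 4).astype(np.float32)   # toy batch of intermediate features
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))       # roughly 0 mean, unit std per channel

The batch statistics require all intermediate features of a layer to be available, which is why storing and moving those features dominates the memory and energy cost that batch-normalization hardware for on-device training tries to reduce.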