
Novel NorthPole Architecture Enables Low-Latency, High-Energy-Efficiency LLM Inference (IBM Research)

A new technical paper titled “Breakthrough low-latency, high-energy-efficiency LLM inference performance using NorthPole” was published by researchers at IBM Research.

At the IEEE High Performance Extreme Computing (HPEC) Virtual Conference in September 2024, the researchers presented new performance results for IBM's AIU NorthPole AI inference accelerator chip, running a 3-billion-parameter Granite LLM.

A research summary can be found here, and a link to the paper is here.

 

Authors

Rathinakumar Appuswamy†, Michael V. Debole†, Brian Taba, Steven K. Esser, Andrew S. Cassidy, Arnon Amir, Alexander Andreopoulos, Deepika Bablani, Pallab Datta, Jeffrey A. Kusnitz, Nathaniel J. McClatchey, Neil McGlohon, Jeffrey L. McKinstry, Tapan K. Nayak, Daniel F. Smith, Rafael Sousa, Ignacio Terrizzano, Filipp Akopyan, Peter J. Carlson, Rajamohan Gandhasri, Guillaume J. Garreau, Nelson M. Gonzalez, Megumi Ito, Jennifer L. Klamo, Yutaka Nakamura, Carlos Ortega Otero, William P. Risk, Jun Sawada, Kai Schleupen, Jay Sivagnaname, Matthew Stallone, Takanori Ueda, Myron D. Flickner, John V. Arthur, Rameswar Panda, David D. Cox, Dharmendra S. Modha.

IBM Research, *[email protected], †Contributed equally.


