Novel NorthPole Architecture Enables Low-Latency, High-Energy-Efficiency LLM Inference (IBM Research)


A new technical paper titled "Breakthrough low-latency, high-energy-efficiency LLM inference performance using NorthPole" was published by researchers at IBM Research. At the IEEE High Performance Extreme Computing (HPEC) Virtual Conference in September 2024, the team presented new performance results for its AIU NorthPole AI inference accelerator chip running a 3-billion-parameter Granite LLM. ... » read more

IBM’s Energy-Efficient NorthPole AI Unit


At this point it is well known that, from an energy-efficiency standpoint, the biggest bang for the buck is to be found at the highest levels of abstraction. Fitting the right architecture to the task at hand, i.e., an application-specific architecture, yields benefits that are hard or impossible to claw back later in the design and implementation flow. With the huge increase in the inter... » read more