A new technical paper titled “Benchmarking Ultra-Low-Power μNPUs” was published by researchers at Imperial College London and the University of Cambridge.
Abstract
“Efficient on-device neural network (NN) inference has various advantages over cloud-based processing, including predictable latency, enhanced privacy, greater reliability, and reduced operating costs for vendors. This has sparked the recent rapid development of microcontroller-scale NN accelerators, often referred to as neural processing units (μNPUs), designed specifically for ultra-low-power applications.
In this paper we present the first comparative evaluation of a number of commercially-available μNPUs, as well as the first independent benchmarks for several of these platforms. We develop and open-source a model compilation framework to enable consistent benchmarking of quantized models across diverse μNPU hardware. Our benchmark targets end-to-end performance and includes model inference latency, power consumption, and memory overhead, alongside other factors. The resulting analysis uncovers both expected performance trends as well as surprising disparities between hardware specifications and actual performance, including μNPUs exhibiting unexpected scaling behaviors with increasing model complexity. Our framework provides a foundation for further evaluation of μNPU platforms alongside valuable insights for both hardware designers and software developers in this rapidly evolving space.”
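The abstract does not detail the benchmarking methodology, but end-to-end inference latency measurement of the kind described is typically done by discarding warmup invocations and reporting statistics over repeated timed runs. The harness below is an illustrative sketch of that pattern, not the paper's open-source framework; the dummy workload stands in for a quantized-model invocation (e.g. a TFLite interpreter's `invoke()` on-device, where a hardware cycle counter would replace `time.perf_counter()`).

```python
import statistics
import time

def benchmark_latency(fn, warmup=3, runs=30):
    """Time fn() over `runs` iterations after `warmup` discarded calls.

    Returns (mean_ms, stdev_ms). Warmup runs absorb one-time costs
    such as cache population or lazy initialization.
    """
    for _ in range(warmup):
        fn()
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1e3)
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)

# Dummy workload standing in for a quantized-model inference call.
mean_ms, stdev_ms = benchmark_latency(lambda: sum(i * i for i in range(10_000)))
print(f"latency: {mean_ms:.3f} ms +/- {stdev_ms:.3f} ms")
```

On real μNPU hardware, power consumption would be measured concurrently with an external power monitor, and memory overhead taken from the compiled model's flash/RAM footprint; neither is captured by a host-side timer like this.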
Find the technical paper here. March 2025.
Millar, Josh, Yushan Huang, Sarab Sethi, Hamed Haddadi, and Anil Madhavapeddy. “Benchmarking Ultra-Low-Power μNPUs.” arXiv preprint arXiv:2503.22567 (2025).