REDUCT: Keep It Close, Keep It Cool – Scaling DNN Inference on Multi-Core CPUs with Near-Cache Compute



Abstract—Deep Neural Networks (DNNs) are used in a variety of applications and services. With the evolving nature of DNNs, the race to build optimal hardware (both in the datacenter and at the edge) continues. General-purpose multi-core CPUs offer uniquely attractive advantages for DNN inference at both the datacenter [60] and the edge [71]. Most of the CPU pipeline's design complexity is targeted at optimizing general-purpose single-thread performance, and is overkill for the relatively simpler, but still hugely important, data-parallel DNN inference workloads. Addressing this disparity efficiently can enable both raw performance scaling and overall performance/Watt improvements for multi-core CPU DNN inference.

We present REDUCT, in which we build innovative solutions that bypass traditional CPU resources that impact DNN inference power and limit its performance. Fundamentally, REDUCT's "Keep it close" policy enables consecutive pieces of work to be executed close to each other: REDUCT delivers and decodes instructions close to execution, and executes instructions close to data.
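To see why a general-purpose out-of-order pipeline is overkill here, consider the kind of kernel that dominates DNN inference. The sketch below is a hypothetical illustration (not code from the paper): a dense dot-product inner loop in which the same handful of multiply-accumulate instructions repeats for millions of iterations, so wide fetch/decode and speculation machinery burns power without adding useful work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration of a DNN inference inner loop (a GEMM
 * microkernel reduces to many such dot products). The loop body is a
 * short, fully data-parallel instruction sequence repeated over and
 * over -- the workload profile that motivates executing "close to
 * data" rather than through the full general-purpose pipeline. */
static float dot(const float *a, const float *b, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i]; /* one multiply-accumulate per element */
    return acc;
}
```

Because the instruction stream is this regular, delivering decoded instructions near the execution units and keeping operands in nearby cache levels captures most of the work with far less front-end overhead.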


Find the technical paper here.

Technical paper presented at the 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).


Anant Nori (Intel Labs); Rahul Bera (ETH Zurich); Shankar Balachandran, Joydeep Rakshit, Om J Omer (Intel Labs); Avishaii Abuhatzera, Belliappa Kuttanna (Intel); Sreenivas Subramoney (Intel Labs)
