Overview of NMAX Neural Inferencing

Neural inferencing chip NMAX uses less DRAM without performance loss.


At Hot Chips 2018, Microsoft presented the attached slide in its Brainwave presentation: the ideal is to achieve high hardware utilization at low batch size. Existing architectures don’t do this: they reach high utilization only at large batch sizes, which means high latency. NMAX’s architecture loads weights quickly, achieving almost the same high utilization at batch=1 as at large batch sizes. This makes it well suited to edge applications, and it means NMAX uses less silicon area to deliver the same throughput.
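To see why weight-loading speed matters at small batch sizes, here is a minimal sketch of a toy model, not NMAX’s or Brainwave’s actual performance model. It assumes a layer’s weights are loaded once per batch and that compute cannot overlap with loading; the function name `utilization` and all of the timing numbers are hypothetical and chosen only for illustration.

```python
def utilization(batch_size, weight_load_time, compute_time_per_sample):
    """Fraction of time the compute units are busy for one layer.

    Toy assumption: weights are loaded once per batch, and no compute
    happens while they load.
    """
    busy = batch_size * compute_time_per_sample
    return busy / (weight_load_time + busy)


# Hypothetical timings in arbitrary units (not measured from any chip).
SLOW_LOAD = 9.0    # weight loading dominates at small batch sizes
FAST_LOAD = 0.25   # weight loading is cheap relative to compute
COMPUTE = 1.0      # compute time per sample

for batch in (1, 4, 32):
    slow = utilization(batch, SLOW_LOAD, COMPUTE)
    fast = utilization(batch, FAST_LOAD, COMPUTE)
    print(f"batch={batch:>2}  slow load: {slow:.2f}  fast load: {fast:.2f}")
```

In this toy model the slow-loading design needs a large batch to amortize the load time, while the fast-loading design is already close to its peak utilization at batch=1.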

Read more here.
