Neural inferencing chip NMAX uses less DRAM without performance loss.
At Hot Chips 2018, Microsoft showed the attached slide in its Brainwave presentation: the ideal is to achieve high hardware utilization at low batch size. Existing architectures don’t do this: they reach high utilization only at large batch sizes, which means high latency. NMAX’s architecture loads weights quickly, achieving almost the same high utilization at batch=1 as at large batch sizes. This is perfect for edge applications, and it means that NMAX uses less silicon area to get the same throughput.
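To see why fast weight loading matters at batch=1, here is a toy model in Python. It is a rough sketch, not a description of how NMAX or Brainwave actually works: it assumes weights are loaded once per batch, that loading does not overlap with compute, and that all times are in arbitrary units. The function name and every number are hypothetical, chosen only for illustration.

```python
def utilization(batch_size, compute_time_per_sample, weight_load_time):
    """Fraction of time the compute units are busy.

    Assumption (simplification): weights are loaded once per batch and
    the load is not overlapped with compute.
    """
    compute = batch_size * compute_time_per_sample
    return compute / (compute + weight_load_time)


# Hypothetical load times, in the same arbitrary units as compute time.
SLOW_LOAD = 9.0   # loading weights takes 9x one sample's compute time
FAST_LOAD = 0.1   # loading weights is a small fraction of compute time

for batch in (1, 8, 64):
    slow = utilization(batch, 1.0, SLOW_LOAD)
    fast = utilization(batch, 1.0, FAST_LOAD)
    print(f"batch={batch:3d}  slow load: {slow:.2f}  fast load: {fast:.2f}")
```

Under these assumptions, the slow-loading case needs a large batch to amortize the load time, and a large batch means each result waits longer. The fast-loading case is already near full utilization at batch=1.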
Read more here.