Overview of NMAX Neural Inferencing

At Hot Chips 2018, Microsoft's Brainwave presentation made the case that the ideal is high hardware utilization at low batch size. Existing architectures don't achieve this: they reach high utilization only at large batch sizes, which means high latency. NMAX's architecture loads weights quickly, achieving nearly the same high utilization at batch=1 as at large batch sizes... » read more
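The trade-off the teaser describes can be sketched with a toy model (all numbers below are illustrative assumptions, not NMAX or Brainwave figures): if reloading a layer's weights costs a fixed time `t_load` while the compute for one sample costs `t_compute`, then utilization at batch size `B` is roughly `B * t_compute / (t_load + B * t_compute)`. Slow weight loading forces large batches (and thus high latency) to amortize the load time; fast loading keeps utilization high even at batch=1.

```python
def utilization(batch, t_compute=1.0, t_load=50.0):
    """Toy model: fraction of time the MAC array does useful work,
    assuming one weight load per layer pass costing t_load and
    per-sample compute costing t_compute (arbitrary units)."""
    return batch * t_compute / (t_load + batch * t_compute)

# Compare a slow-loading accelerator (t_load=50) with a
# fast-loading one (t_load=0.5) across batch sizes.
for b in (1, 8, 64, 512):
    print(f"batch={b:4d}  slow-load util={utilization(b):6.1%}  "
          f"fast-load util={utilization(b, t_load=0.5):6.1%}")
```

With these made-up constants, the slow loader only approaches full utilization at batch sizes in the hundreds, while the fast loader is already above 60% at batch=1, which is the behavior the slide argues for.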

AI Chip Architectures Race To The Edge

As machine-learning apps start showing up in endpoint devices and along the network edge of the IoT, the accelerators that make AI possible may look more like FPGA and SoC modules than the current data-center-bound chips from Intel or Nvidia. Artificial intelligence and machine learning need powerful chips both for learning from large data sets (training) and for computing answers (inference). Most AI chips—both tr... » read more