NN-Baton: DNN Workload Orchestration & Chiplet Granularity Exploration for Multichip Accelerators

“Abstract—The revolution of machine learning poses an unprecedented demand for computation resources, urging more transistors on a single monolithic chip, which is not sustainable in the Post-Moore era. The multichip integration with small functional dies, called chiplets, can reduce the manufacturing cost, improve the fabrication yield, and achieve die-level reuse for different system scales. DNN workload mapping and hardware design space exploration on such multichip systems are critical, but missing in the current stage. This work provides a hierarchical and analytical framework to describe the DNN mapping on a multichip accelerator and analyze the communication overhead. Based on this framework, we propose an automatic tool called NN-Baton with a pre-design flow and a post-design flow. The pre-design flow aims to guide the chiplet granularity exploration with given area and performance budgets for the target workload. The post-design flow focuses on the workload orchestration on different computation levels – package, chiplet, and core – in the hierarchy. Compared to Simba, NN-Baton generates mapping strategies that save 22.5%∼44% energy under the same computation and memory configurations. The architecture exploration demonstrates that area is a decisive factor for the chiplet granularity. For a 2048-MAC system under a 2 mm² chiplet area constraint, the 4-chiplet implementation with 4 cores and 16 lanes of 8-size vector-MAC is always the top-pick computation allocation across several benchmarks. In contrast, the optimal memory allocation policy in the hierarchy typically depends on the neural network models.”
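The computation allocation quoted in the abstract can be checked with simple arithmetic. The sketch below assumes the figures are read as 4 chiplets, 4 cores per chiplet, 16 lanes per core, and 8 MACs per vector lane; that reading, the function name `allocations`, and the power-of-two candidate sizes are illustrative assumptions, not part of NN-Baton itself.

```python
from itertools import product

TOTAL_MACS = 2048  # system size quoted in the abstract


def allocations(total, choices=(1, 2, 4, 8, 16, 32)):
    """Yield (chiplets, cores, lanes, vector_size) tuples whose product is `total`.

    `choices` is a hypothetical set of candidate sizes for each level.
    """
    for chiplets, cores, lanes, vec in product(choices, repeat=4):
        if chiplets * cores * lanes * vec == total:
            yield chiplets, cores, lanes, vec


# Configuration reported as the top pick: 4 chiplets x 4 cores x 16 lanes x 8-size vector-MAC.
top_pick = (4, 4, 16, 8)
assert top_pick[0] * top_pick[1] * top_pick[2] * top_pick[3] == TOTAL_MACS

candidates = list(allocations(TOTAL_MACS))
print(top_pick in candidates)  # the top pick is one of many same-size allocations
```

Every tuple in `candidates` has the same MAC count, so choosing among them depends on other factors such as the area budget and communication overhead that the abstract describes. This sketch does not model either.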

Source/Authors: Zhanhong Tan (Tsinghua University), Hongyu Cai (Tsinghua University), Runpei Dong (Xi’an Jiaotong University), Kaisheng Ma (Tsinghua University).

Find technical paper here.

Technical paper presented at the 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).