A technical paper titled “Efficient Algorithms for Monte Carlo Particle Transport on AI Accelerator Hardware” was published by researchers at Argonne National Laboratory, the University of Chicago, and Cerebras Systems.
“The recent trend toward deep learning has led to the development of a variety of highly innovative AI accelerator architectures. One such architecture, the Cerebras Wafer-Scale Engine 2 (WSE-2), features 40 GB of on-chip SRAM, making it a potentially attractive platform for latency- or bandwidth-bound HPC simulation workloads. In this study, we examine the feasibility of performing continuous energy Monte Carlo (MC) particle transport on the WSE-2 by porting a key kernel from the MC transport algorithm to Cerebras’s CSL programming model. New algorithms for minimizing communication costs and for handling load balancing are developed and tested. The WSE-2 is found to run 130 times faster than a highly optimized CUDA version of the kernel run on an NVIDIA A100 GPU — significantly outpacing the expected performance increase given the difference in transistor counts between the architectures.”
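For context, the kind of kernel described here is typically dominated by irregular, data-dependent memory accesses: each particle lookup requires a search over an energy grid followed by scattered reads of per-nuclide cross-section data. The sketch below is a minimal, hypothetical illustration of such a continuous-energy macroscopic cross-section lookup in CUDA; it is not the authors’ code, and the grid sizes, array layout, and names (`grid_search`, `xs_lookup`, etc.) are illustrative assumptions.

```cuda
// Hypothetical sketch of a continuous-energy macroscopic cross-section
// lookup -- the memory-bound access pattern typical of MC transport
// kernels. All sizes and names are illustrative, not from the paper.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

#define N_GRIDPOINTS 11303   // illustrative unionized energy grid size
#define N_NUCLIDES   68      // illustrative nuclide count for a material
#define N_LOOKUPS    (1 << 20)

// Binary search for the grid interval containing energy e.
__device__ int grid_search(const double* grid, int n, double e) {
    int lo = 0, hi = n - 1;
    while (hi - lo > 1) {
        int mid = (lo + hi) / 2;
        if (e < grid[mid]) hi = mid; else lo = mid;
    }
    return lo;
}

__global__ void xs_lookup(const double* __restrict__ egrid,
                          const double* __restrict__ xs,  // [N_GRIDPOINTS][N_NUCLIDES]
                          const double* __restrict__ energies,
                          double* __restrict__ macro_xs,
                          int n_lookups) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_lookups) return;

    double e = energies[i];
    int idx = grid_search(egrid, N_GRIDPOINTS, e);

    // Interpolation factor between grid points idx and idx + 1.
    double f = (e - egrid[idx]) / (egrid[idx + 1] - egrid[idx]);

    // Each nuclide contributes effectively random strided reads -- the
    // latency/bandwidth-bound pattern that on-chip SRAM can accelerate.
    double total = 0.0;
    for (int n = 0; n < N_NUCLIDES; n++) {
        double lo = xs[idx * N_NUCLIDES + n];
        double hi = xs[(idx + 1) * N_NUCLIDES + n];
        total += lo + f * (hi - lo);
    }
    macro_xs[i] = total;
}

int main() {
    // Synthetic host data, for illustration only.
    size_t grid_bytes = N_GRIDPOINTS * sizeof(double);
    size_t xs_bytes   = (size_t)N_GRIDPOINTS * N_NUCLIDES * sizeof(double);
    size_t lk_bytes   = (size_t)N_LOOKUPS * sizeof(double);

    double* h_egrid = (double*)malloc(grid_bytes);
    double* h_xs    = (double*)malloc(xs_bytes);
    double* h_e     = (double*)malloc(lk_bytes);
    for (int i = 0; i < N_GRIDPOINTS; i++) h_egrid[i] = (double)i / N_GRIDPOINTS;
    for (size_t i = 0; i < (size_t)N_GRIDPOINTS * N_NUCLIDES; i++) h_xs[i] = 1.0;
    for (int i = 0; i < N_LOOKUPS; i++) h_e[i] = 0.999 * rand() / RAND_MAX;

    double *d_egrid, *d_xs, *d_e, *d_out;
    cudaMalloc(&d_egrid, grid_bytes);
    cudaMalloc(&d_xs, xs_bytes);
    cudaMalloc(&d_e, lk_bytes);
    cudaMalloc(&d_out, lk_bytes);
    cudaMemcpy(d_egrid, h_egrid, grid_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_xs, h_xs, xs_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_e, h_e, lk_bytes, cudaMemcpyHostToDevice);

    xs_lookup<<<(N_LOOKUPS + 255) / 256, 256>>>(d_egrid, d_xs, d_e, d_out, N_LOOKUPS);
    cudaDeviceSynchronize();

    double out0;
    cudaMemcpy(&out0, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    printf("macro_xs[0] = %f\n", out0);
    return 0;
}
```

Because each lookup performs a binary search followed by dozens of dependent, effectively random reads, performance on conventional hardware is gated by memory latency and bandwidth rather than arithmetic, which is why the WSE-2’s 40 GB of on-chip SRAM is an attractive fit for this class of workload.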
Find the technical paper here. Published November 2023 (preprint).
Tramm, John, Bryce Allen, Kazutomo Yoshii, Andrew Siegel, and Leighton Wilson. “Efficient Algorithms for Monte Carlo Particle Transport on AI Accelerator Hardware.” arXiv preprint arXiv:2311.01739 (2023).
Related Reading
Partitioning Processors For AI Workloads
General-purpose processing and a lack of flexibility are far from ideal for AI/ML workloads.
Processor Tradeoffs For AI Workloads
Gaps are widening between technology advances and demands, and closing them is becoming more difficult.
Specialization Vs. Generalization In Processors
What will it take to achieve mass customization at the edge, with high performance and low power?