A technical paper titled “Failure Tolerant Training with Persistent Memory Disaggregation over CXL” was published as a preprint by researchers at KAIST and Panmnesia.
“TRAININGCXL can efficiently process large-scale recommendation datasets in the pool of disaggregated memory while making training fault tolerant with low overhead,” states the paper.
Find the technical paper here or here (IEEE). Published Jan 2023 (preprint).
arXiv:2301.07492v2. Authors: Miryeong Kwon, Junhyeok Jang, Hanjin Choi, Sangwon Lee, Myoungsoo Jung.
IEEE: Kwon, Miryeong, et al. “Failure Tolerant Training with Persistent Memory Disaggregation over CXL.” IEEE Micro (2023).