Training Large LLM Models With Billions To Trillion Parameters On ORNL’s Frontier Supercomputer


A technical paper titled “Optimizing Distributed Training on Frontier for Large Language Models” was published by researchers at Oak Ridge National Laboratory (ORNL) and Universite Paris-Saclay. Abstract: "Large language models (LLMs) have demonstrated remarkable success as foundational models, benefiting various downstream applications through fine-tuning. Recent studies on loss scaling ... » read more