A new technical paper, "Rethinking Compute Substrates for 3D-Stacked Near-Memory LLM Decoding: Microarchitecture-Scheduling Co-Design," was published by researchers at the University of Edinburgh, Peking University, the University of Cambridge, the University of Chinese Academy of Sciences, and the Hong Kong University of Science and Technology.
Abstract
"Large language model (LLM) decoding is a majo...