A new technical paper titled "LongSight: Compute-Enabled Memory to Accelerate Large-Context LLMs via Sparse Attention" was published by researchers at Cornell University.
Abstract
"Large input context windows in transformer-based LLMs help minimize hallucinations and improve output accuracy and personalization. However, as the context window grows, the attention phase increasingly dominates...