A new technical paper titled "Pushing the Envelope of LLM Inference on AI-PC and Intel GPUs" was published by researcher at Intel.
Abstract
"The advent of ultra-low-bit LLM models (1/1.58/2-bit), which match the perplexity and end-task performance of their full-precision counterparts using the same model size, is ushering in a new era of LLM inference for resource-constrained environments...
» read more