Researchers from Harvard University have released “RPU -- A Reasoning Processing Unit”.
Abstract
“Large language model (LLM) inference performance is increasingly bottlenecked by the memory wall. While GPUs continue to scale raw compute throughput, they struggle to deliver scalable performance for memory bandwidth bound workloads. This challenge is amplified by emerging reasonin...
» read more