A new technical paper titled “Kitsune: Enabling Dataflow Execution on GPUs with Spatial Pipelines” was published by researchers at NVIDIA and the University of Wisconsin-Madison.
Abstract
“State-of-the-art DL models are growing in size and complexity, with many modern models also increasing in heterogeneity of behavior. GPUs are still the dominant platform for DL applications, relying on a bulk-synchronous execution model which has many drawbacks and is ill-suited for the graph structure of DL applications. Many industry and academic works attempt to overcome these by employing vertical fusion – combining multiple sequential operations into a single kernel – but this approach still fails to realize three untapped opportunities: (1) the fact that many resources on the GPU are idle while only one operator executes due to temporal multiplexing of the SM; (2) lower energy from more intelligent on-chip data-movement which lends to higher performance in a power-provisioned environment. (3) inability to exploit reduction dimensions as a source of parallelism to ease pressure on batch size. This article explores relatively uncharted territory, answering the following key question: Can modest adjustments to the current GPU architecture enable efficient dataflow execution, thereby circumventing the constraints of vertical fusion without necessitating a clean-slate architecture design. We develop Kitsune – a set of primitives to construct spatial pipelines which enable dataflow execution on GPUs, and an end-to-end compiler based on PyTorch Dynamo. Across 5 challenge applications, Kitsune can provide up to 2.8× and 2.2× performance improvement as well as up to 99% and 45% off-chip traffic reduction for inference and training, respectively.”
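For readers unfamiliar with the term, the vertical fusion mentioned in the abstract can be illustrated with a toy Python model. This is only a sketch under stated assumptions: `OffChipMemory`, `run_unfused`, and `run_fused` are hypothetical names invented here, "traffic" is counted in elements rather than bytes, and nothing below reflects Kitsune's actual primitives or its spatial pipelines. It shows only why fusing a chain of elementwise operators into one kernel reduces off-chip traffic.

```python
class OffChipMemory:
    """Toy stand-in for GPU DRAM that counts elements moved on or off chip."""

    def __init__(self):
        self.traffic = 0

    def load(self, buf):
        # Reading a buffer from off-chip memory costs one unit per element.
        self.traffic += len(buf)
        return list(buf)

    def store(self, values):
        # Writing a buffer back to off-chip memory costs one unit per element.
        self.traffic += len(values)
        return list(values)


def run_unfused(x, ops, mem):
    """One 'kernel' per operator: every intermediate round-trips through memory."""
    buf = x
    for op in ops:
        tile = mem.load(buf)
        buf = mem.store([op(v) for v in tile])
    return buf


def run_fused(x, ops, mem):
    """Vertically fused: load once, apply all operators on chip, store once."""
    tile = mem.load(x)
    out = []
    for v in tile:
        for op in ops:
            v = op(v)
        out.append(v)
    return mem.store(out)


# Hypothetical three-operator chain: scale, bias, ReLU.
ops = [lambda v: v * 2.0, lambda v: v + 1.0, lambda v: max(v, 0.0)]
x = [-1.0, 0.5, 2.0, 3.0]

mem_unfused, mem_fused = OffChipMemory(), OffChipMemory()
y_unfused = run_unfused(x, ops, mem_unfused)
y_fused = run_fused(x, ops, mem_fused)

print(y_unfused == y_fused)                      # both paths compute the same result
print(mem_unfused.traffic, mem_fused.traffic)    # element traffic for each path
```

In this toy model the fused path moves each element on and off chip once, while the unfused path does so once per operator. The abstract's point is that even the fused form still runs one kernel at a time on the SMs, which is the limitation the paper's spatial pipelines are meant to address.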
Find the technical paper at https://doi.org/10.1145/3777466 (published December 2025).
Michael Davies, Neal Crago, Karthikeyan Sankaralingam, and Stephen Keckler. 2025. Kitsune: Enabling Dataflow Execution on GPUs with Spatial Pipelines. ACM Trans. Archit. Code Optim. 22, 4, Article 146 (December 2025), 22 pages. https://doi.org/10.1145/3777466