Predict dynamic behaviors in a physics system in a way that’s computationally efficient and adaptable to a range of scenarios.
By Máté Stodulka and Tomas Zilhao Borges
The demand for immersive, realistic graphics in mobile gaming and AR or VR is pushing the limits of mobile hardware. Achieving lifelike simulations of fluids, cloth, and other materials has historically required intensive mathematical computation. While these traditional methods yield highly accurate results, they have been too resource-heavy to run in real time on mobile. But as mobile hardware advances, Machine Learning (ML) techniques, particularly Graph Neural Networks (GNNs), are emerging as a powerful, efficient alternative for emulating physics on mobile.
GNNs are particularly suited to scenarios where real-world situations can be represented as interactions between related objects. By representing each particle as a node and the forces between particles as edges, GNNs can predict dynamic behaviors in a physics system. This lets GNNs approximate traditional methods in a way that is computationally efficient and adaptable to a range of scenarios, making them promising for resource-constrained mobile devices. The recent launch of TensorFlow GNN offers a streamlined way to design, build, and deploy GNNs, providing “ready to wear” architectures and essential tools to define nodes, edges, and interactions.
To assess the feasibility of these simulations on mobile, we evaluate the performance of GNN-based models on today’s state-of-the-art (SOTA) hardware.
With these goals in mind, we focus on two main objectives:
- Re-implementing DeepMind’s GNN-based simulation approach in TensorFlow 2 using the new TF-GNN library.
- Evaluating the feasibility of on-device inference by benchmarking the resulting model on current mobile hardware.
Fig. 1: A graphic representation of a GNN.
GNNs excel at representing data as networks of objects and their interactions. This ability makes GNNs particularly well-suited for applications where data is naturally structured as interconnected entities, for example social networks, recommendation systems, physics simulations, and so on.
GNNs extend the foundational ideas of Convolutional Neural Networks (CNNs) to graph data. While CNNs capture spatial locality in grid-like data (for example, images) through convolutional kernels, GNNs capture structural locality in graph data, allowing for flexible connections represented by sparse adjacency matrices. And just as convolutional kernels slide across an image, GNNs share weights across edges, most commonly through message passing, efficiently capturing patterns across a graph.
At their core, GNNs consist of:
- Nodes, representing entities along with their features.
- Edges, representing relationships or interactions between those entities.
- Global (context) features, describing graph-level properties.
Graphs can be built from both static and dynamic properties.
The development of GNNs has introduced several key milestones in their expressive power, from early spectral and convolutional variants (GCNs), through attention-based models (GATs), to general message-passing networks (MPNNs).
Message passing is the core of GNNs, allowing nodes to share information. It generally involves:
- Message computation: each edge produces a message from the states of the nodes it connects.
- Aggregation: each node combines its incoming messages with a permutation-invariant function, for example sum or mean.
- Update: each node computes a new hidden state from its previous state and the aggregated messages.
Each message-passing step covers immediate neighbors, and additional layers capture a wider network context, enabling a comprehensive understanding of graph structures.
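For intuition, here is a minimal sketch of one such step in plain TensorFlow. The function names, shapes, and choice of sum-aggregation are our own illustration, not a specific library API.

```python
import tensorflow as tf

def message_passing_step(node_states, sources, targets, message_fn, update_fn):
    """One round of message passing over edges (sources[i] -> targets[i])."""
    # 1) Compute a message for each edge from its source node's state.
    messages = message_fn(tf.gather(node_states, sources))
    # 2) Aggregate incoming messages at each target node (here: sum).
    aggregated = tf.math.unsorted_segment_sum(
        messages, targets, num_segments=tf.shape(node_states)[0])
    # 3) Update every node's state from its old state and what it received.
    return update_fn(tf.concat([node_states, aggregated], axis=-1))
```

Stacking several such steps widens each node’s receptive field by one hop per step.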
Fig. 2: Layers of TF-GNN.
TF-GNN was first published in 2022, with a major update in early 2024. TensorFlow (TF) itself is a mature framework, supported by a large number of platforms and tools. TF-GNN integrates tightly with base TF: most of its added data structures are built from native TF operators, enabling compatibility with the popular Keras API as well as easy conversion to LiteRT (formerly TensorFlow Lite), Google’s mobile-friendly model format and runtime.
The TF-GNN library is made up of multiple API levels, as shown in Fig. 2, offering increasing degrees of fine-tuning at the cost of increasing complexity.
Physics simulation has traditionally relied on solving partial differential equations (PDEs) such as the Navier-Stokes equations, offering high accuracy but demanding heavy computational power, which limits real-time applications on mobile devices. As an alternative, Machine Learning (ML) has recently emerged as a faster, less compute-heavy, and adaptable approach to physics simulation.
Historically, physics simulation approaches can be classified along several axes. At the “frame of reference” level, methods are broadly divided into Eulerian (grid-based) and Lagrangian (particle-based) formulations, which further increases the diversity of implementations.
DeepMind’s paper “Learning to Simulate Complex Physics with Graph Networks” presents an innovative approach to simulating complex physics scenarios using a GNN architecture. The approach successfully models various physical environments, for example fluids, rigid bodies, and deformable materials, in both 2D and 3D, with results that generalize well to new configurations and material types. However, the paper does not report performance numbers.
The paper contributes novel ideas at both the data-modeling and GNN-architecture levels. More specifically, its framework is structured around an Encoder-Processor-Decoder architecture, with each stage tailored to handle particle interactions and simulate physical behavior over time.
“Learning to Simulate” also presents a novel set of datasets for various materials, for example water, sand, and viscous substances. Each material’s behavior is simulated with accurate traditional solvers to provide ground truth. Each dataset contains thousands of examples with both short-term and long-term interaction data, typically around 200-500 timesteps per trajectory and between 1,000 and 2,000 particles.
We appreciate DeepMind’s contributions to the field and have adopted their theoretical approach to using GNNs for physics simulations as well as their provided datasets.
While DeepMind’s original implementation uses the older TensorFlow 1.x framework, which lacks compatibility with recent libraries, we adapt their architecture to TensorFlow 2, exploring the newly released TF-GNN (TensorFlow Graph Neural Networks) library.
To manage the computational costs associated with creating GraphTensors, we pre-process all data, saving intermediate GraphTensor states in TFRecord format. These pre-processed TFRecords then serve as the datasets for our implementation.
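As a sketch, the pre-processing loop can use tfgnn.write_example to serialize each GraphTensor; build_graph_tensor here is a hypothetical helper standing in for our Encoder-side graph construction.

```python
import tensorflow as tf
import tensorflow_gnn as tfgnn

def write_graphs(samples, path):
    with tf.io.TFRecordWriter(path) as writer:
        for sample in samples:
            graph = build_graph_tensor(sample)  # hypothetical user-defined helper
            # Serialize the GraphTensor into a tf.train.Example record.
            writer.write(tfgnn.write_example(graph).SerializeToString())

def load_graphs(path, graph_spec):
    # Parse each record back into a GraphTensor using its type spec.
    return tf.data.TFRecordDataset([path]).map(
        lambda record: tfgnn.parse_single_example(graph_spec, record))
```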
Our implementation largely follows DeepMind’s architecture, with an Encoder-Processor-Decoder separation. The Encoder creates the graph structure from raw data, the Processor (or Core GNN) implements Message Passing, and the Decoder extracts information from the graph. We start with position windows and end up predicting normalized acceleration for each particle.
Fig. 3: Node features.
The Encoder processes particle positions into relevant features in a GraphTensor structure and embeds them into the tfgnn.HIDDEN_STATE vector.
We start with five previous positions for each particle and their current absolute coordinates. This provides historical context and aids in generalization by focusing on relative dynamics instead of absolute positions.
The Encoder derives specific features for each node:
- Recent velocities, computed as finite differences of consecutive positions in the window.
- Distances to the domain boundaries, giving the model awareness of walls and the floor.
Edges are dynamically generated based on particle proximity (within a specified radius), balancing model complexity with computational efficiency. We replaced KD-Trees with a TF-native approach (sketched below), achieving significant speed improvements on accelerated hardware while maintaining information fidelity. Edge features include:
- The relative displacement vector between the two connected particles.
- The magnitude of that displacement, that is, the inter-particle distance.
These relative features are the only position-based data available to the model; absolute positions are excluded from message-passing to support better generalization. Global context like gravity is scattered to each node as a feature, simplifying Message Passing.
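For illustration, the TF-native neighbor search can be as simple as a dense pairwise-distance computation; this is a minimal O(N²) sketch under our own simplifications, not the exact production code.

```python
import tensorflow as tf

def radius_edges(positions, radius):
    """Connect every pair of particles closer than `radius`.

    positions: [num_particles, dim] float32 tensor.
    Returns (sources, targets) index tensors of the resulting edges.
    """
    n = tf.shape(positions)[0]
    # Pairwise squared distances, shape [N, N].
    diff = positions[:, None, :] - positions[None, :, :]
    sq_dist = tf.reduce_sum(diff * diff, axis=-1)
    # Exclude self-edges by pushing the diagonal beyond the radius.
    sq_dist = tf.linalg.set_diag(sq_dist, tf.fill([n], tf.float32.max))
    pairs = tf.where(sq_dist <= radius * radius)  # [num_edges, 2], int64
    return pairs[:, 0], pairs[:, 1]
```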
Nodes, edges, and global context are combined into GraphTensors, and node features are embedded into a 128-dimensional tfgnn.HIDDEN_STATE vector, preparing them for message passing.
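A minimal sketch of this assembly step with the TF-GNN API; the set and feature names are illustrative, not our exact schema.

```python
import tensorflow as tf
import tensorflow_gnn as tfgnn

def make_graph(velocities, boundary_dist, sources, targets,
               rel_disp, rel_dist, gravity):
    # velocities: [num_particles, features], e.g. flattened finite differences.
    graph = tfgnn.GraphTensor.from_pieces(
        node_sets={"particles": tfgnn.NodeSet.from_fields(
            sizes=tf.shape(velocities)[:1],
            features={"velocities": velocities,
                      "boundary_dist": boundary_dist})},
        edge_sets={"proximity": tfgnn.EdgeSet.from_fields(
            sizes=tf.shape(sources)[:1],
            adjacency=tfgnn.Adjacency.from_indices(
                source=("particles", sources),
                target=("particles", targets)),
            features={"rel_disp": rel_disp, "rel_dist": rel_dist})},
        context=tfgnn.Context.from_fields(features={"gravity": gravity}))

    # Concatenate node features and project them into the 128-dim hidden state.
    def embed(node_set, *, node_set_name):
        x = tf.concat([node_set["velocities"], node_set["boundary_dist"]], axis=-1)
        return {tfgnn.HIDDEN_STATE: tf.keras.layers.Dense(128)(x)}

    return tfgnn.keras.layers.MapFeatures(node_sets_fn=embed)(graph)
```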
This is the core-GNN model, made up of multiple Message Passing (MP) layers. An MP layer takes the state of connected nodes (and edges, if edge features are present), passes it through a basic Multi-Layer Perceptron (MLP), and uses the result to update the node state. Because the default connectivity radius is small (edges are expensive), several MP steps are needed for information to reach more distant particles; we achieved good results with 8 steps. The MP layers are highly configurable through various hyperparameters. Our choices mostly match the paper or are smaller, given that our model does not need to generalize as widely and should be runnable on mobile. We use TF-GNN’s library-provided MtAlbis GraphUpdate layer, as it is highly customizable; ours is set up with 128 hidden units and residual connections, but without attention, as sketched below.
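A sketch of that Processor stack; arguments beyond those named in the text keep their library defaults.

```python
import tensorflow_gnn as tfgnn
from tensorflow_gnn.models import mt_albis

# 8 rounds of message passing, each updating the node hidden states.
for _ in range(8):
    graph = mt_albis.MtAlbisGraphUpdate(
        units=128,                   # node hidden-state size
        message_dim=128,             # size of per-edge messages
        receiver_tag=tfgnn.TARGET,   # aggregate messages at edge targets
        attention_type="none",       # plain convolutions, no attention
        next_state_type="residual",  # residual connection around each update
    )(graph)
```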
The Decoder mirrors the Encoder, translating the node state (tfgnn.HIDDEN_STATE) back to normalized acceleration values.
The Readout layer takes each particle’s tfgnn.HIDDEN_STATE feature and passes it through a 2-layer MLP with ReLU activation to predict normalized acceleration values per particle.
The Postprocessor further processes the output of the Readout, ensuring results are correctly scaled for regression tasks.
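Together, these two stages can be sketched as below; the MLP width, the 2-D output, and the normalization statistics are assumptions for illustration.

```python
import tensorflow as tf
import tensorflow_gnn as tfgnn

# Readout: extract each particle's final hidden state from the graph.
hidden = tfgnn.keras.layers.Readout(
    node_set_name="particles", feature_name=tfgnn.HIDDEN_STATE)(graph)

# 2-layer MLP with ReLU predicting normalized per-particle acceleration.
norm_accel = tf.keras.layers.Dense(128, activation="relu")(hidden)
norm_accel = tf.keras.layers.Dense(2)(norm_accel)  # (x, y) for a 2-D dataset

# Postprocessor: undo the normalization using dataset statistics (assumed).
accel = norm_accel * accel_std + accel_mean
```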
The model’s training approach aims to simulate particle dynamics accurately over extended time steps by combining both stepwise and rollout evaluation modes (explained below). Stepwise mode is used for the training process while rollout mode is ultimately used for deciding which model is best.
In addition, to assess the model’s real-world accuracy, visual evaluations are conducted using side-by-side matplotlib animations comparing predictions to ground truth.
Stepwise mode uses independent inputs to predict the next position frame-by-frame, allowing for faster training due to easier parallelization and trivial backpropagation.
Fig. 4: The Stepwise model.
Rollout mode, however, simulates real-world scenarios by feeding predictions as inputs for subsequent frames, causing accumulated error and revealing issues like boundary misinterpretation and oscillation near equilibrium (particle at rest).
Fig. 5: The Rollout model.
A rollout-MSE metric, implemented as a callback, generates rollout trajectories during training to better capture cumulative error, although this is slower due to the strictly serial nature of rollout evaluations. The supervised training itself still predicts one frame at a time, so error accumulates over long rollouts.
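A sketch of that rollout loop; `model` is assumed to map a window of recent positions to the next positions, and names and shapes are illustrative.

```python
import tensorflow as tf

def rollout_mse(model, window, ground_truth, num_steps):
    # window: [num_particles, window_len, dim]
    # ground_truth: [num_steps, num_particles, dim]
    step_errors = []
    for t in range(num_steps):
        next_pos = model(window)  # predicted positions, [num_particles, dim]
        step_errors.append(
            tf.reduce_mean(tf.square(next_pos - ground_truth[t])))
        # Feed the prediction back: drop the oldest frame, append the new one.
        window = tf.concat([window[:, 1:], next_pos[:, None, :]], axis=1)
    return tf.reduce_mean(tf.stack(step_errors))
```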
The hyperparameter settings primarily follow recommendations from the DeepMind paper. Beyond the inherent reduction in complexity from using single-material models, we made further efforts to explore alternative hyperparameter options that simplify the model, using Keras Tuner for the search; these simplifications are expected to enhance mobile performance. A sketch of such a search follows.
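The searched ranges here are illustrative, and make_simulator_model is a hypothetical model-building helper.

```python
import keras_tuner as kt

def build_model(hp):
    # Sample a candidate architecture for each trial.
    return make_simulator_model(  # hypothetical helper
        num_mp_layers=hp.Int("message_passing_layers", 4, 10),
        hidden_units=hp.Choice("hidden_units", [64, 128]))

tuner = kt.BayesianOptimization(
    build_model, objective="val_loss", max_trials=20)
tuner.search(train_ds, validation_data=val_ds, epochs=5)
best_model = tuner.get_best_models(num_models=1)[0]
```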
Models were trained on an NVIDIA RTX 6000 Ada GPU with 48GB of GPU memory.
The training time depends on the hyperparameters. For example, training our model with the best tuned hyperparameters (8 message-passing layers, embedding size 128, ~700K parameters in total) took around 10.5 minutes per epoch. We trained for 50 epochs with early stopping, for a total of around 9 hours.
The GIF below shows the result of passing one initial set of positions to the model and running for around 300 frames in rollout mode. While some error accumulates over time, the result still reads as plausible liquid motion.
Fig. 6: Results.
The time-step is learned implicitly and depends on consistent FPS, requiring separate models for different timings. Additional mechanisms and/or data would be needed to mitigate this.
Gravity is not learned directly, as the dataset only includes negative-Y gravity, so it is set as a global input parameter.
Also, while the model theoretically supports multi-material scenarios, it has only been tested with homogeneous materials and requires further validation.
Results for Pixel 9’s CPU-only inference:
| Num. particles | Inference time per step | Avg. time per rollout of 25 steps | Avg. time per rollout of 300 steps |
| --- | --- | --- | --- |
| 32 | ~1 ms | ~100 ms | ~1 s |
| 800 | ~25 ms | ~0.7 s | ~7 s |
Despite these modest results, there is significant room for performance improvement. The current measurements use CPU-only inference with the device’s default delegate (XNNPACK). In addition, the model has not yet been quantized or had its architecture optimized.
On the other hand, we have found some evidence of limitations for GPU delegate usage: problems have been reported with GATHER, RESHAPE, and SLICE, as the TFLite GPU delegate supports only a subset of TFLite and TF ops.
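For reference, the standard LiteRT/TFLite conversion path with dynamic-range quantization looks like the sketch below; the saved-model path is a placeholder, and SELECT_TF_OPS lets ops without TFLite builtins fall back to TensorFlow kernels on the CPU.

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("path/to/simulator")
# Dynamic-range quantization: shrinks weights and speeds up CPU inference.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Allow graph ops (gathers, scatters, etc.) not covered by TFLite builtins.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
with open("simulator.tflite", "wb") as f:
    f.write(converter.convert())
```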
In this blog post we introduced GNNs and showed how to build GNN models using the different API levels of TF-GNN. We also shared the results of implementing a model from scratch following DeepMind’s theoretical basis. Based on our results, we recommend Graph Neural Networks for physics simulation workloads.
The TF-GNN API has a steep initial learning curve at its data layer, especially if no data is already available in the GraphSchema format. Above the data layer, the API is easy to use, with close integration into Keras.
Graph Neural Networks are on the path to becoming more mainstream, with exciting opportunities for the maturation of essential operations across message passing such as scatters, gathers, segmented operations, ragged and dynamic tensors.
Explore more about on-device inference with Real-time low light video enhancement using Neural Networks on mobile.
Tomas Zilhao Borges is a graduate software engineer at Arm.