Brandstetter, Johannes, Daniel Worrall, and Max Welling. 2023. “Message Passing Neural PDE Solvers.” arXiv. http://arxiv.org/abs/2202.03376.
Chakraborty, Dibyajyoti, Seung Whan Chung, Ashesh Chattopadhyay, and Romit Maulik. 2024. “Improved Deep Learning of Chaotic Dynamical Systems with Multistep Penalty Losses.” arXiv. http://arxiv.org/abs/2410.05572.
Chen, Yuan, and Dongbin Xiu. 2024. “Learning Stochastic Dynamical System via Flow Map Operator.” Journal of Computational Physics 508 (July): 112984. https://doi.org/10.1016/j.jcp.2024.112984.
Xu, Zhongshu, Yuan Chen, Qifan Chen, and Dongbin Xiu. 2024. “Modeling Unknown Stochastic Dynamical System Via Autoencoder.” Journal of Machine Learning for Modeling and Computing 5 (3). https://doi.org/10.1615/JMachLearnModelComput.2024055773.
Main Idea
A flow $\Phi$ maps $\mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$, such that with some (initial) state $u(t_0)$,
$$\Phi_s(u(t_0)) = u(t_0 + s).$$
For our purposes, the system is autonomous, so $\Phi_s$ depends only on the elapsed time $s$ and not on $t_0$. We often also discretize time as $t_k = t_0 + k\,\Delta t$, such that $u_{k+1} = \Phi_{\Delta t}(u_k)$. Then, if the flow is generated by an ODE $\frac{du}{dt} = f(u)$, it maps
$$u_k \mapsto u_k + \int_{t_k}^{t_{k+1}} f(u(\tau))\,d\tau.$$
This inspires the common ResNet architecture, where we parameterize the increment $R_\theta(u_k) \approx \int_{t_k}^{t_{k+1}} f(u(\tau))\,d\tau$ and encode the identity addition as the residual connection, so that $u_{k+1} \approx u_k + R_\theta(u_k)$.
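As a concrete illustration, here is a minimal sketch of this parameterization, assuming PyTorch; the class name `ResidualFlowMap`, the network widths, and the state dimension are hypothetical choices, not taken from the references above.

```python
import torch
import torch.nn as nn

class ResidualFlowMap(nn.Module):
    """One-step flow map Phi_dt(u) ~ u + R_theta(u), ResNet-style."""

    def __init__(self, n: int, width: int = 64):
        super().__init__()
        # R_theta plays the role of the integral of f over one timestep.
        self.increment = nn.Sequential(
            nn.Linear(n, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, n),
        )

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # The residual connection encodes the identity part of the flow map.
        return u + self.increment(u)

# Rolling out a trajectory is just repeated composition of the learned map.
phi = ResidualFlowMap(n=3)
u = torch.randn(16, 3)   # batch of 16 states
for _ in range(100):
    u = phi(u)           # u_{k+1} = Phi_dt(u_k)
```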
The Method of Lines constructs a system of ODEs, $\frac{d\mathbf{u}}{dt} = f(\mathbf{u})$, by combining a spatial discretization with the PDE and the boundary conditions. Neural ODEs or Ordinary Differential Equation Discovery look for $f$, as opposed to the flow map $\Phi_{\Delta t}$ or the PDE itself. In other words, they find a function that must still be integrated in time. Thus, just as Partial Differential Equation Discovery may form a solution by discretizing in space and then integrating in time, Neural ODEs or ODE discovery skip the discretization in space but still must integrate in time. Flow maps must do neither! Consequently, we can change both the spatial and the temporal discretization for PDE discovery; we can change the temporal discretization for ODE discovery / Neural ODEs; but we often cannot change either for flow map discovery. Interpretability follows the same order, from most to least interpretable, and generalizability may follow a similar trend too. See Spectrum of Interpretability.
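To make the practical difference concrete, here is a small sketch (PyTorch is assumed; `f_theta`, `phi_theta`, and `rk4_step` are hypothetical stand-ins, not from the papers above) contrasting a learned right-hand side, which still needs a time integrator and therefore lets us change $\Delta t$, with a learned flow map, which is applied directly at the fixed $\Delta t$ it was trained on.

```python
import torch
import torch.nn as nn

n = 3
f_theta = nn.Sequential(nn.Linear(n, 64), nn.Tanh(), nn.Linear(64, n))    # learned du/dt = f(u)
phi_theta = nn.Sequential(nn.Linear(n, 64), nn.Tanh(), nn.Linear(64, n))  # learned u_{k+1} = Phi_dt(u_k)

def rk4_step(f, u, dt):
    # Classical RK4 step: dt is a free choice at inference time.
    k1 = f(u)
    k2 = f(u + 0.5 * dt * k1)
    k3 = f(u + 0.5 * dt * k2)
    k4 = f(u + dt * k3)
    return u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

u = torch.randn(1, n)
u_ode = rk4_step(f_theta, u, dt=0.01)   # Neural ODE / ODE discovery: still integrate in time
u_map = phi_theta(u)                    # flow map: no integrator, the timestep is baked in
```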
Distributional Shift and the Pushforward Trick
Let $p_0$ be the distribution of "initial conditions" $u_0$ (this can be arbitrary, with batches starting at nonzero times). Let $p_k$ be the true distribution of the corresponding states $k$ timesteps later. Suppose we train with the following one-step loss:
$$\mathcal{L}(\theta) = \mathbb{E}_{k}\,\mathbb{E}_{u_k \sim p_k}\left[\,\lVert \Phi_\theta(u_k) - u_{k+1} \rVert^2\,\right].$$
Fundamentally, the solver maps $p_k \mapsto (\Phi_\theta)_{\#}\,p_k$, where $(\Phi_\theta)_{\#}$ is the pushforward operator for $\Phi_\theta$. Subsequent iterations of the solver use samples from $(\Phi_\theta)_{\#}\,p_k$, not from $p_{k+1}$ as during training. Thus, when rolling the solver out, we are feeding it samples from a different distribution than the one we trained on (Brandstetter, Worrall, and Welling 2023).
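The pushforward trick of Brandstetter, Worrall, and Welling (2023) addresses this by generating the training input with the model itself, unrolled one extra step with gradients detached. A minimal sketch, assuming PyTorch and a one-step model `phi` like the residual map above (the function name and data layout are hypothetical):

```python
import torch
import torch.nn.functional as F

def pushforward_step(phi, opt, u_k, u_kp2):
    """One optimizer step with the pushforward trick.

    u_k and u_kp2 are batches of ground-truth states at times k and k+2;
    phi is a one-step model and opt its optimizer.
    """
    # 1) Unroll one step WITHOUT gradients: this sample is drawn from the
    #    pushforward distribution (Phi_theta)_# p_k rather than from p_{k+1}.
    with torch.no_grad():
        u_kp1_model = phi(u_k)

    # 2) Supervise the next step on the model-generated input, so training
    #    inputs match the distribution the solver sees during rollout.
    loss = F.mse_loss(phi(u_kp1_model), u_kp2)

    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```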
Stochastic FML
Suppose now we have
$$\frac{du}{dt}(t;\omega) = f\big(u(t;\omega);\omega\big), \qquad u(t_0;\omega) = u_0(\omega),$$
where $\omega \in \Omega$ and $\Omega$ is an event space in a probability space $(\Omega, \mathcal{F}, P)$. The solution $u(t;\omega)$ and the right-hand side $f$ are unknown.
We can decompose the search for $\Phi_{\Delta t}$ into deterministic and stochastic parts $D$ and $S$, so that $u_{k+1} \approx D(u_k) + S(u_k, z_k)$ (cf. Chen and Xiu 2024). We train $D$ as a standard deterministic flow map, where it should then approximate the conditional mean. Then, we take samples $z_k$ from some distribution (e.g., a standard Gaussian) with a selected stochastic dimension, and use them as an additional input to $S$. We could train $S$ as a Generative Adversarial Network.
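A rough sketch of this decomposition, assuming PyTorch (the class name, network widths, and noise dimension are hypothetical choices, not taken from the paper):

```python
import torch
import torch.nn as nn

class StochasticFlowMap(nn.Module):
    """u_{k+1} ~ D(u_k) + S(u_k, z_k) with z_k ~ N(0, I)."""

    def __init__(self, n: int, n_z: int = 4, width: int = 64):
        super().__init__()
        self.n_z = n_z
        # Deterministic part: trained first as an ordinary flow map, so it
        # should approximate the conditional mean of u_{k+1} given u_k.
        self.D = nn.Sequential(nn.Linear(n, width), nn.Tanh(), nn.Linear(width, n))
        # Stochastic part: takes the state plus a noise sample; it could be
        # trained as the generator of a GAN against a discriminator.
        self.S = nn.Sequential(nn.Linear(n + n_z, width), nn.Tanh(), nn.Linear(width, n))

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        z = torch.randn(u.shape[0], self.n_z, device=u.device)
        return self.D(u) + self.S(torch.cat([u, z], dim=-1))
```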
Another alternative is to use a VAE (Xu et al. 2024). We will train an encoder $E$ and a decoder $D$. In particular, splitting the data into pairs $(u_k, u_{k+1})$ and, for a latent variable $z_k$,
$$z_k = E(u_k, u_{k+1})$$
and
$$\hat{u}_{k+1} = D(u_k, z_k).$$
We ideally want the decoder to handle all the influence of $u_k$; in other words, $z_k$ should be independent of $u_k$ (but this is the tension that InfoVAE addresses).
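A minimal sketch of this encoder/decoder pair, assuming PyTorch (class names and dimensions are hypothetical, and the prior-matching/KL regularization term that pushes $z_k$ toward the sampling distribution is omitted): the encoder sees the pair $(u_k, u_{k+1})$, the decoder reconstructs $u_{k+1}$ from $(u_k, z_k)$, and at generation time $z_k$ is sampled from the prior instead.

```python
import torch
import torch.nn as nn

class PairEncoder(nn.Module):
    """z_k = E(u_k, u_{k+1}): ideally the latent captures only the noise."""
    def __init__(self, n: int, n_z: int = 4, width: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * n, width), nn.Tanh(), nn.Linear(width, n_z))

    def forward(self, u_k, u_kp1):
        return self.net(torch.cat([u_k, u_kp1], dim=-1))

class ConditionalDecoder(nn.Module):
    """u_{k+1} ~ D(u_k, z_k): the dependence on u_k should live here."""
    def __init__(self, n: int, n_z: int = 4, width: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n + n_z, width), nn.Tanh(), nn.Linear(width, n))

    def forward(self, u_k, z_k):
        return self.net(torch.cat([u_k, z_k], dim=-1))

# Training reconstructs u_{k+1}; generation swaps the encoder for prior samples.
E, D = PairEncoder(n=3), ConditionalDecoder(n=3)
u_k, u_kp1 = torch.randn(16, 3), torch.randn(16, 3)
u_hat = D(u_k, E(u_k, u_kp1))        # training-time reconstruction
u_gen = D(u_k, torch.randn(16, 4))   # generation with z_k ~ N(0, I)
```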