PINNs

Resources

Main Idea

PINNs are highly nonlinear (global) Collocation Methods for forward solutions of, or Inverse Problems in, Partial Differential Equations. The number of collocation points is generally unrelated to the number of unknowns (the parameters of the neural network), and the solution is obtained by minimizing a loss rather than by solving a system of equations.

Unlike collocation methods that rely on recursive or analytic derivatives of basis functions, PINNs rely on Automatic Differentiation to construct the derivatives appearing in the residual.
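
As a minimal sketch of this (in PyTorch, an arbitrary choice here), the residual of an illustrative PDE $u_t - \nu u_{xx} = 0$ can be assembled with reverse-mode autodiff; the network size, the value of $\nu$, and the uniform collocation sampling are assumptions for illustration only:

```python
import torch
import torch.nn as nn

# Small fully connected network u_theta(x, t); the architecture is an arbitrary choice.
net = nn.Sequential(
    nn.Linear(2, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 1),
)

def pde_residual(x, t, nu=0.01):
    """Residual of the illustrative PDE u_t - nu * u_xx = 0 at collocation points."""
    x = x.requires_grad_(True)
    t = t.requires_grad_(True)
    u = net(torch.cat([x, t], dim=1))
    ones = torch.ones_like(u)
    # Derivatives of the network output via Automatic Differentiation.
    u_t = torch.autograd.grad(u, t, grad_outputs=ones, create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, grad_outputs=ones, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, grad_outputs=ones, create_graph=True)[0]
    return u_t - nu * u_xx

# The number of collocation points is unrelated to the number of network parameters;
# the PDE term of the loss is just the mean squared residual over those points.
x_c, t_c = torch.rand(1000, 1), torch.rand(1000, 1)
loss = pde_residual(x_c, t_c).pow(2).mean()
```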

Vanilla PINNs suffer from inexact satisfaction of BCs, ICs, and the Governing PDE. In simple cases, some of these can be combated through clever architecture changes, as described below.

Training

Recent work suggests that second-order/quasi-Newton Optimization or Multi-Objective Optimization is essential for good performance, outweighing many other factors such as architecture and collocation point sampling. When trying to improve performance, this should generally be the first thing to try.
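
As a sketch of what that looks like in practice, reusing the `net` and `pde_residual` sketch above, with L-BFGS standing in for the quasi-Newton step (the iteration count and line search are arbitrary choices):

```python
# Quasi-Newton refinement with L-BFGS; PyTorch's implementation needs a closure
# that re-evaluates the loss whenever the optimizer asks for it.
optimizer = torch.optim.LBFGS(net.parameters(), max_iter=500,
                              line_search_fn="strong_wolfe")

def closure():
    optimizer.zero_grad()
    loss = pde_residual(x_c, t_c).pow(2).mean()  # add BC/IC loss terms here as well
    loss.backward()
    return loss

optimizer.step(closure)
```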

Exact Periodic BCs

If our domain is $\Omega \times [0,T] = [0,L] \times [0,T]$, and we wish for our neural network representing the state, $u_\theta(x,t)$, to satisfy periodic BCs by construction, we can use the following trick:

$$u_\theta(x,t) = \mathrm{DNN}_\theta\big(v(x), t\big),$$

where

$$v(x) = \left[\, \sin\!\left(\tfrac{2\pi x}{L}\right),\ \cos\!\left(\tfrac{2\pi x}{L}\right) \,\right].$$

We can add higher frequencies to $v$ arbitrarily by scaling the arguments by integers. With $m$ frequencies, $v$ constructs a set of $2m$ orthogonal basis functions, each of which satisfies the periodic BCs (equal value, and equal value of all derivatives, at the two ends of the domain).

This is very similar to a spectral method, where we construct a matrix $\Phi(x)$ and its derivatives analytically, and write the governing PDE in terms of the expansion $u(x) = \Phi(x)\,c$. We compute the residual by plugging in $u_x = \Phi_x c,\ u_{xx} = \Phi_{xx} c, \ldots$ and solve for $c$. For a linear equation this is a linear system; otherwise, we iteratively solve the linearized system. We only construct $\Phi$ and its derivatives once, as they depend only on the "mesh" (the evaluation or collocation points of the strong form). Since the basis functions all satisfy the periodic boundary conditions, any solution in their span must as well. In fact, any function composition relying on this $u(x)$ that does not elsewhere involve $x$ satisfies the same periodic BCs. This is essentially what the PINNs construction above uses.
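
A sketch of this embedding with $m$ frequencies (PyTorch again; the class name, $m$, $L$, and the downstream network are all illustrative assumptions):

```python
import math
import torch
import torch.nn as nn

class PeriodicEmbedding(nn.Module):
    """v(x) = [sin(2*pi*k*x/L), cos(2*pi*k*x/L)] for k = 1..m, i.e. 2m features
    that are all L-periodic with matching derivatives at the boundary."""
    def __init__(self, L, m=1):
        super().__init__()
        self.register_buffer("freqs", 2.0 * math.pi * torch.arange(1, m + 1) / L)

    def forward(self, x):                 # x: (N, 1)
        args = x * self.freqs             # (N, m)
        return torch.cat([torch.sin(args), torch.cos(args)], dim=1)

# u_theta(x, t) = DNN_theta(v(x), t): any network composed with v(x), and not using
# x elsewhere, is periodic in x by construction.
m, L = 2, 1.0
embed = PeriodicEmbedding(L, m)
dnn = nn.Sequential(nn.Linear(2 * m + 1, 32), nn.Tanh(), nn.Linear(32, 1))

def u_theta(x, t):
    return dnn(torch.cat([embed(x), t], dim=1))
```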

Exact Initial Conditions

We want $u_\theta(x,0) = g(x)$. Let $\mathrm{DNN}_\theta(x,t)$ represent some arbitrary function which will be used in this construction. It could itself involve other compositions, such as satisfying periodic boundary conditions as described above.
Construct

$$u_\theta(x,t) = \big(1 - \phi(t)\big)\,\mathrm{DNN}_\theta(x,t) + \phi(t)\,g(x),$$

where $$\phi(t) = \exp\left ( {-\frac{\lambda t}{T}} \right ) \cdot \left ( 1 - \frac{t}{T} \right ),$$
with $T$ denoting the maximum simulation time and $\lambda \ge 0$. Other forms of $\phi$ can be used, although the influence of this arbitrary function is overshadowed by the tunable parameters $\theta$. Note that $\phi(0) = 1$ and $\phi(T) = 0$; in the construction of $u_\theta$, these recover $g(x)$ and $\mathrm{DNN}_\theta(x,t)$ respectively. We could select $\lambda$ based on some domain-specific knowledge, for example the characteristic time scale in Turbulence, which is the "time required to forget the initial conditions".
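
A sketch of this construction, reusing `u_theta` from the periodic-BC sketch above (`T_max`, `lam`, and the choice of `g` are illustrative assumptions):

```python
import math
import torch

T_max, lam = 1.0, 2.0   # maximum simulation time and decay rate (illustrative values)

def phi(t):
    """Blending weight: phi(0) = 1 (exact IC), phi(T_max) = 0 (pure network)."""
    return torch.exp(-lam * t / T_max) * (1.0 - t / T_max)

def g(x):
    """Initial condition u(x, 0); an illustrative choice."""
    return torch.sin(2.0 * math.pi * x)

def u_exact_ic(x, t):
    # u(x, 0) = g(x) holds by construction, independent of the network parameters.
    return (1.0 - phi(t)) * u_theta(x, t) + phi(t) * g(x)
```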

Uncertainty

Suppose parameters μ are added as parametric variables to the PDE. How can we solve the PDE for new values of μ?

  1. Naively build a PINNs solution for each new μ, throwing away previous results.
  2. Take uθ(x,t;μ) and train this now with samples over μ. This implies a sort of continuity or smoothness of u over μ, which may or may not be the case. There is one training over all μ.
  3. Do some sneaky hypernetwork tricks, i.e. have θ(μ) as a guess from some hypernetwork, then detach and use this as the initial guess for regular training. This has been done by giving uθ low-rank updates, e.g. $\theta = U \operatorname{diag}(s) V^T$, and only training $s$ during this second phase. $U$ and $V^T$ are shared across all μ, but are still learned (with an orthogonality constraint); see "Hypernetwork-Based Meta-Learning for Low-Rank Physics-Informed Neural Networks", and the sketch after this list. Learning $U$ and $V^T$ here can be interpreted as Meta-Learning.
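
A minimal sketch of the low-rank idea in item 3 (this is not the cited paper's code; the class name, rank, and initializations are assumptions, and the orthogonality constraint is omitted):

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Linear layer with weight W = U diag(s) V^T. U and V can be shared (and
    meta-learned) across mu; only s and the bias are adapted to a new mu."""
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, rank) / rank ** 0.5)
        self.V = nn.Parameter(torch.randn(in_features, rank) / rank ** 0.5)
        self.s = nn.Parameter(torch.ones(rank))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        W = self.U @ torch.diag(self.s) @ self.V.T
        return x @ W.T + self.bias

layer = LowRankLinear(32, 32, rank=4)
# Second phase (new mu): freeze the shared factors, train only s and the bias.
for name, p in layer.named_parameters():
    p.requires_grad_(name in ("s", "bias"))
```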

Bayesian PINNs

To facilitate UQ and have a probabilistic version of uθ (and inverse parameters too), we can use B-PINNs. Using Bayesian Methods, these effectively assume a prior distribution over θ, p(θ), and aim to compute p(θ|data,residuals). This really only differs from the standard Bayesian methods by including the PDE residuals.

As part of this, we must construct a likelihood function, p(data,residuals|θ). We assume that the errors on the data and the residuals are independent of one another, and that each instance is itself independently drawn from the same Normal Distribution with known standard deviation σu or σf for the data and PDE residuals respectively. With this in hand, we can use either Variational Inference (e.g. VAEs) or Markov Chain Monte Carlo to get the posterior distribution. Then, we have samples of parameters from p(θ|data,residuals) that we can use to give predictions for uθ.
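
A sketch of the resulting unnormalized log-posterior that an MCMC sampler or VI scheme would then target, reusing `net` and `pde_residual` from the earlier sketches (the standard normal prior and the noise scales are assumptions):

```python
import torch

sigma_u, sigma_f = 0.05, 0.05   # assumed known noise scales for data and PDE residuals

def log_posterior(theta_vec, x_d, t_d, u_d, x_c, t_c):
    """Unnormalized log p(theta | data, residuals) with independent Gaussian errors
    on the data and on the residuals, and a standard normal prior over theta."""
    # Write the flat parameter vector theta into the network.
    torch.nn.utils.vector_to_parameters(theta_vec, net.parameters())
    log_prior = -0.5 * (theta_vec ** 2).sum()
    data_misfit = net(torch.cat([x_d, t_d], dim=1)) - u_d
    log_lik_data = -0.5 * (data_misfit ** 2).sum() / sigma_u ** 2
    res = pde_residual(x_c, t_c)
    log_lik_res = -0.5 * (res ** 2).sum() / sigma_f ** 2
    return log_prior + log_lik_data + log_lik_res
```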

For inverse problems, we can simply group those parameters with θ and continue as above, similar to the straightforward generalization of standard PINNs to inverse problems.

Todo

  • Include other PINNs variants: training, architecture, and sampling
  • Discuss problems in training
  • Highlight scaling to higher dimensions