Reduced Space vs. Full Space Optimization

Main Idea

Within PDE Constrained Optimization there are broadly two classes of methods: full and reduced space methods. For a state vector $\mathbf{u}$, design/optimization variables $\mathbf{d}$, an objective function $J(\mathbf{u}, \mathbf{d})$, and a PDE residual $\mathbf{c}(\mathbf{u}, \mathbf{d})$, the general problem is

$$\underset{\mathbf{u},\,\mathbf{d}}{\arg\min}\; J(\mathbf{u}, \mathbf{d}), \quad \text{s.t.} \quad \mathbf{c}(\mathbf{u}, \mathbf{d}) = 0.$$

The PDE problem is well posed: given $\mathbf{d}$, we can uniquely determine $\mathbf{u}$ from $\mathbf{c}$. However, multiple $\mathbf{d}$ may produce the same value of $J$, so the optimization problem itself can be ill-posed. In the discretized case, we have $\mathbf{u} \in \mathbb{R}^n$, $\mathbf{d} \in \mathbb{R}^m$, $\mathbf{c} \in \mathbb{R}^n$, and $J(\mathbf{u}, \mathbf{d}) \in \mathbb{R}$. Writing the adjoint variables or Lagrange multipliers as $\boldsymbol{\lambda} \in \mathbb{R}^n$, the Lagrangian is

$$\mathcal{L}(\mathbf{u}, \mathbf{d}, \boldsymbol{\lambda}) = J(\mathbf{u}, \mathbf{d}) + \boldsymbol{\lambda}^T \mathbf{c}(\mathbf{u}, \mathbf{d}),$$

and the first-order optimality conditions (KKT Conditions) require that its gradient vanishes, implying

$$\begin{bmatrix} \partial_{\mathbf{u}} \mathcal{L} \\ \partial_{\mathbf{d}} \mathcal{L} \\ \partial_{\boldsymbol{\lambda}} \mathcal{L} \end{bmatrix} = \begin{bmatrix} \partial_{\mathbf{u}} J + [\partial_{\mathbf{u}} \mathbf{c}]^T \boldsymbol{\lambda} \\ \partial_{\mathbf{d}} J + [\partial_{\mathbf{d}} \mathbf{c}]^T \boldsymbol{\lambda} \\ \mathbf{c} \end{bmatrix} = 0.$$
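To make the pieces concrete, here is a minimal sketch (not from the original note) on a toy discretized problem: a linear "PDE" residual $\mathbf{c}(\mathbf{u}, \mathbf{d}) = A\mathbf{u} - B\mathbf{d}$ and a quadratic objective. The names `A`, `B`, `u_target`, and `alpha` are illustrative assumptions; the point is that differentiating the Lagrangian reproduces the three KKT blocks above.

```python
import jax
import jax.numpy as jnp

n, m = 5, 2                                                  # state and design dimensions
A = 2.0 * jnp.eye(n) - jnp.eye(n, k=1) - jnp.eye(n, k=-1)    # 1D Laplacian stencil
B = jax.random.normal(jax.random.PRNGKey(0), (n, m))         # how d enters the PDE
u_target, alpha = jnp.ones(n), 1e-2                          # illustrative target and penalty

def J(u, d):
    return 0.5 * jnp.sum((u - u_target) ** 2) + 0.5 * alpha * jnp.sum(d ** 2)

def c(u, d):
    return A @ u - B @ d                          # PDE residual, c(u, d) = 0

def L(u, d, lam):
    return J(u, d) + lam @ c(u, d)                # the Lagrangian

u, d, lam = jnp.zeros(n), jnp.zeros(m), jnp.zeros(n)
dLdu, dLdd, dLdlam = jax.grad(L, argnums=(0, 1, 2))(u, d, lam)
# dLdu   = dJ/du + [dc/du]^T lam   (adjoint equation when set to zero)
# dLdd   = dJ/dd + [dc/dd]^T lam   (design equation)
# dLdlam = c(u, d)                 (the original PDE)
print(dLdu, dLdd, dLdlam)
```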

With this system in hand, we can distinguish the two classes of methods:

Full Space methods solve the KKT system directly, optimizing over all $2n + m$ variables, while Reduced Space methods solve the first and third blocks of equations before tackling the second. In other words, they eliminate $\mathbf{u}$ and $\boldsymbol{\lambda}$ from the optimization problem by solving the original PDE and the adjoint equation, respectively, for a given value of $\mathbf{d}$.

Full Space

Applying a Newton step to the KKT system gives a $(2n + m) \times (2n + m)$ matrix system for the updates to $\mathbf{u}$, $\mathbf{d}$, and $\boldsymbol{\lambda}$. This system is generally too large for a direct solve, and its ill-conditioning hampers Iterative Linear Solvers as well. Full space methods therefore generally require specialized Preconditioners and may be slow to converge.
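As a hedged sketch of what one full space step looks like, the snippet below (same illustrative toy problem as above) stacks $\mathbf{z} = [\mathbf{u}, \mathbf{d}, \boldsymbol{\lambda}]$ and applies a single Newton step to $\nabla \mathcal{L}(\mathbf{z}) = 0$. At this size a dense direct solve is trivial; at PDE scale, the KKT matrix $K$ is exactly where preconditioned iterative solvers come in.

```python
import jax
import jax.numpy as jnp

n, m = 5, 2
A = 2.0 * jnp.eye(n) - jnp.eye(n, k=1) - jnp.eye(n, k=-1)
B = jax.random.normal(jax.random.PRNGKey(0), (n, m))
u_target, alpha = jnp.ones(n), 1e-2

def lagrangian(z):
    u, d, lam = z[:n], z[n:n + m], z[n + m:]
    J = 0.5 * jnp.sum((u - u_target) ** 2) + 0.5 * alpha * jnp.sum(d ** 2)
    return J + lam @ (A @ u - B @ d)

z = jnp.zeros(2 * n + m)
grad = jax.grad(lagrangian)(z)          # stacked KKT residual
K = jax.hessian(lagrangian)(z)          # the (2n+m) x (2n+m) KKT matrix
z = z - jnp.linalg.solve(K, grad)       # one Newton step; exact here because
                                        # J is quadratic and c is linear
u, d = z[:n], z[n:n + m]
print(jnp.linalg.norm(A @ u - B @ d))   # PDE residual ~ 0 at the solution
```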

Reduced Space

We can rewrite the optimization such that $\mathbf{u}$ depends on $\mathbf{d}$, which happens implicitly through $\mathbf{c}(\mathbf{u}, \mathbf{d}) = 0$. Then our problem is

$$\underset{\mathbf{d}}{\arg\min}\; \mathcal{J}(\mathbf{d}),$$

where $\mathcal{J}(\mathbf{d}) = J(\mathbf{u}(\mathbf{d}), \mathbf{d})$, and $\mathbf{c}(\mathbf{u}(\mathbf{d}), \mathbf{d}) = 0$ by construction. This formulation depends more visibly on the well-posedness of the forward problem, as we must invert $\mathbf{c}(\cdot, \mathbf{d})$ to get $\mathbf{u}(\mathbf{d})$. With this simplification we can either reduce the original KKT system or apply first-order optimality again; both routes give the same result,

$$\partial_{\mathbf{d}} \mathcal{J} = \partial_{\mathbf{d}} J + [\partial_{\mathbf{d}} \mathbf{c}]^T \boldsymbol{\lambda} = 0.$$

Note also that $\boldsymbol{\lambda} = \boldsymbol{\lambda}(\mathbf{d})$, implicitly through the adjoint equation. These methods require full forward (and adjoint) solves at every iterate, even far from the optimum. And because $\mathbf{c}(\cdot, \mathbf{d})$ is inverted (equivalently, $\mathbf{c}(\mathbf{u}, \mathbf{d}) = 0$ is enforced exactly), the reduced problem may be more nonlinear in $\mathbf{d}$ than the full-space formulation.
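Below is a sketch of the reduced space gradient on the same illustrative toy problem: a forward solve for $\mathbf{u}(\mathbf{d})$, an adjoint solve for $\boldsymbol{\lambda}(\mathbf{d})$, and then the assembly of $\partial_{\mathbf{d}} J + [\partial_{\mathbf{d}} \mathbf{c}]^T \boldsymbol{\lambda}$, cross-checked against differentiating the reduced objective directly.

```python
import jax
import jax.numpy as jnp

n, m = 5, 2
A = 2.0 * jnp.eye(n) - jnp.eye(n, k=1) - jnp.eye(n, k=-1)
B = jax.random.normal(jax.random.PRNGKey(0), (n, m))
u_target, alpha = jnp.ones(n), 1e-2

def reduced_grad(d):
    u = jnp.linalg.solve(A, B @ d)                  # forward solve: c(u, d) = 0
    lam = jnp.linalg.solve(A.T, -(u - u_target))    # adjoint: [dc/du]^T lam = -dJ/du
    return alpha * d - B.T @ lam                    # dJ/dd + [dc/dd]^T lam, with dc/dd = -B

def J_reduced(d):                                   # the reduced objective J(u(d), d)
    u = jnp.linalg.solve(A, B @ d)
    return 0.5 * jnp.sum((u - u_target) ** 2) + 0.5 * alpha * jnp.sum(d ** 2)

d = jnp.ones(m)
print(reduced_grad(d))
print(jax.grad(J_reduced)(d))                       # should match the adjoint result
```

Note that the adjoint solve uses $A^T$, and each design iterate costs one forward plus one adjoint solve, independent of $m$.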