Reduced Space vs. Full Space Optimization

Main Idea

Within PDE Constrained Optimization there are broadly two classes of methods: full and reduced space methods. For a state vector $\mathbf{u}$, design/optimization variables $\mathbf{d}$, an objective function $J(\mathbf{u}, \mathbf{d})$, and a PDE residual $\mathbf{c}(\mathbf{u}, \mathbf{d})$, the general problem is

$$\underset{\mathbf{u},\,\mathbf{d}}{\arg\min}\; J(\mathbf{u}, \mathbf{d}), \quad \text{s.t.} \quad \mathbf{c}(\mathbf{u}, \mathbf{d}) = 0.$$

The PDE problem is well posed: given $\mathbf{d}$, we can uniquely determine $\mathbf{u}$ from $\mathbf{c}$. However, multiple $\mathbf{d}$ may produce the same value of $J$, so the optimization problem itself can be ill-posed. In the discretized case, we have $\mathbf{u} \in \mathbb{R}^n$, $\mathbf{d} \in \mathbb{R}^m$, $\mathbf{c} \in \mathbb{R}^n$, and $J(\mathbf{u}, \mathbf{d}) \in \mathbb{R}$. Writing the adjoint variables or Lagrange multipliers as $\boldsymbol{\lambda} \in \mathbb{R}^n$, the Lagrangian is

$$\mathcal{L}(\mathbf{u}, \mathbf{d}, \boldsymbol{\lambda}) = J(\mathbf{u}, \mathbf{d}) + \boldsymbol{\lambda}^T \mathbf{c}(\mathbf{u}, \mathbf{d}),$$

and the first-order optimality conditions (KKT Conditions) require that its gradient vanishes, implying

$$\begin{bmatrix} \partial_{\mathbf{u}} \mathcal{L} \\ \partial_{\mathbf{d}} \mathcal{L} \\ \partial_{\boldsymbol{\lambda}} \mathcal{L} \end{bmatrix} = \begin{bmatrix} \partial_{\mathbf{u}} J + [\partial_{\mathbf{u}} \mathbf{c}]^T \boldsymbol{\lambda} \\ \partial_{\mathbf{d}} J + [\partial_{\mathbf{d}} \mathbf{c}]^T \boldsymbol{\lambda} \\ \mathbf{c} \end{bmatrix} = 0.$$
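To make the pieces concrete, here is a minimal sketch (not from the original note) on a toy discretized problem: a linear "PDE" residual $\mathbf{c}(\mathbf{u}, \mathbf{d}) = A\mathbf{u} - B\mathbf{d}$ and a quadratic objective. The names `A`, `B`, `u_target`, and `alpha` are illustrative assumptions; the point is that differentiating the Lagrangian reproduces the three KKT blocks above.

```python
import jax
import jax.numpy as jnp

n, m = 5, 2                                                  # state and design dimensions
A = 2.0 * jnp.eye(n) - jnp.eye(n, k=1) - jnp.eye(n, k=-1)    # 1D Laplacian stencil
B = jax.random.normal(jax.random.PRNGKey(0), (n, m))         # how d enters the PDE
u_target, alpha = jnp.ones(n), 1e-2                          # illustrative target and penalty

def J(u, d):
    return 0.5 * jnp.sum((u - u_target) ** 2) + 0.5 * alpha * jnp.sum(d ** 2)

def c(u, d):
    return A @ u - B @ d                          # PDE residual, c(u, d) = 0

def L(u, d, lam):
    return J(u, d) + lam @ c(u, d)                # the Lagrangian

u, d, lam = jnp.zeros(n), jnp.zeros(m), jnp.zeros(n)
dLdu, dLdd, dLdlam = jax.grad(L, argnums=(0, 1, 2))(u, d, lam)
# dLdu   = dJ/du + [dc/du]^T lam   (adjoint equation when set to zero)
# dLdd   = dJ/dd + [dc/dd]^T lam   (design equation)
# dLdlam = c(u, d)                 (the original PDE)
print(dLdu, dLdd, dLdlam)
```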

With this system in hand, we can distinguish the two classes of methods:

Full Space methods solve the KKT system directly, optimizing over all $2n + m$ variables, while Reduced Space methods solve the first and third blocks of equations before tackling the second. In other words, they eliminate $\mathbf{u}$ and $\boldsymbol{\lambda}$ from the optimization problem by solving the original PDE and the adjoint equation, respectively, for a given value of $\mathbf{d}$.

Full Space

Applying a Newton step to the KKT system gives a $(2n + m) \times (2n + m)$ matrix system for the updates to $\mathbf{u}$, $\mathbf{d}$, and $\boldsymbol{\lambda}$. This system is generally too large for a direct solve, and its ill-conditioning hampers Iterative Linear Solvers as well. Full space methods therefore generally require specialized Preconditioners and may be slow to converge.
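As a hedged sketch of what one full space step looks like, the snippet below (same illustrative toy problem as above) stacks $\mathbf{z} = [\mathbf{u}, \mathbf{d}, \boldsymbol{\lambda}]$ and applies a single Newton step to $\nabla \mathcal{L}(\mathbf{z}) = 0$. At this size a dense direct solve is trivial; at PDE scale, the KKT matrix $K$ is exactly where preconditioned iterative solvers come in.

```python
import jax
import jax.numpy as jnp

n, m = 5, 2
A = 2.0 * jnp.eye(n) - jnp.eye(n, k=1) - jnp.eye(n, k=-1)
B = jax.random.normal(jax.random.PRNGKey(0), (n, m))
u_target, alpha = jnp.ones(n), 1e-2

def lagrangian(z):
    u, d, lam = z[:n], z[n:n + m], z[n + m:]
    J = 0.5 * jnp.sum((u - u_target) ** 2) + 0.5 * alpha * jnp.sum(d ** 2)
    return J + lam @ (A @ u - B @ d)

z = jnp.zeros(2 * n + m)
grad = jax.grad(lagrangian)(z)          # stacked KKT residual
K = jax.hessian(lagrangian)(z)          # the (2n+m) x (2n+m) KKT matrix
z = z - jnp.linalg.solve(K, grad)       # one Newton step; exact here because
                                        # J is quadratic and c is linear
u, d = z[:n], z[n:n + m]
print(jnp.linalg.norm(A @ u - B @ d))   # PDE residual ~ 0 at the solution
```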

Reduced Space

We can rewrite the optimization such that $\mathbf{u}$ depends on $\mathbf{d}$, which happens implicitly through $\mathbf{c}(\mathbf{u}, \mathbf{d}) = 0$. Then our problem is

$$\underset{\mathbf{d}}{\arg\min}\; \mathcal{J}(\mathbf{d}),$$

where $\mathcal{J}(\mathbf{d}) = J(\mathbf{u}(\mathbf{d}), \mathbf{d})$, and $\mathbf{c}(\mathbf{u}(\mathbf{d}), \mathbf{d}) = 0$ by construction. This formulation depends more visibly on the well-posedness of the forward problem, as we must invert $\mathbf{c}(\cdot, \mathbf{d})$ to get $\mathbf{u}(\mathbf{d})$. With this simplification we can either reduce the original KKT system or apply first-order optimality again; both routes give the same result,

$$\partial_{\mathbf{d}} \mathcal{J} = \partial_{\mathbf{d}} J + [\partial_{\mathbf{d}} \mathbf{c}]^T \boldsymbol{\lambda} = 0.$$

Note also that $\boldsymbol{\lambda} = \boldsymbol{\lambda}(\mathbf{d})$, implicitly through the adjoint equation. These methods require full forward (and adjoint) solves at every iterate, even far from the optimum. And because $\mathbf{c}(\cdot, \mathbf{d})$ is inverted (equivalently, $\mathbf{c}(\mathbf{u}, \mathbf{d}) = 0$ is enforced exactly), the reduced problem may be more nonlinear in $\mathbf{d}$ than the full-space formulation.
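Below is a sketch of the reduced space gradient on the same illustrative toy problem: a forward solve for $\mathbf{u}(\mathbf{d})$, an adjoint solve for $\boldsymbol{\lambda}(\mathbf{d})$, and then the assembly of $\partial_{\mathbf{d}} J + [\partial_{\mathbf{d}} \mathbf{c}]^T \boldsymbol{\lambda}$, cross-checked against differentiating the reduced objective directly.

```python
import jax
import jax.numpy as jnp

n, m = 5, 2
A = 2.0 * jnp.eye(n) - jnp.eye(n, k=1) - jnp.eye(n, k=-1)
B = jax.random.normal(jax.random.PRNGKey(0), (n, m))
u_target, alpha = jnp.ones(n), 1e-2

def reduced_grad(d):
    u = jnp.linalg.solve(A, B @ d)                  # forward solve: c(u, d) = 0
    lam = jnp.linalg.solve(A.T, -(u - u_target))    # adjoint: [dc/du]^T lam = -dJ/du
    return alpha * d - B.T @ lam                    # dJ/dd + [dc/dd]^T lam, with dc/dd = -B

def J_reduced(d):                                   # the reduced objective J(u(d), d)
    u = jnp.linalg.solve(A, B @ d)
    return 0.5 * jnp.sum((u - u_target) ** 2) + 0.5 * alpha * jnp.sum(d ** 2)

d = jnp.ones(m)
print(reduced_grad(d))
print(jax.grad(J_reduced)(d))                       # should match the adjoint result
```

Note that the adjoint solve uses $A^T$, and each design iterate costs one forward plus one adjoint solve, independent of $m$.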