Augmented Lagrangian Method

Resources

Main Idea

To solve a Constrained Optimization problem, solve a sequence of subproblems which add a penalty term and a Lagrange multiplier term to the objective function. The penalty parameter may still increase, but because the Lagrange multipliers are approximated along the way, it does not need to increase as much. Similarly, the (convex) quadratic penalty term helps keep the subproblems well-behaved even if the original problem is non-convex.

Equality Constrained

Consider the equality constrained problem below.

$$\min_x f(x) \quad \text{subject to} \quad g(x) = 0$$

The Augmented Lagrangian is given as

$$\mathcal{L}(x, \lambda, \mu) = f(x) + \langle \lambda, g(x) \rangle + \frac{\mu}{2} \|g(x)\|_2^2.$$

This resembles both the Lagrangian and the objective function from a quadratic Penalty Method. The unconstrained minimization of this yields the first-order optimality condition (as a vector equation):

$$\nabla_x \mathcal{L}(x, \lambda, \mu) = \nabla_x f(x) + \lambda^\top \nabla_x g + \mu\, g^\top \nabla_x g = 0,$$

or equivalently,

$$\nabla_x \mathcal{L}(x, \lambda, \mu) = \nabla_x f + (\lambda + \mu g)^\top \nabla_x g = 0.$$

Yet, the stationarity of the Lagrangian of the original problem requires

$$\nabla_x f + (\lambda^*)^\top \nabla_x g = 0.$$

To make these two optimality conditions equivalent, we need that $\lambda + \mu g = \lambda^*$ (assuming that the Jacobian $\nabla_x g$ is full rank). Thus, we update $\lambda$ accordingly,

$$\lambda^* \approx \lambda^{k+1} = \lambda^k + \mu g.$$

This also gives

$$g = \frac{\lambda^* - \lambda^k}{\mu},$$

as opposed to in the Quadratic Penalty Method where

$$g = \frac{\lambda^*}{\mu}.$$

Thus, we satisfy $g = 0$ more closely: both methods achieve this asymptotically as $\mu \to \infty$, but the augmented Lagrangian can do so faster, or even with bounded $\mu$, depending on the convergence of $\lambda^k$ to $\lambda^*$.
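As a concrete sketch, the loop below applies this scheme to a hypothetical toy problem, $\min x_0^2 + x_1^2$ subject to $x_0 + x_1 - 1 = 0$, whose solution is $x = (0.5, 0.5)$ with $\lambda^* = -1$. The inner solver (plain gradient descent), step size, and the choice to hold $\mu$ fixed are illustrative assumptions, not part of the method itself; note that the multiplier update alone drives $g \to 0$ here.

```python
import numpy as np

# Hypothetical toy problem:
#   min f(x) = x0^2 + x1^2   s.t.   g(x) = x0 + x1 - 1 = 0,
# whose solution is x = (0.5, 0.5) with multiplier lambda* = -1.

def g(x):
    return np.array([x[0] + x[1] - 1.0])

def grad_aug_lag(x, lam, mu):
    # Gradient of L = f + <lam, g> + (mu/2)||g||^2 in x;
    # here grad f = 2x and the Jacobian of g is the row vector [1, 1].
    return 2.0 * x + (lam + mu * g(x)) * np.ones(2)

x, lam, mu = np.zeros(2), np.zeros(1), 10.0
for k in range(15):
    # Approximately minimize L(., lam_k, mu) by gradient descent.
    for _ in range(500):
        x = x - 0.05 * grad_aug_lag(x, lam, mu)
    # Multiplier update: lam_{k+1} = lam_k + mu * g(x).
    lam = lam + mu * g(x)
    # mu is held fixed here: the multiplier update does the work.
```

With a quadratic penalty alone (no `lam` update), driving `g` to this accuracy would require taking $\mu \to \infty$.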

Inequality Constrained

If we instead have the more general inequality constrained optimization problem given below, there are further nuances.

$$\min_x f(x) \quad \text{subject to} \quad h(x) \le 0$$

By introducing a vector of slack variables s, we replace the inequality constraints with equality constraints,

$$\min_{x, s} f(x) \quad \text{subject to} \quad h(x) + s = 0, \quad s \ge 0.$$

Taking the new design variable as $\tilde{x} = [x, s]^\top$ and defining $\tilde{h}$ similarly, we then pose the problem with box constraints to incorporate the positivity requirement on $s$,

$$\min_{\tilde{x}} f(\tilde{x}) \quad \text{subject to} \quad \tilde{h}(\tilde{x}) = 0, \quad l \le \tilde{x} \le u.$$

The bound-constrained Lagrangian approach takes only the equality constraints (originally from $h$) into the augmented Lagrangian. The box constraints are then handled explicitly in the subproblem,

$$\min_{\tilde{x}} f(\tilde{x}) + \langle \lambda, \tilde{h}(\tilde{x}) \rangle + \frac{\mu}{2} \|\tilde{h}(\tilde{x})\|_2^2 \quad \text{subject to} \quad l \le \tilde{x} \le u.$$

This formulation is very similar to the equality constrained case, except that the subproblem must now be solved subject to the box constraints. One option for solving the subproblem is projected gradient descent; the older package LANCELOT and the newer GALAHAD library use this approach.
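A minimal sketch of the bound-constrained approach on a hypothetical toy problem, $\min (x-2)^2$ subject to $x - 1 \le 0$ (solution $x = 1$, $s = 0$, $\lambda^* = 2$). The step sizes, iteration counts, and fixed $\mu$ are illustrative assumptions; the key point is that the box constraint on the slack is enforced by a componentwise clip inside the subproblem solver.

```python
import numpy as np

# Hypothetical toy problem: min f(x) = (x - 2)^2  s.t.  h(x) = x - 1 <= 0.
# With slack s >= 0 this becomes h(x) + s = 0; the solution is
# x = 1, s = 0, with multiplier lambda* = 2.

def grad_aug_lag(xt, lam, mu):
    x, s = xt
    c = (x - 1.0) + s                     # equality constraint h(x) + s
    # Gradient of f + lam*c + (mu/2)*c^2 in (x, s); dc/dx = dc/ds = 1.
    return np.array([2.0 * (x - 2.0) + lam + mu * c,
                     lam + mu * c])

def project(xt, lower, upper):
    # Projection onto a box is a componentwise clip.
    return np.clip(xt, lower, upper)

lower = np.array([-np.inf, 0.0])          # only the slack is bounded below
upper = np.array([np.inf, np.inf])

xt, lam, mu = np.array([0.0, 1.0]), 0.0, 10.0
for k in range(25):
    # Approximately solve the box-constrained subproblem by
    # projected gradient descent.
    for _ in range(500):
        xt = project(xt - 0.02 * grad_aug_lag(xt, lam, mu), lower, upper)
    lam += mu * ((xt[0] - 1.0) + xt[1])   # multiplier update
```

Production solvers like LANCELOT use far more sophisticated subproblem solvers (e.g. trust regions with active-set identification), but the projection step plays the same role as the clip above.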

The unconstrained formulation is based on the so-called proximal point approach. The optimization problem can be rewritten as

$$\min_x \underbrace{\max_{\lambda \ge 0} f(x) + \langle \lambda, h(x) \rangle}_{F(x)}.$$

If any component satisfies $h_i(x) > 0$, then the corresponding component of $\lambda$ is driven to $\infty$, making $F(x) = \infty$. Otherwise, if all $h_i(x) < 0$, the maximizing $\lambda = 0$ and $F(x) = f(x)$ (which also happens when $h_i(x) = 0$). However, this $F(x)$ is not smooth, having a jump along $h(x) = 0$. We instead approximate $F(x)$ with

$$\hat{F}(x; \lambda^k, \mu_k) = \max_{\lambda \ge 0} f(x) + \langle \lambda, h(x) \rangle - \frac{1}{2\mu_k} \|\lambda - \lambda^k\|_2^2.$$

In this formulation, $\lambda$ is encouraged to be close to $\lambda^k$, or proximal.
We can rewrite this over individual components of $\lambda$, as the problem is separable, and then solve each component through the second-derivative test.

$$\arg\max_{\lambda_i \ge 0}\; \lambda_i h_i(x) - \frac{1}{2\mu_k} \left(\lambda_i - \lambda_i^k\right)^2 = \begin{cases} \lambda_i^k + \mu_k h_i(x) & \text{if } \frac{\lambda_i^k}{\mu_k} + h_i(x) \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$

For $\mu_k > 0$, we can rewrite this entirely as

$$\lambda^* = \mathrm{ReLU}\left(\lambda^k + \mu_k h(x)\right).$$

Finally, we plug this maximizer into the expression for $\hat{F}$, minimize over $x$, and then update $\lambda^k$ and $\mu_k$ accordingly.
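The resulting loop can be sketched on the same kind of hypothetical toy problem, $\min (x-2)^2$ subject to $x - 1 \le 0$ (solution $x = 1$, $\lambda^* = 2$). Since $\hat{F}$ is a maximum over $\lambda$, its gradient in $x$ is obtained by plugging the inner maximizer into the Lagrangian gradient (Danskin's theorem); the step size, iteration counts, and fixed $\mu_k$ are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy problem: min f(x) = (x - 2)^2  s.t.  h(x) = x - 1 <= 0,
# whose solution is x = 1 with multiplier lambda* = 2.

def relu(z):
    return np.maximum(0.0, z)

def grad_F_hat(x, lam_k, mu_k):
    # Gradient of F_hat in x: the inner maximizer relu(lam_k + mu_k h(x))
    # plays the role of lambda in grad f + lambda * grad h.
    h = x - 1.0
    return 2.0 * (x - 2.0) + relu(lam_k + mu_k * h)

x, lam, mu = 0.0, 0.0, 10.0
for k in range(25):
    # Minimize F_hat(.; lam_k, mu_k) by gradient descent.
    for _ in range(500):
        x -= 0.02 * grad_F_hat(x, lam, mu)
    # The multiplier update is exactly the inner maximizer:
    lam = relu(lam + mu * (x - 1.0))
```

Note that the multiplier update here is the ReLU expression derived above, so inactive constraints automatically get $\lambda_i = 0$ without any explicit active-set logic.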