Augmented Lagrangian Method

Resources

Main Idea

To solve a Constrained Optimization problem, solve a sequence of subproblems which add a penalty term and a Lagrange multiplier term to the objective function. The penalty parameter may still increase, but because the Lagrange multipliers are approximated along the way, it does not need to increase as much. Similarly, the (convex) quadratic penalty term helps keep the subproblems well-behaved even if the original problem is non-convex.

Equality Constrained

Consider the equality constrained problem below.

$$\min_x f(x) \quad \text{subject to} \quad g(x) = 0$$

The Augmented Lagrangian is given as

$$\mathcal{L}(x, \lambda, \mu) = f(x) + \langle \lambda, g(x) \rangle + \frac{\mu}{2} \|g(x)\|_2^2.$$

This resembles both the Lagrangian and the objective function from a quadratic Penalty Method. The unconstrained minimization of this yields the first-order optimality condition (as a vector equation):

$$\nabla_x \mathcal{L}(x, \lambda, \mu) = \nabla_x f(x) + \lambda^\top \nabla_x g + \mu\, g^\top \nabla_x g = 0,$$

or equivalently,

$$\nabla_x \mathcal{L}(x, \lambda, \mu) = \nabla_x f + (\lambda + \mu g)^\top \nabla_x g = 0.$$

Yet, the stationarity of the Lagrangian of the original problem requires

$$\nabla_x f + (\lambda^*)^\top \nabla_x g = 0.$$

To make these two optimality conditions equivalent, we need that $\lambda + \mu g = \lambda^*$ (assuming that the Jacobian $\nabla_x g$ is full rank). Thus, we update $\lambda$ accordingly,

$$\lambda^* \approx \lambda^{k+1} = \lambda^k + \mu g.$$

This also gives

$$g = \frac{\lambda^* - \lambda^k}{\mu},$$

as opposed to in the Quadratic Penalty Method where

$$g = \frac{\lambda^*}{\mu}.$$

Thus, we satisfy $g = 0$ more closely: both methods achieve this asymptotically as $\mu \to \infty$, but the augmented Lagrangian can do so faster, or even with bounded $\mu$, depending on the convergence of $\lambda^k$ to $\lambda^*$.
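As a concrete sketch, the loop below applies this scheme to a hypothetical toy problem, $\min x_0^2 + x_1^2$ subject to $x_0 + x_1 - 1 = 0$, whose solution is $x = (0.5, 0.5)$ with $\lambda^* = -1$. The inner solver (plain gradient descent), step size, and the choice to hold $\mu$ fixed are illustrative assumptions, not part of the method itself; note that the multiplier update alone drives $g \to 0$ here.

```python
import numpy as np

# Hypothetical toy problem:
#   min f(x) = x0^2 + x1^2   s.t.   g(x) = x0 + x1 - 1 = 0,
# whose solution is x = (0.5, 0.5) with multiplier lambda* = -1.

def g(x):
    return np.array([x[0] + x[1] - 1.0])

def grad_aug_lag(x, lam, mu):
    # Gradient of L = f + <lam, g> + (mu/2)||g||^2 in x;
    # here grad f = 2x and the Jacobian of g is the row vector [1, 1].
    return 2.0 * x + (lam + mu * g(x)) * np.ones(2)

x, lam, mu = np.zeros(2), np.zeros(1), 10.0
for k in range(15):
    # Approximately minimize L(., lam_k, mu) by gradient descent.
    for _ in range(500):
        x = x - 0.05 * grad_aug_lag(x, lam, mu)
    # Multiplier update: lam_{k+1} = lam_k + mu * g(x).
    lam = lam + mu * g(x)
    # mu is held fixed here: the multiplier update does the work.
```

With a quadratic penalty alone (no `lam` update), driving `g` to this accuracy would require taking $\mu \to \infty$.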

Inequality Constrained

If we instead have the more general inequality constrained optimization problem given below, there are further nuances.

$$\min_x f(x) \quad \text{subject to} \quad h(x) \le 0$$

By introducing a vector of slack variables s, we replace the inequality constraints with equality constraints,

$$\min_{x, s} f(x) \quad \text{subject to} \quad h(x) + s = 0, \quad s \ge 0.$$

Taking the new design variable as $\tilde{x} = [x, s]^\top$ and defining $\tilde{h}$ similarly, we then pose the problem with box constraints to incorporate the positivity requirement on $s$,

$$\min_{\tilde{x}} f(\tilde{x}) \quad \text{subject to} \quad \tilde{h}(\tilde{x}) = 0, \quad l \le \tilde{x} \le u.$$

The bound-constrained Lagrangian approach takes only the equality constraints (originally from $h$) into the augmented Lagrangian. The box constraints are then handled explicitly in the subproblem,

$$\min_{\tilde{x}} f(\tilde{x}) + \langle \lambda, \tilde{h}(\tilde{x}) \rangle + \frac{\mu}{2} \|\tilde{h}(\tilde{x})\|_2^2 \quad \text{subject to} \quad l \le \tilde{x} \le u.$$

This formulation is very similar to the equality constrained case, except that the subproblem must now be solved subject to the box constraints. One option for solving the subproblem is projected gradient descent; the older package LANCELOT and the newer GALAHAD library use this approach.
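A minimal sketch of the bound-constrained approach on a hypothetical toy problem, $\min (x-2)^2$ subject to $x - 1 \le 0$ (solution $x = 1$, $s = 0$, $\lambda^* = 2$). The step sizes, iteration counts, and fixed $\mu$ are illustrative assumptions; the key point is that the box constraint on the slack is enforced by a componentwise clip inside the subproblem solver.

```python
import numpy as np

# Hypothetical toy problem: min f(x) = (x - 2)^2  s.t.  h(x) = x - 1 <= 0.
# With slack s >= 0 this becomes h(x) + s = 0; the solution is
# x = 1, s = 0, with multiplier lambda* = 2.

def grad_aug_lag(xt, lam, mu):
    x, s = xt
    c = (x - 1.0) + s                     # equality constraint h(x) + s
    # Gradient of f + lam*c + (mu/2)*c^2 in (x, s); dc/dx = dc/ds = 1.
    return np.array([2.0 * (x - 2.0) + lam + mu * c,
                     lam + mu * c])

def project(xt, lower, upper):
    # Projection onto a box is a componentwise clip.
    return np.clip(xt, lower, upper)

lower = np.array([-np.inf, 0.0])          # only the slack is bounded below
upper = np.array([np.inf, np.inf])

xt, lam, mu = np.array([0.0, 1.0]), 0.0, 10.0
for k in range(25):
    # Approximately solve the box-constrained subproblem by
    # projected gradient descent.
    for _ in range(500):
        xt = project(xt - 0.02 * grad_aug_lag(xt, lam, mu), lower, upper)
    lam += mu * ((xt[0] - 1.0) + xt[1])   # multiplier update
```

Production solvers like LANCELOT use far more sophisticated subproblem solvers (e.g. trust regions with active-set identification), but the projection step plays the same role as the clip above.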

The unconstrained formulation is based on the so-called proximal point approach. The optimization problem can be rewritten as

$$\min_x \underbrace{\max_{\lambda \ge 0} f(x) + \langle \lambda, h(x) \rangle}_{F(x)}.$$

If any component satisfies $h_i(x) > 0$, then the corresponding component of $\lambda$ is driven to $\infty$, making $F(x) = \infty$. Otherwise, if all $h_i(x) < 0$, the maximizing $\lambda = 0$ and $F(x) = f(x)$ (which also happens when $h_i(x) = 0$). However, this $F(x)$ is not smooth, having a jump along $h(x) = 0$. We instead approximate $F(x)$ with

$$\hat{F}(x; \lambda^k, \mu_k) = \max_{\lambda \ge 0} f(x) + \langle \lambda, h(x) \rangle - \frac{1}{2\mu_k} \|\lambda - \lambda^k\|_2^2.$$

In this formulation, $\lambda$ is encouraged to be close to $\lambda^k$, or proximal.
We can rewrite this over individual components of $\lambda$, as the problem is separable, and then solve each component through the second-derivative test.

$$\arg\max_{\lambda_i \ge 0}\; \lambda_i h_i(x) - \frac{1}{2\mu_k} \left(\lambda_i - \lambda_i^k\right)^2 = \begin{cases} \lambda_i^k + \mu_k h_i(x) & \text{if } \frac{\lambda_i^k}{\mu_k} + h_i(x) \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$

For $\mu_k > 0$, we can rewrite this entirely as

$$\lambda^* = \mathrm{ReLU}\left(\lambda^k + \mu_k h(x)\right).$$

Finally, we plug this maximizer into the expression for $\hat{F}$, minimize over $x$, and then update $\lambda^k$ and $\mu_k$ accordingly.
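The resulting loop can be sketched on the same kind of hypothetical toy problem, $\min (x-2)^2$ subject to $x - 1 \le 0$ (solution $x = 1$, $\lambda^* = 2$). Since $\hat{F}$ is a maximum over $\lambda$, its gradient in $x$ is obtained by plugging the inner maximizer into the Lagrangian gradient (Danskin's theorem); the step size, iteration counts, and fixed $\mu_k$ are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy problem: min f(x) = (x - 2)^2  s.t.  h(x) = x - 1 <= 0,
# whose solution is x = 1 with multiplier lambda* = 2.

def relu(z):
    return np.maximum(0.0, z)

def grad_F_hat(x, lam_k, mu_k):
    # Gradient of F_hat in x: the inner maximizer relu(lam_k + mu_k h(x))
    # plays the role of lambda in grad f + lambda * grad h.
    h = x - 1.0
    return 2.0 * (x - 2.0) + relu(lam_k + mu_k * h)

x, lam, mu = 0.0, 0.0, 10.0
for k in range(25):
    # Minimize F_hat(.; lam_k, mu_k) by gradient descent.
    for _ in range(500):
        x -= 0.02 * grad_F_hat(x, lam, mu)
    # The multiplier update is exactly the inner maximizer:
    lam = relu(lam + mu * (x - 1.0))
```

Note that the multiplier update here is the ReLU expression derived above, so inactive constraints automatically get $\lambda_i = 0$ without any explicit active-set logic.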