Coordinate Descent

Main Idea

In Unconstrained Optimization, coordinate descent can be viewed as applying a Preconditioner to Gradient Descent (see Newton's Method). The preconditioner restricts the descent direction to a single coordinate, i.e., one entry of the vector being optimized.

Consider

$$\min_x f(x).$$

Gradient descent iterates as

$$x_{k+1} = x_k - \eta \nabla f(x_k).$$
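
For concreteness, here is a minimal NumPy sketch of this iteration; the quadratic $f(x) = \tfrac{1}{2} x^\top A x - b^\top x$ (with $\nabla f(x) = Ax - b$), the step size, and the iteration count are assumptions chosen for illustration, not part of the original note.

```python
import numpy as np

# Assumed test problem: f(x) = 0.5 x^T A x - b^T x, so grad f(x) = A x - b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad_f = lambda x: A @ x - b

eta = 0.2                    # fixed step size (assumed)
x = np.zeros(2)
for _ in range(200):
    x = x - eta * grad_f(x)  # x_{k+1} = x_k - eta * grad f(x_k)
print(x)                     # approaches the minimizer, i.e. the solution of A x = b
```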

Preconditioning with a matrix $T_k$ (ideally the inverse of the Hessian) gives

$$x_{k+1} = x_k - \eta T_k \nabla f(x_k).$$
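
On the same assumed quadratic, a sketch of one preconditioned step with $T_k$ taken to be the exact inverse Hessian $A^{-1}$ (constant here, since $f$ is quadratic), before $T_k$ is specialized below:

```python
import numpy as np

# Same assumed quadratic as above: grad f(x) = A x - b, Hessian = A.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad_f = lambda x: A @ x - b

T = np.linalg.inv(A)             # ideal preconditioner: inverse Hessian
x = np.zeros(2)
x = x - 1.0 * T @ grad_f(x)      # one step with eta = 1
print(x, np.allclose(A @ x, b))  # on a quadratic, a single such step reaches the minimizer
```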

$T_k$ is chosen so that

$$T_k \nabla f(x_k) = [\nabla f(x_k)]_i \, e_i,$$

which is achieved by $T_k = e_i e_i^\top$, where $e_i$ is the $i$-th standard basis vector: $e_i e_i^\top \nabla f(x_k) = e_i \left( e_i^\top \nabla f(x_k) \right) = [\nabla f(x_k)]_i \, e_i$.
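
Forming $T_k = e_i e_i^\top$ explicitly makes the connection concrete: the preconditioned update changes only the $i$-th entry of $x$. A minimal sketch of cyclic coordinate descent on the same assumed quadratic (the step size and coordinate schedule are again assumptions):

```python
import numpy as np

# Same assumed quadratic as above: grad f(x) = A x - b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad_f = lambda x: A @ x - b

n, eta = 2, 0.3
x = np.zeros(n)
for k in range(100):
    i = k % n                      # cycle through the coordinates
    e_i = np.zeros(n)
    e_i[i] = 1.0
    T_k = np.outer(e_i, e_i)       # T_k = e_i e_i^T
    x = x - eta * T_k @ grad_f(x)  # only entry i of x changes
print(x, np.allclose(A @ x, b))
```

In practice $T_k$ is never formed: the update reduces to `x[i] -= eta * grad_f(x)[i]`, and one would compute only the $i$-th partial derivative rather than the full gradient.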