Within Optimization, there are multiple objective functions, and the goal is formulated as

$$\min_{x \in X} \; \big( f_1(x), f_2(x), \ldots, f_m(x) \big)$$

with $m \ge 2$, and $X$ is the feasible set, typically $X \subseteq \mathbb{R}^n$ (which can incorporate constraints), and $f_i : X \to \mathbb{R}$. Denote $f = (f_1, \ldots, f_m)^\top : X \to \mathbb{R}^m$.
Details
A feasible solution $x^1 \in X$ is said to Pareto dominate another solution $x^2 \in X$, if

$$f_i(x^1) \le f_i(x^2) \;\; \forall i \in \{1, \ldots, m\} \quad \text{and} \quad \exists j : f_j(x^1) < f_j(x^2).$$

That is, $x^1$ dominates $x^2$ if it is at least as good as $x^2$ on every objective function, and strictly better on at least one. A solution $x^* \in X$ is Pareto optimal (or Pareto efficient) if no other solution dominates it:

$$\nexists\, x \in X \text{ such that } x \text{ Pareto dominates } x^*.$$

The Pareto front is $P = \{ f(x) : x \in X, \; x \text{ is Pareto optimal} \}$. This is bounded by the nadir objective vector, $z^{\text{nad}}_i = \sup_{z \in P} z_i$, which is a collection of the worst (greatest) objective values achieved by Pareto-optimal solutions. Conversely, the ideal objective vector $z^{\text{ideal}}_i = \inf_{z \in P} z_i$ is a collection of the best (least) objective values. Thus, for any $z \in P$ (or even, for the lower bound, any $z = f(x)$ with $x \in X$), $z^{\text{ideal}} \preceq z \preceq z^{\text{nad}}$, in a component-wise sense.
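For a finite set of candidate solutions, dominance, the Pareto front, and the nadir and ideal vectors can be computed directly from the definitions. A minimal numpy sketch (the helper names `dominates` and `pareto_front` are illustrative, not from any particular library):

```python
import numpy as np

def dominates(fa, fb):
    """True if objective vector fa Pareto dominates fb (minimization):
    at least as good in every entry, strictly better in at least one."""
    return np.all(fa <= fb) and np.any(fa < fb)

def pareto_front(F):
    """Rows of F are objective vectors f(x); keep the non-dominated rows."""
    keep = [i for i in range(len(F))
            if not any(dominates(F[j], F[i]) for j in range(len(F)) if j != i)]
    return F[keep]

# Three candidate solutions evaluated on m = 2 objectives.
F = np.array([[1.0, 4.0],   # Pareto optimal
              [2.0, 2.0],   # Pareto optimal
              [3.0, 5.0]])  # dominated (e.g. by [2.0, 2.0])
P = pareto_front(F)
nadir = P.max(axis=0)  # worst (greatest) value per objective over the front
ideal = P.min(axis=0)  # best (least) value per objective over the front
```

Here the front consists of the first two rows, with nadir $(2, 4)$ and ideal $(1, 2)$.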
Scalarization
The process of scalarization converts $f$ to a scalar-valued function, which is instead optimized. The global criterion scalarization problem considers

$$\min_{x \in X} \; \big\| f(x) - z^{\text{ideal}} \big\|$$

where the norm can be any Lebesgue space norm, e.g. $\|\cdot\|_1$, $\|\cdot\|_2$, $\|\cdot\|_\infty$. Note that this depends on the scale of the objective functions, so some uniform, dimensionless scale is recommended.
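The choice of norm changes the value (and hence the ranking of candidate solutions), which the following sketch illustrates; `global_criterion` is an illustrative helper, not a library function:

```python
import numpy as np

def global_criterion(f_x, ideal, p=2.0):
    """Distance from the objective vector f(x) to the ideal point in the
    chosen p-norm; p = np.inf gives the Chebyshev (max) variant."""
    return np.linalg.norm(f_x - ideal, ord=p)

f_x = np.array([2.0, 5.0])
ideal = np.array([1.0, 2.0])
v1 = global_criterion(f_x, ideal, p=1)        # |1| + |3| = 4
v2 = global_criterion(f_x, ideal, p=2)        # sqrt(1 + 9)
vinf = global_criterion(f_x, ideal, p=np.inf)  # max(1, 3) = 3
```

Rescaling a single objective (say, measuring $f_2$ in different units) changes all three values, which is why a dimensionless scale is recommended.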
More generally, a scalarization uses a function $g : \mathbb{R}^m \to \mathbb{R}$ (potentially with parameters $\theta$) to convert $f$ to a single-objective problem:

$$\min_{x \in X} \; g\big(f(x); \theta\big).$$

Other methods include hypervolume and Chebyshev scalarization.
Jacobian Descent
Recalling $f : \mathbb{R}^n \to \mathbb{R}^m$, the Jacobian of $f$ is $Jf$ (the transpose of what stacking the gradients as columns would look like). This is an $m \times n$ matrix for a given input $x$, with entries $[Jf(x)]_{ij} = \frac{\partial f_i}{\partial x_j}(x)$, or

$$Jf(x) = \begin{bmatrix} \nabla f_1(x)^\top \\ \vdots \\ \nabla f_m(x)^\top \end{bmatrix}.$$

A Taylor expansion about $x$ with perturbation $\Delta x$ gives

$$f(x + \Delta x) \approx f(x) + Jf(x)\, \Delta x.$$
We define an aggregator function $A : \mathbb{R}^{m \times n} \to \mathbb{R}^n$. In other words, $A$ converts from the Jacobian to a vector of the same size as the gradient, allowing for a similar update rule to that of Gradient Descent, with step size $\eta > 0$,

$$x_{t+1} = x_t - \eta\, A\big(Jf(x_t)\big).$$

For example, when $m = 1$, we take $A$ as the transpose operation and recover gradient descent exactly.
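This update rule and the $m = 1$ reduction can be sketched in numpy; the toy objectives ($f_1(x) = \tfrac12\|x\|^2$, $f_2(x) = \sum_i x_i$) and helper names are illustrative assumptions, and the mean of the gradients is used as one simple aggregator:

```python
import numpy as np

# Toy problem: f1(x) = 0.5*||x||^2 and f2(x) = sum(x), so the rows of the
# Jacobian are the gradients x and the all-ones vector.
def J(x):
    return np.vstack([x, np.ones_like(x)])

def mean_aggregator(Jx):
    """A(J) = J^T (1/m) 1 : average of the gradient rows."""
    return Jx.T @ np.full(Jx.shape[0], 1.0 / Jx.shape[0])

eta, x = 0.1, np.array([1.0, -2.0])
x_new = x - eta * mean_aggregator(J(x))  # Jacobian-descent step

# With m = 1 the Jacobian is a single row, the aggregator reduces to the
# transpose, and we recover plain gradient descent on f1:
g = np.atleast_2d(J(x)[0])
gd_step = x - eta * mean_aggregator(g)
```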
We want to decrease $f$ across all of its entries. Thus, according to the Taylor expansion, we require

$$Jf(x)\, A\big(Jf(x)\big) \succeq 0$$

where $\succeq$ is a relation for the (natural) partial order on $\mathbb{R}^m$ ($u \succeq v \iff u_i \ge v_i$ for all $i$). Take $J_{i:}$ as row $i$ of $J = Jf(x)$ (so $J_{i:} = \nabla f_i(x)^\top$), and $a$ as $A(J)$ (which is just a single column). Since the step is $\Delta x = -\eta\, a$, the first-order change in $f_i$ is $-\eta\, J_{i:}\, a$. If

$$J_{i:}\, a < 0,$$

then the objective function $f_i$ will increase with this step (according to the first-order Taylor expansion). This does not always mean that $x_t$ Pareto dominates $x_{t+1}$: if another objective function decreases at the same time, neither point dominates the other.
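A concrete instance of such a conflict, again using the mean of the gradients as the aggregator (an illustrative choice that is not nonconflicting in general):

```python
import numpy as np

# Two conflicting gradient rows: the mean direction decreases f2 but
# increases f1 to first order, because (J @ a)[0] < 0.
Jx = np.array([[1.0, 0.0],    # grad f1
               [-2.0, 1.0]])  # grad f2
a = Jx.mean(axis=0)           # mean-gradient aggregator, a = [-0.5, 0.5]

# Per-objective first-order change for a unit step x -> x - a:
first_order_change = -Jx @ a  # [0.5, -1.5]: f1 goes up, f2 goes down
```

Since one objective rises while the other falls, neither the old point nor the new point dominates the other.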
Definition
Let $A$ be an aggregator. For any $J \in \mathbb{R}^{m \times n}$, the aggregator $A$ and $J$ are nonconflicting if

$$J\, A(J) \succeq 0.$$
Let $w \in \mathbb{R}^m$ with $w \succeq 0$. Consider an $n \times n$ matrix $M$. Normalized steepest descent in the space described by $M$ gives the update as

$$\Delta x = -\eta\, M J^\top w.$$

With the same restriction that the step decreases the objective function, we require that

$$J M J^\top w \succeq 0.$$

$M$ cannot depend on $J$, so this must hold for any $J$. Thus, we require that $M$ is positive semi-definite. The corresponding $A(J) = M J^\top w$ is then nonconflicting for all such $J$, as $J\, A(J) = J M J^\top w \succeq 0$.

In other words, the nonconflicting requirement on $A$ and $J$ is that of positive semi-definiteness.
Definition
Let $A$ be an aggregator. If for all $J \in \mathbb{R}^{m \times n}$ and $c \in \mathbb{R}^m$ with $c \succeq 0$, the mapping

$$c \mapsto A\big(\operatorname{diag}(c)\, J\big)$$

is linear, then $A$ is linear under scaling.

Here, $\operatorname{diag}(c)$ is some weighting of the rows of $J$, meaning that if a given row of $J$ changes by a constant factor, its contribution to the update (through $A$) should change similarly.
A weighted aggregator can be written as

$$A(J) = J^\top w$$

for some $w \in \mathbb{R}^m$ with $w \succeq 0$, for all $J$. This corresponds to the linear scalarization approach, since $\nabla_x \big( w^\top f \big)(x) = Jf(x)^\top w$.
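A short sketch of a weighted aggregator with fixed weights, which also illustrates linearity under scaling, since $A(\operatorname{diag}(c) J) = J^\top \operatorname{diag}(c)\, w = J^\top (c \odot w)$ (the helper name is illustrative):

```python
import numpy as np

def weighted_aggregator(Jx, w):
    """A(J) = J^T w : the gradient of the linear scalarization x -> w^T f(x)."""
    return Jx.T @ w

Jx = np.array([[1.0, 0.0],
               [0.0, 2.0]])
w = np.array([0.3, 0.7])
a = weighted_aggregator(Jx, w)  # [0.3, 1.4]

# Linear under scaling: rescaling row i of J by c[i] acts linearly on the
# output, because A(diag(c) J) = J^T (c * w).
c = np.array([2.0, 5.0])
lhs = weighted_aggregator(np.diag(c) @ Jx, w)
rhs = weighted_aggregator(Jx, c * w)
```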
The projection of $v \in \mathbb{R}^n$ onto the dual cone of the rows of $J$ is

$$\pi_J(v) = \arg\min_{u \in \mathbb{R}^n : \; J u \succeq 0} \| u - v \|_2.$$

We can think of the constraint $J u \succeq 0$ as a pointed cone in $\mathbb{R}^n$. Adding more rows to $J$ can only make this cone narrower. If $A(J)$ is within this cone, then according to the first-order Taylor expansion, taking the step $-\eta\, A(J)$ should reduce $f$ in all of its entries. The projection $\pi_J(v)$ is the nearest point to $v$ such that this happens, or $v$ itself, if stepping along $-v$ already reduces $f$ in all of its entries. It is possible for the dual cone to be just $\{0\}$, in which case the projection will always just be $0$. Intuitively, the dual cone is the set of directions within 90 degrees of each row of $J$.
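One way to compute this projection numerically is via Moreau's decomposition, which reduces it to a nonnegative least-squares problem: $\pi_J(v) = v + J^\top \lambda^*$, where $\lambda^* = \arg\min_{\lambda \succeq 0} \|J^\top \lambda + v\|_2$. A sketch assuming scipy is available (`project_dual_cone` is an illustrative helper, not a library function):

```python
import numpy as np
from scipy.optimize import nnls

def project_dual_cone(Jx, v):
    """Project v onto {u : Jx @ u >= 0}, the dual cone of the rows of Jx.
    By Moreau's decomposition, the projection is v + Jx^T lam, where
    lam >= 0 minimizes ||Jx^T lam + v|| (nonnegative least squares)."""
    lam, _ = nnls(Jx.T, -v)
    return v + Jx.T @ lam

# Orthogonal gradient rows: the dual cone is the nonnegative orthant, and
# projecting [1, -1] clips the conflicting component to zero.
Jx = np.array([[1.0, 0.0],
               [0.0, 1.0]])
u = project_dual_cone(Jx, np.array([1.0, -1.0]))  # [1, 0]
```

The result `u` satisfies `Jx @ u >= 0`, so (to first order) stepping along `-u` does not increase any objective.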