Hypernetworks

Resources

Main Idea

Use one neural network to generate the parameters of another network. The generation is conditioned on some broader quantity, possibly even a hyperparameter. For instance, let $f_\theta$ be a neural network (the "main", "target", or "primary" network). A hypernetwork $g$ with parameters $\gamma$ takes some context vector $c$ and outputs $\theta$:

$$g_\gamma(c) = \theta.$$

Training then uses

$$f_{g_\gamma(c)} = f_\theta,$$

and optimizes with respect to $\gamma$.
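A minimal sketch of this setup in PyTorch, assuming a single linear layer as the target network (names like `HyperNet` and `target_forward` are illustrative, not standard API):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperNet(nn.Module):
    """Hypernetwork g_gamma: maps a context vector c to the parameters
    theta of a small target network f_theta (here one linear layer)."""
    def __init__(self, context_dim, in_dim, out_dim):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        n_params = out_dim * in_dim + out_dim  # weight matrix + bias
        self.net = nn.Sequential(
            nn.Linear(context_dim, 64), nn.ReLU(), nn.Linear(64, n_params)
        )

    def forward(self, c):
        theta = self.net(c)
        W = theta[: self.out_dim * self.in_dim].view(self.out_dim, self.in_dim)
        b = theta[self.out_dim * self.in_dim :]
        return W, b

def target_forward(x, W, b):
    """The target network f_theta, applied with generated parameters."""
    return F.linear(x, W, b)

# Only gamma (the hypernetwork's parameters) is optimized; theta is
# never a trainable tensor itself, it is an output of g.
g = HyperNet(context_dim=8, in_dim=4, out_dim=1)
opt = torch.optim.Adam(g.parameters(), lr=1e-3)
c = torch.randn(8)                       # some fixed context vector
x, y = torch.randn(32, 4), torch.randn(32, 1)  # toy regression data
for _ in range(100):
    W, b = g(c)                          # theta = g_gamma(c)
    loss = F.mse_loss(target_forward(x, W, b), y)
    opt.zero_grad(); loss.backward(); opt.step()
```

Gradients flow through the generated $\theta$ back into $\gamma$, which is the whole trick: the target network has no parameters of its own.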

As an example, write the weights of a neural network at layer $j$ as $W_j$. A hypernetwork may implement the map $j \mapsto W_j$ (i.e. take the layer index as the context vector), allowing the same hypernetwork to generate weights for multiple layers. Through this lens, the weights of different layers come from a common source ("soft weight-sharing"); see the sketch below. We probably wouldn't want to use such a simple scheme in practice, but it demonstrates the idea.
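A sketch of this layer-conditioned variant, assuming a learned embedding per layer index as the context (all names here are hypothetical):

```python
import torch
import torch.nn as nn

class LayerHyperNet(nn.Module):
    """One hypernetwork generates the (dim x dim) weight matrix of every
    layer from an embedding of the layer index: j -> W_j."""
    def __init__(self, n_layers, dim, embed_dim=16):
        super().__init__()
        self.dim = dim
        self.layer_embed = nn.Embedding(n_layers, embed_dim)  # c = embed(j)
        self.to_weights = nn.Linear(embed_dim, dim * dim)

    def weights_for(self, j):
        c = self.layer_embed(torch.tensor(j))
        return self.to_weights(c).view(self.dim, self.dim)

    def forward(self, x):
        # Every layer's weights come from the same generator:
        # "soft weight-sharing" across depth.
        for j in range(self.layer_embed.num_embeddings):
            x = torch.relu(x @ self.weights_for(j).T)
        return x
```

The only trainable parameters are the embeddings and the generator, so the parameter count no longer scales with depth in the usual way.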

The context vector $c$ is quite general. It can describe the architecture, the training setup, or the task; it can come from the data; or it can simply be noise.

Hypernetworks can also be used for Uncertainty Quantification. Depending on the problem setup, $c$ can be interpreted as coming from the data (data-conditioned), or as a sample from some distribution (noise-conditioned).
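In the noise-conditioned view, the hypernetwork defines an implicit distribution over $\theta$: sampling several contexts yields an ensemble of target networks, and the spread of their predictions is a rough uncertainty estimate. A hedged sketch, reusing `g` and `target_forward` from the first example above:

```python
# Noise-conditioned use: sample c ~ N(0, I), generate one theta per draw,
# and treat prediction spread as a (rough) uncertainty estimate.
x_test = torch.randn(16, 4)
preds = []
for _ in range(50):
    c = torch.randn(8)                   # fresh noise context per draw
    W, b = g(c)                          # one sample theta = g_gamma(c)
    preds.append(target_forward(x_test, W, b))
preds = torch.stack(preds)               # (samples, batch, out)
mean, std = preds.mean(dim=0), preds.std(dim=0)  # predictive mean / spread
```

Whether this spread is a calibrated uncertainty depends on how $g$ was trained; the sketch only shows the sampling mechanics.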

TODO: Initialization Problem