Hypernetworks

Resources

Main Idea

Use one neural network to generate the parameters of another network. The generation is conditioned on some broader quantity, possibly even a hyperparameter. For instance, let $f_\theta$ be a neural network (the "main", "target", or "primary" network). A hypernetwork $g$ with parameters $\gamma$ takes some context vector $c$ and outputs $\theta$:

$$g_\gamma(c) = \theta.$$

Training then uses

$$f_{g_\gamma(c)} = f_\theta,$$

and optimizes with respect to $\gamma$.
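A minimal sketch of this setup in PyTorch, assuming a single linear layer as the target network (names like `HyperNet` and `target_forward` are illustrative, not standard API):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperNet(nn.Module):
    """Hypernetwork g_gamma: maps a context vector c to the parameters
    theta of a small target network f_theta (here one linear layer)."""
    def __init__(self, context_dim, in_dim, out_dim):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        n_params = out_dim * in_dim + out_dim  # weight matrix + bias
        self.net = nn.Sequential(
            nn.Linear(context_dim, 64), nn.ReLU(), nn.Linear(64, n_params)
        )

    def forward(self, c):
        theta = self.net(c)
        W = theta[: self.out_dim * self.in_dim].view(self.out_dim, self.in_dim)
        b = theta[self.out_dim * self.in_dim :]
        return W, b

def target_forward(x, W, b):
    """The target network f_theta, applied with generated parameters."""
    return F.linear(x, W, b)

# Only gamma (the hypernetwork's parameters) is optimized; theta is
# never a trainable tensor itself, it is an output of g.
g = HyperNet(context_dim=8, in_dim=4, out_dim=1)
opt = torch.optim.Adam(g.parameters(), lr=1e-3)
c = torch.randn(8)                       # some fixed context vector
x, y = torch.randn(32, 4), torch.randn(32, 1)  # toy regression data
for _ in range(100):
    W, b = g(c)                          # theta = g_gamma(c)
    loss = F.mse_loss(target_forward(x, W, b), y)
    opt.zero_grad(); loss.backward(); opt.step()
```

Gradients flow through the generated $\theta$ back into $\gamma$, which is the whole trick: the target network has no parameters of its own.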

As an example, write the weights of a neural network at layer $j$ as $W_j$. A hypernetwork may implement the map $j \mapsto W_j$ (i.e. take the layer index as the context vector), allowing the same hypernetwork to generate weights for multiple layers. Through this lens, the weights of different layers come from a common source ("soft weight-sharing"); see the sketch below. We probably wouldn't want to use such a simple scheme in practice, but it demonstrates the idea.
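A sketch of this layer-conditioned variant, assuming a learned embedding per layer index as the context (all names here are hypothetical):

```python
import torch
import torch.nn as nn

class LayerHyperNet(nn.Module):
    """One hypernetwork generates the (dim x dim) weight matrix of every
    layer from an embedding of the layer index: j -> W_j."""
    def __init__(self, n_layers, dim, embed_dim=16):
        super().__init__()
        self.dim = dim
        self.layer_embed = nn.Embedding(n_layers, embed_dim)  # c = embed(j)
        self.to_weights = nn.Linear(embed_dim, dim * dim)

    def weights_for(self, j):
        c = self.layer_embed(torch.tensor(j))
        return self.to_weights(c).view(self.dim, self.dim)

    def forward(self, x):
        # Every layer's weights come from the same generator:
        # "soft weight-sharing" across depth.
        for j in range(self.layer_embed.num_embeddings):
            x = torch.relu(x @ self.weights_for(j).T)
        return x
```

The only trainable parameters are the embeddings and the generator, so the parameter count no longer scales with depth in the usual way.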

The context vector $c$ is quite general. It can describe the architecture, the training setup, or the task; it can come from the data; or it can simply be noise.

Hypernetworks can also be used for Uncertainty Quantification. Depending on the problem setup, $c$ can be interpreted as coming from the data (data-conditioned), or as a sample from some distribution (noise-conditioned).
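In the noise-conditioned view, the hypernetwork defines an implicit distribution over $\theta$: sampling several contexts yields an ensemble of target networks, and the spread of their predictions is a rough uncertainty estimate. A hedged sketch, reusing `g` and `target_forward` from the first example above:

```python
# Noise-conditioned use: sample c ~ N(0, I), generate one theta per draw,
# and treat prediction spread as a (rough) uncertainty estimate.
x_test = torch.randn(16, 4)
preds = []
for _ in range(50):
    c = torch.randn(8)                   # fresh noise context per draw
    W, b = g(c)                          # one sample theta = g_gamma(c)
    preds.append(target_forward(x_test, W, b))
preds = torch.stack(preds)               # (samples, batch, out)
mean, std = preds.mean(dim=0), preds.std(dim=0)  # predictive mean / spread
```

Whether this spread is a calibrated uncertainty depends on how $g$ was trained; the sketch only shows the sampling mechanics.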

TODO: Initialization Problem