As a divergence, the Maximum Mean Discrepancy (MMD) treats two distributions as equal if and only if all their moments match, where the moments are taken in the feature space of a kernel. A kernel measures similarity between pairs of samples, so if the average similarity between samples drawn from within each distribution is the same as the average similarity between samples drawn across the two distributions, the distributions are the same.
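The MMD is also an Integral Probability Metric (IPM). Formally, for distributions $P$ and $Q$ ($P$ from the generator, $Q$ as data in generative modeling), this is

$$\mathrm{MMD}(P, Q) = \sup_{f \in \mathcal{F}} \left( \mathbb{E}_{x \sim P}[f(x)] - \mathbb{E}_{y \sim Q}[f(y)] \right)$$

with

$$\mathcal{F} = \{ f \in \mathcal{H} : \|f\|_{\mathcal{H}} \le 1 \}.$$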
First, the expectations taken with respect to $P$ and $Q$ are ultimately integrals, which partly explains the name Integral Probability Metric and confirms that the MMD is one. Next, $\mathcal{F}$ is the class of witness functions (or features), where each function $f \in \mathcal{F}$ maps from the domain $\mathcal{X}$ of $P$ and $Q$ to $\mathbb{R}$. $\mathcal{H}$ is a Reproducing Kernel Hilbert Space (RKHS), which has an associated norm $\|\cdot\|_{\mathcal{H}}$ (induced by an inner product determined by the so-called reproducing kernel $k$). Importantly, we don't need to search over the possibly infinite number of functions in $\mathcal{F}$. The supremum occurs at

$$f^{*}(\cdot) \propto \mathbb{E}_{x \sim P}[k(x, \cdot)] - \mathbb{E}_{y \sim Q}[k(y, \cdot)].$$
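The closed form of the supremum can be sketched as a short derivation. The notation here is an assumption made for illustration: $\mu_P = \mathbb{E}_{x \sim P}[k(x, \cdot)]$ and $\mu_Q = \mathbb{E}_{y \sim Q}[k(y, \cdot)]$ denote the kernel mean embeddings, $x'$ and $y'$ are independent copies of $x$ and $y$, and the reproducing property gives $\mathbb{E}_{x \sim P}[f(x)] = \langle f, \mu_P \rangle_{\mathcal{H}}$.

$$
\begin{aligned}
\mathrm{MMD}(P, Q)
  &= \sup_{\|f\|_{\mathcal{H}} \le 1} \langle f, \mu_P - \mu_Q \rangle_{\mathcal{H}}
   = \|\mu_P - \mu_Q\|_{\mathcal{H}},
  \qquad f^{*} = \frac{\mu_P - \mu_Q}{\|\mu_P - \mu_Q\|_{\mathcal{H}}}, \\
\mathrm{MMD}^2(P, Q)
  &= \mathbb{E}_{x, x' \sim P}[k(x, x')]
   + \mathbb{E}_{y, y' \sim Q}[k(y, y')]
   - 2\, \mathbb{E}_{x \sim P,\, y \sim Q}[k(x, y)].
\end{aligned}
$$

The first line is the Cauchy-Schwarz inequality, with equality at $f^{*}$. The second expands the squared norm, so only kernel evaluations between pairs of samples are needed.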
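Given $n$ samples of $P$, $\{x_i\}_{i=1}^{n}$, and $m$ samples of $Q$, $\{y_j\}_{j=1}^{m}$, both drawn IID, noting potentially $n \neq m$, the empirical estimate of the squared MMD is

$$\widehat{\mathrm{MMD}}^2 = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{i'=1}^{n} k(x_i, x_{i'}) + \frac{1}{m^2} \sum_{j=1}^{m} \sum_{j'=1}^{m} k(y_j, y_{j'}) - \frac{2}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} k(x_i, y_j).$$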
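A minimal NumPy sketch of such an empirical estimate, assuming a Gaussian kernel with a hypothetical bandwidth `sigma` and the plug-in (biased) form of the squared MMD. The names `gaussian_kernel` and `mmd2_biased` are illustrative and do not come from any library.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix between rows of a (n, d) and b (m, d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma**2))


def mmd2_biased(x, y, kernel=gaussian_kernel):
    """Plug-in estimate of the squared MMD; sample sizes n and m may differ."""
    k_xx = kernel(x, x)  # within-sample similarity for x, shape (n, n)
    k_yy = kernel(y, y)  # within-sample similarity for y, shape (m, m)
    k_xy = kernel(x, y)  # cross-sample similarity, shape (n, m)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))
y_same = rng.normal(0.0, 1.0, size=(150, 2))   # same distribution as x
y_shift = rng.normal(2.0, 1.0, size=(150, 2))  # mean shifted away from x

same = mmd2_biased(x, y_same)
shifted = mmd2_biased(x, y_shift)
print(f"same distribution:    {same:.4f}")
print(f"shifted distribution: {shifted:.4f}")
```

The estimate for the shifted sample should come out clearly larger than the one for the matching sample. An unbiased variant would instead exclude the diagonal terms of `k_xx` and `k_yy` and divide by `n * (n - 1)` and `m * (m - 1)`.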
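We can make new kernels by summing existing ones (similar to how a sum of symmetric positive definite matrices is again symmetric positive definite). This allows evaluating the MMD over a mixture of length scales, for example with Gaussian kernels of bandwidths $\sigma_1, \dots, \sigma_L$,

$$k(x, y) = \sum_{l=1}^{L} \exp\left( -\frac{\|x - y\|^2}{2 \sigma_l^2} \right).$$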
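Further, we can even mix kernels of different types, such as

$$k(x, y) = k_1(x, y) + k_2(x, y),$$

where $k_1$ and $k_2$ belong to different kernel families (for instance, a Gaussian and a Laplacian kernel).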
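A minimal sketch of both kinds of sum in NumPy. The specific choices here are assumptions for illustration only: Gaussian kernels at bandwidths `(0.5, 1.0, 2.0)` for the mixture of length scales, and a Laplacian kernel as the second kernel type. All function names are hypothetical.

```python
import numpy as np


def sq_dists(a, b):
    """Pairwise squared Euclidean distances between rows of a and b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff**2, axis=-1)


def gaussian_mixture_kernel(a, b, sigmas=(0.5, 1.0, 2.0)):
    """Sum of Gaussian kernels, one per length scale in `sigmas`."""
    d2 = sq_dists(a, b)
    return sum(np.exp(-d2 / (2.0 * s**2)) for s in sigmas)


def laplacian_kernel(a, b, scale=1.0):
    """Laplacian kernel based on the L1 distance."""
    l1 = np.sum(np.abs(a[:, None, :] - b[None, :, :]), axis=-1)
    return np.exp(-l1 / scale)


def mixed_kernel(a, b):
    """Sum of kernels of different types: Gaussian mixture plus Laplacian."""
    return gaussian_mixture_kernel(a, b) + laplacian_kernel(a, b)


rng = np.random.default_rng(1)
pts = rng.normal(size=(20, 3))
gram = mixed_kernel(pts, pts)

# The Gram matrix of a valid kernel is symmetric positive semi-definite,
# so its smallest eigenvalue should not be meaningfully negative.
min_eig = np.linalg.eigvalsh(gram).min()
print("symmetric:", np.allclose(gram, gram.T))
print("smallest eigenvalue:", min_eig)
```

Because each summand is a valid kernel, the Gram matrix of the sum stays symmetric positive semi-definite, which the eigenvalue check illustrates on one random sample. Any of these functions can be passed wherever a single kernel would be used in an MMD estimate.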