Gumbel distribution
The Gumbel distribution has received a fair amount of attention from the machine learning community. It has a number of useful properties, including the Gumbel trick, and has proved useful in applications such as A* sampling and the Concrete distribution. Here we discuss some of the properties of the Gumbel that underpin these applications and others.
Gumbels and exponentials
We say that a random variable \(\Gamma\) is Gumbel distributed, with parameter \(\kappa\), if its CDF is
\[
P(\Gamma \leq \gamma) = \exp\left(-\exp\left(-(\gamma - \kappa)\right)\right).
\]
When \(\kappa = 0\), we call the distribution a standard Gumbel. The Gumbel distribution is closely related to the exponential distribution. We say that a random variable \(\Xi\) is exponentially distributed, with parameter \(\lambda > 0\), if its CDF is
\[
P(\Xi \leq \xi) = 1 - \exp(-\lambda \xi), \qquad \xi \geq 0.
\]
When \(\lambda = 1\), we call the distribution a standard exponential. To draw a standard Gumbel \(\Gamma\) or a standard exponential \(\Xi\) random variable, we can first sample a uniformly distributed variable \(U \sim \mathcal{U}[0, 1]\) and apply the corresponding inverse CDFs to \(U\)
\[
\Gamma = -\log(-\log U), \qquad \Xi = -\log(1 - U).
\]
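Since \(1 - U\) is also uniform on \([0, 1]\), \(-\log U\) is a standard exponential as well, and \(\Gamma = -\log(-\log U)\) is its negative logarithm. A standard Gumbel random variable is therefore distributed identically to the negative logarithm of a standard exponential random variable
\[
\Xi \sim \text{Exponential}(1) \implies -\log \Xi \sim \text{Gumbel}(0).
\]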
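The inverse-CDF recipe can be checked numerically. The sketch below is our own illustration in plain Python (standard-library `random` and `math` only), and the helper names `open_uniform`, `standard_exponential` and `standard_gumbel` are hypothetical. It reuses each uniform draw for both variables, so each Gumbel draw should equal minus the log of its exponential draw exactly, while the sample means should land near 1 and near the Euler-Mascheroni constant (about 0.5772), which are the means of the standard exponential and standard Gumbel.

```python
import math
import random


def open_uniform(rng: random.Random) -> float:
    """Draw U from the open interval (0, 1) so that every log below is finite."""
    u = rng.random()  # random() returns values in [0, 1)
    while u == 0.0:
        u = rng.random()
    return u


def standard_exponential(u: float) -> float:
    # The inverse CDF of Exp(1) is -log(1 - u); because 1 - U is also uniform,
    # -log(u) has the same distribution and makes the Gumbel link exact.
    return -math.log(u)


def standard_gumbel(u: float) -> float:
    # Inverse CDF of the standard Gumbel: -log(-log u).
    return -math.log(-math.log(u))


rng = random.Random(0)
us = [open_uniform(rng) for _ in range(200_000)]
xis = [standard_exponential(u) for u in us]
gammas = [standard_gumbel(u) for u in us]

# With the same U, each Gumbel draw is exactly minus the log of its exponential draw.
mismatches = sum(g != -math.log(x) for g, x in zip(gammas, xis))
mean_xi = sum(xis) / len(xis)           # Exp(1) has mean 1
mean_gamma = sum(gammas) / len(gammas)  # standard Gumbel has mean ~0.5772
print(mismatches, round(mean_xi, 2), round(mean_gamma, 2))
```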
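More generally, a Gumbel with parameter \(\kappa\) is identically distributed to \(-\log \Xi\), where \(\Xi\) is exponential with parameter \(\lambda = e^{\kappa}\) (equivalently, \(\kappa = \log \lambda\)), since
\[
P(-\log \Xi \leq \gamma) = P(\Xi \geq e^{-\gamma}) = \exp\left(-\lambda e^{-\gamma}\right) = \exp\left(-\exp\left(-(\gamma - \log \lambda)\right)\right).
\]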
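As a sanity check on the relation \(\kappa = \log \lambda\), the following sketch (plain Python, our own illustration; `gumbel_cdf` and `gumbel_via_exponential` are hypothetical helper names) draws \(\Xi\) with rate \(\lambda = e^{\kappa}\) via `random.Random.expovariate`, takes \(-\log \Xi\), and compares the empirical CDF with \(\exp(-\exp(-(\gamma - \kappa)))\) at a few points. The value \(\kappa = 1.5\) is arbitrary.

```python
import math
import random


def gumbel_cdf(gamma: float, kappa: float) -> float:
    """CDF of a Gumbel with parameter kappa: exp(-exp(-(gamma - kappa)))."""
    return math.exp(-math.exp(-(gamma - kappa)))


def gumbel_via_exponential(kappa: float, rng: random.Random) -> float:
    """Draw Xi ~ Exp(lambda) with lambda = exp(kappa) and return -log(Xi)."""
    xi = rng.expovariate(math.exp(kappa))  # expovariate takes the rate lambda
    while xi <= 0.0:  # guard against the (very rare) draw of exactly zero
        xi = rng.expovariate(math.exp(kappa))
    return -math.log(xi)


kappa = 1.5  # arbitrary illustrative parameter
rng = random.Random(1)
draws = [gumbel_via_exponential(kappa, rng) for _ in range(100_000)]

for g in (0.0, 1.5, 3.0):
    empirical = sum(d <= g for d in draws) / len(draws)
    print(f"gamma={g}: empirical {empirical:.3f}, exact {gumbel_cdf(g, kappa):.3f}")
```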
Minimum of exponentials
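The exponential distribution has numerous interesting properties. One of these is that the minimum of \(K\) independent exponentially distributed random variables \((\Xi_1, \dots, \Xi_K)\), with parameters \((\lambda_1, \dots, \lambda_K)\) respectively, is also exponentially distributed according to
\[
M = \min_k \Xi_k \sim \text{Exponential}\left(\sum_{k=1}^K \lambda_k\right),
\]
whilst the index of the minimiser is categorically distributed according to
\[
I = \arg\min_k \Xi_k \sim \text{Categorical}\left(\frac{\lambda_1}{\sum_k \lambda_k}, \dots, \frac{\lambda_K}{\sum_k \lambda_k}\right).
\]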
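A small Monte Carlo sketch of both claims, assuming the illustrative rates \((0.5, 1.0, 2.5)\), so that \(\sum_k \lambda_k = 4\): the sample mean of the minimum should be close to \(1/4\), and the argmin frequencies close to \((0.125, 0.25, 0.625)\). This is plain Python with a hypothetical helper `min_and_argmin`; it checks the formulas empirically and is not a proof.

```python
import random


def min_and_argmin(rates, rng):
    """Draw Xi_k ~ Exp(rates[k]) independently and return (M, I) = (min, argmin)."""
    xis = [rng.expovariate(lam) for lam in rates]
    i = min(range(len(xis)), key=xis.__getitem__)
    return xis[i], i


rates = [0.5, 1.0, 2.5]  # illustrative parameters lambda_1..lambda_3
total = sum(rates)
rng = random.Random(2)
samples = [min_and_argmin(rates, rng) for _ in range(100_000)]

# M should be Exp(total), so its mean should be close to 1 / total = 0.25.
mean_min = sum(m for m, _ in samples) / len(samples)
# I should be categorical with probabilities rates[k] / total = 0.125, 0.25, 0.625.
freqs = [sum(i == k for _, i in samples) / len(samples) for k in range(len(rates))]
print(round(mean_min, 3), [round(f, 3) for f in freqs])
```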
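Further, and somewhat remarkably, the random variables \(M\) and \(I\) are independent. We can derive this result directly, following Bach. Writing \(\Lambda = \sum_{k=1}^K \lambda_k\), let us consider the joint distribution
\[
\begin{aligned}
P(M \geq m, I = i) &= P\left(\Xi_i \geq m \text{ and } \Xi_j > \Xi_i \text{ for all } j \neq i\right) \\
&= \int_m^\infty \lambda_i e^{-\lambda_i x} \prod_{j \neq i} e^{-\lambda_j x} \, dx \\
&= \lambda_i \int_m^\infty e^{-\Lambda x} \, dx = \frac{\lambda_i}{\Lambda} e^{-\Lambda m}.
\end{aligned}
\]
The joint distribution of \(I\) and \(M\) therefore factorises, so the two variables are independent, with marginal distributions
\[
P(M \geq m) = e^{-\Lambda m}, \qquad P(I = i) = \frac{\lambda_i}{\Lambda}.
\]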
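Independence can be probed numerically as well. The sketch below (plain Python, illustrative rates \((0.5, 1.0, 2.5)\)) groups the draws of \(M\) by the value of \(I\) and computes \(\mathbb{E}[M \mid I = i]\). If \(M\) and \(I\) are independent, every conditional mean should be close to \(\mathbb{E}[M] = 1/\sum_k \lambda_k = 0.25\), even though the largest rate wins most of the time. Equal conditional means are only a consequence of independence, so this is a check rather than a proof.

```python
import random

rates = [0.5, 1.0, 2.5]  # illustrative parameters lambda_1..lambda_3
total = sum(rates)
rng = random.Random(3)

sums = [0.0] * len(rates)  # running sum of M, split by the value of I
counts = [0] * len(rates)  # number of draws with each value of I
for _ in range(200_000):
    xis = [rng.expovariate(lam) for lam in rates]
    i = min(range(len(xis)), key=xis.__getitem__)
    sums[i] += xis[i]
    counts[i] += 1

# Under independence, E[M | I = i] equals E[M] = 1 / total for every i.
conditional_means = [s / c for s, c in zip(sums, counts)]
print([round(c, 3) for c in conditional_means])
```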
The Gumbel trick
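The Gumbel trick is the same property, packaged in a different way. In particular, because \(-\log\) is strictly decreasing,
\[
I = \arg\min_k \Xi_k = \arg\max_k \left(-\log \Xi_k\right) = \arg\max_k \Gamma_k,
\]
where each \(\Gamma_i = -\log \Xi_i\) is Gumbel distributed with parameter \(\kappa_i = \log \lambda_i\). Using the fact that
\[
\Gamma_i = \kappa_i + Z_i,
\]
where \(Z_i\) is a standard Gumbel random variable, we arrive at
\[
I = \arg\max_k \left(\kappa_k + Z_k\right), \qquad P(I = i) = \frac{\lambda_i}{\sum_k \lambda_k} = \frac{e^{\kappa_i}}{\sum_k e^{\kappa_k}}.
\]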
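The identity \(\arg\max_k (\kappa_k + Z_k) = \arg\min_k \Xi_k\) holds draw by draw, not only in distribution, when \(\Xi_k = E_k / \lambda_k\) and \(Z_k = -\log E_k\) are built from the same standard exponentials \(E_k\). The following Python sketch (our own; the rates and the helper names `argmin`, `argmax` and `positive_standard_exponential` are illustrative) constructs both sides from shared \(E_k\) and counts disagreements, which should be zero apart from floating-point near-ties.

```python
import math
import random

rates = [0.5, 1.0, 2.5]                    # illustrative lambda_i
kappas = [math.log(lam) for lam in rates]  # kappa_i = log lambda_i
rng = random.Random(4)


def argmin(xs):
    return min(range(len(xs)), key=xs.__getitem__)


def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)


def positive_standard_exponential(rng):
    """Exp(1) draw, resampled in the (very rare) case it is exactly zero."""
    e = rng.expovariate(1.0)
    while e <= 0.0:
        e = rng.expovariate(1.0)
    return e


mismatches = 0
for _ in range(10_000):
    es = [positive_standard_exponential(rng) for _ in rates]
    xis = [e / lam for e, lam in zip(es, rates)]  # Xi_i ~ Exp(lambda_i)
    zs = [-math.log(e) for e in es]               # Z_i: standard Gumbel
    gammas = [k + z for k, z in zip(kappas, zs)]  # Gamma_i = kappa_i + Z_i = -log Xi_i
    mismatches += argmax(gammas) != argmin(xis)
print(mismatches)
```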
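Therefore, if we suppose \(\pi_{1:K}\) are the probabilities of a categorical distribution
\[
\pi_i > 0, \qquad \sum_{i=1}^K \pi_i = 1,
\]
and we take their logarithms \(\kappa_{1:K} = \log \pi_{1:K}\) and define the random variable
\[
I = \arg\max_i \left(\kappa_i + Z_i\right),
\]
where the \(Z_i\) are independent standard Gumbels, then \(I\) is distributed according to the categorical
\[
P(I = i) = \frac{e^{\kappa_i}}{\sum_k e^{\kappa_k}} = \pi_i.
\]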
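In practice this gives a way to sample from a categorical distribution: add independent standard Gumbel noise to the log-probabilities and take the argmax. The sketch below is a minimal plain-Python version under the assumption that every \(\pi_i > 0\); `gumbel_max_sample` is a hypothetical name, and the noise is drawn with the inverse CDF \(-\log(-\log U)\).

```python
import math
import random
from collections import Counter


def gumbel_max_sample(probs, rng):
    """Return argmax_i (log probs[i] + Z_i) with Z_i i.i.d. standard Gumbel.

    Assumes every probs[i] > 0, so each kappa_i = log probs[i] is finite.
    """
    best_index, best_value = 0, -math.inf
    for i, p in enumerate(probs):
        u = rng.random()
        while u == 0.0:  # keep U in (0, 1) so both logs are finite
            u = rng.random()
        z = -math.log(-math.log(u))  # standard Gumbel via the inverse CDF
        value = math.log(p) + z
        if value > best_value:
            best_index, best_value = i, value
    return best_index


probs = [0.1, 0.2, 0.7]  # illustrative pi_1..pi_3
rng = random.Random(5)
n = 100_000
counts = Counter(gumbel_max_sample(probs, rng) for _ in range(n))
freqs = [counts[i] / n for i in range(len(probs))]
print([round(f, 3) for f in freqs])
```

Because adding the same constant to every \(\kappa_i\) leaves the argmax unchanged, the same function also works if `probs` holds positive but unnormalised weights.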