# Gumbel distribution

The Gumbel distribution has received a fair amount of attention from the machine learning community. It has a number of useful properties, including the Gumbel trick, and has proved useful in applications such as A* sampling and the Concrete distribution. Here we discuss some properties of the Gumbel that are useful for these applications and beyond.

## Gumbels and exponentials

We say that a random variable \(\Gamma\) is Gumbel distributed, with parameter \(\kappa\), if its CDF is

\[
F_\Gamma(g) = \exp\left(-e^{-(g - \kappa)}\right).
\]

When \(\kappa = 0\), we call the distribution a standard Gumbel. The Gumbel distribution is closely related to the exponential distribution. We say that a random variable \(\Xi\) is exponentially distributed, with parameter \(\lambda\), if its CDF is

\[
F_\Xi(x) = 1 - e^{-\lambda x}, \qquad x \geq 0.
\]

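When \(\lambda = 1\), we call the distribution a standard exponential. Now, to draw a standard Gumbel \(\Gamma\) or a standard exponential \(\Xi\) random variable, we can first sample a uniformly distributed variable \(U \sim \mathcal{U}[0, 1]\) and apply the corresponding inverse CDFs to \(U\):

\[
\Gamma = -\log(-\log U), \qquad \Xi = -\log(1 - U) \overset{d}{=} -\log U,
\]

where the last step uses the fact that \(1 - U\) is also uniformly distributed on \([0, 1]\).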

A standard Gumbel random variable is therefore distributed identically to the negative logarithm of a standard exponential random variable:

\[
\Gamma \overset{d}{=} -\log \Xi.
\]

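More generally, a Gumbel with parameter \(\kappa\) is identically distributed to the negative logarithm of an exponential \(\Xi\) with parameter \(\lambda = e^{\kappa}\), that is \(\kappa = \log \lambda\), since

\[
P(-\log \Xi \leq g) = P(\Xi \geq e^{-g}) = \exp\left(-\lambda e^{-g}\right) = \exp\left(-e^{-(g - \log \lambda)}\right).
\]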
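The relationship between Gumbels and exponentials can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `sample_gumbel` and `gumbel_cdf`, the value of `kappa`, and the evaluation points are illustrative choices, not part of the original text. It draws exponential samples with rate \(e^{\kappa}\), takes their negative logarithm, and compares the empirical CDF of the result with the Gumbel CDF with parameter \(\kappa\).

```python
import numpy as np


def sample_gumbel(kappa, size, rng):
    """Draw Gumbel(kappa) samples as -log of Exponential(rate=exp(kappa)) draws."""
    rate = np.exp(kappa)
    # NumPy parameterises the exponential by its scale, which is 1 / rate.
    xi = rng.exponential(scale=1.0 / rate, size=size)
    return -np.log(xi)


def gumbel_cdf(g, kappa):
    """CDF of a Gumbel with location parameter kappa and unit scale."""
    return np.exp(-np.exp(-(g - kappa)))


rng = np.random.default_rng(0)
kappa = 0.7  # illustrative value
samples = sample_gumbel(kappa, size=200_000, rng=rng)

# Compare the empirical CDF with the analytic CDF at a few points.
points = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])
empirical = (samples[:, None] <= points).mean(axis=0)
analytic = gumbel_cdf(points, kappa)
max_abs_error = np.abs(empirical - analytic).max()
print("largest CDF discrepancy:", max_abs_error)
```

With this many samples the discrepancy should be small, on the order of the Monte Carlo error.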

## Minimum of exponentials

The exponential distribution has numerous interesting properties. One of these properties is that the minimum of \(K\) independent exponentially distributed random variables \((\Xi_1, \dots, \Xi_K)\) with parameters \((\lambda_1, \dots, \lambda_K)\) respectively, is also exponentially distributed according to

\[
M = \min_{i} \Xi_i \sim \text{Exp}\left(\sum_{i=1}^K \lambda_i\right),
\]

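whilst the index of the minimiser is categorically distributed according to

\[
I = \arg\min_{i} \Xi_i \sim \text{Cat}\left(\frac{\lambda_1}{\sum_{j} \lambda_j}, \dots, \frac{\lambda_K}{\sum_{j} \lambda_j}\right).
\]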

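Further, and somewhat remarkably, the random variables \(M\) (the minimum) and \(I\) (the index of the minimiser) are independent. We can derive this result directly, following Bach. Let us consider the joint distribution

\[
\begin{aligned}
P(I = i, M \geq m) &= P\left(\Xi_i \geq m,\ \Xi_j \geq \Xi_i \text{ for all } j \neq i\right) \\
&= \int_m^\infty \lambda_i e^{-\lambda_i x} \prod_{j \neq i} e^{-\lambda_j x} \, dx \\
&= \frac{\lambda_i}{\sum_{j} \lambda_j} \exp\left(-m \sum_{j} \lambda_j\right).
\end{aligned}
\]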

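The joint distribution of \(I\) and \(M\) therefore factorises and the variables are independent, whilst their marginal distributions are

\[
P(I = i) = \frac{\lambda_i}{\sum_{j} \lambda_j}, \qquad P(M \geq m) = \exp\left(-m \sum_{j} \lambda_j\right).
\]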
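A quick simulation can illustrate these claims. The sketch below assumes NumPy; the rates in `rates` and the sample size are arbitrary illustrative values. It checks that the index of the minimiser follows the stated categorical, that the minimum has mean \(1 / \sum_j \lambda_j\), and, as a rough indication of independence, that the mean of the minimum is about the same whichever index attains it.

```python
import numpy as np

rng = np.random.default_rng(1)
rates = np.array([0.5, 1.0, 2.5])  # illustrative lambda_1, ..., lambda_K
n = 200_000

# Column k holds independent draws of an exponential with rate rates[k].
# NumPy parameterises the exponential by its scale, which is 1 / rate.
xi = rng.exponential(scale=1.0 / rates, size=(n, rates.size))

m = xi.min(axis=1)      # minimum M of each row
i = xi.argmin(axis=1)   # index I of the minimiser of each row

# Marginal of I: should be close to lambda_k / sum_j lambda_j.
index_freq = np.bincount(i, minlength=rates.size) / n
expected_probs = rates / rates.sum()

# Marginal of M: an exponential with rate sum_j lambda_j has mean 1 / sum_j lambda_j.
mean_min = m.mean()
expected_mean = 1.0 / rates.sum()

# Rough independence check: the mean of M given I = k should not depend on k.
cond_means = np.array([m[i == k].mean() for k in range(rates.size)])

print("index frequencies:", index_freq, "expected:", expected_probs)
print("mean of minimum:", mean_min, "expected:", expected_mean)
print("mean of minimum given each index:", cond_means)
```

Matching conditional means are only a necessary condition for independence, so this is a sanity check rather than a proof; the derivation above is what establishes the result.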

## The Gumbel trick

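The Gumbel trick is the same property, packaged in a different way. In particular, since \(-\log\) is strictly decreasing, the index that minimises \(\Xi_i\) is the index that maximises \(\Gamma_i = -\log \Xi_i\), so

\[
\arg\max_{i} \Gamma_i \sim \text{Cat}\left(\frac{\lambda_1}{\sum_{j} \lambda_j}, \dots, \frac{\lambda_K}{\sum_{j} \lambda_j}\right),
\]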

where each \(\Gamma_i\) is Gumbel distributed with parameter \(\kappa_i = \log \lambda_i\). Using the fact that

\[
\Gamma_i \overset{d}{=} \kappa_i + Z_i,
\]

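where \(Z_i\) is a standard Gumbel random variable, we arrive at

\[
\arg\max_{i} \left(\kappa_i + Z_i\right) \sim \text{Cat}\left(\frac{e^{\kappa_1}}{\sum_{j} e^{\kappa_j}}, \dots, \frac{e^{\kappa_K}}{\sum_{j} e^{\kappa_j}}\right).
\]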

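Therefore, if we suppose \(\pi_{1:K}\) are the probabilities of a categorical distribution, take their logarithms \(\kappa_{1:K} = \log \pi_{1:K}\), and define the random variable

\[
I = \arg\max_{i} \left(\kappa_i + Z_i\right),
\]

where the \(Z_i\) are independent standard Gumbels, then \(I\) is distributed according to the categorical

\[
P(I = i) = \frac{e^{\kappa_i}}{\sum_{j} e^{\kappa_j}} = \pi_i.
\]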
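The Gumbel trick translates directly into a sampling routine. Below is a minimal sketch, assuming NumPy; the function name `gumbel_max_sample` and the probabilities in `probs` are illustrative and not taken from the original text. It adds independent standard Gumbel noise to the log-probabilities, takes the argmax, and compares the resulting frequencies with the target probabilities.

```python
import numpy as np


def gumbel_max_sample(log_probs, size, rng):
    """Sample categorical indices as argmax_i (log_probs[i] + Z_i)."""
    # One independent standard Gumbel per category and per sample.
    z = rng.gumbel(loc=0.0, scale=1.0, size=(size, log_probs.size))
    return np.argmax(log_probs + z, axis=1)


rng = np.random.default_rng(2)
probs = np.array([0.1, 0.2, 0.3, 0.4])  # illustrative pi_1, ..., pi_K
draws = gumbel_max_sample(np.log(probs), size=200_000, rng=rng)

# Empirical frequencies of each index should be close to probs.
freqs = np.bincount(draws, minlength=probs.size) / draws.size
print("empirical:", freqs)
print("target:   ", probs)
```

Because the argmax is unchanged when the same constant is added to every \(\kappa_i\), the same routine also works with unnormalised log-weights, which is consistent with the normalisation by \(\sum_j e^{\kappa_j}\) in the categorical above.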