# Moment generating functions

Moments are an important tool in the study of random variables, and moment generating functions (mgfs) are a convenient way of packaging them. Under certain conditions, there is a one-to-one correspondence between distributions and moment generating functions. One example use of mgfs is computing the distribution of a sum of independent random variables. Mgfs do not always exist, an issue that is circumvented by characteristic functions, which exist for every random variable. Two useful inequalities, the Markov and the Jensen inequalities, are also presented and proved.

## Moments

The moments of a random variable contain useful information about it. In fact, under the following technical conditions, the moments uniquely determine the distribution of the random variable.

**Theorem (Uniqueness theorem for moments)** Suppose that all moments \(\mathbb{E}(X^k), k = 1, 2, ...\)
of the random variable \(X\) exist, and that the series

$$
\sum_{k = 0}^{\infty} \frac{1}{k!}\,\mathbb{E}(X^k)\, t^k
$$

is absolutely convergent for some \(t > 0\). Then the moments uniquely determine the distribution of \(X\).

## Covariance and correlation

We are often interested in the extent to which two random variables co-vary, a property that is quantified by their covariance, as defined below.

**Definition (Covariance)** If \(X\) and \(Y\) are random variables, then their
covariance is denoted \(\text{cov}(X, Y)\) and defined as

$$
\text{cov}(X, Y) = \mathbb{E}\big[(X - \mathbb{E}(X))(Y - \mathbb{E}(Y))\big]
$$

whenever these expectations exist.

Clearly, if we multiply \(X\) and \(Y\) by \(a\) and \(b\) respectively, their covariance changes by a factor of \(ab\), since \(\text{cov}(aX, bY) = ab\,\text{cov}(X, Y)\). A measure of co-variation that does not depend (up to sign) on the scale of \(X\) and \(Y\) is the correlation coefficient.

**Definition (Correlation coefficient)** If \(X\) and \(Y\) are random variables, then their correlation coefficient is denoted \(\rho(X, Y)\) and defined as

$$
\rho(X, Y) = \frac{\text{cov}(X, Y)}{\sqrt{\text{Var}(X)\,\text{Var}(Y)}}
$$

whenever the covariance and variances exist and \(\text{Var}(X)\text{Var}(Y) \neq 0\).
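As a rough numerical sketch of the scaling argument above, the snippet below stands in for \(X\) and \(Y\) with the empirical distribution of simulated, correlated samples. The helper names `mean`, `cov` and `corr`, the noise level and the factors \(a = 3\), \(b = 0.5\) are illustrative assumptions, not part of the original text. It checks that the covariance is multiplied by \(ab\) while the correlation is unchanged.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    # Covariance under the empirical distribution of the paired samples:
    # the average of (x - mean(x)) * (y - mean(y))
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def corr(xs, ys):
    # cov / sqrt(Var(X) Var(Y)), using Var(X) = cov(X, X)
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(10_000)]
ys = [x + random.gauss(0.0, 0.5) for x in xs]  # Y = X + independent noise

a, b = 3.0, 0.5  # positive scale factors (illustrative)
sx = [a * x for x in xs]
sy = [b * y for y in ys]

print(cov(sx, sy) / cov(xs, ys))   # equals a * b up to floating-point rounding
print(corr(xs, ys), corr(sx, sy))  # the two correlations agree
```

With a negative factor (say \(b < 0\)) the covariance still scales by \(ab\), but the correlation flips sign, which is why the correlation is scale-invariant only up to sign.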

The correlation coefficient of two random variables has absolute value less than or equal to \(1\), as stated by the following result which is worth bearing in mind.

**Theorem (Correlation between \(-1\) and \(1\))** If \(X\) and \(Y\) are random
variables, then

$$
-1 \leq \rho(X, Y) \leq 1
$$

whenever this correlation exists.

The above result follows quickly from the Cauchy-Schwarz inequality, which is stated and proved below.

## Proof: \(-1 \leq \rho(X, Y) \leq 1\)

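Given random variables \(X\) and \(Y\), write \(\bar{X} = \mathbb{E}(X)\) and \(\bar{Y} = \mathbb{E}(Y)\), and define \(U = X - \bar{X}\) and \(V = Y - \bar{Y}\). By applying the Cauchy-Schwarz inequality to \(U\) and \(V\), we obtain

$$
\text{cov}(X, Y)^2 = \mathbb{E}(UV)^2 \leq \mathbb{E}(U^2)\,\mathbb{E}(V^2) = \text{Var}(X)\,\text{Var}(Y).
$$

Taking a square root and dividing by \(\sqrt{\text{Var}(X)\text{Var}(Y)}\), we arrive at the result

$$
|\rho(X, Y)| = \frac{|\text{cov}(X, Y)|}{\sqrt{\text{Var}(X)\,\text{Var}(Y)}} \leq 1.
$$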

**Theorem (Cauchy-Schwarz inequality)** If \(U\) and \(V\) are random variables, then

$$
\mathbb{E}(UV)^2 \leq \mathbb{E}(U^2)\,\mathbb{E}(V^2)
$$

whenever these expectations exist.

## Proof: Cauchy-Schwarz inequality

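Let \(s \in \mathbb{R}\) be a real number and \(W = sU + V\) be a random variable. Then \(W^2 \geq 0\) and we have

$$
0 \leq \mathbb{E}(W^2) = s^2\,\mathbb{E}(U^2) + 2s\,\mathbb{E}(UV) + \mathbb{E}(V^2) = as^2 + bs + c,
$$

where \(a = \mathbb{E}(U^2)\), \(b = 2\mathbb{E}(UV)\) and \(c = \mathbb{E}(V^2)\). If \(a = 0\), then \(U = 0\) with probability 1, so \(\mathbb{E}(UV) = 0\) and the inequality holds trivially. Otherwise, since \(\mathbb{E}(W^2) \geq 0\) for all values of \(s\), the quadratic above has at most one real root (if it had two distinct roots, it would take negative values between them). Therefore its discriminant is non-positive,

$$
b^2 - 4ac = 4\,\mathbb{E}(UV)^2 - 4\,\mathbb{E}(U^2)\,\mathbb{E}(V^2) \leq 0,
$$

from which we arrive at the result

$$
\mathbb{E}(UV)^2 \leq \mathbb{E}(U^2)\,\mathbb{E}(V^2).
$$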
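The discriminant argument can be checked numerically. The sketch below assumes \(U\) and \(V\) follow the empirical distribution of simulated samples (the sampling scheme and the helper `expect` are illustrative choices, not from the original text), builds the coefficients \(a\), \(b\), \(c\) of \(\mathbb{E}((sU + V)^2)\), and confirms that the discriminant is non-positive.

```python
import random

def expect(values):
    # Expectation under the empirical (uniform-over-samples) distribution
    return sum(values) / len(values)

random.seed(1)
us = [random.uniform(-1.0, 2.0) for _ in range(1000)]
vs = [u ** 2 + random.gauss(0.0, 1.0) for u in us]  # V depends non-linearly on U

# Coefficients of E[(sU + V)^2] = a s^2 + b s + c
a = expect([u * u for u in us])
b = 2 * expect([u * v for u, v in zip(us, vs)])
c = expect([v * v for v in vs])

# E[W^2] >= 0 for every s, so the discriminant must be non-positive
disc = b * b - 4 * a * c
print(disc <= 0)

# Equivalent statement: E(UV)^2 <= E(U^2) E(V^2)
lhs = expect([u * v for u, v in zip(us, vs)]) ** 2
rhs = a * c
print(lhs <= rhs)

# The quadratic is non-negative on a grid of s values
print(all(a * s * s + b * s + c >= 0 for s in [k / 10 for k in range(-100, 101)]))
```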

## Moment generating functions

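Since the moments of a random variable can, under the conditions above, uniquely determine its distribution, it is convenient to collect them all into a single function: the moment generating function.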

**Definition (Moment generating function)** The moment generating function of
a random variable \(X\), denoted \(M_X\), is defined by

$$
M_X(t) = \mathbb{E}(e^{tX})
$$

for all \(t \in \mathbb{R}\) for which the expectation exists.

We have the following relation between moments of a random variable and derivatives of its mgf.

**Theorem (Moments equal to derivatives of mgf)** If \(M_X\) exists in a
neighbourhood of \(0\), then for \(k = 1, 2, ...\),

$$
\mathbb{E}(X^k) = M_X^{(k)}(0),
$$

the \(k^{th}\) derivative of \(M_X\) at \(t = 0\).
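A small numerical illustration, assuming an exponential random variable with rate \(\lambda = 2\) (an illustrative choice) whose mgf is \(\lambda / (\lambda - t)\) for \(t < \lambda\): central finite differences of the mgf at \(t = 0\) are compared with moments computed independently by Simpson's rule on the density \(\lambda e^{-\lambda x}\). The function names are hypothetical helpers for this sketch.

```python
import math

LAM = 2.0  # rate parameter lambda of the exponential (illustrative choice)

def mgf_exponential(t, lam=LAM):
    # M_X(t) = lam / (lam - t), valid for t < lam
    return lam / (lam - t)

def first_derivative_at_0(f, h=1e-3):
    # central-difference estimate of f'(0)
    return (f(h) - f(-h)) / (2 * h)

def second_derivative_at_0(f, h=1e-3):
    # central-difference estimate of f''(0)
    return (f(h) - 2 * f(0.0) + f(-h)) / (h * h)

def moment_by_integration(k, lam=LAM, upper=40.0, n=20_000):
    # Composite Simpson's rule for E(X^k) = integral_0^upper x^k lam e^{-lam x} dx;
    # for lam = 2 the tail beyond `upper` is negligible. n must be even.
    f = lambda x: x ** k * lam * math.exp(-lam * x)
    h = upper / n
    total = f(0.0) + f(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3

print(first_derivative_at_0(mgf_exponential), moment_by_integration(1))
print(second_derivative_at_0(mgf_exponential), moment_by_integration(2))
```

Both columns should approximate the exact values \(\mathbb{E}(X) = 1/\lambda\) and \(\mathbb{E}(X^2) = 2/\lambda^2\).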

We also have the following useful relation for the mgf of a sum of independent random variables.

**Theorem (Independence \(\implies\) mgf of sum factorises)** If \(X\) and \(Y\) are
independent random variables, then \(X + Y\) has moment generating function

$$
M_{X + Y}(t) = M_X(t)\,M_Y(t)
$$

for every \(t\) at which both \(M_X(t)\) and \(M_Y(t)\) exist.
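The identity \(M_{X + Y}(t) = M_X(t)M_Y(t)\) for independent \(X\) and \(Y\) can be checked exactly on small discrete examples. In this sketch the two distributions (a Bernoulli(0.3) variable and a fair die) and the helpers `mgf` and `sum_of_independent` are hypothetical choices for illustration; the distribution of the sum is built by convolution, which is where independence enters.

```python
import math
from collections import defaultdict

# Hypothetical example distributions, stored as {value: probability}
X = {0: 0.7, 1: 0.3}                # Bernoulli(0.3)
Y = {k: 1 / 6 for k in range(1, 7)}  # fair six-sided die

def mgf(dist, t):
    # M(t) = E(e^{tX}) = sum_x P(X = x) e^{tx}
    return sum(p * math.exp(t * x) for x, p in dist.items())

def sum_of_independent(dx, dy):
    # Distribution of X + Y for independent X, Y: mass P(X = x) P(Y = y) at x + y
    out = defaultdict(float)
    for x, px in dx.items():
        for y, py in dy.items():
            out[x + y] += px * py
    return dict(out)

S = sum_of_independent(X, Y)
for t in (-1.0, 0.0, 0.5, 2.0):
    # the two columns agree up to floating-point rounding
    print(t, mgf(S, t), mgf(X, t) * mgf(Y, t))
```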

Since the moments of a random variable can uniquely determine its distribution, we might expect the mgf \(M_X(t)\) to uniquely determine the distribution of the corresponding random variable \(X\) as well. On an intuitive level this can be seen by expanding the exponential as a power series, so that \(M_X(t)\) can be rewritten as

$$
M_X(t) = \mathbb{E}\left(\sum_{k = 0}^{\infty} \frac{t^k X^k}{k!}\right) = \sum_{k = 0}^{\infty} \frac{\mathbb{E}(X^k)}{k!}\, t^k,
$$

so the moments can be read off from the mgf, and the distribution of \(X\) can then be determined from the moments. The following result formalises this intuition.

**Theorem (Uniqueness of mgfs)** If the moment generating function \(M_X(t) = \mathbb{E}(e^{tX}) < \infty\) for all \(t \in [-\delta, \delta]\) for some \(\delta > 0\), there is a unique distribution with mgf \(M_X\). Under this
condition, we have that \(\mathbb{E}(X^k) < \infty\) for \(k = 1, 2, ...\) and

$$
M_X(t) = \sum_{k = 0}^{\infty} \frac{t^k}{k!}\,\mathbb{E}(X^k) \quad \text{for } |t| < \delta.
$$
## Examples of mgfs

Here are examples of moment generating functions of some common continuous random variables.

### Uniform

If \(X\) is uniformly distributed in \([a, b]\), then its mgf is

$$
M_X(t) = \frac{e^{bt} - e^{at}}{t(b - a)} \quad \text{for } t \neq 0, \qquad M_X(0) = 1.
$$

### Exponential

If \(X\) is exponentially distributed with parameter \(\lambda\), then its mgf is

$$
M_X(t) = \frac{\lambda}{\lambda - t} \quad \text{for } t < \lambda.
$$

### Normal

If \(X\) is normally distributed with parameters \(\mu\), \(\sigma^2 > 0\), then its mgf is

$$
M_X(t) = \exp\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right).
$$

### Cauchy

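If \(X\) has the (standard) Cauchy distribution, with density \(f(x) = \frac{1}{\pi(1 + x^2)}\), then it does not have an mgf because the integral

$$
\mathbb{E}(e^{tX}) = \int_{-\infty}^{\infty} \frac{e^{tx}}{\pi(1 + x^2)}\, dx
$$

diverges for any \(t \neq 0\). Many other random variables do not have mgfs for the same reason, a difficulty that is circumvented by characteristic functions, defined below.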

### Gamma

If \(X\) is gamma distributed with parameters \(w > 0\) and \(\lambda > 0\), then its mgf is

$$
M_X(t) = \left(\frac{\lambda}{\lambda - t}\right)^{w} \quad \text{for } t < \lambda.
$$
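The closed forms above can be sanity-checked by Monte Carlo: average \(e^{tX}\) over many draws and compare with the formula. This is a sketch only; the parameter values (\([a, b] = [0, 2]\), \(\lambda = 2\), \(\mu = 1\), \(\sigma = 0.5\), \(w = 3\), \(t = 0.5\)), the sample size and the helper `mc_mgf` are assumptions for illustration. Note that Python's `random.gammavariate(alpha, beta)` takes a shape and a *scale*, so the rate \(\lambda\) enters as `1 / lam`. No such check is possible for the Cauchy case, since the expectation being estimated is infinite.

```python
import math
import random

random.seed(42)
N = 100_000
t = 0.5  # must satisfy t < lambda for the exponential and gamma cases

def mc_mgf(sampler):
    # Monte Carlo estimate of E(e^{tX}) from N independent draws
    return sum(math.exp(t * sampler()) for _ in range(N)) / N

a, b = 0.0, 2.0      # uniform on [a, b]
lam = 2.0            # exponential / gamma rate
mu, sigma = 1.0, 0.5  # normal parameters
w = 3.0              # gamma shape

cases = {
    "uniform": (lambda: random.uniform(a, b),
                (math.exp(b * t) - math.exp(a * t)) / (t * (b - a))),
    "exponential": (lambda: random.expovariate(lam),
                    lam / (lam - t)),
    "normal": (lambda: random.gauss(mu, sigma),
               math.exp(mu * t + 0.5 * sigma ** 2 * t ** 2)),
    # gammavariate takes shape and scale, so scale = 1 / rate
    "gamma": (lambda: random.gammavariate(w, 1.0 / lam),
              (lam / (lam - t)) ** w),
}

results = {}
for name, (sampler, exact) in cases.items():
    results[name] = (mc_mgf(sampler), exact)
    print(f"{name:12s} Monte Carlo {results[name][0]:.4f}  closed form {exact:.4f}")
```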

## Markov and Jensen inequalities

The Markov inequality is a useful result that bounds the probability that a non-negative random variable is at least some positive threshold.

**Theorem (Markov inequality)** For any non-negative random variable \(X : \Omega \to \mathbb{R}\) and any \(t > 0\),

$$
\mathbb{P}(X \geq t) \leq \frac{\mathbb{E}(X)}{t}.
$$
## Proof: Markov inequality

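For any non-negative random variable \(X\), outcome \(\omega \in \Omega\) and threshold \(t > 0\), we have

$$
X(\omega) \geq t\, \mathbb{1}_{X \geq t}(\omega),
$$

where \(\mathbb{1}_{X \geq t}(\omega) = 1\) if \(X(\omega) \geq t\) and \(\mathbb{1}_{X \geq t}(\omega) = 0\) otherwise. Taking expectations, and using \(\mathbb{E}(\mathbb{1}_{X \geq t}) = \mathbb{P}(X \geq t)\), we obtain \(\mathbb{E}(X) \geq t\,\mathbb{P}(X \geq t)\), which rearranges to

$$
\mathbb{P}(X \geq t) \leq \frac{\mathbb{E}(X)}{t}.
$$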

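One consequence of the Markov inequality is the Chebyshev inequality, obtained by applying it to the non-negative random variable \((X - \mathbb{E}(X))^2\) with threshold \(t^2\):

$$
\mathbb{P}(|X - \mathbb{E}(X)| \geq t) \leq \frac{\sigma^2}{t^2}, \quad t > 0,
$$

where \(\sigma^2\) is the variance of \(X\). These inequalities are useful in proofs that need to bound the probability that a random variable falls outside a certain range, using only its mean (or its variance).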
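To get a feel for how tight these bounds are, the sketch below compares them with exact tail probabilities for an exponential random variable with rate \(\lambda = 1\), an assumed example for which \(\mathbb{P}(X \geq x) = e^{-\lambda x}\), \(\mathbb{E}(X) = 1/\lambda\) and \(\text{Var}(X) = 1/\lambda^2\). The helper names `tail` and `cdf` are illustrative.

```python
import math

LAM = 1.0                         # rate of X ~ Exponential(LAM), illustrative choice
MEAN, VAR = 1 / LAM, 1 / LAM ** 2

def tail(x):
    # exact P(X >= x) for x >= 0
    return math.exp(-LAM * x)

def cdf(x):
    # exact P(X <= x); zero for x <= 0 since X > 0 with probability 1
    return 1 - math.exp(-LAM * x) if x > 0 else 0.0

for t in (0.5, 1.0, 2.0, 5.0):
    # Markov: P(X >= t) <= E(X) / t
    print(f"t={t}: P(X >= t) = {tail(t):.4f}, Markov bound = {MEAN / t:.4f}")

for s in (0.5, 1.0, 2.0, 3.0):
    # Chebyshev: P(|X - E(X)| >= s) <= Var(X) / s^2
    exact = tail(MEAN + s) + cdf(MEAN - s)
    print(f"s={s}: P(|X - E(X)| >= s) = {exact:.4f}, Chebyshev bound = {VAR / s ** 2:.4f}")
```

The bounds hold but are loose, which is the price of using only the mean and variance.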

Another useful result is Jensen’s inequality, which is handy when working with convex or concave functions.

**Definition (Convex function)** A function \(g : (a, b) \to \mathbb{R}\) is
convex if

$$
g(tu + (1 - t)v) \leq t\,g(u) + (1 - t)\,g(v)
$$

for every \(t \in [0, 1]\) and \(u, v \in (a, b)\).

The definition of a concave function is as above, except the inequality sign is flipped. Jensen’s inequality then takes the following form.

**Theorem (Jensen’s inequality)** Let \(X\) be a random variable taking values
in the (possibly infinite) interval \((a, b)\) such that \(\mathbb{E}(X)\) exists,
and let \(g : (a, b) \to \mathbb{R}\) be a convex function such that \(\mathbb{E}|g(X)| < \infty\). Then

$$
\mathbb{E}(g(X)) \geq g(\mathbb{E}(X)).
$$

It can be proved quickly by applying the supporting tangent theorem (see below) and taking an expectation over \(X\).

## Proof: Jensen's inequality

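From the supporting tangent theorem, for any \(w \in (a, b)\) there is an \(\alpha \in \mathbb{R}\) such that

$$
g(x) \geq g(w) + \alpha (x - w) \quad \text{for all } x \in (a, b).
$$

Setting \(w = \mathbb{E}(X)\), which lies in \((a, b)\) because \(X\) does, replacing \(x\) by \(X\) and taking an expectation, the term \(\alpha(X - w)\) has zero mean and we obtain Jensen’s inequality

$$
\mathbb{E}(g(X)) \geq g(\mathbb{E}(X)).
$$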
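A quick check on a small discrete distribution makes the direction of the inequality concrete. The distribution below and the helper `expect` are hypothetical; the convex functions \(x^2\), \(e^x\) and \(-\log x\) are evaluated on both sides. The reversed inequality for the concave \(\log\) follows from applying Jensen's inequality to the convex \(-\log\).

```python
import math

# Hypothetical discrete distribution on (0, infinity), stored as {value: probability}
dist = {0.5: 0.2, 1.0: 0.5, 4.0: 0.3}

def expect(g):
    # E(g(X)) = sum_x P(X = x) g(x)
    return sum(p * g(x) for x, p in dist.items())

mean_x = expect(lambda x: x)

convex_functions = {
    "x^2": lambda x: x * x,
    "exp": math.exp,
    "-log": lambda x: -math.log(x),
}
for name, g in convex_functions.items():
    # Jensen: E(g(X)) >= g(E(X)) for convex g
    print(f"{name}: E(g(X)) = {expect(g):.4f} >= g(E(X)) = {g(mean_x):.4f}")

# For a concave function such as log the inequality is reversed
print(f"log: E(log X) = {expect(math.log):.4f} <= log E(X) = {math.log(mean_x):.4f}")
```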

The supporting tangent theorem says that for any point \(w\) in the domain of a convex function \(g\), we can always find a line passing through \((w, g(w))\), which lower-bounds the function.

**Theorem (Supporting tangent theorem)** Let \(g : (a, b) \to \mathbb{R}\) be
convex, and let \(w \in (a, b)\). There exists \(\alpha \in \mathbb{R}\) such that

$$
g(x) \geq g(w) + \alpha (x - w) \quad \text{for all } x \in (a, b).
$$

## Proof: Supporting tangent theorem

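Since \(g\) is convex, for all \(a < u < w < v < b\) we have

$$
\frac{g(w) - g(u)}{w - u} \leq \frac{g(v) - g(w)}{v - w},
$$

otherwise \(g\) could not be convex, because \(g(w)\) would be strictly greater than the linear interpolation between \(g(u)\) and \(g(v)\) at \(w\). Since the above inequality holds for all \(u < w < v\), we can take the supremum of the left-hand side over \(u\) and the infimum of the right-hand side over \(v\) to obtain \(L_w \leq R_w\), where

$$
L_w = \sup_{a < u < w} \frac{g(w) - g(u)}{w - u}, \qquad R_w = \inf_{w < v < b} \frac{g(v) - g(w)}{v - w}.
$$

Both are finite (each side of the inequality bounds the other), so we can take any \(\alpha \in [L_w, R_w]\) and see that

$$
\frac{g(w) - g(u)}{w - u} \leq \alpha \leq \frac{g(v) - g(w)}{v - w} \quad \text{for all } a < u < w < v < b.
$$

By rearranging the two sides of the above inequality we obtain

$$
g(x) \geq g(w) + \alpha (x - w)
$$

for the cases where \(x = u < w\) and \(x = v > w\) respectively. The inequality holds trivially for \(x = w\).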
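The quantities \(L_w\) and \(R_w\) from the proof can be approximated on a grid. In this sketch (the helpers `slope_bounds` and `line_below` and the grid size are illustrative assumptions, and a finite grid only approximates the true supremum and infimum), \(g(x) = |x|\) at the kink \(w = 0\) gives \(L_w = -1\), \(R_w = 1\), so every \(\alpha \in [-1, 1]\) yields a supporting line, while the differentiable \(g(x) = x^2\) at \(w = 1\) has \(L_w = R_w = g'(1) = 2\).

```python
def slope_bounds(g, w, a, b, n=10_000):
    # Grid approximations of
    #   L_w = sup_{a<u<w} (g(w) - g(u)) / (w - u)
    #   R_w = inf_{w<v<b} (g(v) - g(w)) / (v - w)
    # using n - 1 interior points on each side (an illustration, not an exact sup/inf)
    us = [a + (w - a) * k / n for k in range(1, n)]
    vs = [w + (b - w) * k / n for k in range(1, n)]
    L = max((g(w) - g(u)) / (w - u) for u in us)
    R = min((g(v) - g(w)) / (v - w) for v in vs)
    return L, R

def line_below(g, w, alpha, a, b, n=10_000):
    # Check g(x) >= g(w) + alpha (x - w) on a grid of interior points of (a, b)
    xs = [a + (b - a) * k / n for k in range(1, n)]
    return all(g(x) >= g(w) + alpha * (x - w) - 1e-12 for x in xs)

# g(x) = |x| at w = 0 has a kink: any alpha in [-1, 1] gives a supporting line
L, R = slope_bounds(abs, 0.0, -1.0, 1.0)
print(L, R)
print(all(line_below(abs, 0.0, alpha, -1.0, 1.0) for alpha in (-1.0, -0.3, 0.0, 0.7, 1.0)))

# g(x) = x^2 at w = 1 is differentiable: L_w and R_w both approach g'(1) = 2
sq = lambda x: x * x
L2, R2 = slope_bounds(sq, 1.0, 0.0, 3.0)
print(L2, R2)
print(line_below(sq, 1.0, 2.0, 0.0, 3.0))
```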

## Characteristic functions

Unlike the moment generating function, which might not exist for some random variables, the characteristic function defined below exists for every random variable, since \(|e^{itX}| = 1\) and so the expectation is always finite.

**Definition (Characteristic function)** The characteristic function of a
random variable \(X\) is written \(\phi_X\) and defined as

$$
\phi_X(t) = \mathbb{E}(e^{itX}), \quad t \in \mathbb{R},
$$

where \(i = \sqrt{-1}\).
The characteristic function has the following two useful properties.

**Theorem (Two properties of characteristic functions)** Let \(X\) and \(Y\) be
independent random variables with characteristic functions \(\phi_X\) and \(\phi_Y\). Then

1. If \(a, b \in \mathbb{R}\) and \(Z = aX + b\), then \(\phi_Z(t) = e^{itb} \phi_X(at)\).
2. The characteristic function of \(X + Y\) is \(\phi_{X + Y}(t) = \phi_X(t)\phi_Y(t)\).

## Proof: Properties of the characteristic function

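To show the first property, consider

$$
\phi_Z(t) = \mathbb{E}(e^{it(aX + b)}) = e^{itb}\,\mathbb{E}(e^{i(at)X}) = e^{itb}\phi_X(at).
$$

For the second property, consider

$$
\begin{aligned}
\phi_{X + Y}(t) &= \mathbb{E}(e^{it(X + Y)}) \\
&= \mathbb{E}(e^{itX} e^{itY}) \\
&= \mathbb{E}(e^{itX})\,\mathbb{E}(e^{itY}) \\
&= \phi_X(t)\,\phi_Y(t),
\end{aligned}
$$

where we have used the fact that \(X\) and \(Y\) are independent to get from the second to the third line.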
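Both properties can be verified exactly for discrete distributions using complex arithmetic. The distributions, the constants \(a = 2\), \(b = -1.5\) and the helpers `cf`, `affine` and `sum_of_independent` below are hypothetical choices for this sketch. The last check also illustrates why the characteristic function always exists: \(|\phi_X(t)| \leq 1\).

```python
import cmath
from collections import defaultdict

# Hypothetical discrete distributions, stored as {value: probability}
X = {-1: 0.25, 0: 0.25, 2: 0.5}
Y = {1: 0.6, 3: 0.4}

def cf(dist, t):
    # phi(t) = E(e^{itX}) = sum_x P(X = x) e^{itx}
    return sum(p * cmath.exp(1j * t * x) for x, p in dist.items())

def affine(dist, a, b):
    # distribution of Z = aX + b
    out = defaultdict(float)
    for x, p in dist.items():
        out[a * x + b] += p
    return dict(out)

def sum_of_independent(dx, dy):
    # distribution of X + Y for independent X and Y
    out = defaultdict(float)
    for x, px in dx.items():
        for y, py in dy.items():
            out[x + y] += px * py
    return dict(out)

a, b = 2.0, -1.5
Z = affine(X, a, b)
S = sum_of_independent(X, Y)

for t in (0.3, 1.0, 2.5):
    # Property 1: phi_Z(t) = e^{itb} phi_X(at); difference is at rounding level
    print(abs(cf(Z, t) - cmath.exp(1j * t * b) * cf(X, a * t)))
    # Property 2: phi_{X+Y}(t) = phi_X(t) phi_Y(t); difference is at rounding level
    print(abs(cf(S, t) - cf(X, t) * cf(Y, t)))
    # |phi(t)| <= 1 (allowing for rounding)
    print(abs(cf(X, t)) <= 1.0 + 1e-12)
```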

As with the mgf, the characteristic function of a random variable determines its distribution uniquely, in the sense that two random variables have the same distribution if and only if they have the same characteristic function.

**Theorem (Uniqueness of characteristic functions)** Let \(X\) and \(Y\) have characteristic functions \(\phi_X\) and \(\phi_Y\). Then \(X\) and \(Y\) have the same distribution if and only if \(\phi_X(t) = \phi_Y(t)\) for all \(t \in \mathbb{R}\).

When a random variable has a density, we can recover it from the characteristic function by applying the following inverse transformation.

**Theorem (Inversion theorem)** Let \(X\) have characteristic function \(\phi_X\)
and density function \(f\). Then

$$
f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} \phi_X(t)\, dt
$$

at every point \(x\) where \(f\) is differentiable.
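The inversion formula can be evaluated numerically. This sketch assumes the standard normal, whose characteristic function \(e^{-t^2/2}\) is a standard result not derived above, truncates the integral to \([-12, 12]\) (the integrand is negligible beyond that) and applies the trapezoidal rule; the truncation limit, step count and helper names are illustrative choices. The recovered values should match the density \(e^{-x^2/2}/\sqrt{2\pi}\).

```python
import cmath
import math

def phi_std_normal(t):
    # characteristic function of N(0, 1): phi(t) = exp(-t^2 / 2)
    return math.exp(-t * t / 2)

def invert(phi, x, t_max=12.0, n=4800):
    # Trapezoidal approximation of (1 / 2pi) * integral of e^{-itx} phi(t) dt
    # over [-t_max, t_max]; phi must decay fast enough for the truncation to be harmless
    h = 2 * t_max / n
    total = 0j
    for k in range(n + 1):
        t = -t_max + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * cmath.exp(-1j * t * x) * phi(t)
    return (total * h / (2 * math.pi)).real

for x in (0.0, 1.0, 2.0):
    # recovered density versus the exact N(0, 1) density
    print(x, invert(phi_std_normal, x), math.exp(-x * x / 2) / math.sqrt(2 * math.pi))
```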

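Note the similarity to the Fourier transform: \(\phi_X(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\, dx\) is the Fourier transform of the density \(f\) (up to sign and normalisation conventions), and the inversion theorem is the corresponding inverse Fourier transform.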