Moment generating functions#
Moments are an important tool in the study of random variables, and moment generating functions (mgfs) are a useful tool for working with them. Under certain conditions, there is a one-to-one mapping between distributions and moment generating functions. One example use of mgfs is computing the distribution of a sum of independent random variables. Mgfs do not always exist, an issue that is circumvented by characteristic functions, which exist for a much broader class of random variables. Two useful inequalities, the Markov and Jensen inequalities, are also presented and proved.
Moments#
The moments of a random variable contain useful information about it. In fact, under the following technical conditions, the moments uniquely determine the distribution of the random variable.
(Uniqueness theorem for moments)
Suppose that all moments \(\mathbb{E}(X^k), k = 1, 2, ...\) of the random variable \(X\) exist, and that the series
is absolutely convergent for some \(t > 0\). Then the moments uniquely determine the distribution of \(X\).
Covariance and correlation#
We are often interested in the extent to which two random variables co-vary, a property that is quantified by their covariance, as defined below.
(Covariance)
If \(X\) and \(Y\) are random variables, then their covariance is denoted \(\text{cov}(X, Y)\) and defined as
whenever these expectations exist.
Clearly, if we multiply \(X\) and \(Y\) by \(a\) and \(b\) respectively, their covariance will change by a factor of \(ab\). We may therefore want a scale-invariant measure of how strongly two random variables co-vary, which is captured by the correlation coefficient.
(Correlation coefficient)
If \(X\) and \(Y\) are random variables, then their correlation coefficient is denoted \(\rho(X, Y)\) and defined as
whenever the covariance and variances exist and \(\text{Var}(X)\text{Var}(Y) \neq 0\).
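As a quick numerical illustration of both definitions, here is a minimal sketch (assuming numpy; the linear relationship between the synthetic variables is an arbitrary choice) comparing the formulas above with numpy's built-in estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)  # Y depends linearly on X, plus noise

# Covariance from the definition E[(X - EX)(Y - EY)], estimated by sample means
cov_manual = np.mean((x - x.mean()) * (y - y.mean()))

# Correlation coefficient: cov(X, Y) / sqrt(Var(X) Var(Y))
rho_manual = cov_manual / np.sqrt(x.var() * y.var())

# np.cov uses the n - 1 normalisation, a negligible difference at this sample size
print(cov_manual, np.cov(x, y)[0, 1])       # both close to 2.0
print(rho_manual, np.corrcoef(x, y)[0, 1])  # both close to 0.97
```

Rescaling \(x\) and \(y\) by \(a\) and \(b\) changes the covariance estimates by a factor of \(ab\) but leaves the correlation coefficient unchanged.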
The correlation coefficient of two random variables has absolute value less than or equal to \(1\), as stated by the following result, which is worth bearing in mind.
(Correlation between \(-1\) and \(1\))
If \(X\) and \(Y\) are random variables, then
whenever this correlation exists.
The above result follows quickly from an application of the Cauchy-Schwarz inequality, stated and proved below.
Proof: Correlation between \(-1\) and \(1\)
Given random variables \(X\) and \(Y\), define \(U = X - \bar{X}\) and \(V = Y - \bar{Y}\). Applying the Cauchy-Schwarz inequality to \(U\) and \(V\), we obtain
Taking a square root and substituting for \(U\) and \(V\), we arrive at the result
(Cauchy-Schwarz inequality)
If \(U\) and \(V\) are random variables, then
whenever these expectations exist.
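Before the proof, a quick numerical sanity check of the inequality may be helpful (a minimal sketch assuming numpy; sample means stand in for the expectations, and the dependence between \(U\) and \(V\) is an arbitrary choice).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
u = rng.normal(size=n)
v = u + rng.normal(size=n)  # V depends on U, so E(UV) is nonzero

lhs = np.mean(u * v) ** 2            # estimate of E(UV)^2, roughly 1
rhs = np.mean(u**2) * np.mean(v**2)  # estimate of E(U^2) E(V^2), roughly 2
print(lhs <= rhs, lhs, rhs)          # the inequality holds
```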
Proof: Cauchy-Schwarz inequality
Let \(s \in \mathbb{R}\) be a real number and \(W = sU + V\) be a random variable. Then \(W^2 \geq 0\) and we have
where \(a = \mathbb{E}(U^2)\), \(b = 2\mathbb{E}(UV)\) and \(c = \mathbb{E}(V^2)\). Since \(\mathbb{E}(W^2) \geq 0\) holds for all values of \(s\), the quadratic above can equal zero at most once, because otherwise it would take negative values for some \(s\). Therefore we have
from which we arrive at the result
Moment generating functions#
Since the moments of a random variable uniquely determine its distribution (under the conditions above), it is useful to work with a function that encodes all of the moments at once: the moment generating function.
(Moment generating function)
The moment generating function of a random variable \(X\), denoted \(M_X\), is defined by
for all \(t \in \mathbb{R}\) for which the expectation exists.
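As a minimal sketch of this definition (assuming numpy; an exponential distribution with rate \(\lambda = 2\) is chosen arbitrarily, and its standard closed-form mgf \(\lambda/(\lambda - t)\), valid for \(t < \lambda\), is used for comparison), the expectation \(\mathbb{E}(e^{tX})\) can be estimated by a sample average:

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 2.0
x = rng.exponential(scale=1.0 / lam, size=1_000_000)  # X ~ Exponential(rate = 2)

for t in [-1.0, 0.5, 0.9]:
    mgf_mc = np.mean(np.exp(t * x))  # Monte Carlo estimate of E[e^{tX}]
    mgf_exact = lam / (lam - t)      # standard closed form, valid for t < lam
    print(t, mgf_mc, mgf_exact)
```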
We have the following relation between moments of a random variable and derivatives of its mgf.
(Moments equal to derivatives of mgf)
If \(M_X\) exists in a neighbourhood of \(0\), then for \(k = 1, 2, \dots,\)
the \(k^{th}\) derivative of \(M_X\) at \(t = 0\).
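For instance, here is a minimal symbolic check of this relation (a sketch assuming sympy and an exponential random variable with rate \(\lambda = 2\), whose mgf \(\lambda/(\lambda - t)\) and moments \(k!/\lambda^k\) are standard results):

```python
import sympy as sp

t = sp.symbols('t')
lam = 2
M = lam / (lam - t)  # mgf of an Exponential(rate = 2) random variable

for k in range(1, 5):
    moment_from_mgf = sp.diff(M, t, k).subs(t, 0)  # k-th derivative of M at t = 0
    moment_exact = sp.factorial(k) / lam**k        # known k-th moment, k!/lam^k
    print(k, moment_from_mgf, moment_exact)        # the two columns agree
```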
Further, we have the following useful result for the mgf of a sum of independent random variables.
(Independence \(\implies\) mgf of sum factorises)
If \(X\) and \(Y\) are independent random variables, then \(X + Y\) has moment generating function
Since the moments of a random variable uniquely determine its distribution, we might expect the moment generating function \(M_X\) to uniquely determine the distribution of the corresponding random variable \(X\) as well. Intuitively, this can be seen by noting that \(M_X(t)\) can be rewritten as
so the moments can be determined from the mgf, and the distribution of \(X\) can then be determined from the moments. The following result formalises this intuition.
(Uniqueness of mgfs)
If the moment generating function \(M_X(t) = \mathbb{E}(e^{tX}) < \infty\) for all \(t \in [-\delta, \delta]\) for some \(\delta > 0\), there is a unique distribution with mgf \(M_X\). Under this condition, we have that \(\mathbb{E}(X^k) < \infty\) for \(k = 1, 2, ...\) and
Examples of mgfs#
Here are examples of moment generating functions of some common continuous random variables.
Uniform#
If \(X\) is uniformly distributed in \([a, b],\) then its mgf is
Exponential#
If \(X\) is exponentially distributed with parameter \(\lambda\), then its mgf is
Normal#
If \(X\) is normally distributed with parameters \(\mu\), \(\sigma^2 > 0,\) then its mgf is
Cauchy#
If \(X\) is Cauchy distributed, then it does not have an mgf because the integral
diverges for any \(t \neq 0.\) Many other random variables do not have mgfs for the same reason, a difficulty that is circumvented by the characteristic functions defined below.
Gamma#
If \(X\) is gamma distributed with parameters \(w > 0\) and \(\lambda > 0,\) then its mgf is
Markov and Jensen inequalities#
The Markov inequality is a useful result that bounds the probability that a non-negative random variable is larger than some positive threshold.
(Markov inequality)
For any non-negative random variable \(X: \Omega \to \mathbb{R}\),
Proof: Markov inequality
For any non-negative random variable \(X(\omega)\) and any \(t > 0\), we have
where \(\mathbb{1}_{X \geq t} = 1\) if \(X(\omega) \geq t\) and \(\mathbb{1}_{X\geq t} = 0\) otherwise. Rearranging and taking expectations, we obtain
One consequence of the Markov inequality is the Chebyshev inequality
where \(\sigma^2\) is the variance of \(X\). The Markov inequality is useful in proofs that require bounding the probability of a random variable falling within a certain range; a numerical illustration is given below. Another useful result is Jensen’s inequality, which is key when working with convex or concave functions.
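Before turning to Jensen’s inequality, here is the promised sketch of the Markov and Chebyshev bounds (a minimal sketch assuming numpy; the exponential distribution with rate \(1\) is an arbitrary choice of a non-negative random variable):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=1_000_000)  # non-negative, E(X) = 1, Var(X) = 1
mean, var = x.mean(), x.var()

for t in [1.0, 2.0, 5.0]:
    # Markov: P(X >= t) <= E(X) / t
    print("Markov   ", t, np.mean(x >= t), mean / t)
    # Chebyshev: P(|X - E(X)| >= t) <= Var(X) / t^2
    print("Chebyshev", t, np.mean(np.abs(x - mean) >= t), var / t**2)
```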
(Convex function)
A function \(g : (a, b) \to \mathbb{R}\) is convex if
for every \(t \in [0, 1]\) and \(u, v \in (a, b)\).
The definition of a concave function is as above, except the inequality sign is flipped. Jensen’s inequality then takes the following form.
(Jensen’s inequality)
Let \(X\) be a random variable taking values in the (possibly infinite) interval \((a, b)\) such that \(\mathbb{E}(X)\) exists, and let \(g : (a, b) \to \mathbb{R}\) be a convex function such that \(\mathbb{E}|g(X)| < \infty\). Then
It can be proved quickly by applying the supporting tangent theorem (see below) and taking an expectation over \(X\).
Proof: Jensen’s inequality
From the supporting tangent theorem we have
and by setting the constant \(w = \mathbb{E}(X)\) and taking an expectation over \(X\), the term involving \(X - w\) vanishes in expectation and we obtain Jensen’s inequality
The supporting tangent theorem says that for any point \(w\) in the domain of a convex function \(g\), we can always find a line passing through \((w, g(w))\), which lower-bounds the function.
(Supporting tangent theorem)
Let \(g : (a, b) \to \mathbb{R}\) be convex, and let \(w \in (a, b).\) There exists \(\alpha \in \mathbb{R}\) such that
Proof: Supporting tangent theorem
Since \(g\) is convex, we have
otherwise \(g\) could not be convex, because \(g(w)\) would be strictly greater than the linear interpolation between \(g(u)\) and \(g(v)\) at \(w.\) Since the above inequality holds for all \(u < w < v,\) we can maximise the left-hand side over \(u\) and minimise the right-hand side over \(v\) to obtain \(L_w \leq R_w,\) where
We can then take any \(\alpha \in [L_w, R_w]\) and see that
By rearranging the two sides of the above inequality, we obtain
for the cases where \(x = u < w\) and \(x = v > w\) respectively. The inequality holds trivially for \(x = w\).
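As a quick numerical illustration of Jensen’s inequality and of the supporting tangent (a minimal sketch assuming numpy, the convex function \(g(x) = e^x\) and a standard normal \(X\); for a differentiable convex \(g\), the slope \(\alpha = g'(w)\) lies in \([L_w, R_w]\)):

```python
import numpy as np

rng = np.random.default_rng(4)
g = np.exp                      # a convex function
x = rng.normal(size=1_000_000)  # X ~ N(0, 1)

# Jensen: E[g(X)] >= g(E[X]); for g = exp the exact values are e^{1/2} and e^0 = 1
print(np.mean(g(x)), g(np.mean(x)))  # roughly 1.65 >= 1.0

# Supporting tangent at w: g(x) >= g(w) + alpha (x - w), with alpha = g'(w) = e^w
w = 0.375                       # an arbitrary point in the domain
alpha = np.exp(w)
grid = np.linspace(-3.0, 3.0, 601)
print(np.all(g(grid) >= g(w) + alpha * (grid - w)))  # True
```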
Characteristic functions#
Unlike the moment generating function, which might not exist for some random variables, the characteristic function of a random variable, defined below, exists for a much broader class of random variables.
(Characteristic function)
The characteristic function of a random variable \(X\) is written \(\phi_X\) and defined as
The characteristic function has the following two useful properties.
(Two properties of characteristic functions)
Let \(X\) and \(Y\) be independent random variables with characteristic functions \(\phi_X\) and \(\phi_Y.\) Then
If \(a, b \in \mathbb{R}\) and \(Z = aX + b\), then \(\phi_Z(t) = e^{itb} \phi_X(at)\).
The characteristic function of \(X + Y\) is \(\phi_{X + Y}(t) = \phi_X(t)\phi_Y(t)\).
Proof: Properties of the characteristic function
To show the first property, consider
For the second property, consider
where we have used the fact that \(X\) and \(Y\) are independent to get from the second to the third line.
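A minimal numerical sketch of both the definition and the factorisation property (assuming numpy; independent standard normal variables are used, for which \(\phi(t) = e^{-t^2/2}\) is the standard closed form):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000
x = rng.normal(size=n)  # X ~ N(0, 1)
y = rng.normal(size=n)  # Y ~ N(0, 1), independent of X

def phi(sample, t):
    """Monte Carlo estimate of the characteristic function E[e^{itX}]."""
    return np.mean(np.exp(1j * t * sample))

for t in [0.5, 1.0, 2.0]:
    print(t, phi(x, t), np.exp(-t**2 / 2))          # estimate vs closed form for N(0, 1)
    print(t, phi(x + y, t), phi(x, t) * phi(y, t))  # factorisation for independent X, Y
```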
As with the mgf, the characteristic function of a random variable is unique, in the sense that two random variables have the same distribution if and only if they have the same characteristic function.
(Uniqueness of characteristic functions)
Let \(X\) and \(Y\) have characteristic functions \(\phi_X\) and \(\phi_Y\). Then \(X\) and \(Y\) have the same distribution if and only if \(\phi_X(t) = \phi_Y(t)\) for all \(t \in \mathbb{R}.\)
We can obtain the pdf of a random variable from its characteristic function by applying the following inverse transformation.
(Inversion theorem)
Let \(X\) have characteristic function \(\phi_X\) and density function \(f.\) Then
at every point \(x\) where \(f\) is differentiable.
Note the similarity between the characteristic function and the Fourier transform: \(\phi_X\) is, up to convention, the Fourier transform of the density \(f\), and the inversion theorem is the corresponding inverse transform.
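As a final sketch, the inversion formula can be checked numerically (assuming numpy; the standard normal, with characteristic function \(e^{-t^2/2}\), is used, and the integral is truncated to \([-T, T]\) and approximated by a simple Riemann sum):

```python
import numpy as np

def inverted_density(x, T=20.0, m=4001):
    """Approximate f(x) = (1 / 2 pi) * integral of e^{-itx} phi(t) dt by a Riemann sum."""
    t = np.linspace(-T, T, m)
    dt = t[1] - t[0]
    phi = np.exp(-t**2 / 2)  # characteristic function of N(0, 1)
    return np.real(np.sum(np.exp(-1j * t * x) * phi) * dt / (2 * np.pi))

for x in [0.0, 1.0, 2.0]:
    exact = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # standard normal density
    print(x, inverted_density(x), exact)            # the estimates match the density
```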