Main limit theorems

Main limit theorems#

In this chapter we introduce the idea of convergence for random variables, which may be in either of the three senses: (1) in mean-square, (2) in probability or (3) in distribution. These three senses are non-equivalent and in fact one implies the next, in the order given above. We present important theorems involving limits of random variables, such as the law of large numbers, the central limit theorem and the large deviation theorem. We also present the continuity theorems for moment generating functions and characteristic functions.

Convergence in mean-square#

We are often interested in the convergence of a sequence of random variables to another random variable. Unlike real sequences, where convergence has one meaning only, the convergence of a sequence of random variables can be defined in different ways. One such way is convergence in mean-square, as defined below.

Definition 46 (Mean-square convergence)

We say that a sequence of random variables \(X_1, X_2, ...\) converges in mean square to a limit variable \(X\) if

\[\begin{align} \mathbb{E}\left[(X_n - X)^2\right] \to 0 \text{ as } n \to \infty, \end{align}\]

and write this as \(X_n \to X\) in mean square as \(n \to \infty\).

Note that for mean-square convergence to be meaningful, \(\mathbb{E}(X)\) as well as all \(\mathbb{E}\left[(X_n - X)^2\right]\) must exist. We can use this definition to state one version of the mean square law of large numbers as follows.

Theorem 44 (Mean square law of large numbers)

Let \(X_1, X_2, ...\) be a sequence of independent random variables each with mean \(\mu\) and variance \(\sigma^2.\) Then

\[\begin{align} \frac{1}{N}\sum_{n = 1}^N X_n \to \mu \text{ as } n \to \infty \text{ in mean square}. \end{align}\]

Convergence in probability#

A weaker sense in which a sequence of random variables can converge is that of convergence in probability.

Definition 47 (Convergence in probability)

We say that a sequence of random variables \(X_1, X_2, ...\) converges in probability to \(X\) as \(n \to \infty\) if for all \(\epsilon > 0\)

\[\begin{align} \mathbb{P}\left(|X_n - X| > \epsilon\right) \to 0, \text{ as } n \to \infty, \end{align}\]

and write this as \(X_n \to X\) in probability as \(n \to \infty.\)

Convergence in probability is a weaker condition than convergence in mean-square in the sense that the former implies the latter, as stated by the following theorem.

Theorem 45 (Convergence in mean square \(\implies\) convergence in probability)

If \(X_1, X_2, ...\) is a sequence of random variables and \(X_n \to X\) in mean square, then \(X_n \to X\) in probability.

To prove the above we use Chebyshev’s inequality, a useful tool in probability theory, which is presented and proved below.

Theorem 46 (Chebyshev’s inequality)

If \(X\) is a random variable and \(\mathbb{E}\left(X^2\right)\) is finite then

\[\begin{align} \mathbb{P}(|X| \geq t) \leq \frac{\mathbb{E}\left(X^2\right)}{t^2}. \end{align}\]

Just as the law of large numbers can be stated in the mean-square sense, it can also be stated in the weaker sense of convergence in probability.

Theorem 47 (Weak law of large numbers)

Let \(X_1, X_2, ...\) be a sequence of independent random variables each with mean \(\mu\) and variance \(\sigma^2.\) Then

\[\begin{align} \frac{1}{N}\sum_{n = 1}^N X_n \to \mu \text{ as } n \to \infty \text{ in probability.} \end{align}\]

This weak version is a direct implication of the fact that convergence in mean-square implies convergence in probability, as shown in the following proof.

Unlike the mean-square law of large numbers, the weak law holds even when \(\sigma^2\) is infinite, provided that the \(X_n\) all come from the same distribution.

Central limit theorem#

Theorem 48 (Central limit theorem)

Let \(X_1, X_2, \dots\) be a sequence of independent and identically distributed random variables, each with mean \(\mu\) and variance \(\sigma^2.\) Then the random variable

\[\begin{align} Z_N = \frac{S_N - N\mu}{\sqrt{N}\sigma}, \text{ where } S_N = X_1 + X_2 + ... + X_N \end{align}\]

satisfies, as \(N \to \infty\),

\[\begin{align} \mathbb{P}(Z_N \leq z) \to \int^z_{-\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}u^2} du, \text{ for } x \in \mathbb{R}. \end{align}\]

Theorem 49 (Continuity theorem with mgfs)

Let \(X_1, X_2, ...\) be a sequence of random variables with moment generating functions \(M_1, M_2 ...\) and suppose that as \(n \to \infty\)

\[\begin{align} M_n(t) \to e^{\frac{1}{2}t^2} \text{ for } t \in \mathbb{R} \end{align}\]

Then

\[\begin{align} \mathbb{P}(Z_n \leq z) \to \int^z_{-\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}u^2} du, \text{ for } x \in \mathbb{R}. \end{align}\]

Large deviations#

We are sometimes interested in the probability that a sum of i.i.d. random variables \(S_N = X_1 + X_2 + ... + X_N\) will deviate from its mean by an amount proportional to \(N\), i.e. \(\mathbb{P}(S_N - \mu N > aN).\) The large deviation theorem relates the probability of a large deviation to a quantity called the Fenchel-Legendre transform, defined below.

Definition 48 (Fenchel-Legendre transform)

Given a random variable \(X\), with moment generating function \(M\), its Fenchel-Legendre transform is

\[\begin{align} \Lambda^*(a) = \sup\left\{at - \Lambda(t) : t \in \mathbb{R}\right\}, \text{ where } a \in \mathbb{R}, \end{align}\]

where \(\Lambda(t) = \log M(t).\)

The large deviation theorem then takes the following form.

Theorem 50 (Large deviation theorem)

Let \(X_1, X_2, ...\) be independent, identically distributed random variables with mean \(0\) and a common generating function \(M(t) = \mathbb{E}(e^{tX})\) which is finite in some neighbourhood \([-\delta, \delta]\) of the origin. Let \(a > 0\) be such that \(\mathbb{P}(X > a) > 0\). Then \(\Lambda^*(a) > 0\) and

\[\begin{align} \frac{1}{N} \log \mathbb{P}(S_N > aN) \to - \Lambda^*(a), \text{ as } N \to \infty. \end{align}\]

The proof of this theorem can be found in Probability and Random Processes[Grimmett and Stirzaker, 2001] (section 5.11), but is ommitted from the book and these notes. Instead we prove below that

\[\begin{align} \frac{1}{N}\log\mathbb{P}(S_N > aN) \leq -\Lambda^*(a). \end{align}\]

Convergence in distribution#

Definition 49 (Convergence in distribution)

The sequence \(X_1, X_2, ...\) is said to converge in distribution, or to converge weakly, to \(X\) as \(n \to \infty\) if

\[\begin{align} \mathbb{P}(X_n \leq x) \to \mathbb{P}(X \leq x), \text{ for any } x \in C, \end{align}\]

where \(C\) is the set of reals at which the distribution function of \(X\) is continuous. If the above holds, we write \(X_n \implies X.\)

Convergence in distribution is a weaker condition than convergence in probability. If the condition \(x \in \mathbb{R}\) was used instead of \(x \in C\) above, there would exist examples of sequences of random variables that would converge in probability but not in distribution. By requiring that the criterion for convergence holds only where \(\mathbb{P}(X \leq x)\) is continuous, we avoid these cases and make convergence in probability a special case of the weaker criterion of convergence in distribution, as stated by the following theorem.

Theorem 51 (Convergence in probability (\implies) convergence in distribution)

Let \(X_1, X_2, ...\) be a sequence of variables and \(X_n \to X\) in probability, then \(X_n \implies X\).

Below are two proofs for this result. The first is a textbook proof from the book,[Grimmett et al., 1986] based on inequalities.

The second is an alternative proof which includes a diagram, that is hopefully more intuitive.

Theorem 52 (Convergence in distribution to \(c\) \(\implies\) convergence in probability to \(c\))

Let \(X_1, X_2, ...\) be a sequence of variables and which coverges in distribution to a constant \(c\). Then \(X_n\) converges to \(c\) in probability also.

Limits of characteristic functions#

Theorem 53 (Continuity theorem with characteristic functions)

Let \(X, X_1, X_2, ...\) be random variables with characteristic functions \(\phi, \phi_1, \phi_2 \dots .\) Then \(X_N \implies X\) as \(N \to \infty\) if and only if

\[\begin{align} \phi_N(t) \to \phi(t) \text{ as } N \to \infty, \text{ for } t \in \mathbb{R}. \end{align}\]

References#

[GWW86]

G. Grimmett, D.J.A. Welsh, and D. Welsh. Probability: An Introduction. Oxford University Press. Clarendon Press, 1986.

[GS01]

G.R. Grimmett and D.R. Stirzaker. Probability and random processes. Number 391. Oxford university press, 2001.