{
"cells": [
{
"cell_type": "markdown",
"id": "689f4043-0a55-4ac0-9cc0-8e7542668378",
"metadata": {},
"source": [
"# Moment generating functions\n",
"\n",
"Moments are an important tool in the study of random variables.\n",
"Moment generating functions (mgfs) are a useful tool related to the moments of random variables: under certain conditions, there is a one-to-one mapping between distributions and moment generating functions.\n",
"One example use of mgfs is computing the distribution of a sum of independent random variables.\n",
"Mgfs do not always exist, an issue circumvented by characteristic functions, which exist for a much broader class of random variables.\n",
"Two useful inequalities, the Markov and Jensen inequalities, are presented and proved."
]
},
{
"cell_type": "markdown",
"id": "45d67c05-abee-4ed7-8f79-2edcede7ba8f",
"metadata": {},
"source": [
"(prob-intro-moments)=\n",
"## Moments\n",
"\n",
"The moments of a random variable contain useful information about it. In fact, under the following technical conditions, the moments uniquely determine the distribution of the random variable.\n",
"\n",
":::{prf:theorem} Uniqueness theorem for moments\n",
"\n",
"Suppose that all moments $\\mathbb{E}(X^k), k = 1, 2, ...$ of the random variable $X$ exist, and that the series\n",
" \n",
"$$\\begin{align}\n",
"\\sum^\\infty_{k = 0}\\frac{t^k}{k!} \\mathbb{E}(X^k),\n",
"\\end{align}$$\n",
" \n",
"is absolutely convergent for some $t > 0$.\n",
"Then the moments uniquely determine the distribution of $X$.\n",
":::\n",
"\n",
"## Covariance and correlation\n",
"\n",
"We are often interested in the extent to which two random variables co-vary, a property that is quantified by their covariance, as defined below.\n",
"\n",
":::{prf:definition} Covariance\n",
"\n",
"If $X$ and $Y$ are random variables, then their covariance is denoted $\\text{cov}(X, Y)$ and defined as\n",
"\n",
"$$\\begin{align}\n",
"\\text{cov}(X, Y) = \\mathbb{E}\\left[\\left(X - \\mathbb{E}(X)\\right)\\left(Y - \\mathbb{E}(Y)\\right)\\right],\n",
"\\end{align}$$\n",
" \n",
"whenever these expectations exist.\n",
":::\n",
"\n",
"Clearly, if we multiply $X$ and $Y$ by $a$ and $b$ respectively, their covariance will change by a factor of $ab$.\n",
"We may be interested in a scale-invariant metric of the covariance between two random variables, captured by the correlation coefficient.\n",
"\n",
":::{prf:definition} Correlation coefficient\n",
"\n",
"If $X$ and $Y$ are random variables, then their correlation coefficient is denoted $\\rho(X, Y)$ and defined as\n",
" \n",
"$$\\begin{align}\n",
"\\rho(X, Y) = \\frac{\\text{cov}(X, Y)}{\\sqrt{\\text{var}(X)\\text{var}(Y)}},\n",
"\\end{align}$$\n",
" \n",
"whenever the covariance and variances exist and $\\text{var}(X)\\text{var}(Y) \\neq 0$.\n",
":::\n",
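"\n",
"As a quick numerical illustration (a sketch using numpy; the distributions, seed and sample size below are arbitrary choices, not part of the text), we can check that rescaling $X$ and $Y$ by $a$ and $b$ rescales the covariance by $ab$, while the correlation coefficient is unchanged up to the sign of $ab$:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"x = rng.normal(size=100_000)\n",
"y = 2 * x + rng.normal(size=100_000)  # y co-varies with x\n",
"\n",
"# sample versions of cov(X, Y) and rho(X, Y)\n",
"cov = np.mean((x - x.mean()) * (y - y.mean()))\n",
"rho = cov / np.sqrt(x.var() * y.var())\n",
"\n",
"a, b = 3.0, -0.5\n",
"cov_ab = np.mean((a * x - (a * x).mean()) * (b * y - (b * y).mean()))\n",
"rho_ab = cov_ab / np.sqrt((a * x).var() * (b * y).var())\n",
"# cov_ab equals a * b * cov, and rho_ab equals -rho since a * b < 0\n",
"```\n",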
"\n",
"The correlation coefficient of two random variables has absolute value less than or equal to $1$, as stated by the following result which is worth bearing in mind.\n",
"\n",
":::{prf:theorem} Correlation between $-1$ and $1$\n",
"\n",
"If $X$ and $Y$ are random variables, then\n",
" \n",
"$$\\begin{align}\n",
"-1 \\leq \\rho(X, Y) \\leq 1,\n",
"\\end{align}$$\n",
" \n",
"whenever this correlation exists.\n",
":::\n",
"\n",
"The above result can be shown quickly from an application of the Cauchy-Schwarz inequality, stated and proved below.\n",
"\n",
":::{dropdown} Proof: Correlation between $-1$ and $1$\n",
"\n",
"Given random variables $X$ and $Y$, define $U = X - \\mathbb{E}(X)$ and $V = Y - \\mathbb{E}(Y)$. Applying the Cauchy-Schwarz inequality to $U$ and $V$, and dividing by $\\mathbb{E}(U^2)\\mathbb{E}(V^2) = \\text{var}(X)\\text{var}(Y)$ (assumed nonzero), we obtain\n",
"\n",
"$$\\begin{align}\n",
"\\frac{\\mathbb{E}(UV)^2}{\\mathbb{E}(U^2)\\mathbb{E}(V^2)} \\leq 1.\n",
"\\end{align}$$\n",
"\n",
"Taking a square root and substituting for $U$ and $V$, we arrive at the result\n",
"\n",
"$$\\begin{align}\n",
"-1 \\leq \\rho(X, Y) \\leq 1.\n",
"\\end{align}$$\n",
":::"
]
},
{
"cell_type": "markdown",
"id": "86d94cd6-5c05-4181-924e-03ad3e1a3ce6",
"metadata": {},
"source": [
":::{prf:theorem} Cauchy-Schwarz inequality\n",
"\n",
"If $U$ and $V$ are random variables, then\n",
"\n",
"$$\\begin{align}\n",
"\\mathbb{E}(UV)^2 \\leq \\mathbb{E}(U^2)\\mathbb{E}(V^2),\n",
"\\end{align}$$\n",
"\n",
"whenever these expectations exist.\n",
":::\n",
"\n",
":::{dropdown} Proof: Cauchy-Schwarz inequality\n",
"\n",
"Let $s \\in \\mathbb{R}$ and define the random variable $W = sU + V$. Then $W^2 \\geq 0$, and we have\n",
"\n",
"$$\\begin{align}\n",
"\\mathbb{E}(W^2) = a s^2 + b s + c \\geq 0,\n",
"\\end{align}$$\n",
" \n",
"where $a = \\mathbb{E}(U^2)$, $b = 2\\mathbb{E}(UV)$ and $c = \\mathbb{E}(V^2)$. Since $\\mathbb{E}(W^2) \\geq 0$ holds for all values of $s$, the quadratic above can equal zero at most once, because otherwise it would take negative values. Its discriminant must therefore satisfy\n",
" \n",
"$$\\begin{align}\n",
"b^2 - 4ac = 4\\mathbb{E}(UV)^2 - 4\\mathbb{E}(U^2)\\mathbb{E}(V^2) \\leq 0,\n",
"\\end{align}$$\n",
" \n",
"from which we arrive at the result\n",
" \n",
"$$\\begin{align}\n",
"\\mathbb{E}(UV)^2 \\leq \\mathbb{E}(U^2)\\mathbb{E}(V^2).\n",
"\\end{align}$$\n",
"\n",
":::"
]
},
{
"cell_type": "markdown",
"id": "ec4c564d-f798-4239-a5f3-0f252820edc5",
"metadata": {},
"source": [
"## Moment generating functions\n",
"\n",
"Since the moments of a random variable uniquely determine its distribution, it is natural to work with a single function that encodes all of the moments at once. This is the moment generating function, defined below.\n",
"\n",
":::{prf:definition} Moment generating function\n",
"\n",
"The moment generating function of a random variable $X$, denoted $M_X$, is defined by\n",
" \n",
"$$\\begin{align}\n",
"M_X(t) = \\mathbb{E}(e^{tX}),\n",
"\\end{align}$$\n",
" \n",
"for all $t \\in \\mathbb{R}$ for which the expectation exists.\n",
":::\n",
"\n",
"We have the following relation between moments of a random variable and derivatives of its mgf.\n",
"\n",
":::{prf:theorem} Moments equal to derivatives of mgf\n",
"\n",
"If $M_X$ exists in a neighbourhood of $0$, then for $k = 1, 2, \\dots,$\n",
" \n",
"$$\\begin{align}\n",
"\\mathbb{E}(X^k) = M_X^{(k)}(0),\n",
"\\end{align}$$\n",
"\n",
"the $k^{th}$ derivative of $M_X$ at $t = 0$.\n",
"\n",
":::\n",
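"\n",
"As a minimal numerical sketch of this result (assuming $X \\sim \\text{Exp}(\\lambda)$, whose mgf $M_X(t) = \\lambda / (\\lambda - t)$ appears later in this chapter; the rate and step size are arbitrary), finite-difference derivatives of $M_X$ at $t = 0$ should match the moments $\\mathbb{E}(X^k) = k! / \\lambda^k$:\n",
"\n",
"```python\n",
"lam = 2.0\n",
"\n",
"def M(t):\n",
"    # mgf of an Exp(lam) random variable, valid for t < lam\n",
"    return lam / (lam - t)\n",
"\n",
"h = 1e-4\n",
"M1 = (M(h) - M(-h)) / (2 * h)            # ~ M'(0)  = E(X)   = 1 / lam\n",
"M2 = (M(h) - 2 * M(0.0) + M(-h)) / h**2  # ~ M''(0) = E(X^2) = 2 / lam**2\n",
"```\n",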
"\n",
"Further, we also have the following useful relation for the mgf of a sum of random variables.\n",
"\n",
":::{prf:theorem} Independence $\\implies$ mgf of sum factorises\n",
"\n",
"If $X$ and $Y$ are independent random variables, then $X + Y$ has moment generating function\n",
" \n",
"$$M_{X + Y}(t) = M_X(t) M_Y(t).$$\n",
":::\n",
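"\n",
"A Monte Carlo sketch of this factorisation (assuming two independent $\\text{Exp}(\\lambda)$ variables; the rate, evaluation point, seed and sample size are arbitrary choices): the empirical mgf of $X + Y$ should be close to $M_X(t)M_Y(t) = (\\lambda / (\\lambda - t))^2$.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(1)\n",
"lam, t = 2.0, 0.5                        # need t < lam for the mgfs to exist\n",
"x = rng.exponential(1 / lam, size=200_000)\n",
"y = rng.exponential(1 / lam, size=200_000)\n",
"\n",
"mgf_sum = np.mean(np.exp(t * (x + y)))   # Monte Carlo estimate of the mgf of X + Y\n",
"mgf_prod = (lam / (lam - t)) ** 2        # closed-form product of the two mgfs\n",
"```\n",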
"\n",
"Intuitively, since the moments of a random variable uniquely determine its distribution, the moment generating function $M_X$ should also uniquely determine the distribution of the corresponding random variable $X$.\n",
"This can be seen by noting that $M_X(t)$ can be rewritten, by expanding the exponential, as\n",
"\n",
"$$\\begin{align}\n",
"\\mathbb{E}(e^{tX}) &= \\mathbb{E}\\left[ \\sum_{n = 0}^\\infty \\frac{1}{n!} (tX)^n \\right]\\\\\n",
" &= \\sum_{n = 0}^\\infty \\frac{t^n}{n!} \\mathbb{E}\\left[X^n\\right],\n",
"\\end{align}$$\n",
"\n",
"so the moments can be determined from the mgf, and the distribution of $X$ can then be determined from the moments. The following result formalises this intuition.\n",
"\n",
":::{prf:theorem} Uniqueness of mgfs\n",
"\n",
"If the moment generating function $M_X(t) = \\mathbb{E}(e^{tX}) < \\infty$ for all $t \\in [-\\delta, \\delta]$ for some $\\delta > 0$, then there is a unique distribution with mgf $M_X$.\n",
"Under this condition, we have that $\\mathbb{E}(X^k) < \\infty$ for $k = 1, 2, ...$ and\n",
" \n",
"$$\\begin{align}\n",
"M_X(t) = \\sum^\\infty_{k = 0} \\frac{t^k}{k!} \\mathbb{E}(X^k) \\text{ for } |t| < \\delta.\n",
"\\end{align}$$\n",
":::"
]
},
{
"cell_type": "markdown",
"id": "b8675ee4-d7bc-493e-95b7-f3f3b45981dc",
"metadata": {},
"source": [
"## Examples of mgfs\n",
"\n",
"Here are examples of moment generating functions of some common continuous random variables.\n",
"\n",
"### Uniform\n",
"\n",
"If $X$ is uniformly distributed in $[a, b],$ then its mgf is\n",
"\n",
"$$\\begin{align}\n",
"M_X(t) = \\frac{e^{tb} - e^{ta}}{t(b - a)},\n",
"\\end{align}$$\n",
"\n",
"for $t \\neq 0$, with $M_X(0) = 1$.\n",
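"\n",
"A quick Monte Carlo check of this formula (using numpy; the interval, evaluation point and seed are arbitrary choices):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(2)\n",
"a, b, t = 1.0, 3.0, 0.7\n",
"x = rng.uniform(a, b, size=500_000)\n",
"\n",
"mgf_mc = np.mean(np.exp(t * x))                               # Monte Carlo estimate\n",
"mgf_closed = (np.exp(t * b) - np.exp(t * a)) / (t * (b - a))  # closed form, t != 0\n",
"```\n",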
"\n",
"### Exponential\n",
"\n",
"If $X$ is exponentially distributed with parameter $\\lambda$, then its mgf is\n",
"\n",
"$$\\begin{align}\n",
"M_X(t) = \\frac{\\lambda}{\\lambda - t}, \\text{ for } t < \\lambda.\n",
"\\end{align}$$\n",
"\n",
"### Normal\n",
"\n",
"If $X$ is normally distributed with parameters $\\mu$, $\\sigma^2 > 0,$ then its mgf is\n",
" \n",
"$$\\begin{align}\n",
"M_X(t) = \\exp\\left(\\mu t + \\frac{\\sigma^2t^2}{2}\\right).\n",
"\\end{align}$$\n",
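"\n",
"Again, a quick Monte Carlo check (the parameters, evaluation point and seed are arbitrary choices):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(3)\n",
"mu, sigma, t = 1.0, 2.0, 0.3\n",
"x = rng.normal(mu, sigma, size=500_000)\n",
"\n",
"mgf_mc = np.mean(np.exp(t * x))                    # Monte Carlo estimate\n",
"mgf_closed = np.exp(mu * t + sigma**2 * t**2 / 2)  # exp(mu t + sigma^2 t^2 / 2)\n",
"```\n",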
"\n",
"### Cauchy\n",
"\n",
"If $X$ is Cauchy distributed, then it does not have an mgf because the integral\n",
"\n",
"$$\\begin{align}\n",
"\\frac{1}{\\pi}\\int^\\infty_{-\\infty} \\frac{e^{tx}}{1 + x^2} dx,\n",
"\\end{align}$$\n",
"\n",
"diverges for any $t \\neq 0.$\n",
"Many other variables do not have mgfs for the same reason, a difficulty that is circumvented by characteristic functions defined below.\n",
"\n",
"### Gamma\n",
"\n",
"If $X$ is gamma distributed with parameters $w > 0$ and $\\lambda > 0,$ then its mgf is\n",
"\n",
"$$\\begin{align}\n",
"M_X(t) = \\left(\\frac{\\lambda}{\\lambda - t}\\right)^w, \\text{ for } t < \\lambda.\n",
"\\end{align}$$"
]
},
{
"cell_type": "markdown",
"id": "619ad1da-ecf1-4690-8565-6c8200e07eea",
"metadata": {},
"source": [
"(prob-intro-markov-jensen)=\n",
"## Markov and Jensen inequalities\n",
"\n",
"The Markov inequality is a useful result that bounds the probability that a non-negative random variable is larger than some positive threshold.\n",
"\n",
":::{prf:theorem} Markov inequality\n",
"\n",
"For any non-negative random variable $X: \\Omega \\to \\mathbb{R}$,\n",
" \n",
"$$\\begin{align}\n",
"\\mathbb{P}(X \\geq t) \\leq \\frac{\\mathbb{E}(X)}{t} \\text{ for } t > 0.\n",
"\\end{align}$$\n",
":::\n",
"\n",
":::{dropdown} Proof: Markov inequality\n",
"\n",
"For any non-negative random variable $X(\\omega)$ and positive $t > 0$, we have\n",
" \n",
"$$\\begin{align}\n",
"X(\\omega) \\geq t \\mathbb{1}_{X \\geq t},\n",
"\\end{align}$$\n",
" \n",
"where $\\mathbb{1}_{X \\geq t} = 1$ if $X(\\omega) \\geq t$ and $\\mathbb{1}_{X\\geq t} = 0$\n",
"otherwise. Taking expectations of both sides and rearranging, we obtain\n",
"\n",
"$$\\begin{align}\n",
"\\mathbb{P}(X \\geq t) \\leq \\frac{\\mathbb{E}(X)}{t}.\n",
"\\end{align}$$\n",
":::\n",
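"\n",
"The bound can be seen in action with a short simulation (a sketch assuming $X \\sim \\text{Exp}(1)$, so that $\\mathbb{E}(X) = 1$ and $\\mathbb{P}(X \\geq t) = e^{-t}$; the threshold and seed are arbitrary):\n",
"\n",
"```python\n",
"import math\n",
"\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(4)\n",
"x = rng.exponential(1.0, size=100_000)  # non-negative, E(X) = 1\n",
"\n",
"t = 3.0\n",
"p_emp = np.mean(x >= t)  # estimate of P(X >= t); the true value is exp(-3)\n",
"bound = x.mean() / t     # Markov bound E(X) / t, roughly 1 / 3\n",
"```\n",
"\n",
"The empirical probability (about $0.05$) sits well below the Markov bound (about $0.33$), illustrating that the bound holds but can be loose.\n",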
"\n",
"One consequence of the Markov inequality is the Chebyshev inequality\n",
"\n",
"$$\\begin{align}\n",
"\\mathbb{P}(|X - \\mathbb{E}(X)| \\geq \\alpha) \\leq \\frac{\\sigma^2}{\\alpha^2},\n",
"\\end{align}$$\n",
"\n",
"where $\\sigma^2$ is the variance of $X$; it follows by applying the Markov inequality to the non-negative variable $(X - \\mathbb{E}(X))^2$ with threshold $\\alpha^2$. The Markov inequality is useful in proofs involving bounds on the probability that a variable falls within a certain range.\n",
"Another useful result is Jensen's inequality, which is key when working with convex or concave functions.\n",
"\n",
":::{prf:definition} Convex function\n",
"\n",
"A function $g : (a, b) \\to \\mathbb{R}$ is convex if\n",
" \n",
"$$\\begin{align}\n",
"g\\left(tu + (1 - t)v\\right) \\leq t g(u) + (1 - t) g(v),\n",
"\\end{align}$$\n",
"\n",
"for every $t \\in [0, 1]$ and $u, v \\in (a, b)$.\n",
":::\n",
"\n",
"The definition of a concave function is as above, except the inequality sign is flipped.\n",
"Jensen's inequality then takes the following form.\n",
"\n",
":::{prf:theorem} Jensen's inequality\n",
"\n",
"Let $X$ be a random variable taking values in the, possibly infinite, domain $(a, b)$ such that $\\mathbb{E}(X)$ exists and $g : (a, b) \\to \\mathbb{R}$ be a convex function such that $\\mathbb{E}|g(X)| < \\infty$. Then\n",
" \n",
"$$\\begin{align}\n",
"\\mathbb{E}[g(X)] \\geq g[\\mathbb{E}(X)].\n",
"\\end{align}$$\n",
":::\n",
"\n",
"It can be proved quickly by applying the supporting tangent theorem (see below) and taking an expectation over $X$.\n",
"\n",
":::{dropdown} Proof: Jensen's inequality\n",
"\n",
"From the supporting tangent theorem we have\n",
" \n",
"$$\\begin{align}\n",
"g(X) \\geq g(w) + \\alpha (X - w),\n",
"\\end{align}$$\n",
" \n",
"and by setting the constant $w = \\mathbb{E}(X)$ and taking an expectation over $X$, the $\\alpha(X - w)$ term vanishes, since $\\mathbb{E}(X - w) = 0$, and we obtain Jensen's inequality\n",
" \n",
"$$\\begin{align}\n",
"\\mathbb{E}[g(X)] \\geq g(\\mathbb{E}(X)).\n",
"\\end{align}$$\n",
":::\n",
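"\n",
"A short simulation illustrates Jensen's inequality (a sketch with the convex function $g(x) = e^x$ and $X \\sim N(0, 1)$, for which $\\mathbb{E}[e^X] = e^{1/2}$; the seed and sample size are arbitrary):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(5)\n",
"x = rng.normal(0.0, 1.0, size=200_000)\n",
"\n",
"# g(x) = exp(x) is convex, so E[g(X)] >= g(E[X])\n",
"lhs = np.mean(np.exp(x))  # E[exp(X)], close to exp(1/2) for X ~ N(0, 1)\n",
"rhs = np.exp(x.mean())    # exp(E[X]), close to exp(0) = 1\n",
"```\n",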
"\n",
"The supporting tangent theorem says that for any point $w$ in the domain of a convex function $g$, we can always find a line passing through $(w, g(w))$, which lower-bounds the function.\n",
"\n",
"\n",
":::{prf:theorem} Supporting tangent theorem\n",
"\n",
"Let $g : (a, b) \\to \\mathbb{R}$ be convex, and let $w \\in (a, b).$\n",
"There exists $\\alpha \\in \\mathbb{R}$ such that\n",
" \n",
"$$\\begin{align}\n",
"g(x) \\geq g(w) + \\alpha (x - w), \\text{ for } x \\in (a, b).\n",
"\\end{align}$$\n",
":::\n",
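"\n",
"The theorem is easy to check numerically (a sketch with $g(x) = x^2$, where the tangent slope $\\alpha = g'(w) = 2w$ works as a supporting tangent; the point $w$ and grid are arbitrary):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# supporting tangent of the convex function g(x) = x^2 at the point w\n",
"w = 1.5\n",
"alpha = 2 * w  # tangent slope g'(w)\n",
"xs = np.linspace(-5.0, 5.0, 1001)\n",
"\n",
"# g(x) - [g(w) + alpha (x - w)] simplifies to (x - w)^2 >= 0\n",
"gap = xs**2 - (w**2 + alpha * (xs - w))\n",
"```\n",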
"\n",
"\n",
":::{dropdown} Proof: Supporting tangent theorem\n",
"\n",
"Since $g$ is convex, we have\n",
"\n",
"$$\\begin{align}\n",
"\\frac{g(w) - g(u)}{w - u} \\leq \\frac{g(v) - g(w)}{v - w},\n",
"\\end{align}$$\n",
"\n",
"otherwise $g$ would not be convex, because $g(w)$ would be strictly greater than the linear interpolation between $g(u)$ and $g(v)$ at $w.$\n",
"The above inequality holds for all $u < w < v,$ so we can take the supremum of the left-hand side over $u$ and the infimum of the right-hand side over $v$ to obtain $L_w \\leq R_w,$ where\n",
" \n",
"$$\\begin{align}\n",
"L_w = \\sup\\left\\{\\frac{g(w) - g(u)}{w - u} : u < w\\right\\}, \\quad R_w = \\inf\\left\\{\\frac{g(v) - g(w)}{v - w} : v > w\\right\\}.\n",
"\\end{align}$$\n",
"\n",
"We can then take any $\\alpha \\in [L_w, R_w]$ and see that\n",
" \n",
"$$\\begin{align}\n",
"\\frac{g(w) - g(u)}{w - u} \\leq \\alpha \\leq \\frac{g(v) - g(w)}{v- w}.\n",
"\\end{align}$$\n",
" \n",
"By rearranging the two sides of the above inequality we obtain\n",
" \n",
"$$\\begin{align}\n",
"g(x) \\geq g(w) + \\alpha (x - w),\n",
"\\end{align}$$\n",
" \n",
"for the cases where $x = u < w$ and $x = v > w$ respectively. The inequality holds trivially for $x = w$.\n",
"\n",
":::"
]
},
{
"cell_type": "markdown",
"id": "28527fc5-0928-4f7d-aa0c-8520c58a3ec6",
"metadata": {},
"source": [
"(prob-intro-char-funcs)=\n",
"## Characteristic functions\n",
"\n",
"Unlike the moment generating function, which might not exist for some random variables, the characteristic function of a random variable, defined below, exists for every random variable, since $|e^{itX}| = 1$ means the defining expectation is always finite.\n",
"\n",
":::{prf:definition} Characteristic function\n",
"\n",
"The characteristic function of a random variable $X$ is written $\\phi_X$ and defined as\n",
" \n",
"$$\\begin{align}\n",
"\\phi_X(t) = \\mathbb{E}(e^{itX}), \\text{ for } t \\in \\mathbb{R}.\n",
"\\end{align}$$\n",
":::\n",
"\n",
"The characteristic function has the following two useful properties.\n",
"\n",
":::{prf:theorem} Two properties of characteristic functions\n",
"\n",
"Let $X$ and $Y$ be independent random variables with characteristic functions $\\phi_X$ and $\\phi_Y.$\n",
"Then \n",
" \n",
"1. If $a, b \\in \\mathbb{R}$ and $Z = aX + b$, then $\\phi_Z(t) = e^{itb} \\phi_X(at)$.\n",
"2. The characteristic function of $X + Y$ is $\\phi_{X + Y}(t) = \\phi_X(t)\\phi_Y(t)$.\n",
":::\n",
"\n",
":::{dropdown} Proof: Properties of the characteristic function\n",
"\n",
"To show the first property, consider\n",
"\n",
"$$\\begin{align}\n",
"\\phi_Z(t) &= \\mathbb{E}\\left(e^{itZ}\\right)\\\\\n",
"&= \\mathbb{E}\\left(e^{it(aX + b)}\\right)\\\\\n",
"&= e^{itb} \\mathbb{E}\\left(e^{itaX}\\right)\\\\\n",
"&= e^{itb} \\phi_X(at).\n",
"\\end{align}$$\n",
" \n",
"For the second property, consider\n",
" \n",
"$$\\begin{align}\n",
"\\phi_{X + Y}(t) &= \\mathbb{E}\\left(e^{it(X + Y)}\\right)\\\\\n",
"&= \\mathbb{E}\\left(e^{itX} e^{itY}\\right)\\\\\n",
"&= \\mathbb{E}\\left(e^{itX}\\right)\\mathbb{E}\\left(e^{itY}\\right)\\\\\n",
"&= \\phi_X(t) \\phi_Y(t),\n",
"\\end{align}$$\n",
" \n",
"where we have used the fact that $X$ and $Y$ are independent to get from the second to the third line.\n",
":::\n",
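"\n",
"The second property can be checked by simulation (a sketch assuming two independent $\\text{Exp}(1)$ variables, whose characteristic function is $\\phi(t) = 1 / (1 - it)$; the evaluation point, seed and sample size are arbitrary):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(6)\n",
"t = 1.2\n",
"x = rng.exponential(1.0, size=300_000)\n",
"y = rng.exponential(1.0, size=300_000)\n",
"\n",
"phi_sum = np.mean(np.exp(1j * t * (x + y)))  # empirical char. function of X + Y\n",
"phi_prod = (1 / (1 - 1j * t)) ** 2           # closed-form phi_X(t) * phi_Y(t)\n",
"```\n",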
"\n",
"As with the mgf, the characteristic function of a random variable is unique, in the sense that two random variables have the same distribution if and only if they have the same characteristic function.\n",
"\n",
":::{prf:theorem} Uniqueness of characteristic functions\n",
"\n",
"Let $X$ and $Y$ have characteristic functions $\\phi_X$ and $\\phi_Y$. Then $X$ and $Y$ have the same distribution if and only if $\\phi_X(t) = \\phi_Y(t)$ for all $t \\in \\mathbb{R}.$\n",
":::\n",
"\n",
"We can obtain the pdf of a random variable by applying the following inverse transformation.\n",
"\n",
":::{prf:theorem} Inversion theorem\n",
"\n",
"Let $X$ have characteristic function $\\phi_X$ and density function $f.$\n",
"Then\n",
" \n",
"$$\\begin{align}\n",
"f(x) = \\frac{1}{2\\pi}\\int^\\infty_{-\\infty} e^{-itx} \\phi_X(t) dt,\n",
"\\end{align}$$\n",
" \n",
"at every point $x$ where $f$ is differentiable.\n",
":::\n",
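"\n",
"The inversion formula can be verified numerically (a sketch for the standard normal, whose characteristic function is $\\phi_X(t) = e^{-t^2/2}$; the grid and evaluation point are arbitrary choices, and truncating the integral to $[-10, 10]$ incurs only negligible error):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# characteristic function of the standard normal on a truncated grid\n",
"t = np.linspace(-10.0, 10.0, 20_001)\n",
"dt = t[1] - t[0]\n",
"phi = np.exp(-t**2 / 2)\n",
"\n",
"x = 0.5\n",
"# Riemann-sum approximation of (1 / 2 pi) int exp(-itx) phi(t) dt\n",
"f_x = np.sum(np.exp(-1j * t * x) * phi).real * dt / (2 * np.pi)\n",
"f_true = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # standard normal density at x\n",
"```\n",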
"\n",
"Note the similarity between this inversion formula and the inverse Fourier transform: up to convention, the characteristic function is the Fourier transform of the density."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "rw",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 5
}