{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "689f4043-0a55-4ac0-9cc0-8e7542668378",
   "metadata": {},
   "source": [
    "# Moment generating functions\n",
    "\n",
    "Moments are an important tool in the study of random variables.\n",
    "Moment generating functions are a useful tool related to the moments of random variables. Under certain conditions, there is a one-to-one mapping between random variables and moment generating functions.\n",
    "One example use of mgfs is the computation of a sum of independent random variables.\n",
    "Mgfs do not always exist, an issue that is circumvented by characteristic functions which exist for a much broader class of random variables.\n",
    "Two useful inequalities, the Markov and the Jensen inequalities are presented and proved."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "45d67c05-abee-4ed7-8f79-2edcede7ba8f",
   "metadata": {},
   "source": [
    "(prob-intro-moments)=\n",
    "## Moments\n",
    "\n",
    "The moments of a random variable contain useful information about it. In fact, under the following technical conditions, the moments uniquely determine the distribution of the random variable.\n",
    "\n",
    ":::{prf:theorem} Uniqueness theorem for moments\n",
    "\n",
    "Suppose that all moments $\\mathbb{E}(X^k), k = 1, 2, ...$ of the random variable $X$ exist, and that the series\n",
    " \n",
    "$$\\begin{align}\n",
    "\\sum^\\infty_{k = 0}\\frac{t^k}{k!} \\mathbb{E}(X^k),\n",
    "\\end{align}$$\n",
    " \n",
    "is absolutely convergent for some $t > 0$.\n",
    "Then the moments uniquely determine the distribution of $X$.\n",
    ":::\n",
    "\n",
    "## Covariance and correlation\n",
    "\n",
    "We are often interested in the extent to which two random variables co-vary, a property that is quantified by the their covariance, as defined below.\n",
    "\n",
    ":::{prf:definition} Covariance\n",
    "\n",
    "If $X$ and $Y$ are random variables, then their covariance is denoted $\\text{cov}(X, Y)$ and defined as\n",
    "\n",
    "$$\\begin{align}\n",
    "\\text{cov}(X, Y) = \\mathbb{E}\\left(X - \\mathbb{E}(X)\\right)\\mathbb{E}\\left(Y - \\mathbb{E}(Y)\\right),\n",
    "\\end{align}$$\n",
    " \n",
    "whenever these expectations exist.\n",
    ":::\n",
    "\n",
    "Clearly, if we multiply $X$ and $Y$ by $a$ and $b$ respectively, their covariance will change by a factor of $ab$.\n",
    "We may be interested in a scale-invariant  metric of the covariance between two random variables, captured by the correlation coefficient.\n",
    "\n",
    ":::{prf:definition} Correlation coefficient\n",
    "\n",
    "If $X$ and $Y$ are random variables, then their correlation coefficient is denoted $\\rho(X, Y)$ and defined as\n",
    " \n",
    "$$\\begin{align}\n",
    "\\rho(X, Y) = \\frac{\\text{cov}(X, Y)}{\\sqrt{\\text{var}(X)\\text{var}(Y)}},\n",
    "\\end{align}$$\n",
    " \n",
    "whenever the covariance and variances exist and $\\text{Var}(X)\\text{Var}(Y) \\neq 0$.\n",
    ":::\n",
    "\n",
    "The correlation coefficient of two random variables has absolute value less than or equal to $1$, as stated by the following result which is worth bearing in mind.\n",
    "\n",
    ":::{prf:theorem} Correlation between $-1$ and $1$\n",
    "\n",
    "If $X$ and $Y$ are random variables, then\n",
    " \n",
    "$$\\begin{align}\n",
    "-1 \\leq \\rho(X, Y) \\leq 1,\n",
    "\\end{align}$$\n",
    " \n",
    "whenever this correlation exists.\n",
    ":::\n",
    "\n",
    "The above result can be shown quickly from an application of the Cauchy-Schwartz inequality stated and proved below.\n",
    "\n",
    ":::{dropdown} Proof: Correlation between $-1~$ and $~1$\n",
    "\n",
    "Given random variables $X$ and $Y$, define $U = X - \\bar{X}$ and $V = Y- \\bar{Y}$. By applying the Cauchy-Schwartz inequality on $U$ and $V$, we obtain\n",
    "\n",
    "$$\\begin{align}\n",
    "\\frac{\\mathbb{E}(UV)^2}{\\mathbb{E}(U^2)\\mathbb{E}(V^2)} \\leq 1.\n",
    "\\end{align}$$\n",
    "\n",
    "Taking a square root and substituting for $U$ and $V$ we arrive at the result \n",
    "\n",
    "$$\\begin{align}\n",
    "-1 \\leq \\rho(X, Y) \\leq 1.\n",
    "\\end{align}$$\n",
    ":::"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86d94cd6-5c05-4181-924e-03ad3e1a3ce6",
   "metadata": {},
   "source": [
    ":::{prf:theorem} Cauchy-Schwartz inequality\n",
    "\n",
    "If $U$ and $V$ are random variables, then\n",
    "\n",
    "$$\\begin{align}\n",
    "\\mathbb{E}(UV)^2 \\leq \\mathbb{E}(U^2)\\mathbb{E}(V^2),\n",
    "\\end{align}$$\n",
    "\n",
    "whenever these expectations exist.\n",
    ":::\n",
    "\n",
    ":::{dropdown} Proof: Cauchy-Schwartz inequality\n",
    "\n",
    "Let $s \\in \\mathbb{R}$ be a real number and $W = sU + V$ be a random variable. Then $W^2 \\geq 0$ and we have \n",
    "\n",
    "$$\\begin{align}\n",
    "\\mathbb{E}(X^2) = a s^2 + b s + c \\geq 0,\n",
    "\\end{align}$$\n",
    "    \n",
    "where $a = \\mathbb{E}(U^2)$, $b = 2\\mathbb{E}(UV)$ and $\\mathbb{E}(V^2)$. Since $\\mathbb{E}(W^2) \\geq 0$ holds for all values of $s$, then the quadratic above can equal zero at most once - because otherwise it would achieve negative values. Therefore we have \n",
    "    \n",
    "$$\\begin{align}\n",
    "b^2 - 4ac = 4\\mathbb{E}(UV)^2 - 4\\mathbb{E}(U^2)\\mathbb{E}(V^2) \\leq 0,\n",
    "\\end{align}$$\n",
    "    \n",
    "from which we arrive at the result\n",
    "    \n",
    "$$\\begin{align}\n",
    "\\mathbb{E}(UV)^2 \\leq \\mathbb{E}(U^2)\\mathbb{E}(V^2).\n",
    "\\end{align}$$\n",
    "\n",
    ":::"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec4c564d-f798-4239-a5f3-0f252820edc5",
   "metadata": {},
   "source": [
    "## Moment generating functions\n",
    "\n",
    "Since the moments of a random variable uniquely determine its distribution.\n",
    "\n",
    ":::{prf:definition} Moment generating function\n",
    "\n",
    "The moment generating function of a random variable $X$, denoted $M_X$ is defined by\n",
    " \n",
    "$$\\begin{align}\n",
    "M_X(t) = \\mathbb{E}(e^{tX}),\n",
    "\\end{align}$$\n",
    " \n",
    "for all $t \\in \\mathbb{R}$ for which the expectation exists.\n",
    ":::\n",
    "\n",
    "\n",
    "  \n",
    "We have the following relation between moments of a random variable and derivatives of its mgf.\n",
    "\n",
    ":::{prf:theorem} Moments equal to derivatives of mgf\n",
    "\n",
    "If $M_X$ exists in a neighbourhood of $0$, then $k = 1, 2, \\dots,$ then\n",
    " \n",
    "$$\\begin{align}\n",
    "\\mathbb{E}(X^k) = M_X^{(k)}(0),\n",
    "\\end{align}$$\n",
    "\n",
    "the $k^{th}$ derivative of $M_X$ at $t = 0$.\n",
    "\n",
    ":::\n",
    "\n",
    "Further, we also have the following useful relation for the mgf of a sum of random variables.\n",
    "\n",
    ":::{prf:theorem} Independence $\\implies$ mgf of sum factorises\n",
    "\n",
    "If $X$ and $Y$ are independent random variables, then $X + Y$ has moment generating function\n",
    " \n",
    "$$M_{X + Y}(t) = M_X(t) M_Y(t).$$\n",
    ":::\n",
    "\n",
    "Intuitively, since the moments of a random variable uniquely determine its distribution, then also a generating function $M_X(t)$ uniquely determines the distribution of the corresponding random variable $X$.\n",
    "On an intuitive level this can be seen by noting that $M_X(t)$ can be rewritten as\n",
    "\n",
    "$$\\begin{align}\n",
    "\\mathbb{E}(e^{tX}) &= \\mathbb{E}\\left[ \\sum_{n = 1}^N \\frac{1}{n!} (tX)^n, \\right]\\\\\n",
    "                   &=  \\sum_{n = 1}^N \\frac{t^n}{n!} \\mathbb{E}\\left[X^n\\right],\n",
    "\\end{align}$$\n",
    "\n",
    "so the moments can be determined from the mgf, and the distribution of $X$ can then be determined from the moments. The following result formalises this intuition.\n",
    "\n",
    ":::{prf:theorem} Uniqueness of mgfs\n",
    "\n",
    "If the moment generating function $M_X(t) = \\mathbb{E}(e^{tX}) < \\infty$ for all $t \\in [-\\delta, \\delta]$ for some $\\delta > 0$, there is a unique distribution with mgf $M_X$.\n",
    "Under this condition, we have that $\\mathbb{E}(X^k) < \\infty$ for $k = 1, 2, ...$ and\n",
    " \n",
    "$$\\begin{align}\n",
    "M_X(t) = \\sum^\\infty_{k = 0} \\frac{t^k}{k!} \\mathbb{E}(X^k) \\text{ for } |t\n",
    "| < \\delta.\n",
    "\\end{align}$$\n",
    ":::"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b8675ee4-d7bc-493e-95b7-f3f3b45981dc",
   "metadata": {},
   "source": [
    "## Examples of MGFS\n",
    "\n",
    "Here are examples of moment generating functions of some common continuous random variables.\n",
    "\n",
    "### Uniform\n",
    "\n",
    "If $X$ is uniformly distributed in $[a, b],$ then its mgf is\n",
    "\n",
    "$$\\begin{align}\n",
    "M_X(t) = \\frac{e^{tb} - e^{ta}}{t}.\n",
    "\\end{align}$$\n",
    "\n",
    "### Exponential\n",
    "\n",
    "If $X$ is exponentially distributed with parameter $\\lambda$, then its mgf is\n",
    "\n",
    "$$\\begin{align}\n",
    "M_X(t) = \\frac{\\lambda}{\\lambda - t}.\n",
    "\\end{align}$$\n",
    "\n",
    "### Normal\n",
    "\n",
    "If $X$ is normally distributed with parameters $\\mu$, $\\sigma^2 > 0,$ then its mgf is\n",
    " \n",
    "$$\\begin{align}\n",
    "M_X(t) = \\exp\\left(\\mu t + \\frac{\\sigma^2t}{2}\\right).\n",
    "\\end{align}$$\n",
    "\n",
    "### Cauchy\n",
    "\n",
    "If $X$ is Cauchy distributed, then it does not have an mgf because the integral\n",
    "\n",
    "$$\\begin{align}\n",
    "\\int^\\infty_{-\\infty} \\frac{e^{tx}}{1 + x^2} dx,\n",
    "\\end{align}$$\n",
    "\n",
    "diverges for any $t \\neq 0.$\n",
    "Many other variables do not have mgfs for the same reason, a difficulty that is circumvented by characteristic functions defined below.\n",
    "\n",
    "### Gamma\n",
    "\n",
    "If $X$ is gamma distributed with parameters $w > 0$ and $\\lambda > 0,$ then its mgf is\n",
    "\n",
    "$$\\begin{align}\n",
    "M_X(t) = \\left(\\frac{\\lambda}{\\lambda - t}\\right)^w.\n",
    "\\end{align}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "619ad1da-ecf1-4690-8565-6c8200e07eea",
   "metadata": {},
   "source": [
    "(prob-intro-markov-jensen)=\n",
    "## Markov and Jensen inequalities\n",
    "\n",
    "The Markov inequality is a useful result that bounds the probability that a non-negative random variable is larger than some positive threshold.\n",
    "\n",
    ":::{prf:theorem} Markov inequality\n",
    "\n",
    "For any non-negative random variable $X: \\Omega \\to \\mathbb{R}$,\n",
    " \n",
    "$$\\begin{align}\n",
    "\\mathbb{P}(X \\geq t) \\leq \\frac{\\mathbb{E}(X)}{t} \\text{ for } t > 0.\n",
    "\\end{align}$$\n",
    ":::\n",
    "\n",
    ":::{dropdown} Proof: Markov inequality\n",
    "\n",
    "For any non-negative random variable $X(\\omega)$ and positive $t > 0$, we have\n",
    "    \n",
    "$$\\begin{align}\n",
    "X(\\omega) \\geq t \\mathbb{1}_{X \\geq t},\n",
    "\\end{align}$$\n",
    "    \n",
    "where $\\mathbb{1}_{X \\geq t} = 1$ if $X(\\omega) \\geq t$ and $\\mathbb{1}_{X\\geq t} = 0$\n",
    "otherwise. Rearranging and taking expectations, we obtain \n",
    "\n",
    "$$\\begin{align}\n",
    "\\mathbb{P}(X \\geq t) = \\frac{\\mathbb{E}(X)}{t}.\n",
    "\\end{align}$$\n",
    ":::\n",
    "\n",
    "One consequence of the Markov inequality is the Chebyshev inequality\n",
    "\n",
    "$$\\begin{align}\n",
    "\\mathbb{P}(|X - \\bar{X}| \\geq \\alpha) \\leq \\frac{\\sigma^2}{\\alpha^2}\n",
    "\\end{align}$$\n",
    "\n",
    "where $\\sigma^2$ is the variance of $X$. The Markov inequality is useful in proofs involving bounds of probabilities that a variable will fall within a certain range.\n",
    "Another useful result is Jensen's inequality, which is key when working with convex or concave functions.\n",
    "\n",
    ":::{prf:definition} Convex function\n",
    "\n",
    "A function $g : (a, b) \\to \\mathbb{R}$ is convex if\n",
    " \n",
    "$$\\begin{align}\n",
    "g\\left(tu + (1 - t)v\\right) \\leq t g(u) + (1 - t) g(v),\n",
    "\\end{align}$$\n",
    "\n",
    "for every $t \\in [0, 1]$ and $u, v \\in (a, b)$.\n",
    ":::\n",
    "\n",
    "The definition of a concave function is as above, except the inequality sign is flipped.\n",
    "Jensen's inequality then takes the following form.\n",
    "\n",
    ":::{prf:theorem} Jensen's inequality\n",
    "\n",
    "Let $X$ be a random variable taking values in the, possibly infinite, domain $(a, b)$ such that $\\mathbb{E}(X)$ exists and $g : (a, b) \\to \\mathbb{R}$ be a convex function such that $\\mathbb{E}|g(X)| < \\infty$. Then\n",
    "  \n",
    "$$\\begin{align}\n",
    "\\mathbb{E}[g(X)] \\geq g[\\mathbb{E}(X)].\n",
    "\\end{align}$$\n",
    ":::\n",
    "\n",
    "It can be proved quickly by applying the supporting tangent theorem (see below) and taking an expectation over $X$.\n",
    "\n",
    ":::{dropdown} Proof: Jensen's inequality\n",
    "\n",
    "From the supporting tangent theorem we have\n",
    "    \n",
    "$$\\begin{align}\n",
    "g(X) \\geq g(w) + \\alpha (X - w),\n",
    "\\end{align}$$\n",
    "    \n",
    "and by setting the constant $w = \\mathbb{E}(X)$ and taking an expectation over $X$, the $X - w$ term cancels and we obtain Jensen's inequality\n",
    "    \n",
    "$$\\begin{align}\n",
    "\\mathbb{E}[g(X)] \\geq g(\\mathbb{E}(X)).\n",
    "\\end{align}$$\n",
    ":::\n",
    "\n",
    "The supporting tangent theorem says that for any point $w$ in the domain of a convex function $g$, we can always find a line passing through $(w, g(w))$, which lower-bounds the function.\n",
    "\n",
    "\n",
    ":::{prf:theorem} Supporting tangent theorem\n",
    "\n",
    "Let $g : (a, b) \\to \\mathbb{R}$ be convex, and let $w \\in (u, v).$\n",
    "There exists $\\alpha \\in \\mathbb{R}$ such that\n",
    " \n",
    "$$\\begin{align}\n",
    "g(x) \\geq g(w) + \\alpha (x - w), \\text{ for } x \\in (a, b).\n",
    "\\end{align}$$\n",
    ":::\n",
    "\n",
    "\n",
    ":::{dropdown} Proof: Supporting tangent theorem\n",
    "\n",
    "Since $g$ is convex, we have\n",
    "\n",
    "$$\\begin{align}\n",
    "\\frac{g(w) - g(u)}{w - u} \\leq \\frac{g(v) - g(w)}{v - w},\n",
    "\\end{align}$$\n",
    "\n",
    "otherwise $g$ could not be convex, because $g(w)$ would be strictly less than the linear interpolation between $g(u)$ and $g(v)$ at $w.$\n",
    "The above inequality holds for all $u < w < v,$ we can maximise the left hand side over $u$ and the right hand side over $v$ and obtain $L_w \\leq R_w,$ where\n",
    "    \n",
    "$$\\begin{align}\n",
    "L_w = \\sup\\left\\{\\frac{g(w) - g(u)}{w - u} : u < w\\right\\}, R_w = \\inf\\left\\{\\frac{g(v) - g(w)}{v - w} : v < w\\right\\}.\n",
    "\\end{align}$$\n",
    "\n",
    "we can then take $\\alpha \\in [L_w, R_w]$ and see that\n",
    "    \n",
    "$$\\begin{align}\n",
    "\\frac{g(w) - g(u)}{w - u} \\leq \\alpha \\leq \\frac{g(v) - g(w)}{v- w}.\n",
    "\\end{align}$$\n",
    "    \n",
    "By rearranging the two sides of the above equation we obtain\n",
    "    \n",
    "$$\\begin{align}\n",
    "g(x) \\geq g(w) + \\alpha (x - w),\n",
    "\\end{align}$$\n",
    "    \n",
    "for the cases where $x = u < w$ and $x = v > w$ respectively. The inequality holds trivially for $x = w$.\n",
    "\n",
    ":::"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28527fc5-0928-4f7d-aa0c-8520c58a3ec6",
   "metadata": {},
   "source": [
    "(prob-intro-char-funcs)=\n",
    "## Characteristic functions\n",
    "\n",
    "Unlike the moment generating function which might not exist for some random variables, the characteristic function of a random variable, defined below, exists for a broader set of variables.\n",
    "\n",
    ":::{prf:definition} Characteristic function\n",
    "\n",
    "The characteristic function of a random variable $X$ is written $\\phi_X$ and defined as\n",
    " \n",
    "$$\\begin{align}\n",
    "\\phi_X(t) = \\mathbb{E}(e^{itX}), \\text{ for } t \\in \\mathbb{R}.\n",
    "\\end{align}$$\n",
    ":::\n",
    "\n",
    "The characteristic function has the following two useful properties.\n",
    "\n",
    ":::{prf:theorem} Two properties of characteristic functions\n",
    "\n",
    "Let $X$ and $Y$ be independent random variables with characteristic functions $\\phi_X$ and $\\phi_Y.$\n",
    "Then \n",
    " \n",
    "1. If $a, b \\in \\mathbb{R}$ and $Z = aX + b$, then $\\phi_Z(t) = e^{itb} \\phi_X(at)$.\n",
    "2. The characteristic function of $X + Y$ is $\\phi_{X + Y}(t) = \\phi_X(t)\\phi_Y(t)$.\n",
    ":::\n",
    "\n",
    ":::{dropdown} Proof: Properties of the characteristic function\n",
    "\n",
    "To show the first property, consider\n",
    "\n",
    "$$\\begin{align}\n",
    "\\phi_Z(t) &= \\mathbb{E}\\left(e^{itZ}\\right)\\\\\n",
    "&= \\mathbb{E}\\left(e^{it(aX + b)}\\right)\\\\\n",
    "&= e^{itb} \\mathbb{E}\\left(e^{itaX}\\right)\\\\\n",
    "&= e^{itb} \\phi_X(at).\n",
    "\\end{align}$$\n",
    "    \n",
    "For the second property, consider\n",
    "    \n",
    "$$\\begin{align}\n",
    "\\phi_{X + Y}(t) &= \\mathbb{E}\\left(e^{it(X + Y)}\\right)\\\\\n",
    "&= \\mathbb{E}\\left(e^{itX} e^{itY}\\right)\\\\\n",
    "&= \\mathbb{E}\\left(e^{itX}\\right)\\mathbb{E}\\left(e^{itY}\\right)\\\\\n",
    "&= \\phi_X(t) \\phi_Y(t),\n",
    "\\end{align}$$\n",
    "    \n",
    "where we have used the fact that $X$ and $Y$ are independent to get from the second to the third line.\n",
    ":::\n",
    "\n",
    "As with the mgf, the characteristic function of a random variable is unique, in the sense that two radoom variables have the same distributions if and only if they have the same characteristic functions.\n",
    "\n",
    ":::{prf:theorem} Uniqueness of characteristic functions\n",
    "\n",
    "Let $X$ and $Y$ have characteristic functions $\\phi_X$ and $\\phi_Y$. Then $X$ and $Y$ have the same distributions if and only if $\\phi_X(t) = \\phi_Y(t)$ for all $\\mathbb{R}.$\n",
    ":::\n",
    "\n",
    "We can obtain the pdf of a random variable by applying the following inverse transformation.\n",
    "\n",
    ":::{prf:theorem} Inversion theorem\n",
    "\n",
    "Let $X$ have characteristic function $\\phi_X$ and density function $f.$\n",
    "Then\n",
    " \n",
    "$$\\begin{align}\n",
    "f(x) = \\frac{1}{2\\pi}\\int^\\infty_{-\\infty} e^{-itx} \\phi(t) dt,\n",
    "\\end{align}$$\n",
    " \n",
    "at every point $x$ where $f$ is differentiable.\n",
    ":::\n",
    "\n",
    "Note the similarity between the Fourier transform and the transform of the characteristic function."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "rw",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}