# Markov chains¶

## Markov chain and property¶

**Definition (Markov chain and markov property)** The sequence \(\mathbf{X} = X_0, X_1, ...\) is called a Markov chain if it satisfies the Markov property

for all \(n \geq 0\) and all \(x_0, x_1, ..., x_{n + 1} \in S\). The Markov chain is called homogeneuous if for all \(u, v \in S\), the conditional probability \(\mathbb{P}(X_{n + 1} = x_{n + 1} | X_n = x_n)\) does not depend on the value of \(n\). In this case, we the *transition matrix* \(P\) and *initial distribution* \(\lambda\) of the chain are defined as

Because they are probability distributions, \(P\) and \(\lambda\) satisfy

Any matrix \(P\) which satisfies the above property is called a **stochastic matrix**. The book[GS01] and these notes deal with homogeneous Markov chains only, although some of the definitions and theorems also apply to inhomogeneous Markov chains.

**Theorem (\(\mathbf{X}\) is a Markov chain \(\iff\) distribution factorises)** Let \(\lambda\) be a distribution and \(P\) be a stochastic matrix. The random sequence \(\mathbf{X} = (X_n : n \geq 0)\) is a Markov chain with initial distribution \(\lambda\) and transition matrix \(P\) if and only if

for all \(n \geq 0\) and \(x_0, x_1, ..., x_n \in S\).

## Proof: \(\mathbf{X}\) is a Markov chain \(\iff\) distribution factorises

Suppose \(\mathbb{X} = (X_n : 0 \neq n)\) is a Markov chain with initial distribution \(\lambda = (\lambda_i : i \neq S)\) and transition matrix \(P = (p_{i, j} : i, j \neq S)\). From the definition of conditional probability and using the Markov property

and proceeding recursively we obtain the required result, noting that \(\mathbb{P}(X_0 = x_0) = \lambda_{x_0}\). Going the other way, suppose

for all \(n \geq 0\) and \(x_0, x_1, ..., x_n \in S\). Then \(\mathbf{X}\) satisfies the Markov property because

**Theorem (Extended Markov property)** Let \(\mathbf{X} = (X_n : n \geq 0)\) is a Markov chain. For \(n \geq 0\), for any event \(H\) given in terms of \(X_0, X_1, ..., X_{n - 1}\) and any event given in terms of \(X_{n + 1}, X_{n + 2}, ...\),

## Partial Proof: Extended Markov property

This is a proof of a restricted version the extended Markov property, in which \(F\) depends on a finite number of values of the Markov chain, although the infinite case also holds.

Let \(H\) be an event \(X_0, X_1, ..., X_{n - 1}\) only, in that it is a function of these random variables only. Similarly, let \(F\) be a function of \(X_{n + 1}, X_{n + 2}, ... X_{n + k}\) for some \(k \geq 0\), only. By the Markov property we have

Then summing over all values of \(x_0, ..., x_{n - 1}\) corresponding to \(H\) and over all values of \(x_{n + 1}, ..., x_{n + k}\) corresponding to \(F\), and dividing both sides by \(\mathbb{P}(H | X_n = x_n)\), we obtain

arriving at the result.

**Theorem (Chapman-Kolmogorov equations)** Let \(\mathbf{X} = (X_n : n \geq 0)\) be a Markov chain with initial distribution \(\lambda\) and transition matrix \(P\). Then the *n-step transition probabilities* \(p_{x_i, x_j}(n) = \mathbb{P}(X_n = x_j | X_0 = x_i)\) satisfy

for \(x_i, x_j \in S\) and \(n, m \geq 0\).

## Proof: Chapman-Holmogorov equations

From the definition of \(p_{x_i, x_j}(n + m)\) we see

where we have used the Markov property to go from the second to the third line.

**Definition (Communicating states)** Let \(\mathbf{X} = X_0, X_1, ...\) be a Markov chain with state space \(S\) and transition matrix \(P\). For \(i, j \in S\), we say that \(i\) leads to \(j\), written \(i \to j\), if \(p_{i, j}(n) > 0\) for some \(n \geq 0\). If \(i \to j\) and \(j \to i\), we write \(i \leftrightarrow j\), and say that \(i\) and \(j\) communicate.

**Lemma (Communication relation)** The relation \(\leftrightarrow\) is an equivalence relation.

## Proof: Communication relation

A relation is an equivalence relation if it is reflexive, symmetric and transitive. First, for any \(i \in S\) we have \(i \to i\), so \(i \leftrightarrow i\), so communication is reflexive. Second, if \(i \leftrightarrow j\) we have \(i \to j\) and \(j \to i\), from which it follows \(j \to i\), so communication is symmetric. Finally, suppose \(i \leftrightarrow j\) and \(j \leftrightarrow k\). Then, there exist \(n, m \geq 0\) such that \(p_{i, j}(n) > 0\) and \(p_{j, k}(m) > 0\). By the Chapman-Kolmogorov equations

from which it follows that \(i \to k\). Similarly we have \(k \to i\), thus \(i \leftrightarrow k\) and \(\leftrightarrow\) is transitive.

**Definition (Communicating classes, irreducible chains, closed sets)** The equivalence classes of \(\leftrightarrow\) are called communicating classes. A Markov chain \(\mathbf{X}\) is called irreducible, if there is a single communicating class. A subset \(C \subseteq S\) is called closed if \(i \in C, i \to j \implies j \in C\). If the singleton set \(\{i\}\) is closed, \(i\) is called an absorbing state.

**Definition (First-passage times and probabilites)** The first-passage time to state \(j\) is defined as

and the first-passage probabilites are defined as

**Definition (First-passage times and probabilites)** A state \(i\) is called recurrent if \(\mathbb{P}(T_j < \infty | X_0 = i) = 1\), and it is called transient if it is not recurrent.

**Theorem (Recurrence \(\iff\) sum of return probabilities diverges)** The state \(i\) is recurrent if and only if

To prove this result we use the following theorem, which related the generating functions of the return probabilities and the first-passage times.

## Proof: Recurrence \(\iff\) sum of return probabilities diverges

Starting from the relation

we set \(j = i\) and rearrange to obtain

Then letting \(s \to 1\), we obtain

and considering \(\lim_{s \to 1} F_{i, i}(s) = f_{i, i}\), we see that the sum on the left diverges if and only if \(f_{i, i} = 1\), i.e. if the state is recurrent.

**Theorem (Gen. func. of return and first-passage probabilities)** Let \(P_{i, j}(s)\) and \(F_{i, j}(s)\) be the generating functions

Then for any \(i, j \in S\), we have

## Proof: Gen. func. of return and first-passage probabilities

First, we can write \(p_{i, j}(n)\) as

Writing \(H = \{X_n \neq j \text{ for } 1 \leq n < m\}\) and using the extended Markov property

Using the fact that the chain is homogeneous

and multiplying both sides by \(s^n\) and summing over \(n\), we obtain

where we have used the fact \(f_{i, j}(0) = 0\). Lastly, using the fact that \(p_{i, j}(0) = \delta_{i, j}\), we arrive at the result

**Theorem (Communicating class and recurrence/transience)** Let \(C\) be a communicating class. Then

Either every state in \(C\) is recurrent or every state is transient.

Suppose \(C\) contains some recurrent state. Then \(C\) is closed.

## Proof: Communicating class and recurrence/transience

**Part 1:** Suppose \(i \in C\) is recurrent. Then from the Chapman-Kolmogorov equations, for any \(j \in C\) we have

so \(j\) is also recurrent.

**Part 2:** Suppose \(i \in C\) is recurrent and that \(C\) is not closed. Then there exist \(j \in C\) and \(k \not \in C\) such that \(j \to k\) and \(k \not \to j\). From the previous part of this theorem, \(j\) is also recurrent. Using these facts we arrive at the contradiction

where the first inequality follows because \(k \not \to j\), so if the chain transitions from \(j\) to \(k\) in the first step, it cannot return to \(j\) in any number of steps. Therefore the assumption that \(C\) is not closed, is contradicted.

**Theorem (Communication and recurrence of finite \(S\))** Suppose that the state space \(S\) is finite

There exists at least one recurrent state.

If the chain is irreducible, all states are recurrent.

## Proof: Communication and recurrence of finite \(S\)

**Part 1:** Suppose \(S\) is finite and that all states are transient. Since all states are transient

for any \(i, j \in S\). Since \(P_{j, i}(1) = \sum^\infty_{n = 1} p_{j, i}(n) < \infty\), we must have \(p_{j, i}(n) \to 0\) as \(n \to \infty\). Since \(p_{i, j}(n)\) is a distribution however

reaching a contradiction.

**Part 2:** Suppose that the chain is irreducible, so that all states belong to the same communicating class. From the first part of this theorem, there exists at least one recurrent state and since all states belonging to the same communicating class are either all recurrent or all transient, it follows that all states are recurrent.

**Theorem (Polya’s theorem)** The symmetric random walk on \(\mathbb{Z}^d\) is recurrent if \(d = 1, 2, ...\) and transient if \(d \geq 3\).

**Definition (Hitting time, hitting and absorption probabilities)** The hitting time of a subset \(A \subseteq S\) is the earliest time \(n\) at which \(X_n \in A\).

The hitting probability is defined in terms of the hitting time as

If \(A\) is closed, \(h^A_i\) is called an absorption probability.

**Theorem (Hitting probabilities)** The vector of hitting probabilities \(h^A = (h^A_i : i \in S)\) is the minimal non-negative solution to

**Theorem (Expected hitting times)** The vector of expected hitting times \(k^A = (k_i^A = \mathbb{E}\left[H_i^A\right] : i \in S)\) is the minimal non-negative solution to

## Strong Markov property¶

**Definition (Stopping time)** The random variable \(T : \Omega \to \{0, 1, 2, ...\} \cup \{\infty\}\) is called a stopping time for the Markov chain \(\mathbf{X}\), if the event \(\{T = n\}\) is given in terms of \(X_0, X_1, ..., X_n\) only, for all \(n \geq 0\).

**Theorem (Strong Markov property)** Let \(\mathbf{X} = X_0, X_1, ...\) be a Markov chain with transition matrix \(P\), and let \(T\) be a stopping time. Given \(T < \infty\) and \(X_T = i\), the sequence \(\mathbf{Y} = Y_0, Y_1, ...\), given by \(Y_k = X_{T + k}\), is a Markov chain with transition matrix \(P\) and initial state \(Y_0 = i\). Further, given that \(T < \infty\) and \(X_T = i\), \(\mathbf{Y}\) is independent of \(X_0, X_1, ..., X_{T - 1}\).

## Proof: Strong Markov property

We want to show that

which follows from the Markov property, except we also need to take care conditioning on the event \(\{T < \infty\}\) rather than \(\{T < \infty\} \cup \{T = \infty\}\). Let \(0 \leq m < \infty\) and consider

which follows from the Markov property together with the facts that \(T\) is a stopping time and the chain is homogeneous. Now summing over \(m = 0, 1, 2, ...\) and dividing by \(\mathbb{P}(T < \infty | X_T = i)\) on both sides we obtain

## Classification of states¶

**Theorem (PDF of number of visits)** Let \(X_0 = i\) and let \(V_i = |\{n \geq 1 : X_n = i\}|\) be the number of visits of the chain to state \(i\). Then \(V_i\) has the geometric distribution

where \(f = f_{i, i} = \mathbb{P}(X_n = i \text{ for some } n \geq 1)\), is the return probability. From this it follows that

\(\mathbb{P}(V_i = \infty | X_0 = i) = 1\) if \(i\) is recurrent,

\(\mathbb{P}(V_i < \infty | X_0 = i) = 1\) if \(i\) is transient.

## Proof: PDF of number of visits

Let \(f_{i, i} = \mathbb{P}(T_i < \infty | X_0 = i)\) and write \(T_i^r\) for the time at which the chain visits state \(i\) for the \(r^{th}\) time. Then

where we have used the strong Markov property and the fact that the chain is homogeneous. In particular \(T_i^k\) is a stopping time so

Proceeding recursively, we obtain

For the limiting behaviour, we consider that \(f_{i, i} = 1\) if \(i\) is recurrent and \(f_{i, i} = 0\) if \(i\) is transient and let \(r \to \infty\), arriving at the result.

**Definition (Mean recurrence time)** The mean recurrence time \(\mu_i\) of a state \(i\) is defined by

**Definition (Null and positive states)** If \(i\) is recurrent, we call it null if \(\mu_i = \infty\), and positive if \(\mu_i < \infty\).

**Definition (Period of a state)** The period \(d_i\) of the state \(i\) is

The state \(i\) is called aperiodic if \(d_i = 1\), and periodic if \(d_i > 1\).

**Definition (Ergodic states)** State \(i\) is called ergodic if it is aperiodic and positive recurrent.

**Theorem (Implications of communication between states)** If \(i \leftrightarrow j\), then

\(i\) and \(j\) have the same period,

\(i\) is recurrent if and only if \(j\) is recurrent,

\(i\) is positive recurrent if and only if \(j\) is positive recurrent,

\(i\) is ergodic if and only if \(j\) is ergodic.

## Invariant distributions¶

**Definition (Invariant distribution)** Let \(\mathbf{X} = X_0, X_1, ...\) be a Markov chain with transition matrix \(P\). The vector \(\pi = (\pi_i : i \in S)\) is called an invariant distribution of the chain if:

It is a distribution: \(\pi_i \geq 0\) for all \(i \in S\), and \(\sum_{i \in S} \pi_i = 1\),

It is invariant under the transition matrix: \(\pi_j = \sum_{i \in S} \pi_i P_{i, j}\).

**Theorem (Implications of communication between states)** Consider an irreducible Markov chain.

There exists an invariant distribution \(\pi\) if and only if some state is positive recurrent.

If there exists an invariant distribution \(\pi\), then every state is positive recurrent and \begin{align}\pi_i = \frac{1}{\mu_i} \text{ for } i \in S,\end{align} where \(\mu_i\) is the mean recurrence time of state \(i\). In particular, \(\pi\) is the unique invariant distribution.

## Convergence to equilibrium¶

**Theorem (Convergence theorem for Markov chains)** Consider a Markov chain that is aperiodic, irreducible and positive recurrent. For \(i, j \in S\)

where \(\pi\) is the unique invariant distribution of the chain.

**Theorem (Irreducibility, recurrence and nullness)** Let \(\mathbf{X}\) be an irreducible, recurrent Markov chain. The following are equivalent

There exists a state \(i\) such that \(p_{i, i}(n) \to 0\) as \(n \to \infty\).

Every state is null recurrent.

**Theorem (Convergence of mean visitation)** Let \(i \in S\). If the chain is irreducible and positive recurrent,

irrespective of the initial distribution of the chain.

## Time reversal¶

**Definition (Reverse chain)** Let \(\mathbf{X}\) be an irreducible, positive recurrent Markov chain, with transition matrix \(P\) and initial distribution \(\lambda = \pi\) equal to its invariant distribution \(\pi\). The reversed chain \(\mathbf{Y} = (Y_n : 0 \leq n \leq N)\) is given by \(Y_n = X_{N - n}\) for \(0 \leq n \leq N\).

**Theorem (Reverse chain)** Given a markov Chain \(\mathbf{X}\), its reversed chain \(\mathbf{Y}\) is an irreducible Markov chain with transition matrix \(\hat{P} = (\hat{p} : i, j \in S)\) given by

with an invariant distribution \(\pi\).

**Definition (Reversible chain)** Let \(\mathbf{X} = (X_n : 0 \leq n \leq N)\) be an irreducible Markov chain such that \(X_0\) has the invariant distribution \(\pi\). The chain is called reversible if \(\mathbb{X}\) and its time reversal \(\mathbb{Y}\) have the same distribution matrices, which is to say that

**Theorem (Reverse chain)** Let \(P\) be the transition matrix of an irreducible chain \(\mathbf{X}\), and suppose that \(\pi\) is a distribution statisfying

Then \(\pi\) is the unique invariant distribution of the chain. Furthermore, \(\mathbf{X}\) is reversible in equilibrium.

## Random walk on a graph¶

**Theorem (Random walk on a finite connected graph)** The random walk on the finite connected graph \(G = (V, E)\) is an irreducible Markov chain with unique invariant distribution

The chain is reversible in equilibrium.