Vladimir Dobrushkin
http://math.uri.edu/~dobrush/

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the appendix entitled GNU Free Documentation License.

Markov Chains

Andrey Andreyevich Markov.

  In a probabilistic description of an experiment or process, the outcome or result is uncertain and may take any value from a known set of states. For example, if a coin is tossed successively, the outcome of the n-th toss could be a head or a tail; or if an ordinary die is rolled, the outcome may be 1, 2, 3, 4, 5, or 6. Associated with each outcome is a number p (\( 0 \le p \le 1 \)), called the probability of the outcome, which is a measure of the likelihood of occurrence of the outcome. If we denote the outcome of a roll of a die by X, then we write \( \Pr [X=i] = p_i \) to mean that the probability that X takes the value i is equal to \( p_i , \) where \( p_1 + p_2 + \cdots + p_6 = 1 . \) The symbol X is called a random variable and, in general, X may take any real value.
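For a fair die each \( p_i = 1/6 . \) A quick numerical check in Mathematica (a minimal sketch, assuming a fair six-sided die):

rolls = RandomVariate[DiscreteUniformDistribution[{1, 6}], 10^4];  (* 10,000 simulated rolls *)
N[Counts[rolls]/Length[rolls]]   (* empirical frequencies of the outcomes 1..6 *)

As the number of rolls grows, the empirical frequencies approach the theoretical probabilities \( p_i = 1/6 . \)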

 Often we are given the information that a particular event B has occurred, in which case the probability of occurrence of another event A will be affected. This gives rise to the concept of the conditional probability of the event A given that B has occurred, which is denoted by \( \Pr [A\,|\,B] \) and defined as

\[ \Pr [A\,|\,B] = \Pr [A \ \mbox{ and } B] / \Pr [B] , \]
provided that \( \Pr [B] \ne 0. \)
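For instance, for a fair die the conditional probability of rolling a 6 given that the outcome exceeds 3 is \( (1/6)/(1/2) = 1/3 , \) which Mathematica confirms directly:

Probability[x == 6 \[Conditioned] x > 3,
 x \[Distributed] DiscreteUniformDistribution[{1, 6}]]   (* returns 1/3 *)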

A discrete-time Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event (a property sometimes characterized as "memorylessness"). Roughly speaking, a process satisfies the Markov property if one can make predictions for the future of the process based solely on its present state just as well as one could knowing the process's full history, hence independently of that history.
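As a small illustration, consider a hypothetical two-state chain with the transition matrix P below; because the future depends only on the present state, n-step probabilities come from powers of P:

P = {{0.7, 0.3}, {0.4, 0.6}};                   (* hypothetical transition probabilities *)
proc = DiscreteMarkovProcess[1, P];             (* chain started in state 1 *)
Probability[x[3] == 2, x \[Distributed] proc]   (* probability of being in state 2 after 3 steps *)
MatrixPower[P, 3][[1, 2]]                       (* the same number from the 3-step transition matrix *)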

Yakov Tamarkin.

  A Markov process is named after the Russian mathematician Andrey Andreyevich Markov (1856--1922), a student of Pafnuty Chebyshev (Saint Petersburg, Russia). Markov studied such processes in the early 20th century, publishing his first paper on the topic in 1906. Markov and his younger brother Vladimir Andreevich Markov (1871--1897) proved the Markov brothers' inequality. His son, another Andrei Andreevich Markov (1903--1979), was also a notable mathematician, making contributions to constructive mathematics and recursive function theory. One of Markov's famous Ph.D. students was Jacob (Yakov) Davidovich Tamarkin (1888--1945), who, upon immigration to the USA, worked from 1927 at Brown University.

 Markov chains have many applications as statistical models of real-world processes, such as studying cruise control systems in motor vehicles, queues of customers arriving at an airport, exchange rates of currencies, storage systems such as dams, and the population growth of certain animal species. The algorithm known as PageRank, which was originally proposed for the internet search engine Google, is based on a Markov process. Furthermore, Markov processes are the basis for general stochastic simulation methods known as Gibbs sampling and Markov chain Monte Carlo, which are used for simulating random objects with specific probability distributions and have found extensive application in Bayesian statistics.
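To hint at how PageRank rests on a Markov chain, here is a toy power-iteration sketch; the 3-page link matrix below is hypothetical and not Google's actual data:

A = {{0, 1/2, 1}, {1/2, 0, 0}, {1/2, 1/2, 0}};   (* column-stochastic link matrix of a toy 3-page web *)
d = 0.85;                                        (* standard damping factor *)
G = d*A + (1 - d)/3*ConstantArray[1, {3, 3}];    (* the "Google matrix" *)
Nest[G . # &, {1, 1, 1}/3, 50]                   (* iterating the chain converges to the PageRank vector *)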

Example: Suppose that m molecules are distributed between urns (I) and (II), and at each step or trial, a molecule is chosen at random and transferred from its urn to the other urn. Let \( X_n \) denote the number of molecules in urn (I) just after the n-th step. Then \( X_0 , X_1 , \ldots \) is a Markov chain with state space \( \left\{ 0, 1, \ldots , m \right\} \) and transition probabilities
\[ \begin{split} P_{i,i-1} &= \frac{i}{m} , \quad\mbox{if } 1 \le i \le m , \\ P_{i,i+1} &= \frac{m-i}{m} , \quad\mbox{if } 0 \le i \le m-1 , \\ P_{i,j} &=0 , \quad\mbox{if } |i-j| >1. \end{split} \]
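A minimal sketch in Mathematica, assuming m = 4 molecules, builds this transition matrix and checks that each row sums to 1:

m = 4;   (* number of molecules, chosen only for illustration *)
P = Table[Which[j == i - 1, i/m, j == i + 1, (m - i)/m, True, 0], {i, 0, m}, {j, 0, m}];
P // MatrixForm
Total /@ P   (* every row sums to 1, as required of transition probabilities *)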

 

Example: In \( \mathbb{R}^n , \) the vectors \( e_1 =[1,0,0,\ldots , 0] , \quad e_2 =[0,1,0,\ldots , 0], \quad \ldots , \quad e_n =[0,0,\ldots , 0,1] \) form a basis for n-dimensional real space, called the standard basis. Its dimension is n. ■
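In Mathematica, the standard basis vectors of, say, \( \mathbb{R}^4 \) are simply the rows of the identity matrix:

IdentityMatrix[4]               (* its rows are e1, e2, e3, e4 *)
MatrixRank[IdentityMatrix[4]]   (* rank 4: the rows are linearly independent *)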

 

Example: Let us consider the set of all real \( m \times n \) matrices, and let \( {\bf M}_{i,j} \) denote the matrix whose only nonzero entry is a 1 in the i-th row and j-th column. Then the set \( \left\{ {\bf M}_{i,j} \ : \ 1 \le i \le m , \ 1 \le j \le n \right\} \) is a basis for the set of all such real matrices. Its dimension is mn. ■
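A quick illustration in Mathematica, assuming m = 2 and n = 3 (the helper matrixUnit is hypothetical, defined here only for this sketch):

matrixUnit[i_, j_, m_, n_] := Normal @ SparseArray[{i, j} -> 1, {m, n}]   (* the matrix M_{i,j} *)
units = Flatten[Table[matrixUnit[i, j, 2, 3], {i, 2}, {j, 3}], 1];
Length[units]                  (* mn = 6 basis matrices *)
MatrixRank[Flatten /@ units]   (* 6: flattened to vectors, they are linearly independent *)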

 

Example: The set of monomials \( \left\{ 1, x, x^2 , \ldots , x^n \right\} \) forms a basis for the set of all polynomials of degree at most n. It has dimension n+1. ■
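In the monomial basis, the coordinates of a polynomial are just its coefficients, which Mathematica extracts with CoefficientList:

CoefficientList[5 - 2 x + 7 x^3, x]   (* {5, -2, 0, 7}: coordinates in the basis {1, x, x^2, x^3} *)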

 

Example: The infinite set of monomials \( \left\{ 1, x, x^2 , \ldots , x^n , \ldots \right\} \) forms a basis for the set of all polynomials. ■

 

Theorem: Let V be a vector space and \( \beta = \left\{ {\bf u}_1 , {\bf u}_2 , \ldots , {\bf u}_n \right\} \) be a subset of V. Then β is a basis for V if and only if each vector v in V can be uniquely decomposed into a linear combination of vectors in β, that is, can be uniquely expressed in the form

\[ {\bf v} = \alpha_1 {\bf u}_1 + \alpha_2 {\bf u}_2 + \cdots + \alpha_n {\bf u}_n \]
for unique scalars \( \alpha_1 , \alpha_2 , \ldots , \alpha_n . \)

  If the vectors \( \left\{ {\bf u}_1 , {\bf u}_2 , \ldots , {\bf u}_n \right\} \) form a basis for a vector space V, then every vector in V can be uniquely expressed in the form

\[ {\bf v} = \alpha_1 {\bf u}_1 + \alpha_2 {\bf u}_2 + \cdots + \alpha_n {\bf u}_n \]
for appropriately chosen scalars \( \alpha_1 , \alpha_2 , \ldots , \alpha_n . \) Therefore, v determines a unique n-tuple of scalars \( \left[ \alpha_1 , \alpha_2 , \ldots , \alpha_n \right] \) and, conversely, each n-tuple of scalars determines a unique vector \( {\bf v} \in V \) by using the entries of the n-tuple as the coefficients of a linear combination of \( {\bf u}_1 , {\bf u}_2 , \ldots , {\bf u}_n . \) This fact suggests that V is like the n-dimensional vector space \( \mathbb{R}^n , \) where n is the number of vectors in the basis for V.
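A minimal sketch, using a hypothetical basis of \( \mathbb{R}^3 \): finding the unique coordinates amounts to solving a linear system whose coefficient matrix has the basis vectors as columns, which Mathematica does with LinearSolve:

u1 = {1, 0, 1}; u2 = {0, 1, 1}; u3 = {1, 1, 0};    (* a basis of R^3 *)
v = {2, 3, 5};
alpha = LinearSolve[Transpose[{u1, u2, u3}], v]    (* {2, 3, 0}: the unique coordinates of v *)
alpha[[1]] u1 + alpha[[2]] u2 + alpha[[3]] u3 == v (* True *)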

Theorem: Let S be a linearly independent subset of a vector space V, and let v be an element of V that is not in S. Then \( S \cup \{ {\bf v} \} \) is linearly dependent if and only if v belongs to the span of the set S.
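This theorem suggests a rank test: v belongs to the span of S exactly when appending v to S does not increase the matrix rank. A small check with hypothetical data:

S = {{1, 0, 2}, {0, 1, 1}};   (* a linearly independent set *)
v = {2, 3, 7};                (* v = 2*{1,0,2} + 3*{0,1,1}, so v lies in the span of S *)
MatrixRank[Append[S, v]] == MatrixRank[S]   (* True, so S together with v is linearly dependent *)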

Theorem: If a vector space V is generated by a finite set S, then some subset of S is a basis for V.

A vector space is called finite-dimensional if it has a basis consisting of a finite number of elements. The unique number of elements in each basis for V is called the dimension of V and is denoted by dim(V). A vector space that is not finite-dimensional is called infinite-dimensional.

  The next example demonstrates how Mathematica can determine a basis, or set of linearly independent vectors, from a given set. Note that a basis is not unique; even changing the order of the vectors can lead the software to return a different set of linearly independent vectors.

Example: Suppose we are given four linearly dependent vectors. The rank of the matrix built from them shows that only three are linearly independent:

MatrixRank[m = {{1, 2, 0, -3, 1, 0}, {1, 2, 2, -3, 1, 2}, {1, 2, 1, -3, 1, 1}, {3, 6, 1, -9, 4, 3}}]

Out[1]= 3

Then each of the following scripts determines a subset of linearly independent vectors:

(* the first nonzero entry in each row of the R factor of the QR decomposition
   of Transpose[m] marks the position of a linearly independent row of m *)
m[[ Flatten[ Position[#, Except[0, _?NumericQ], 1, 1]& /@
     Last @ QRDecomposition @ Transpose @ m ] ]]

Out[2]= {{1, 2, 0, -3, 1, 0}, {1, 2, 2, -3, 1, 2}, {3, 6, 1, -9, 4, 3}}

or, using the subroutine

MinimalSublist[x_List] :=
 Module[{tm, ntm, ytm, mm = x},
  tm = RowReduce[mm] // Transpose;   (* columns of the reduced row echelon form *)
  ntm = MapIndexed[{#1, #2, Total[#1]} &, tm, {1}];   (* tag each column with its index and entry sum *)
  ytm = Cases[ntm, {___, ___, d_ /; d == 1}];   (* keep columns whose entries sum to 1 -- here, the pivot columns *)
  Cases[ytm, {b_, {a_}, c_} :> mm[[All, a]]] // Transpose]   (* return the corresponding columns of the original matrix *)

we apply it to a similar set of four vectors (note that this subroutine extracts linearly independent columns rather than rows):

m1 = {{1, 2, 0, -3, 1, 0}, {1, 2, 1, -3, 1, 2}, {1, 2, 0, -3, 2, 1}, {3, 6, 1, -9, 4, 3}};
MinimalSublist[m1]

Out[3]= {{1, 0, 1}, {1, 1, 1}, {1, 0, 2}, {3, 1, 4}}

The output lists the linearly independent columns of m1 side by side; transposing it displays each such column as a separate list: {{1, 1, 1, 3}, {0, 1, 0, 1}, {1, 1, 2, 4}}.

Note that the standard Mathematica command IndependenceTest performs a statistical test of independence between data vectors; it does not test linear independence. ■