## Vladimir Dobrushkin

http://math.uri.edu/~dobrush/

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the appendix entitled GNU Free Documentation License.

# Markov Chains

In a probabilistic description of an experiment or process, the outcome or result is uncertain and may take
any value from a known set of states. For example, if a coin is tossed successively, the outcome of the *n*-th
toss could be a head or a tail; if an ordinary die is rolled, the outcome may be 1, 2, 3, 4, 5, or 6. Associated
with each outcome is a number *p* (\( 0 \le p \le 1 \)), called the probability of
the outcome, which is a measure of the likelihood of its occurrence. If we denote the outcome of a roll
of a die by *X*, then we write \( \Pr [X=i] = p_i \) to mean that the probability that
*X* takes the value *i* is equal to \( p_i , \) where
\( p_1 + p_2 + \cdots + p_6 = 1 . \) The symbol *X* is called a random variable and,
in general, *X* may take any real value.
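These probabilities can be checked empirically; the following Python sketch (the sample size of 100,000 rolls is an arbitrary illustrative choice) estimates each \( p_i \) for a fair die and confirms that the estimates sum to 1:

```python
import random
from collections import Counter

random.seed(1)

# Simulate 100,000 rolls of a fair die; the random variable X takes values 1..6.
rolls = [random.randint(1, 6) for _ in range(100_000)]
counts = Counter(rolls)

# Empirical estimate of p_i = Pr[X = i]; for a fair die each is near 1/6.
p = {i: counts[i] / len(rolls) for i in range(1, 7)}
print(p)

# The probabilities of all outcomes must sum to 1.
assert abs(sum(p.values()) - 1.0) < 1e-12
```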

Often we are given the information that a particular event *A* has occurred, in which case the probability
of occurrence of another event *B* will be affected. This gives rise to the concept of the conditional probability
of the event *B* given that *A* has occurred, which is denoted by
\( \Pr [B\,|\,A] \) and defined as

\[ \Pr [B\,|\,A] = \frac{\Pr [A \cap B]}{\Pr [A]} , \qquad \Pr [A] > 0 . \]
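For instance, with one roll of a fair die, take *A* = "the roll is even" and *B* = "the roll is at least 4" (an illustrative choice); the definition gives \( \Pr [B\,|\,A] = (2/6) / (3/6) = 2/3 . \) A short Python check:

```python
from fractions import Fraction

# Sample space for one roll of a fair die; each outcome has probability 1/6.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}        # the roll is even
B = {4, 5, 6}        # the roll is at least 4

def pr(event):
    return Fraction(len(event), len(omega))

# Conditional probability Pr[B | A] = Pr[A ∩ B] / Pr[A].
pr_B_given_A = pr(A & B) / pr(A)
print(pr_B_given_A)   # 2/3
```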

A discrete-time **Markov chain** is a stochastic model describing a sequence of possible events in which
the probability of each event depends only on the state attained in the previous event (a property sometimes
characterized as "memorylessness"). Roughly speaking, a process satisfies the Markov property if one can make
predictions for the future of the process based solely on its present state just as well as one could knowing the
process's full history; the future is independent of that history.
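A two-state chain makes the definition concrete. In the Python sketch below (a hypothetical "sunny"/"rainy" weather example, not from the text), the distribution of the next state is read off from the transition-matrix row of the current state alone:

```python
import random

random.seed(0)

# Row i of P gives the distribution of the next state given current state i:
# the next step depends only on the present state, not on the path taken.
P = [[0.8, 0.2],   # from state 0: stay with prob 0.8, switch with prob 0.2
     [0.4, 0.6]]   # from state 1: switch with prob 0.4, stay with prob 0.6

def step(state):
    return 0 if random.random() < P[state][0] else 1

state = 0
path = [state]
for _ in range(10):
    state = step(state)
    path.append(state)
print(path)
```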

A Markov process is named after the Russian mathematician Andrey Andreyevich Markov (1856--1922), a student of Pafnuty Chebyshev (Saint Petersburg, Russia). Markov studied Markov processes in the early 20th century, publishing his first paper on the topic in 1906. Markov and his younger brother Vladimir Andreevich Markov (1871--1897) proved the Markov brothers' inequality. His son, another Andrei Andreevich Markov (1903--1979), was also a notable mathematician, making contributions to constructive mathematics and recursive function theory. One of his famous Ph.D. students was Jacob/Yakov Davidovich Tamarkin (1888--1945), who, after emigrating to the USA, worked at Brown University from 1927.

Markov chains have many applications as statistical models of real-world processes, such as studying cruise control systems in motor vehicles, queues or lines of customers arriving at an airport, exchange rates of currencies, storage systems such as dams, and population growths of certain animal species. The algorithm known as PageRank, which was originally proposed for the internet search engine Google, is based on a Markov process. Furthermore, Markov processes are the basis for general stochastic simulation methods known as Gibbs sampling and Markov Chain Monte Carlo, are used for simulating random objects with specific probability distributions, and have found extensive application in Bayesian statistics.
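The PageRank idea can be illustrated by power iteration on a tiny link graph. The following Python sketch uses a hypothetical three-page web and the common damping factor 0.85; it is a minimal illustration, not Google's actual implementation:

```python
# Power-iteration sketch of PageRank on a hypothetical 3-page web.
# G[i][j] = probability of following a link from page i to page j;
# the damping factor d mixes in a uniform random jump.
n = 3
G = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
d = 0.85

rank = [1.0 / n] * n
for _ in range(100):
    rank = [(1 - d) / n + d * sum(rank[i] * G[i][j] for i in range(n))
            for j in range(n)]
print(rank)   # approximate stationary distribution of the damped chain
```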

**Example (Ehrenfest urn model):** Suppose *m* molecules are distributed between urns (I) and (II), and at each step or trial a molecule is chosen at random and transferred from its urn to the other urn. Let \( X_n \) denote the number of molecules in urn (I) just after the *n*-th step. Then \( X_0 , X_1 , \ldots \) is a Markov chain with state space \( \left\{ 0, 1, \ldots , m \right\} \) and transition probabilities

\[ p_{i,i-1} = \frac{i}{m} , \qquad p_{i,i+1} = \frac{m-i}{m} , \qquad p_{i,j} = 0 \ \mbox{otherwise}. \]

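The transition matrix of the urn chain can be built and checked directly; in this Python sketch the value m = 4 is an illustrative choice (the text later uses *Mathematica*, but the arithmetic is the same):

```python
from fractions import Fraction

# Ehrenfest urn model: m molecules split between urns (I) and (II).
# If urn (I) currently holds i molecules, the chosen molecule is in urn (I)
# with probability i/m (it moves to II) and in urn (II) with probability
# (m-i)/m (it moves to I).
m = 4
P = [[Fraction(0)] * (m + 1) for _ in range(m + 1)]
for i in range(m + 1):
    if i > 0:
        P[i][i - 1] = Fraction(i, m)        # a molecule leaves urn (I)
    if i < m:
        P[i][i + 1] = Fraction(m - i, m)    # a molecule enters urn (I)

# Every row of a transition matrix sums to 1.
assert all(sum(row) == 1 for row in P)
print(P[2])   # from state 2 the chain moves down or up with probability 1/2 each
```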
**Example:** The set of vectors \( {\bf e}_1 = (1,0,\ldots ,0), \ {\bf e}_2 = (0,1,\ldots ,0), \ \ldots , \ {\bf e}_n = (0,0,\ldots ,1) \) is a basis for the
*n*-dimensional real space \( \mathbb{R}^n , \) and it is called the
**standard basis**. Its dimension is *n*.

Let \( {\bf M}_{i,j} \) denote the *m*-by-*n* matrix whose only nonzero entry is a 1 in the
*i*-th row and *j*-th column. Then the set \( \left\{ {\bf M}_{i,j} \ : \ 1 \le i \le m , \ 1 \le j \le n \right\} \) is a basis for the set of all such real matrices. Its dimension is *mn*.

The set of monomials \( \left\{ 1, x, x^2 , \ldots , x^n \right\} \) is a basis for the space of polynomials of degree at most *n*. It has dimension *n*+1. ■

Theorem: Let *V* be a vector space and
\( \beta = \left\{ {\bf u}_1 , {\bf u}_2 , \ldots , {\bf u}_n \right\} \) be a subset of
*V*. Then β is a basis for *V* if and only if each vector **v** in *V* can be uniquely
decomposed into a linear combination of vectors in β, that is, can be uniquely expressed in the form

\[ {\bf v} = \alpha_1 {\bf u}_1 + \alpha_2 {\bf u}_2 + \cdots + \alpha_n {\bf u}_n \]

for suitable scalars \( \alpha_1 , \alpha_2 , \ldots , \alpha_n . \) ■

If the vectors \( \left\{ {\bf u}_1 , {\bf u}_2 , \ldots , {\bf u}_n \right\} \)
form a basis for a vector space *V*, then every vector in *V* can be uniquely expressed in the form
\( {\bf v} = \alpha_1 {\bf u}_1 + \alpha_2 {\bf u}_2 + \cdots + \alpha_n {\bf u}_n . \)
Thus, **v** determines a unique *n*-tuple of scalars \( \left[ \alpha_1 , \alpha_2 , \ldots , \alpha_n \right] \) and, conversely, each
*n*-tuple of scalars determines a unique vector \( {\bf v} \in V \) by using the entries of the
*n*-tuple as the coefficients of a linear combination of \( {\bf u}_1 , {\bf u}_2 , \ldots , {\bf u}_n . \) This fact suggests that
*V* is like the *n*-dimensional vector space \( \mathbb{R}^n , \) where
*n* is the number of vectors in the basis for *V*.
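As a concrete instance of this one-to-one correspondence, the coordinates of a vector in a basis of \( \mathbb{R}^2 \) can be computed explicitly; the basis and vector in this Python sketch are illustrative choices, not from the text:

```python
from fractions import Fraction

# Coordinates of v in a basis of R^2: basis u1 = (1, 1), u2 = (1, -1);
# solve v = a*u1 + b*u2 for the unique scalars a, b.
u1 = (Fraction(1), Fraction(1))
u2 = (Fraction(1), Fraction(-1))
v  = (Fraction(3), Fraction(1))

# For a 2x2 system, Cramer's rule gives the coefficients directly.
det = u1[0] * u2[1] - u2[0] * u1[1]
a = (v[0] * u2[1] - u2[0] * v[1]) / det
b = (u1[0] * v[1] - v[0] * u1[1]) / det
print(a, b)    # a = 2, b = 1, so v = 2*u1 + 1*u2

# Check that the decomposition reproduces v.
assert (a * u1[0] + b * u2[0], a * u1[1] + b * u2[1]) == v
```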

Theorem: Let *S* be a linearly independent subset of a vector space *V*,
and let **v** be an element of *V* that is not in *S*. Then
\( S \cup \{ {\bf v} \} \) is linearly dependent if and only if **v**
belongs to the span of the set *S*.

Theorem: If a vector space *V* is generated by a finite set *S*, then
some subset of *S* is a basis for *V*. ■

A vector space is called **finite-dimensional** if it has a basis consisting of a finite number of elements. The unique number of elements in each basis for *V* is called the **dimension** of *V* and is denoted by dim(*V*). A vector space that is not finite-dimensional is called **infinite-dimensional**.

The next example demonstrates how *Mathematica* can determine a basis, or set of linearly independent
vectors, from a given set. Note that a basis is not unique: even changing the order of the vectors may lead the software to
produce a different set of linearly independent vectors.

```
MatrixRank[m = {{1, 2, 0, -3, 1, 0},
                {1, 2, 2, -3, 1, 2},
                {1, 2, 1, -3, 1, 1},
                {3, 6, 1, -9, 4, 3}}]
```

Then each of the following scripts determines a subset of linearly independent vectors:

```
m[[ Flatten[ Position[#, Except[0, _?NumericQ], 1, 1]& /@
    Last @ QRDecomposition @ Transpose @ m ] ]]
```

or, using the subroutine

```
MinimalSublist[x_List] :=
 Module[{tm, ntm, ytm, mm = x},
  {tm = RowReduce[mm] // Transpose,
   ntm = MapIndexed[{#1, #2, Total[#1]} &, tm, {1}],
   ytm = Cases[ntm, {___, ___, d_ /; d == 1}]};
  Cases[ytm, {b_, {a_}, c_} :> mm[[All, a]]] // Transpose]
```

we apply it to our set of vectors.

```
m1 = {{1, 2, 0, -3, 1, 0}, {1, 2, 1, -3, 1, 2},
      {1, 2, 0, -3, 2, 1}, {3, 6, 1, -9, 4, 3}};
MinimalSublist[m1]
```

`{{1, 1, 1, 3}, {0, 1, 0, 1}, {1, 1, 2, 4}}`
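For readers without *Mathematica*, the same pivot-column extraction can be sketched in plain Python: row-reduce the matrix with exact rational arithmetic and record the pivot columns, which index a linearly independent subset of columns. Applied to `m1` it selects columns 0, 2, and 4 (0-based), matching the three columns returned above:

```python
from fractions import Fraction

# Row-reduce the matrix and keep the pivot columns; those columns of the
# original matrix form a basis for its column space.
def pivot_columns(rows):
    a = [[Fraction(x) for x in r] for r in rows]
    pivots, r = [], 0
    for c in range(len(a[0])):
        # find a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, len(a)) if a[i][c] != 0), None)
        if p is None:
            continue
        a[r], a[p] = a[p], a[r]
        a[r] = [x / a[r][c] for x in a[r]]          # scale pivot row
        for i in range(len(a)):
            if i != r and a[i][c] != 0:             # clear the column
                a[i] = [x - a[i][c] * y for x, y in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
    return pivots

m1 = [[1, 2, 0, -3, 1, 0], [1, 2, 1, -3, 1, 2],
      [1, 2, 0, -3, 2, 1], [3, 6, 1, -9, 4, 3]]
cols = pivot_columns(m1)
print(cols)   # [0, 2, 4]
```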

One can also use the standard *Mathematica* command
**IndependenceTest**. ■