Hands-On Markov Models with Python: Implement probabilistic models for learning complex data sequences using the Python ecosystem


Introduction to the Markov Process

In this chapter, we will develop the basic concepts that we need to understand Hidden Markov Models (HMM). We will cover the following topics:

  • Random processes
  • Markov processes
  • Markov chains or discrete-time Markov processes
  • Continuous-time Markov chains

Random variables

As we always do in statistics, let's start with a simple example of rolling a die. If we consider rolling a fair die, the outcome can be any number from 1 to 6 and is random. To represent such situations (the outcome of rolling the die in this case), in mathematics we use the concept of random variables. We come across many such variables in our everyday lives. Another example could be ordering food at a restaurant; in this case, the outcome could be any item on the menu. In general terms, a random variable is a variable whose possible values are outcomes of a random phenomenon. The possible outcomes are also known as the domain of the random variable, and the outcome is based on the probability distribution defined over that domain.

Coming back to rolling the die, the domain of the random variable outcome, O, is given by domain(O) = {1, 2, 3, 4, 5, 6}, and the probability distribution is given by a uniform distribution P(o) = 1/6 ∀ o ∈ domain(O). Similarly, in the case of the restaurant example, for the random variable choosing a dish, the domain would be every item on the menu, and the probability distribution would depend on your food preference. In both of the previous examples, the domain of the random variable contains discrete values; such random variables are known as discrete random variables. But it's also possible for the domain to be a continuous space. For example, consider the random variable representing the stock price of Google tomorrow. The domain of this random variable is all positive real numbers, with most of the probability mass distributed around ±5% of today's price. Such random variables are known as continuous random variables.
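
To make this concrete, here is a minimal sketch (not from the book) of the die roll as a discrete random variable in NumPy; the domain is the list of possible outcomes and the distribution over it is uniform:

import numpy as np

# Domain of the random variable O (the outcome of a fair die roll).
domain = [1, 2, 3, 4, 5, 6]
# Uniform probability distribution over the domain: P(o) = 1/6 for every o.
probabilities = [1 / 6] * 6

# Draw a few samples of the random variable.
samples = np.random.choice(domain, size=10, p=probabilities)
print(samples)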

Random processes

In the previous section, we discussed random variables that are able to mathematically represent the outcomes of a single random phenomenon. But what if we want to represent these random events over some period of time or over the length of an experiment? For example, let's say we want to represent the stock price at one-hour intervals throughout a day, or we want to represent the height of a ball at one-second intervals after it is dropped from some height in a vacuum. For such situations, we need a set of random variables, each of which represents the outcome at a given instance of time. Such a set of random variables, indexed over a period of time, is known as a random process. It is worth noting that the domains of all these random variables are the same. Therefore, we can also think of the process as just changing states over time.

Here, we have been talking about random variables at different instances of time, but the index doesn't need to be time in every case; it could be some other event. However, since in most cases it is time, and it is much easier to talk about random processes in terms of time, we will use time to represent any such index. The same concepts apply to a model that varies over some other event instead of time.

Now let's discuss the previous two examples in more detail. Starting with the example of dropping the ball from a height in a vacuum, if we know the exact value of gravity and the height from which the ball is being dropped, we will be able to determine the exact location of the ball at every interval of one second using Newton's laws of motion.

Such random processes, in which we can deterministically find the state of each random variable given the initial conditions (in this case, dropping the ball, zero initial velocity) and the parameters of the system (in this case, the value of gravity), are known as deterministic random processes (commonly called deterministic processes).

Now let's go to the second example: representing the stock price over time. In this case, even if we know the current price and the exact probability distribution of the price at the next one-hour mark, we won't be able to deterministically compute the value. Such random processes, in which we can't determine the state of the process even if we are given the initial conditions and all the parameters of the system, are known as stochastic random processes (commonly called stochastic processes). A very good way of understanding or getting a feel for a stochastic process is to think of it as the opposite of a deterministic process.
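
The following is a small sketch (not from the book; the 100 m starting height, g = 9.8 m/s², and the ±1% hourly price moves are arbitrary illustrative values) contrasting the two kinds of processes: the ball's heights are fully determined by the initial conditions, whereas the price path changes on every run:

import numpy as np

# Deterministic process: height of a ball dropped in a vacuum, sampled every second.
g, h0 = 9.8, 100.0
heights = [max(h0 - 0.5 * g * t ** 2, 0.0) for t in range(6)]
print(heights)   # The same output on every run.

# Stochastic process: a toy stock price sampled every hour, with an assumed
# Gaussian move of about 1% per hour (purely for illustration).
price = 1000.0
prices = []
for _ in range(6):
    price *= 1 + np.random.normal(loc=0.0, scale=0.01)
    prices.append(round(price, 2))
print(prices)    # A different output on every run.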

Markov processes

A stochastic process is called a Markov process if the state of the random variable at the next instance of time depends only on the outcome of the random variable at the current time. In simple mathematical terms, for a stochastic process, S = {R1, R2, . . ., Rn} = {R}t=1, . . ., n, to be a Markov process, it must satisfy the following condition:

P(Rn+1 = r | R1, R2, ..., Rn) = P(Rn+1 = r | Rn)

According to the previous condition, the probability distribution for any variable at any given instance in a Markov process is a conditional distribution, which is conditioned only on the random variable at the last time instance. This property of a system, such that the future states of the system depend only on the current state of the system, is also known as the Markov property. Systems satisfying the Markov property are also known as memoryless systems since they don't need to remember the previous states to compute the distribution of the next state, or, in other words, the next state depends only on the current state of the system.

A very common example used to explain the Markov process is that of a drunk man walking along a street. We consider that, since the man is drunk, he can either take a step backward, take a step forward, or stay in his current position, with some probability distribution over these actions, let's say [0.4, 0.4, 0.2]. Now, given the position of the man at any given instance in time, his position at the next instance depends only on his current position and the parameters of the system (his step size and the probability distribution over the possible actions). Therefore, this is an example of a Markov process.
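
As a quick illustration (a sketch, not code from the book), we can simulate this walk with NumPy; the unit step size and the [0.4, 0.4, 0.2] distribution follow the description above, and the starting position of 0 is an arbitrary choice:

import numpy as np

# Possible moves (forward, backward, stay) and their probabilities.
moves = [1, -1, 0]
probs = [0.4, 0.4, 0.2]

position = 0
positions = [position]
for _ in range(10):
    # The next position depends only on the current position (the Markov property).
    position += np.random.choice(moves, p=probs)
    positions.append(position)
print(positions)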

In the previous example, let's assume that the drunk man takes an action (steps forward/backward or stays in his position) at fixed intervals of time and that his step size is always the same. With these considerations, the Markov process in our example has a discrete state space. Also, since the man takes steps at fixed intervals of time, we can think of the process as having discrete time. But Markov processes don't need to have a discrete state space or discrete time intervals. Considering discrete and continuous time, as well as discrete and continuous state spaces, we can categorize Markov processes into four main categories:

  • Discrete time and discrete state space
  • Discrete time and continuous state space
  • Continuous time and discrete state space
  • Continuous time and continuous state space

We will discuss each of these categories of Markov process in more detail in the following sections.

Installing Python and packages

Installation on Windows

Miniconda can be installed on a Windows system by just double-clicking on the downloaded .exe file and following the installation instructions. After installation, we will need to create a conda environment and install all the required packages in the environment. To create a new Python 3.4 environment with the name hmm, run the following command:

conda create -n hmm python=3.4

After creating the environment, we will need to activate it and install the required packages in it. This can be done using the following commands:

activate hmm
conda install numpy scipy

Installation on Linux

On Linux, after downloading the Miniconda file, we will need to give it execution permissions and then install it. This can be done using the following commands:

chmod +x Miniconda.sh
./Miniconda.sh

After executing the file, we can simply follow the installation instructions. Once installed, we will need to create a new environment and install the required packages. We can create a new Python 3.4 environment with the name hmm using the following commands:

conda create -n hmm python=3.4

Once the environment has been created, we will need to activate it and install the packages inside it using the following:

source activate hmm
conda install numpy scipy
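
Once the packages are installed, a quick way to verify the environment (not part of the book's commands, just a convenience) is to start Python inside the activated environment and print the package versions:

import numpy
import scipy

print(numpy.__version__)
print(scipy.__version__)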

Markov chains or discrete-time Markov processes

A Markov chain is a type of Markov process in which the time is discrete. However, there is a lot of disagreement among researchers about which categories of Markov processes should be called Markov chains. Most commonly, though, the term is used to refer to discrete-state-space Markov processes. Therefore, a Markov chain is a stochastic process over a discrete state space satisfying the Markov property. More formally, we can say that a discrete-time Markov chain is a sequence of random variables X1, X2, X3, ... that satisfy the Markov property, namely that the probability of moving from the current state to the next state depends only on the present state and not on any of the previous states. In terms of the probability distribution, we can say that, given that the system is at time instance n, the conditional distribution of the states at the next time instance, n + 1, is conditionally independent of the states of the system at time instances {1, 2, . . ., n-1}, given the state of the random variable at time instance n. This can be written as follows:

P(Xn+1 = x | X1 = x1, X2 = x2, ..., Xn = xn) = P(Xn+1 = x | Xn = xn)

Markov chains are often represented using directed graphs. The nodes in the directed graph represent the different possible states of the random variables, and the edges represent the probability of the system going from one state to the other at the next time instance. Let's take a simple example of predicting the weather to understand this representation better. We will consider that there are three possible states of the random variable Weather={Sunny, Rainy, Snowy}, and a possible Markov chain for this can be represented as shown in Figure 1.1:

Figure 1.1: A simple Markov chain on the random variable, representing the random variable Weather={Sunny, Rainy, Snowy} and showing the probability of the random variable switching to other states in the next time instance

One of the main points to understand in Markov chains is that we are modeling the outcomes of a sequence of random variables over time. This is sometimes confusing for people since the model is represented using a single graph, which doesn't mention anything about time. So, the name state transitions is not a particularly good name for this, since the state is not changing for any random variable; rather, we are trying to determine the state of the next random variable given the observed state of our current random variable. Coming back to our example, we can see that the nodes of the graph represent the different possible states of the random variable Weather, and the edges between them show the probability of the next random variable taking the different possible states, given the state of the current random variable. The self-loops show the probability of the model staying in its current state. In the previous Markov chain, let's say we know that the observed state of the current random variable is Sunny, then the probability that the random variable at the next time instance will also take the value Sunny is 0.8. It could also take the value Rainy with a probability of 0.19, or Snowy with a probability of 0.01. One thing to note here is that the sum of all the probability values on all the outward edges from any state should equal 1, since it's an exhaustive event.

Now, let's try to code this simple Markov chain. We will start by defining a simple MarkovChain class, and we will keep on adding methods to this class as we go through this chapter:

import numpy as np

class MarkovChain(object):
    def __init__(self, transition_prob):
        """
        Initialize the MarkovChain instance.

        Parameters
        ----------
        transition_prob: dict
            A dict object representing the transition probabilities in
            Markov Chain. Should be of the form: {'state1': {'state1':
            0.1, 'state2': 0.4}, 'state2': {...}}
        """
        self.transition_prob = transition_prob
        self.states = list(transition_prob.keys())

    def next_state(self, current_state):
        """
        Returns the state of the random variable at the next time
        instance.

        Parameters
        ----------
        current_state: str
            The current state of the system.
        """
        return np.random.choice(
            self.states,
            p=[self.transition_prob[current_state][next_state]
               for next_state in self.states])

    def generate_states(self, current_state, no=10):
        """
        Generates the next states of the system.

        Parameters
        ----------
        current_state: str
            The state of the current random variable.

        no: int
            The number of future states to generate.
        """
        future_states = []
        for i in range(no):
            next_state = self.next_state(current_state)
            future_states.append(next_state)
            current_state = next_state
        return future_states

Now, we can try out our example with this MarkovChain class:

>>> transition_prob = {'Sunny': {'Sunny': 0.8, 'Rainy': 0.19, 
'Snowy': 0.01},
'Rainy': {'Sunny': 0.2, 'Rainy': 0.7,
'Snowy': 0.1},
'Snowy': {'Sunny': 0.1, 'Rainy': 0.2,
'Snowy': 0.7}}

>>> weather_chain = MarkovChain(transition_prob=transition_prob)
>>> weather_chain.next_state(current_state='Sunny')
'Sunny'
>>> weather_chain.next_state(current_state='Snowy')
'Snowy'
>>> weather_chain.generate_states(current_state='Snowy', no=10)
['Snowy', 'Snowy', 'Snowy', 'Rainy', 'Snowy', 'Snowy', 'Rainy', 'Rainy', 'Snowy', 'Snowy']

In the previous code example, you might find that your outputs differ from what is shown here. This is because the Markov chain is probabilistic in nature: it picks the next state based on a probability distribution, which can give different outputs on different runs.

So far in the discussion, we have assumed that the transition probabilities of the variables don't change over different instances of time. This is known as a time-homogeneous Markov chain, but it is also possible to have a time-inhomogeneous Markov chain, which also has a lot of applications but is outside the scope of this book.

Parameterization of Markov chains

In the code for the Markov chain in the previous section, we used a dictionary to parameterize the Markov chain, containing the probability values of all the possible state transitions. Another way of representing state transitions is to use a transition matrix. The transition matrix, as the name suggests, uses a tabular representation of the transition probabilities.

The following table shows the transition matrix for the Markov chain shown in Figure 1.1. Each probability value represents the probability of the system going from the state in its row to the state in its column:

States   Sunny   Rainy   Snowy
Sunny    0.8     0.19    0.01
Rainy    0.2     0.7     0.1
Snowy    0.1     0.2     0.7

The transition matrix represents the same information as in the dictionary, but in a more compact way. For this reason, the transition matrix is the standard way of representing Markov chains. Let's modify our MarkovChain class so that it can accept a transition matrix:

import numpy as np

class MarkovChain(object):
    def __init__(self, transition_matrix, states):
        """
        Initialize the MarkovChain instance.

        Parameters
        ----------
        transition_matrix: 2-D array
            A 2-D array representing the probabilities of change of
            state in the Markov Chain.

        states: 1-D array
            An array representing the states of the Markov Chain. It
            needs to be in the same order as transition_matrix.
        """
        self.transition_matrix = np.atleast_2d(transition_matrix)
        self.states = states
        self.index_dict = {self.states[index]: index for index in
                           range(len(self.states))}
        self.state_dict = {index: self.states[index] for index in
                           range(len(self.states))}

    def next_state(self, current_state):
        """
        Returns the state of the random variable at the next time
        instance.

        Parameters
        ----------
        current_state: str
            The current state of the system.
        """
        return np.random.choice(
            self.states,
            p=self.transition_matrix[self.index_dict[current_state], :])

    def generate_states(self, current_state, no=10):
        """
        Generates the next states of the system.

        Parameters
        ----------
        current_state: str
            The state of the current random variable.

        no: int
            The number of future states to generate.
        """
        future_states = []
        for i in range(no):
            next_state = self.next_state(current_state)
            future_states.append(next_state)
            current_state = next_state
        return future_states

Running this code should also give similar results to what we got in the previous section. Using a transition matrix might not seem like a good idea because it requires us to create extra variables to store the indices. But, in cases when we have hundreds of states, using a transition matrix is much more efficient than using the simple dictionary implementation. In the case of a transition matrix, we can simply use NumPy indexing to get the probability values in the next_state method, whereas we were looping over all the state names in the previous implementation:

>>> transition_matrix = [[0.8, 0.19, 0.01],
[0.2, 0.7, 0.1],
[0.1, 0.2, 0.7]]
>>> weather_chain = MarkovChain(transition_matrix=transition_matrix,
states=['Sunny', 'Rainy', 'Snowy'])
>>> weather_chain.next_state(current_state='Sunny')
'Sunny'
>>> weather_chain.next_state(current_state='Snowy')
'Sunny'
>>> weather_chain.generate_states(current_state='Snowy', no=10)
['Snowy', 'Rainy', 'Rainy', 'Rainy', 'Rainy', 'Rainy',
'Rainy', 'Rainy', 'Sunny', 'Sunny']

Properties of Markov chains

In this section, we will talk about the different properties of Markov chains, namely reducibility, periodicity, transience and recurrence, ergodicity, and steady-state analysis and limiting distributions. We will also try some simple examples of our MarkovChain class to show these properties.

Reducibility

A Markov chain is said to be irreducible if we can reach any state of the given Markov chain from any other state. In terms of states, state j is said to be accessible from another state i if a system that started at state i has a non-zero probability of getting to state j. More formally, state j is said to be accessible from state i if an integer nij ≥ 0 exists such that the following condition is met:

P(Xnij = j | X0 = i) > 0

The nij here is basically the number of steps it takes to go from state i to state j, and it can be different for different pairs of values of i and j. Also, for a given state i, if nij = 1 for every state j, it means that all the states of the Markov chain are directly accessible from it in a single step. The accessibility relation is reflexive and transitive, but not necessarily symmetric. We can take a simple example to understand this property:

Figure 1.2: An example of an irreducible Markov chain

In the previous example, it can clearly be seen that all of the states are accessible from all other states, and hence the chain is irreducible.

Note that, in the examples in Figure 1.2 and Figure 1.3, we haven't drawn edges whose probability values are 0. This helps to keep the model less cluttered and easier to read.

In the following example, we can see that state D is not accessible from A, B, or C. Also, state C is not accessible from either A or B. But all the states are accessible from state D, and states A and B are accessible from C:

Figure 1.3: An example of a reducible Markov chain

We can also add a couple of methods to our MarkovChain class to check which states in our chain are reachable and whether our chain is irreducible:

from itertools import permutations

def is_accessible(self, i_state, f_state):
    """
    Check if state f_state is accessible from i_state.

    Parameters
    ----------
    i_state: str
        The state from which the accessibility needs to be checked.

    f_state: str
        The state to which accessibility needs to be checked.
    """
    source_index = self.index_dict[i_state]
    target_index = self.index_dict[f_state]
    reachable_state_indexes = [source_index]
    visited = set()
    while reachable_state_indexes:
        state_index = reachable_state_indexes.pop()
        if state_index == target_index:
            return True
        if state_index in visited:
            continue
        visited.add(state_index)
        # Add every state that can be reached from here in one step
        # with a non-zero probability.
        reachable_state_indexes.extend(
            np.nonzero(self.transition_matrix[state_index, :])[0])
    return False

def is_irreducible(self):
    """
    Check if the Markov Chain is irreducible.
    """
    for (i, j) in permutations(self.states, 2):
        if not self.is_accessible(i, j):
            return False
    return True

Let's try these methods out on the examples from Figure 1.2 and Figure 1.3:

>>> transition_irreducible = [[0.5, 0.5, 0, 0],
[0.25, 0, 0.5, 0.25],
[0.25, 0.5, 0, 0.25],
[0, 0, 0.5, 0.5]]
>>> transition_reducible = [[0.5, 0.5, 0, 0],
[0, 1, 0, 0],
[0.25, 0.5, 0, 0],
[0, 0, 0.25, 0.75]]
>>> markov_irreducible = MarkovChain(transition_matrix=transition_irreducible,
states=['A', 'B', 'C', 'D'])
>>> markov_reducible = MarkovChain(transition_matrix=transition_reducible,
states=['A', 'B', 'C', 'D'])
>>> markov_irreducible.is_accessible(i_state='A', f_state='D')
True
>>> markov_irreducible.is_accessible(i_state='B', f_state='D')
True
>>> markov_irreducible.is_irreducible()
True
>>> markov_reducible.is_accessible(i_state='A', f_state='D')
False
>>> markov_reducible.is_accessible(i_state='D', f_state='A')
True
>>> markov_reducible.is_accessible(i_state='C', f_state='D')
False
>>> markov_reducible.is_irreducible()
False

Periodicity

State i is said to have period k if the length of any possible path returning to state i is a multiple of k steps. Formally, it is defined like this:

k = gcd{n > 0 : P(Xn = i | X0 = i) > 0}

Here, gcd means the greatest common divisor (GCD). Basically, k is the GCD of the lengths of all possible paths from state i back to itself. If there is no possible path from state i back to itself, then its period is not defined. We also need to note that k has nothing to do with the minimum number of steps required to return to the starting state. For example, let's say that for a given state the possible numbers of steps required to return to it are (4, 6, 8, 12, 16). In this case k = 2, but the minimum number of steps required to return is 4, and 2 doesn't even appear in the list of possible numbers of steps.

For any given state in the Markov chain, if k=1, the state is said to be aperiodic. A Markov chain is called aperiodic if all of its states are aperiodic. One major thing to note is that, in the case of an irreducible Markov chain, a single aperiodic state is enough to imply that all the states are aperiodic. Let's take a simple example and check the periodicity of different states:

Figure 1.4: An example of a periodic Markov chain

In the previous example, we can easily see that for state A the possible paths to return are A -> B -> C -> A or A -> B -> C -> D -> E -> C -> A. These two paths have lengths of 3 and 6, respectively, and hence state A has a period of 3. Similarly, B, C, D, and E each have a period of 3, and hence the Markov chain itself is periodic:

Figure 1.5: Example of Markov Chain with aperiodic states.

In this example, we added a couple of extra edges, due to which the possible path lengths for A are now 3, 5, 7, ...; and for B are 2, 3, 4, 5, ... And, since the GCD of these path lengths is 1, states A and B are both now aperiodic. Similarly, we can compute the period of other nodes, each of which is also 1, and hence the Markov chain is also aperiodic.

Let's now add a couple of new methods to our MarkovChain class to compute the period of different states and check whether our model is aperiodic:

from functools import reduce
from math import gcd

def get_period(self, state):
    """
    Returns the period of the state in the Markov Chain.

    Parameters
    ----------
    state: str
        The state for which the period needs to be computed.
    """
    state_index = self.index_dict[state]
    n_step_probability = np.identity(len(self.states))
    return_path_lengths = []
    # Collect every path length, up to a bound of 3 * number of states
    # (which is enough to determine the gcd), for which the process can
    # return to the state with a non-zero probability.
    for n in range(1, 3 * len(self.states) + 1):
        n_step_probability = n_step_probability @ self.transition_matrix
        if n_step_probability[state_index, state_index] > 0:
            return_path_lengths.append(n)
    if not return_path_lengths:
        return None
    return reduce(gcd, return_path_lengths)

def is_aperiodic(self):
    """
    Checks if the Markov Chain is aperiodic.
    """
    periods = [self.get_period(state) for state in self.states]
    for period in periods:
        if period != 1:
            return False
    return True

Let's now try out our methods on our examples. In this example, we will randomly assign probability values to different transitions:

>>> transition_periodic = [[0, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0.5, 0, 0, 0.5, 0],
[0, 0, 0, 0, 1],
[0, 0, 1, 0, 0]]
>>> transition_aperiodic = [[0, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0.5, 0.25, 0, 0.25, 0],
[0, 0, 0, 0, 1],
[0, 0, 0.5, 0.5, 0]]
>>> markov_periodic = MarkovChain(transition_matrix=transition_periodic,
states=['A', 'B', 'C', 'D', 'E'])
>>> markov_aperiodic = MarkovChain(transition_matrix=transition_aperiodic,
states=['A', 'B', 'C', 'D', 'E'])

>>> markov_periodic.get_period('A')
3
>>> markov_periodic.get_period('C')
3
>>> markov_periodic.is_aperiodic()
False

>>> markov_aperiodic.get_period('A')
1
>>> markov_aperiodic.get_period('B')
1
>>> markov_aperiodic.is_aperiodic()
True

Transience and recurrence

Given that we start at state i, the state is called transient if there is a non-zero probability that we will never return to it. To define this in more formal terms, let's consider a random variable Ti as the first return time to state i:

Ti = min{n ≥ 1 : Xn = i}, given that X0 = i

Let's now define another term, fi(n), as the probability that the system returns to state i for the first time after n steps:

fi(n) = P(Ti = n)

Now we can say that any given state i is transient if the following condition is met:

∑(n=1 to ∞) fi(n) = P(Ti < ∞) < 1

In the preceding equation, we are basically checking whether the total probability of returning to state i in a finite number of steps is less than 1. If it is, then the probability that Ti is infinite is greater than 0, which means that state i is transient. A given state i is called recurrent if it is not transient:

Figure 1.6: A Markov chain with transient states A and B

In the preceding example, we can see that states A and B are transient: A doesn't have any incoming edges, and B's only incoming edge comes from another transient state, so once the system leaves state A or B, it won't be able to come back.

For a finite Markov chain, it is really simple to check whether a given state is transient or not: a state is transient if it can reach some state from which it cannot get back. We can check this using the is_accessible method we defined earlier. Let's write a simple method to do this for our MarkovChain class:

def is_transient(self, state):
    """
    Checks if a state is transient or not.

    Parameters
    ----------
    state: str
        The state for which the transient property needs to be checked.
    """
    # A state is transient if some state reachable from it cannot
    # reach it back.
    for other_state in self.states:
        if other_state == state:
            continue
        if (self.is_accessible(state, other_state) and
                not self.is_accessible(other_state, state)):
            return True
    return False

Now we can use this method in our example in Figure 1.6 to check which nodes are transient:

>>> transient_matrix = [[0, 0.5, 0.5, 0],
[0, 0, 0.25, 0.75],
[0, 0, 0, 1],
[0, 0, 0.5, 0.5]]
>>> transient_markov = MarkovChain(transition_matrix=transient_matrix,
states=['A', 'B', 'C', 'D'])
>>> transient_markov.is_transient('A')
True
>>> transient_markov.is_transient('B')
True
>>> transient_markov.is_transient('C')
False

In the following subsections, we will talk about the statistical properties of the random variable Ti.

Mean recurrence time

The first return time for the initial state i is also known as the hitting time. It was represented using the random variable Ti in the previous section. The mean recurrence time of state i is defined as its expected return time:

Mi = E[Ti] = ∑(n=1 to ∞) n · fi(n)

If the mean recurrence time, Mi, is finite, the state is called positive recurrent, otherwise it is called null recurrent.
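
Since we don't compute Mi analytically in this chapter, here is a rough Monte Carlo sketch (not from the book) that estimates the mean recurrence time of the Sunny state by repeatedly simulating the weather_chain instance defined earlier and recording how many steps it takes to come back; the number of runs and the step cap are arbitrary illustrative choices:

import numpy as np

def estimate_mean_recurrence_time(chain, state, n_runs=1000, max_steps=1000):
    return_times = []
    for _ in range(n_runs):
        current_state = state
        # Walk the chain until it returns to the starting state.
        for step in range(1, max_steps + 1):
            current_state = chain.next_state(current_state)
            if current_state == state:
                return_times.append(step)
                break
    return np.mean(return_times)

# Assumes the weather_chain object from the earlier example is available.
print(estimate_mean_recurrence_time(weather_chain, 'Sunny'))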

Expected number of visits

As is evident from the name, the expected number of visits for any state i is the number of times the system is expected to be in that state. A given state i is recurrent if and only if the expected number of visits to i is infinite:

∑(n=0 to ∞) pii(n) = ∞, where pii(n) is the probability of being in state i after n steps, having started in state i

Absorbing states

State i is said to be an absorbing state if it is impossible for the system to leave that state once it reaches it. For a state to be an absorbing state, the probability of staying in the same state should be 1, and all the other transition probabilities should be 0:

pii = 1 and pij = 0 for all j ≠ i

A Markov chain is called an absorbing Markov chain if it has at least one absorbing state and an absorbing state can be reached from every other state (not necessarily in a single step):

Figure 1.7: An example showing an absorbing state C, since the probability of transitioning from state C to C is 1

Again, we can add a very simple method to check for absorbing states in our MarkovChain class:

def is_absorbing(self, state):
    """
    Checks if the given state is absorbing.

    Parameters
    ----------
    state: str
        The state for which we need to check whether it's absorbing
        or not.
    """
    state_index = self.index_dict[state]
    return self.transition_matrix[state_index, state_index] == 1

We can again check whether our state in the example is absorbing by creating a Markov chain and using the is_absorbing method:

>>> absorbing_matrix = [[0, 1, 0],
[0.5, 0, 0.5],
[0, 0, 1]]
>>> absorbing_chain = MarkovChain(transition_matrix=absorbing_matrix,
states=['A', 'B', 'C'])
>>> absorbing_chain.is_absorbing('A')
False
>>> absorbing_chain.is_absorbing('C')
True

Ergodicity

State i is said to be ergodic if it is recurrent, has a period of 1, and has a finite mean recurrence time. If all the states of a Markov chain are ergodic, then it's an ergodic Markov chain. In general terms, a Markov chain is ergodic if there is a number N, such that any state in the system can be reached from any other state in any number of steps greater than or equal to the number N. Therefore, in the case of a fully connected transition matrix, where all transitions have a non-zero probability, this condition is fulfilled with N=1.
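
Putting together the methods we have already added, a minimal ergodicity check for our MarkovChain class could look like the following sketch (not from the book). It assumes the is_irreducible and is_aperiodic methods defined earlier, and relies on the fact that every state of a finite, irreducible chain is automatically positive recurrent, so checking irreducibility and aperiodicity is enough:

def is_ergodic(self):
    """
    Checks if the Markov Chain is ergodic, assuming a finite state
    space: every state of a finite, irreducible chain is positive
    recurrent, so irreducibility plus aperiodicity suffices.
    """
    return self.is_irreducible() and self.is_aperiodic()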

Steady-state analysis and limiting distributions

In a Markov chain, a vector π is called the stationary distribution if it satisfies the following conditions for all j ∈ S:

πj ≥ 0,  ∑(j ∈ S) πj = 1,  and  πj = ∑(i ∈ S) πi pij

The stationary distribution is one of the most important properties of Markov chains, and we will talk about it in much more detail in later sections of this chapter.
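
Although we will study it in detail later, as a quick illustration (a sketch, not from the book), the stationary distribution of a finite chain can be obtained as the left eigenvector of the transition matrix associated with the eigenvalue 1. Here we apply this to the weather transition matrix used earlier:

import numpy as np

transition_matrix = np.array([[0.8, 0.19, 0.01],
                              [0.2, 0.7, 0.1],
                              [0.1, 0.2, 0.7]])

# pi is a left eigenvector of the transition matrix (pi = pi P), so we work
# with the transpose to turn it into an ordinary eigenvector problem.
eigenvalues, eigenvectors = np.linalg.eig(transition_matrix.T)
# Take the eigenvector whose eigenvalue is (numerically) 1 and normalize it
# so that its entries sum to 1.
stationary = eigenvectors[:, np.isclose(eigenvalues, 1)][:, 0].real
stationary = stationary / stationary.sum()
print(stationary)   # Approximate long-run probabilities of Sunny, Rainy, Snowy.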

Continuous-time Markov chains

Continuous-time Markov chains are quite similar to discrete-time Markov chains, except that in the continuous case we explicitly model the transition time between states using a positive-valued random variable. Also, we consider the system at all possible values of time instead of just at the transition times.

Exponential distributions

A random variable X is said to have an exponential distribution with rate λ if its probability density function is defined as follows:

f(x) = λe^(-λx) for x ≥ 0, and f(x) = 0 for x < 0

Here, the rate λ needs to be greater than 0. We can also compute the expectation of X:

E[X] = 1/λ

We see that the expectation of X is inversely proportional to the rate. This means that an exponential distribution with a higher rate has a lower expectation. The exponential distribution is often used to model problems that involve the time until some event happens. A simple example could be modelling the time before an alarm clock goes off, or the time before a server comes to your table in a restaurant. And, as we would expect, the higher the rate, the sooner the event is expected to happen, which is why λ is called the rate of the distribution.

We can also compute the second moment of the exponential distribution:

E[X²] = 2/λ²

And, using the first and second moments, we can compute the variance of the distribution:

Var(X) = E[X²] - (E[X])² = 2/λ² - 1/λ² = 1/λ²

Figure 1.x: Probability density function of the exponential distribution

Now we will move on to some of the properties of the exponential distribution that are relevant to our example:

  • Memoryless: Figure 1.x shows a plot of an exponential distribution. In the plot, we can clearly see that the graph after any given point (a, in this case) is an exact copy of the original distribution; in other words, an exponential distribution conditioned on (X > a) is still exponential. If we think about this property in terms of our example, it means that if we have an alarm clock and, at some time t, we see that it still hasn't gone off, we can still determine the distribution over the time remaining after t, and it will be the same exponential distribution. This property of the exponential distribution is known as memorylessness, since at any given point in time, if you know the current state of the system (in this example, that the alarm hasn't gone off), you can determine the probability distribution over any time in the future. This property of exponential distributions is quite similar to the Markov property of Markov chains, which we saw in the previous sections.
  • Probability of minimum value: Let's say we have independent exponentially distributed random variables X0, . . ., Xn with rates λ0, ..., λn, respectively. For these variables, we can prove that the distribution of min(X0, . . ., Xn) is also exponential, with rate λ0 + λ1 + ... + λn.

We will use both of these properties of the exponential distribution in our example for the continuous time Markov chain in a later section.
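
As a quick numerical check of the second property (a sketch, not from the book; the three rate values are chosen arbitrarily), we can sample from independent exponential distributions and verify that the mean of their minimum is close to 1/(λ0 + λ1 + λ2):

import numpy as np

rates = [0.5, 1.0, 2.5]      # Arbitrary illustrative rates.
size = 100000
# NumPy's exponential sampler takes the scale parameter, which is 1/rate.
samples = np.array([np.random.exponential(scale=1 / rate, size=size)
                    for rate in rates])
minimum = samples.min(axis=0)

print(minimum.mean())        # Should be close to 1 / sum(rates) = 0.25.
print(1 / sum(rates))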

Poisson process

The Poisson process is a continuous-time process, and there can be multiple interpretations of it, which lead to different possible definitions. In this section, we will start with the formal definition and build up to a simpler, more intuitive one. A continuous-time stochastic process N(t): t ≥ 0 is a Poisson process with rate λ > 0 if the following conditions are met:

  • N(0) = 0
  • It has stationary and independent increments
  • The distribution of N(t) is Poisson with mean λt; that is, P(N(t) = n) = e^(-λt) (λt)^n / n! for n = 0, 1, 2, ...

First of all, we need to define what stationary and independent increments are. For a continuous-time stochastic process, X(t): t ≥ 0, an increment is defined as the difference in the state of the system between two time instances; that is, given two time instances s and t with s < t, the increment from time s to time t is X(t) - X(s). As the name suggests, a process is said to have stationary increments if the distribution of an increment depends only on the time difference.

In other words, a process has stationary increments if the distribution of X(t1) - X(s1) is equal to that of X(t2) - X(s2) whenever t1 > s1, t2 > s2 and t1 - s1 = t2 - s2. A process has independent increments if any two increments over disjoint time intervals are independent; that is, if t1 > s1 > t2 > s2, then the increments X(t2) - X(s2) and X(t1) - X(s1) are independent.

Now let's come back to defining the Poisson process. The Poisson process is essentially a counting process that counts the number of events that have occurred before time t. This count of the number of events before time t is given by N(t) and, similarly, the number of events occurring between times t and t + s is given by N(t + s) - N(t). The value N(t + s) - N(t) is Poisson-distributed with mean λs. We can see that the Poisson process has stationary increments over fixed time intervals, but as t → ∞, the value of N(t) will also approach infinity; that is, N(t) → ∞. Another thing worth noting is that, as the value of λ increases, the number of events happening will also increase, which is why λ is also known as the rate of the process.

This brings us to our second, simplified definition of the Poisson process. A continuous-time stochastic process N(t): t ≥ 0 is called a Poisson process with rate λ > 0 if the following conditions are met:

  • N(0) = 0
  • It is a counting process; that is, N(t) gives the count of the number of events that have occurred before time t
  • The times between events are independently and identically distributed, with an exponential distribution of rate λ (the simulation sketch below illustrates this equivalence)
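
The following is a minimal simulation sketch of this second definition (not from the book; the rate, time horizon, and number of runs are arbitrary illustrative values): we keep drawing exponential inter-arrival times and count how many events fall before time t. The resulting counts should be approximately Poisson-distributed with mean λt:

import numpy as np

rate, t, n_runs = 2.0, 5.0, 100000    # Arbitrary illustrative values.

counts = []
for _ in range(n_runs):
    arrival_time, count = 0.0, 0
    # Keep adding exponential inter-arrival times until we pass time t.
    while True:
        arrival_time += np.random.exponential(scale=1 / rate)
        if arrival_time > t:
            break
        count += 1
    counts.append(count)

print(np.mean(counts))   # Should be close to rate * t = 10.
print(np.var(counts))    # For a Poisson distribution, the variance is also ~ rate * t.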

Continuous-time Markov chain example

Now that we have a basic understanding of exponential distributions and the Poisson process, we can move on to an example that builds up a continuous-time Markov chain. In this example, we will try to show how the properties of exponential distributions can be used to build up generic continuous-time Markov chains. Let's consider a hotel reception where n receptionists are working in parallel. Also consider that the guests arrive according to a Poisson process with rate λ, and that the service time for each guest is represented by an exponential random variable with rate µ. Also, if all the receptionists are busy when a new guest arrives, he/she will depart without getting any service. Now, let's try to compute the expected number of busy receptionists that the next arriving guest will find.

Let's start by assuming that Tk represents the expected number of busy receptionists found by the next arriving guest, given that k receptionists are busy at the current time instance. Thanks to the memoryless property of the exponential distribution, it doesn't matter how long the k receptionists have already been busy; their remaining service times are still exponentially distributed with rate µ, so this definition does not depend on when we start observing the system.

Firstly, T0 is clearly 0, because if there are currently 0 busy receptionists, the next arrival will also find 0 busy receptionists for sure. Now consider T1: if there is currently 1 busy receptionist, the next arriving guest will find 1 busy receptionist if the time to the next arrival is less than the remaining service time of the busy receptionist. From the memoryless property, we know that the next arrival time is exponentially distributed with rate λ, and the remaining service time is also exponentially distributed with rate µ. Therefore, the probability that the next guest will find one receptionist busy is λ/(λ + µ), and hence the following is true:

T1 = λ/(λ + µ)
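
As a quick sanity check of this value (a sketch, not from the book; λ = 1.0 and µ = 1.5 are arbitrary illustrative rates), we can estimate T1 by simulating the race between the next arrival and the remaining service time:

import numpy as np

arrival_rate, service_rate, n_runs = 1.0, 1.5, 100000   # Illustrative values.

next_arrival = np.random.exponential(scale=1 / arrival_rate, size=n_runs)
remaining_service = np.random.exponential(scale=1 / service_rate, size=n_runs)

# The next guest finds 1 busy receptionist if they arrive before the current
# service finishes, and 0 busy receptionists otherwise.
busy_found = (next_arrival < remaining_service).astype(int)

print(busy_found.mean())                               # Estimate of T1.
print(arrival_rate / (arrival_rate + service_rate))    # lambda / (lambda + mu) = 0.4.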

In general, consider the situation in which k receptionists are busy. We can obtain an expression for Tk by conditioning on what happens first. When k receptionists are busy, we can think of k + 1 independent exponential distributions: k exponentials with rate µ for the remaining service time of each busy receptionist, and 1 exponential with rate λ for the next arriving guest. In our case, we want to condition on whether a service completion happens first or a new guest arrives first. The time until the first service completion is the minimum of the k service-time exponentials, and is therefore also exponentially distributed, with rate kµ. Hence, the probability of a service completion happening before the next guest arrives is kµ/(kµ + λ). Similarly, the probability that the next event is a guest arrival is λ/(kµ + λ).

Now, based on this, we can say that if the next event is a service completion, the expected number of busy receptionists found by the next arrival will be Tk-1. Otherwise, if a guest arrives first, that guest will find k busy receptionists. Therefore, we have the following:

Tk = (kµ/(kµ + λ)) Tk-1 + (λ/(kµ + λ)) · k

We just need to solve this recursion now. T2 will be given by this equation:

T2 = (2µ/(2µ + λ)) T1 + (λ/(2µ + λ)) · 2, where T1 = λ/(λ + µ)

If we continue the same pattern, we will get T3 as follows:

T3 = (3µ/(3µ + λ)) T2 + (λ/(3µ + λ)) · 3

We can see a pattern in the values of T1, T2, and T3 and, unrolling the recursion with T0 = 0, we can write a general term as follows:

Tk = ∑(i=1 to k) [ iλ/(iµ + λ) ] · ∏(j=i+1 to k) [ jµ/(jµ + λ) ]

Let's point out our observations on the previous example:

  • At any given time instance, if there are i busy receptionists with i < n, there are i + 1 independent exponential distributions, i of them with rate µ and 1 of them with rate λ. The time until the process makes a jump is exponential, and its rate is given by iµ + λ. If all the receptionists are busy, then only the n exponential distributions corresponding to the service times can trigger a jump, and the time until the process makes a jump is exponential with rate nµ.
  • When the process jumps from state i, for i < n, it jumps to state i + 1 with probability λ/(iµ + λ), and jumps to state i - 1 with probability iµ/(iµ + λ).
  • When the process makes a jump from state i, we can start up a whole new set of exponential distributions corresponding to the state we jumped to. This is because, even though some of the old exponential distributions haven't triggered, by memorylessness this is equivalent to resetting or replacing those distributions.
  • Every time we jump to state i, regardless of what the time is, the distribution of how long we stay in state i and the probabilities of where we jump to next when we leave state i are the same. In other words, the process is time-homogeneous.

The preceding description of a continuous-time stochastic process corresponds to a continuous-time Markov chain. In the next section, we will try to define it in a more formal way.

Continuous-time Markov chain

In the previous section, we showed an example of a continuous-time Markov chain to give an indication of how it works. Let's now define it formally. In a continuous-time Markov chain with a discrete state space S, for each state i ∈ S we have an associated set of ni independent exponential distributions with rates qi,j1, ..., qi,jni, where j1, ..., jni are the states the process may jump to when it leaves state i. When the process enters state i, the amount of time it spends in state i is exponentially distributed with rate vi = qi,j1 + ... + qi,jni, and when it leaves state i it goes to state jl with probability qi,jl/vi, for l = 1, ..., ni.
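
Based on this definition, here is a minimal simulation sketch of a continuous-time Markov chain (not from the book; the two-state chain and its jump rates are arbitrary illustrative choices): from state i we sample the holding time from an exponential distribution with rate vi and then choose the next state with probability qi,j/vi:

import numpy as np

# Illustrative jump rates q[i][j] for a two-state chain.
rates = {'On': {'Off': 0.5}, 'Off': {'On': 2.0}}

def simulate_ctmc(rates, state, t_max):
    t, path = 0.0, [(0.0, state)]
    while True:
        targets = list(rates[state].keys())
        q = np.array([rates[state][j] for j in targets])
        v = q.sum()                                  # Total jump rate v_i out of the state.
        t += np.random.exponential(scale=1 / v)      # Holding time ~ Exp(v_i).
        if t > t_max:
            return path
        state = np.random.choice(targets, p=q / v)   # Next state with probability q_ij / v_i.
        path.append((t, state))

print(simulate_ctmc(rates, 'On', t_max=10.0))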

We can also extend the Markov property from the discrete-time case to continuous time.

For a continuous-time stochastic process (X(t) : t ≥ 0) with state space S, we say it has the Markov property if the following condition is met:

P(X(t) = j | X(s) = i, X(tn-1) = in-1, ..., X(t1) = i1) = P(X(t) = j | X(s) = i)

Here, 0 ≤ t1 ≤ t2 ≤ . . . ≤ tn-1 ≤ s ≤ t is any non-decreasing sequence of n + 1 times, and i1, i2, . . ., in-1, i, j ∈ S are any n + 1 states in the state space, for any integer n ≥ 1.

Similarly, we can extend time-homogeneity to the case of continuous-time Markov chains. We say that a continuous-time Markov chain is time-homogeneous if, for any s ≤ t and any states i, j ∈ S, the following condition is met:

P(X(t) = j | X(s) = i) = P(X(t - s) = j | X(0) = i)

As in the case of discrete-time Markov chains, a continuous-time Markov chain does not need to be time-homogeneous, but non-homogeneous Markov chains are out of scope for this book. For more details on non-homogeneous Markov chains, you can refer to Cheng-Chi Huang's thesis on the topic: https://lib.dr.iastate.edu/cgi/viewcontent.cgi?article=8613&context=rtd.

Now let's define the transition probability for a continuous-time Markov chain. Just as the rates qij in a continuous-time Markov chain are the counterpart of the transition probabilities pij in a discrete-time Markov chain, there is also a counterpart to the n-step transition probabilities, namely pij(t), for a time-homogeneous, continuous-time Markov chain, which is defined as follows:

pij(t) = P(X(t) = j | X(0) = i)

Summary

In this chapter, we gave a detailed introduction to Markov chains. We talked about different types of Markov chains, mainly chains with a discrete state space, with either discrete time or continuous time. We also introduced the concepts of time-homogeneous and non-time-homogeneous Markov chains. We discussed the different properties of Markov chains in detail, and provided relevant examples and code.

Markov chains and their properties are the basic concepts on which HMMs are built. In the next chapter, we will discuss HMMs in much more detail.


Key benefits

  • Build a variety of Hidden Markov Models (HMM)
  • Create and apply models to any sequence of data to analyze, predict, and extract valuable insights
  • Use natural language processing (NLP) techniques and 2D-HMM model for image segmentation

Description

Hidden Markov Model (HMM) is a statistical model based on the Markov chain concept. Hands-On Markov Models with Python helps you get to grips with HMMs and different inference algorithms by working on real-world problems. The hands-on examples explored in the book help you simplify the process flow in machine learning by using Markov model concepts, thereby making it accessible to everyone. Once you’ve covered the basic concepts of Markov chains, you’ll get insights into Markov processes, models, and types with the help of practical examples. After grasping these fundamentals, you’ll move on to learning about the different algorithms used in inferences and applying them in state and parameter inference. In addition to this, you’ll explore the Bayesian approach of inference and learn how to apply it in HMMs. In further chapters, you’ll discover how to use HMMs in time series analysis and natural language processing (NLP) using Python. You’ll also learn to apply HMM to image processing using 2D-HMM to segment images. Finally, you’ll understand how to apply HMM for reinforcement learning (RL) with the help of Q-Learning, and use this technique for single-stock and multi-stock algorithmic trading. By the end of this book, you will have grasped how to build your own Markov and hidden Markov models on complex datasets in order to apply them to projects.

Who is this book for?

Hands-On Markov Models with Python is for you if you are a data analyst, data scientist, or machine learning developer and want to enhance your machine learning knowledge and skills. This book will also help you build your own hidden Markov models by applying them to any sequence of data. Basic knowledge of machine learning and the Python programming language is expected to get the most out of the book.

What you will learn

  • Explore a balance of both theoretical and practical aspects of HMM
  • Implement HMMs using different datasets in Python using different packages
  • Understand multiple inference algorithms and how to select the right algorithm to resolve your problems
  • Develop a Bayesian approach to inference in HMMs
  • Implement HMMs in finance, natural language processing (NLP), and image processing
  • Determine the most likely sequence of hidden states in an HMM using the Viterbi algorithm

Product Details

Publication date : Sep 27, 2018
Length: 178 pages
Edition : 1st
Language : English
ISBN-13 : 9781788629331




Table of Contents

10 Chapters
Introduction to the Markov Process
Hidden Markov Models
State Inference - Predicting the States
Parameter Learning Using Maximum Likelihood
Parameter Inference Using the Bayesian Approach
Time Series Predicting
Natural Language Processing
2D HMM for Image Processing
Markov Decision Process
Other Books You May Enjoy

Customer reviews

Rating distribution
2.3 out of 5 stars
(4 Ratings)
5 star 0%
4 star 25%
3 star 0%
2 star 50%
1 star 25%
RFEMYGDIO Jan 21, 2019
4 out of 5 stars
Excellent
Amazon Verified review
DWH Dec 02, 2018
2 out of 5 stars
Although the algorithms in this book are generally correct, it is riddled with crippling errors. There are undefined variables and out-of-range errors in almost every example. These are still present in the code that you download directly from the publisher. Buyer beware: you'll spend more time troubleshooting than learning.
Amazon Verified review
Victoria Sherratt Jul 10, 2021
2 out of 5 stars
It's OK, not great. It's printed by Amazon and some of the graphics are a bit low quality. What annoyed me most about this book is that, in the chapter I was most interested in, the authors didn't bother with the code - "it would be too long for this book" - but in later chapters presented pages of back-to-back code. The book is not long at all, so why skirt over it?
Amazon Verified review
Damian Jan Cordon Jun 26, 2019
1 out of 5 stars
The formulas and graphics in this book are tiny on the Kindle and cannot be enlarged, which makes it impossible to properly follow the reasoning being applied, since you cannot access the mathematical justification of what is explained.
Amazon Verified review

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the ebook to be usable for you the reader with our needs to protect the rights of us as Publishers and of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website?

If you want to purchase a video course, eBook, or Bundle (Print+eBook), please follow the steps below:

  1. Register on our website using your email address and the password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Card, or PayPal)
Where can I access support around an eBook?
  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats such as PDF and ePub. In the future, this may well change with trends and developments in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower in price than print
  • They save resources and space
What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.