Hidden Markov Model
Usually, when there is a temporal or sequential structure in the data, the data that arrive later in the sequence are correlated with the data that arrive earlier in the sequence. For example, whether it rains today is correlated with whether it rained yesterday, and to a (possibly) decreasing degree with the day before yesterday, and so on. There are many datasets with this sequential property that need to be modeled. Language is another such dataset, in which the word that comes next depends upon the previous word(s). One of the simpler ways of modeling such sequential data, nonetheless effective in many cases, is known as Markov modeling.
Markov models
A Markov model is a state-based model which assumes that the probability distribution of the next state in the sequence is completely determined by the current state. Since such models can be represented as graphs or chains, they are also known as Markov chains. This assumption, although restrictive, is powerful for modeling complex systems such as page ranking. In NLP, Markov chains were among the first models used to model natural language. Although the basic version of the Markov model restricts the dependence of the next state to the current state alone, there are n-th order Markov chains which allow modeling dependencies on the n previous states.
- Transition probability
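As a toy illustration of a first-order Markov chain (all numbers here are made up), a two-state weather model and a helper to sample a sequence from it might look like:

```python
import numpy as np

# Hypothetical two-state weather chain. States: 0 = rain, 1 = sun.
# Row i of the matrix holds P(next state | current state = i).
transition_prob = np.array([
    [0.7, 0.3],  # rain -> rain, rain -> sun
    [0.4, 0.6],  # sun -> rain,  sun -> sun
])

def sample_chain(transition_prob, start_state, length, rng=None):
    """Sample a state sequence: each step depends only on the current state."""
    if rng is None:
        rng = np.random.default_rng(0)
    states = [start_state]
    for _ in range(length - 1):
        # Draw the next state from the row of the current state.
        states.append(rng.choice(len(transition_prob), p=transition_prob[states[-1]]))
    return states

states = sample_chain(transition_prob, start_state=0, length=5)
```

Each row of the transition matrix is a probability distribution, so it must sum to one.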
Observations and Hidden states
In many cases, however, the Markov chain alone can be insufficient, because we are unable to directly observe the states of the system that we are interested in predicting. For example, suppose we have a record of a person's daily water intake (in number of glasses) in the summer of 1997, and we are given the task of predicting the temperature on each day of that summer. For simplicity, let us assume that there are three possible temperatures (cold, warm, hot). The person probably drinks more water on a hot day than on a cold day. However, there is variance in the data that can be due to several other factors, such as the person's activities on that particular day.
In such cases, the temperature is modeled as a hidden state and the water consumption values are the observations. A hidden Markov model (HMM) is then specified by three sets of parameters:
- Transition probability
- Emission probability
- Initial probability
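For the temperature/water example above, these three sets of parameters could be written as follows (all numbers are hypothetical, and the water intake is binned into three levels for simplicity):

```python
import numpy as np

# Hidden states: 0 = cold, 1 = warm, 2 = hot.
# Observations: glasses of water, binned as 0 = few, 1 = some, 2 = many.
initial_prob = np.array([0.3, 0.4, 0.3])  # P(q_1 = i)

transition_prob = np.array([              # P(q_{t+1} = j | q_t = i)
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
])

emission_prob = np.array([                # P(o_t = k | q_t = i)
    [0.7, 0.2, 0.1],  # cold days: usually few glasses
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],  # hot days: usually many glasses
])
```

The initial probabilities and every row of the transition and emission matrices must each sum to one.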
HMM assumptions
- Transition between states is a Markov process.
- The likelihood of an observation depends only on the current state.
Likelihood
\(P(O|\lambda) = \sum_{h \in H} P(O|h)P(h)\)
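The sum above ranges over every possible hidden state sequence \(h\). A brute-force sketch makes this marginalization explicit (the parameter values in the usage example are made up); it is exponential in the sequence length, which is exactly what the forward trellis below avoids:

```python
import itertools
import numpy as np

def brute_force_likelihood(observations, transition_prob, emission_prob, initial_prob):
    """Sum the joint probability P(O, h) over every hidden state sequence h.
    This is O(N^T) for N states and T observations, so it is only feasible
    for tiny examples."""
    n_states = len(transition_prob)
    total = 0.0
    for h in itertools.product(range(n_states), repeat=len(observations)):
        # P(O, h) = pi[h_1] * B[h_1, o_1] * prod_t A[h_{t-1}, h_t] * B[h_t, o_t]
        p = initial_prob[h[0]] * emission_prob[h[0]][observations[0]]
        for t in range(1, len(observations)):
            p *= transition_prob[h[t - 1]][h[t]] * emission_prob[h[t]][observations[t]]
        total += p
    return total

# Made-up two-state example.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
lk = brute_force_likelihood([0, 1, 1], A, B, pi)
```

A useful sanity check: summing this likelihood over all possible observation sequences of a fixed length gives exactly one.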
Forward trellis
from typing import List, Tuple

import numpy as np


def alpha(
    observations: List[int],
    transition_prob: np.ndarray,
    emission_prob: np.ndarray,
    initial_prob: np.ndarray,
) -> Tuple[float, np.ndarray]:
    """Compute a forward trellis for the likelihood of the observations
    given the HMM parameters,
    i.e. alpha[t][i] = P(o_1, o_2, ..., o_t, q_t = i | HMM parameters)
    Args:
        observations (List[int]): list of sequential observations
        transition_prob (np.ndarray): transition probability matrix for the hidden Markov states
        emission_prob (np.ndarray): emission probability matrix
        initial_prob (np.ndarray): probability distribution over the initial states
    Returns:
        Tuple[float, np.ndarray]: likelihood of the observations, alpha probability matrix
    """
    n_states = len(transition_prob)
    n_observations = len(observations)
    likelihood = 0.0
    alpha_memoization = np.zeros((n_observations, n_states))
    for i in range(n_observations):
        for j in range(n_states):
            if i == 0:
                # initial conditions
                alpha_memoization[i][j] = (
                    initial_prob[j] * emission_prob[j][observations[i]]
                )
            else:
                # sum over all states that could have preceded state j
                for k in range(n_states):
                    alpha_memoization[i][j] += (
                        alpha_memoization[i - 1][k]
                        * transition_prob[k][j]
                        * emission_prob[j][observations[i]]
                    )
    # the likelihood is the sum of the final column of the trellis
    for i in range(n_states):
        likelihood += alpha_memoization[n_observations - 1][i]
    return likelihood, alpha_memoization

Backward trellis
def beta(observations, transition_prob, emission_prob, initial_prob):
    """Compute a backward trellis for the likelihood of the future observations
    given the HMM parameters,
    i.e. beta[t][i] = P(o_{t+1}, ..., o_T | q_t = i, HMM parameters)
    Args:
        observations (List[int]): list of sequential observations
        transition_prob (np.ndarray): transition probability matrix for the hidden Markov states
        emission_prob (np.ndarray): emission probability matrix
        initial_prob (np.ndarray): probability distribution over the initial states
    Returns:
        Tuple[float, np.ndarray]: likelihood of the observations, beta probability matrix
    """
    n_states = len(transition_prob)
    n_observations = len(observations)
    beta_memoization = np.zeros((n_observations, n_states))
    for t in range(n_observations - 1, -1, -1):
        for j in range(n_states):
            if t == n_observations - 1:
                # boundary conditions
                beta_memoization[t][j] = 1
            else:
                # sum over all states that could follow state j
                for k in range(n_states):
                    beta_memoization[t][j] += (
                        beta_memoization[t + 1][k]
                        * transition_prob[j][k]
                        * emission_prob[k][observations[t + 1]]
                    )
    # fold in the initial distribution and the first emission
    likelihood = 0.0
    for k in range(n_states):
        likelihood += (
            beta_memoization[0][k] * initial_prob[k] * emission_prob[k][observations[0]]
        )
    return likelihood, beta_memoization
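As a sanity check on the two trellises, the same recursions can be written compactly with matrix-vector products; the forward and backward passes must agree on the likelihood of the same observation sequence (the parameter values below are made up for illustration):

```python
import numpy as np

# Hypothetical two-state HMM and a short observation sequence.
initial_prob = np.array([0.5, 0.5])
transition_prob = np.array([[0.7, 0.3], [0.4, 0.6]])
emission_prob = np.array([[0.9, 0.1], [0.2, 0.8]])
observations = [0, 1, 1]

# Forward pass: alpha_t[j] = sum_k alpha_{t-1}[k] A[k][j] * B[j][o_t]
alpha_t = initial_prob * emission_prob[:, observations[0]]
for o in observations[1:]:
    alpha_t = (alpha_t @ transition_prob) * emission_prob[:, o]
forward_lk = alpha_t.sum()

# Backward pass: beta_t[j] = sum_k A[j][k] * B[k][o_{t+1}] * beta_{t+1}[k]
beta_t = np.ones(len(initial_prob))
for o in reversed(observations[1:]):
    beta_t = transition_prob @ (emission_prob[:, o] * beta_t)
backward_lk = (initial_prob * emission_prob[:, observations[0]] * beta_t).sum()
```

Both directions compute the same marginal \(P(O|\lambda)\), just by eliminating the hidden states in opposite orders.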