
Usually, when data have a temporal or sequential structure, the items that arrive later in the sequence are correlated with the items that arrive earlier. For example, whether it rains today is correlated with whether it rained yesterday, and (possibly) to a decreasing degree with the day before yesterday, and so on. Many datasets with this sequential property need to be modeled. Language is another example, in which the word that comes next depends upon the previous word(s). One of the simpler, yet in many cases effective, ways of modeling such sequential data is known as Markov modeling.

Markov models

A Markov model is a state-based model which assumes that the probability distribution over the next state in the sequence is completely determined by the current state. Since such models can be represented as graphs or chains, they are also known as Markov chains. This assumption, although restrictive, is powerful enough to model complex systems such as page ranking. In NLP, Markov chains were among the first models used to model natural language. Although the basic version of the Markov model restricts the dependence of the next state to the current state alone, n-th order Markov chains allow modeling dependencies on the n previous states. A small sketch below shows a first-order chain in action.

  • Transition probability
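To make this concrete, here is a minimal sketch of a first-order Markov chain over two weather states; the states and transition probabilities are made up purely for illustration.

import numpy as np

# Hypothetical two-state weather chain: 0 = rain, 1 = sun.
# transition[i][j] = P(next state is j | current state is i)
transition = np.array([
    [0.7, 0.3],  # a rainy day tends to be followed by rain
    [0.4, 0.6],  # a sunny day tends to be followed by sun
])

rng = np.random.default_rng(0)
state = 0  # start on a rainy day
sequence = [state]
for _ in range(9):
    # the next state depends only on the current state (Markov assumption)
    state = rng.choice(2, p=transition[state])
    sequence.append(state)
print(sequence)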

Observations and Hidden states

In many cases, however, a Markov chain alone is insufficient because we cannot directly observe the states of the system we are interested in predicting. For example, suppose we have a record of a person's daily water intake (in number of glasses) during the summer of 1997, and we are given the task of predicting the temperature on each day of that summer. For simplicity, assume there are three possible temperatures (cold, warm, hot). The person probably drinks more water on a hot day than on a cold day. However, there is variance in the data that can be due to several other factors, such as the person's activities on a particular day.

In such cases, the temperature is modeled as a hidden state and the daily water consumption values are the observations. A hidden Markov model (HMM) is then specified by three sets of parameters, listed below and made concrete in the sketch that follows the list:

  • Transition probability
  • Emission probability
  • Initial probability
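A minimal sketch of the water/temperature example, with made-up numbers chosen only to illustrate the shape of each parameter set (three hidden temperature states, and water intake bucketed into three discrete levels):

import numpy as np

# Hidden states: 0 = cold, 1 = warm, 2 = hot.
# Observations: daily water intake bucketed as 0 = low, 1 = medium, 2 = high.

# transition_prob[i][j] = P(tomorrow's temperature is j | today's temperature is i)
transition_prob = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
])

# emission_prob[i][o] = P(water intake level o | temperature i)
emission_prob = np.array([
    [0.7, 0.2, 0.1],  # cold days: mostly low intake
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],  # hot days: mostly high intake
])

# initial_prob[i] = P(temperature on the first day is i)
initial_prob = np.array([0.3, 0.4, 0.3])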


HMM assumptions

  1. Transition between states is a Markov process.
  2. The likelihood of an observation depends only on the current state.
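Written out, with \(q_t\) the hidden state and \(o_t\) the observation at time \(t\), the two assumptions are:

\(P(q_t | q_1, \ldots, q_{t-1}) = P(q_t | q_{t-1})\)

\(P(o_t | q_1, \ldots, q_t, o_1, \ldots, o_{t-1}) = P(o_t | q_t)\)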

Likelihood

\(P(O|\lambda) = \sum_{h \in H} P(O|h, \lambda)\, P(h|\lambda)\)

where \(\lambda\) denotes the HMM parameters and \(H\) is the set of all possible hidden state sequences.
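Evaluating this sum naively means enumerating every possible hidden state sequence, which is exponential in the sequence length. The sketch below does exactly that, purely to make the formula concrete before the efficient forward trellis that follows; brute_force_likelihood is a hypothetical helper name.

from itertools import product

def brute_force_likelihood(observations, transition_prob, emission_prob, initial_prob):
    """Sum P(O|h) P(h) over every hidden state sequence h (exponential in len(observations))."""
    n_states = len(transition_prob)
    total = 0.0
    for h in product(range(n_states), repeat=len(observations)):
        # P(h): probability of this particular hidden state sequence
        p_h = initial_prob[h[0]]
        for t in range(1, len(h)):
            p_h *= transition_prob[h[t - 1]][h[t]]
        # P(O|h): probability of the observations given the hidden states
        p_o_given_h = 1.0
        for t, o in enumerate(observations):
            p_o_given_h *= emission_prob[h[t]][o]
        total += p_h * p_o_given_h
    return total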

Forward trellis

from typing import List, Tuple

import numpy as np


def alpha(
    observations: List[int],
    transition_prob: np.ndarray,
    emission_prob: np.ndarray,
    initial_prob: np.ndarray,
) -> Tuple[float, np.ndarray]:
    """Compute a forward trellis for the likelihood of observations given
    HMM parameters,
    i.e. alpha[t][i] = P(o_1, o_2, ..., o_t, q_t=i | HMM parameters)

    Args:
        observations (List[int]): list of sequential observations
        transition_prob (np.ndarray): transition probability matrix for the hidden Markov states
        emission_prob (np.ndarray): emission probability matrix
        initial_prob (np.ndarray): probability distribution over the initial states

    Returns:
        Tuple[float, np.ndarray]: likelihood of the observations, alpha probability matrix
    """
    n_states = len(transition_prob)
    n_observations = len(observations)
    lkhood = 0
    alpha_memoization = np.zeros((n_observations, n_states))
    for i in range(n_observations):
        for j in range(n_states):
            if i == 0:
                # initial conditions
                alpha_memoization[i][j] = (
                    initial_prob[j] * emission_prob[j][observations[i]]
                )
            else:
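                # recursion: accumulate over every state k at the previous time step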
                for k in range(n_states):
                    alpha_memoization[i][j] += (
                        alpha_memoization[i - 1][k]
                        * transition_prob[k][j]
                        * emission_prob[j][observations[i]]
                    )
    for i in range(n_states):
        lkhood += alpha_memoization[n_observations - 1][i]
    return lkhood, alpha_memoization
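Reusing the toy matrices sketched earlier, a quick usage check might look like this (the observation sequence is made up):

# three days of water intake: low, high, high
observations = [0, 2, 2]
likelihood, trellis = alpha(observations, transition_prob, emission_prob, initial_prob)
print(likelihood)     # P(O | HMM parameters)
print(trellis.shape)  # (number of observations, number of hidden states)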

Backward trellis

def beta(
    observations: List[int],
    transition_prob: np.ndarray,
    emission_prob: np.ndarray,
    initial_prob: np.ndarray,
) -> Tuple[float, np.ndarray]:
    """Compute a backward trellis for the likelihood of future observations
    given HMM parameters,
    i.e. beta[t][i] = P(o_{t+1}, ..., o_T | q_t=i, HMM parameters)

    Args:
        observations (List[int]): list of sequential observations
        transition_prob (np.ndarray): transition probability matrix for the hidden Markov states
        emission_prob (np.ndarray): emission probability matrix
        initial_prob (np.ndarray): probability distribution over the initial states

    Returns:
        Tuple[float, np.ndarray]: likelihood of the observations, beta probability matrix
    """
    n_states = len(transition_prob)
    n_observations = len(observations)
    beta_memoization = np.zeros((n_observations, n_states))

    for t in range(n_observations - 1, -1, -1):
        for j in range(n_states):
            if t == n_observations - 1:
                # boundary conditions
                beta_memoization[t][j] = 1
            else:
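                # recursion: combine the next step's trellis values with transition and emission probabilities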
                for k in range(n_states):
                    beta_memoization[t][j] += (
                        beta_memoization[t + 1][k]
                        * transition_prob[j][k]
                        * emission_prob[k][observations[t + 1]]
                    )
    lkhood = 0
    for k in range(n_states):
        lkhood += (
            beta_memoization[0][k] * initial_prob[k] * emission_prob[k][observations[0]]
        )
    return lkhood, beta_memoization
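A useful sanity check, again with the toy matrices from earlier: the forward and backward trellises should produce the same likelihood for the same observation sequence.

observations = [0, 2, 2]
fwd_likelihood, _ = alpha(observations, transition_prob, emission_prob, initial_prob)
bwd_likelihood, _ = beta(observations, transition_prob, emission_prob, initial_prob)
assert np.isclose(fwd_likelihood, bwd_likelihood)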

Decoding

Learning

Applications
