**Hidden Markov models (HMMs)** are a formal foundation for making probabilistic models of linear sequence ‘labeling’ problems. They provide a conceptual toolkit for building complex models just by drawing an intuitive picture. They are at the heart of a diverse range of programs, including genefinding, profile searches, multiple sequence alignment and regulatory site identification.

A Markov model is a probabilistic process over a finite set, {S_1, …, S_k}, whose elements are usually called its *states*. Each state transition generates a character from the *alphabet* of the process.

A hidden Markov model (HMM) is simply a Markov model in which the states are hidden. For example, suppose we had only the sequence of throws from the 3-coin example above, and that *the upper-case vs. lower-case information had been lost*:

**HTHHTHHTTTHTTTHHTHHHHTTHTTHTTHT**...

We can never be absolutely sure which coin was used at a given point in the sequence, but we *can* calculate the probability that each coin was used there.

### What’s Hidden in HMM?

It’s useful to imagine an HMM generating a sequence. When we visit a state, we emit a residue from the state’s emission probability distribution. Then, we choose which state to visit next according to the state’s transition probability distribution. The model thus generates two strings of information. One is the underlying *state path* (the labels), as we transition from state to state. The other is the *observed sequence* (the DNA), each residue being emitted from one state in the state path.
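The generative view above can be sketched in a few lines of Python. This is only an illustration: the two state names (`AT_rich`, `GC_rich`) and every probability below are made-up assumptions, not values from any real model.

```python
import random

# Hypothetical two-state DNA model; all numbers are illustrative assumptions.
STATES = ["AT_rich", "GC_rich"]
START = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANSITIONS = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.2, "GC_rich": 0.8},
}
EMISSIONS = {
    "AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def draw(dist, rng):
    """Sample one key from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def generate(length, seed=0):
    """Return (state_path, observed_sequence) of the requested length."""
    rng = random.Random(seed)
    state_path, observed = [], []
    state = draw(START, rng)
    for _ in range(length):
        state_path.append(state)                       # the hidden label
        observed.append(draw(EMISSIONS[state], rng))   # emit a residue
        state = draw(TRANSITIONS[state], rng)          # choose the next state
    return state_path, "".join(observed)


state_path, observed = generate(20)
print(observed)     # what we get to see
print(state_path)   # what is normally hidden from us
```

The function returns both strings of information at once, which makes the distinction concrete: in practice only `observed` is available, and `state_path` is what we would like to recover.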

The state path is a Markov chain, meaning that what state we go to next depends only on what state we’re in. Since we’re only given the observed sequence, this underlying state path is hidden—these are the residue labels that we’d like to infer. The state path is a *hidden Markov chain*.
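Returning to the coin-toss sequence, here is a rough sketch of how the probability of each hidden coin at each position could be computed with the standard forward-backward algorithm. The parameters of the 3-coin example are not repeated in this post, so the coin names, biases, and switching probabilities below are assumptions chosen purely for illustration.

```python
# Hypothetical 3-coin model; names and numbers are assumptions.
COINS = ["fair", "heads_biased", "tails_biased"]
START = {c: 1 / 3 for c in COINS}
# Assume we keep the same coin with probability 0.8, else switch evenly.
TRANS = {a: {b: (0.8 if a == b else 0.1) for b in COINS} for a in COINS}
EMIT = {
    "fair": {"H": 0.5, "T": 0.5},
    "heads_biased": {"H": 0.8, "T": 0.2},
    "tails_biased": {"H": 0.2, "T": 0.8},
}


def posterior(seq):
    """Return, for each position, P(coin | whole sequence) as a dict."""
    n = len(seq)

    # Forward pass: probability of seq[0..t] with coin c used at position t.
    fwd = [{c: START[c] * EMIT[c][seq[0]] for c in COINS}]
    for t in range(1, n):
        fwd.append({
            c: EMIT[c][seq[t]] * sum(fwd[t - 1][p] * TRANS[p][c] for p in COINS)
            for c in COINS
        })

    # Backward pass: probability of seq[t+1..] given coin c at position t.
    bwd = [None] * n
    bwd[n - 1] = {c: 1.0 for c in COINS}
    for t in range(n - 2, -1, -1):
        bwd[t] = {
            c: sum(TRANS[c][nx] * EMIT[nx][seq[t + 1]] * bwd[t + 1][nx]
                   for nx in COINS)
            for c in COINS
        }

    total = sum(fwd[n - 1].values())  # probability of the whole sequence
    return [{c: fwd[t][c] * bwd[t][c] / total for c in COINS} for t in range(n)]


throws = "HTHHTHHTTTHTTTHHTHHHHTTHTTHTTHT"
for t, probs in enumerate(posterior(throws)[:5]):
    best = max(probs, key=probs.get)
    print(t, throws[t], best, round(probs[best], 3))
```

This version multiplies raw probabilities, which is fine for a sequence of about 30 throws. For long sequences the values underflow, so a practical implementation would rescale at each step or work in log space.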

Here is a link to an interesting paper on HMMs: http://www.nature.com/nbt/journal/v22/n10/full/nbt1004-1315.html
