September 7, 2009

What is a Hidden Markov Model?

Filed under: Bioinformatics,Computational Biology — Biointelligence: Education,Training & Consultancy Services @ 1:32 am

Hidden Markov Models

Hidden Markov models (HMMs) are a formal foundation for building probabilistic models of linear sequence ‘labeling’ problems. They provide a conceptual toolkit for building complex models just by drawing an intuitive picture. They are at the heart of a diverse range of programs, including gene finding, profile searches, multiple sequence alignment and regulatory site identification.

A Markov model is a probabilistic process over a finite set of states, {S1, …, Sk}. Each state transition generates a character from the alphabet of the process.

A Hidden Markov Model (HMM) is simply a Markov Model in which the states are hidden. For example, suppose we had only the sequence of throws from the 3-coin example above, and that the upper-case versus lower-case information had been lost.


We can never be absolutely sure which coin was used at a given point in the sequence, but we can calculate the probability that each coin was used there.
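
To make "calculate the probability" concrete, here is a minimal sketch that uses the standard forward-backward algorithm to compute, for each throw, the probability that each coin produced it. The three coin names, their biases and the switching probability are assumptions made up for illustration; they are not the parameters of the original coin example.

```python
# Hypothetical three-coin model: every number below is an illustrative assumption.
COINS = ["fair", "heads_biased", "tails_biased"]
START = {c: 1 / 3 for c in COINS}          # each coin equally likely at the start
STAY = 0.8                                 # probability of keeping the same coin
TRANS = {a: {b: (STAY if a == b else (1 - STAY) / 2) for b in COINS} for a in COINS}
P_HEADS = {"fair": 0.5, "heads_biased": 0.9, "tails_biased": 0.1}


def emit(coin, throw):
    """Probability that `coin` produces `throw` ('H' or 'T')."""
    return P_HEADS[coin] if throw == "H" else 1.0 - P_HEADS[coin]


def coin_posteriors(throws):
    """Forward-backward: P(coin at position i | all throws) for every i.

    `throws` must be a non-empty string of 'H' and 'T'. No rescaling is done,
    so this is only suitable for short sequences.
    """
    n = len(throws)
    # forward[i][c] = P(throws[0..i], coin at position i is c)
    forward = [{c: START[c] * emit(c, throws[0]) for c in COINS}]
    for i in range(1, n):
        forward.append({
            c: emit(c, throws[i]) * sum(forward[i - 1][p] * TRANS[p][c] for p in COINS)
            for c in COINS
        })
    # backward[i][c] = P(throws[i+1..] | coin at position i is c)
    backward = [dict.fromkeys(COINS, 1.0) for _ in range(n)]
    for i in range(n - 2, -1, -1):
        backward[i] = {
            c: sum(TRANS[c][q] * emit(q, throws[i + 1]) * backward[i + 1][q] for q in COINS)
            for c in COINS
        }
    total = sum(forward[-1].values())       # P(throws) under the model
    return [
        {c: forward[i][c] * backward[i][c] / total for c in COINS}
        for i in range(n)
    ]


if __name__ == "__main__":
    for throw, posterior in zip("HHTHHHTT", coin_posteriors("HHTHHHTT")):
        print(throw, {c: round(p, 3) for c, p in posterior.items()})
```

Each dictionary in the result sums to 1, so it can be read directly as "how sure are we about each coin at this throw".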

What’s Hidden in HMM?

It’s useful to imagine an HMM generating a sequence. When we visit a state, we emit a residue from the state’s emission probability distribution. Then, we choose which state to visit next according to the state’s transition probability distribution. The model thus generates two strings of information. One is the underlying state path (the labels), as we transition from state to state. The other is the observed sequence (the DNA), each residue being emitted from one state in the state path.
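
The generative picture above can be sketched in a few lines of Python. The two states ("AT_rich" and "GC_rich") and all probabilities are hypothetical values chosen for illustration, not parameters from any real model.

```python
import random

# Illustrative two-state model over DNA: all names and numbers are assumptions.
START = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANSITIONS = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.2, "GC_rich": 0.8},
}
EMISSIONS = {
    "AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def draw(distribution, rng):
    """Pick one key of `distribution` with probability equal to its value."""
    outcomes = list(distribution)
    weights = [distribution[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def generate(length, seed=None):
    """Return (state_path, observed_sequence), both of the given length."""
    rng = random.Random(seed)
    state_path, residues = [], []
    state = draw(START, rng)
    for _ in range(length):
        state_path.append(state)                       # the hidden label
        residues.append(draw(EMISSIONS[state], rng))   # emit a residue from this state
        state = draw(TRANSITIONS[state], rng)          # then choose the next state
    return state_path, "".join(residues)


if __name__ == "__main__":
    path, dna = generate(40, seed=0)
    print(dna)
    print("".join("a" if s == "AT_rich" else "g" for s in path))
```

The two printed lines are the two strings the model generates: the observed DNA and, beneath it, the state path that would normally be hidden from us.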

The state path is a Markov chain, meaning that which state we go to next depends only on which state we’re in. Since we’re only given the observed sequence, this underlying state path is hidden. These are the residue labels that we’d like to infer. The state path is a hidden Markov chain.
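
One common way to infer the hidden labels is the Viterbi algorithm, which finds the single most probable state path for an observed sequence. The sketch below is one possible implementation under assumed parameters: the two states and every probability are hypothetical, and Viterbi is named here as a standard technique rather than something the text above prescribes.

```python
import math

# Illustrative two-state model over DNA: all names and numbers are assumptions.
START = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANSITIONS = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.2, "GC_rich": 0.8},
}
EMISSIONS = {
    "AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def viterbi(sequence):
    """Return (most probable state path, its log joint probability).

    `sequence` must be a non-empty string over A, C, G, T. Log-probabilities
    are used so that long sequences do not underflow.
    """
    states = list(START)
    # best[s] = log-probability of the best path that ends in state s
    best = {s: math.log(START[s]) + math.log(EMISSIONS[s][sequence[0]]) for s in states}
    backpointers = []
    for residue in sequence[1:]:
        pointers, new_best = {}, {}
        for s in states:
            # Best predecessor for state s at this position
            prev = max(states, key=lambda p: best[p] + math.log(TRANSITIONS[p][s]))
            pointers[s] = prev
            new_best[s] = (best[prev] + math.log(TRANSITIONS[prev][s])
                           + math.log(EMISSIONS[s][residue]))
        backpointers.append(pointers)
        best = new_best
    # Trace back from the best final state
    last = max(states, key=best.get)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


if __name__ == "__main__":
    labels, log_p = viterbi("AATATTAGCGCGGCCG")
    print("".join("a" if s == "AT_rich" else "g" for s in labels), log_p)
```

Because the transition step looks only at the previous state, the search over all possible paths collapses into one pass along the sequence; this is the Markov property of the state path doing the work.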

Here is a link to an interesting paper on HMMs: