Biointelligence

August 7, 2009

COBALT: A new tool for Multiple Sequence Alignment

Filed under: Bioinformatics,Computational Biology — Biointelligence: Education,Training & Consultancy Services @ 9:47 am

The simultaneous alignment of multiple sequences (multiple alignment) serves as a building block in several fields of computational biology, such as phylogenetic studies, detection of conserved motifs, prediction of functional residues and secondary structure, prediction of correlations, and even quality assessment of protein sequences. An accurate multiple sequence alignment tool has therefore long been one of the field's most pressing needs.

COBALT (Constraint-based Multiple Alignment Tool) is a multiple sequence alignment tool that gathers a collection of pairwise constraints from three sources: the Conserved Domain Database (CDD), a protein motif database, and sequence similarity, found using RPS-BLAST, PHI-BLAST, and BLASTP, respectively. These pairwise constraints are then incorporated into a progressive multiple alignment.
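To make the idea of a pairwise constraint concrete, here is a minimal Python sketch of how two hits to the same domain might be turned into a constraint between two sequences. It is an illustration only, not COBALT's actual code: the `DomainHit` and `Constraint` types, the domain names, and the assumption that every hit is ungapped are all hypothetical simplifications.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class DomainHit(NamedTuple):
    seq_id: str      # input sequence that matched the domain
    domain: str      # domain identifier (hypothetical name)
    seq_start: int   # first matched position on the sequence (0-based)
    dom_start: int   # first matched position on the domain model
    length: int      # match length; hits are assumed ungapped here


class Constraint(NamedTuple):
    seq_a: str
    start_a: int
    seq_b: str
    start_b: int
    length: int      # positions start_a+k and start_b+k should align


def constraints_from_hits(hits):
    """Pair up hits to the same domain and map their shared domain
    region back onto the two sequences."""
    by_domain = defaultdict(list)
    for hit in hits:
        by_domain[hit.domain].append(hit)

    constraints = []
    for domain_hits in by_domain.values():
        for h1, h2 in combinations(domain_hits, 2):
            if h1.seq_id == h2.seq_id:
                continue  # a constraint links two different sequences
            # Overlap of the two hits in domain coordinates.
            lo = max(h1.dom_start, h2.dom_start)
            hi = min(h1.dom_start + h1.length, h2.dom_start + h2.length)
            if hi <= lo:
                continue  # the hits cover disjoint parts of the domain
            constraints.append(Constraint(
                h1.seq_id, h1.seq_start + (lo - h1.dom_start),
                h2.seq_id, h2.seq_start + (lo - h2.dom_start),
                hi - lo,
            ))
    return constraints


hits = [
    DomainHit("s1", "domA", seq_start=10, dom_start=0, length=50),
    DomainHit("s2", "domA", seq_start=5, dom_start=20, length=40),
    DomainHit("s3", "domB", seq_start=0, dom_start=0, length=10),
]
pairwise = constraints_from_hits(hits)
```

In this toy input only `s1` and `s2` share a domain, so a single constraint is produced; `s3` matches a different domain and contributes nothing.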

COBALT has a general framework that uses progressive multiple alignment to combine pairwise constraints from different sources into a multiple alignment. A set of constraints is called consistent if all of the constraints in the set can be simultaneously satisfied by a multiple alignment. COBALT does not attempt to use all available constraints; it uses only a high-scoring consistent subset, which can change as the alignment progresses. Using the RPS-BLAST tool, we can quickly search for domains in CDD that match regions of the input sequences. When the same domain matches multiple sequences, we can infer several potential pairwise constraints from these domain matches. Furthermore, CDD contains auxiliary information that allows COBALT to create partial profiles for input sequences before progressive alignment begins, which avoids computationally expensive procedures for building profiles.
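The notion of a high-scoring consistent subset can be illustrated for a single pair of sequences. Two constraints on the same pair can both be satisfied only if they do not overlap or cross, that is, if one ends before the other begins on both sequences. The sketch below picks the best-scoring such subset with a simple quadratic dynamic program (weighted chaining). This is an assumed simplification for illustration, not COBALT's implementation; the `Segment` type and the scores are hypothetical, and segment lengths are assumed to be positive.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start_a: int   # start of the constrained region on sequence A
    start_b: int   # start of the constrained region on sequence B
    length: int    # region length (assumed positive)
    score: float   # how much we trust this constraint


def best_consistent_subset(segments):
    """Return the highest-scoring chain of segments in which every
    segment ends before the next one starts on both sequences."""
    segs = sorted(segments, key=lambda s: (s.start_a, s.start_b))
    n = len(segs)
    if n == 0:
        return []

    best = [s.score for s in segs]  # best chain score ending at each segment
    prev = [-1] * n                 # back-pointers for chain recovery
    for j in range(n):
        for i in range(j):
            ends_before_a = segs[i].start_a + segs[i].length <= segs[j].start_a
            ends_before_b = segs[i].start_b + segs[i].length <= segs[j].start_b
            if ends_before_a and ends_before_b:
                candidate = best[i] + segs[j].score
                if candidate > best[j]:
                    best[j] = candidate
                    prev[j] = i

    # Walk back from the best chain end to recover the subset.
    k = max(range(n), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(segs[k])
        k = prev[k]
    return chain[::-1]


candidates = [
    Segment(0, 0, 10, 5.0),
    Segment(20, 5, 10, 8.0),   # overlaps the first segment on sequence B
    Segment(15, 30, 10, 4.0),
    Segment(40, 50, 10, 3.0),
]
chosen = best_consistent_subset(candidates)
total = sum(s.score for s in chosen)
```

Here the second candidate has the highest individual score, but it conflicts with two others, so the chain of the remaining three segments wins. A progressive aligner would have to repeat a selection like this as sequences are merged into profiles, which is one way to read the remark that the subset can change as the alignment progresses.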

COBALT is implemented in the NCBI C++ Toolkit. More information on COBALT can be found at:

http://bioinformatics.oxfordjournals.org/cgi/content/full/23/9/1073

To access COBALT, use this link: http://www.ncbi.nlm.nih.gov/tools/cobalt/cobalt.cgi?link_loc=BlastHomeAd