Biointelligence

August 7, 2009

COBALT: A new tool for Multiple Sequence Alignment

Filed under: Bioinformatics,Computational Biology — Biointelligence: Education,Training & Consultancy Services @ 9:47 am

The simultaneous alignment of multiple sequences (multiple alignment) serves as a building block in several fields of computational biology, such as phylogenetic studies, detection of conserved motifs, prediction of functional residues and secondary structure, prediction of correlations, and even quality assessment of protein sequences. An accurate multiple sequence alignment tool has therefore long been one of the field's biggest requirements.

COBALT (Constraint-based Multiple Alignment Tool) is a multiple sequence alignment tool for protein sequences. It finds a collection of pairwise constraints derived from the Conserved Domain Database (CDD), a protein motif database, and sequence similarity, using RPS-BLAST, PHI-BLAST, and BLASTP, respectively. These pairwise constraints are then incorporated into a progressive multiple alignment.

COBALT has a general framework that uses progressive multiple alignment to combine pairwise constraints from different sources into a multiple alignment. It does not attempt to use all available constraints. Instead, it uses only a high-scoring consistent subset, which can change as the alignment progresses. A set of constraints is called consistent if all of the constraints in the set can be simultaneously satisfied by a multiple alignment. Using RPS-BLAST, COBALT can quickly search for domains in CDD that match regions of the input sequences. When the same domain matches several sequences, those matches imply potential pairwise constraints between the sequences. CDD also contains auxiliary information that allows COBALT to create partial profiles for the input sequences before progressive alignment begins, which avoids computationally expensive profile-building procedures.
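To make the two ideas above concrete, here is a minimal Python sketch. It is not COBALT's actual code or algorithm. It assumes ungapped domain hits, takes the lower of the two hit scores as the constraint score, and handles only a single pair of sequences. The names DomainHit, Constraint, infer_constraints and consistent_subset are all hypothetical. The first function turns matches to a shared domain into pairwise constraints; the second keeps the highest-scoring subset of constraints that do not cross or overlap, which is what consistency reduces to for two sequences.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class DomainHit(NamedTuple):      # hypothetical record for one ungapped domain match
    seq_id: str
    domain: str
    seq_start: int                # start of the matched region in the sequence
    dom_start: int                # start of the matched region in the domain model
    length: int
    score: float


class Constraint(NamedTuple):     # "align seq_a[start_a:] with seq_b[start_b:] for `length` residues"
    seq_a: str
    seq_b: str
    start_a: int
    start_b: int
    length: int
    score: float


def infer_constraints(hits):
    """Pair up hits to the same domain on different sequences."""
    by_domain = defaultdict(list)
    for h in hits:
        by_domain[h.domain].append(h)
    out = []
    for group in by_domain.values():
        for x, y in combinations(group, 2):
            if x.seq_id == y.seq_id:
                continue
            # Part of the domain covered by both hits.
            lo = max(x.dom_start, y.dom_start)
            hi = min(x.dom_start + x.length, y.dom_start + y.length)
            if hi > lo:
                out.append(Constraint(
                    x.seq_id, y.seq_id,
                    x.seq_start + lo - x.dom_start,
                    y.seq_start + lo - y.dom_start,
                    hi - lo,
                    min(x.score, y.score),   # assumption: weakest hit sets the score
                ))
    return out


def consistent_subset(constraints):
    """Best-scoring non-crossing, non-overlapping chain for ONE sequence pair."""
    cs = sorted(constraints, key=lambda c: (c.start_a, c.start_b))
    if not cs:
        return []
    best = [c.score for c in cs]
    prev = [-1] * len(cs)
    for i, c in enumerate(cs):
        for j in range(i):
            p = cs[j]
            # p may precede c only if it ends before c starts in both sequences.
            fits = p.start_a + p.length <= c.start_a and p.start_b + p.length <= c.start_b
            if fits and best[j] + c.score > best[i]:
                best[i], prev[i] = best[j] + c.score, j
    i = max(range(len(cs)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(cs[i])
        i = prev[i]
    return chain[::-1]


hits = [
    DomainHit("s1", "domA", 10, 0, 50, 80.0),
    DomainHit("s2", "domA", 50, 10, 40, 60.0),
    DomainHit("s1", "domB", 100, 0, 30, 40.0),
    DomainHit("s2", "domB", 0, 0, 30, 30.0),
]
constraints = infer_constraints(hits)
kept = consistent_subset(constraints)
```

In this made-up input the two domains occur in opposite order in s1 and s2, so their constraints cross and cannot both be satisfied by one alignment. The sketch keeps only the higher-scoring domA constraint.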


COBALT is implemented in the NCBI C++ Toolkit. More information on COBALT can be found at:

http://bioinformatics.oxfordjournals.org/cgi/content/full/23/9/1073

To access COBALT, use this link: http://www.ncbi.nlm.nih.gov/tools/cobalt/cobalt.cgi?link_loc=BlastHomeAd
