March 31, 2010

QDD: a user-friendly program to select microsatellite markers and design primers from large sequencing projects

Filed under: Bioinformatics,Computational Biology,Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 12:02 am

QDD is an open-access program providing a user-friendly tool for microsatellite detection and primer design from large sets of DNA sequences. The program is designed to handle every step of processing the raw sequences obtained from pyrosequencing of enriched DNA libraries, but it also works with data from other sequencing methods, using FASTA files as input. QDD performs the following tasks: tag sorting, adapter/vector removal, elimination of redundant sequences, detection of possible genomic multicopies (duplicated loci or transposable elements), stringent selection of target microsatellites and customizable primer design. It can process up to one million sequences of a few hundred base pairs in the tag-sorting step, and up to 50 000 sequences in a single input file for the steps that involve estimating sequence similarity.
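To make the "selection of target microsatellites" step more concrete, here is a minimal Python sketch of scanning a read for perfect tandem repeats of 2 to 6 bp motifs. It is not QDD's implementation: the function names (`find_microsatellites`, `is_compound`) and the default minimum of five copies per motif length are illustrative assumptions, and QDD's real selection is described as more stringent.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class Microsatellite(NamedTuple):
    start: int    # 0-based start position in the read
    motif: str    # repeated unit, e.g. "AC"
    repeats: int  # number of consecutive perfect copies of the motif


def is_compound(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. "AA", "ACAC")."""
    k = len(motif)
    return any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_microsatellites(seq: str,
                         min_repeats: Optional[Dict[int, int]] = None) -> List[Microsatellite]:
    """Report perfect tandem repeats of 2 to 6 bp motifs.

    min_repeats maps motif length to the minimum number of copies;
    the default thresholds are hypothetical, not QDD's settings.
    """
    if min_repeats is None:
        min_repeats = {2: 5, 3: 5, 4: 5, 5: 5, 6: 5}
    seq = seq.upper()
    hits: List[Microsatellite] = []
    for k, n in sorted(min_repeats.items()):
        # capture a k-mer, then require at least n - 1 further identical copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_compound(motif):
                continue  # skip units such as "AA" or "ACAC" that repeat a shorter unit
            hits.append(Microsatellite(m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)
```

For example, `find_microsatellites("GGACACACACACTTT")` returns `[Microsatellite(start=2, motif='AC', repeats=5)]`. Skipping compound motifs keeps a dinucleotide repeat from also being reported as a tetranucleotide one.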

Availability: QDD is freely available under the GPL licence for Windows and Linux from the following web site:

December 7, 2009

Career in Bioinformatics

Filed under: Bioinformatics,Computational Biology — Biointelligence: Education,Training & Consultancy Services @ 6:55 am

For many students, bioinformatics is still a puzzle. What comes before bioinformatics, what is bioinformatics, and what comes after it? These are some of the most common questions people ask. While browsing through the latest articles on PubMed Central, a paper by Shoba Ranganathan caught our attention. It is titled “Towards a career in bioinformatics”. Wide-eyed, I started reading the article and found it informative, interesting and useful. Below is a short summary of it.

Science is itself a quest for truth, and honesty in scientific endeavours is the keystone of a successful career. Scientific integrity in presenting research results and honesty in dealing with colleagues are invaluable to a scientific career, especially one that deals with large datasets. In this context, acknowledging the prior work of other scientists is important.

Domain knowledge is the key to a successful career in bioinformatics. “Computational biology” is not merely the sum of its parts, viz. computer science/informatics and biology. It also requires knowledge of mathematics, statistics and biochemistry, and sometimes a nodding acquaintance with physics, chemistry and the medical sciences. A career in bioinformatics requires problem solving. Here, you need to show persistence in following your hypothesis, even if others think you are wrong. At the same time, be prepared to modify your hypothesis if the data suggest otherwise. Reaching your ultimate goal is what matters most, no matter which path you follow.

Many graduate students see their bioinformatics Ph.D. simply as a goal in itself. For a career, you must make plans for the next year, the next three years and perhaps even the next five years. Graduate school, your first job, your next job and your publication profile can all be planned as projects using project management tools. Without plans, you are like someone drifting on the internet with no specific search in mind.

Among the numerous areas of bioinformatics endeavour, traditional avenues such as sequence analysis, genetic and population analysis, structural bioinformatics, text mining and ontologies are represented in the supplement the article belongs to, while chemoinformatics and biodiversity informatics embody emerging bioinformatics themes. Innovative teaching is a prerequisite for carrying out bioinformatics research, and a case study using e-learning tools shows a clear improvement in bioinformatics learning.

This paper covers many areas of bioinformatics that may prove useful for graduates and postgraduates. Here is the link to the full article:

Have a promising career in Bioinformatics!!

November 11, 2009

PLAST: Parallel Local Alignment Search Tool for Database Comparison

Filed under: Bioinformatics,Computational Biology — Biointelligence: Education,Training & Consultancy Services @ 7:17 am

Genomic sequence comparison is a central task in computational biology for identifying closely related protein or DNA sequences. Similarities between sequences are commonly used, for instance, to identify functionality of new genes or to annotate new genomes. Algorithms designed to identify such similarities have long been available and still represent an active research domain, since this task remains critical for many bioinformatics studies. Two avenues of research are generally explored to improve these algorithms, depending on the target application.

1. The first aims to increase sensitivity.
2. The second seeks to minimize computation time.

With next generation sequencing technology, the challenge is not only to develop new algorithms capable of managing large amounts of sequences, but also to imagine new methods for processing this mass of data as quickly as possible.
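The trade-off between these two avenues is easiest to see against the exhaustive baseline. The sketch below computes a Smith-Waterman local alignment score by dynamic programming: it fills every cell of a len(a) by len(b) matrix, which is what makes it sensitive and also what makes it slow on large banks. Seed-based heuristics such as BLAST give up some of that sensitivity in exchange for speed. The scoring values (match +2, mismatch -1, linear gap -2) are arbitrary illustrative choices.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b with a linear gap penalty.

    Runs in O(len(a) * len(b)) time, keeping only two DP rows in memory.
    """
    prev = [0] * (len(b) + 1)  # DP row for the previous prefix of a
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # local alignment: a cell never drops below 0 (start a fresh alignment)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

For instance, `smith_waterman_score("GGACGTGG", "ACGT")` finds the embedded ACGT and scores it 4 × 2 = 8.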

The PLAST program is a pure software implementation designed to exploit the internal parallel features of modern microprocessors. The sequence comparison algorithm has been structured to group together the most time consuming parts inside small critical sections that have good properties for parallelism. The resulting code is both well-suited for fine-grained (SIMD programming model) and medium-grained parallelization (multithreaded programming model). The first level of parallelism is supported by SSE instructions. The second is exploited with the multicore architecture of the microprocessors.
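The medium-grained level can be illustrated with a toy Python sketch. It assumes, as an illustration rather than a description of PLAST's source, that both banks are indexed by short seeds and that each seed shared by the two banks forms an independent unit of work, so the units can be dispatched to a thread pool. The helper names (`index_bank`, `hits_for_seed`, `parallel_seed_hits`) are hypothetical. In CPython the global interpreter lock keeps these threads from running pure-Python code simultaneously, so this shows the decomposition rather than a real speed-up, and the fine-grained SSE level has no direct equivalent here.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Occurrence = Tuple[int, int]     # (sequence index, offset)
Hit = Tuple[int, int, int, int]  # (seq in bank 1, offset, seq in bank 2, offset)


def index_bank(bank: List[str], k: int) -> Dict[str, List[Occurrence]]:
    """Map every k-mer (seed) to all positions where it occurs in the bank."""
    index: Dict[str, List[Occurrence]] = defaultdict(list)
    for s, seq in enumerate(bank):
        for off in range(len(seq) - k + 1):
            index[seq[off:off + k]].append((s, off))
    return index


def hits_for_seed(occ1: List[Occurrence], occ2: List[Occurrence]) -> List[Hit]:
    """One independent work unit: pair every bank-1 occurrence with every bank-2 one."""
    return [(s1, o1, s2, o2) for s1, o1 in occ1 for s2, o2 in occ2]


def parallel_seed_hits(bank1: List[str], bank2: List[str],
                       k: int = 3, workers: int = 4) -> List[Hit]:
    idx1, idx2 = index_bank(bank1, k), index_bank(bank2, k)
    shared = [seed for seed in idx1 if seed in idx2]
    # work units share no mutable state, so they can be handed to a pool as-is
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(lambda seed: hits_for_seed(idx1[seed], idx2[seed]), shared)
        return sorted(hit for chunk in chunks for hit in chunk)
```

With `parallel_seed_hits(["ACGTA"], ["TTACG"], k=3)` the only shared seed is ACG, giving one hit: sequence 0 offset 0 in bank 1 against sequence 0 offset 2 in bank 2.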

PLAST has been designed primarily to compare large protein or DNA banks. Unlike BLAST, it is not optimized for scanning large databases; it is intended more for intensive comparison processes such as bioinformatics workflows, for example to annotate newly sequenced genomes. Different versions have been developed following the BLAST family model: PLASTP compares two protein banks, TPLASTN compares one protein bank with one translated DNA bank (or genome), and PLASTX compares one translated DNA bank with one protein bank. The input format is the well-known FASTA format, and no pre-processing (such as formatdb) is required. Like BLAST, the PLAST algorithm detects alignments using a seed heuristic, but does so in a slightly different way. Consequently, it does not report the same alignments, especially when two sequences share little similarity: some alignments are found by PLAST and not by BLAST, and others by BLAST and not by PLAST. Nonetheless, comparable selectivity and sensitivity were measured using ROC curves, coverage-versus-error plots and missed alignments.
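PLAST's own seed model and extension rules are not given in this post, so the sketch below uses the generic seed-and-extend scheme from descriptions of the original BLAST: exact k-mer seeds, followed by ungapped extension in both directions that stops once the running score falls more than `xdrop` below its best. All function names and parameter values are hypothetical.

```python
from typing import Dict, List, Tuple

Alignment = Tuple[int, int, int, int]  # (score, start in a, start in b, length)


def _extend(a: str, b: str, pa: int, pb: int, step: int,
            match: int, mismatch: int, xdrop: int) -> Tuple[int, int]:
    """Walk from (pa, pb) in direction step (+1 or -1); return (best gain, its length)."""
    run = best = best_len = n = 0
    while 0 <= pa < len(a) and 0 <= pb < len(b):
        run += match if a[pa] == b[pb] else mismatch
        n += 1
        if run > best:
            best, best_len = run, n
        elif best - run > xdrop:
            break  # X-drop: give up once the score has fallen too far
        pa += step
        pb += step
    return best, best_len


def ungapped_extend(a: str, b: str, i: int, j: int, k: int,
                    match: int = 1, mismatch: int = -1, xdrop: int = 3) -> Alignment:
    """Extend the exact seed a[i:i+k] == b[j:j+k] without gaps."""
    right, r_len = _extend(a, b, i + k, j + k, +1, match, mismatch, xdrop)
    left, l_len = _extend(a, b, i - 1, j - 1, -1, match, mismatch, xdrop)
    return k * match + left + right, i - l_len, j - l_len, l_len + k + r_len


def seed_and_extend(a: str, b: str, k: int = 4, min_score: int = 8) -> List[Alignment]:
    index: Dict[str, List[int]] = {}
    for i in range(len(a) - k + 1):
        index.setdefault(a[i:i + k], []).append(i)
    found = set()
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], []):
            aln = ungapped_extend(a, b, i, j, k)
            if aln[0] >= min_score:
                found.add(aln)  # seeds on one diagonal usually yield the same alignment
    return sorted(found)
```

Because the seeding step only looks at exact k-mer matches, two sequences whose similarity never produces such a match are not aligned at all. This is one reason why two tools with different seed definitions can report different alignments for weakly similar sequences.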

PLAST can be downloaded from here:

October 6, 2009

Dinucleotide Properties Genome Browser: DiProGB

Filed under: Bioinformatics,Computational Biology — Biointelligence: Education,Training & Consultancy Services @ 6:08 am

Whole genomes have been sequenced for quite some time now. The basic aim of computational genome analysis is to understand the information encoded in them. Apart from the nucleotide sequence, the related physical properties also play an important role.

DiProGB is a standalone program written in VC++. It is a new, easy-to-use genome browser that encodes the primary nucleotide sequence by thermodynamic and geometric dinucleotide properties, converting the nucleotide sequence into a sequence graph. This visualization, supported by several graph manipulation options, facilitates genome analysis, because the human brain processes visual information better than textual information. DiProGB can also identify genomic regions where certain physical properties are more conserved than the nucleotide sequence itself. It thus adds a new dimension to common genome analysis approaches by taking the physical properties of DNA and RNA into account.
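DiProGB's property tables are not reproduced in this post. The sketch below therefore substitutes a trivially computable stand-in property (the number of G or C bases in each dinucleotide) for the thermodynamic and geometric values the program actually uses. It only illustrates the general idea: a sequence of n bases becomes a series of n - 1 numbers, optionally smoothed with a sliding-window mean. The function names are hypothetical.

```python
from typing import Dict, List

BASES = "ACGT"


def gc_count_property() -> Dict[str, float]:
    """Stand-in dinucleotide property: how many of its two bases are G or C (0, 1 or 2)."""
    return {x + y: float((x in "GC") + (y in "GC")) for x in BASES for y in BASES}


def sequence_graph(seq: str, prop: Dict[str, float], window: int = 1) -> List[float]:
    """Encode n bases as n - 1 dinucleotide values, then apply a sliding-window mean.

    Raises KeyError on characters outside ACGT (e.g. N); real data needs a policy for those.
    """
    seq = seq.upper()
    values = [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]
    if window <= 1:
        return values
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```

`ATAT` and `TATA` differ at every position, yet both map to the profile `[0.0, 0.0, 0.0]`. This is a toy version of a region where a physical property is more conserved than the nucleotide sequence.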

In DiProGB all annotated features such as genes, exons, introns or repeat regions and the corresponding qualifiers such as gene name, product and function can be separately addressed and specifically colored. All or parts of the annotated information can be displayed for either a single strand or for both strands together. Overlapping features are visualized by stacked bars in the so-called feature graph below the sequence graph.
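The post does not say how DiProGB lays out overlapping features, so the following is one standard approach, offered as an assumption: sort the features by start coordinate and place each one in the first row whose last feature has already ended (greedy interval partitioning). Processed in start order, this uses as many rows as the largest number of features overlapping at any single position. The `Feature` tuple layout and `stack_features` name are hypothetical.

```python
from typing import List, Tuple

Feature = Tuple[int, int, str]  # (start, end, label), end exclusive


def stack_features(features: List[Feature]) -> List[List[Feature]]:
    """Greedily assign features to stacked rows so that features in a row never overlap."""
    rows: List[List[Feature]] = []
    row_ends: List[int] = []  # end coordinate of the last feature placed in each row
    for feat in sorted(features):
        start, end, _ = feat
        for r, last_end in enumerate(row_ends):
            if start >= last_end:  # fits after the last feature in row r
                rows[r].append(feat)
                row_ends[r] = end
                break
        else:  # no existing row is free: open a new bar row
            rows.append([feat])
            row_ends.append(end)
    return rows
```

For a gene spanning 0 to 100 with exons at 20 to 40 and 60 to 90, plus a second gene at 120 to 150, the two genes share the first row and both exons are stacked in a second row beneath them.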

More about DiProGB can be read here: