March 31, 2010

QDD: a user-friendly program to select microsatellite markers and design primers from large sequencing projects

Filed under: Bioinformatics,Computational Biology,Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 12:02 am

QDD is an open access program providing a user-friendly tool for microsatellite detection and primer design from large sets of DNA sequences. The program is designed to deal with all steps of treatment of raw sequences obtained from pyrosequencing of enriched DNA libraries, but it is also applicable to data obtained through other sequencing methods, using FASTA files as input. The following tasks are completed by QDD: tag sorting, adapter/vector removal, elimination of redundant sequences, detection of possible genomic multicopies (duplicated loci or transposable elements), stringent selection of target microsatellites and customizable primer design. It can treat up to one million sequences of a few hundred base pairs in the tag-sorting step, and up to 50 000 sequences in a single input file for the steps involving estimation of sequence similarity.
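To make the microsatellite detection step more concrete, here is a minimal Python sketch that scans FASTA records for simple tandem repeats with a regular expression. It is not QDD's own algorithm: the function names, the `MIN_REPEATS` threshold and the restriction to motifs of 2 to 6 bases are all assumptions made for illustration.

```python
import re
from typing import Iterable, Iterator, List, Tuple

MIN_REPEATS = 5  # hypothetical threshold, not a documented QDD default

# A motif of 2-6 bases (shortest first) followed by at least
# MIN_REPEATS - 1 further copies of the same motif.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (MIN_REPEATS - 1))


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def find_microsatellites(seq: str) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for each tandem repeat found."""
    hits = []
    for match in SSR_PATTERN.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip single-base runs such as AAAAAAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits
```

A real pipeline would also have to handle compound repeats, ambiguous bases and flanking regions for primer design, which this sketch ignores.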

Availability: QDD is freely available under the GPL licence for Windows and Linux from the following web site:

March 29, 2010

webMGR: an online tool for the multiple genome rearrangement problem

Filed under: Bioinformatics,Computational Biology,Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 12:00 am

The algorithm MGR enables the reconstruction of rearrangement phylogenies based on gene or synteny block order in multiple genomes. Although MGR has been successfully applied to study the evolution of different sets of species, its utilization has been hampered by the prohibitive running time for some applications. In the current work, we have designed new heuristics that significantly speed up the tool without compromising its accuracy. Moreover, we have developed a web server (webMGR) that includes elaborate web output to facilitate navigation through the results.
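As a rough illustration of how gene order can be compared between genomes, the sketch below counts breakpoints between two signed gene orders, i.e. adjacencies present in one genome but not in the other. This is a simpler measure than the rearrangement distances MGR works with, and it is not taken from MGR or webMGR; the function names and the single linear chromosome model are assumptions.

```python
from typing import List, Set, Tuple


def adjacencies(genome: List[int]) -> Set[Tuple[int, int]]:
    """Collect neighbouring gene pairs, readable from either strand."""
    pairs = set()
    for left, right in zip(genome, genome[1:]):
        pairs.add((left, right))
        # The same adjacency read from the opposite strand.
        pairs.add((-right, -left))
    return pairs


def breakpoint_count(genome_a: List[int], genome_b: List[int]) -> int:
    """Count adjacencies of genome_a that are not conserved in genome_b."""
    if sorted(map(abs, genome_a)) != sorted(map(abs, genome_b)):
        raise ValueError("both genomes must contain the same genes")
    conserved = adjacencies(genome_b)
    return sum(
        1 for pair in zip(genome_a, genome_a[1:]) if pair not in conserved
    )
```

Genes are written as signed integers, where the sign marks the strand. A single inversion of an internal segment, for example `[1, 2, 3, 4, 5]` against `[1, -3, -2, 4, 5]`, creates two breakpoints.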

webMGR can be accessed via

January 14, 2010

EGAN – Exploratory Gene Association Networks

Filed under: Bioinformatics,Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 7:27 am

EGAN (Exploratory Gene Association Networks) is a software tool that allows a domain expert (a biologist) to visualize and interpret the results of high-throughput exploratory assays in an interactive hypergraph of entities (genes), relationships (protein-protein interactions, literature co-occurrence, etc.) and meta-data (annotation, signaling pathways, etc.). EGAN provides comprehensive, automated calculation of meta-data coincidence (over-representation, enrichment) for user- and assay-defined entity subsets (gene lists), and provides direct links to web resources and literature (NCBI Entrez Gene, PubMed, KEGG, Google, etc.).
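Over-representation of an annotation in a gene list is commonly assessed with a one-sided hypergeometric test. The sketch below shows that calculation in plain Python; whether EGAN uses exactly this statistic is an assumption, and the function and parameter names are ours.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int,
                       list_size: int, overlap: int) -> float:
    """P(X >= overlap) under the hypergeometric distribution.

    population: total number of genes considered
    annotated:  genes in the population carrying the annotation
    list_size:  number of genes in the user's gene list
    overlap:    genes in the list that carry the annotation
    """
    total = comb(population, list_size)
    upper = min(list_size, annotated)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total
```

For example, `enrichment_p_value(10, 5, 5, 5)` is the probability that a random list of 5 genes out of 10 contains all 5 annotated genes. In practice such p-values are usually corrected for multiple testing across annotations.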

EGAN has been built using Cytoscape libraries for graph visualization and layout, and is comparable to DAVID, GSEA, Ingenuity IPA and Ariadne Pathway Studio.

For further reading, refer to this link:

December 14, 2009

Applications of Systems Biology in Drug Discovery

Filed under: Bioinformatics,Chemoinformatics,Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 4:33 am

To date, we have published many posts on Systems Biology, its applications and its scope. Systems Biology has indeed revolutionized cell biology and pathway analysis, and it proves even more useful when applied to the treatment of diseases and to drug discovery. Here we discuss Systems Biology in combination with drug discovery.

The goal of modern systems biology is to understand physiology and disease from the level of molecular pathways, regulatory networks, cells, tissues, organs and ultimately the whole organism. As currently employed, the term ‘systems biology’ encompasses many different approaches and models for probing and understanding biological complexity, and studies of many organisms from bacteria to man. Much of the academic focus is on developing fundamental computational and informatics tools required to integrate large amounts of reductionist data (global gene expression, proteomic and metabolomic data) into models of regulatory networks and cell behavior. This is a hard problem, because biological complexity is an exponential function of the number of system components and the interactions between them, and escalates at each additional level of organization.

Three main advances characterize the practical application of systems biology to drug discovery:

1. Informatic integration of ‘omics’ data sets (a bottom-up approach)

Omics approaches to systems biology focus on the building blocks of complex systems (genes, proteins and metabolites). These approaches have been adopted wholeheartedly by the drug industry to complement traditional approaches to target identification and validation, for generating hypotheses and for experimental analysis in traditional hypothesis-based methods.

2. Computer modeling of disease or organ system physiology from cell and organ response level information available in the literature (a top-down approach to target selection, clinical indication and clinical trial design)

The goal of modeling in systems biology is to provide a framework for hypothesis generation and prediction based on in silico simulation of human disease biology across the multiple distance and time scales of an organism. More detailed understanding of the systems behavior of intercellular signaling pathways, such as the identification of key nodes or regulatory points in networks or better understanding of crosstalk between pathways, can also help predict drug target effects and their translation to organ and organism level physiology.

3. The use of complex human cell systems themselves to interpret and predict the biological activities of drugs and gene targets (a direct experimental approach to cataloguing complex disease-relevant biological responses)

Pathway modeling as yet remains too disconnected from systemic disease biology to have a significant impact on drug discovery. Top-down modeling at the cell-to-organ and organism scale shows promise, but is extremely dependent on contextual cell response data. Moreover, to bridge the gap between omics and modeling, we need to collect a different type of cell biology data—data that incorporate the complexity and emergent properties of cell regulatory systems and yet ideally are reproducible and amenable to storing in databases, sharing and quantitative analysis.

In these ways, Systems Biology has aided drug discovery research and is helping to pave the way towards treatments for many serious diseases.

Read our other posts on Systems Biology:

December 1, 2009

Machine Learning in Bioinformatics: A Review

Filed under: Bioinformatics,Computational Biology,Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 12:12 pm

Thanks to continued research, the amount of biological data available keeps growing. This exponential growth raises two problems:

1. Efficient information storage and management.

2. The extraction of useful information from these data, which requires the development of tools and methods capable of transforming all these heterogeneous data into biological knowledge about the underlying mechanism.

Machine learning techniques are applied for knowledge extraction from data in various biological domains. The figure below shows the main areas of biology, such as genomics, proteomics, microarrays, evolution and text mining, where computational methods are being applied.


In addition to all the above applications, computational techniques are used to solve other problems, such as efficient primer design for PCR, biological image analysis and backtranslation of proteins (which is, given the degeneracy of the genetic code, a complex combinatorial problem).

Machine learning consists of programming computers to optimize a performance criterion by using example data or past experience. The optimized criterion can be the accuracy of a predictive model (in a modelling problem) or the value of a fitness or evaluation function (in an optimization problem). Machine learning uses statistical theory when building computational models, since the objective is to make inferences from a sample. The two main steps in this process are:

1. To induce the model by processing the huge amount of data.

2. To represent the model and make inferences efficiently.

The process of transforming data into knowledge is both iterative and interactive. The iterative phase consists of several steps. In the first step, we need to integrate and merge the different sources of information into a single format. Data warehouse techniques are used here to detect and resolve outliers and inconsistencies. In the second step, it is necessary to select, clean and transform the data. To carry out this step, we need to eliminate or correct the incorrect data, and decide on a strategy to impute missing data. This step also selects the relevant and non-redundant variables; the same kind of selection can also be applied to the instances. In the third step, called data mining, we take the objectives of the study into account in order to choose the most appropriate analysis for the data. In this step, the type of paradigm for supervised or unsupervised classification is selected and the model is induced from the data. Once the model is obtained, it should be evaluated and interpreted, from both a statistical and a biological point of view, and, if necessary, we should return to the previous steps for a new iteration. This includes resolving conflicts with the current knowledge in the domain. Once satisfactorily checked, the model and the newly discovered knowledge are then used to solve the problem.
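The steps above can be sketched with a toy example: impute missing values (second step), induce a very simple supervised model (third step) and evaluate it. Mean imputation, the one-nearest-neighbour classifier and leave-one-out evaluation are our choices for illustration, not prescriptions from the review, and all names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def impute_column_means(
    rows: Sequence[Sequence[Optional[float]]],
) -> List[List[float]]:
    """Replace missing values (None) by the mean of the observed column.

    Assumes every column has at least one observed value.
    """
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


def nearest_neighbour_label(train_x, train_y, query):
    """Predict the label of the closest training instance (1-NN)."""
    best = min(range(len(train_x)),
               key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


def leave_one_out_accuracy(xs, ys) -> float:
    """Evaluate 1-NN by predicting each instance from all the others."""
    correct = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if nearest_neighbour_label(rest_x, rest_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)
```

If the estimated accuracy is unsatisfactory, one would go back to the earlier steps, for instance choosing a different imputation strategy or a different set of variables, and iterate.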

An article published in the journal ‘Briefings in Bioinformatics’ gives an insight into the various machine learning techniques used in bioinformatics. It covers major techniques such as Bayesian classifiers, logistic regression, discriminant analysis, classification trees, nearest neighbour, neural networks, support vector machines, clustering, hidden Markov models and more.

The article can be found here:


November 12, 2009

KEGGConverter: Tool for modelling Metabolic Networks

Filed under: Bioinformatics,Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 7:47 am

The Kyoto Encyclopedia of Genes and Genomes (KEGG) PATHWAY database is a valuable comprehensive collection of manually curated pathway maps for metabolism, genetic information processing and other functions. KEGG as a whole is an integrated database resource consisting of 16 main databases, broadly categorized into systems information, genomic information, and chemical information as shown below. Genomic and chemical information represents the molecular building blocks of life in the genomic and chemical spaces, respectively, and systems information represents functional aspects of the biological systems, such as the cell and the organism, that are built from the building blocks. KEGG has been widely used as a reference knowledge base for biological interpretation of large-scale datasets generated by sequencing and other high-throughput experimental technologies.

The KEGG Pathway database is a valuable collection of metabolic pathway maps. Nevertheless, producing simulation-capable metabolic networks from KEGG Pathway data remains a challenging and complicated task, despite the tools already developed for this purpose. Although originally used for illustration purposes, KEGG Pathways, through KGML (KEGG Markup Language) files, can provide complete reaction sets and introduce species versioning, which offers advantages for simulation modelling of cellular metabolism.
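To give an idea of what a KGML file offers, the sketch below reads the reaction set out of a KGML document with Python's standard XML parser. It relies on the `reaction`, `substrate` and `product` elements and the `name` and `type` attributes defined by KGML; the function name and the hand-written sample fragment are illustrative assumptions, and this is not how KEGGconverter itself is implemented.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hand-written, illustrative KGML fragment (not downloaded from KEGG).
SAMPLE_KGML = """
<pathway name="path:example" org="ex" number="00000">
  <reaction id="1" name="rn:R00001" type="reversible">
    <substrate id="2" name="cpd:C00001"/>
    <product id="3" name="cpd:C00002"/>
  </reaction>
  <reaction id="4" name="rn:R00002" type="irreversible">
    <substrate id="3" name="cpd:C00002"/>
    <product id="5" name="cpd:C00003"/>
  </reaction>
</pathway>
"""


def reactions_from_kgml(kgml_text: str) -> List[Dict[str, object]]:
    """Extract reactions with their substrates and products from KGML."""
    root = ET.fromstring(kgml_text)
    reactions = []
    for reaction in root.iter("reaction"):
        reactions.append({
            "name": reaction.get("name"),
            "reversible": reaction.get("type") == "reversible",
            "substrates": [s.get("name") for s in reaction.findall("substrate")],
            "products": [p.get("name") for p in reaction.findall("product")],
        })
    return reactions
```

Calling `reactions_from_kgml(SAMPLE_KGML)` returns one dictionary per reaction. Turning such a list into a simulation-ready model still requires merging pathways, renaming species and adding kinetics.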

KEGGconverter, a tool implemented in Java, has been developed to construct such metabolic pathways. It is capable of producing integrated analogues of metabolic pathways appropriate for simulation tasks, taking only KGML files as input. The web application acts as a user-friendly shell which transparently enables automated, biochemically correct pathway merging, conversion to SBML format, proper renaming of the species, and insertion of default kinetic properties for the corresponding reactions. It also permits the inclusion of additional reactions in the resulting model to represent flux cross-talk with neighbouring pathways, thereby improving simulation accuracy.

KEGGconverter is available here:

October 28, 2009

Useful Bioinformatics Links

Here are some useful bioinformatics links to aid the study of bioinformatics and related fields:

October 13, 2009

QuickGO: A browser for Gene Ontology

Filed under: Bioinformatics,Computational Biology,Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 12:44 pm

The Gene Ontology project is a major bioinformatics initiative with the aim of standardizing the representation of gene and gene product attributes across species and databases. The aims of the Gene Ontology project are threefold:
1. Firstly, to maintain and further develop its controlled vocabulary of gene and gene product attributes.
2. Secondly, to annotate genes and gene products, and assimilate and disseminate annotation data.
3. Thirdly, to provide tools to facilitate access to all aspects of the data provided by the Gene Ontology project.

QuickGO is a web-based tool which allows easy browsing of the Gene Ontology and all associated GO annotations provided by the GOA group. It provides a comprehensive set of both electronic and manual annotations from a large number of curation groups. QuickGO users can view and search information provided for GO terms (identifiers, words/phrases in the title or definition, cross-references and synonyms), as well as protein data from UniProtKB (accession numbers, names and gene symbols). Results are ranked so that terms most closely matching the query are returned first. Individual words and combinations of words are scored according to the field in which they occur and their frequency within GO.
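The ranking idea described above, scoring words by the field they occur in and by how common they are, can be sketched as follows. This is only an illustration of the principle: the field weights, the inverse-frequency formula, the toy terms and all names are assumptions, not QuickGO's actual scoring scheme.

```python
import math
from typing import Dict, List

# Hypothetical weights: a hit in the term name counts more than one
# in a synonym or in the definition.
FIELD_WEIGHTS = {"name": 3.0, "synonym": 2.0, "definition": 1.0}


def inverse_frequency(word: str, terms: List[Dict]) -> float:
    """Give rarer words a higher weight than words found in many terms."""
    containing = sum(
        1 for term in terms
        if any(word in text.lower().split() for text in term["fields"].values())
    )
    return math.log((1 + len(terms)) / (1 + containing)) + 1.0


def rank_terms(query: str, terms: List[Dict]) -> List[str]:
    """Return the ids of matching terms, best match first."""
    words = query.lower().split()
    scored = []
    for term in terms:
        score = 0.0
        for field, text in term["fields"].items():
            tokens = text.lower().split()
            for word in words:
                if word in tokens:
                    score += (FIELD_WEIGHTS.get(field, 1.0)
                              * inverse_frequency(word, terms))
        if score > 0:
            scored.append((score, term["id"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term_id for _, term_id in scored]


# Toy terms with made-up identifiers, for illustration only.
TOY_TERMS = [
    {"id": "T1", "fields": {"name": "apoptotic process",
                            "definition": "programmed cell death"}},
    {"id": "T2", "fields": {"name": "cell death",
                            "definition": "cessation of cell function"}},
    {"id": "T3", "fields": {"name": "cell cycle",
                            "definition": "progression through phases"}},
]
```

With these toy terms, the query "cell death" ranks the term whose name contains both words first, ahead of terms that match only one word or only in the definition.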

QuickGO is updated weekly with protein names, gene symbols, accessions and taxonomy data from UniProtKB. Single or multiple protein accessions can be queried and selected proteins will display all associated GO annotations, both electronic and manual.

QuickGO can be accessed from the EBI website. Here is the link:

October 7, 2009

BioSystems: A New Database for Biological Systems

Filed under: Bioinformatics,Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 1:08 pm

A biological system is formed when a group of molecules interact with one another. One type of biological system is a biological pathway, which comprises interacting genes, proteins, and small molecules. An understanding of the components, products, and biological effects of biosystems can lead to better understanding of biological processes in normal and disease states, elucidation of possible drug effects and side effects, and other insights into complex processes that have implications for health and medicine.

NCBI has designed the BioSystems database, which provides centralized access to existing pathway databases.

The source databases currently supported by the BioSystems database are:

1. KEGG: Kyoto Encyclopedia of Genes and Genomes, by the Kanehisa Laboratory of the Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan.

2. BioCyc is a collection of organism-specific pathway/genome databases (PGDBs), and the EcoCyc subset of BioCyc is included in the NCBI BioSystems database.

3. Reactome is a curated knowledge base of biological pathways, and the human subset of Reactome is included in the NCBI BioSystems database.

More about the BioSystems database can be read here:

September 24, 2009

Pathway Databases – A broader view

Filed under: Bioinformatics,Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 7:58 am

Studying Reactome led me to explore some more databases of pathways and reactions. While browsing, I eventually landed on a paper, “Pathway databases and tools for their exploitation: benefits, current limitations and challenges”, authored by Anna Bauer-Mehren, Laura I Furlong & Ferran Sanz. So, today's post gives a summary of what this paper is about.

Cell signalling has been studied for more than a decade. The term refers to the biochemical processes by which cells respond to cues in their internal or external environment. These studies led to the mapping of chains of reactions and to the development of databases that store them in compiled form. Several databases containing information on cell signalling pathways have now been developed in conjunction with methodologies to access and analyse the data. At present, there are several repositories of information on cell signalling pathways that cover a wide range of signal transduction mechanisms and include high quality data in terms of annotation and cross references to biological databases.

Some of the online pathway databases have been nicely listed here:

This table lists Reactome, KEGG, WikiPathways, Nature interaction databases, Pathway Commons and many more.

The paper also explains the main standards for representation of biological networks, BioPAX and SBML. Furthermore, it discusses the advantages and drawbacks of current methods for pathway retrieval and integration, using EGFR signalling as an illustrative example.

The paper is available here:
