January 13, 2010

Database of human Protein-DNA Interactions – hPDI

Filed under: Bioinformatics,Computational Biology,Proteomics — Biointelligence: Education,Training & Consultancy Services @ 6:05 am

The characterization of protein-DNA interactions usually requires three levels of analysis:

1. Genetic: Determination of the nucleotide sequence of the protein-binding region and the identification of sequence changes that confer a mutated phenotype.

2. Biochemical: Identification of potential protein-DNA contacts using a variety of footprinting and protection experiments such as DNaseI or hydroxyl radical footprinting, and methylation protection or ethylation protection experiments.

3. Physical: Analysis of specific interactions in protein-target sequence fragment co-crystals.

The hPDI database holds experimental protein-DNA interaction data for humans identified by protein microarray assays. The current release of hPDI contains 17,718 protein-DNA interactions for 1,013 human DNA-binding proteins. These DNA-binding proteins include 493 human transcription factors (TFs) and 520 unconventional DNA-binding proteins (uDBPs). The database is freely accessible for academic purposes.

hPDI can be accessed from here:

December 15, 2009

Descriptor-based Fold Recognition System

Filed under: Bioinformatics,Proteomics — Biointelligence: Education,Training & Consultancy Services @ 10:43 am

Machine learning-based methods have proven to be powerful for developing new fold recognition tools.
DescFold (Descriptor-based Fold Recognition System) is a web server for protein fold recognition, which can predict a protein’s fold type from its amino acid sequence. The server combines six effective descriptors: a profile-sequence-alignment-based descriptor using PSI-BLAST e-values and bit scores, a sequence-profile-alignment-based descriptor using RPS-BLAST e-values and bit scores, a descriptor based on secondary structure element alignment (SSEA), a descriptor based on the occurrence of PROSITE functional motifs, a descriptor based on profile-profile alignment (PPA), and a descriptor based on profile-structural-profile alignment (PSPA).
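
To make the idea of combining descriptors concrete, here is a minimal sketch of how the six descriptor outputs for one query/template pair might be flattened into a single feature vector for a classifier. The dictionary keys, the example values and the log transform of the e-values are assumptions made for illustration; they are not taken from the DescFold server itself.

```python
import math

def evalue_feature(evalue, floor=1e-180):
    """Map an e-value to a bounded, larger-is-better number (assumed transform)."""
    return -math.log10(max(evalue, floor))

def build_feature_vector(hit):
    """Flatten the per-descriptor scores of one query/template pair into a list."""
    return [
        evalue_feature(hit["psiblast_evalue"]),   # profile-sequence alignment
        hit["psiblast_bits"],
        evalue_feature(hit["rpsblast_evalue"]),   # sequence-profile alignment
        hit["rpsblast_bits"],
        hit["ssea_score"],                        # secondary structure element alignment
        hit["prosite_shared_motifs"],             # PROSITE motif occurrence
        hit["ppa_score"],                         # profile-profile alignment
        hit["pspa_score"],                        # profile-structural-profile alignment
    ]

# Hypothetical scores for a single query/template pair.
example_hit = {
    "psiblast_evalue": 1e-5, "psiblast_bits": 48.0,
    "rpsblast_evalue": 1e-3, "rpsblast_bits": 35.5,
    "ssea_score": 0.62, "prosite_shared_motifs": 1,
    "ppa_score": 21.4, "pspa_score": 17.9,
}
features = build_feature_vector(example_hit)
```

A vector like this, one per query/template pair, is the kind of input a trained SVM model would score.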

With the introduction of the PPA and PSPA descriptors, the new DescFold improves fold recognition performance substantially. The DescFold web server was built on the trained SVM models, using the SCOP_1.73_40% dataset as the fold library. To provide a large-scale test for the new DescFold, a stringent test set of 1,866 proteins was selected from SCOP version 1.75. At a false positive rate below 5%, the new DescFold correctly recognizes structural homologs at the fold level for nearly 46% of the test proteins. Additionally, the DescFold method was benchmarked against several well-established fold recognition algorithms using the LiveBench targets and the Lindahl dataset.
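
The phrase "at a false positive rate below 5%" can be read as follows: pick a score cutoff from the scores of known non-homologous pairs, then count how many true homologs score above it. The sketch below shows that calculation on made-up scores; the function names and the data are illustrative assumptions, not the benchmark procedure used by the DescFold authors.

```python
def threshold_at_fpr(negative_scores, max_fpr=0.05):
    """Return a cutoff so that at most max_fpr of the negatives score strictly above it."""
    ranked = sorted(negative_scores, reverse=True)
    allowed = int(max_fpr * len(ranked))  # number of false positives tolerated
    if allowed >= len(ranked):
        return float("-inf")
    return ranked[allowed]

def sensitivity(positive_scores, cutoff):
    """Fraction of true homologs whose score is strictly above the cutoff."""
    hits = sum(1 for score in positive_scores if score > cutoff)
    return hits / len(positive_scores)

# Made-up classifier scores: 100 non-homologous pairs and 4 true homologs.
negatives = [i / 100 for i in range(100)]
positives = [0.99, 0.96, 0.50, 0.95]

cutoff = threshold_at_fpr(negatives, max_fpr=0.05)
recall = sensitivity(positives, cutoff)
```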

The DescFold server is freely available at:


November 30, 2009

QUPE – For Mass Spectrometry based Quantitative Proteomics Research

Filed under: Bioinformatics,Proteomics — Biointelligence: Education,Training & Consultancy Services @ 4:22 am

Mass spectrometry (MS) is an indispensable technique for the fast analysis of proteins and peptides in complex biological samples. One key problem with the quantitative mass spectrometric analysis of peptides and proteins, however, is that the sensitivity of MS instruments is peptide-dependent, which leads to an unclear relationship between the observed peak intensity and the peptide concentration in the sample. Various labeling techniques have been developed to circumvent this problem, but they are expensive and time-consuming. A reliable prediction of peptide-specific sensitivities could provide a peptide-specific correction factor, which would be valuable for label-free absolute quantitation.
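
As a rough illustration of what a peptide-specific correction factor would do, the sketch below divides each observed peak intensity by a predicted sensitivity for that peptide. The function name and the numbers are hypothetical, and the simple division is an assumed model rather than a published quantitation method.

```python
def corrected_abundance(peak_intensity, predicted_sensitivity):
    """Scale an observed intensity by a peptide-specific sensitivity (assumed model)."""
    if predicted_sensitivity <= 0:
        raise ValueError("predicted sensitivity must be positive")
    return peak_intensity / predicted_sensitivity

# Two hypothetical peptides from the same protein, present at the same concentration.
# The instrument responds four times more strongly to the first one.
observed = {"peptide_a": 2000.0, "peptide_b": 500.0}
sensitivity = {"peptide_a": 2.0, "peptide_b": 0.5}

corrected = {
    name: corrected_abundance(observed[name], sensitivity[name])
    for name in observed
}
```

After correction both peptides report the same abundance, which is the behaviour one would want from a label-free estimate.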

QUPE is an integrated platform for the storage and analysis of quantitative proteomics data, implemented in Java. It is both a repository and an algorithmic framework for storing and analysing mass spectrometry-based quantitative proteome experiments. QUPE provides an easily extensible and configurable job concept. Using XML, jobs consisting of one or more tools can be defined, where the input and output types provided by the implementation of a tool determine the data a job is executed with. Through specific interfaces, tools can announce their need for an interactive configuration. The job and tool concept allows the integration of routines written in R, a programming language designed specifically for mathematical and statistical purposes. The main features of QUPE are listed below:

– Web browser-based application using Web 2.0 technologies
– Extensive capabilities to securely store and organise experiments and complete projects (fine-grained application-based security, GPMS)
– Import of mzData as well as mzXML
– Data model adapted to suggestions made by the HUPO proteomics standards initiative (PSI)
– Mascot integration, Import of DTASelect results
– Framework supporting analysis of quantitative proteomics data, including:
  – Quantification of stable-isotope labelled samples
  – Significance tests, analysis of variance
  – Principal component analysis
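
The XML-based job concept described above can be pictured with a small sketch. The XML layout, the tool names and the check_job function are invented for illustration and do not reproduce QUPE's real job schema; the point is only that each tool declares an input and an output type, and that a chain of tools is valid when neighbouring types match.

```python
import xml.etree.ElementTree as ET

# Hypothetical job definition: three tools chained by their input/output types.
JOB_XML = """
<job name="example-quantification">
  <tool name="importer" input="mzXML" output="spectra"/>
  <tool name="quantifier" input="spectra" output="ratios"/>
  <tool name="anova" input="ratios" output="pvalues"/>
</job>
"""

def check_job(xml_text):
    """Verify that each tool's output type feeds the next tool's input type."""
    job = ET.fromstring(xml_text.strip())
    tools = job.findall("tool")
    if not tools:
        raise ValueError("a job needs at least one tool")
    for upstream, downstream in zip(tools, tools[1:]):
        if upstream.get("output") != downstream.get("input"):
            raise ValueError(
                f"{upstream.get('name')} produces {upstream.get('output')} "
                f"but {downstream.get('name')} expects {downstream.get('input')}"
            )
    # The job as a whole consumes the first input type and yields the last output type.
    return tools[0].get("input"), tools[-1].get("output")

job_input, job_output = check_job(JOB_XML)
```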

QUPE is hosted here:


November 9, 2009

CDD: Database for Interactive Domain Family Analysis

Filed under: Bioinformatics,Computational Biology,Proteomics — Biointelligence: Education,Training & Consultancy Services @ 8:30 am

Protein domains may be viewed as units in the molecular evolution of proteins and can be organized into an evolutionary classification. The set of protein domains characterized so far appears to describe no more than a few thousand superfamilies, where members of each superfamily are related to each other by common descent. Computational annotation of protein function is generally obtained via sequence similarity: once a close neighbor with known function has been identified, its annotation is copied to the sequence with unknown function. This strategy may work very well in functionally homogeneous families and when applied only for very close neighbors or suspected orthologs, but it is doomed to fail often when domain or protein families are sufficiently diverse and when no close neighbors with known function are available.

NCBI’s conserved domain database (CDD) attempts to collate that set and to organize related domain models in a hierarchical fashion, meant to reflect major ancient gene duplication events and subsequent functional diversification. CDD is part of NCBI’s Entrez database system and serves as a primary resource for the annotation of conserved domain footprints on protein sequences in Entrez. CDD provides a strategy toward a more accurate assessment of such neighbor relationships, similar to approaches termed ‘phylogenomic inference’. CDD acknowledges that protein domain families may be very diverse and that they may contain sets of related subfamilies.

CDD curation attempts to detect evidence for duplication and functional divergence in domain families by means of phylogenetic analysis. The resulting subfamily structure is recorded as a set of explicit models, but the analysis is limited to ancient duplication events, several hundred million years in the past, as judged by the taxonomic distribution of protein sequences with particular domain subfamily footprints. CDD provides a search tool employing reverse position-specific BLAST (RPS-BLAST), where query sequences are compared to databases of position-specific score matrices (PSSMs), and E-values are obtained in much the same way as in the widely used PSI-BLAST application.
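
To show what comparing a query sequence to a position-specific score matrix means, here is a deliberately simplified sketch. It slides a tiny made-up PSSM along a sequence and adds up the per-position scores without gaps. RPS-BLAST itself performs gapped alignments and computes E-values, neither of which is reproduced here; the matrix values, the default penalty and the function names are assumptions.

```python
def score_window(pssm, window, default=-4):
    """Sum the per-position scores of the residues in one ungapped window."""
    return sum(column.get(residue, default) for column, residue in zip(pssm, window))

def best_hit(pssm, sequence):
    """Return (score, start index) of the best-scoring window in the sequence."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the matrix")
    scored = [
        (score_window(pssm, sequence[start:start + width]), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scored)

# Toy three-column PSSM: each column maps favoured residues to scores.
TOY_PSSM = [
    {"G": 5, "A": 1},
    {"K": 4, "R": 2},
    {"S": 3, "T": 3},
]

score, start = best_hit(TOY_PSSM, "MAGKSLL")
```

Here the best window is "GKS" at index 2, because every residue in it matches a favoured residue of its column.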

CDD is hosted here:





October 28, 2009

Useful Bioinformatics Links

Here are some handy links that aid the study of bioinformatics and related fields:

October 22, 2009

The Structural Genomics Knowledgebase

Filed under: Bioinformatics,Computational Biology,Proteomics — Biointelligence: Education,Training & Consultancy Services @ 5:05 am

Biology has become an increasingly data-rich subject. Many of the emerging fields of large-scale data-rich biology are designated by adding the suffix ‘-omics’ onto previously used terms. The importance to the life-science community as a whole of such large-scale approaches is reflected in the huge number of citations to many of the key papers in these fields; the human and mouse genome papers being the most obvious examples.

Strictly speaking, “omics” is a general term for a broad discipline of science and engineering that analyzes the interactions of biological information objects in various ‘omes’. The main focus is on:
1) mapping information objects such as genes, proteins, and ligands
2) finding interaction relationships among the objects
3) engineering the networks and objects to understand and manipulate the regulatory mechanisms
4) integrating various omes and omics subfields

Structural Genomics is one such stream where a proper study of cellular and genetic components is performed. The RCSB Protein Data Bank (PDB) offers online tools, summary reports and target information related to the worldwide structural genomics initiatives from its portal at

There are currently three components to this site:
1) Structural Genomics Initiatives contains information and links on each structural genomics site, including progress reports, target lists, target status, targets in the PDB and level of sequence redundancy.
2) Targets provides combined target information, protocols and other data associated with protein structure determination.
3) Structures offers an assessment of the progress of structural genomics based on the functional coverage of the human genome by PDB structures, structural genomics targets and homology models.

This is a free, comprehensive resource produced in a collaboration between the Protein Structure Initiative (PSI) and Nature Publishing Group (NPG), and it is of great help to the scientific research community.

More about this can be read at:

October 12, 2009

MISTRAL: For Multiple Protein Structure Alignment

Filed under: Bioinformatics,Computational Biology,Proteomics — Biointelligence: Education,Training & Consultancy Services @ 9:50 am

With a rapidly growing pool of known tertiary structures, the importance of protein structure comparison parallels that of sequence alignment. Detecting structural equivalences in two or more proteins is computationally demanding, as it typically entails exploring the combinatorial space of all possible amino acid pairings in the parent proteins.

A new tool, MISTRAL, has been developed for multiple protein structure alignment based on the minimization of an energy function over the low-dimensional space of the relative rotations and translations of the molecules.
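
The idea of minimizing an energy over relative rotations and translations can be illustrated with a toy example. The Gaussian pairwise energy, the single rotation axis and the coordinates below are assumptions chosen to keep the sketch short; this is not MISTRAL's actual energy function or search procedure.

```python
import math

def transform(points, angle, shift):
    """Rotate 3D points about the z axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    dx, dy, dz = shift
    return [(c * x - s * y + dx, s * x + c * y + dy, z + dz) for x, y, z in points]

def toy_energy(fixed, moving, angle, shift, width=2.0):
    """Assumed energy: each close pair of points lowers the energy (Gaussian well)."""
    moved = transform(moving, angle, shift)
    energy = 0.0
    for p in fixed:
        for q in moved:
            squared_distance = sum((a - b) ** 2 for a, b in zip(p, q))
            energy -= math.exp(-squared_distance / width ** 2)
    return energy

# Three made-up C-alpha-like positions, and a copy displaced by 1 unit along x.
fixed = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
moving = transform(fixed, 0.0, (1.0, 0.0, 0.0))

energy_unaligned = toy_energy(fixed, moving, 0.0, (0.0, 0.0, 0.0))
energy_aligned = toy_energy(fixed, moving, 0.0, (-1.0, 0.0, 0.0))
```

Undoing the displacement gives the lower energy, so an optimizer searching over the angle and shift would be drawn toward the superimposed arrangement.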

Up to 20 protein structures in PDB format can be submitted for alignment at a time, with each protein limited to 500 amino acids. MISTRAL is available both as a standalone version and online. MISTRAL can be accessed online here:

September 17, 2009

ADAN: A Database for Prediction of Protein Protein Interactions

Filed under: Bioinformatics,Computational Biology,Proteomics — Biointelligence: Education,Training & Consultancy Services @ 12:51 pm

In the last post we gave an introduction to MIPS (the Mammalian Protein-Protein Interaction database). Most of the structures and functions of proteome globular domains are still unknown. We can use high-resolution structures from different modular domains in combination with automatic protein design algorithms to predict genome-wide potential interactions of a protein. Today's post introduces a database which helps in the prediction of such protein interactions.

The ADAN database is a collection of different modular protein domains (SH2, SH3, PDZ, WW, etc.). It contains 3505 entries with extensive structural and functional information available, manually integrated, curated and annotated with cross-references to other databases, biochemical and thermodynamic data, simplified coordinate files, sequence files and alignments. Prediadan, a subset of the ADAN database, offers position-specific scoring matrices for protein-protein interactions, calculated by FoldX, and predictions of optimum ligands and putative binding partners. Users can also scan a query sequence against selected matrices, or improve a ligand-domain interaction. The ADAN Database can be accessed from here:

September 16, 2009

Database for Protein Protein Interactions

Filed under: Bioinformatics,Computational Biology,Proteomics — Biointelligence: Education,Training & Consultancy Services @ 1:28 pm


Proteins are organic compounds made of amino acids arranged in a linear chain and folded into a globular form. These molecules are of great importance because of the function they perform.

Protein associations are studied from the perspectives of biochemistry, quantum chemistry, molecular dynamics, signal transduction and other metabolic or genetic/epigenetic networks. Protein-protein interactions are at the core of the entire Interactomics system of any living cell. These interactions involve not only the direct-contact association of protein molecules but also longer range interactions through the electrolyte, aqueous solution medium surrounding neighbor hydrated proteins over distances from less than one nanometer to distances of several tens of nanometers. Furthermore, such protein-protein interactions are thermodynamically linked functions of dynamically bound ions and water that exchange rapidly with the surrounding solution by comparison with the molecular tumbling rate (or correlation times) of the interacting proteins.

The MIPS Mammalian Protein-Protein Interaction Database is a collection of manually curated, high-quality protein-protein interaction data collected from the scientific literature by expert curators. The content is based on published experimental evidence. MIPS provides the full dataset for download and a flexible and powerful web interface for users with various requirements.

Click here to access MIPS:

September 14, 2009

Protein Mutant Database: An Introduction

Filed under: Bioinformatics,Computational Biology,Proteomics — Biointelligence: Education,Training & Consultancy Services @ 7:41 am

Protein structure is one of the most important and popular research topics today. Research on a protein's structure, sequence and organization gives a broad view of its functionality.

Compilations of protein mutant data are valuable as a basis for protein engineering. They provide information on what kinds of functional and/or structural influences are brought about by an amino acid mutation at a specific position of a protein. The Protein Mutant Database (PMD), which is under construction, covers natural as well as artificial mutants, including random and site-directed ones, for all proteins except members of the globin and immunoglobulin families. The PMD is based on literature, not on proteins. That is, each entry in the database corresponds to one article, which may describe one, several or many protein mutants.
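
The literature-based organization, with one entry per article and any number of mutants per entry, can be sketched as a small data model. The class names, fields and example records are hypothetical and are not the PMD's actual schema; they only show how an article-centred record differs from a protein-centred one.

```python
from dataclasses import dataclass, field

@dataclass
class Mutant:
    protein: str        # name of the mutated protein
    wild_type: str      # original residue, one-letter code
    position: int       # residue position in the sequence
    substitution: str   # replacement residue, one-letter code
    effect: str         # free-text note on the reported effect

    def label(self):
        """Conventional short notation such as A42G."""
        return f"{self.wild_type}{self.position}{self.substitution}"

@dataclass
class Entry:
    article: str                        # one entry corresponds to one article
    mutants: list = field(default_factory=list)

def mutants_for_protein(entries, protein):
    """Collect mutant labels for one protein across all article-based entries."""
    return [m.label() for e in entries for m in e.mutants if m.protein == protein]

# Made-up records: two articles, one of which reports two mutants.
entries = [
    Entry("hypothetical article 1", [
        Mutant("example kinase", "A", 42, "G", "reduced activity"),
        Mutant("example kinase", "K", 7, "R", "no change"),
    ]),
    Entry("hypothetical article 2", [
        Mutant("example protease", "S", 195, "A", "loss of function"),
    ]),
]

kinase_mutants = mutants_for_protein(entries, "example kinase")
```

Because the article is the unit of storage, finding everything known about one protein means gathering mutants across entries, as the helper function does.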

Click here to know more on PMD:
