Biointelligence

April 17, 2010

A meta-analysis of two-dimensional electrophoresis pattern of the Parkinson’s disease-related protein DJ-1

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am

The two-dimensional electrophoresis (2-DE) pattern of proteins is thought to be specifically related to the physiological or pathological condition at the moment of sample preparation. On this basis, most proteomic studies aim to identify specific hallmarks of a number of different conditions. However, the information arising from these investigations is often incomplete because of inherent limitations of the technique, extensive protein post-translational modifications and, sometimes, the paucity of available samples.

The meta-analysis of proteomic data can provide valuable information pertinent to various biological processes that would otherwise remain hidden.

Results: Here, we show a meta-analysis of the Parkinson’s disease (PD) protein DJ-1 in heterogeneous 2-DE experiments. The protein was shown to segregate into specific clusters associated with defined conditions.

Interestingly, the DJ-1 pool from neural tissues displayed a specific and characteristic molecular weight and isoelectric point pattern. Moreover, changes in this pattern have been related to neurodegenerative processes and aging. These results were experimentally validated on human brain specimens from control subjects and PD patients.

Availability: ImageJ is a public domain image processing program developed by the National Institutes of Health and is freely available at http://rsbweb.nih.gov/ij. All the ImageJ macros used in this study are available as supplementary material and upon request at info@biodigitalvalley.com. XLSTAT can be purchased online at http://www.xlstat.com/en/home/ at a current cost of 300 EUR.

April 16, 2010

GO-Bayes: Gene Ontology-based overrepresentation analysis using a Bayesian approach

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am

A typical approach for the interpretation of high-throughput experiments, such as gene expression microarrays, is to produce groups of genes based on certain criteria (e.g. genes that are differentially expressed). To gain more mechanistic insights into the underlying biology, overrepresentation analysis (ORA) is often conducted to investigate whether gene sets associated with particular biological functions, for example, as represented by Gene Ontology (GO) annotations, are statistically overrepresented in the identified gene groups. However, the standard ORA, which is based on the hypergeometric test, analyzes each GO term in isolation and does not take into account the dependence structure of the GO-term hierarchy.
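As a point of reference for the standard ORA described above, the hypergeometric test for a single GO term can be sketched as follows. This is only an illustration of the per-term baseline, not the GO-Bayes method; the function name `hypergeom_ora_pvalue` and its parameter names are hypothetical and chosen for this example.

```python
from math import comb


def hypergeom_ora_pvalue(population, annotated, selected, overlap):
    """Upper-tail hypergeometric p-value for one GO term.

    population: number of genes in the background set
    annotated:  background genes annotated with the GO term
    selected:   genes in the group of interest (e.g. differentially expressed)
    overlap:    selected genes that carry the annotation

    Returns P(X >= overlap), the probability of seeing at least this many
    annotated genes in a random draw of `selected` genes.
    (Hypothetical helper for illustration; names are assumptions.)
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 4 annotated, 3 selected, 2 overlapping.
p_value = hypergeom_ora_pvalue(population=10, annotated=4, selected=3, overlap=2)
print(p_value)
```

Each term is scored independently here, which is exactly the limitation the abstract points out: nothing in this calculation knows about a term's parents, children or siblings in the GO hierarchy.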

Results: We have developed a Bayesian approach (GO-Bayes) to measure overrepresentation of GO terms that incorporates the GO dependence structure by taking into account evidence not only from individual GO terms, but also from their related terms (i.e. parents, children, siblings, etc.). The Bayesian framework borrows information across related GO terms to strengthen the detection of overrepresentation signals. As a result, this method tends to identify sets of closely related GO terms rather than individual isolated GO terms. The advantage of the GO-Bayes approach is demonstrated with a simulation study and an application example.

Contact: song.zhang@utsouthwestern.edu; richard.scheuermann@utsouthwestern.edu

April 14, 2010

MULTICOM: a multi-level combination approach to protein structure prediction and its assessments in CASP8

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am

Protein structure prediction is one of the most important problems in structural bioinformatics. Here we describe MULTICOM, a multi-level combination approach to improve the various steps in protein structure prediction. In contrast to methods that look for the single best template, alignment and model, our approach tries to combine complementary and alternative templates, alignments and models to achieve better accuracy on average.

Results: The multi-level combination approach was implemented via five automated protein structure prediction servers and one human predictor which participated in the eighth Critical Assessment of Techniques for Protein Structure Prediction (CASP8), 2008. The MULTICOM servers and human predictor were consistently ranked among the top predictors on the CASP8 benchmark. The methods can predict moderate- to high-resolution models for most template-based targets and low-resolution models for some template-free targets. The results show that the multi-level combination of complementary templates, alternative alignments and similar models aided by model quality assessment can systematically improve both template-based and template-free protein modeling.

April 13, 2010

SoDA2: a Hidden Markov Model approach for identification of immunoglobulin rearrangements

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am

The inference of pre-mutation immunoglobulin (Ig) rearrangements is essential in the study of the antibody repertoires produced in response to infection, in B-cell neoplasms and in autoimmune disease. Often, there are several rearrangements that are nearly equivalent as candidates for a given Ig gene, but have different consequences in an analysis. Our aim in this article is to develop a probabilistic model of the rearrangement process and a Bayesian method for estimating posterior probabilities for the comparison of multiple plausible rearrangements.

Results: We have developed SoDA2, which is based on a Hidden Markov Model and is used to compute the posterior probabilities of candidate rearrangements and to find those with the highest values among them. We validated the software on a set of simulated data, a set of clonally related sequences, and a group of randomly selected Ig heavy chains from GenBank. In most tests, SoDA2 performed better than other available software for the task. Furthermore, the output format has been redesigned, in part, to facilitate comparison of multiple solutions.
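The idea of comparing several plausible rearrangements by posterior probability can be illustrated with a minimal sketch of Bayes' rule over a finite set of candidates. This is not SoDA2's Hidden Markov Model: the log-likelihoods are assumed to come from some upstream model, and the function name `posterior_probabilities` is hypothetical.

```python
import math


def posterior_probabilities(log_likelihoods, priors=None):
    """Posterior probability of each candidate given its log-likelihood.

    log_likelihoods: log P(data | candidate) for each candidate
                     (assumed to be supplied by an upstream model)
    priors:          prior probability of each candidate; uniform if omitted

    Uses the log-sum-exp trick so that very negative log-likelihoods
    do not underflow to zero before normalization.
    (Hypothetical helper for illustration only.)
    """
    n = len(log_likelihoods)
    if priors is None:
        priors = [1.0 / n] * n
    log_joint = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    peak = max(log_joint)
    weights = [math.exp(value - peak) for value in log_joint]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]


# Three made-up candidates; the second is the most likely under the model.
posteriors = posterior_probabilities([-1002.0, -1000.0, -1001.0])
best = max(range(len(posteriors)), key=posteriors.__getitem__)
print(posteriors, best)
```

Reporting the full list of posteriors, rather than only the top candidate, is what makes it possible to see when several rearrangements are nearly equivalent.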

April 10, 2010

Gene function prediction from synthetic lethality networks via ranking on demand

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am

Synthetic lethal interactions represent pairs of genes whose individual mutations are not lethal, while the double mutation of both genes does incur lethality. Several studies have shown a correlation between functional similarity of genes and their distances in networks based on synthetic lethal interactions. However, there is a lack of algorithms for predicting gene function from synthetic lethality interaction networks.

Results: In this article, we present a novel technique called kernelROD for gene function prediction from synthetic lethal interaction networks based on kernel machines. We apply our novel algorithm to Gene Ontology functional annotation prediction in yeast. Our experiments show that our method leads to improved gene function prediction compared with state-of-the-art competitors and that combining genetic and congruence networks leads to a further improvement in prediction accuracy.

April 9, 2010

Viewing cancer genes from co-evolving gene modules

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am

Studying the evolutionary conservation of cancer genes can improve our understanding of the genetic basis of human cancers. Functionally related proteins encoded by genes tend to interact with each other in a modular fashion, which may affect both the mode and tempo of their evolution.

Results: In the human protein-protein interaction (PPI) network, we searched for subnetworks within each of which all proteins have evolved at similar rates since the human-mouse split. Identified at a given co-evolving level, the subnetworks with non-randomly large sizes were defined as co-evolving modules. We showed that proteins within modules tend to be conserved, evolutionarily old and enriched with housekeeping genes, while proteins outside modules tend to be less conserved, evolutionarily younger and enriched with genes expressed in specific tissues. Viewing cancer genes from co-evolving modules showed that the overall conservation of cancer genes should be mainly attributed to the cancer proteins enriched in the conserved modules. Functional analysis further suggested that cancer proteins within and outside modules might play different roles in carcinogenesis, providing a new hint for studying the mechanism of cancer.

April 8, 2010

Modeling macro–molecular interfaces with Intervor

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am

Intervor is software that computes a parameter-free representation of macro–molecular interfaces, based on the α-complex of the atoms. Given two interacting partners, possibly with water molecules squeezed in-between them, Intervor computes an interface model which has the following characteristics: (i) it identifies the atoms of the partners which are in direct contact and those whose interaction is water mediated, (ii) it defines a geometric complex separating the partners, the Voronoi interface, whose geometric and topological descriptions are straightforward (surface area, number of patches, curvature), (iii) it allows the definition of the depth of atoms at the interface, thus going beyond the traditional dissection of an interface into a core and a rim. These features can be used to investigate correlations between structural parameters and key properties such as the conservation of residues, their polarity, the water dynamics at the interface, mutagenesis data, etc.

Availability: Intervor can be run from the web site http://cgal.inria.fr/abs/Intervor or upon downloading the binary file. Plugins are also made available for VMD and PyMOL.

April 7, 2010

EBImage—an R package for image processing with applications to cellular phenotypes

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am

EBImage provides general purpose functionality for reading, writing, processing and analysis of images. Furthermore, in the context of microscopy-based cellular assays, EBImage offers tools to segment cells and extract quantitative cellular descriptors. This allows the automation of such tasks using the R programming language and use of existing tools in the R environment for signal processing, statistical modeling, machine learning and data visualization.

Availability: EBImage is free and open source, released under the LGPL license and available from the Bioconductor project (http://www.bioconductor.org/packages/release/bioc/html/EBImage.html).

April 6, 2010

Metscape: a Cytoscape plug-in for visualizing and interpreting metabolomic data in the context of human metabolic networks

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am

Metscape is a plug-in for Cytoscape, used to visualize and interpret metabolomic data in the context of human metabolic networks. We have developed a metabolite database by extracting and integrating information from several public sources. By querying this database, Metscape allows users to trace the connections between metabolites and genes, visualize compound networks and display compound structures as well as information for reactions, enzymes, genes and pathways. Applying the pathway filter, users can create subnetworks that consist of compounds and reactions from a given pathway. Metscape allows users to upload experimental data, and to visualize and explore compound networks over time or across experimental conditions. Color and size of the nodes are used to visualize these dynamic changes. Metscape can display the entire metabolic network or any of the pathway-specific networks that exist in the database.

Availability: Metscape can be installed from within Cytoscape 2.6.x under the ‘Network and Attribute I/O’ category. For more information, please visit http://metscape.ncibi.org/tryplugin.html

April 5, 2010

CandiSNPer: a web tool for the identification of candidate SNPs for causal variants

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am

Human single nucleotide polymorphism (SNP) chips, which are used in genome-wide association studies (GWAS), permit the genotyping of up to 4 million SNPs simultaneously. To date, about 1000 human SNPs have been identified as statistically significantly associated with a disease or another trait of interest. The identified SNP is not necessarily the causal variant; rather, it is in linkage disequilibrium (LD) with it. CandiSNPer is a software tool that determines the LD region around a significant SNP from a GWAS. It provides a list with functional annotation and LD values for the SNPs found in the LD region. This list contains not only the SNPs for which genotyping data are available, but all SNPs with rs-IDs, thus increasing the likelihood of including the causal variant. Furthermore, plots showing the LD values are generated. CandiSNPer facilitates the preselection of candidate SNPs for causal variants.
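To make the notion of an LD value concrete, the following minimal sketch computes the commonly used r² measure between two biallelic SNPs from phased haplotype counts. It is not CandiSNPer's implementation, and the abstract does not say which LD measure the tool reports; the function name `ld_r_squared` and the count-based input are assumptions made for this example.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """r^2 linkage disequilibrium between two biallelic SNPs.

    Inputs are counts of the four phased haplotypes, where A/a are the
    alleles at the first SNP and B/b the alleles at the second.
    (Hypothetical helper for illustration; not taken from CandiSNPer.)
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("at least one haplotype is required")
    p_AB = n_AB / total              # frequency of haplotype A-B
    p_A = (n_AB + n_Ab) / total      # frequency of allele A
    p_B = (n_AB + n_aB) / total      # frequency of allele B
    d = p_AB - p_A * p_B             # deviation from independence
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom


# Made-up counts: alleles A and B almost always travel together.
print(ld_r_squared(n_AB=48, n_Ab=2, n_aB=2, n_ab=48))
```

A value near 1 means the two SNPs are almost perfect proxies for each other, which is why a GWAS hit may only tag the causal variant; a value near 0 means they are inherited essentially independently.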

Availability and Implementation: The CandiSNPer server is freely available at http://www2.hu-berlin.de/wikizbnutztier/software/CandiSNPer. The source code is available to academic users ‘as is’ upon request. The web site is implemented in Perl and R and runs on an Apache server. The Ensembl database is queried for SNP data via Perl APIs.
