August 6, 2010

EpiTOP—a proteochemometric tool for MHC class II binding prediction

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 10:28 am

T-cell epitope identification is a critical immunoinformatic problem within vaccine design. To be an epitope, a peptide must bind an MHC protein.

Results: Here, we present EpiTOP, the first server predicting MHC class II binding based on proteochemometrics, a QSAR approach for ligands binding to several related proteins. EpiTOP uses a quantitative matrix to predict binding to 12 HLA-DRB1 alleles. It identifies 89% of known epitopes within the top 20% of predicted binders, reducing laboratory labour, materials and time by 80%. EpiTOP is easy to use, gives comprehensive quantitative predictions and will be expanded and updated with new quantitative matrices over time.
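A quantitative-matrix predictor of this kind scores a peptide by summing position-specific contributions over a nine-residue binding core. The sketch below illustrates the idea with a made-up matrix and made-up anchor preferences; it is not EpiTOP's actual matrix or allele model.

```python
# Sketch of quantitative-matrix (QM) scoring for MHC class II binding.
# The matrix values and anchor positions are illustrative assumptions.

# Hypothetical QM: score contribution of an amino acid at each of the
# nine core positions (only a few residues shown; others default to 0.0).
QM = {
    0: {"F": 1.2, "Y": 1.0, "W": 0.8},   # P1 anchor prefers aromatics
    3: {"L": 0.6, "I": 0.5},
    5: {"K": 0.4, "R": 0.4},
    8: {"A": 0.3, "S": 0.2},
}

def score_core(core):
    """Additive score of a 9-residue binding core."""
    return sum(QM.get(i, {}).get(aa, 0.0) for i, aa in enumerate(core))

def best_core(peptide):
    """Slide a 9-mer window over the peptide; return the best core and score."""
    cores = [peptide[i:i + 9] for i in range(len(peptide) - 8)]
    return max(((c, score_core(c)) for c in cores), key=lambda t: t[1])

core, s = best_core("GFKLIGTKRLA")   # hypothetical peptide
```

Ranking all test peptides by their best-core score and keeping the top 20% is what yields the labour reduction quoted above.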

Availability: EpiTOP is freely accessible at

August 4, 2010

MiRror: a combinatorial analysis web tool for ensembles of microRNAs and their targets

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 7:33 am

The miRror application provides insights into microRNA (miRNA) regulation. It is based on the notion of combinatorial regulation by an ensemble of miRNAs or genes. miRror integrates predictions from a dozen miRNA resources, based on complementary algorithms, into a unified statistical framework. Given a set of miRNAs as input, the online tool provides a ranked list of targets, based on the resources selected by the user, according to the significance of their being coordinately regulated. Symmetrically, a set of genes can be used as input to suggest a set of miRNAs. The user can restrict the analysis to a preferred tissue or cell line. miRror is suitable for analyzing results from miRNA profiling, proteomics and gene expression arrays.
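To make the combinatorial idea concrete, the sketch below ranks candidate targets by how many input miRNAs predict them, scored with a binomial tail probability. The prediction table and the statistic are assumptions for illustration only; miRror's actual resources and scoring differ.

```python
import math

# Illustrative ranking of targets coordinately predicted by a set of miRNAs.
# The toy prediction table and the binomial-tail statistic are assumptions,
# not miRror's actual data or test.

predictions = {        # resource-merged predictions: miRNA -> predicted targets
    "miR-1":   {"GENE_A", "GENE_B", "GENE_C"},
    "miR-21":  {"GENE_A", "GENE_B"},
    "miR-155": {"GENE_A", "GENE_D"},
}

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def rank_targets(mirnas, background_p=0.2):
    n = len(mirnas)
    hits = {}
    for m in mirnas:
        for g in predictions.get(m, ()):
            hits[g] = hits.get(g, 0) + 1
    # smaller p-value = stronger evidence of coordinate regulation
    scored = [(g, binom_tail(k, n, background_p)) for g, k in hits.items()]
    return sorted(scored, key=lambda t: t[1])

ranking = rank_targets(["miR-1", "miR-21", "miR-155"])
```

The symmetric direction (genes in, miRNAs out) follows by transposing the prediction table.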


July 23, 2010

Structure-based variable selection for survival data

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 6:20 am

Variable selection is a typical approach used for molecular-signature and biomarker discovery; however, its application to survival data is often complicated by censored samples. We propose a new algorithm for variable selection suitable for the analysis of high-dimensional, right-censored data, called Survival Max–Min Parents and Children (SMMPC). The algorithm is conceptually simple, scalable, based on the theory of Bayesian networks (BNs) and the Markov blanket, and extends the corresponding algorithm for classification tasks (MMPC). The selected variables have a structural interpretation: if T is the survival time (in general, the time-to-event), SMMPC returns the variables adjacent to T in the BN representing the data distribution. The selected variables also have a causal interpretation, which we discuss.

Results: We conduct an extensive empirical analysis of prototypical and state-of-the-art variable selection algorithms for survival data that are applicable to high-dimensional biological data. SMMPC selects on average the smallest variable subsets (fewer than a dozen per dataset) while statistically significantly outperforming all other methods in the study, returning a manageable number of genes that can be inspected by a human expert.
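The max-min selection heuristic at the heart of MMPC-style algorithms can be sketched on toy data: repeatedly add the candidate whose *minimum* association with the target, over conditioning sets drawn from the already-selected variables, is *maximal*; stop when even the best candidate looks conditionally independent. The data, the use of (partial) Pearson correlation as the association measure, conditioning sets of size at most one, and the threshold are all assumptions for the sketch; SMMPC itself handles right-censored survival times, which this toy does not.

```python
import math, random

# Toy Max-Min forward phase of MMPC-style variable selection.
# X3 is a near-duplicate of X1 and should be screened out once
# X1 (or X3) is selected; X2 carries independent signal.
random.seed(0)
n = 400
X1 = [random.gauss(0, 1) for _ in range(n)]
X2 = [random.gauss(0, 1) for _ in range(n)]
X3 = [x + random.gauss(0, 0.1) for x in X1]
T  = [a + b + random.gauss(0, 0.5) for a, b in zip(X1, X2)]

def corr(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    su = math.sqrt(sum((x - mu) ** 2 for x in u))
    sv = math.sqrt(sum((x - mv) ** 2 for x in v))
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / (su * sv)

def assoc(x, t, z=None):
    """|partial correlation| of x and t given at most one variable z."""
    if z is None:
        return abs(corr(x, t))
    rxt, rxz, rtz = corr(x, t), corr(x, z), corr(t, z)
    return abs((rxt - rxz * rtz) / math.sqrt((1 - rxz**2) * (1 - rtz**2)))

def mmpc_forward(candidates, t, threshold=0.15):
    selected = []
    while True:
        def min_assoc(name):
            x = candidates[name]
            vals = [assoc(x, t)] + [assoc(x, t, candidates[s]) for s in selected]
            return min(vals)
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            return selected
        best = max(remaining, key=min_assoc)
        if min_assoc(best) < threshold:   # looks conditionally independent
            return selected
        selected.append(best)

chosen = mmpc_forward({"X1": X1, "X2": X2, "X3": X3}, T)
```

Exactly one of the redundant pair X1/X3 survives, which is the "smallest variable subset" behaviour reported above.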

Availability: Matlab and R code are freely available from

June 26, 2010

Learning combinatorial transcriptional dynamics from gene expression data

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 5:49 am

mRNA transcriptional dynamics is governed by a complex network of transcription factor (TF) proteins. Experimental and theoretical analysis of this process is hindered by the fact that measuring TF activity in vivo is very challenging. Current models that jointly infer TF activities and model parameters rely on one of two main simplifying assumptions: either the dynamics is simplified (e.g. assuming quasi-steady state) or the interactions between TFs are ignored, resulting in models that account for a single TF.

Results: We present a novel approach to reverse engineer the dynamics of multiple TFs jointly regulating the expression of a set of genes. The model relies on a continuous-time, differential equation description of transcriptional dynamics in which TFs are treated as latent on/off variables and are modelled using a switching stochastic process (telegraph process). The model not only incorporates both activation and repression, but also allows any non-trivial interaction between TFs, including AND and OR gates. By using a factorization assumption within a variational Bayesian treatment, we formulate a framework that can reconstruct both the activity profiles of the TFs and the type of regulation from time series gene expression data. We demonstrate the identifiability of the model on a simple but non-trivial synthetic example, and then use it to formulate non-trivial predictions about transcriptional control during yeast metabolism.
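The forward (generative) side of such a model is easy to simulate: two on/off TF activity profiles drawn from telegraph processes drive mRNA production through a combinatorial gate, with first-order decay. All rate constants below are illustrative assumptions; the paper's inference problem is the reverse of this simulation.

```python
import random

# Two telegraph (randomly switching on/off) TFs driving transcription
# through AND and OR gates; mRNA follows dm/dt = beta*gate - delta*m.
random.seed(1)
dt, steps = 0.01, 5000
k_on, k_off = 0.5, 0.5            # telegraph switching rates (assumed)
beta, delta = 2.0, 1.0            # transcription and decay rates (assumed)

def telegraph(steps):
    """Sample a 0/1 switching path of the telegraph process."""
    state, path = 0, []
    for _ in range(steps):
        rate = k_on if state == 0 else k_off
        if random.random() < rate * dt:
            state = 1 - state
        path.append(state)
    return path

tf1, tf2 = telegraph(steps), telegraph(steps)

def simulate(gate):
    """Euler-integrate the mRNA ODE under a given TF-combination gate."""
    m, trace = 0.0, []
    for s1, s2 in zip(tf1, tf2):
        m += dt * (beta * gate(s1, s2) - delta * m)
        trace.append(m)
    return trace

and_trace = simulate(lambda a, b: a and b)   # transcription needs both TFs
or_trace  = simulate(lambda a, b: a or b)    # either TF suffices
```

Because the OR gate is active more often than the AND gate, its mRNA trace sits higher on average, which is exactly the kind of signature the inference exploits.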


June 18, 2010

GPU computing for systems biology

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 4:40 am

The development of detailed, coherent models of complex biological systems is recognized as a key requirement for integrating the increasing amount of experimental data. In addition, in silico simulation of biochemical models provides an easy way to test different experimental conditions, helping in the discovery of the dynamics that regulate biological systems. However, the computational power required by these simulations often exceeds that available on common desktop computers, and thus expensive high-performance computing solutions are required. An emerging alternative is general-purpose scientific computing on graphics processing units (GPGPU), which offers the power of a small computer cluster at a cost of $400. Computing with a GPU requires the development of specific algorithms, since the programming paradigm differs substantially from traditional CPU-based computing. In this paper, we review some recent efforts in exploiting the processing power of GPUs for the simulation of biological systems.

May 18, 2010

An integer programming formulation to identify the sparse network architecture governing differentiation of embryonic stem cells

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:30 am

The primary purpose of modeling gene regulatory networks for developmental processes is to reveal the pathways governing cellular differentiation to specific phenotypes. Knowledge of the differentiation network will enable the generation of desired cell fates by careful alteration of the governing network through adequate manipulation of the cellular environment.

Results: We have developed a novel integer programming-based approach to reconstruct the underlying regulatory architecture of differentiating embryonic stem cells from discrete temporal gene expression data. The network reconstruction problem is formulated using inherent features of biological networks: (i) that of cascade architecture which enables treatment of the entire complex network as a set of interconnected modules and (ii) that of sparsity of interconnection between the transcription factors. The developed framework is applied to the system of embryonic stem cells differentiating towards pancreatic lineage. Experimentally determined expression profile dynamics of relevant transcription factors serve as the input to the network identification algorithm. The developed formulation accurately captures many of the known regulatory modes involved in pancreatic differentiation. The predictive capacity of the model is tested by simulating an in silico potential pathway of subsequent differentiation. The predicted pathway is experimentally verified by concurrent differentiation experiments. Experimental results agree well with model predictions, thereby illustrating the predictive accuracy of the proposed algorithm.
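The sparsity feature can be illustrated without a solver: for each target gene, search subsets of regulators in order of increasing size and return the first subset whose states at time t consistently determine the target's discretized state at t+1. The expression series below is fabricated for the sketch; the paper formulates this objective as an integer program over a modular cascade architecture.

```python
from itertools import combinations

# Toy sparsest-regulator search on discretized (0/1) expression dynamics.
# The series is fabricated so that C(t+1) tracks A(t).
series = {
    "A": [0, 1, 1, 0, 1, 0],
    "B": [1, 0, 1, 1, 0, 1],
    "C": [0, 0, 1, 1, 0, 1],
}

def consistent(regs, target):
    """True if the states of `regs` at t determine `target` at t+1."""
    seen = {}
    for t in range(len(series[target]) - 1):
        key = tuple(series[r][t] for r in regs)
        nxt = series[target][t + 1]
        if seen.setdefault(key, nxt) != nxt:
            return False
    return True

def sparsest_regulators(target, genes):
    for k in range(len(genes) + 1):          # smallest subsets first = sparsity
        for regs in combinations(genes, k):
            if consistent(regs, target):
                return regs
    return None

regs_C = sparsest_regulators("C", ["A", "B"])
```

Preferring the smallest consistent subset mirrors the sparsity-of-interconnection constraint between transcription factors described above.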

May 13, 2010

Pathway discovery in metabolic networks by subgraph extraction

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:30 am

Subgraph extraction is a powerful technique to predict pathways from biological networks and a set of query items (e.g. genes, proteins, compounds, etc.). It can be applied to a variety of different data types, such as gene expression, protein levels, operons or phylogenetic profiles. In this article, we investigate different approaches to extract relevant pathways from metabolic networks. Although these approaches have been adapted to metabolic networks, they are generic enough to be adjusted to other biological networks as well.

Results: We comparatively evaluated seven sub-network extraction approaches on 71 known metabolic pathways from Saccharomyces cerevisiae and a metabolic network obtained from MetaCyc. The best performing approach is a novel hybrid strategy, which combines a random walk-based reduction of the graph with a shortest paths-based algorithm, and which recovers the reference pathways with an accuracy of 77%.
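The shortest-paths component of such extraction can be sketched in a few lines: connect the query nodes by the union of pairwise minimum-weight paths in the network. The toy metabolic graph and weights below are assumptions; the paper's hybrid additionally re-weights edges with a random-walk (kWalks) relevance step before this search.

```python
import heapq
from itertools import combinations

# Toy weighted metabolic graph (compounds and reactions); a highly
# connected "hub" node carries a large weight so paths avoid it.
graph = {
    "glc": {"r1": 1, "hub": 9}, "r1": {"glc": 1, "g6p": 1},
    "g6p": {"r1": 1, "r2": 1},  "r2": {"g6p": 1, "f6p": 1},
    "f6p": {"r2": 1, "r3": 1},  "r3": {"f6p": 1, "pyr": 3},
    "pyr": {"r3": 3, "hub": 1}, "hub": {"pyr": 1, "glc": 9},
}

def lightest_path(src, dst):
    """Dijkstra's algorithm; returns the node list of a minimum-weight path."""
    dist, prev, seen = {src: 0}, {}, set()
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in seen:
            continue
        seen.add(u)
        if u == dst:
            break
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def extract_subgraph(queries):
    """Union of pairwise lightest paths between all query nodes."""
    nodes = set()
    for a, b in combinations(queries, 2):
        nodes.update(lightest_path(a, b))
    return nodes

sub = extract_subgraph(["glc", "pyr"])
```

Down-weighting relevant edges (the random-walk step) before running this search is what lets the hybrid recover curated pathways rather than hub-riddled shortcuts.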

Availability: Most of the presented algorithms are available as part of the network analysis tool set (NeAT). The kWalks method is released under the GPL3 license.

April 13, 2010

SoDA2: a Hidden Markov Model approach for identification of immunoglobulin rearrangements

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am

The inference of pre-mutation immunoglobulin (Ig) rearrangements is essential in the study of the antibody repertoires produced in response to infection, in B-cell neoplasms and in autoimmune disease. Often, there are several rearrangements that are nearly equivalent as candidates for a given Ig gene, but have different consequences in an analysis. Our aim in this article is to develop a probabilistic model of the rearrangement process and a Bayesian method for estimating posterior probabilities for the comparison of multiple plausible rearrangements.

Results: We have developed SoDA2, which is based on a Hidden Markov Model and is used to compute the posterior probabilities of candidate rearrangements and to find those with the highest values among them. We validated the software on a set of simulated data, a set of clonally related sequences, and a group of randomly selected Ig heavy chains from GenBank. In most tests, SoDA2 performed better than other available software for the task. Furthermore, the output format has been redesigned, in part, to facilitate comparison of multiple solutions.
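The scoring idea can be sketched generically: each candidate rearrangement defines an HMM, the forward algorithm gives the likelihood of the observed sequence under each candidate, and Bayes' rule converts likelihoods into posterior probabilities. The tiny two-state models and emission probabilities below are illustrative assumptions, not SoDA2's actual gene-segment models.

```python
# Posterior over candidate rearrangements via the forward algorithm.
def forward_likelihood(seq, states, trans, emit, init):
    """P(seq | HMM) computed by the forward algorithm."""
    f = {s: init[s] * emit[s][seq[0]] for s in states}
    for x in seq[1:]:
        f = {s: emit[s][x] * sum(f[r] * trans[r][s] for r in states)
             for s in states}
    return sum(f.values())

states = ("V", "J")                        # hypothetical gene-segment states
trans = {"V": {"V": 0.8, "J": 0.2}, "J": {"V": 0.0, "J": 1.0}}
init = {"V": 1.0, "J": 0.0}

# Two candidates differing only in the V segment's base composition.
emit_a = {"V": {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
          "J": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}}
emit_b = {"V": {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
          "J": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}}

seq = "AAAAG"
lik = {"cand_a": forward_likelihood(seq, states, trans, emit_a, init),
       "cand_b": forward_likelihood(seq, states, trans, emit_b, init)}
total = sum(lik.values())                  # uniform prior over candidates
posterior = {c: l / total for c, l in lik.items()}
```

Reporting posteriors, rather than a single best hit, is what makes nearly equivalent rearrangements directly comparable.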

April 10, 2010

Gene function prediction from synthetic lethality networks via ranking on demand

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am

Synthetic lethal interactions represent pairs of genes whose individual mutations are not lethal, while the double mutation of both genes is lethal. Several studies have shown a correlation between the functional similarity of genes and their distance in networks based on synthetic lethal interactions. However, there is a lack of algorithms for predicting gene function from synthetic lethal interaction networks.

Results: In this article, we present a novel technique called kernelROD for gene function prediction from synthetic lethal interaction networks based on kernel machines. We apply our novel algorithm to Gene Ontology functional annotation prediction in yeast. Our experiments show that our method leads to improved gene function prediction compared with state-of-the-art competitors and that combining genetic and congruence networks leads to a further improvement in prediction accuracy.
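The benefit of combining networks can be sketched with the simplest possible kernels: one similarity kernel per network, summed, with unlabelled genes ranked by kernel-weighted votes from annotated genes. The toy networks, the adjacency kernel and the voting rule are assumptions for illustration; kernelROD uses proper kernel machines rather than this nearest-neighbour vote.

```python
# Toy kernel combination for gene function prediction from two networks.
genes = ["g1", "g2", "g3", "g4"]
sl_edges = {("g1", "g2"), ("g2", "g3")}          # synthetic-lethal network
congruence_edges = {("g1", "g3"), ("g3", "g4")}  # congruence network

def kernel(edges):
    """Symmetric adjacency kernel with unit self-similarity."""
    k = {(a, a): 1.0 for a in genes}
    for a, b in edges:
        k[(a, b)] = k[(b, a)] = 1.0
    return k

def combine(k1, k2):
    """Sum of kernels is itself a valid kernel."""
    keys = set(k1) | set(k2)
    return {p: k1.get(p, 0.0) + k2.get(p, 0.0) for p in keys}

K = combine(kernel(sl_edges), kernel(congruence_edges))

annotated = {"g1", "g2"}     # genes known to carry the GO function
def score(g):
    return sum(K.get((g, a), 0.0) for a in annotated)

ranking = sorted((g for g in genes if g not in annotated),
                 key=score, reverse=True)
```

Here g3 is supported by one network each, so only the combined kernel ranks it clearly above g4, mirroring the reported gain from merging genetic and congruence networks.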

April 9, 2010

Viewing cancer genes from co-evolving gene modules

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am

Studying the evolutionary conservation of cancer genes can improve our understanding of the genetic basis of human cancers. Functionally related proteins tend to interact with each other in a modular fashion, which may affect both the mode and tempo of their evolution.

Results: In the human PPI network, we searched for subnetworks within each of which all proteins have evolved at similar rates since the human and mouse split. Identified at a given co-evolving level, the subnetworks with non-randomly large sizes were defined as co-evolving modules. We showed that proteins within modules tend to be conserved, evolutionarily old and enriched with housekeeping genes, while proteins outside modules tend to be less-conserved, evolutionarily younger and enriched with genes expressed in specific tissues. Viewing cancer genes from co-evolving modules showed that the overall conservation of cancer genes should be mainly attributed to the cancer proteins enriched in the conserved modules. Functional analysis further suggested that cancer proteins within and outside modules might play different roles in carcinogenesis, providing a new hint for studying the mechanism of cancer.
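One way to search for such subnetworks is to grow a module greedily from a seed protein, adding the neighbouring protein with the closest evolutionary rate while the module's rate spread stays below a threshold. The PPI edges, rates and threshold below are fabricated for illustration, and the paper additionally tests module sizes against a random background, which this sketch omits.

```python
# Toy greedy search for a co-evolving module in a PPI network.
ppi = {
    "p1": {"p2", "p3"}, "p2": {"p1", "p3"}, "p3": {"p1", "p2", "p4"},
    "p4": {"p3", "p5"}, "p5": {"p4"},
}
rate = {"p1": 0.10, "p2": 0.12, "p3": 0.11, "p4": 0.45, "p5": 0.50}

def grow_module(seed, max_spread=0.05):
    """Grow from `seed` while all member rates stay within `max_spread`."""
    module = {seed}
    frontier = set(ppi[seed])
    while frontier:
        cand = min(frontier, key=lambda p: abs(rate[p] - rate[seed]))
        rates = [rate[p] for p in module | {cand}]
        if max(rates) - min(rates) > max_spread:
            break
        module.add(cand)
        frontier = {n for p in module for n in ppi[p]} - module
    return module

module = grow_module("p1")
```

The slowly evolving trio forms a module while the fast-evolving pair stays outside, the split between conserved module proteins and less-conserved non-module proteins described above.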
