Biointelligence

June 4, 2010

ParaSAM: a parallelized version of the significance analysis of microarrays algorithm

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am
Tags: , , ,

Significance analysis of microarrays (SAM) is a widely used permutation-based approach to identifying differentially expressed genes in microarray datasets. While SAM is freely available as an Excel plug-in and as an R-package, analyses are often limited for large datasets due to very high memory requirements.

Summary: We have developed a parallelized version of the SAM algorithm called ParaSAM to overcome the memory limitations. This high performance multithreaded application provides the scientific community with an easy and manageable client-server Windows application with graphical user interface and does not require programming experience to run. The parallel nature of the application comes from the use of web services to perform the permutations. Our results indicate that ParaSAM is not only faster than the serial version, but also can analyze extremely large datasets that cannot be performed using existing implementations.

Availability:A web version open to the public is available at http://bioanalysis.genomics.mcg.edu/parasam. For local installations, both the windows and web implementations of ParaSAM are available for free at http://www.amdcc.org/bioinformatics/software/parasam.aspx

May 31, 2010

Arcadia: a visualization tool for metabolic pathways

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am
Tags: ,

Arcadia translates text-based descriptions of biological networks (SBML files) into standardized diagrams (SBGN PD maps). Users can view the same model from different perspectives and easily alter the layout to emulate traditional textbook representations.

Availability and Implementation: Arcadia is written in C++. The source code is available (along with Mac OS and Windows binaries) under the GPL from http://arcadiapathways.sourceforge.net/

Contact: alice.villeger@manchester.ac.uk

May 30, 2010

ParaSAM: a parallelized version of the significance analysis of microarrays algorithm

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am
Tags: ,

Significance analysis of microarrays (SAM) is a widely used permutation-based approach to identifying differentially expressed genes in microarray datasets. While SAM is freely available as an Excel plug-in and as an R-package, analyses are often limited for large datasets due to very high memory requirements.

Summary: We have developed a parallelized version of the SAM algorithm called ParaSAM to overcome the memory limitations. This high performance multithreaded application provides the scientific community with an easy and manageable client-server Windows application with graphical user interface and does not require programming experience to run. The parallel nature of the application comes from the use of web services to perform the permutations. Our results indicate that ParaSAM is not only faster than the serial version, but also can analyze extremely large datasets that cannot be performed using existing implementations.

Availability:A web version open to the public is available at http://bioanalysis.genomics.mcg.edu/parasam. For local installations, both the windows and web implementations of ParaSAM are available for free at http://www.amdcc.org/bioinformatics/software/parasam.aspx

May 17, 2010

Meta-analysis for pathway enrichment analysis when combining multiple genomic studies

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:30 am
Tags: , ,

Many pathway analysis (or gene set enrichment analysis) methods have been developed to identify enriched pathways under different biological states within a genomic study. As more and more microarray datasets accumulate, meta-analysis methods have also been developed to integrate information among multiple studies. Currently, most meta-analysis methods for combining genomic studies focus on biomarker detection and meta-analysis for pathway analysis has not been systematically pursued.

Results: We investigated two approaches of meta-analysis for pathway enrichment (MAPE) by combining statistical significance across studies at the gene level (MAPE_G) or at the pathway level (MAPE_P). Simulation results showed increased statistical power of meta-analysis approaches compared to a single study analysis and showed complementary advantages of MAPE_G and MAPE_P under different scenarios. We also developed an integrated method (MAPE_I) that incorporates advantages of both approaches. Comprehensive simulations and applications to real data on drug response of breast cancer cell lines and lung cancer tissues were evaluated to compare the performance of three MAPE variations. MAPE_P has the advantage of not requiring gene matching across studies. When MAPE_G and MAPE_P show complementary advantages, the hybrid version of MAPE_I is generally recommended.

Availability: http://www.biostat.pitt.edu/bioinfo/

May 16, 2010

Quality assessment of protein model-structures using evolutionary conservation

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:30 am
Tags: , ,

Programs that evaluate the quality of a protein structural model are important both for validating the structure determination procedure and for guiding the model-building process. Such programs are based on properties of native structures that are generally not expected for faulty models. One such property, which is rarely used for automatic structure quality assessment, is the tendency for conserved residues to be located at the structural core and for variable residues to be located at the surface.

Results: We present ConQuass, a novel quality assessment program based on the consistency between the model structure and the protein’s conservation pattern. We show that it can identify problematic structural models, and that the scores it assigns to the server models in CASP8 correlate with the similarity of the models to the native structure. We also show that when the conservation information is reliable, the method’s performance is comparable and complementary to that of the other single-structure quality assessment methods that participated in CASP8 and that do not use additional structural information from homologs.

Availability: A perl implementation of the method, as well as the various perl and R scripts used for the analysis are available at http://bental.tau.ac.il/ConQuass/.

May 15, 2010

Supervised normalization of microarrays

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:30 am
Tags: , ,

A major challenge in utilizing microarray technologies to measure nucleic acid abundances is ‘normalization’, the goal of which is to separate biologically meaningful signal from other confounding sources of signal, often due to unavoidable technical factors. It is intuitively clear that true biological signal and confounding factors need to be simultaneously considered when performing normalization. However, the most popular normalization approaches do not utilize what is known about the study, both in terms of the biological variables of interest and the known technical factors in the study, such as batch or array processing date.

Results: We show here that failing to include all study-specific biological and technical variables when performing normalization leads to biased downstream analyses. We propose a general normalization framework that fits a study-specific model employing every known variable that is relevant to the expression study. The proposed method is generally applicable to the full range of existing probe designs, as well as to both single-channel and dual-channel arrays. We show through real and simulated examples that the method has favorable operating characteristics in comparison to some of the most highly used normalization methods.

Availability: An R package called snm implementing the methodology will be made available from Bioconductor (http://bioconductor.org).

May 11, 2010

Integrative mixture of experts to combine clinical factors and gene markers

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:30 am
Tags: , , ,

Microarrays are being increasingly used in cancer research to better characterize and classify tumors by selecting marker genes. However, as very few of these genes have been validated as predictive biomarkers so far, it is mostly conventional clinical and pathological factors that are being used as prognostic indicators of clinical course. Combining clinical data with gene expression data may add valuable information, but it is a challenging task due to their categorical versus continuous characteristics. We have further developed the mixture of experts (ME) methodology, a promising approach to tackle complex non-linear problems. Several variants are proposed in integrative ME as well as the inclusion of various gene selection methods to select a hybrid signature.

Results: We show on three cancer studies that prediction accuracy can be improved when combining both types of variables. Furthermore, the selected genes were found to be of high relevance and can be considered as potential biomarkers for the prognostic selection of cancer therapy.

Availability: Integrative ME is implemented in the R package integrativeME (http://cran.r-project.org/).

April 16, 2010

GO-Bayes: Gene Ontology-based overrepresentation analysis using a Bayesian approach

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am
Tags: , ,

A typical approach for the interpretation of high-throughput experiments, such as gene expression microarrays, is to produce groups of genes based on certain criteria (e.g. genes that are differentially expressed). To gain more mechanistic insights into the underlying biology, overrepresentation analysis (ORA) is often conducted to investigate whether gene sets associated with particular biological functions, for example, as represented by Gene Ontology (GO) annotations, are statistically overrepresented in the identified gene groups. However, the standard ORA, which is based on the hypergeometric test, analyzes each GO term in isolation and does not take into account the dependence structure of the GO-term hierarchy.

Results: We have developed a Bayesian approach (GO-Bayes) to measure overrepresentation of GO terms that incorporates the GO dependence structure by taking into account evidence not only from individual GO terms, but also from their related terms (i.e. parents, children, siblings, etc.). The Bayesian framework borrows information across related GO terms to strengthen the detection of overrepresentation signals. As a result, this method tends to identify sets of closely related GO terms rather than individual isolated GO terms. The advantage of the GO-Bayes approach is demonstrated with a simulation study and an application example.

Contact: song.zhang@utsouthwestern.edu; richard.scheuermann@utsouthwestern.edu

April 10, 2010

Gene function prediction from synthetic lethality networks via ranking on demand

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am
Tags: , , ,

Synthetic lethal interactions represent pairs of genes whose individual mutations are not lethal, while the double mutation of both genes does incur lethality. Several studies have shown a correlation between functional similarity of genes and their distances in networks based on synthetic lethal interactions. However, there is a lack of algorithms for predicting gene function from synthetic lethality interaction networks.

Results: In this article, we present a novel technique called kernelROD for gene function prediction from synthetic lethal interaction networks based on kernel machines. We apply our novel algorithm to Gene Ontology functional annotation prediction in yeast. Our experiments show that our method leads to improved gene function prediction compared with state-of-the-art competitors and that combining genetic and congruence networks leads to a further improvement in prediction accuracy.

April 4, 2010

Skyline: an open source document editor for creating and analyzing targeted proteomics experiments

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am
Tags: , ,

Skyline is a Windows client application for targeted proteomics method creation and quantitative data analysis. It is open source and freely available for academic and commercial use. The Skyline user interface simplifies the development of mass spectrometer methods and the analysis of data from targeted proteomics experiments performed using selected reaction monitoring (SRM). Skyline supports using and creating MS/MS spectral libraries from a wide variety of sources to choose SRM filters and verify results based on previously observed ion trap data. Skyline exports transition lists to and imports the native output files from Agilent, Applied Biosystems, Thermo Fisher Scientific and Waters triple quadrupole instruments, seamlessly connecting mass spectrometer output back to the experimental design document. The fast and compact Skyline file format is easily shared, even for experiments requiring many sample injections. A rich array of graphs displays results and provides powerful tools for inspecting data integrity as data are acquired, helping instrument operators to identify problems early. The Skyline dynamic report designer exports tabular data from the Skyline document model for in-depth analysis with common statistical tools.

Availability: Single-click, self-updating web installation is available at http://proteome.gs.washington.edu/software/skyline. This web site also provides access to instructional videos, a support board, an issues list and a link to the source code project.

Next Page »