Biointelligence

August 3, 2010

LOX: inferring Level Of eXpression from diverse methods of census sequencing

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 7:13 am

We present LOX (Level Of eXpression), a tool that estimates the level of gene expression from high-throughput expressed-sequence datasets with multiple treatments or samples. Unlike most analyses, LOX incorporates a gene bias model that facilitates the integration of transcriptomic sequencing data produced using diverse experimental methodologies. LOX integrates overall sequence count tallies, normalized by total expressed sequence count, to provide expression levels for each gene relative to all treatments, as well as Bayesian credible intervals.
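The normalization step described above can be illustrated with a minimal Python sketch. It only divides each tally by its treatment's total count and then rescales each gene's values so they sum to one across treatments. The function name, the input layout and the rescaling choice are assumptions made for illustration; the gene bias model and the Bayesian credible intervals of LOX are not reproduced here.

```python
def relative_expression(counts):
    """Hypothetical helper: counts maps treatment -> {gene: sequence tally}.

    Returns gene -> {treatment: relative level}, where each gene's levels
    sum to 1 across treatments (an illustrative choice, not LOX's model).
    """
    # Step 1: normalize each tally by the treatment's total sequence count.
    shares = {}
    for treatment, tallies in counts.items():
        total = sum(tallies.values())
        if total == 0:
            raise ValueError(f"treatment {treatment!r} has no sequences")
        shares[treatment] = {gene: n / total for gene, n in tallies.items()}

    # Step 2: express each gene's level relative to all treatments.
    genes = sorted({gene for tallies in counts.values() for gene in tallies})
    levels = {}
    for gene in genes:
        per_treatment = {t: shares[t].get(gene, 0.0) for t in counts}
        scale = sum(per_treatment.values())
        levels[gene] = {
            t: (value / scale if scale else 0.0)
            for t, value in per_treatment.items()
        }
    return levels


# Made-up tallies for two genes in two treatments.
example_counts = {
    "control": {"geneA": 10, "geneB": 90},
    "treated": {"geneA": 30, "geneB": 70},
}
levels = relative_expression(example_counts)
```

In this toy input geneA makes up 10% of the control library and 30% of the treated library, so its relative level is three times higher in the treated sample.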

Availability: http://www.yale.edu/townsend/software.html

August 2, 2010

Count: evolutionary analysis of phylogenetic profiles with parsimony and likelihood

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 5:31 am

Count is a software package for the analysis of numerical profiles on a phylogeny. It is primarily designed to deal with profiles derived from the phyletic distribution of homologous gene families, but is suited to study any other integer-valued evolutionary characters. Count performs ancestral reconstruction, and infers family- and lineage-specific characteristics along the evolutionary tree. It implements popular methods employed in gene content analysis such as Dollo and Wagner parsimony, propensity for gene loss, as well as probabilistic methods involving a phylogenetic birth-and-death model.
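As a rough illustration of parsimony on integer-valued profiles, the sketch below implements the bottom-up pass of Farris's interval method for Wagner parsimony with symmetric unit costs on a rooted binary tree. The tree encoding (nested tuples of leaf names) and the example family sizes are assumptions for this example, and Count itself may use different penalties for gains and losses, so this should not be read as its implementation.

```python
def wagner_interval(tree, leaf_values):
    """Return ((low, high), cost) for the subtree rooted at `tree`.

    `tree` is either a leaf name (str) or a pair (left, right).
    (low, high) is the range of equally parsimonious values at the node;
    cost is the minimum total absolute change within the subtree.
    """
    if isinstance(tree, str):
        value = leaf_values[tree]
        return (value, value), 0

    left, right = tree
    (low1, high1), cost1 = wagner_interval(left, leaf_values)
    (low2, high2), cost2 = wagner_interval(right, leaf_values)

    low, high = max(low1, low2), min(high1, high2)
    if low <= high:
        # Child intervals overlap: keep the intersection at no extra cost.
        return (low, high), cost1 + cost2
    # Child intervals are disjoint: the node takes the gap between them,
    # and the width of that gap is added to the cost.
    return (high, low), cost1 + cost2 + (low - high)


# Hypothetical gene family sizes in four genomes.
family_sizes = {"A": 2, "B": 2, "C": 0, "D": 1}
species_tree = ((("A", "B"), "C"), "D")
root_interval, total_cost = wagner_interval(species_tree, family_sizes)
```

For this toy profile the root interval collapses to a single value and the total cost equals the spread between the smallest and largest family size, which is the minimum any reconstruction could achieve.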

Availability: Count is available as a stand-alone Java application, as well as an application bundle for MacOS X, at the web site http://www.iro.umontreal.ca/csuros/gene_content/count.html. It can also be launched using Java Webstart from the same site. The software is distributed under a BSD-style license. Source code is available upon request from the author.

July 28, 2010

ACNE: a summarization method to estimate allele-specific copy numbers for Affymetrix SNP arrays

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 5:11 am

Current algorithms for estimating DNA copy numbers (CNs) borrow concepts from gene expression analysis methods. However, single nucleotide polymorphism (SNP) arrays have special characteristics that, if taken into account, can improve the overall performance. For example, cross hybridization between alleles occurs in SNP probe pairs. In addition, most of the current CN methods are focused on total CNs, while it has been shown that allele-specific CNs are of paramount importance for some studies. Therefore, we have developed a summarization method that estimates high-quality allele-specific CNs.

Results: The proposed method estimates the allele-specific DNA CNs for all Affymetrix SNP arrays, dealing directly with the cross hybridization between probes within SNP probesets. The algorithm outperforms, or at least performs as well as, other state-of-the-art algorithms for computing DNA CNs. It better discerns an aberration from a normal state and gives more precise allele-specific CNs.

Availability: The method is available in the open-source R package ACNE, which also includes an add-on to the aroma.affymetrix framework (http://www.aroma-project.org/).

July 27, 2010

JAMIE: joint analysis of multiple ChIP-chip experiments

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 5:16 am

Chromatin immunoprecipitation followed by genome tiling array hybridization (ChIP-chip) is a powerful approach to identify transcription factor binding sites (TFBSs) in target genomes. When multiple related ChIP-chip datasets are available, analyzing them jointly allows one to borrow information across datasets to improve peak detection. This is particularly useful for analyzing noisy datasets.

Results: We propose a hierarchical mixture model and develop an R package, JAMIE, to perform the joint analysis. The genome is assumed to consist of background and potential binding regions (PBRs). PBRs have context-dependent probabilities to become bona fide binding sites in individual datasets. This model captures the correlation among datasets, which provides a basis for sharing information across experiments. Real data tests illustrate the advantage of JAMIE over a strategy that analyzes individual datasets separately.

Availability: JAMIE is freely available from http://www.biostat.jhsph.edu/hji/jamie

July 26, 2010

Cassis: detection of genomic rearrangement breakpoints

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 4:55 am

Genomes undergo large structural changes that alter their organization. The chromosomal regions affected by these rearrangements are called breakpoints, while those which have not been rearranged are called synteny blocks. Lemaitre et al. presented a new method to precisely delimit rearrangement breakpoints in a genome by comparison with the genome of a related species. Receiving as input a list of one-to-one orthologous genes found in the genomes of two species, the method builds a set of reliable and non-overlapping synteny blocks and refines the regions that are not contained in them. By aligning each breakpoint sequence against its specific orthologous sequences in the other species, the method looks for weak similarities inside the breakpoint, thus extending the synteny blocks and narrowing the breakpoints. The identification of the narrowed breakpoints relies on a segmentation algorithm and is statistically assessed. Here, we present the package Cassis that implements this method of precise detection of genomic rearrangement breakpoints.

Availability: Perl and R scripts are freely available for download at http://pbil.univ-lyon1.fr/software/Cassis/. Documentation with methodological background, technical aspects, download and setup instructions, as well as examples of applications, is available together with the package. The package was tested on Linux and Mac OS environments and is distributed under the GNU GPL License.
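To make the idea of building synteny blocks from an ortholog list concrete, here is a deliberately simplified Python sketch. It assumes the orthologs are already reduced to ranks: for each gene in order along one genome, the rank of its one-to-one ortholog in the other genome. A block is then a maximal run of strictly adjacent ranks in a single direction. This encoding and the function name are assumptions for illustration only; Cassis works on genomic coordinates and refines breakpoints by alignment and segmentation, none of which is attempted here.

```python
def synteny_blocks(ortholog_ranks):
    """Group genes into collinear runs.

    ortholog_ranks[i] is the rank, in the second genome, of the ortholog
    of the i-th gene of the first genome. Returns a list of (start, end)
    index pairs (inclusive) into the first genome's gene order.
    """
    blocks = []
    start = 0
    direction = 0  # 0 = not yet known, +1 = same order, -1 = inverted
    for i in range(1, len(ortholog_ranks)):
        step = ortholog_ranks[i] - ortholog_ranks[i - 1]
        if step in (1, -1) and direction in (0, step):
            direction = step  # the run continues in a consistent direction
        else:
            blocks.append((start, i - 1))  # close the current block
            start, direction = i, 0
    if ortholog_ranks:
        blocks.append((start, len(ortholog_ranks) - 1))
    return blocks


def breakpoints(blocks):
    """Breakpoints are the gaps between consecutive blocks."""
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(blocks, blocks[1:])]


# Toy example: genes 3 to 6 are inverted in the second genome.
ranks = [0, 1, 2, 6, 5, 4, 3, 7, 8]
blocks = synteny_blocks(ranks)
gaps = breakpoints(blocks)
```

The inversion yields three blocks and two breakpoints, one on each side of the inverted segment. Real data would need tolerance for small local rearrangements and missing orthologs, which this sketch does not provide.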

July 24, 2010

COMA server for protein distant homology search

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 8:07 am

Detection of distant homology is a widely used computational approach for studying protein evolution, structure and function. Here, we report a homology search web server based on sequence profile–profile comparison. The user may perform searches in one of several regularly updated profile databases using either a single sequence or a multiple sequence alignment as an input. The same profile databases can also be downloaded for local use. The capabilities of the server are illustrated with the identification of new members of the highly diverse PD-(D/E)XK nuclease superfamily.

Availability: http://www.ibt.lt/bioinformatics/coma

July 23, 2010

Structure-based variable selection for survival data

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 6:20 am

Variable selection is a typical approach used for molecular-signature and biomarker discovery; however, its application to survival data is often complicated by censored samples. We propose a new algorithm for variable selection suitable for the analysis of high-dimensional, right-censored data called Survival Max–Min Parents and Children (SMMPC). The algorithm is conceptually simple, scalable, based on the theory of Bayesian networks (BNs) and the Markov blanket and extends the corresponding algorithm (MMPC) for classification tasks. The selected variables have a structural interpretation: if T is the survival time (in general the time-to-event), SMMPC returns the variables adjacent to T in the BN representing the data distribution. The selected variables also have a causal interpretation that we discuss.

Results: We conduct an extensive empirical analysis of prototypical and state-of-the-art variable selection algorithms for survival data that are applicable to high-dimensional biological data. SMMPC selects on average the smallest variable subsets (fewer than a dozen per dataset), while statistically significantly outperforming all of the methods in the study that return a manageable number of genes that could be inspected by a human expert.

Availability: Matlab and R code are freely available from http://www.mensxmachina.org

July 22, 2010

MISS: a non-linear methodology based on mutual information for genetic association studies in both population and sib-pairs analysis

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:54 am

Finding associations between genetic variants and phenotypes related to disease has become an important vehicle for the study of complex disorders. In this context, multi-locus genetic association might unravel additional information when compared with single-locus searches. The main goal of this work is to propose a non-linear methodology based on information theory for finding combinatorial association between multi-SNPs and a given phenotype.

Results: The proposed methodology, called MISS (mutual information statistical significance), has been integrated jointly with a feature selection algorithm and has been tested on a synthetic dataset with a controlled phenotype and in the particular case of the F7 gene. The MISS methodology has been contrasted with a multiple linear regression (MLR) method used for genetic association in both a population-based study and a sib-pairs analysis, and with the maximum entropy conditional probability modelling (MECPM) method, which searches for predictive multi-locus interactions. Several sets of SNPs within the F7 gene region have been found to show a significant correlation with the FVII levels in blood. The proposed multi-site approach unveils combinations of SNPs that explain more significant information about the phenotype than their individual polymorphisms. MISS is able to find more correlations between SNPs and the phenotype than MLR and MECPM. Most of the marked SNPs appear in the literature as functional variants with real effect on the protein FVII levels in blood.
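The general idea of scoring a SNP combination by its mutual information with a phenotype can be sketched in a few lines of Python. The example below assumes a discretized phenotype and uses a label-permutation test as a stand-in for significance assessment; both are assumptions for illustration, as are the function names and the synthetic data. It is not the MISS code, which is available from the link below.

```python
from collections import Counter
from math import log2
import random


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # I(X;Y) = sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) * p(y)))
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def permutation_p_value(xs, ys, n_perm=200, seed=0):
    """Share of phenotype shuffles whose MI reaches the observed MI."""
    rng = random.Random(seed)
    observed = mutual_information(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(xs, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Synthetic data: the phenotype depends on the combination of two SNPs
# (an XOR pattern), so neither SNP is informative on its own.
snp1 = [0, 0, 1, 1] * 5
snp2 = [0, 1, 0, 1] * 5
phenotype = [a ^ b for a, b in zip(snp1, snp2)]

mi_snp1 = mutual_information(snp1, phenotype)
mi_pair = mutual_information(list(zip(snp1, snp2)), phenotype)
p_pair = permutation_p_value(list(zip(snp1, snp2)), phenotype)
```

In this constructed case the single-SNP score is zero while the pair carries one full bit about the phenotype, which mirrors the claim that combinations of SNPs can explain more than their individual polymorphisms.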

Availability: The code is available at http://sisbio.recerca.upc.edu/R/MISS_0.2.tar.gz

July 21, 2010

JAMIE: joint analysis of multiple ChIP-chip experiments

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 8:10 am

Chromatin immunoprecipitation followed by genome tiling array hybridization (ChIP-chip) is a powerful approach to identify transcription factor binding sites (TFBSs) in target genomes. When multiple related ChIP-chip datasets are available, analyzing them jointly allows one to borrow information across datasets to improve peak detection. This is particularly useful for analyzing noisy datasets.

Results: We propose a hierarchical mixture model and develop an R package, JAMIE, to perform the joint analysis. The genome is assumed to consist of background and potential binding regions (PBRs). PBRs have context-dependent probabilities to become bona fide binding sites in individual datasets. This model captures the correlation among datasets, which provides a basis for sharing information across experiments. Real data tests illustrate the advantage of JAMIE over a strategy that analyzes individual datasets separately.

Availability: JAMIE is freely available from http://www.biostat.jhsph.edu/hji/jamie

July 20, 2010

adephylo: new tools for investigating the phylogenetic signal in biological traits

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 8:42 am

adephylo is a package for the R software dedicated to the analysis of comparative evolutionary data. Phylogenetic comparative methods initially aimed at accounting for or removing the effects of phylogenetic signal in the analysis of biological traits. However, recent approaches have shown that considerable information can be gathered from the study of the phylogenetic signal. In particular, close examination of phylogenetic structures can unveil interesting evolutionary patterns. For this purpose, we developed the package adephylo that provides tools for quantifying and describing the phylogenetic structures of biological traits. adephylo implements tests of phylogenetic signal, phylogenetic distances and proximities, and novel methods for describing further univariate and multivariate phylogenetic structures. These tools open up new perspectives in the analysis of evolutionary comparative data.

Availability: The stable version is available from CRAN: http://cran.r-project.org/web/packages/adephylo/. The development version is hosted by R-Forge: http://r-forge.r-project.org/projects/adephylo/. Both versions can be installed directly from R. adephylo is distributed under the GNU General Public Licence (2).
