Biointelligence

August 10, 2010

Savant: genome browser for high-throughput sequencing data

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 8:17 am
Tags:

The advent of high-throughput sequencing (HTS) technologies has made it affordable to sequence many individuals’ genomes. Simultaneously the computational analysis of the large volumes of data generated by the new sequencing machines remains a challenge. While a plethora of tools are available to map the resulting reads to a reference genome, and to conduct primary analysis of the mappings, it is often necessary to visually examine the results and underlying data to confirm predictions and understand the functional effects, especially in the context of other datasets.

Results: We introduce Savant, the Sequence Annotation, Visualization and ANalysis Tool, a desktop visualization and analysis browser for genomic data. Savant was developed for visualizing and analyzing HTS data, with special care taken to enable dynamic visualization in the presence of gigabases of genomic reads and references the size of the human genome. Savant supports the visualization of genome-based sequence, point, interval and continuous datasets, and multiple visualization modes that enable easy identification of genomic variants (including single nucleotide polymorphisms, structural and copy number variants), and functional genomic information (e.g. peaks in ChIP-seq data) in the context of genomic annotations.

Availability: Savant is freely available at http://compbio.cs.toronto.edu/savant

July 28, 2010

ACNE: a summarization method to estimate allele-specific copy numbers for Affymetrix SNP arrays

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 5:11 am
Tags:

Current algorithms for estimating DNA copy numbers (CNs) borrow concepts from gene expression analysis methods. However, single nucleotide polymorphism (SNP) arrays have special characteristics that, if taken into account, can improve the overall performance. For example, cross hybridization between alleles occurs in SNP probe pairs. In addition, most of the current CN methods are focused on total CNs, while it has been shown that allele-specific CNs are of paramount importance for some studies. Therefore, we have developed a summarization method that estimates high-quality allele-specific CNs.

Results: The proposed method estimates the allele-specific DNA CNs for all Affymetrix SNP arrays dealing directly with the cross hybridization between probes within SNP probesets. This algorithm outperforms (or at least it performs as well as) other state-of-the-art algorithms for computing DNA CNs. It better discerns an aberration from a normal state and it also gives more precise allele-specific CNs.

Availability: The method is available in the open-source R package ACNE, which also includes an add on to the aroma.affymetrix framework (http://www.aroma-project.org/).

July 27, 2010

JAMIE: joint analysis of multiple ChIP-chip experiments

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 5:16 am
Tags:

Chromatin immunoprecipitation followed by genome tiling array hybridization (ChIP-chip) is a powerful approach to identify transcription factor binding sites (TFBSs) in target genomes. When multiple related ChIP-chip datasets are available, analyzing them jointly allows one to borrow information across datasets to improve peak detection. This is particularly useful for analyzing noisy datasets.

Results: We propose a hierarchical mixture model and develop an R package JAMIE to perform the joint analysis. The genome is assumed to consist of background and potential binding regions (PBRs). PBRs have context-dependent probabilities to become bona fide binding sites in individual datasets. This model captures the correlation among datasets, which provides basis for sharing information across experiments. Real data tests illustrate the advantage of JAMIE over a strategy that analyzes individual datasets separately.

Availability: JAMIE is freely available from http://www.biostat.jhsph.edu/hji/jamie

July 21, 2010

JAMIE: joint analysis of multiple ChIP-chip experiments

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 8:10 am
Tags:

Chromatin immunoprecipitation followed by genome tiling array hybridization (ChIP-chip) is a powerful approach to identify transcription factor binding sites (TFBSs) in target genomes. When multiple related ChIP-chip datasets are available, analyzing them jointly allows one to borrow information across datasets to improve peak detection. This is particularly useful for analyzing noisy datasets.

Results: We propose a hierarchical mixture model and develop an R package JAMIE to perform the joint analysis. The genome is assumed to consist of background and potential binding regions (PBRs). PBRs have context-dependent probabilities to become bona fide binding sites in individual datasets. This model captures the correlation among datasets, which provides basis for sharing information across experiments. Real data tests illustrate the advantage of JAMIE over a strategy that analyzes individual datasets separately.

Availability: JAMIE is freely available from http://www.biostat.jhsph.edu/hji/jamie

June 19, 2010

Modeling RNA loops using sequence homology and geometric constraints

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am
Tags: ,

RNA loop regions are essential structural elements of RNA molecules influencing both their structural and functional properties. We developed RLooM, a web application for homology-based modeling of RNA loops utilizing template structures extracted from the PDB. RLooM allows the insertion and replacement of loop structures of a desired sequence into an existing RNA structure. Furthermore, a comprehensive database of loops in RNA structures can be accessed through the web interface.

Availability and Implementation: The application was implemented in Python, MySQL and Apache. A web interface to the database and loop modeling application is freely available at http://rloom.mpimp-golm.mpg.de

May 10, 2010

Alignment-free local structural search by writhe decomposition

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:30 am
Tags: ,

Rapid methods for protein structure search enable biological discoveries based on flexibly defined structural similarity, unleashing the power of the ever greater number of solved protein structures. Projection methods show promise for the development of fast structural database search solutions. Projection methods map a structure to a point in a high-dimensional space and compare two structures by measuring distance between their projected points. These methods offer a tremendous increase in speed over residue-level structural alignment methods. However, current projection methods are not practical, partly because they are unable to identify local similarities.

Results: We propose a new projection-based approach that can rapidly detect global as well as local structural similarities. Local structural search is enabled by a topology-inspired writhe decomposition protocol that produces a small number of fragments while ensuring that similar structures are cut in a similar manner. In benchmark tests, we show that our method, writher, improves accuracy over existing projection methods in terms of recognizing SCOP domains out of multi-domain proteins, while maintaining accuracy comparable with existing projection methods in a standard single-domain benchmark test.

Availability: The source code is available at the following website: http://compbio.berkeley.edu/proj/writher/

May 9, 2010

A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:30 am
Tags: , ,

Accurately predicting the binding affinities of large sets of diverse protein–ligand complexes is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for analysing the outputs of molecular docking, which in turn is an important technique for drug discovery, chemical biology and structural biology. Each scoring function assumes a predetermined theory-inspired functional form for the relationship between the variables that characterize the complex, which also include parameters fitted to experimental or simulation data and its predicted binding affinity. The inherent problem of this rigid approach is that it leads to poor predictivity for those complexes that do not conform to the modelling assumptions. Moreover, resampling strategies, such as cross-validation or bootstrapping, are still not systematically used to guard against the overfitting of calibration data in parameter estimation for scoring functions.

Results: We propose a novel scoring function (RF-Score) that circumvents the need for problematic modelling assumptions via non-parametric machine learning. In particular, Random Forest was used to implicitly capture binding effects that are hard to model explicitly. RF-Score is compared with the state of the art on the demanding PDBbind benchmark. Results show that RF-Score is a very competitive scoring function. Importantly, RF-Score’s performance was shown to improve dramatically with training set size and hence the future availability of more high-quality structural and interaction data is expected to lead to improved versions of RF-Score.

Contact: pedro.ballester@ebi.ac.uk; jbom@st-andrews.ac.uk

May 8, 2010

ProBiS algorithm for detection of structurally similar protein binding sites by local structural alignment

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:03 am
Tags: ,

Exploitation of locally similar 3D patterns of physicochemical properties on the surface of a protein for detection of binding sites that may lack sequence and global structural conservation.

Results: An algorithm, ProBiS is described that detects structurally similar sites on protein surfaces by local surface structure alignment. It compares the query protein to members of a database of protein 3D structures and detects with sub-residue precision, structurally similar sites as patterns of physicochemical properties on the protein surface. Using an efficient maximum clique algorithm, the program identifies proteins that share local structural similarities with the query protein and generates structure-based alignments of these proteins with the query. Structural similarity scores are calculated for the query protein’s surface residues, and are expressed as different colors on the query protein surface. The algorithm has been used successfully for the detection of protein–protein, protein–small ligand and protein–DNA binding sites.

Availability: The software is available, as a web tool, free of charge for academic users at http://probis.cmm.ki.si

April 14, 2010

MULTICOM: a multi-level combination approach to protein structure prediction and its assessments in CASP8

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 9:00 am
Tags: , , ,

Protein structure prediction is one of the most important problems in structural bioinformatics. Here we describe MULTICOM, a multi-level combination approach to improve the various steps in protein structure prediction. In contrast to those methods which look for the best templates, alignments and models, our approach tries to combine complementary and alternative templates, alignments and models to achieve on average better accuracy.

Results: The multi-level combination approach was implemented via five automated protein structure prediction servers and one human predictor which participated in the eighth Critical Assessment of Techniques for Protein Structure Prediction (CASP8), 2008. The MULTICOM servers and human predictor were consistently ranked among the top predictors on the CASP8 benchmark. The methods can predict moderate- to high-resolution models for most template-based targets and low-resolution models for some template-free targets. The results show that the multi-level combination of complementary templates, alternative alignments and similar models aided by model quality assessment can systematically improve both template-based and template-free protein modeling.