March 31, 2010

QDD: a user-friendly program to select microsatellite markers and design primers from large sequencing projects

Filed under: Bioinformatics,Computational Biology,Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 12:02 am

QDD is an open access program providing a user-friendly tool for microsatellite detection and primer design from large sets of DNA sequences. The program is designed to deal with all steps of treatment of raw sequences obtained from pyrosequencing of enriched DNA libraries, but it is also applicable to data obtained through other sequencing methods, using FASTA files as input. The following tasks are completed by QDD: tag sorting, adapter/vector removal, elimination of redundant sequences, detection of possible genomic multicopies (duplicated loci or transposable elements), stringent selection of target microsatellites and customizable primer design. It can treat up to one million sequences of a few hundred base pairs in the tag-sorting step, and up to 50 000 sequences in a single input file for the steps involving estimation of sequence similarity.
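The microsatellite detection step can be illustrated with a minimal Python sketch. This is not QDD's code: the regular expression, the motif length range of 2 to 6 bases, the `min_repeats` threshold, and the function names are all assumptions chosen for illustration. It reads FASTA text and reports tandem repeats.

```python
import re


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def find_microsatellites(seq, min_repeats=5):
    """Return (start, end, motif, repeat_count) for each tandem repeat.

    Assumed definition: a motif of 2-6 bases repeated at least
    min_repeats times in a row. Coordinates are 0-based, end-exclusive.
    """
    # The lazy quantifier makes the regex try the shortest motif first,
    # so ACACACAC... is reported with motif AC rather than ACAC.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip single-base runs such as AAAAAAAAAA
        count = (match.end() - match.start()) // len(motif)
        hits.append((match.start(), match.end(), motif, count))
    return hits


if __name__ == "__main__":
    example = ">read_1\nTTGACACACACACACGGT\n"
    for name, sequence in parse_fasta(example):
        print(name, find_microsatellites(sequence))
```

A real pipeline would additionally check that enough flanking sequence remains on both sides of the repeat for primer design, which this sketch does not attempt.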

Availability: QDD is freely available under the GPL licence for Windows and Linux from the following web site:

March 30, 2010

On the beta-binomial model for analysis of spectral count data in label-free tandem mass spectrometry-based proteomics

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 12:00 am

Spectral count data generated from label-free tandem mass spectrometry-based proteomic experiments can be used to quantify protein abundances reliably. Comparing spectral count data from different sample groups such as control and disease is an essential step in statistical analysis for the determination of altered protein level and biomarker discovery. Fisher’s exact test, the G-test, the t-test and the local-pooled-error technique (LPE) are commonly used for differential analysis of spectral count data. However, our initial experiments in two cancer studies show that the current methods are unable to declare at 95% confidence level a number of protein markers that have been judged to be differential on the basis of the biology of the disease and the spectral count numbers. A shortcoming of these tests is that they do not take into account within- and between-sample variations together. Hence, our aim is to improve upon existing techniques by incorporating both the within- and between-sample variations.

We propose to use the beta-binomial distribution to test the significance of differential protein abundances expressed in spectral counts in label-free mass spectrometry-based proteomics. The beta-binomial test naturally normalizes for total sample count. Experimental results show that the beta-binomial test performs favorably in comparison with other methods on several datasets in terms of both true detection rate and false positive rate. In addition, it can be applied for experiments with one or more replicates, and for multiple condition comparisons. Finally, we have implemented a software package for parameter estimation of two beta-binomial models and the associated statistical tests.
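To make the model concrete, here is a minimal Python sketch of the beta-binomial log-probability for one protein. It is not the authors' R package. The function names and the `alpha`/`beta` parameterization are assumptions: `k` stands for a protein's spectral count in a sample and `n` for that sample's total spectral count, which is how the total-count normalization enters.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_choose(n, k):
    """Logarithm of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def betabinom_logpmf(k, n, alpha, beta):
    """log P(K = k) when K ~ Binomial(n, p) and p ~ Beta(alpha, beta).

    The binomial part models within-sample variation; the Beta prior on p
    models variation of the protein's proportion between samples.
    """
    return (log_choose(n, k)
            + log_beta(k + alpha, n - k + beta)
            - log_beta(alpha, beta))


def betabinom_loglik(counts, totals, alpha, beta):
    """Log-likelihood of one protein's counts across replicate samples."""
    return sum(betabinom_logpmf(k, n, alpha, beta)
               for k, n in zip(counts, totals))


if __name__ == "__main__":
    # Hypothetical counts for one protein in three replicates.
    print(betabinom_loglik([12, 9, 15], [5000, 4200, 6100], 2.0, 900.0))
    print(exp(betabinom_logpmf(3, 10, 1.0, 1.0)))
```

A significance test could then compare the maximized likelihood under a single shared parameter set against separate parameters per condition; the source does not state the exact test statistic the authors use, so that step is left out here.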

Availability: A software package implemented in R is freely available for download at

March 29, 2010

webMGR: an online tool for the multiple genome rearrangement problem

Filed under: Bioinformatics,Computational Biology,Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 12:00 am

The algorithm MGR enables the reconstruction of rearrangement phylogenies based on gene or synteny block order in multiple genomes. Although MGR has been successfully applied to study the evolution of different sets of species, its utilization has been hampered by the prohibitive running time for some applications. In the current work, we have designed new heuristics that significantly speed up the tool without compromising its accuracy. Moreover, we have developed a web server (webMGR) that includes elaborate web output to facilitate navigation through the results.
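As a small illustration of comparing gene orders, the Python sketch below counts breakpoints between two signed gene orders. This is a simpler, related measure and not the MGR heuristic itself. The function names and the convention of writing a genome as a list of signed integers are assumptions made for the example.

```python
def adjacencies(order):
    """Return the set of adjacencies of a signed linear gene order.

    The order is framed with 0 and n + 1 so that the two chromosome ends
    count as adjacencies. Each adjacency is stored in both reading
    directions: (a, b) and its reverse complement (-b, -a).
    """
    n = len(order)
    framed = [0] + list(order) + [n + 1]
    adj = set()
    for a, b in zip(framed, framed[1:]):
        adj.add((a, b))
        adj.add((-b, -a))
    return adj


def breakpoint_count(order_a, order_b):
    """Count adjacencies of order_a that are not conserved in order_b."""
    adj_b = adjacencies(order_b)
    n = len(order_a)
    framed = [0] + list(order_a) + [n + 1]
    return sum((a, b) not in adj_b for a, b in zip(framed, framed[1:]))


if __name__ == "__main__":
    reference = [1, 2, 3, 4]
    inverted = [1, -3, -2, 4]  # genes 2 and 3 reversed as one block
    print(breakpoint_count(reference, inverted))
```

A single inversion introduces two breakpoints here, one at each end of the reversed block.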

webMGR can be accessed via

March 28, 2010

Tablet—next generation sequence assembly visualization

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 12:00 am

Tablet is a lightweight, high-performance graphical viewer for next-generation sequence assemblies and alignments. Supporting a range of input assembly formats, Tablet provides high-quality visualizations showing data in packed or stacked views, allowing instant access and navigation to any region of interest, and whole contig overviews and data summaries. Tablet is both multi-core aware and memory efficient, allowing it to handle assemblies containing millions of reads, even on a 32-bit desktop machine.
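The difference between a packed and a stacked layout can be sketched in Python. This is a guess at the general idea, not Tablet's implementation: it assumes that "stacked" gives every read its own row and "packed" places reads into the first row where they fit. The function names, the `(name, start, end)` tuples with end-exclusive coordinates, and the `gap` parameter are all hypothetical.

```python
def stack_reads(reads):
    """Stacked layout: one row per read, ordered by start position."""
    ordered = sorted(reads, key=lambda r: (r[1], r[2]))
    return [(name, row) for row, (name, _, _) in enumerate(ordered)]


def pack_reads(reads, gap=1):
    """Packed layout: greedily reuse the first row with enough free space.

    reads is a list of (name, start, end) with end-exclusive coordinates;
    gap is the minimum number of empty columns between reads in a row.
    """
    row_ends = []  # end coordinate of the last read placed in each row
    layout = []
    for name, start, end in sorted(reads, key=lambda r: (r[1], r[2])):
        for row, row_end in enumerate(row_ends):
            if start >= row_end + gap:
                row_ends[row] = end
                layout.append((name, row))
                break
        else:
            # No existing row has room, so open a new one.
            row_ends.append(end)
            layout.append((name, len(row_ends) - 1))
    return layout


if __name__ == "__main__":
    reads = [("r1", 0, 10), ("r2", 5, 15), ("r3", 11, 20), ("r4", 16, 25)]
    print(pack_reads(reads))
    print(stack_reads(reads))
```

With these four reads the packed layout needs two rows where the stacked layout needs four.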

Availability: Tablet is freely available for Microsoft Windows, Apple Mac OS X, Linux and Solaris.

March 27, 2010

MARTA: a suite of Java-based tools for assigning taxonomic status to DNA sequences

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 12:00 am

We have created a suite of Java-based software to better provide taxonomic assignments to DNA sequences. We anticipate that the program will be useful for protistologists, virologists, mycologists and other microbial ecologists. The program relies on NCBI utilities including the BLAST software and Taxonomy database and is easily manipulated at the command-line to specify a BLAST candidate’s query-coverage or percent identity requirements; other options include the ability to set minimal consensus requirements (%) for each of the eight major taxonomic ranks (Domain, Kingdom, Phylum, …) and whether to consider lower scoring candidates when the top-hit lacks taxonomic classification.
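The per-rank consensus idea can be sketched as follows. The sketch is in Python rather than Java and is not MARTA's code: the function name, the dictionary layout for BLAST candidate lineages, and the rule of stopping at the first rank that misses its threshold are assumptions. Only the three ranks named above are used, because the source does not list the other five.

```python
from collections import Counter

# Only the ranks the source names explicitly.
RANKS = ["domain", "kingdom", "phylum"]


def consensus_lineage(hit_lineages, min_consensus, ranks=RANKS):
    """Assign a taxon at each rank while the candidates agree enough.

    hit_lineages: one dict per BLAST candidate, mapping rank -> taxon name.
    min_consensus: dict mapping rank -> minimal agreement in percent.
    Assignment stops at the first rank that falls below its threshold.
    """
    assigned = {}
    for rank in ranks:
        names = [lineage[rank] for lineage in hit_lineages if lineage.get(rank)]
        if not names:
            break
        name, count = Counter(names).most_common(1)[0]
        # Candidates lacking this rank count against the consensus.
        if 100.0 * count / len(hit_lineages) < min_consensus[rank]:
            break
        assigned[rank] = name
    return assigned


if __name__ == "__main__":
    hits = [
        {"domain": "Eukaryota", "kingdom": "Fungi", "phylum": "Ascomycota"},
        {"domain": "Eukaryota", "kingdom": "Fungi", "phylum": "Basidiomycota"},
        {"domain": "Eukaryota", "kingdom": "Metazoa", "phylum": "Chordata"},
    ]
    thresholds = {"domain": 100, "kingdom": 60, "phylum": 60}
    print(consensus_lineage(hits, thresholds))
```

In this example the three candidates agree on the domain, two of three agree on the kingdom, and none agree on the phylum, so the assignment stops at kingdom level.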


March 26, 2010

Bisque: a platform for bioimage analysis and management

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 4:49 am

Advances in the field of microscopy have brought about the need for better image management and analysis solutions. Novel imaging techniques have created vast stores of images and metadata that are difficult to organize, search, process and analyze. These tasks are further complicated by conflicting and proprietary image and metadata formats that impede analyzing and sharing of images and any associated data. These obstacles have resulted in research resources being locked away in digital media and file cabinets. Current image management systems do not address the pressing needs of researchers who must quantify image data on a regular basis.

Results: We present Bisque, a web-based platform specifically designed to provide researchers with organizational and quantitative analysis tools for 5D image data. Users can extend Bisque with both data model and analysis extensions in order to adapt the system to local needs. Bisque’s extensibility stems from two core concepts: flexible metadata facility and an open web-based architecture. Together these empower researchers to create, develop and share novel bioimage analyses. Several case studies using Bisque with specific applications are presented as an indication of how users can expect to extend Bisque for their own purposes.

Availability: Bisque is web based, cross-platform and open source. The system is also available as software-as-a-service through the Center of Bioimage Informatics at UCSB.

March 21, 2010

FineStr: a web server for single-base-resolution nucleosome positioning

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 12:57 pm

The DNA in eukaryotic cells is packed into the chromatin that is composed of nucleosomes. Positioning of the nucleosome core particles on the sequence is a problem of great interest because of the role nucleosomes play in different cellular processes including gene regulation.

Using the sequence structure of the 10.4-base DNA repeat presented in our previous work and a database of nucleosome core DNA sequences, we have derived the complete nucleosome DNA bendability matrix of Caenorhabditis elegans.

We have developed a web server named FineStr that allows users to upload genomic sequences in FASTA format and to perform a single-base-resolution nucleosome mapping on them.

Availability: FineStr server is freely available for use on the web at http:/ The site contains a help file with explanation regarding the exact usage.

March 20, 2010

BEDTools: a flexible suite of utilities for comparing genomic features

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 12:58 pm

Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner.

Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) formats. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. The individual BEDTools utilities can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets.
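The core operation, finding overlaps between two sets of features, can be sketched in a few lines of Python. This is a naive illustration and not how BEDTools (written in C++) is implemented. The function names are hypothetical, and the sketch relies on the BED convention of 0-based, end-exclusive coordinates.

```python
def parse_bed(text):
    """Parse BED text into (chrom, start, end, name) tuples."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, *rest = line.split("\t")
        features.append((chrom, int(start), int(end), rest[0] if rest else "."))
    return features


def intersect(features_a, features_b):
    """Return the overlapping segment for every overlapping A/B pair.

    Naive all-against-all comparison per chromosome; adequate for small
    inputs, far too slow for whole-genome sequencing datasets.
    """
    by_chrom = {}
    for feature in features_b:
        by_chrom.setdefault(feature[0], []).append(feature)
    hits = []
    for chrom, a_start, a_end, a_name in features_a:
        for _, b_start, b_end, b_name in by_chrom.get(chrom, []):
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # end-exclusive, so touching features do not overlap
                hits.append((chrom, start, end, a_name, b_name))
    return hits


if __name__ == "__main__":
    genes = parse_bed("chr1\t100\t200\tgeneA\nchr1\t300\t400\tgeneB\nchr2\t50\t80\tgeneC")
    peaks = parse_bed("chr1\t150\t350\tpeak1\nchr2\t80\t120\tpeak2")
    for hit in intersect(genes, peaks):
        print("\t".join(map(str, hit)))
```

Printing tab-separated lines keeps the output in a form that standard UNIX tools such as sort or cut can consume, in the spirit of the pipelines described above.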

Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at

March 19, 2010

Pandora, a PAthway and Network DiscOveRy Approach based on common biological evidence

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 12:05 pm

Many biological phenomena involve extensive interactions between many of the biological pathways present in cells. However, extraction of all the inherent biological pathways remains a major challenge in systems biology. With the advent of high-throughput functional genomic techniques, it is now possible to infer biological pathways and pathway organization in a systematic way by integrating disparate biological information.

Results: Here, we propose a novel integrated approach that uses network topology to predict biological pathways. We integrated four types of biological evidence (protein–protein interaction, genetic interaction, domain–domain interaction and semantic similarity of Gene Ontology terms) to generate a functionally associated network. This network was then used to develop a new pathway finding algorithm to predict biological pathways in yeast. Our approach discovered 195 biological pathways and 31 functionally redundant pathway pairs in yeast. By comparing our identified pathways to three public pathway databases (KEGG, BioCyc and Reactome), we observed that our approach achieves a maximum positive predictive value of 12.8% and improves on other predictive approaches. This study allows us to reconstruct biological pathways and delineate cellular machinery in a systematic view.

Availability: The method has been implemented in Perl and is available for downloading from It is distributed under the terms of GPL (

March 17, 2010

BRAT: bisulfite-treated reads analysis tool

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 1:05 pm

We present a new, accurate and efficient tool for mapping short reads obtained from the Illumina Genome Analyzer following sodium bisulfite conversion. Our tool, BRAT, supports single and paired-end reads and handles input files containing reads and mates of different lengths. BRAT is faster, maps more unique paired-end reads and has higher accuracy than existing programs. The software package includes tools to end-trim low-quality bases of the reads and to report nucleotide counts for mapped reads on the reference genome.
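End-trimming of low-quality bases can be illustrated with a short Python sketch. It does not reproduce BRAT's trimming tool. The function name, the default threshold of 20, and the simple rule of removing bases from each end until one meets the threshold are assumptions made for the example.

```python
def end_trim(seq, quals, min_quality=20):
    """Trim bases below min_quality from both ends of a read.

    seq is the base string and quals the per-base quality scores as
    integers. Low-quality bases in the interior of the read are kept;
    only the ends are trimmed.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    start, end = 0, len(seq)
    while start < end and quals[start] < min_quality:
        start += 1
    while end > start and quals[end - 1] < min_quality:
        end -= 1
    return seq[start:end], quals[start:end]


if __name__ == "__main__":
    read = "ACGTACGT"
    scores = [5, 10, 30, 30, 12, 30, 8, 2]
    print(end_trim(read, scores))
```

For paired-end data, each mate would be trimmed independently, which is one way reads and mates of different lengths can arise.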

Availability: The source code is freely available for download at and is distributed as Open Source software under the GPLv3.0.
