September 29, 2010

2 Days workshop on Gene Regulation Analysis Using Computational Tools

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 5:27 am

2 Days Workshop on Gene Regulation Analysis Using Computational Tools
23rd – 24th October 2010
Regulation of gene expression at the post-transcriptional level, including the control of splicing, localization, and translation, is widespread in eukaryotes. Several tools have been developed to study the regulation of gene expression in detail. The goal of our workshop is to introduce participants to the basics of gene regulation mechanisms and give them hands-on experience with in silico gene regulation analysis tools. The workshop will present recent work on post-transcriptional regulation to the molecular biology community, along with some of the unique computational developments in this area. In particular, we will focus on emerging computational and large-scale experimental strategies in gene expression regulation and on emerging trends in the bio industry.
Day 1 (23rd October 2010)
10:00 am – 10:30 am Inaugural and Introduction Session
10:30 am – 11:30 am Evolution of Life: An Introduction to Gene Expression
11:30 am – 1:00 pm Regulation of Gene Expression
1:00 pm – 2:00 pm Lunch Break
2:00 pm – 3:00 pm The Operon Model of Gene Regulation
3:00 pm – 3:45 pm Transcription Start Sites and Promoters
3:45 pm – 4:00 pm Tea Break
4:00 pm – 5:30 pm Hands-on Session – Promoter Prediction Tools
Day 2 (24th October 2010)
10:00 am – 11:00 am Post-Transcriptional Gene Regulation
11:00 am – 12:00 pm Introduction to Systems Biology
12:00 pm – 1:00 pm Protein-Protein Interactions and Related Databases – The Bioinformatics Approach
1:00 pm – 2:00 pm Lunch Break
2:00 pm – 3:00 pm Hands-on Session – Protein-Protein Interaction Databases
3:00 pm – 4:00 pm Hands-on Session – Systems Biology Tools
4:00 pm – 4:15 pm Tea Break
4:15 pm – 5:00 pm Gene Regulation – The Industry Aspect
5:00 pm – 5:30 pm Valedictory Session
Candidates pursuing or having completed graduation or post-graduation in the fields of Biotechnology, Bioinformatics, Chemistry, Zoology, Botany, Biochemistry, Microbiology, Genetic Engineering and Computer
Lectures, Demos, Hands-on Session
Last Date of Registration: 20th October 2010
On-Spot Registration: 23rd October 2010
Workshop Duration: 23rd – 24th October 2010
Registration forms can be collected from the Biointelligence Center, Indore or can be downloaded from the
First Floor, Devdarshan Building,
20/1 South Tukoganj,
Opp. Sanghi Motors (TATA), AB Road, Palasia Square,
Indore – 452 001 (M.P.)
Registration Details                          Fee
Registration for Students                     Rs. 1800
Registration for PhD / Research Associate /   Rs. 2500
• Lunch and snacks charges are included in the fee.
• The fee is payable by DD or cash, along with the completed registration form.
• On-spot registration will close 30 minutes before the inaugural session.
• Each participant will be provided with a workshop kit containing study material.
• Accommodation is available on prior request and advance payment.
• A Certificate of Participation will be provided to all registered participants.
• Candidates sponsored by Biointelligence must pay lunch charges or arrange their own lunch.
First Floor, Devdarshan Building,
20/1 South Tukoganj,
Opp. Sanghi Motors (TATA), AB Road, Palasia Square,
Indore – 452 001 (M.P.)
Ms Shruti Bhide
Desk Phone: 0731 – 4202793
Mobile: 98932-84855

August 19, 2010

METAL: fast and efficient meta-analysis of genomewide association scans

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 5:29 am

METAL provides a computationally efficient tool for meta-analysis of genome-wide association scans, which is a commonly used approach for improving power in complex-trait gene mapping studies. METAL provides a rich scripting interface and implements efficient memory management to allow analyses of very large data sets and to support a variety of input file formats.
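To make the idea of meta-analysis concrete, here is a minimal Python sketch of inverse-variance weighting, one widely used way to pool per-study effect estimates for a single marker. It is an illustration only: the function name and inputs are hypothetical, and nothing here is taken from METAL's source code or its scripting interface.

```python
import math


def inverse_variance_meta(effects, std_errors):
    """Pool per-study effect estimates for one marker (fixed-effects sketch).

    effects    : effect size reported by each study (hypothetical input)
    std_errors : standard error of each effect size
    Returns (pooled_effect, pooled_se, z_score, two_sided_p).
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need one standard error per effect estimate")

    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / (se * se) for se in std_errors]
    total_weight = sum(weights)

    pooled_effect = sum(w * b for w, b in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z_score = pooled_effect / pooled_se

    # Two-sided p-value under a standard normal null distribution.
    p_value = math.erfc(abs(z_score) / math.sqrt(2.0))
    return pooled_effect, pooled_se, z_score, p_value
```

Two studies that each report an effect of 0.2 with standard error 0.1 pool to the same effect with a smaller standard error, which is where the gain in power comes from.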

Availability and implementation: METAL, including source code, documentation, examples, and executables, is available at

August 18, 2010

Dealing with sparse data in predicting outcomes of HIV combination therapies

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 6:26 am

As there is no cure or vaccine for infection with human immunodeficiency virus (HIV), the standard approach to treating HIV patients is to repeatedly administer different combinations of several antiretroviral drugs. Because of the large number of possible drug combinations, manually finding a successful regimen becomes practically impossible. This presents a major challenge for HIV treatment. The application of machine learning methods for predicting virological responses to potential therapies is a possible approach to solving this problem. However, due to evolving trends in treating HIV patients, the available clinical datasets have a highly unbalanced representation of therapies, which might negatively affect the usefulness of derived statistical models.

Results: This article presents an approach that tackles the problem of predicting virological response to combination therapies by learning a separate logistic regression model for each therapy. The models are fitted by using not only the data from the target therapy but also the information from similar therapies. For this purpose, we introduce and evaluate two different measures of therapy similarity. The models are also able to incorporate phenotypic knowledge on the therapy outcomes through a Gaussian prior. With our approach we balance the uneven therapy representation in the datasets and produce higher quality models for therapies with very few training samples. According to the results from the computational experiments our therapy similarity model performs significantly better than training separate models for each therapy by using solely their examples. Furthermore, the model’s performance is as good as an approach that encodes therapy information in the input feature space with the advantage of delivering better results for therapies with very few training samples.
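The general idea of borrowing data from similar therapies can be sketched in Python as a logistic regression in which every training sample carries a weight given by the similarity of its therapy to the target therapy, with a Gaussian prior pulling the coefficients toward a prior mean. This is a sketch under assumptions: the Jaccard similarity on drug sets is a stand-in (the abstract does not say which two similarity measures the authors use), plain gradient descent replaces whatever optimizer the authors implemented, and all names are hypothetical.

```python
import math


def jaccard(drugs_a, drugs_b):
    """Stand-in therapy similarity: overlap of the two drug sets (assumption)."""
    union = drugs_a | drugs_b
    return len(drugs_a & drugs_b) / len(union) if union else 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_logistic(samples, labels, sample_weights, prior_mean,
                          prior_precision=1.0, lr=0.1, epochs=500):
    """Minimise the weighted negative log-likelihood plus a Gaussian prior term.

    samples        : list of feature vectors (include a constant 1.0 for a bias)
    labels         : 0/1 therapy outcomes
    sample_weights : similarity of each sample's therapy to the target therapy
    prior_mean     : coefficient values favoured by the Gaussian prior
    """
    w = list(prior_mean)
    for _ in range(epochs):
        # Gradient of the Gaussian prior term: precision * (w - prior_mean).
        grad = [prior_precision * (wj - mj) for wj, mj in zip(w, prior_mean)]
        for x, y, s in zip(samples, labels, sample_weights):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j, xj in enumerate(x):
                # Similar therapies (large s) contribute more to the fit.
                grad[j] += s * (p - y) * xj
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w
```

Samples from the target therapy itself get weight 1.0 under this similarity, while samples from therapies that share only some drugs are down-weighted rather than discarded, which is what helps therapies with very few training samples.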

Availability: Code of the efficient logistic regression is available from

August 17, 2010

A probabilistic framework for aligning paired-end RNA-seq data

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 6:56 am

The RNA-seq paired-end read (PER) protocol samples transcript fragments longer than the sequencing capability of today’s technology by sequencing just the two ends of each fragment. Deep sampling of the transcriptome using the PER protocol presents the opportunity to reconstruct the unsequenced portion of each transcript fragment using end reads from overlapping PERs, guided by the expected length of the fragment.

Methods: A probabilistic framework is described to predict the alignment to the genome of all PER transcript fragments in a PER dataset. Starting from possible exonic and spliced alignments of all end reads, our method constructs potential splicing paths connecting paired ends. An expectation maximization method assigns likelihood values to all splice junctions and assigns the most probable alignment for each transcript fragment.
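The expectation maximization step can be illustrated with a deliberately simplified Python sketch. It assumes each fragment comes with a short list of candidate alignments and that each candidate is identified by a single splice junction label; the real method works on splicing paths in a splice graph, so this toy model and its names are assumptions, not the authors' implementation.

```python
def em_assign(fragment_candidates, iterations=50):
    """Toy EM for ambiguous fragment alignment.

    fragment_candidates : one tuple per fragment listing the distinct junction
                          labels of its candidate alignments (hypothetical input)
    Returns (theta, best) where theta maps each junction to its estimated
    probability and best holds the most probable candidate for each fragment.
    """
    junctions = sorted({j for cands in fragment_candidates for j in cands})
    theta = {j: 1.0 / len(junctions) for j in junctions}
    n_fragments = len(fragment_candidates)

    for _ in range(iterations):
        expected = {j: 0.0 for j in junctions}
        # E-step: split each fragment over its candidates in proportion
        # to the current junction probabilities.
        for cands in fragment_candidates:
            total = sum(theta[j] for j in cands)
            for j in cands:
                expected[j] += theta[j] / total
        # M-step: re-estimate junction probabilities from expected counts.
        theta = {j: expected[j] / n_fragments for j in junctions}

    best = [max(cands, key=lambda j: theta[j]) for cands in fragment_candidates]
    return theta, best
```

With fragments `[("J1",), ("J1",), ("J1", "J2"), ("J2",)]`, the ambiguous third fragment is pulled toward `J1` because the unambiguous fragments give that junction more support.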

Results: The method was applied to 2 x 35 bp PER datasets from cancer cell lines MCF-7 and SUM-102. PER fragment alignment increased the coverage 3-fold compared to the alignment of the end reads alone, and increased the accuracy of splice detection. The accuracy of the expectation maximization (EM) algorithm in the presence of alternative paths in the splice graph was validated by qRT–PCR experiments on eight exon-skipping alternative splicing events. PER fragment alignment with long-range splicing confirmed 8 out of 10 fusion events identified in the MCF-7 cell line in an earlier study (Maher et al., 2009).

Availability: Software available at

August 16, 2010

Bridges: a tool for identifying local similarities in long sequences

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 11:52 am

Bridges is a heuristic search tool that uses short word matches to rapidly identify local similarities between sequences. It consists of three stages: filtering input sequences, identifying local similarities and post-processing local similarities. As input sequence data are released from memory after the filtering stage, genome-scale datasets can be efficiently compared in a single run. Bridges also includes 20 parameters, which enable the user to dictate the sensitivity and specificity of a search.
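The "short word matches" idea can be sketched in Python as follows: index every word of length k in one sequence, look up each word of the other sequence, and merge consecutive hits that lie on the same diagonal into one exact-match region. This is a generic seed-matching sketch, not Bridges itself (which is written in C and adds filtering and post-processing stages); the function names and the default word length are assumptions.

```python
from collections import defaultdict


def word_index(seq, k):
    """Map every length-k word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def seed_matches(query, target, k=4):
    """Return exact-match regions as (query_start, target_start, length)."""
    index = word_index(target, k)

    # Group word hits by diagonal (target position minus query position).
    diagonals = defaultdict(list)
    for q in range(len(query) - k + 1):
        for t in index.get(query[q:q + k], ()):
            diagonals[t - q].append(q)

    regions = []
    for diag, starts in diagonals.items():
        run_start = prev = starts[0]
        for q in starts[1:]:
            if q != prev + 1:
                # A gap on this diagonal closes the current region.
                regions.append((run_start, run_start + diag, prev - run_start + k))
                run_start = q
            prev = q
        regions.append((run_start, run_start + diag, prev - run_start + k))
    return sorted(regions)
```

A larger k gives fewer, more specific hits, and a smaller k gives more sensitive but noisier results, which is the kind of sensitivity and specificity trade-off that search parameters control.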

Availability: Bridges is implemented in the C programming language and can be run on all platforms. Source code and documentation are available at

August 11, 2010

Over-optimism in bioinformatics: an illustration

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 5:45 am

In statistical bioinformatics research, different optimization mechanisms potentially lead to ‘over-optimism’ in published papers. So far, however, a systematic critical study concerning the various sources underlying this over-optimism is lacking.

Results: We present an empirical study on over-optimism using high-dimensional classification as example. Specifically, we consider a ‘promising’ new classification algorithm, namely linear discriminant analysis incorporating prior knowledge on gene functional groups through an appropriate shrinkage of the within-group covariance matrix. While this approach yields poor results in terms of error rate, we quantitatively demonstrate that it can artificially seem superior to existing approaches if we ‘fish for significance’. The investigated sources of over-optimism include the optimization of datasets, of settings, of competing methods and, most importantly, of the method’s characteristics. We conclude that, if the improvement of a quantitative criterion such as the error rate is the main contribution of a paper, the superiority of new algorithms should always be demonstrated on independent validation data.
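A small Python simulation illustrates one of these sources, the optimization of settings. It is a toy setup and not the authors' R code: the labels are pure noise, each "setting" is a classifier that thresholds one random feature, and the best setting is picked on the same data used to report its error. All names and sizes are assumptions.

```python
import random


def make_null_data(rng, n_samples, n_features):
    """Random features and random labels: there is nothing to learn."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [rng.randint(0, 1) for _ in range(n_samples)]
    return X, y


def error_rate(X, y, feature):
    """Error of the rule 'predict class 1 when the chosen feature is positive'."""
    wrong = sum(1 for row, label in zip(X, y) if (row[feature] > 0) != (label == 1))
    return wrong / len(y)


def over_optimism_demo(seed=0, n_samples=40, n_settings=200):
    rng = random.Random(seed)
    X, y = make_null_data(rng, n_samples, n_settings)

    # "Fishing": try every setting and keep the one with the lowest error.
    errors = [error_rate(X, y, f) for f in range(n_settings)]
    best = min(range(n_settings), key=errors.__getitem__)

    # Honest check: evaluate the chosen setting on independent data.
    X_val, y_val = make_null_data(rng, n_samples, n_settings)
    return errors[best], error_rate(X_val, y_val, best)
```

The first returned value is the error a paper would report after tuning; the second is the error of the same setting on independent validation data, which for noise labels is expected to sit near 0.5.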

Availability: The R codes and relevant data can be downloaded from, such that the study is completely reproducible

August 10, 2010

Savant: genome browser for high-throughput sequencing data

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 8:17 am

The advent of high-throughput sequencing (HTS) technologies has made it affordable to sequence many individuals’ genomes. At the same time, the computational analysis of the large volumes of data generated by the new sequencing machines remains a challenge. While a plethora of tools are available to map the resulting reads to a reference genome, and to conduct primary analysis of the mappings, it is often necessary to visually examine the results and underlying data to confirm predictions and understand the functional effects, especially in the context of other datasets.

Results: We introduce Savant, the Sequence Annotation, Visualization and ANalysis Tool, a desktop visualization and analysis browser for genomic data. Savant was developed for visualizing and analyzing HTS data, with special care taken to enable dynamic visualization in the presence of gigabases of genomic reads and references the size of the human genome. Savant supports the visualization of genome-based sequence, point, interval and continuous datasets, and multiple visualization modes that enable easy identification of genomic variants (including single nucleotide polymorphisms, structural and copy number variants), and functional genomic information (e.g. peaks in ChIP-seq data) in the context of genomic annotations.

Availability: Savant is freely available at

August 7, 2010

CplexA: a Mathematica package to study macromolecular-assembly control of gene expression

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 7:38 am

Macromolecular assembly coordinates essential cellular processes, such as gene regulation and signal transduction. A major challenge for conventional computational methods to study these processes is tackling the exponential increase of the number of configurational states with the number of components. CplexA is a Mathematica package that uses functional programming to efficiently compute probabilities and average properties over such an exponentially large number of states from the energetics of the interactions. The package is particularly suited to study gene expression at complex promoters controlled by multiple, local and distal, DNA binding sites for transcription factors.
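To show what "probabilities over states from the energetics" means, the Python sketch below enumerates every occupancy state of a set of binding sites and weights each state by its Boltzmann factor. This brute-force enumeration is exactly the conventional approach whose cost grows as 2^n with the number of sites, so it illustrates the problem rather than CplexA's method; the energy model (per-site energies plus pairwise interaction energies) and all names are assumptions.

```python
import math
from itertools import product


def state_energy(state, site_energies, pair_energies):
    """Energy of one occupancy state: bound sites plus pairwise interactions."""
    energy = sum(site_energies[i] for i, bound in enumerate(state) if bound)
    for (i, j), e_ij in pair_energies.items():
        if state[i] and state[j]:
            energy += e_ij
    return energy


def occupancy_probabilities(site_energies, pair_energies, kT=1.0):
    """Return (state probabilities, per-site occupancy) by full enumeration."""
    n_sites = len(site_energies)
    states = list(product((0, 1), repeat=n_sites))  # 2**n_sites states

    # Boltzmann weight of each state and the partition function Z.
    weights = [math.exp(-state_energy(s, site_energies, pair_energies) / kT)
               for s in states]
    Z = sum(weights)

    probs = {s: w / Z for s, w in zip(states, weights)}
    occupancy = [sum(p for s, p in probs.items() if s[i]) for i in range(n_sites)]
    return probs, occupancy
```

For two sites with zero binding energy and no interaction, all four states are equally likely; adding a favourable interaction such as `{(0, 1): -2.0}` shifts probability toward the doubly bound state.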

Availability: CplexA is freely available together with documentation at

August 6, 2010

EpiTOP—a proteochemometric tool for MHC class II binding prediction

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 10:28 am

T-cell epitope identification is a critical immunoinformatic problem within vaccine design. To be an epitope, a peptide must bind an MHC protein.

Results: Here, we present EpiTOP, the first server predicting MHC class II binding based on proteochemometrics, a QSAR approach for ligands binding to several related proteins. EpiTOP uses a quantitative matrix to predict binding to 12 HLA-DRB1 alleles. It identifies 89% of known epitopes within the top 20% of predicted binders, reducing laboratory labour, materials and time by 80%. EpiTOP is easy to use, gives comprehensive quantitative predictions and will be expanded and updated with new quantitative matrices over time.
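Scoring with a quantitative matrix can be sketched in Python as follows: slide a window over the protein, sum the position-specific contribution of each residue, rank the windows, and keep the top fraction. The matrix values, window width and function names below are invented for illustration and are not EpiTOP's matrices or its server interface.

```python
import math


def score_peptide(peptide, matrix):
    """Sum the position-specific contributions of each residue.

    matrix : one dict per peptide position mapping residue -> contribution;
             residues missing from a position contribute 0.0 (toy values).
    """
    return sum(matrix[pos].get(res, 0.0) for pos, res in enumerate(peptide))


def top_binders(protein, matrix, fraction=0.2):
    """Return the highest-scoring windows, best first (top `fraction` of all)."""
    width = len(matrix)
    windows = [protein[i:i + width] for i in range(len(protein) - width + 1)]
    ranked = sorted(windows, key=lambda p: score_peptide(p, matrix), reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]
```

Only the retained windows would need to be tested in the laboratory, which is the sense in which a top-20% cut-off saves experimental work.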

Availability: EpiTOP is freely accessible at

August 4, 2010

MiRror: a combinatorial analysis web tool for ensembles of microRNAs and their targets

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 7:33 am

The miRror application provides insights on microRNA (miRNA) regulation. It is based on the notion of combinatorial regulation by an ensemble of miRNAs or genes. miRror integrates predictions from a dozen miRNA resources that are based on complementary algorithms into a unified statistical framework. For a set of miRNAs given as input, the online tool provides a list of targets, drawn from the set of resources selected by the user and ranked by their significance of being coordinately regulated. Symmetrically, a set of genes can be used as input to suggest a set of miRNAs. The user can restrict the analysis to a preferred tissue or cell line. miRror is suitable for analyzing results from miRNA profiling, proteomics and gene expression arrays.
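One way to picture ranking targets by coordinated regulation is the Python sketch below. It counts how many of the input miRNAs are predicted to target each gene and scores that count with a hypergeometric tail probability. The hypergeometric test is an assumption made for illustration, since the text above does not describe miRror's statistical framework, and all names are hypothetical.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when drawing n items from N, of which K are successes."""
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return total / comb(N, n)


def rank_targets(input_mirnas, predictions, n_all_mirnas):
    """Rank genes by how surprising their overlap with the input miRNA set is.

    input_mirnas : miRNAs supplied by the user
    predictions  : gene -> set of miRNAs predicted to target it (toy input)
    n_all_mirnas : size of the miRNA universe considered
    Returns a list of (gene, hits, p_value), most significant first.
    """
    selected = set(input_mirnas)
    rows = []
    for gene, regulators in predictions.items():
        hits = len(selected & regulators)
        p = hypergeom_tail(hits, n_all_mirnas, len(regulators), len(selected))
        rows.append((p, gene, hits))
    rows.sort()
    return [(gene, hits, p) for p, gene, hits in rows]
```

Swapping the roles of genes and miRNAs in the input gives the symmetric query, where a set of genes suggests a set of miRNAs.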

