Biointelligence

March 20, 2010

BEDTools: a flexible suite of utilities for comparing genomic features

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 12:58 pm

Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner.

Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) formats. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. The BEDTools utilities can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets.

Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools
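
To make the idea of "overlap between features" concrete, here is a minimal Python sketch. It is not BEDTools code and does not reflect how BEDTools is implemented; it only illustrates the overlap test on BED-style records, which use 0-based, half-open coordinates. The function names and the sample records are made up for illustration, and the naive pairwise comparison would be far too slow for real sequencing datasets.

```python
from collections import defaultdict


def parse_bed(lines):
    """Parse tab-separated BED lines into (chrom, start, end, name) tuples.

    BED coordinates are 0-based and half-open: [start, end).
    """
    features = []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments and browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else "."
        features.append((chrom, start, end, name))
    return features


def intersect(a_features, b_features):
    """Return every (a, b) pair sharing at least one base (naive comparison)."""
    b_by_chrom = defaultdict(list)
    for feat in b_features:
        b_by_chrom[feat[0]].append(feat)
    hits = []
    for a in a_features:
        for b in b_by_chrom[a[0]]:
            # Half-open intervals overlap when each starts before the other ends.
            if a[1] < b[2] and b[1] < a[2]:
                hits.append((a, b))
    return hits


# Hypothetical example records, not real annotation data.
reads = parse_bed([
    "chr1\t100\t200\tread1",
    "chr1\t500\t600\tread2",
    "chr2\t100\t200\tread3",
])
genes = parse_bed([
    "chr1\t150\t300\tgeneA",
    "chr1\t600\t700\tgeneB",
])
overlaps = intersect(reads, genes)
for a, b in overlaps:
    print(a[3], "overlaps", b[3])
```

Note that read2 ends at position 600 and geneB starts at 600; because the end coordinate is exclusive, the two touch but do not overlap.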

January 14, 2010

EGAN – Exploratory Gene Association Networks

Filed under: Bioinformatics,Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 7:27 am

EGAN (Exploratory Gene Association Networks) is a software tool that allows a domain expert (a biologist) to visualize and interpret the results of high-throughput exploratory assays in an interactive hypergraph of entities (genes), relationships (protein-protein interactions, literature co-occurrence, etc.) and meta-data (annotation, signaling pathways, etc.). EGAN provides comprehensive, automated calculation of meta-data coincidence (over-representation, enrichment) for user- and assay-defined entity subsets (gene lists), and provides direct links to web resources and literature (NCBI Entrez Gene, PubMed, KEGG, Google, etc.).
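
One common way to score over-representation of an annotation in a gene list is a one-sided hypergeometric test. Whether EGAN uses exactly this statistic is not stated here, so the Python sketch below is only an illustration of the general idea; the function name and the numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, overlap):
    """Probability of seeing at least `overlap` annotated genes in the sample.

    population: total number of genes considered
    annotated:  genes in the population carrying the annotation
    sample:     size of the gene list being tested
    overlap:    genes in the list that carry the annotation
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probabilities for k = overlap .. upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 genes in total, 5 annotated with a pathway,
# and a gene list of 4 in which 2 carry the annotation.
p = enrichment_p_value(population=20, annotated=5, sample=4, overlap=2)
print(round(p, 4))
```

A small p-value suggests the annotation appears in the gene list more often than chance alone would explain; in practice such values are usually corrected for multiple testing.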

EGAN has been built using Cytoscape libraries for graph visualization and layout, and is comparable to DAVID, GSEA, Ingenuity IPA and Ariadne Pathway Studio.

For further reading, see: http://akt.ucsf.edu/EGAN/

December 14, 2009

Applications of Systems Biology in Drug Discovery

Filed under: Bioinformatics,Chemoinformatics,Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 4:33 am

To date we have written many posts on Systems Biology, its applications and its scope. Indeed, Systems Biology has transformed cell biology and pathway analysis, and it proves even more useful when applied to the treatment of diseases and drug discovery. Here we discuss Systems Biology in combination with drug discovery.

The goal of modern systems biology is to understand physiology and disease from the level of molecular pathways, regulatory networks, cells, tissues, organs and ultimately the whole organism. As currently employed, the term ‘systems biology’ encompasses many different approaches and models for probing and understanding biological complexity, and studies of many organisms from bacteria to man. Much of the academic focus is on developing fundamental computational and informatics tools required to integrate large amounts of reductionist data (global gene expression, proteomic and metabolomic data) into models of regulatory networks and cell behavior. This is a hard problem, because biological complexity is an exponential function of the number of system components and the interactions between them, and it escalates at each additional level of organization.

Three advances stand out in the practical application of systems biology to drug discovery:

1. Informatic integration of ‘omics’ data sets (a bottom-up approach)

Omics approaches to systems biology focus on the building blocks of complex systems (genes, proteins and metabolites). These approaches have been adopted wholeheartedly by the drug industry to complement traditional approaches to target identification and validation, for generating hypotheses and for experimental analysis in traditional hypothesis-based methods.

2. Computer modeling of disease or organ system physiology from cell and organ response level information available in the literature (a top-down approach to target selection, clinical indication and clinical trial design)

The goal of modeling in systems biology is to provide a framework for hypothesis generation and prediction based on in silico simulation of human disease biology across the multiple distance and time scales of an organism. More detailed understanding of the systems behavior of intercellular signaling pathways, such as the identification of key nodes or regulatory points in networks or better understanding of crosstalk between pathways, can also help predict drug target effects and their translation to organ and organism level physiology.

3. The use of complex human cell systems themselves to interpret and predict the biological activities of drugs and gene targets (a direct experimental approach to cataloguing complex disease-relevant biological responses)

Pathway modeling as yet remains too disconnected from systemic disease biology to have a significant impact on drug discovery. Top-down modeling at the cell-to-organ and organism scale shows promise, but is extremely dependent on contextual cell response data. Moreover, to bridge the gap between omics and modeling, we need to collect a different type of cell biology data—data that incorporate the complexity and emergent properties of cell regulatory systems and yet ideally are reproducible and amenable to storing in databases, sharing and quantitative analysis.

In these ways, Systems Biology has aided drug discovery research and may help pave the way to treatments for many serious diseases.

Read our other posts on Systems Biology – https://biointelligence.wordpress.com/category/systems-biology/

November 12, 2009

KEGGConverter: Tool for modelling Metabolic Networks

Filed under: Bioinformatics,Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 7:47 am

The Kyoto Encyclopedia of Genes and Genomes (KEGG) PATHWAY database is a valuable comprehensive collection of manually curated pathway maps for metabolism, genetic information processing and other functions. It is an integrated database resource consisting of 16 main databases, broadly categorized into systems information, genomic information, and chemical information. Genomic and chemical information represents the molecular building blocks of life in the genomic and chemical spaces, respectively, and systems information represents functional aspects of the biological systems, such as the cell and the organism, that are built from the building blocks. KEGG has been widely used as a reference knowledge base for biological interpretation of large-scale datasets generated by sequencing and other high-throughput experimental technologies.

The KEGG Pathway database is a valuable collection of metabolic pathway maps. Nevertheless, producing simulation-capable metabolic networks from KEGG Pathway data is challenging and complicated work, despite the tools already developed for this purpose. Originally used for illustration purposes, KEGG Pathways, through KGML (KEGG Markup Language) files, can provide complete reaction sets and introduce species versioning, which offers advantages for simulation modelling of cellular metabolism.

KEGGconverter has been implemented to construct such metabolic pathways. It is a tool written in Java. KEGGconverter is capable of producing integrated analogues of metabolic pathways appropriate for simulation tasks, taking only KGML files as input. The web application acts as a user-friendly shell which transparently enables automated, biochemically correct pathway merging, conversion to SBML format, proper renaming of the species, and insertion of default kinetic properties for the pertaining reactions. It permits the inclusion of additional reactions in the resulting model which represent flux cross-talk with neighbouring pathways, thereby improving simulation accuracy.

KEGGconverter is available here: http://www.grissom.gr/keggconverter/
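
To give a feel for what "complete reaction sets" in a KGML file look like, here is a small Python sketch that reads reactions out of KGML-style XML. This is not KEGGconverter code. The element and attribute names (reaction, substrate, product, type) reflect my understanding of the KGML layout and should be checked against the official KGML documentation; the sample document and all identifiers in it are hand-written placeholders, not real KEGG entries.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical KGML-like fragment (placeholder identifiers).
SAMPLE_KGML = """<pathway name="path:example" org="xxx" number="00000">
  <entry id="1" name="cpd:A" type="compound"/>
  <entry id="2" name="cpd:B" type="compound"/>
  <entry id="3" name="cpd:C" type="compound"/>
  <reaction id="10" name="rn:example1" type="reversible">
    <substrate id="1" name="cpd:A"/>
    <product id="2" name="cpd:B"/>
  </reaction>
  <reaction id="11" name="rn:example2" type="irreversible">
    <substrate id="2" name="cpd:B"/>
    <product id="3" name="cpd:C"/>
  </reaction>
</pathway>"""


def extract_reactions(kgml_text):
    """Collect each reaction with its substrates, products and reversibility."""
    root = ET.fromstring(kgml_text)
    reactions = []
    for rxn in root.iter("reaction"):
        reactions.append({
            "name": rxn.get("name"),
            "reversible": rxn.get("type") == "reversible",
            "substrates": [s.get("name") for s in rxn.findall("substrate")],
            "products": [p.get("name") for p in rxn.findall("product")],
        })
    return reactions


reactions = extract_reactions(SAMPLE_KGML)
for rxn in reactions:
    arrow = "<=>" if rxn["reversible"] else "=>"
    print(rxn["name"], ":", " + ".join(rxn["substrates"]), arrow, " + ".join(rxn["products"]))
```

A reaction list like this is only a starting point; a simulation-ready model still needs the merging, renaming and kinetic properties described above.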

October 7, 2009

BioSystems: A New Database for Biological Systems

Filed under: Bioinformatics,Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 1:08 pm

Biological systems are formed when a group of molecules interact. One type of biological system is a biological pathway, which comprises interacting genes, proteins, and small molecules. An understanding of the components, products, and biological effects of biosystems can lead to better understanding of biological processes in normal and disease states, elucidation of possible drug effects and side effects, and other insights into complex processes that have implications for health and medicine.

NCBI has designed the BioSystems database, which provides centralized access to existing pathway databases.

Current source databases supported by the BioSystems database are:

1. KEGG: Kyoto Encyclopedia of Genes and Genomes (http://www.genome.jp/kegg/) by the Kanehisa Laboratory of the Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan.

2. BioCyc (http://biocyc.org/) is a collection of organism-specific pathway/genome databases (PGDBs), and the EcoCyc (http://ecocyc.org/) subset of BioCyc is included in the NCBI BioSystems database.

3. Reactome (http://www.reactome.org/) is a curated knowledge base of biological pathways, and the human subset of Reactome is included in the NCBI BioSystems database.

More about the BioSystems database can be read here: http://www.ncbi.nlm.nih.gov/Structure/biosystems/docs/biosystems_help.html

September 23, 2009

Reactome: A database for pathways and Reactions

Filed under: Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 7:20 am

While studying biological pathways and databases, I landed on the home page of the Reactome database. Indeed, it is a great creation. Here is a small introduction to “Reactome”.

Reactome is a free, online, open-source, curated resource of core pathways and reactions in human biology. It is maintained by the Reactome editorial staff and cross-referenced to the NCBI Entrez Gene, Ensembl and UniProt databases, the UCSC and HapMap Genome Browsers, the KEGG Compound and ChEBI small molecule databases, PubMed, and GO. Curated human data are used to infer orthologous events in 22 non-human species including mouse, rat, chicken, puffer fish, worm, fly, yeast, two plants and E. coli.

The Reactome website (www.reactome.org) can be browsed like an online textbook. The website’s front page features a large ‘reaction map’ that summarizes all of the currently curated or inferred pathways, and a table of contents that describes each of the top-level pathways in the database. In the reaction map, each reaction is represented as a small arrow, and arrows are joined end to end to indicate that the output of one reaction becomes the input of the next. The reactions are organized in distinctive patterns to allow researchers to become familiar with the different parts of the reaction network.

Here is an article that discusses Reactome in detail: http://genomebiology.com/2007/8/3/r39

Reactome can be accessed from here: www.reactome.org

Reactome also hosts some tools for data analysis, namely SkyPainter and BioMart. Most probably, my next post will be on these tools. So, keep visiting!

September 7, 2009

What is a Hidden Markov Model?

Filed under: Bioinformatics,Computational Biology — Biointelligence: Education,Training & Consultancy Services @ 1:32 am

Hidden Markov Models

Hidden Markov models (HMMs) are a formal foundation for making probabilistic models of linear sequence ‘labeling’ problems. They provide a conceptual toolkit for building complex models just by drawing an intuitive picture. They are at the heart of a diverse range of programs, including genefinding, profile searches, multiple sequence alignment and regulatory site identification.

A Markov model is a probabilistic process over a finite set, {S1, …, Sk}, usually called its states. Each state-transition generates a character from the alphabet of the process.

A Hidden Markov Model (HMM) is simply a Markov Model in which the states are hidden. For example, suppose we had only the sequence of throws from an experiment with three coins, and that the upper-case vs. lower-case marking that told us which coin produced each throw had been lost.

HTHHTHHTTTHTTTHHTHHHHTTHTTHTTHT...

We can never be absolutely sure which coin was used at a given point in the sequence but we can calculate the probability.

What’s Hidden in HMM?

It’s useful to imagine an HMM generating a sequence. When we visit a state, we emit a residue from the state’s emission probability distribution. Then, we choose which state to visit next according to the state’s transition probability distribution. The model thus generates two strings of information. One is the underlying state path (the labels), as we transition from state to state. The other is the observed sequence (the DNA), each residue being emitted from one state in the state path.

The state path is a Markov chain, meaning that what state we go to next depends only on what state we’re in. Since we’re only given the observed sequence, this underlying state path is hidden—these are the residue labels that we’d like to infer. The state path is a hidden Markov chain.
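
The generate-then-infer picture above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: a two-coin model (one fair coin, one biased towards heads) with made-up start, transition and emission probabilities. The decoding step uses the Viterbi algorithm, a standard way to find the single most probable state path, which this post does not itself describe.

```python
import math
import random

# Hypothetical two-state model: a fair coin and a coin biased towards heads.
STATES = ("fair", "biased")
START = {"fair": 0.5, "biased": 0.5}
TRANS = {
    "fair": {"fair": 0.9, "biased": 0.1},
    "biased": {"fair": 0.1, "biased": 0.9},
}
EMIT = {
    "fair": {"H": 0.5, "T": 0.5},
    "biased": {"H": 0.9, "T": 0.1},
}


def generate(length, rng):
    """Emit a symbol from the current state, then choose the next state."""
    path, observed = [], []
    state = rng.choices(STATES, weights=[START[s] for s in STATES])[0]
    for _ in range(length):
        path.append(state)
        symbols = list(EMIT[state])
        weights = [EMIT[state][x] for x in symbols]
        observed.append(rng.choices(symbols, weights=weights)[0])
        state = rng.choices(STATES, weights=[TRANS[state][s] for s in STATES])[0]
    return path, "".join(observed)


def viterbi(observed):
    """Most probable hidden state path for the observed string (log space)."""
    scores = {s: math.log(START[s]) + math.log(EMIT[s][observed[0]]) for s in STATES}
    backpointers = []
    for symbol in observed[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best previous state from which to arrive in state s.
            best_prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            pointers[s] = best_prev
            new_scores[s] = (
                scores[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][symbol])
            )
        backpointers.append(pointers)
        scores = new_scores
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


hidden_path, throws = generate(30, random.Random(0))
print(throws)
print(viterbi(throws))
```

The decoded path is a best guess, not a certainty: as noted above, we can never be absolutely sure which coin was used at a given point, only which explanation is most probable under the model.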

Here is a link to an interesting paper on HMMs: http://www.nature.com/nbt/journal/v22/n10/full/nbt1004-1315.html

August 5, 2009

BioGRID: A repository useful for Systems Biology

Filed under: Bioinformatics,Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 6:04 am

Systems Biology is emerging as one of the biggest research trends these days. Talk of pathways, metabolomics, cellular cycles and interactions is common in this field.

While reading about interaction datasets, I came across “BioGRID”. Here is a small post on it.

BioGRID stands for Biological General Repository for Interaction Datasets. It distributes collections of protein and genetic interactions from major model organism species. BioGRID currently contains over 198,000 interactions from six different species, as derived from both high-throughput studies and conventional focused studies.

BioGRID interactions are recorded as relationships between two proteins or genes (i.e. they are binary relationships) with an evidence code that supports the interaction and a publication reference. The term “interaction” includes, as well as direct physical binding of two proteins, co-existence in a stable complex and genetic interaction. It should not be assumed that an interaction reported in BioGRID is direct and physical in nature; BioGRID's experimental system definitions indicate the nature of the supporting evidence for an interaction between the two biological entities. It should also be noted that interactions in BioGRID have varying levels of evidential support. BioGRID simply curates the result of the experiment from the publication and does not guarantee that any individual interaction is true, well-established or the current consensus view of the community. Curating all available evidence supporting an interaction enables orthogonal data from various sources to be collated, allowing users of the database to decide how much confidence to place in the existence and/or physiological relevance of that interaction.

More information on BioGRID can be found at: http://www.thebiogrid.org
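
The record structure described above (a binary relationship plus an evidence code and a publication reference) can be sketched in Python as follows. This is not BioGRID's file format or API; the tuple layout, gene names and publication identifiers are hypothetical, and the evidence labels are used only as example strings.

```python
from collections import defaultdict

# Hypothetical records: (interactor A, interactor B, evidence code, publication).
records = [
    ("GENE1", "GENE2", "Two-hybrid", "PUB:1"),
    ("GENE2", "GENE1", "Affinity Capture-MS", "PUB:2"),
    ("GENE2", "GENE3", "Synthetic Lethality", "PUB:3"),
]


def collate_evidence(records):
    """Group all supporting evidence by unordered interactor pair."""
    evidence = defaultdict(list)
    for a, b, code, publication in records:
        # Sort the pair so that A-B and B-A are treated as the same interaction.
        pair = tuple(sorted((a, b)))
        evidence[pair].append((code, publication))
    return dict(evidence)


evidence = collate_evidence(records)
for pair, support in evidence.items():
    print(pair, "supported by", len(support), "record(s):", support)
```

Collating records this way lets a user see at a glance which interactions rest on several independent lines of evidence and which rest on a single experiment.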

August 4, 2009

Synthetic Biology… Are you Ready ???

Filed under: Bioinformatics,Computational Biology — Biointelligence: Education,Training & Consultancy Services @ 8:39 am

I came across a “partially” new term today. I had heard of it, but didn’t know exactly what it involves. Yes, the new term is SYNTHETIC BIOLOGY. I thought I would share it with you. Here is a small article on Synthetic Biology and its prospects.


Synthetic biology, the synthesis of biological components and devices and the redesign or creation of new life forms, has enormous potential. Today, many scientists are not content merely to analyze and understand life. They want to create it. Today, synthetic biology is still in its infancy. The job market and the availability of training opportunities reflect the field’s immaturity. But the field is growing and opportunities are emerging for talented scientists with an interdisciplinary focus who are willing to look at things in new ways.

Synthetic biology enables researchers to tackle a huge and diverse range of applied problems: building a cell with the smallest possible genome; synthesizing proteins with extra amino acids, beyond the 20 found in nature; using bacteria to produce medicines previously too complex to synthesize; even decomposing living organisms into standard, off-the-shelf ‘biobricks’ that can be assembled on demand. According to scientists, “You truly have to be a jack of all trades when working with Synthetic Biology”.

It involves concepts from systems biology, biochemistry, synthetic chemistry, microbiology and enzymology, along with evolutionary biology, bioinformatics and more. Above all, synthetic biology “requires a new way of thinking about biology: the idea that cells are machines and they can be rebuilt the way that electrical engineers now design circuits and instruments.”

Synthetic Biology as a Career

Scientists interested in training in the field should join a lab with expertise in synthetic biology. If you can’t find such a lab, join one whose expertise complements yours and can provide you with the skills you need. Search for scholarships and research labs, and tell people that you are interested in applying either your biological knowledge to their mathematical techniques or your computational and mathematical techniques to their biology projects, and that you want to give the work a synthetic biology flavor.

Entering such a multidisciplinary field also brings many challenges, some of which may concern ethical issues. Despite these challenges, most experts see synthetic biology as a safe career bet for a talented scientist.

So, people, are you ready to explore?

August 3, 2009

Proteomics: Challenges and Approaches

Filed under: Bioinformatics,Proteomics — Biointelligence: Education,Training & Consultancy Services @ 9:05 am

Proteomics is the study of the function of all expressed proteins. The term proteome was first coined to describe the set of proteins encoded by the genome. The study of the proteome, called proteomics, now evokes not only all the proteins in any given cell, but also the set of all protein isoforms and modifications, the interactions between them, the structural description of proteins and their higher-order complexes, and for that matter almost everything ‘post-genomic’. In this overview we will use proteomics in an overall sense to mean protein biochemistry on an unprecedented, high-throughput scale.

Proteomics complements other functional genomics approaches, including microarray-based expression profiles, systematic phenotypic profiles at the cell and organism level, systematic genetics and small-molecule-based arrays. Integration of these data sets through bioinformatics will yield a comprehensive database of gene function that will serve as a powerful reference of protein properties and functions, and a useful tool for the individual researcher to both build and test hypotheses. Moreover, these large-scale data sets will be of utmost importance for the emerging field of systems biology.

Platforms for Proteomics 

Challenges and Approaches in Proteomics

Proteomics would not be possible without the previous achievements of genomics, which provided the ‘blueprint’ of possible gene products that are the focal point of proteomics studies. Some of the recent approaches used in the field of proteomics are:

1. Mass spectrometry-based proteomics

2. Array Based Proteomics

3. Structural Proteomics

4. Proteome informatics

5. Clinical Proteomics

To read more on this, see: http://www.nature.com/nature/journal/v422/n6928/full/nature01510.html
