Biointelligence

September 24, 2009

Pathway Databases – A broader view

Filed under: Bioinformatics,Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 7:58 am

Studying Reactome led me to explore more databases of pathways and reactions. While browsing, I landed on the paper “Pathway databases and tools for their exploitation: benefits, current limitations and challenges” by Anna Bauer-Mehren, Laura I Furlong and Ferran Sanz. So today's post summarizes what this paper covers.

Cell signalling has been studied for over a decade. The term refers to the biochemical processes by which cells respond to cues in their internal or external environment. As the chains of reactions involved were worked out, databases were developed to store them in a compiled form, together with methodologies to access and analyse the data. At present, several repositories of cell signalling pathways cover a wide range of signal transduction mechanisms and include high-quality data in terms of annotation and cross-references to other biological databases.

Some of the online pathway databases are listed here: http://www.nature.com/msb/journal/v5/n1/fig_tab/msb200947_T2.html

The table lists Reactome, KEGG, WikiPathways, the NCI-Nature Pathway Interaction Database, Pathway Commons and many more.

The paper also explains the main standards for representing biological networks, BioPAX and SBML. It then discusses the advantages and drawbacks of current methods for pathway retrieval and integration, using EGFR signalling as an illustrative example.

The paper is available here: http://www.nature.com/msb/journal/v5/n1/full/msb200947.html

September 23, 2009

Reactome: A database for pathways and Reactions

Filed under: Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 7:20 am

While studying biological pathways and databases, I landed on the home page of the Reactome database. It is indeed a great creation. Here is a short introduction to Reactome.

Reactome is a free, online, open-source, curated resource of core pathways and reactions in human biology. The database is maintained by the Reactome editorial staff and cross-referenced to the NCBI Entrez Gene, Ensembl and UniProt databases, the UCSC and HapMap Genome Browsers, the KEGG Compound and ChEBI small molecule databases, PubMed, and GO. Curated human data are used to infer orthologous events in 22 non-human species including mouse, rat, chicken, puffer fish, worm, fly, yeast, two plants and E. coli.

The Reactome website (www.reactome.org) can be browsed like an online textbook. The website’s front page features a large ‘reaction map’ that summarizes all of the currently curated or inferred pathways, and a table of contents that describes each of the top-level pathways in the database. In the reaction map, each reaction is represented as a small arrow, and arrows are joined end to end to indicate that the output of one reaction becomes the input of the next. The reactions are organized in distinctive patterns to allow researchers to become familiar with the different parts of the reaction network.
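The end-to-end joining of arrows in the reaction map can be pictured as a small graph problem. The following is only a minimal sketch of that idea, not Reactome's data model or API: the reaction names, molecules and the `link_reactions` helper are all hypothetical.

```python
from collections import defaultdict

# Toy reactions as (name, inputs, outputs). All names are made up for illustration.
reactions = [
    ("R1", {"A"}, {"B"}),
    ("R2", {"B"}, {"C"}),
    ("R3", {"C"}, {"D"}),
    ("R4", {"X"}, {"Y"}),  # not connected to the R1-R3 chain
]

def link_reactions(reactions):
    """Return edges (upstream, downstream) where an output of one
    reaction is an input of another."""
    producers = defaultdict(list)
    for name, _inputs, outputs in reactions:
        for molecule in outputs:
            producers[molecule].append(name)

    edges = []
    for name, inputs, _outputs in reactions:
        for molecule in sorted(inputs):
            # .get avoids adding empty entries for molecules nobody produces
            for upstream in producers.get(molecule, []):
                edges.append((upstream, name))
    return edges

edges = link_reactions(reactions)
print(edges)  # [('R1', 'R2'), ('R2', 'R3')]
```

Each edge corresponds to two arrows drawn end to end: R1 feeds R2, which feeds R3, while R4 stands alone.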

Here is an article that discusses Reactome in detail: http://genomebiology.com/2007/8/3/r39

Reactome can be accessed here: www.reactome.org

Reactome also hosts some tools for data analysis: SkyPainter and BioMart. Most probably my next post will be on these tools, so keep visiting!

September 22, 2009

Minimum Information about a Microarray Experiment

Filed under: Bioinformatics,Microarray — Biointelligence: Education,Training & Consultancy Services @ 11:20 am

After genome sequencing, DNA microarray analysis has become the most widely used source of genome-scale data in the life sciences. Microarray expression studies are producing massive quantities of gene expression and other functional genomics data, which promise to provide insight into gene function and interactions within and across metabolic pathways. Unlike genome sequence data, however, which have standard formats for presentation and widely used tools and databases, much of the microarray data generated so far remain inaccessible.

To make this information accessible in a consistent form, MIAME (Minimum Information About a Microarray Experiment) was introduced. It addresses the need for the comprehensive annotation necessary to interpret the results of microarray experiments. It is platform independent but includes essential evidence about how the gene expression level measurements have been obtained.

Although the goal of MIAME is to specify only the content of the information and not the technical format, MIAME includes recommendations for which parts of the information should be provided as controlled vocabularies. MIAME describes six sections that need to be included:

1. Experimental Design
2. Array Design
3. Samples
4. Hybridizations
5. Measurements
6. Normalization controls
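
The six sections above can be treated as a simple checklist. Here is a minimal sketch, assuming an experiment's annotation is held in a plain Python dictionary keyed by section name. MIAME itself does not prescribe any such format, and the `missing_sections` helper and the example values are hypothetical.

```python
# Section names taken from the list above.
MIAME_SECTIONS = [
    "Experimental Design",
    "Array Design",
    "Samples",
    "Hybridizations",
    "Measurements",
    "Normalization controls",
]

def missing_sections(annotation):
    """Return the MIAME sections that are absent or empty in `annotation`."""
    return [section for section in MIAME_SECTIONS if not annotation.get(section)]

# Hypothetical, partially annotated experiment.
example = {
    "Experimental Design": "time course, 3 replicates",
    "Samples": "yeast, heat shock",
}

missing = missing_sections(example)
print(missing)
# ['Array Design', 'Hybridizations', 'Measurements', 'Normalization controls']
```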

A common set of required information like this makes it much easier to interpret microarray data obtained from different experiments.
To read more on MIAME, click here: http://www.mged.org/Workgroups/MIAME/miame.html

 

 

September 21, 2009

A New miRNA Knowledgebase: miRò

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 1:59 am

Post-Transcriptional Gene Silencing (PTGS) is a highly conserved mechanism of gene expression regulation, and microRNAs (miRNAs) are its main actors. These small RNA molecules bind to specific sites located in the 3′ untranslated regions (UTRs) of target transcripts, inhibiting their translation or promoting their degradation. Although much work has demonstrated their crucial role in several physiological and pathological processes, their mechanisms of action remain unclear.

miRò is a web-based knowledge base that provides users with miRNA–phenotype associations in humans. It integrates data from various online sources, such as databases of miRNAs, ontologies, diseases and targets, into a unified database equipped with an intuitive and flexible query interface and data mining facilities. The main goal of miRò is to establish a knowledge base that allows non-trivial analysis through sophisticated mining techniques, and to introduce a new layer of associations between genes and phenotypes inferred from miRNA annotations. Furthermore, a specificity function applied to validated data highlights the most significant associations.
The miRò web site is available at: http://ferrolab.dmi.unict.it/miro.

September 18, 2009

Chemoinformatics Companies Worldwide

Here is a list of companies working in Chemoinformatics and Drug Discovery.

Also check out these:

September 17, 2009

ADAN: A Database for Prediction of Protein Protein Interactions

Filed under: Bioinformatics,Computational Biology,Proteomics — Biointelligence: Education,Training & Consultancy Services @ 12:51 pm

In the last post we gave an introduction to MIPS (the Mammalian Protein-Protein Interaction Database). The structures and functions of most globular domains in the proteome are still unknown. We can use high-resolution structures from different modular domains in combination with automatic protein design algorithms to predict genome-wide potential interactions of a protein. Today's post introduces a database which helps in predicting such protein interactions.

The ADAN database is a collection of different modular protein domains (SH2, SH3, PDZ, WW, etc.). It contains 3505 entries with extensive structural and functional information, manually integrated, curated and annotated with cross-references to other databases, biochemical and thermodynamic data, simplified coordinate files, sequence files and alignments. Prediadan, a subset of the ADAN database, offers position-specific scoring matrices for protein-protein interactions, calculated by FoldX, and predictions of optimum ligands and putative binding partners. Users can also scan a query sequence against selected matrices, or improve a ligand-domain interaction. The ADAN database can be accessed from here:

http://adan-embl.ibmc.umh.es/
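
To give a feel for what scanning a query sequence against a position-specific scoring matrix involves, here is a minimal sketch. It is not Prediadan's code or matrix format: the toy matrix, its scores, the default penalty and the `scan` helper are all invented for illustration, and higher scores are assumed to mean better matches (energy-based matrices may use the opposite convention).

```python
# Toy position-specific scoring matrix for a 3-residue ligand motif.
# One dict per motif position: residue -> score. Values are made up.
PSSM = [
    {"P": 2.0},
    {"L": 1.0, "P": 1.5},
    {"P": 2.0, "R": 1.0},
]

def scan(sequence, pssm, default=-1.0):
    """Slide the matrix along the sequence and score every window.

    Residues not listed at a position get the `default` penalty.
    Returns a list of (offset, score) pairs.
    """
    width = len(pssm)
    hits = []
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        score = sum(column.get(residue, default)
                    for column, residue in zip(pssm, window))
        hits.append((offset, score))
    return hits

hits = scan("AAPPPRA", PSSM)
best = max(hits, key=lambda hit: hit[1])
print(best)  # (2, 5.5) -> the window "PPP" starting at offset 2
```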

September 16, 2009

Database for Protein Protein Interactions

Filed under: Bioinformatics,Computational Biology,Proteomics — Biointelligence: Education,Training & Consultancy Services @ 1:28 pm

 

Proteins are organic compounds made of amino acids arranged in a linear chain and folded into a globular form. These molecules are of great importance because of the functions they perform.

Protein associations are studied from the perspectives of biochemistry, quantum chemistry, molecular dynamics, signal transduction and other metabolic or genetic/epigenetic networks. Protein-protein interactions are at the core of the entire interactomics system of any living cell. These interactions involve not only the direct-contact association of protein molecules but also longer-range interactions through the aqueous electrolyte solution surrounding neighboring hydrated proteins, over distances from less than one nanometer to several tens of nanometers. Furthermore, such protein-protein interactions are thermodynamically linked functions of dynamically bound ions and water that exchange rapidly with the surrounding solution by comparison with the molecular tumbling rate (or correlation times) of the interacting proteins.

The MIPS Mammalian Protein-Protein Interaction Database is a collection of high-quality protein-protein interaction data, manually curated from the scientific literature. The content is based on published experimental evidence that has been processed by human expert curators. MIPS provides the full dataset for download and a flexible and powerful web interface for users with various requirements.

Click here to access MIPS: http://mips.helmholtz-muenchen.de/proj/ppi/

September 14, 2009

Protein Mutant Database: An Introduction

Filed under: Bioinformatics,Computational Biology,Proteomics — Biointelligence: Education,Training & Consultancy Services @ 7:41 am

Protein structure is one of the most important and popular research topics today. Research on protein structure, sequence and organization gives a broad view of protein function.

Compilations of protein mutant data are valuable as a basis for protein engineering. They provide information on what kinds of functional and/or structural influences are brought about by amino acid mutation at a specific position of a protein. The Protein Mutant Database (PMD), which is still being constructed, covers natural as well as artificial mutants, including random and site-directed ones, for all proteins except members of the globin and immunoglobulin families. The PMD is based on literature, not on proteins. That is, each entry in the database corresponds to one article, which may describe one, several or many protein mutants.

Click here to know more on PMD: http://pmd.ddbj.nig.ac.jp/~pmd/whatpmd.html

September 13, 2009

AN INTRODUCTION TO HAPMAP

Filed under: Bioinformatics,Computational Biology — Biointelligence: Education,Training & Consultancy Services @ 1:32 pm

THE INTERNATIONAL HAPMAP PROJECT: http://www.hapmap.org/

The HapMap is a catalog of common genetic variants that occur in human beings. It describes what these variants are, where they occur in our DNA, and how they are distributed among people within populations and among populations in different parts of the world. The International HapMap Project is not using the information in the HapMap to establish connections between particular genetic variants and diseases. Rather, the Project is designed to provide information that other researchers can use to link genetic variants to the risk for specific illnesses, which will lead to new methods of preventing, diagnosing, and treating disease.

The DNA in our cells contains long chains of four chemical building blocks: adenine, thymine, cytosine, and guanine, abbreviated A, T, C, and G. More than 6 billion of these chemical bases, strung together in 23 pairs of chromosomes, exist in a human cell. (See http://www.dnaftb.org/dnaftb/ for basic information about genetics.) These genetic sequences contain information that influences our physical traits, our likelihood of suffering from disease, and the responses of our bodies to substances that we encounter in the environment.

The genetic sequences of different people are remarkably similar. When the chromosomes of two humans are compared, their DNA sequences can be identical for hundreds of bases. But at about one in every 1,200 bases, on average, the sequences will differ (Figure 1). One person might have an A at that location, while another person has a G, or a person might have extra bases at a given location or a missing segment of DNA. Each distinct “spelling” of a chromosomal region is called an allele, and a collection of alleles in a person’s chromosomes is known as a genotype.

Differences in individual bases are by far the most common type of genetic variation. These genetic differences are known as single nucleotide polymorphisms, or SNPs (pronounced “snips”). By identifying most of the approximately 10 million SNPs estimated to occur commonly in the human genome, the International HapMap Project is identifying the basis for a large fraction of the genetic diversity in the human species.

For geneticists, SNPs act as markers to locate genes in DNA sequences. Say that a spelling change in a gene increases the risk of suffering from high blood pressure, but researchers do not know where in our chromosomes that gene is located. They could compare the SNPs in people who have high blood pressure with the SNPs of people who do not. If a particular SNP is more common among people with hypertension, that SNP could be used as a pointer to locate and identify the gene involved in the disease.
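
The comparison described above boils down to asking whether an allele is more frequent in one group than in the other. Here is a minimal sketch with invented genotypes at a single SNP; the data and the `allele_frequency` helper are hypothetical, and a real study would use many more subjects and a proper statistical test rather than a raw comparison.

```python
def allele_frequency(genotypes, allele):
    """Fraction of chromosomes carrying `allele`.

    Each genotype is a two-letter string such as "AG",
    one letter per chromosome copy.
    """
    total_chromosomes = 2 * len(genotypes)
    return sum(g.count(allele) for g in genotypes) / total_chromosomes

# Invented genotypes at one SNP for illustration only.
cases = ["AG", "GG", "AG", "GG"]      # people with high blood pressure
controls = ["AA", "AG", "AA", "AA"]   # people without

freq_cases = allele_frequency(cases, "G")
freq_controls = allele_frequency(controls, "G")
print(freq_cases, freq_controls)  # 0.75 0.125
```

In this toy data the G allele is much more common among cases, which is the kind of signal that would make the SNP a candidate pointer to a nearby gene.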

However, testing all of the 10 million common SNPs in a person’s chromosomes would be extremely expensive. The development of the HapMap will enable geneticists to take advantage of how SNPs and other genetic variants are organized on chromosomes. Genetic variants that are near each other tend to be inherited together. For example, all of the people who have an A rather than a G at a particular location in a chromosome can have identical genetic variants at other SNPs in the chromosomal region surrounding the A. These regions of linked variants are known as haplotypes.

In many parts of our chromosomes, just a handful of haplotypes are found in humans. [See The Origins of Haplotypes.] In a given population, 55 percent of people may have one version of a haplotype, 30 percent may have another, 8 percent may have a third, and the rest may have a variety of less common haplotypes. The International HapMap Project is identifying these common haplotypes in four populations from different parts of the world. It also is identifying “tag” SNPs that uniquely identify these haplotypes. By testing an individual’s tag SNPs (a process known as genotyping), researchers will be able to identify the collection of haplotypes in a person’s DNA. The number of tag SNPs that contain most of the information about the patterns of genetic variation is estimated to be about 300,000 to 600,000, which is far fewer than the 10 million common SNPs.
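
The idea of tag SNPs can be illustrated with a tiny brute-force search: given the common haplotypes in a region, find the smallest set of SNP positions whose alleles still tell every haplotype apart. This is only a sketch under simplifying assumptions. The haplotypes and the `find_tag_snps` helper are invented, and actual tag SNP selection relies on statistical measures of how strongly SNPs are inherited together across many individuals, not on exhaustive search.

```python
from itertools import combinations

def find_tag_snps(haplotypes):
    """Return the smallest tuple of SNP positions that uniquely
    identifies every haplotype, or None if no such set exists."""
    n_sites = len(haplotypes[0])
    for size in range(1, n_sites + 1):
        for sites in combinations(range(n_sites), size):
            # Alleles seen at the chosen positions, one signature per haplotype
            signatures = {tuple(h[i] for i in sites) for h in haplotypes}
            if len(signatures) == len(haplotypes):
                return sites
    return None

# Three invented haplotypes over four SNP positions.
haplotypes = ["ACGT", "ACAT", "GTGT"]

tags = find_tag_snps(haplotypes)
print(tags)  # (0, 2): genotyping positions 0 and 2 is enough
```

No single position separates all three haplotypes here, but positions 0 and 2 together do, so only two of the four SNPs would need to be genotyped.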

Once the information on tag SNPs from the HapMap is available, researchers will be able to use them to locate genes involved in medically important traits. Consider the researcher trying to find genetic variants associated with high blood pressure. Instead of determining the identity of all SNPs in a person’s DNA, the researcher would genotype a much smaller number of tag SNPs to determine the collection of haplotypes present in each subject. The researcher could focus on specific candidate genes that may be associated with a disease, or even look across the entire genome to find chromosomal regions that may be associated with a disease. If people with high blood pressure tend to share a particular haplotype, variants contributing to the disease might be somewhere within or near that haplotype.

September 12, 2009

Proteomics: Prospects and Challenges

Filed under: Bioinformatics,Computational Biology,Proteomics — Biointelligence: Education,Training & Consultancy Services @ 6:34 am

Proteomics is one of the fastest growing areas of research, largely because the global-scale analysis of proteins is expected to yield a more direct understanding of function and regulation than the analysis of genes. Although significant advances in the comprehensive profiling, functional analysis, and regulation of proteins have occurred in model organisms such as yeast (Saccharomyces cerevisiae) and in humans, proteomics research in plants has not advanced at the same pace. The availability of the complete Arabidopsis (Arabidopsis thaliana) genome, which is small compared to that of other plants, along with an increasingly comprehensive catalog of protein-coding information from large-scale cDNA sequencing (Seki et al., 2004) and transcript mapping experiments, sets it apart as a complex but accessible model organism for studying plant proteomics. The application of proteomic approaches to plants entails three major challenges: (1) comprehensive identification of proteins, their isoforms, and their prevalence in each tissue; (2) characterization of the biochemical and cellular functions of each protein; and (3) analysis of protein regulation and its relation to other regulatory networks.

Click to read an article on the prospects of proteomics and its challenges: http://www.plantphysiol.org/cgi/content/full/138/2/560
