Biointelligence

August 14, 2009

Python for Biologists

Filed under: Bioinformatics,Computational Biology — Biointelligence: Education,Training & Consultancy Services @ 3:27 pm

While reading through a PLOS journal, I found an article on the importance of Python for life science researchers.

The article, written by Sebastian Bassi, gives a good insight into what Python is and how it can be of great use to the life science research community. Here is a summary of that article.


Python is a modern programming language developed in the early 1990s. It is a dynamic high-level language with an easily readable syntax. Python programs are interpreted, meaning that there is no need for compilation into a binary form before executing the programs. This makes Python programs a little slower than programs written in a compiled language, but at current computer speeds and for most tasks this is not an issue and the portability that Python gains as a result of being interpreted is a worthwhile tradeoff.

Basically, this language is easy to learn, easy to read, interpreted and multiplatform. Its simplicity is a design choice, made in order to facilitate the learning and use of the language. Another advantage well suited to newcomers is the optional interactive mode, which gives immediate feedback on each statement.

Python can be used to solve several problems that research laboratories face almost every day. Data manipulation, biological data retrieval and parsing, automation, and simulation of biological problems are some of the tasks that can be performed effectively with computers and a suitable programming language.
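To give a feel for the "data retrieval and parsing" tasks mentioned above, here is a minimal sketch in plain Python. It is not taken from the article: the function names `read_fasta` and `gc_content` and the two example records are made up for illustration, and it assumes the input is a simple FASTA-formatted text.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq):
    """Return the fraction of G and C bases in a sequence (0.0 if empty)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Made-up records, used only to exercise the functions above.
EXAMPLE = """>seq1 hypothetical example
ATGCGC
GGTA
>seq2 another example
ATATAT
"""

records = list(read_fasta(StringIO(EXAMPLE)))
for header, seq in records:
    print(header.split()[0], len(seq), round(gc_content(seq), 2))
```

The same functions work on a real file by passing `open("my_sequences.fasta")` (a hypothetical file name) instead of the `StringIO` object. For serious work a tested library such as Biopython is probably a better choice than a hand-written parser.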

For detailed information on Python, follow the link below:

http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0030199

August 11, 2009

EMBRACE – Active registry for bioinformatics web services

Filed under: Bioinformatics — Biointelligence: Education,Training & Consultancy Services @ 12:47 pm

Web services have become important tools in bioinformatics, allowing databases and algorithms to be accessed programmatically as computational components in programs, workflows and interactive analysis tools. Although these services are becoming common, with a growing adoption of standard protocols and technologies, the mechanisms for collecting and publicizing them are less mature.
A number of mechanisms for finding services have emerged over recent years but they have various limitations.

The EMBRACE (A European Model for Bioinformatics Research and Community Education) Network of Excellence has produced a web service registry that attempts to tackle the limitations and drawbacks of the existing systems. The EMBRACE Service Registry is a collection of life-science web services with built-in service testing.

The EMBRACE registry is a collection of life-science web services originating from the EMBRACE Network of Excellence. As a potential user of these, you can search the registry for services that match your needs, and find example client software to help you use them in your own programs or workflows. The registry periodically monitors the status and behaviour of the registered services, collating and logging this information so that you can see how reliable services are, and whether they are currently functioning as the service providers expect them to.

To access EMBRACE click on the following link – http://www.embraceregistry.net/

Bioinformatics In Pharma Industry

Bioinformatics provides the computational support for functional genomics, which links the behavior of cells, organisms and populations to the information encoded in genomes, as well as for structural genomics. The utility of bioinformatics lies in the identification of useful genes leading to the development of new gene products. The subject covers topics such as protein modeling and sequence alignment, expression data analysis, and comparative genomics. It also combines algorithmic, statistical and database methods for studying biological problems.

The greatest achievement of bioinformatics methods so far is the Human Genome Project. Because of it, the nature and priorities of bioinformatics research and applications are changing, and many experts believe that this will affect bioinformatics in several ways. For instance, some scientists believe that what some people refer to as research or medical informatics (the management of all biomedical experimental data associated with particular molecules or patients, from mass spectroscopy to in vitro assays to clinical side effects) will move from being the concern of those working in drug company and hospital IT (information technology) into the mainstream of cell and molecular biology, and will migrate from the commercial and clinical sectors to the academic sector.

Drug Development

Only 10% of drug molecules identified in research make it through development. This means that many potential drugs do not make it to market, and expensive time and resources are invested in molecules that will generate no revenue. Simulation and informatics can significantly improve these odds by increasing the efficiency of drug development, cutting costs, and improving margins.

Formulation Design

Formulation is the process of mixing ingredients in such a way as to produce a new or improved product. The formulation department must balance the different marketing and deliverability requirements with cost and chemical constraints to come up with the best possible drug delivery method at the best price. With laboratory results stored in legacy systems, it takes expert company knowledge and experience to know which methods and suppliers are available, let alone to locate them quickly. In many cases scientists find that it is easier to repeat an experiment than to find previous results. This situation is compounded in global R&D set-ups, and after mergers and acquisitions.

Crystallisation and Structure Determination

Determining the crystal structure of an active compound is one of the first steps in pharmaceutical development. The crystal structure of a drug affects how easy it is to formulate, its bioavailability, and its shelf life. Knowledge of the different possible polymorphs of a crystal can also give better patent protection for a drug.

Polymer Modeling

Drug delivery is a complex task. The drug must be delivered in a way that transports the active component intact to the appropriate part of the body. The way the cell takes up the drug is also very important: drugs that go to parts of the body other than the intended target are wasted and may lead to unwanted side effects.

Many delivery devices are polymeric, with the drug either solubilised or emulsified in the polymer. Drug delivery systems have mesoscale structures, between 10 and 1000 nm in size. The amount of computing power required to model these systems at an atomistic level is prohibitive, and macroscale techniques such as finite element analysis or computational fluid dynamics do not give the required level of detail. Mesoscale modeling, focusing on the nanometer length scale, is helping scientists to develop colloidal delivery systems for drugs.

The great advances in human healthcare that are presaged by the Human Genome Project can be realized by the pharmaceutical industry. A prerequisite for this will be the successful integration of bioinformatics into most aspects of drug discovery. Although, from a scientific viewpoint, this is not a difficult problem, there are formidable technological obstacles. Once these are overcome, rapid progress can be expected.

August 9, 2009

Bioinformatics Tools: NCBI Tools for Data Mining – Part I

Filed under: Bioinformatics,Computational Biology — Biointelligence: Education,Training & Consultancy Services @ 11:41 am

Here is a list of Tools hosted by NCBI for data mining:

Tools for Nucleotide Sequence Analysis

BLAST:

The Basic Local Alignment Search Tool for comparing gene and protein sequences against others in public databases, now comes in several types including PSI-BLAST, PHI-BLAST, and BLAST 2 sequences. Specialized BLASTs are also available for human, microbial, malaria, and other genomes, as well as for vector contamination, immunoglobulins, and tentative human consensus sequences.

Electronic PCR:

It allows you to search your DNA sequence for sequence tagged sites (STSs) that have been used as landmarks in various types of genomic maps. It compares the query sequence against data in NCBI’s UniSTS, a unified, non-redundant view of STSs from a wide range of sources.

Entrez Gene:

Each Entrez Gene record encapsulates a wide range of information for a given gene and organism. When possible, the information includes results of analyses that have been done on the sequence data. The amount and type of information presented depend on what is available for a particular gene and organism and can include: (1) graphic summary of the genomic context, intron/exon structure, and flanking genes, (2) link to a graphic view of the mRNA sequence, which in turn shows biological features such as CDS, SNPs, etc., (3) links to gene ontology and phenotypic information, (4) links to corresponding protein sequence data and conserved domains, (5) links to related resources, such as mutation databases. Entrez Gene is a successor to LocusLink.

Model Maker:

Model Maker allows you to view the evidence (mRNAs, ESTs, and gene predictions) that was aligned to assembled genomic sequence to build a gene model, and to edit the model by selecting or removing putative exons. You can then view the mRNA sequence and potential ORFs for the edited model and save the mRNA sequence data for use in other programs. Model Maker is accessible from sequence maps that were analyzed at NCBI and displayed in Map Viewer.

ORF Finder:

ORF Finder identifies all possible ORFs in a DNA sequence by locating the standard and alternative start and stop codons. The deduced amino acid sequences can then be used in a BLAST search against GenBank. ORF Finder is also packaged in the sequence submission software Sequin.
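The basic idea of scanning for start and stop codons can be sketched in a few lines of Python. This is only an illustration and not NCBI's implementation: it assumes the standard start codon ATG and the three standard stop codons, ignores alternative codons and other genetic codes, and the names `find_orfs` and `reverse_complement` are made up for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons only
START_CODON = "ATG"                  # alternative starts are ignored here


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=2):
    """Scan the three forward reading frames of one strand.

    Returns (frame, start, end, orf) tuples with 0-based start, exclusive
    end, and the stop codon included in orf. min_codons counts the codons
    from the start codon up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == START_CODON:
                start = pos  # first ATG after the previous stop opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((frame, start, pos + 3, seq[start:pos + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


example = "CCATGAAATAGCC"  # made-up sequence
for strand, s in (("+", example), ("-", reverse_complement(example))):
    for orf in find_orfs(s):
        print(strand, orf)
```

Running `find_orfs` on the sequence and on its reverse complement covers all six reading frames. ORFs that run off the end of the sequence without a stop codon are not reported in this sketch.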

SAGEmap:

It is a tool for performing statistical tests designed specifically for differential-type analyses of SAGE (Serial Analysis of Gene Expression) data. The data include SAGE libraries generated by individual labs as well as those generated by the Cancer Genome Anatomy Project (CGAP), which have been submitted to Gene Expression Omnibus (GEO). Gene expression profiles that compare the expression in different SAGE libraries are also available on the Entrez GEO Profiles pages. It is possible to enter a query sequence in the SAGEmap resource to determine which SAGE tags are in the sequence, then map to the associated SAGE tag records and view the expression of those tags in different CGAP SAGE libraries.

Spidey:

It aligns one or more mRNA sequences to a single genomic sequence. Spidey will try to determine the exon/intron structure, returning one or more models of the genomic structure, including the genomic/mRNA alignments for each exon.

VecScreen:

It is a tool for identifying segments of a nucleic acid sequence that may be of vector, linker, or adapter origin prior to sequence analysis or submission. VecScreen was developed to combat the problem of vector contamination in public sequence databases.

Part II of NCBI Tools will follow in the next post. Keep visiting!

August 8, 2009

What is Kiosk Viewer?

Filed under: Bioinformatics,Proteomics — Biointelligence: Education,Training & Consultancy Services @ 2:00 am

Want to study proteins and their structures? Want to gain an insight into a chosen protein structure? Then Kiosk Viewer is the right tool for you.

The Molecules in Motion Kiosk Viewer is a full-screen animation program that displays structures from different angles and perspectives, and focuses on chemical components within the structure. The Kiosk Viewer can be launched for any structure from the “Other Viewers” menu on the structure summary page, in PDB.

Here is a screenshot of the summary page from where Kiosk can be accessed.

Access Molecules in Motion using Kiosk Viewer

The Kiosk program runs only on Mac, Windows and some versions of Linux (e.g. CentOS 5), and requires the latest version of Java. The program automatically downloads coordinate files into a folder, which lets users run Kiosk on an offline computer.

Here is a screen shot for the same.

A screenshot of 1KYSK protein Molecule in Kiosk Viewer

To customize the list of structures displayed in Kiosk, right click on the Kiosk Viewer link and save the file with a new name (with the .jnlp extension), for example myFavorites.jnlp. Edit the PDB IDs listed in the file and save it. Double click to launch Kiosk Viewer; press the Esc key to exit Kiosk Viewer.

To access Kiosk Viewer, follow this link: http://www.rcsb.org/pdb/static.do?p=general_information/news_publications/news/news_2009.html#20090804

August 7, 2009

COBALT: A new tool for Multiple Sequence Alignment

Filed under: Bioinformatics,Computational Biology — Biointelligence: Education,Training & Consultancy Services @ 9:47 am

The simultaneous alignment of multiple sequences (multiple alignment) serves as a building block in several fields of computational biology, such as phylogenetic studies, detection of conserved motifs, prediction of functional residues and secondary structure, prediction of correlations and even quality assessment of protein sequences. An accurate multiple sequence alignment tool has therefore been one of the biggest requirements for a long time.

COBALT (Constraint-based Multiple Alignment Tool) is a multiple sequence alignment tool that finds a collection of pairwise constraints derived from the conserved domain database (CDD), a protein motif database, and sequence similarity, using RPS-BLAST, BLASTP, and PHI-BLAST. The pairwise constraints are then incorporated into a progressive multiple alignment.

COBALT has a general framework that uses progressive multiple alignment to combine pairwise constraints from different sources into a multiple alignment. COBALT does not attempt to use all available constraints but uses only a high-scoring consistent subset that can change as the alignment progresses, where a set of constraints is called consistent if all of the constraints in the set can be simultaneously satisfied by a multiple alignment. Using the RPS-BLAST tool, we can quickly search for domains in CDD that match to regions of input sequences. When the same domain matches to multiple sequences, we can infer several potential pairwise constraints based on these domain matches. Furthermore, CDD also contains auxiliary information that allows COBALT to create partial profiles for input sequences before progressive alignment begins, and this avoids computationally expensive procedures for building profiles.
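The idea of a "high-scoring consistent subset" can be illustrated with a toy example. The sketch below is not COBALT's algorithm: it assumes just two sequences, treats each constraint as a single pair of positions (i in the first sequence, j in the second) with a score, and calls a set consistent when no two constraints cross, that is, when both positions strictly increase along the set. Under those assumptions the best subset can be found with a simple dynamic programme. All names and numbers are made up.

```python
def is_consistent(constraints):
    """True if the (i, j, score) constraints can all hold in one alignment,
    i.e. sorted by i, both i and j are strictly increasing (no crossing)."""
    ordered = sorted((i, j) for i, j, _ in constraints)
    return all(a[0] < b[0] and a[1] < b[1]
               for a, b in zip(ordered, ordered[1:]))


def best_consistent_subset(constraints):
    """Return (total_score, chosen) for the highest-scoring consistent subset.

    Dynamic programme over constraints sorted by position: best[k] is the
    best total score of a non-crossing chain that ends with constraint k.
    """
    ordered = sorted(constraints)
    n = len(ordered)
    if n == 0:
        return 0, []
    best = [c[2] for c in ordered]
    prev = [None] * n
    for k in range(n):
        ik, jk, sk = ordered[k]
        for m in range(k):
            im, jm, _ = ordered[m]
            # Constraint m may precede k only if it lies before k in both sequences.
            if im < ik and jm < jk and best[m] + sk > best[k]:
                best[k] = best[m] + sk
                prev[k] = m
    end = max(range(n), key=best.__getitem__)
    total = best[end]
    chain = []
    while end is not None:  # walk back through the chosen chain
        chain.append(ordered[end])
        end = prev[end]
    return total, chain[::-1]


# Made-up constraints: (position in sequence A, position in sequence B, score)
constraints = [(1, 5, 3), (2, 2, 4), (3, 3, 2), (4, 6, 5)]
total, chosen = best_consistent_subset(constraints)
print(is_consistent(constraints), total, chosen)
```

Here the first constraint crosses the second and third, so the full set is not consistent and the sketch drops it in favour of the higher-scoring chain. In the real tool the constraints span many sequences and the chosen subset can change as the alignment progresses, which this toy version does not attempt to model.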

COBALT is implemented in NCBI C++ Toolkit. More information on COBALT can be found at:

http://bioinformatics.oxfordjournals.org/cgi/content/full/23/9/1073

To access COBALT use this link: http://www.ncbi.nlm.nih.gov/tools/cobalt/cobalt.cgi?link_loc=BlastHomeAd

August 5, 2009

Bioinformatics Companies

Here is a list of bioinformatics companies worldwide. I will soon be posting on companies working in specialised areas of bioinformatics.


List of Bioinformatics Companies World Wide

Australia

  1. Nucleics
  2. Australian Genome Research Facility
  3. IBM Healthcare and Life Sciences
  4. CSIRO Bioinformatics
  5. Minomic
  6. Proteome Systems

Austria

  1. ProCeryon Biosciences GmbH
  2. Lambda Labor für Molekularbiologische DNA-Analysen GmbH
  3. Upper Austria Research
  4. DSM fine Chemicals Austria
  5. Pfizer
  6. ARC Seibersdorf Research GmbH
  7. Roche Austria
  8. CD Labor f. Genomik und Bioinformatik
  9. Gen-au, Genomforschung Austria
  10. Inte:Ligand

Belgium

  1. Algonomics
  2. Bayer Bioscience
  3. BioXpr – computer science & molecular biology
  4. Tibotec
  5. VircoLab
  6. Biodata
  7. Applied Maths

Canada

  1. Caprion Proteomics
  2. Zymeworks
  3. BioMolTech
  4. Biotools Inc
  5. Molecular Mining Corporation
  6. Base4 Bioinformatics Inc.
  7. Bioinformatics Solutions
  8. Chemical Computing Group

Denmark

  1. CLC Bio
  2. Bioinformatics ApS

Finland

  1. Genolyze Ltd

France

  1. Partner Chip
  2. BioSolution
  3. Korilog

Germany

  1. Cubic Design
  2. Biomax Informatics
  3. BIOBASE Biological databases

Iceland

  1. deCODEme

India

  1. HH Biotechnologies
  2. BIOBASE Biological Databases
  3. AstraZeneca
  4. Avesthagen
  5. Cell Lines
  6. Monsanto
  7. INFOVALLEY Biosystem India Pvt Ltd
  8. Strand Life Sciences (formerly Strand Genomics)
  9. Connexios Life Sciences Pvt. Ltd.
  10. GVK Biosciences Pvt Ltd
  11. IBM Life Sciences
  12. Metahelix Life Sciences Pvt Ltd
  13. Biocon, Ltd
  14. Genbios
  15. BioCOS Life Sciences
  16. Jubilant Biosys
  17. Jigsaw Bio Solutions
  18. Nectar Lifesciences Ltd
  19. Orchid Chemicals & Pharmaceuticals Ltd
  20. Neozene Bio Sciences
  21. Neogen Biosolutions
  22. ATGC Labs
  23. Ranbaxy Laboratories Limited
  24. TATA Consultancy Service
  25. Ocimum Biosolutions
  26. Dr.Reddy’s Pharmaceutical Company
  27. BioMinds Life Sciences Pvt. Ltd
  28. BioMed Informatics
  29. Ingenovis
  30. GlaxoSmithKline Pharmaceuticals Ltd.
  31. Sun Pharmaceutical Industries Ltd
  32. Rishi Biotech
  33. C-DAC: Centre for Development of Advanced Computing
  34. SooryaKiran Bioinformatics

Ireland

  1. SlidePath

Israel

  1. Evogene Ltd
  2. Compugen
  3. Optimata

Italy

  1. ICGEB

Malaysia

  1. Synamatix

New Zealand

  1. Biomatters
  2. Hoare Research Software
  3. HortResearch

Norway

  1. Interagon
  2. MolMine
  3. PubGene
  4. Sencel Bioinformatics

Russia

  1. GeneGo

Singapore

  1. Lilly Singapore Centre for Drug Discovery

South Africa

  1. ICGEB

Spain

  1. Integromics™ | IT for Life Sciences
  2. Bioalma
  3. Ariadne Genomics Europe

Sweden

  1. Qlucore
  2. Agile Molecule

Switzerland

  1. Merck Serono International
  2. Detectorvision
  3. Genedata
  4. Geneva Bioinformatics(GeneBio)

United Kingdom

  1. Astex Technology
  2. ePitope Informatics Ltd
  3. InfoQuant
  4. SimuGen
  5. ProGeniq
  6. BlueGnome
  7. etrials
  8. IDBS
  9. InforSense
  10. Matrix Science

United States of America

  1. 23andme
  2. Accelrys
  3. Navigenics
  4. Rosetta Biosoftware
  5. GeneSifter
  6. Seralogix
  7. Ariadne Genomics
  8. ATGCLabs
  9. BioAnalytics Group
  10. Bio-Rad
  11. Geospiza
  12. VigeneTech
  13. Allometra
  14. Axcell
  15. Biodiscovery
  16. Biopharm Systems
  17. Biotique Systems
  18. BioWisdom
  19. Cellnomica
  20. Cira Discovery Sciences
  21. Cognia
  22. IBM (Bioinformatics and Pattern Discovery Group)
  23. Ocimum Biosolutions

Please add to the list if you know a company working in this area!

BioGRID: A repository useful for Systems Biology

Filed under: Bioinformatics,Systems Biology — Biointelligence: Education,Training & Consultancy Services @ 6:04 am

Systems biology is emerging as one of the biggest research trends these days. Talk of pathways, metabolomics, cell cycles and interactions is common in this field.

While reading about interaction datasets, I came across “BioGRID”. Here is a short post on it.

BioGRID stands for Biological General Repository for Interaction Datasets. It distributes collections of protein and genetic interactions from major model organism species. BioGRID currently contains over 198,000 interactions from six different species, as derived from both high-throughput studies and conventional focused studies.

BioGRID interactions are recorded as relationships between two proteins or genes (i.e. they are binary relationships) with an evidence code that supports the interaction and a publication reference. The term “interaction” covers direct physical binding of two proteins as well as co-existence in a stable complex and genetic interaction. It should not be assumed that an interaction reported in BioGRID is direct and physical in nature; BioGRID's experimental system definitions indicate the nature of the supporting evidence for an interaction between the two biological entities. It should also be noted that interactions in BioGRID have varying levels of evidential support. BioGRID simply curates the result of the experiment from the publication and does not guarantee that any individual interaction is true, well established or the current consensus view of the community. Curating all available evidence supporting an interaction enables orthogonal data from various sources to be collated, allowing users of the database to decide how much confidence to place in the existence and physiological relevance of that interaction.
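To make the record structure described above concrete, here is a small Python sketch. It is not BioGRID's actual file format or API: the class `Interaction`, the function `collate_evidence`, the gene names and the publication identifiers are all placeholders, and the evidence labels are only meant as examples of the kind of experimental system that might be recorded.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One binary interaction record (hypothetical field names)."""
    gene_a: str
    gene_b: str
    evidence_code: str  # experimental system supporting the interaction
    publication: str    # reference to the source publication


def collate_evidence(records):
    """Group evidence by unordered gene pair, so that A-B and B-A records
    supporting the same interaction end up together."""
    grouped = defaultdict(list)
    for rec in records:
        pair = tuple(sorted((rec.gene_a, rec.gene_b)))
        grouped[pair].append((rec.evidence_code, rec.publication))
    return dict(grouped)


# Placeholder records, not real BioGRID data.
records = [
    Interaction("GENE1", "GENE2", "Two-hybrid", "PUB:0001"),
    Interaction("GENE2", "GENE1", "Affinity Capture-MS", "PUB:0002"),
    Interaction("GENE1", "GENE3", "Synthetic Lethality", "PUB:0003"),
]

evidence = collate_evidence(records)
for pair, support in evidence.items():
    print(pair, len(support), support)
```

Counting how many independent experiments and publications support a pair is one simple way a user might judge confidence in an interaction, in the spirit of the paragraph above.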

More information on BioGRID can be found at: www.thebiogrid.org

August 4, 2009

Synthetic Biology… Are you Ready ???

Filed under: Bioinformatics,Computational Biology — Biointelligence: Education,Training & Consultancy Services @ 8:39 am

I came across a “partially” new term today. I had heard about it, but didn’t know exactly what it meant. The term is SYNTHETIC BIOLOGY, and I thought I would share it with you. Here is a short article on synthetic biology and its prospects.


Synthetic biology, the synthesis of biological components and devices and the redesign or creation of new life forms, has enormous potential. Many scientists today are not content merely to analyze and understand life. They want to create it. Synthetic biology is still in its infancy, and the job market and the availability of training opportunities reflect the field’s immaturity. But the field is growing and opportunities are emerging for talented scientists with an interdisciplinary focus who are willing to look at things in new ways.

Synthetic biology enables researchers to tackle a huge and diverse range of applied problems: building a cell with the smallest possible genome; synthesizing proteins with extra amino acids, more than the 20 found in nature; using bacteria to produce medicines previously too complex to synthesize; even decomposing living organisms into standard, off-the-shelf ‘biobricks’ that can be assembled on demand. According to scientists:

“You truly have to be a jack of all trades when working with synthetic biology.”

It involves concepts from systems biology, biochemistry, synthetic chemistry, microbiology and enzymology, along with evolutionary biology, bioinformatics and more. Above all, synthetic biology “requires a new way of thinking about biology: the idea that cells are machines and they can be rebuilt the way that electrical engineers now design circuits and instruments.”

Synthetic Biology: As a Career

Scientists interested in training in the field should join a lab with expertise in synthetic biology. If you can’t find such a lab, join a lab that has expertise complementary to yours and can provide you with the skills you need. Search for scholarships and research labs, and tell people that you are interested in applying either your biological knowledge to their mathematical techniques or your computational and mathematical techniques to their biology projects, and that you want to give the work a synthetic biology flavor.

Entering such a multidisciplinary field brings many challenges, some of which concern ethical issues. Despite these challenges, most experts see synthetic biology as a safe career bet for a talented scientist.

So, people, are you ready to explore?

Education in Chemoinformatics

Filed under: Chemoinformatics — Biointelligence: Education,Training & Consultancy Services @ 2:47 am

Here is a short post that gives an overview of the current requirements and the courses available in chemoinformatics.

Chemoinformatics is rapidly becoming a core part of drug design informatics, yet the educational opportunities in the field are currently limited.

Like many of today’s emerging life science fields, chemoinformatics has become a ‘hot topic’ while it is still in the process of finding its identity.

Indeed it is not yet clear how to spell the name of the field: some prefer cheminformatics – no ‘o’ – and others, including ourselves, use entirely different terms, such as chemical informatics. What is clear is that the techniques that this field concerns itself with – the processing of chemical and related information on computers – are becoming central to the processes of modern drug discovery.

Interest in chemoinformatics is now becoming widespread, but this greatly increased exposure has highlighted the fact that there are very few people with high-level chemoinformatics skills. The principal source of such individuals in the past has been doctoral students and post-doctoral staff who have spent time in one of the few academic groups world-wide who carry out research in this area, with job opportunities also becoming available to individuals who have worked in areas of chemistry that involve significant computation – such as X-ray crystallography or computational chemistry – or in related areas such as bioinformatics or computational biology. However, there are still too few trained staff available to meet the emerging need, and this has spurred the development of university courses that can provide students with the necessary skills, at both undergraduate and postgraduate levels.

Academic Programs in Chemoinformatics

A small number of universities have established chemoinformatics programs. The most widely recognized and well-established research and teaching base in the field is the Department of Information Studies at the University of Sheffield, which offers Master of Science (MSc, or MS) and PhD qualifications in chemoinformatics. Subsequent programs have been developed at the University of Manchester Institute of Science and Technology (UMIST), now merged with the University of Manchester, UK, and the School of Informatics at Indiana University (IU), IN, USA.

For a more detailed view refer to the following links: