The use of computers to store, retrieve, analyse or predict the composition or structure of biomolecules
Bioinformatics is the application of computational techniques and information technology to the organisation and management of biological data
Classical bioinformatics deals primarily with sequence analysis
Aims of bioinformatics
Development of databases containing all biological information
Development of better tools for data designing, annotation and mining
Design and development of drugs using simulation software
Design and development of software tools for protein structure prediction, function annotation and docking analysis
Development of improved software tools for analysing sequences for their function and their similarity to other sequences
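Similarity between two sequences is often summarised as percent identity over an alignment. A minimal Python sketch, assuming the two sequences have already been aligned (the alignment step itself, done by tools such as BLAST, is not shown) and assuming gapped columns are simply skipped, which is only one of several conventions in use; percent_identity is a hypothetical helper name:

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped; this is
    only one of several conventions in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


print(percent_identity("ACGT-ACGT", "ACGTTACCT"))  # 87.5
```

Here 8 columns are compared after the gapped one is skipped, and 7 of them match.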
Applications of bioinformatics
Gene therapy
Drug designing
Antibiotic resistance
Crop development
Medical biotechnology
Drought resistance
Evolutionary studies
Forensic analysis
Veterinary science
Weather analysis
Waste cleanup
Biological database
A collection of biological data arranged in computer-readable form that speeds up search and retrieval and is convenient to use
Biological data are complex, exception-ridden, vast and incomplete
A good database must be kept up to date
Importance of biological database
Allows retrieval of a range of information, such as biological sequences, structures, binding sites, metabolic interactions, molecular actions, functional relationships, protein families, motifs and homologues
Types of biological database
Nucleotide sequence database
Protein sequence database
Structure database
Domain and motif database
Gene expression database
Metabolic pathway database
Primary database
Contains only sequence or structural information
Secondary database
Derived from the analysis or treatment of primary data
Secondary databases are very important for inferring protein function
GenBank
One of the fastest-growing repositories of known nucleotide sequences; it uses a flat-file format that is readable by both humans and computers
GenBank records contain information such as accession numbers, gene names, phylogenetic classification and references to published literature
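Because each field in the flat file begins with a keyword at the start of the line, a program can read a record line by line. A minimal Python sketch, assuming a toy record: LOCUS, DEFINITION, ACCESSION, SOURCE, ORGANISM and ORIGIN are real GenBank keywords, but the accession XX000001 and the sequence are invented, values are assumed to start at column 13 as in GenBank's layout, and parse_flat_file is a hypothetical helper. Real records also carry a FEATURES table and other fields; in practice a library parser such as Biopython's Bio.SeqIO is the usual choice.

```python
from typing import Dict, List, Optional

# Toy record: the keywords are real GenBank keywords, but the accession
# XX000001 and the sequence are invented for illustration.
RECORD = """\
LOCUS       XX000001                  24 bp    DNA     linear   SYN
DEFINITION  Hypothetical example sequence, used only to illustrate
            the flat-file layout.
ACCESSION   XX000001
SOURCE      synthetic construct
  ORGANISM  synthetic construct
ORIGIN
        1 atggcgtaag ctagctagga tcca
//
"""


def parse_flat_file(text: str) -> Dict[str, str]:
    """Collect top-level fields and the sequence from one flat-file record."""
    fields: Dict[str, str] = {}
    bases: List[str] = []
    current: Optional[str] = None
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):
            break  # '//' marks the end of a record
        if in_origin:
            # Sequence lines: a position number, then blocks of bases.
            bases.extend(line.split()[1:])
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line[:1].strip():
            # A keyword in column 1 starts a new field; its value starts at column 13.
            current = line[:12].strip()
            fields[current] = line[12:].strip()
        elif current and not line[:12].strip():
            # Blank first 12 columns: continuation of the previous field.
            fields[current] += " " + line[12:].strip()
        # Indented sub-keywords such as ORGANISM are ignored in this sketch.
    fields["SEQUENCE"] = "".join(bases).upper()
    return fields


record = parse_flat_file(RECORD)
print(record["ACCESSION"], record["SEQUENCE"])
```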
GenBank is developed and maintained at the NCBI, Bethesda, MD, USA, as part of the International Nucleotide Sequence Database Collaboration (INSDC)
GenBank is an open-access sequence database
GenBank coordinates with individual laboratories and with other sequence databases such as EMBL and DDBJ
GenBank is an annotated collection of all publicly available nucleotide sequences
At NCBI, the nucleotide database was divided into three databases: CoreNucleotide, Expressed Sequence Tags (EST) and Genome Survey Sequences (GSS)
The CoreNucleotide database holds most of the commonly used nucleotide sequences; it includes all nucleotide records that are not in the EST and GSS databases
Sequences can be submitted to GenBank using the BankIt, Sequin and tbl2asn tools
EMBL (European Molecular Biology Laboratory)
A comprehensive database of DNA and RNA sequences collected from the scientific literature, patent offices and direct submissions by researchers
EMBL is produced in collaboration with GenBank (USA) and the DNA Data Bank of Japan (DDBJ)
The EMBL database was established in 1980 and is maintained by the EBI (European Bioinformatics Institute)
Swiss-Prot
A curated protein sequence database that offers a high level of integration with other databases and a very low level of redundancy
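Low redundancy means the same protein is not stored as many separate entries. Swiss-Prot achieves this through expert curation, which goes well beyond comparing strings. The Python sketch below shows only the simplest automated step, collapsing entries whose sequences are exactly identical; the identifiers and sequences are hypothetical and collapse_identical is a hypothetical helper, not part of any Swiss-Prot tooling.

```python
from collections import defaultdict
from typing import Dict, List


def collapse_identical(entries: Dict[str, str]) -> Dict[str, List[str]]:
    """Group entry identifiers that share exactly the same protein sequence."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for identifier, sequence in entries.items():
        groups[sequence.upper()].append(identifier)  # case-insensitive comparison
    return dict(groups)


# Hypothetical identifiers and sequences, for illustration only.
entries = {"id1": "MKTAYIAK", "id2": "mktayiak", "id3": "MGSSHHHH"}
for sequence, ids in collapse_identical(entries).items():
    print(sequence, ids)
```

Three input entries reduce to two distinct sequences, with id1 and id2 merged.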
Swiss-Prot strives to provide protein sequences with a high level of annotation (for instance, descriptions of protein function, domain structure and post-translational modifications)
Swiss-Prot was established in 1986 and has been maintained collaboratively since 1987 by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library
TrEMBL is a computer-annotated supplement to Swiss-Prot that contains all translations of EMBL nucleotide sequence entries not yet integrated into Swiss-Prot
At the time of writing, Swiss-Prot held about 0.5 million sequences and TrEMBL about 7.6 million
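The core step behind a TrEMBL entry is translating an annotated DNA coding sequence into protein. A minimal Python sketch using the standard genetic code (NCBI translation table 1); the real pipeline takes the coding-region coordinates and translation table from the nucleotide entry's annotation, which is not modelled here, and translate and the 'X' fallback for unrecognised codons are assumptions of this sketch.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), listed in TCAG order:
# TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate a DNA (or RNA) coding sequence into one-letter amino acid codes."""
    cds = cds.upper().replace("U", "T")
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # 'X' for unrecognised codons
        if aa == "*" and to_stop:
            break  # stop codon ends translation
        protein.append(aa)
    return "".join(protein)


print(translate("ATGTGGTTTTAG"))  # MWF
```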
Protein Information Resource (PIR)
An integrated public bioinformatics resource to support genomic and proteomic research and scientific studies
PIR offers a wide variety of resources aimed mainly at supporting the propagation and consistency of protein annotations, such as PIRSF, ProClass and ProLINK
Protein sequence motif
A set of conserved amino acid residues that are important for protein function and are located within a certain distance from one another
PROSITE database
Consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them
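PROSITE patterns use a compact syntax: x for any residue, [..] for allowed residues, {..} for forbidden residues, x(n) or x(n,m) for repeats, - between elements, < and > for the N- and C-termini, and a closing period. A minimal Python sketch that converts the simpler parts of this syntax into a regular expression and scans a sequence; the function names are hypothetical, the test sequence is made up, and the official ScanProsite tool handles the full syntax.

```python
import re
from typing import List, Tuple


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern into a Python regular expression.

    Covers x, [..], {..}, repeat counts such as x(3) or x(2,4), and the
    < and > terminus anchors on whole elements; other syntax is not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""  # N-terminus anchor
        suffix = "$" if element.endswith(">") else ""  # C-terminus anchor
        element = element.lstrip("<").rstrip(">")
        repeat = ""
        count = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if count:
            element, repeat = count.group(1), "{" + count.group(2) + "}"
        if element == "x":
            regex = "."  # any residue
        elif element.startswith("{"):
            regex = "[^" + element[1:-1] + "]"  # any residue except these
        else:
            regex = element  # a single residue or an allowed set like [ST]
        parts.append(prefix + regex + repeat + suffix)
    return "".join(parts)


def scan(pattern: str, sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for every match, allowing overlaps."""
    # A capturing group inside a lookahead lets matches overlap.
    finder = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in finder.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))   # N[^P][ST][^P]
print(scan("N-{P}-[ST]-{P}.", "ANVTGNPSA"))  # [(2, 'NVTG')]
```

The example pattern is the well-known N-glycosylation site motif; the second N in the test sequence does not match because it is followed by P.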
PRINTS
A database of protein fingerprints. A fingerprint is a group of conserved motifs used to characterise a protein family
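The idea of a fingerprint can be illustrated by checking whether a sequence contains all of a family's motifs in the expected order. A simplified Python sketch, assuming each motif is written as a regular expression: real fingerprint searches score each motif against aligned motif data and can also report partial matches, so the two motifs and the test sequence here are hypothetical and match_fingerprint is a hypothetical helper.

```python
import re
from typing import List, Optional


def match_fingerprint(sequence: str, motifs: List[str]) -> Optional[List[int]]:
    """Return 1-based start positions if every motif occurs in order, else None."""
    positions = []
    search_from = 0
    for motif in motifs:
        hit = re.compile(motif).search(sequence, search_from)
        if hit is None:
            return None  # a missing motif means the fingerprint does not match
        positions.append(hit.start() + 1)
        search_from = hit.end()  # the next motif must lie further along
    return positions


# Hypothetical two-motif fingerprint and test sequence, for illustration only.
fingerprint = ["GDSGGP", "W.Y"]
print(match_fingerprint("MKGDSGGPLAAWHYDE", fingerprint))  # [3, 12]
```

Requiring the motifs in order is what makes a fingerprint more selective than any single motif on its own.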
Protein domain
An independently folding, structurally compact unit that forms a stable three-dimensional structure and shows a certain level of evolutionary conservation
ProDom
A protein domain database automatically generated from the Swiss-Prot and TrEMBL sequence databases
SMART (Simple Modular Architecture Research Tool)
A highly reliable and sensitive tool for domain identification
COG (Clusters of Orthologous Groups)
A database and a convenient tool for motif and domain identification
PDB (Protein Data Bank)
The main primary database for 3D structures of biological macromolecules, determined mainly by X-ray crystallography and NMR spectroscopy
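In the classic PDB text format each atom is stored on an ATOM record with fixed columns: atom name in columns 13-16, residue name in 18-20, chain identifier in 22, residue number in 23-26 and the x, y, z coordinates in 31-38, 39-46 and 47-54. A minimal Python sketch that reads these fields and collects alpha-carbon coordinates; the record below is made up and assembled field by field, the helper names are hypothetical, and the PDB archive today also distributes entries in the PDBx/mmCIF format, which this sketch does not read.

```python
from typing import Dict, List, Tuple


def parse_atom_line(line: str) -> Dict[str, object]:
    """Extract key fields from a fixed-column PDB ATOM/HETATM record."""
    return {
        "record": line[0:6].strip(),
        "atom_name": line[12:16].strip(),
        "residue": line[17:20].strip(),
        "chain": line[21],
        "residue_number": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


def ca_coordinates(lines: List[str]) -> List[Tuple[float, float, float]]:
    """Collect alpha-carbon (CA) coordinates from ATOM records."""
    return [
        parse_atom_line(line)["xyz"]
        for line in lines
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    ]


# A made-up ATOM record assembled field by field to show the column layout.
example = (
    "ATOM  " + "    2" + " " + " CA " + " " + "THR" + " " + "A"
    + "   1" + " " + "   " + "  16.967" + "  12.784" + "   4.338"
)
print(ca_coordinates([example]))  # [(16.967, 12.784, 4.338)]
```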