Bioinformatics

Cards (25)

  • Bioinformatics - field which uses computers to store and analyze molecular biological information.
  • Bioinformatics is the marriage of biology and informatics
  • Bioinformatics - finding and interpreting biological data online.
  • Bioinformatics is an interdisciplinary field that combines several disciplines:
    • Computer Science
    • Statistics
    • Mathematics
    • Biology
    • Information Technology
  • 3 Principal Components
    1. Creation of databases - allows storage and management of large biological data sets
    2. Development of algorithms and statistics
    3. Use of these tools for the analysis and interpretation of various types of biological data
  • Branches of Bioinformatics
    1. Transcriptomics - study of the RNA molecules of living organisms
    2. Microbiomics - study of the genomes of bacteria, viruses, fungi, or parasites
    3. Metabolomics - study of metabolites and the chemical processes that involve them
    4. Genomics
    5. Proteomics - sequence and 3D structure, and other properties of proteins
  • Bioinformatics Applications
    1. Retrieving DNA sequences from databases
    2. Computing nucleotide compositions
    3. Identifying restriction sites
    4. Designing polymerase chain reaction (PCR) primers
    5. Identifying open reading frames (ORFs)
    6. Predicting elements of DNA/RNA secondary structure
    7. Finding repeats
    8. Computing the optimal alignment between two or more DNA sequences
    9. Finding polymorphic sites
    10. Assembling sequence fragments
    11. Creation and visualization of 3D structure models
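Two of the applications above, computing nucleotide composition and identifying restriction sites, can be sketched in a few lines of Python. This is only an illustration: the function names and the example sequence are made up for this card, and the only enzyme-specific fact used is that EcoRI recognizes GAATTC.

```python
from collections import Counter

def nucleotide_composition(seq):
    """Return the count of each base and the GC fraction of a DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    gc = (counts["G"] + counts["C"]) / len(seq) if seq else 0.0
    return dict(counts), gc

def find_sites(seq, site):
    """Return 0-based start positions of every occurrence of a recognition site."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        # Step forward by one so overlapping occurrences are also found.
        start = seq.find(site, start + 1)
    return positions

# Hypothetical example sequence, not taken from any database record.
dna = "ATGGAATTCGCGAATTCTAA"
counts, gc = nucleotide_composition(dna)
ecori_sites = find_sites(dna, "GAATTC")  # EcoRI recognition sequence

print(counts, gc)
print(ecori_sites)
```

Real tools also search the reverse strand and handle ambiguous bases; this sketch scans only the strand given.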
  • in silico - virtual experimentation, done in a computer instead of a real laboratory
    • Ex: primer designing
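As a small in silico example, a candidate primer can be checked on a computer before it is ordered. The sketch below assumes the Wallace rule, Tm = 2(A+T) + 4(G+C), which is a rough estimate meant for short oligonucleotides; the function names and the primer sequence are hypothetical, and dedicated primer-design tools use more detailed models.

```python
def wallace_tm(primer):
    """Estimate melting temperature in degrees C with the Wallace rule: 2(A+T) + 4(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    complement = str.maketrans("ACGT", "TGCA")
    return seq.upper().translate(complement)[::-1]

# Hypothetical 14-nt primer used only for illustration.
primer = "ATGCGTACGTTAGC"
tm = wallace_tm(primer)
rc = reverse_complement(primer)

print(tm)
print(rc)
```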
  • Earliest DNA Sequence and Protein Databases
    1. International Nucleotide Sequence Database Collaboration (INSDC)
      • GenBank (from NCBI)
      • EMBL (European Molecular Biology Laboratory) from EBI
      • DDBJ (DNA Data Bank of Japan)
    2. Worldwide Protein Data Bank (wwPDB)
      • PDBj (Japan)
      • PDBe (Europe)
      • RCSB PDB (USA)
  • Ensembl - an automatic annotation database that determines the exon and intron
    boundaries of eukaryotic genes.
  • True
    (T/F) GenBank can provide the nucleotide and protein sequences of organisms
  • Data Inclusions in GenBank:
    • number of base pairs
    • Accession Number
    • Organism
    • Sources
    • Authors
    • Nucleotide or Protein Sequence
  • Features of GenBank
    • Pick Primers - for designing primers
    • Run BLAST - to identify query sequences
    • Find in This Sequence
  • PDB (Protein Data Bank) is the main database used for the prediction of the 3D structures of proteins and nucleic acids
  • Sequence Alignment - a way of arranging sequences of DNA, RNA, or protein to identify regions of similarity
  • Query Sequence - unknown sequence
    Reference Sequence - known sequence
  • Importance of regions of similarity:
    • To understand functional, structural, or evolutionary relationships between the sequences
    • It may also help identify dissimilar regions of the DNA sequence useful for designing primers
  • Types of Sequence Alignment
    1. Pairwise - compare 2 sequences
    2. Multiple - compare 3 or more sequences
  • Types of Pairwise Alignment
    1. Global Alignment - Matching the residues (bases or amino acids) of two sequences across their entire length
    2. Local Alignment - Matching the regions of two sequences that are most similar to each other
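The difference between the two types can be seen by scoring the same pair of sequences both ways. The sketch below uses the classic dynamic-programming formulations (Needleman-Wunsch for global, Smith-Waterman for local) with a simple scoring scheme chosen for illustration: match +1, mismatch -1, gap -1. The function name and the scores are assumptions, not the defaults of any particular tool.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Return the best global (default) or local alignment score of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalized along the borders.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # Local alignment: a cell never drops below zero, so an
                # alignment can start anywhere in either sequence.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    # Global score is the bottom-right cell; local score is the best cell anywhere.
    return best if local else score[-1][-1]

long_seq = "TTTGATTACAGGG"
short_seq = "GATTACA"
global_score = alignment_score(long_seq, short_seq)
local_score = alignment_score(long_seq, short_seq, local=True)

print(global_score, local_score)
```

Here the local score rewards only the shared GATTACA region, while the global score is pulled down by the gaps needed to span the full length of the longer sequence.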
  • False
    (T/F) In EMBOSS water, dissimilar bases are indicated by an asterisk
  • MUSCLE – Multiple Sequence Comparison by Log Expectation
  • MAFFT – Multiple Alignment using Fast Fourier Transform
  • True
    (T/F) In Clustal Omega, residues are colored and similarities are designated with asterisks
  • False
    (T/F) In designing primers, you have to look for a region with more similarities (asterisks)
  • True
    (T/F) MUSCLE uses dashes for gaps and asterisks for similarities
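The asterisk convention in the cards above can be reproduced for a small alignment. The sketch below assumes the sequences are already aligned to equal length, with dashes for gaps, and marks a column with an asterisk only when every sequence has the same residue there. The function name and the three sequences are hypothetical, and the weaker similarity symbols that Clustal-style output can also show are left out.

```python
def conservation_line(aligned):
    """Return a string with '*' under fully conserved columns and ' ' elsewhere."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        # Conserved means one distinct character in the column, and it is not a gap.
        conserved = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if conserved else " ")
    return "".join(marks)

# Hypothetical pre-aligned sequences; '-' marks a gap.
aligned = [
    "ATG-CGTA",
    "ATGACGTT",
    "ATG-CGAA",
]
line = conservation_line(aligned)

for seq in aligned:
    print(seq)
print(line)
```

Columns left unmarked are the positions where the sequences differ or contain a gap.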