1- Genomics

Cards (39)

  • Genomics
    Study of genomes, or ALL the DNA of an organism
  • Types of genomics
    • Structural genomics
    • Comparative genomics
    • Functional genomics
  • Structural genomics

    • Architecture, genetic mapping, sequencing & assembly
  • Comparative genomics

    Multiple genomes allow for comparisons
  • Functional genomics

    What do all of the genes do?
  • Genome studies are revolutionizing evolutionary biology to a greater extent than DNA sequencing has up to this point
  • The Human Genome project was initiated in 1990 and completed 13 years later, but now genomes can be sequenced faster
  • Comparative genomics started only 15 years ago for simple genomes, and only 10 years ago for complex eukaryotes
  • A new vertebrate genome is sequenced every year, and a microbial genome of ca. 2 million bp every week; this rate of sequencing is increasing
  • Genomes sequenced by 2009
    • Eukaryotes: ~60
    • Bacteria: ~500
    • Archaea: ~40
  • Genomes sequenced by 2024
    • Eukaryotes: ~37K
    • Prokaryotes: ~707K
    • Viruses: ~75K
  • The Villanova ABI sequencer was used in the Human Genome Project, but more modern sequencers completely automate the process with robots from DNA extraction to PCR to sequencing
  • 1 robot can accomplish a day's work for a human in 1 hour
  • Two approaches to sequencing genomes
    • The mapping or hierarchical approach: Divide the genome into segments with genetic and physical maps, then home in on the details
    • The whole-genome or shotgun approach: Entire genome broken into random, overlapping segments that are then sequenced
  • Genetic map
    Genetic crosses and frequency of crossing over are used with polymorphic genetic markers to map the location of genes on chromosomes
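The arithmetic behind a genetic map can be sketched in a few lines. This is a minimal illustration, assuming the standard convention that 1% recombinant offspring corresponds to roughly 1 centimorgan (cM); the function name and the cross counts are hypothetical.

```python
def map_distance_cm(recombinant_offspring: int, total_offspring: int) -> float:
    """Recombination frequency as a percentage, read as map distance in cM."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return 100.0 * recombinant_offspring / total_offspring


# Hypothetical cross: 120 recombinants among 1,000 offspring.
print(map_distance_cm(120, 1000))  # 12.0
```

The approximation only holds for closely linked markers: over longer distances, double crossovers go undetected and the observed frequency underestimates the true distance.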
  • Humans have 24 genetic maps: one for each of the 22 autosomes (non-sex chromosomes), plus the X and Y chromosomes
  • Physical map
    More detailed information about genetic markers obtained from genome sequence data
  • Restriction enzymes with large restriction sites
    Can be used to develop a physical map of chromosomes, but have low resolution
  • Sequence-tagged site
    Unique genetic markers in genome, very helpful for genetic maps
  • Clone Contig Map
    Higher resolution than restriction maps, can be used to sequence an entire genome. Assemble a set of YAC or BAC clones whose inserts partially overlap and together cover a chromosome continuously, then sequence the inserts.
  • Shotgun approach
    1. Take whole genome
    2. Shear it and clone the fragments into vectors as 2 kb and 10 kb inserts
    3. Sequence it all (500-1000 bp at a time)
    4. Assemble the overlapping reads by computer: presto, genome
  • The shotgun approach is very fast, but limited by computing power & repetitive DNA
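The assembly step of the shotgun approach can be illustrated with a toy example. This is a sketch only, assuming error-free reads from a single strand and a simple greedy strategy (always merge the pair with the longest overlap); real assemblers are far more elaborate, and the function names and reads here are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # nothing left to merge
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# Three overlapping reads from a made-up 14 bp "genome".
reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

The all-against-all comparison is why computing power matters, and repeats are a problem because a repeat longer than a read produces overlaps that are equally good in several places, so the merge order becomes ambiguous.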
  • Annotating a genome
    1. Open Reading Frames (ORFs): Computer searches for start codons and stop codons to identify areas that are potential genes
    2. Only ORFs with more than 100 codons are likely genes
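The ORF search described above can be sketched as follows. This is a minimal version, assuming the standard start codon ATG and stop codons TAA, TAG and TGA, and scanning only the three forward-strand reading frames; a fuller search would also scan the reverse complement. The function name and the test sequence are hypothetical.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100):
    """Yield (start, end, frame) for forward-strand ORFs longer than min_codons.

    start is the index of the A in ATG, end is one past the stop codon,
    and the codon count includes ATG but excludes the stop codon.
    """
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOPS:
                n_codons = (i - start) // 3
                if n_codons > min_codons:
                    yield (start, i + 3, frame)
                start = None  # look for the next ORF in this frame


# Made-up sequence: ATG, then 100 alanine codons, then a stop (101 codons).
candidate = "ATG" + "GCT" * 100 + "TAA"
print(list(find_orfs(candidate)))  # [(0, 306, 0)]
```

A scan like this ignores introns, so it works best on prokaryotic genomes, where genes are not interrupted.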
  • Over 35% of genes in ANY organism (including Human) have no deducible function
  • The human genome:
    • Genome sequenced in 2003
    • Genes encode noncoding RNA or proteins
    • Approximately 21,000 protein-coding human genes
    • Approximately 22,000 other human genes
    • Repeat sequences are > 50% of genome.
    • Ethnicities have few unique alleles of genes.
    • Greatest amount of genetic variation is in Africa
  • The number of human genes is likely to change in the future as additional scrutiny is required for some genes identified in recent studies
  • 80,000 years ago there were only 10,000 humans on the planet
  • Human genomes vary by at least 9 million base pairs
  • There is more genetic diversity within races than between them in most cases
  • Caenorhabditis elegans (C. elegans)

    • A hermaphroditic roundworm (1 mm) that lives in soils throughout the world
    • From egg to adult in 3 days
    • Entire genome (6 chromosomes) sequenced in 1998
    • Genome has 20,000 genes
    • 15% of genes code for proteins involved in gene expression
  • Genome characteristics
    • GC content varies (e.g. Plasmodium falciparum 20% GC, Thermus thermophilus 70% GC)
    • Gene content and density vary (e.g. Mycoplasma genitalium 480 genes (tiny genome), Homo sapiens ~21,000 genes)
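GC content is simply the percentage of bases that are G or C. A minimal sketch, assuming a plain DNA string and ignoring any characters other than A, C, G and T (such as N for unknown bases); the function name is hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the A/C/G/T bases in seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


print(gc_content("ATGC"))  # 50.0
```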
  • Methanococcus jannaschii
    An archaeon that is a hyperthermophilic methanogen (a microbe that thrives at high temperature and pressure without oxygen and produces methane), with genes for energy, cell division and metabolism like those of bacteria, and genes for DNA replication, transcription & translation like those of eukaryotes, confirming archaea as a unique branch of life
  • Arabidopsis thaliana
    First flowering plant genome sequenced in 2000, a model organism with 25,500 genes (more than humans), including 100 similar to human disease-causing genes
  • Bacterial genome trends
    • Mycoplasma has smallest genome at 0.58 Mb
    • Bradyrhizobium, a soil bacterium, has the largest at 9.11 Mb
    • On average, about 1 gene per 1-2 kb
    • 85-90% of genomes are coding genes
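The gene-density figure can be checked against numbers given elsewhere in these cards. A small sketch, using the 0.58 Mb Mycoplasma genome and the 480 genes listed for Mycoplasma genitalium; the function name is hypothetical.

```python
def kb_per_gene(genome_size_bp: int, gene_count: int) -> float:
    """Average genome length per gene, in kilobases."""
    if gene_count <= 0:
        raise ValueError("gene_count must be positive")
    return genome_size_bp / gene_count / 1000


# 0.58 Mb genome with 480 genes.
print(round(kb_per_gene(580_000, 480), 2))  # 1.21
```

The result, about 1.2 kb per gene, falls inside the 1-2 kb range quoted for bacteria.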
  • Archaea genome trends
    • Used to be considered bacteria; many live in extreme environments
    • Thermoplasma genome is 1.56 Mb
    • Methanosarcina genome is 5.75 Mb
    • About one gene per 1-1.2 kb
    • 85-90% of genome is coding genes
  • Eukaryote genome trends
    • Many examples of partial or whole genome duplications, lots of repetitive DNA
    • Gene density decreases with complexity (e.g. fruit fly 1 per 13 kb, humans 1 per 116 kb)
    • Average mammal genome is 3.1 Gb (gigabase = billion base pairs), frogs 5.0 Gb, salamanders 34.5 Gb; the largest genomes reach 150 Gb
  • Fugu
    An unusual vertebrate with a genome size of only 400 Mb, very few introns, and few gene deserts (regions with few genes). Many of its genes are similar to human genes, so finding a gene in Fugu makes it easier to find in humans
  • Bioinformatics
    A marriage of biology with math and computer science, used to find genes, align sequences, predict structure and function, and figure out evolutionary relationships
  • GenBank
    A public database containing millions of DNA sequences from a very wide range of organisms, used for comparative genomics