Created by ANU
Cards (13)
Genome annotation
Identifying the locations of genes, their structures, and their functions in a genome
Genome annotation process
1. Sequences decoded
2. Identify gene locations
3. Identify coding and non-coding regions
4. Identify start and stop points of genes
5. Identify functions of genes
Databases used for genome annotation
GenBank
EMBL
WormBase
FlyBase
Types of genome annotation
Structural annotation: Identification of genomic elements, 3D/4D protein structures, regulatory regions, coding regions, ORFs
Functional annotation: Adding biological information, gene expression, protein function
Structural annotation
Differentiate coding and non-coding regions
Identify start and stop codons
Structural annotation methods
Experimental data like expressed sequence tags (ESTs)
Bioinformatic analyses (ab initio)
Ab initio annotation
Annotation methods that start with just the sequence to be annotated
Open reading frame (ORF)
A portion of a genome that contains a sequence of bases that could potentially encode a protein, located between start and stop codons
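The ORF definition above can be illustrated with a minimal sketch. It assumes the standard start codon ATG and the stop codons TAA, TAG, and TGA, scans only the forward strand, and uses a hypothetical helper name (`find_orfs`) and a hypothetical `min_codons` threshold that are not part of the card set. Real ab initio gene finders are considerably more sophisticated.

```python
# Assumption: standard genetic code start and stop codons.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) tuples for ORFs on the forward strand.

    start/end are 0-based, end-exclusive positions; the span includes
    both the start codon and the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):                       # three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):  # step one codon at a time
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i                        # first ATG opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                     # look for the next ORF in this frame
    return orfs

if __name__ == "__main__":
    dna = "CCATGAAATTTTAGGG"
    for start, end, frame in find_orfs(dna):
        print(frame, dna[start:end])
```

A candidate found this way is only a potential gene; the cards on homology and gene knockout describe how such candidates are checked.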
Reading frames
DNA is read one codon (nucleotide triplet) at a time, so each strand can be read in three possible frames
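Because a sequence is read in triplets, shifting the starting position by one or two bases gives a different set of codons. The sketch below splits a sequence into codons for each frame on both strands. The function names (`codons`, `reverse_complement`, `six_frames`) and the frame labels are illustrative assumptions, not taken from the cards.

```python
def codons(seq, frame=0):
    """Split seq into complete codons, starting at offset `frame` (0, 1, or 2)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def six_frames(seq):
    """Map frame labels (+1..+3, -1..-3) to their codon lists."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = codons(seq, offset)  # forward strand
        frames[f"-{offset + 1}"] = codons(rc, offset)   # reverse strand
    return frames

if __name__ == "__main__":
    for label, triplets in six_frames("ATGAAATAG").items():
        print(label, triplets)
```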
Useful ORFs
Identified using a homology approach (BLAST and sequence alignments) and the degeneracy of codons
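Codon degeneracy means that most amino acids are encoded by more than one codon. The sketch below builds the standard genetic code from the usual compact TCAG-ordered table and counts the codons per amino acid. The variable and function names are illustrative assumptions, and only the standard code is covered, not alternative codes used by some organisms or organelles.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
# '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each of the 64 codons to its one-letter amino acid code.
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def degeneracy():
    """Count how many codons encode each amino acid (or stop)."""
    return Counter(CODON_TABLE.values())

if __name__ == "__main__":
    for aa, n in sorted(degeneracy().items(), key=lambda kv: -kv[1]):
        print(aa, n)
```

One practical consequence is that two DNA sequences can differ at many third-codon positions yet encode the same protein, which is why comparisons are often made at the protein level.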
Predicting gene functions
Using gene knockdown and knockout approaches like RNAi and CRISPR
Prokaryotic genes
Small genomes with high gene density, no introns, operons (one transcript, many genes), and open reading frames (ORFs)
The NCBI Prokaryotic Genome Annotation Pipeline (PGAP) is designed to annotate bacterial and archaeal genomes