Empirical algorithm developed for the prediction of protein secondary structure
If the sequence holds the secrets of folding, can we figure it out?
Many protein chemists have tried to predict structure based on sequence
Experimental methods used by biotechnologists to determine the structures of proteins demand sophisticated equipment & time
A host of computational methods have been developed to predict the location of secondary structure elements in proteins, either to complement experimental results or to provide insight into them
Chou-Fasman algorithm: each amino acid is assigned a "propensity" for forming helices or sheets
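The propensity idea can be sketched as a sliding-window average. This is a simplified illustration, not the full Chou-Fasman procedure (which also has nucleation and extension rules and a separate sheet table). The helix propensity values are those commonly quoted for Chou-Fasman and should be checked against the original tables; the function names, window size and threshold are assumptions made for this sketch.

```python
# Helix propensities P(alpha) as commonly quoted from Chou-Fasman tables.
# Assumption: verify against the original tables before any real use.
HELIX_PROPENSITY = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16,
    "F": 1.13, "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06,
    "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83, "S": 0.77,
    "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57,
}


def window_means(seq, window=6):
    """Mean helix propensity of every stretch of `window` residues."""
    scores = [HELIX_PROPENSITY[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def helix_candidates(seq, window=6, threshold=1.0):
    """Start indices of windows whose mean propensity exceeds `threshold`.

    The threshold of 1.0 is a hypothetical cut-off chosen for illustration.
    """
    return [i for i, mean in enumerate(window_means(seq, window))
            if mean > threshold]
```

A window rich in Glu, Met, Ala or Leu scores above 1.0 and is flagged as a possible helix start, while a window rich in Gly or Pro is not.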
Homologues
Proteins with sequences that exhibit significant similarity are assumed to be evolutionarily related
Orthologs
Genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologs retain the same function in the course of evolution.
Paralogs
Genes related by duplication within a genome. Orthologs retain the same function in the course of evolution, whereas paralogs evolve new functions, even if these are related to the original one.
Proteins whose sequences are more than 30% identical are almost certain to be homologous
The degree of similarity needed depends on length of sequence. A shorter sequence has a higher probability of being similar just by chance, so it needs greater similarity for confidence in homology.
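Percent identity between two already-aligned sequences can be computed as below. This is a minimal sketch: the function name is hypothetical, gaps are assumed to be written as "-", and identity is taken over columns where neither sequence has a gap, which is only one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)
```

Because the denominator is the number of compared columns, a short alignment can reach a high percentage with only a handful of matches, which is why short sequences need a higher identity before homology is inferred.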
When two homologous proteins are aligned, there are one or more regions where sequence identity is particularly high, and these regions frequently enable the definition of motifs or signature sequences that are diagnostic
Domain
Polypeptide chain (or part of one) that can independently fold into a stable, compact tertiary structure or fold. Domains are the fundamental building blocks of proteins.
Module
Highly conserved sequence that is observed in different contexts in multidomain proteins
A substantial proportion of all proteins are composed of more than one domain
Chimeric/mosaic protein
Protein consisting of multiple modules
Fold
3D arrangement or topology of secondary structure elements
Every domain or module adopts some kind of fold
Many proteins are constructed as a composite of two or more "modules" or domains, each of which is a recognizable unit that can also be found in other proteins
Modules are sometimes used repeatedly in the same protein
Low complexity regions
Regions of multidomain proteins that separate the individual domains, also referred to as linker sequences
Long stretches of repeated residues, particularly proline, glutamine, serine or threonine often indicate linker sequences
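A simple way to flag candidate linker regions is to scan for windows dominated by proline, glutamine, serine or threonine. The sketch below assumes a hypothetical window length and cut-off fraction; it is not how any particular low-complexity filter works.

```python
# Residues named in the notes as typical of linker sequences.
LINKER_RESIDUES = set("PQST")


def linker_like_windows(seq, window=10, min_fraction=0.8):
    """Start indices of windows where at least `min_fraction` of the
    residues are P, Q, S or T. Both parameters are illustrative."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - window + 1):
        chunk = seq[i:i + window]
        fraction = sum(aa in LINKER_RESIDUES for aa in chunk) / window
        if fraction >= min_fraction:
            hits.append(i)
    return hits
```

Consecutive hit positions can be merged into a single region and treated as a likely boundary between domains.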
Secondary (pattern) databases
Databases that contain sequence patterns (motifs, signatures, blocks, profiles) common to homologous proteins or protein modules
These motifs, usually ~10-20 amino acids in length, commonly correspond to key functional or structural elements, often domains/modules, and are extremely useful for identifying such features in new, uncharacterized proteins
An unknown protein can potentially be identified by the occurrence in its sequence of a particular motif, even if it is too distantly related to any protein of known sequence to detect its resemblance by overall sequence alignment
Pfam
A collection of multiple alignments and profile hidden Markov models of protein domain families, based on proteins from both SWISS-PROT and SP-TrEMBL
SMART
A tool that allows the identification and annotation of genetically mobile domains and the analysis of domain architectures
PROSITE
A database of protein families and domains, consisting of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs
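PROSITE patterns use a compact syntax (elements separated by "-", "x" for any residue, "[...]" for allowed residues, "{...}" for forbidden residues, "(n)" or "(n,m)" for repeats, "<" and ">" for the sequence ends). The sketch below translates the common cases into a Python regular expression. The function name is hypothetical, and rarer constructs such as ">" inside square brackets are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# Example with the N-glycosylation site pattern N-{P}-[ST]-{P}.
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
hit = re.search(regex, "AANGSAA")
```

Short patterns like this one match by chance in many sequences, so a hit is a starting point for annotation rather than proof of family membership.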
Lysozyme
Sometimes described as having 2 domains (α-helical region + β-sheet region), but they cannot fold independently into stable units and are never found separately in proteins. Therefore, it is classified as a single domain with subdomains.
Ankyrin domain from the Notch receptor in Drosophila
Composed of multiple copies of small structural units (helix-turn units), but the individual units have no structure or biological function. Therefore, Ankyrin is a single domain with subdomains.
Rossmann fold
A domain that binds nucleotides (NADP/ATP). It typically consists of a βαβαβ structural motif with a conserved sequence element (GXGXXG) that binds the phosphate group of the nucleotide. The domain structure is so closely linked to the nucleotide-binding function that the presence of this structure is a good indicator of function.
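The GXGXXG element can be searched for with a regular expression in which X is any residue. This is only a sketch with a hypothetical function name; a sequence match on its own is a hint, not evidence of a Rossmann fold.

```python
import re

# G-X-G-X-X-G, wrapped in a lookahead so overlapping matches are reported.
GXGXXG = re.compile(r"(?=(G.G..G))")


def find_gxgxxg(seq):
    """Return (start index, matched text) for each GXGXXG occurrence."""
    return [(m.start(), m.group(1)) for m in GXGXXG.finditer(seq.upper())]
```

For the made-up sequence "MKGAGAAGLV", the function reports one match, "GAGAAG", starting at index 2.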
TIM barrel
A common domain structure found in about 10% of all enzymes. It is a β-barrel formed by a series of βαβ structural motifs. The active site is always in the same position, in the cleft defined by the loops at the C-terminal ends of the β-strands, but the function can vary widely.
HTH domain
A major structural motif observed in proteins capable of binding DNA, consists of a "helix-turn-helix" (HTH) motif. The HTH motif folds independently and is thus a genuine domain, with one helix contributing to DNA recognition and the second helix stabilizing the interaction between protein and DNA.