Cards (33)

  • Chou-Fasman algorithm
    Empirical algorithm developed for the prediction of protein secondary structure
  • If the sequence holds the secrets of folding, can we figure it out?
  • Many protein chemists have tried to predict structure based on sequence
  • Experimental methods for determining protein structures demand sophisticated equipment and considerable time
  • A host of computational methods have been developed to predict the location of secondary structure elements in proteins, either to complement experimental results or to provide insight into them
  • Chou-Fasman algorithm: each amino acid is assigned a "propensity" for forming helices or sheets
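The propensity idea can be sketched as a sliding-window average. This is only a simplified illustration, not the full Chou-Fasman nucleation and extension rules; the function name `window_propensity`, the window size, the 1.0 cutoff, and the values in `TOY_HELIX` are all assumptions made for the example (the values are placeholders, not the published propensity table).

```python
from typing import Dict, List


def window_propensity(seq: str, table: Dict[str, float], window: int = 6) -> List[float]:
    """Return the mean propensity of every length-`window` stretch of `seq`."""
    scores = []
    for i in range(len(seq) - window + 1):
        stretch = seq[i:i + window]
        scores.append(sum(table[aa] for aa in stretch) / window)
    return scores


# Placeholder helix propensities for illustration only (NOT the published table).
TOY_HELIX = {"A": 1.4, "E": 1.5, "L": 1.2, "G": 0.6, "P": 0.6}

scores = window_propensity("AELAEGPGP", TOY_HELIX, window=4)

# Window start positions whose average exceeds the assumed cutoff of 1.0.
helix_like = [i for i, s in enumerate(scores) if s > 1.0]
print(helix_like)
```

With these toy numbers the Ala/Glu/Leu-rich start of the sequence scores above the cutoff, while the Gly/Pro-rich end does not.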
  • Homologues
    Proteins whose sequences exhibit significant similarity and are therefore assumed to be evolutionarily related
  • Orthologs
    Genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologs retain the same function in the course of evolution.
  • Paralogs
    Genes related by duplication within a genome. Orthologs retain the same function in the course of evolution, whereas paralogs evolve new functions, even if these are related to the original one.
  • Proteins whose sequences are more than 30% identical are almost certain to be homologous
  • The degree of similarity needed depends on length of sequence. A shorter sequence has a higher probability of being similar just by chance, so it needs greater similarity for confidence in homology.
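Percent identity over an existing alignment can be computed as in the sketch below. It assumes the two sequences are already aligned to equal length with `-` for gaps, and it counts only columns where neither sequence has a gap; other denominators (for example, the length of the shorter sequence) are also in use, so this is one convention rather than the definition. The function name and the example sequences are made up for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of ungapped alignment columns in which both residues are identical."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned pair: 8 ungapped columns, 6 of them identical.
print(percent_identity("MKT-AYIAKQ", "MKSLAYLA-Q"))  # 75.0
```

A high value over only 8 columns means little; as the card above notes, short alignments need greater similarity before homology can be inferred.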
  • When two homologous proteins are aligned, there are one or more regions where sequence identity is particularly high, and these regions frequently enable the definition of motifs or signature sequences that are diagnostic
  • Domain
    Polypeptide chain (or part of one) that can independently fold into a stable, compact tertiary structure or fold. Domains are the fundamental building blocks of proteins.
  • Module
    Highly conserved sequence that is observed in different contexts in multidomain proteins
  • A substantial proportion of all proteins are composed of more than one domain
  • Chimeric/mosaic protein
    Protein consisting of multiple modules
  • Fold
    3D arrangement or topology of secondary structure elements
  • Every domain or module adopts some kind of fold
  • Many proteins are constructed as a composite of two or more "modules" or domains, each of which is a recognizable domain that can also be found in other proteins
  • Modules are sometimes used repeatedly in the same protein
  • Low complexity regions
    Regions of multidomain proteins that separate the individual domains, also referred to as linker sequences
  • Long stretches of repeated residues, particularly proline, glutamine, serine or threonine often indicate linker sequences
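A rough way to flag candidate linker regions is to scan for windows dominated by proline, glutamine, serine, or threonine. The sketch below is a heuristic under stated assumptions: the function name, the window length of 10, and the 0.75 threshold are arbitrary choices for illustration, and the test sequence is invented.

```python
from typing import List


def linker_like_windows(seq: str, window: int = 10,
                        min_fraction: float = 0.75,
                        residues: str = "PQST") -> List[int]:
    """Return start positions of windows enriched in the given residues."""
    hits = []
    for i in range(len(seq) - window + 1):
        stretch = seq[i:i + window]
        fraction = sum(aa in residues for aa in stretch) / window
        if fraction >= min_fraction:
            hits.append(i)
    return hits


# Invented sequence: a P/Q/S/T-rich stretch between two ordinary segments.
seq = "MKVLAAGW" + "PSTPQSPTSP" + "LKDEVIAR"
print(linker_like_windows(seq))  # [6, 7, 8, 9, 10]
```

Overlapping hits cluster around the enriched stretch; merging them into one region would be a natural next step.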
  • Secondary (pattern) databases
    Databases that contain sequence patterns (motifs, signatures, blocks, profiles) common to homologous proteins or protein modules
  • These motifs, usually ~10-20 amino acids in length, commonly correspond to key functional or structural elements, often domains/modules, and are extremely useful for identifying such features in new, uncharacterized proteins
  • An unknown protein can potentially be identified by the occurrence in its sequence of a particular motif, even if it is too distantly related to any protein of known sequence to detect its resemblance by overall sequence alignment
  • Pfam
    A collection of multiple alignments and profile hidden Markov models of protein domain families, based on proteins from both SWISS-PROT and SP-TrEMBL
  • SMART
    A tool that allows the identification and annotation of genetically mobile domains and the analysis of domain architectures
  • PROSITE
    A database of protein families and domains, consisting of biologically significant sites, patterns and profiles that help to reliably identify which known protein family (if any) a new sequence belongs to
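PROSITE-style patterns use a compact syntax (`x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats, `-` between positions) that maps closely onto regular expressions. The converter below is a minimal sketch covering only those elements; `prosite_to_regex` is a hypothetical helper, not part of any PROSITE tool, and the pattern in the example is invented for illustration rather than taken from a database entry.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # forbidden residues
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # terminus anchors
        parts.append(element)
    return "".join(parts)


# Invented pattern, used only to exercise the syntax.
regex = prosite_to_regex("C-x(2,4)-C-{P}-[LIVM]")
print(regex)  # C.{2,4}C[^P][LIVM]

match = re.search(regex, "AACDECALKK")
print(match.span() if match else None)  # (2, 8)
```

A match found this way only suggests family membership; short patterns can also occur by chance.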
  • Lysozyme
    • Sometimes described as having 2 domains (α-helical region + β-sheet region), but they cannot fold independently into stable units and are never found separately in proteins. Therefore, it is classified as a single domain with subdomains.
  • Ankyrin domain from the Notch receptor in Drosophila
    • Composed of multiple copies of small structural units (helix-turn units), but the individual units have no structure or biological function on their own. Therefore, the ankyrin domain is a single domain with subdomains.
  • Rossmann fold
    • Functions to bind nucleotides (NADP/ATP). It typically consists of a βαβαβ structural motif with a conserved sequence element (GXGXXG) that binds the phosphate group of the nucleotide. The domain structure is so closely linked to the nucleotide-binding function that the presence of this structure is a good indicator of function.
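Scanning a sequence for the GXGXXG element is a one-line regular expression, sketched below. The lookahead lets overlapping occurrences be reported; the function name and the test sequence are invented for illustration. A hit is only a hint, since six loosely constrained residues can occur by chance and the card ties function to the fold, not to the sequence element alone.

```python
import re
from typing import List, Tuple

# G, any residue, G, any two residues, G; the lookahead allows overlapping hits.
GXGXXG = re.compile(r"(?=(G.G..G))")


def find_gxgxxg(seq: str) -> List[Tuple[int, str]]:
    """Return (start position, matched residues) for each GXGXXG occurrence."""
    return [(m.start(), m.group(1)) for m in GXGXXG.finditer(seq)]


# Invented sequence containing one occurrence starting at index 6.
print(find_gxgxxg("MKIAVIGAGSWGTALA"))  # [(6, 'GAGSWG')]
```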
  • TIM barrel
    • A common domain structure, found in about 10% of all enzymes: a β-barrel formed by a series of βαβ structural motifs. The active site is always in the same position, in the cleft defined by the loops at the C-terminal ends of the β-strands, but the function can vary widely.
  • HTH domain
    • A major structural element of DNA-binding proteins, built around a "helix-turn-helix" (HTH) motif. The HTH motif folds independently and is thus a genuine domain; one helix contributes to DNA recognition and the second helix stabilizes the interaction between protein and DNA.