DNA sequencing used to be a manual process, but it is now largely automated
Entire genomes can now be read
DNA sequencing allows for the nucleotide base sequence of an organism's genetic material to be identified and recorded
Advances in technology have enabled the development of high-throughput sequencing methods, which allow scientists to rapidly sequence the genomes of organisms
The use of a method called capillary electrophoresis enables the chain termination method to be carried out in a high-throughput way
The newest high-throughput methods do not involve electrophoresis and are known as next-generation sequencing methods, e.g. nanopore sequencing and pyrosequencing
In the 1970s the chain termination method of sequencing was developed by Frederick Sanger and his colleagues
The chain termination method is also known as Sanger sequencing
The chain termination method of DNA sequencing uses modified nucleotides called dideoxynucleotides
Dideoxynucleotides can pair with nucleotides on the template strand during DNA replication
They will pair with nucleotides that have a complementary base
A dideoxynucleotide lacks the 3' hydroxyl group needed to bond to the next nucleotide, so once DNA polymerase adds one to the developing strand no further nucleotides can be added; hence this method of sequencing is referred to as the chain termination method
Replication of that strand therefore stops, producing a shortened DNA chain
A) single
B) primer
C) polymerase
D) dideoxynucleotide
E) stop
Chain termination method (1):
4 test tubes are prepared, each containing the DNA to be sequenced (in the form of a single-stranded template), DNA polymerase, DNA primers, free nucleotides A, C, T, and G, and 1 of the 4 types of dideoxynucleotide: A*, C*, T*, or G*
Test tubes incubated at a temperature that allows DNA polymerase to function
Primer anneals to start of single stranded template, producing a short section of double stranded DNA at the start of the sequence
DNA polymerase attaches to double stranded section and begins DNA replication using free nucleotides in test tube
Chain termination method (2):
At any time, DNA polymerase can insert one of the dideoxynucleotides by chance which results in the termination of DNA replication
Because each test tube only contains 1 type of dideoxynucleotide, it is possible to know what the terminal nucleotide of each fragment is (i.e. if the test tube contains A*, then researchers will know that the final nucleotide of every chain in that test tube is A)
Because the point at which the dideoxynucleotide is inserted varies with every strand, complementary DNA chains of varying lengths are produced
Chain termination method (3):
The new, complementary, DNA chains are separated from template DNA
Resulting single-stranded DNA chains are separated according to length using gel electrophoresis
Gel will have 4 wells, one each for A*, C*, T*, and G*
A fragment with only 1 nucleotide will travel all the way to the bottom of the gel, and every band above this on the gel represents the addition of 1 more base.
This allows the base sequence to be built up one base at a time
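The three stages above can be sketched as a toy simulation. This is only an illustrative model, assuming the template is written in the order DNA polymerase reads it and that a chain terminates at every possible position; the function names (new_strand, fragment_lengths, read_gel) are made up for this example.

```python
# Toy model of the four-tube chain termination method (not real lab software).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def new_strand(template):
    """Return the complementary strand DNA polymerase would build, base by base."""
    return "".join(COMPLEMENT[base] for base in template)


def fragment_lengths(template, dideoxy_base):
    """Lengths of the terminated chains in the tube holding one dideoxynucleotide.

    A chain can stop wherever the developing strand needs that base.
    """
    strand = new_strand(template)
    return [i + 1 for i, base in enumerate(strand) if base == dideoxy_base]


def read_gel(template):
    """Read the bands from the shortest fragment to the longest."""
    bands = {}  # fragment length -> terminal base (the lane the band is in)
    for base in "ACGT":
        for length in fragment_lengths(template, base):
            bands[length] = base
    return "".join(bands[length] for length in sorted(bands))


template = "TACGGATC"  # hypothetical template sequence
print(fragment_lengths(template, "A"))  # chain lengths in the A* tube
print(read_gel(template))               # sequence of the developing strand
```

Note that the sequence read from the gel is that of the newly built strand, not the template.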
Gel electrophoresis - a method used to separate fragments of DNA by length by applying a voltage across a gel matrix. DNA fragments are negatively charged, so move through the gel towards the positive electrode. Smaller fragments travel faster through the gel, so will travel further in a given amount of time.
High-throughput sequencing (1):
Each type of dideoxynucleotide is labelled using a specific fluorescent dye
Dideoxynucleotides with adenine base (ddNA) labelled green
Dideoxynucleotides with thymine base (ddNT) labelled red
Dideoxynucleotides with cytosine base (ddNC) labelled blue
Dideoxynucleotides with guanine base (ddNG) labelled yellow
Single-stranded DNA chains separated according to length (and therefore mass) using capillary electrophoresis
This has a very high resolution - capable of separating chains of DNA that vary by only 1 nucleotide in length
High-throughput sequencing (2):
A laser beam is used to illuminate all of the dideoxynucleotides, and a detector then reads the colour and position of each fluorescence
The detector feeds the information into a computer where it is stored or printed out for analysis
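The way a computer might turn the detector readings into a base sequence can be sketched as follows. This is a minimal illustration, assuming the dye colours listed in these notes and a hypothetical list of (chain length, colour) readings; real sequencers work from fluorescence traces and may use different colours.

```python
# Dye colours as given in these notes; real instruments may use other colours.
DYE_TO_BASE = {"green": "A", "red": "T", "blue": "C", "yellow": "G"}


def call_bases(detections):
    """Turn (chain_length, colour) detector readings into a base sequence.

    Sorting by chain length puts the terminal bases in the order they
    occur in the newly synthesised strand.
    """
    ordered = sorted(detections)
    return "".join(DYE_TO_BASE[colour] for _, colour in ordered)


# Hypothetical detector output, deliberately out of order.
readings = [(3, "yellow"), (1, "green"), (4, "blue"), (2, "red")]
print(call_bases(readings))
```

Because each chain ends in a single dye-labelled dideoxynucleotide, one capillary can replace the four separate lanes used in the manual method.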
Note that because capillary electrophoresis is essentially still Sanger sequencing, it cannot be referred to as next-generation sequencing
Increase in speed enabled by high-throughput sequencing allows scientists to sequence and analyse genomes of many organisms
Scientists can determine the function of sections of DNA by 'knocking out' genes to see how this affects an organism
Genes can be rewritten to alter their function, and then inserted into cells using genetic engineering techniques; this means that scientists can potentially design new molecules with huge potential for drug production (synthetic biology)
Genome sequence data can also provide information about evolutionary relationships
Next-generation sequencing
Any method of DNA sequencing that has replaced the Sanger method is referred to as next-generation sequencing (NGS)
Thousands to millions of DNA molecules can be sequenced at the same time
NGS methods can be one thousand times faster than older methods of sequencing
The reduction in time required for sequencing means that costs are also greatly reduced
NGS methods cost roughly 0.1% of the cost of chain-termination methods
Nanopore sequencing is a more recent method that is already in use and is still being developed further by scientists
This method of sequencing is rapid and can use portable devices, allowing sequence data to be obtained outside the lab and used for a range of applications
Examiners may ask you which DNA strand the base sequence has been obtained for. In Sanger sequencing methods, it is the base sequence of the developing/test strand that is being identified, not the template strand that was initially provided.
Give some benefits of genome-wide comparisons.
Comparing between species allows us to determine evolutionary relationships
Comparing between individuals of the same species allows us to tailor medical treatment to the individual
How can DNA sequencing be used in synthetic biology?
Knowing the sequence of a gene allows us to predict the sequence of amino acids that will make up the polypeptide it produces. This in turn allows for development of synthetic biology.
A genome is all of the genetic material within an organism, including all of its genes
Advances in technology have allowed scientists to sequence the genes within an organism's genome
Sequencing projects have read the genomes of a wide range of organisms from flatworms to humans
Genome-wide comparisons can be made between individuals and between species
The genetic code can be used to predict the amino acid sequence within a protein
Once scientists know the amino acid sequence they can predict how the new protein will fold into its tertiary structure
This information can be used for a range of applications, such as in synthetic biology
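Predicting an amino acid sequence from a base sequence can be sketched in a few lines. This is a simplified example, assuming the standard genetic code, a DNA coding strand whose reading frame starts at the first base, and single-letter amino acid codes; the function name translate is chosen for this example only.

```python
from itertools import product

# Standard genetic code, listed with the bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_strand):
    """Translate a DNA coding strand into one-letter amino acid codes.

    Translation stops at the first stop codon (shown as '*' in the table).
    """
    protein = []
    for i in range(0, len(coding_strand) - 2, 3):
        amino_acid = CODON_TABLE[coding_strand[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCAAATGA"))  # Met-Ala-Lys, then a stop codon
```

Predicting how the resulting chain folds into its tertiary structure is a much harder problem and is not attempted here.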
Bioinformatics is a field of biology that involves the storage, retrieval, and analysis of data from biological studies
These studies may generate data on DNA sequences, RNA sequences, and protein sequences, as well as on the relationship between genotype and phenotype
High-power computers are required to create databases
The large databases contain information about an organism's gene sequences and amino acid/protein sequences
Once a genome is sequenced, bioinformatics allows scientists to make comparisons with the genomes of other organisms using the many databases available
This can help to find the degree of similarity between organisms which then gives an indication of how closely related the organisms are
This can be useful for scientists looking for organisms that could be used in experiments as a model organism for humans
The nematode worm Caenorhabditis elegans is an animal that has been used as a model organism for studying the genetics of organ development, neurone development and cell death. It was the first multicellular organism to have its genome fully sequenced. Because it has few cells (fewer than 1000) and is transparent, it has been a useful model organism
Bioinformatics has contributed to the study of genetic variation, evolutionary relationships, genotype-phenotype relationships, and epidemiology
The genetic variation within a species can be investigated
Many individuals of the same species have their genomes sequenced and compared
A species that has a high level of genetic variation will exhibit a large number of differences in base sequences between individuals
The evolutionary relationships between species can be investigated by comparing the genomes of different species
Species with a small number of differences between their genomes are likely to share a more recent common ancestor than species with a large number of differences
The protein cytochrome c is involved in respiration, and so is found in a large number of species (including plants, animals, and unicellular organisms). For this reason it is especially useful for making comparisons between different species
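The idea of counting differences between sequences can be sketched as below. This is a simplified illustration, assuming the sequences are already aligned and of equal length; the species names and sequences are invented, and real comparisons use alignment tools and far longer sequences.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Count the positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def closest_pair(sequences):
    """Return the pair of names whose sequences differ at the fewest positions."""
    return min(
        combinations(sequences, 2),
        key=lambda pair: count_differences(sequences[pair[0]], sequences[pair[1]]),
    )


# Made-up aligned sequences for three hypothetical species.
sequences = {
    "species_1": "ATGGCCAAATTC",
    "species_2": "ATGGCCAAGTTC",
    "species_3": "ATGACCGAGTTA",
}

for name_a, name_b in combinations(sequences, 2):
    print(name_a, name_b, count_differences(sequences[name_a], sequences[name_b]))
print(closest_pair(sequences))
```

In this invented data set, species_1 and species_2 have the fewest differences, which would suggest they share the most recent common ancestor of the three.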
Genome sequencing can aid the understanding of gene function and interaction
Genotype-phenotype relationships are explored by "knocking out" different genes (stopping their expression) and observing the effect it has on the phenotype of an organism
When an organism's genome sequence is known, scientists can target specific base sequences to knock out
Epidemiologists study the spread of infectious disease within populations
The genomes of pathogens can be sequenced and analysed to aid research and disease control
Highly infectious strains can be identified
E.g. the Delta variant of SARS-CoV-2 (a well-known coronavirus)
The ability of a pathogen to infect multiple species can be investigated
E.g. Ebola can infect non-human primates as well as humans
The most appropriate control measures can be implemented based on the data provided
Potential antigens for use in vaccine production can be identified
Genome comparison in action: The Human Genome Project
A genome project works by collecting DNA samples from many individuals of a species. These DNA samples are then sequenced and compared to create a reference genome
More than one individual is used to create the reference genome as one organism may have anomalies/mutations in its DNA sequence that are atypical of the species
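Why several individuals are used can be illustrated with a simple consensus calculation. This is only a sketch, assuming short, already aligned sequences and a majority vote at each position; the sample data are invented and real reference genomes are assembled with far more sophisticated methods.

```python
from collections import Counter


def consensus(sequences):
    """Build a consensus by taking the most common base at each position."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


# Hypothetical aligned samples from four individuals; two carry a rare variant.
samples = ["ATGCCA", "ATGCCA", "ATGTCA", "ACGCCA"]
print(consensus(samples))
```

The two atypical bases are outvoted by the other individuals, so they do not appear in the consensus sequence.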
Applications of the Human Genome Project
The information generated from the HGP has been used to tackle human health issues with the end goal of finding cures for diseases
Scientists have noticed a correlation between changes in specific genes and the likelihood of developing certain inherited diseases
For example, several genes within the human genome have been linked to increased risk of certain cancers
There have also been specific genes linked to the development of Alzheimer's disease
Proteome: The full range of proteins produced by the genome.
Determining the proteome of humans is difficult as large amounts of non-coding DNA are present in human genomes
It can be very hard to distinguish these non-coding sections of DNA from the coding DNA
The presence of regulatory genes and the process of alternative splicing in human genomes also affects gene expression and the synthesis of proteins
The number of different proteins in the proteome is larger than the number of genes in the genome due to:
Alternative splicing
Post-translational modification of proteins (often takes place in the Golgi apparatus)
Alternative splicing allows for a single gene to produce multiple proteins
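How one gene can give rise to several proteins can be shown by listing possible exon combinations. This is a deliberately simplified sketch, assuming only exon skipping occurs and that the first and last exons are always kept; the exon names and the function splice_variants are hypothetical.

```python
from itertools import combinations


def splice_variants(exons):
    """List mRNAs that keep the first and last exon plus any middle exons, in order."""
    first, *middle, last = exons
    variants = []
    for kept_count in range(len(middle) + 1):
        for kept in combinations(middle, kept_count):
            variants.append("-".join([first, *kept, last]))
    return variants


# One hypothetical gene with four exons.
print(splice_variants(["E1", "E2", "E3", "E4"]))
```

Even with only two optional exons, this single gene could produce four different mature mRNAs.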
Synthetic biology is a recent area of research that aims to create new biological parts, devices, and systems, or to redesign systems that already exist in nature
It goes beyond genetic engineering, as it involves large alterations to an organism's genome. This new genome can cause a cell to operate in a novel way
The assembly of the new genome can be done using existing DNA sequences or using entirely new sequences
These new sequences can be designed and written (using special computer programs) so that they produce specific proteins