A genome is the entire set of DNA, including all the genes, in an organism
In genome projects, scientists work to determine the complete genome sequence of an organism. Their success depends on the complexity of the organism and the technology that is available
Sequencing genomes: Improvements in technology have allowed us to sequence the genomes of a variety of organisms
How do gene sequencing methods work?
Sequencing methods only work on fragments of DNA, so to sequence the entire genome of an organism you first need to break it into smaller pieces. The smaller pieces are sequenced and then put back in order to give the sequence of the whole genome
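The "put back in order" step can be pictured as joining fragments wherever the end of one matches the start of another. The sketch below is only an illustration of that idea, not the method used by any real genome project: the function names `overlap` and `assemble`, the `min_len` cut-off and the example fragments are all made up for these notes, and it assumes the reads are error-free and all come from the same strand.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that matches a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the pair of fragments with the largest overlap until no overlaps remain.

    Returns the list of joined-up sequences (a single item if everything merged).
    """
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left that overlaps, so stop merging
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical fragments of one short sequence, given out of order
print(assemble(["CGTTAGC", "ATGGCGTA", "CGTACGTT"]))
```

Real genomes contain repeated sequences and sequencing errors, so a greedy approach like this is far too simple in practice, but it shows why overlapping fragments are needed to recover the order.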
An example of genome sequencing?
The Human Genome Project - completed 2003
mapped entire sequence of human genome for first time
1990 - scientists from around the world joined together to attempt to sequence all 3 billion base pairs in the human genome
aim to improve understanding of genetic factors in human disease - to develop new ways to diagnose and treat illness
now we have the complete sequence - genes causing inherited diseases can be found in days rather than the years it took before
What is a proteome?
All the proteins made by an organism. Some parts of the genome code for specific proteins, some don't code for anything at all
Simple organisms such as bacteria have very little non-coding DNA - so it is relatively easy to determine their proteome from the DNA sequence of their genome. This is useful in medical research and development e.g. identifying protein antigens on the surface of disease-causing bacteria and viruses can help in the development of vaccines to prevent disease
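Going from a stretch of coding DNA to the protein it codes for means reading the bases three at a time (codons) and looking each codon up in the genetic code. The sketch below shows that idea using the standard genetic code; the function name `translate` and the example sequences are made up for these notes, and it assumes the input is the coding strand, contains only A, C, G and T, and that translation starts at the first ATG.

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each codon position
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Translate from the first ATG to the first stop codon (one-letter amino acid codes)."""
    dna = coding_dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # ATG = M (methionine), GCC = A (alanine), TAA = stop
```

This only works so directly when almost all of the DNA is coding, which is why it is much easier for bacteria than for organisms with large amounts of non-coding DNA.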
Being able to determine the proteomes of disease-causing bacteria and viruses also allows pathogens to be monitored during outbreaks of disease - this can lead to better management of the spread of infection and can help to identify antibiotic resistance factors (the mechanisms that make bacteria resistant to antibiotics)
More complex organisms contain large sections of non-coding DNA and also contain complex regulatory genes, which determine when the genes that code for particular proteins should be turned on and off. This makes it more difficult to translate their genome into their proteome - it is hard to find the bits that code for proteins among the non-coding and regulatory DNA.
Work is being done on the human proteome - the codes for more than 30,000 human proteins have been found so far
New sequencing techniques have been developed:
Sanger developed a technique in which a sample of DNA was tagged with radioactive bases, separated into 4 lanes on a gel and allowed to migrate
result recorded on X-ray film (the radioactive bands show up as dark marks)
each lane represented one of the 4 bases - sequence of DNA could be worked out by combining results in each lane
time consuming - only one sample could run at a time
X-ray photographs had to be taken manually
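Combining the four lanes can be pictured like this: each band is a DNA fragment that ends in that lane's base, and shorter fragments travel further down the gel, so reading the bands from shortest to longest across all four lanes spells out the sequence. The sketch below is only an illustration of that reading step; the function name `read_gel` and the band positions are invented for these notes, and it assumes every fragment length appears in exactly one lane.

```python
def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read a sequence from four lanes, where each lane lists the fragment lengths of its bands."""
    # Pair every band with its lane's base, then order bands from shortest to longest fragment
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


# Hypothetical gel: fragment lengths (in bases) of the bands seen in each lane
gel = {"A": [1, 5], "C": [3], "G": [2, 6], "T": [4]}
print(read_gel(gel))  # AGCTAG
```

Doing this by eye from an X-ray film, one sample at a time, is what made the original method so slow.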
Techniques are now often automated, more cost-effective and can be done on a large scale e.g. pyrosequencing, which can sequence around 400 million base pairs in a 10-hour period. Scientists can now sequence whole genomes much more quickly
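To get a feel for that rate, the sketch below works out how long one pass over a genome the size of the human genome (3 billion base pairs) would take at 400 million base pairs per 10 hours. It is rough arithmetic only: the function name `hours_to_sequence` is made up for these notes, and it assumes a single machine reading every base exactly once, ignoring the repeated reads and the reassembly time that real projects need.

```python
BASES_PER_RUN = 400_000_000      # base pairs sequenced per run (figure from the notes)
HOURS_PER_RUN = 10               # length of one run in hours (figure from the notes)
HUMAN_GENOME_BP = 3_000_000_000  # approximate size of the human genome in base pairs


def hours_to_sequence(genome_bp: int,
                      bases_per_run: int = BASES_PER_RUN,
                      hours_per_run: int = HOURS_PER_RUN) -> float:
    """Hours needed to read every base once at the given rate (single pass, one machine)."""
    return genome_bp / bases_per_run * hours_per_run


print(hours_to_sequence(HUMAN_GENOME_BP))  # 75.0 hours, i.e. just over 3 days
```

Compare this with the Human Genome Project, which started in 1990 and was completed in 2003.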