What is the Human Genome Project?
By Syed Rizvi
A genome is the complete set of DNA that makes an organism what it is. There’s a lot of information that’s packed into our DNA, ranging from our skin and hair color to our susceptibility to certain diseases, which is why it’s so important for us to understand our DNA.
The Human Genome Project pursued exactly this goal, bringing together scientists from across the globe to sequence the entirety of our DNA. To put the scale of the task into perspective, the project ran from 1990 to 2003 and, once completed, had sequenced approximately 3 billion base pairs containing roughly 20,000 genes. Finishing within 13 years was actually ahead of schedule!
Let’s suppose you wanted to count to 1 billion and counted one number every second. At that pace, it would take you approximately 32 years to get to a billion. Now let’s say you decide to split the work between yourself and a friend and start counting from different points; suddenly the time is halved. The Human Genome Project took the same approach: scientists all over the world started at different points in the genome, and then ultimately put it all together like a puzzle to get our complete DNA set.
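To check the counting arithmetic, here is a minimal Python sketch. The function name `years_to_count` and the 365.25-day year are assumptions made for illustration; the only inputs taken from the text are the one-number-per-second pace and the target of 1 billion.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumed average year length


def years_to_count(total, counters=1):
    """Years needed when `counters` people split the range evenly,
    each counting one number per second."""
    return total / counters / SECONDS_PER_YEAR


print(round(years_to_count(1_000_000_000), 1))     # one person: 31.7
print(round(years_to_count(1_000_000_000, 2), 1))  # two people: 15.8
```

Adding more counters keeps dividing the time, which is the same reason spreading the sequencing work across many labs sped the project up.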
Why do we need to know what's in our genome?
Knowing our entire genome is like having a map of the human body. Imagine if emergency services like ambulances didn’t have a map of the city. They’d get a call for an emergency and then would run around randomly trying to find the source of it. If they had an address and a map however, they could go straight to the site of emergency and treat the problem quickly. Similarly, gene mapping provides the location of all our genes.
Researchers can track the frequency of a recurring set of nucleotides in the genome to link them with certain traits. Let’s say that a family with Huntington’s disease came in, and they all got sequenced. If a sequence is shared amongst them all, and other patients with Huntington’s, it is likely that the researchers have identified a “marker” for a disease-linked gene. Through the Human Genome Project, scientists were able to find gene locations for diseases such as Huntington’s, cystic fibrosis, and breast cancer.
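The idea of a shared sequence acting as a marker can be illustrated with a toy Python sketch. Everything here is hypothetical: the sequences are made up, the helper names `kmers` and `candidate_markers` are invented for this example, and real marker discovery relies on statistical analysis of far larger data sets rather than an exact set comparison.

```python
def kmers(seq, k):
    """Return every substring of length k found in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_markers(affected, unaffected, k):
    """Substrings of length k present in every affected sequence
    and absent from every unaffected one."""
    shared = set.intersection(*(kmers(s, k) for s in affected))
    seen_in_unaffected = set().union(*(kmers(s, k) for s in unaffected))
    return shared - seen_in_unaffected


# Made-up toy sequences, not real patient data
affected = ["ATGCAGCAGCAGTT", "GGCAGCAGCAGAC", "TCAGCAGCAGGA"]
unaffected = ["ATGCAGTTGAC", "GGCAGACTTA"]

print(candidate_markers(affected, unaffected, 9))  # {'CAGCAGCAG'}
```

The one substring that survives is shared by every affected sequence and missing from the others, which is the pattern a researcher would then investigate further.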
Treating a disease-causing gene can look like: obtaining cells from a patient, introducing a working copy of the gene into those cells, and then injecting the altered cells back into the body. One way this could be accomplished is through retroviruses, which you can read more about here.
How was it done?
There are a few different ways to sequence DNA, but the one that was most effective at the time of the Human Genome Project was the Chain Termination Method (also popularly known as Sanger Sequencing). The Chain Termination Method involved 3 main steps:
Binding and Extension:
In standard PCR, DNA Polymerase extends the growing DNA strand by adding deoxynucleoside triphosphates (dNTPs) that pair with their complements on the template strand. In the Chain Termination Method, a special class of nucleotides known as dideoxynucleoside triphosphates (ddNTPs) is added to this mixture; incorporating one of them stops the extension of the growing DNA strand.
In standard PCR, the Polymerase catalyzes the formation of a phosphodiester bond between the 3’-OH group of the last nucleotide and the 5’ phosphate of the next nucleotide. If we take a look at the name of our special nucleotides, however, we notice that they are di- (meaning two) deoxy (lacking a hydroxyl): they are missing the hydroxyl groups at both the 2’ and 3’ positions. Because these ddNTPs lack the 3’-OH, the Polymerase can’t add any more nucleotides to the chain once one has been incorporated. This results in billions of copies of DNA terminated at random lengths.
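The random termination described above can be imitated with a short Python simulation. This is a sketch under stated assumptions, not a model of real reaction chemistry: the template sequence, the `dd_fraction` parameter (the chance that any given position receives a ddNTP), and the function name `extend_once` are all hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_once(template, dd_fraction, rng):
    """Copy the template base by base, stopping as soon as a ddNTP
    is incorporated. The template is written 3'->5', so the new
    strand is built 5'->3'."""
    strand = []
    for base in template:
        nucleotide = COMPLEMENT[base]
        if rng.random() < dd_fraction:
            # Lowercase marks the chain-terminating ddNTP
            strand.append(nucleotide.lower())
            break
        strand.append(nucleotide)
    return "".join(strand)


rng = random.Random(0)  # fixed seed so runs are repeatable
template = "TACGGATC"   # hypothetical template, written 3'->5'
fragments = [extend_once(template, 0.2, rng) for _ in range(1000)]
full_copy = "".join(COMPLEMENT[b] for b in template)

print(sorted({len(f) for f in fragments}))  # fragment lengths produced
```

Every fragment is a prefix of the full complementary strand, cut short wherever a ddNTP happened to land. Running many copies is what makes it likely that every possible length is represented.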
After we have all of our random copies of DNA, we need to separate and organize them via Gel Electrophoresis. The gel contains small pores which the DNA can move through — small DNA molecules can move easily through them, but large DNA fragments experience more friction with the pores and move more slowly.
DNA is negatively charged, so when an electric current is run through the gel, the DNA will move towards the positive electrode. Because smaller fragments travel farther in the same amount of time, the DNA ends up arranged from largest to smallest from top to bottom.
Since DNA Polymerase extends from the 5’ to the 3’ end, each terminal ddNTP will correspond to a position in the newly synthesized strand. For example, the shortest fragment must terminate at the first nucleotide from the 5’ end, and the longest fragment should terminate at the last nucleotide at the 3’ end.
Therefore, by reading the bands from smallest to largest, we get the sequence of the newly synthesized strand, and taking its complement gives us the sequence of the original template DNA.
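The read-out step can be sketched in Python as follows. It assumes each band is already known by its fragment length and the identity of its terminal ddNTP (in practice this comes from labeling, which the text above does not cover); the `(length, base)` representation and the function name `read_gel` are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def read_gel(fragments):
    """fragments: iterable of (length, terminal_base) pairs.
    Returns the synthesized strand, read 5'->3' from the shortest
    band to the longest."""
    by_length = {}
    for length, base in fragments:
        by_length[length] = base  # same-length fragments form one band
    return "".join(by_length[n] for n in sorted(by_length))


# Made-up fragments, listed in no particular order
fragments = [(3, "G"), (1, "A"), (2, "T"), (4, "C"), (2, "T")]

new_strand = read_gel(fragments)
template = "".join(COMPLEMENT[b] for b in new_strand)

print(new_strand)  # ATGC (synthesized strand, 5'->3')
print(template)    # TACG (template strand, 3'->5')
```

Sorting by length plays the role of the gel, and reading the terminal bases in that order recovers the sequence one position at a time.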