The ABCs of DNA Sequencing: Reading Your Genetic Code
Imagine you had to copy a book by writing down every word by hand. That would probably take some time. Now imagine trying to do that with an entire set of encyclopedias; copying those would be a monumental task. That is essentially what whole genome sequencing has to accomplish: read all 3.2 billion characters of your DNA sequence, and do it quickly and accurately. We hear a lot about how personal DNA sequencing is the future of medicine, so just how does whole genome sequencing take a tiny sample of your DNA and turn it into over 6 terabytes of computer data?
Back to copying that encyclopedia: if a friend split the work with you, the job would go twice as fast. If you recruited a whole crowd, so many that each person only had to read about 20 letters, it wouldn’t take nearly as long. But what if one of them made a mistake? And how would we know what order to put those short strings of letters in? It would be better to have several readers cover each section so we could catch any errors, and to have their sections overlap a little so we could see how to piece them all together. So if we recruited even more people, got 50 copies of the encyclopedia, and had everyone compare notes at the end, we’d have our volumes copied quickly and with few mistakes.
Basically, that is how whole genome sequencing works. After a sample is taken, usually a blood draw, your DNA is extracted and the long strands are sectioned into smaller, more manageable pieces. Those pieces are then amplified many times each and read (all at once!) using a series of lasers in an instrument called a sequencer. The sequencer then pieces together the short reads using some serious computing power and special software.
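To make the "many overlapping readers" idea concrete, here is a minimal Python sketch. It is only an illustration, not how a real sequencer's software works: the short `genome` string is made up, the function names `make_reads` and `consensus` are hypothetical, and each read is assumed to remember where it started (real software has to work that position out from the overlaps or by matching reads against a reference). One reader deliberately makes a mistake, and the majority vote of the other readers covering the same letter corrects it.

```python
from collections import Counter

def make_reads(genome, read_len=20, step=5):
    """Cut the text into overlapping reads, each stored as (start, letters)."""
    last_start = max(len(genome) - read_len, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the tail end is covered
    return [(s, genome[s:s + read_len]) for s in starts]

def consensus(reads, length):
    """Rebuild the text by majority vote at every position."""
    votes = [Counter() for _ in range(length)]
    for start, letters in reads:
        for offset, letter in enumerate(letters):
            votes[start + offset][letter] += 1
    # "N" marks a position that no read covered
    return "".join(v.most_common(1)[0][0] if v else "N" for v in votes)

# A made-up 46-letter "genome" for illustration only.
genome = "ACGTTGCAAGGCTTACCGATAGGCTTAACGGATCCGTAAGCTTGGA"
reads = make_reads(genome)

# Simulate one reader's mistake: change a single letter in the third read.
start, letters = reads[2]
wrong = "A" if letters[7] != "A" else "C"
reads[2] = (start, letters[:7] + wrong + letters[8:])

rebuilt = consensus(reads, len(genome))
print(rebuilt == genome)
```

The misread letter is covered by three other reads that got it right, so the vote goes 3 to 1 and the rebuilt text matches the original. With only one reader per letter, the mistake would have gone unnoticed, which is why extra coverage matters.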
So, now you have your sequence, but what does it tell you? Your sequence is compared to a database of other people’s sequences and to databases of known sequence variations that may affect your health. The tricky part is that the average person carries about 10 million single-letter changes compared to an average reference sequence. Many of these aren’t associated with any risk and are merely a sign of our individuality, and even more have effects that are completely unknown.
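The comparison step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the two sequences are invented, they are assumed to be already lined up letter for letter, and the `known_changes` table is a hypothetical stand-in for a real variant database. The names `find_changes` and `annotate` are made up for this example.

```python
def find_changes(reference, sample):
    """List every position where the sample differs from the reference."""
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]

def annotate(changes, known_changes):
    """Attach what is known about each change, or 'unknown' if nothing is."""
    return [
        (pos, ref, alt, known_changes.get((pos, alt), "unknown"))
        for pos, ref, alt in changes
    ]

# Invented sequences, assumed to be aligned already.
reference = "ACGTTGCAAGGCTTACCGAT"
sample    = "ACGTTGCTAGGCTTACCGGT"

# Hypothetical database entry: (position, new letter) -> what is known.
known_changes = {(7, "T"): "no known risk"}

for pos, ref, alt, note in annotate(find_changes(reference, sample), known_changes):
    print(f"position {pos}: {ref} -> {alt} ({note})")
```

Here one change is found in the lookup table and the other is not. At the scale of a real genome, that second category is the large pile of changes whose meaning nobody knows yet.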
This is where the real hard work begins for researchers: finding out whether a single spelling change in the entire encyclopedia means anything, or is just a harmless typo. Luckily, the researchers here at Nationwide Children’s are developing new methods and building new software to better compare your sequence to all available information, and even to model what an individual sequence change could do in a cell. This technology helps all of us better understand disease processes and may ultimately lead to better screening and treatments right here at Nationwide Children’s Hospital.