How Whole Exome Sequencing Works: Principles and Workflow Demystified
- Author Name: Dianna Gellar
With the development of biological experimental technology, especially gene-sequencing technology, both laboratory and clinical researchers have come to realize that genome sequencing is one of the best ways to analyze the etiology, pathophysiology, treatment, and prognosis of diseases. Research has further shown that only about 30 million base pairs of the human genome carry the essential information for proteins.
The exome is usually defined as the sequence encompassing all exons of protein-coding genes, as well as non-protein-coding elements such as microRNA or lncRNA. Investigating the exome helps to identify which loci are responsible for particular diseases. When researchers only need exon information, the cost of whole genome sequencing is hard to justify, considering that the human genome is over 3 billion base pairs long. For studying rare Mendelian diseases, exome sequencing is a more efficient way to identify the genetic variants. Breakthroughs in target-enrichment strategies and DNA sequencing techniques have driven the development of whole exome sequencing.
Principles of exome sequencing
Exome sequencing consists of two main processes: target enrichment and sequencing. Target enrichment is used to select and capture the exome from DNA samples. There are two major methods of exome enrichment.
Array-based exome enrichment uses probes bound to high-density microarrays to capture the exome. A microarray is a two-dimensional array on a glass slide or silicon thin film that carries oligonucleotides complementary to the target regions of the genome. As the fragmented DNA sample flows over the microarray, complementary base pairing causes exonic fragments to bind to the array while the rest of the genome remains unbound, which separates the exome from the other parts of the genome.
In-solution capture is based on magnetic beads. A magnetic bead is a magnetic particle carrying functional chemical groups that bind target substances. In this case, beads that can bind exonic fragments, typically through oligonucleotide probes complementary to the exons, are used. The rest works just as in the array-based method: exonic fragments bind to the magnetic beads, while the rest of the genome remains unbound. The advantage of in-solution capture is that the beads are free in the reaction mixture, so the reaction can be made more efficient by shaking or heating the system.
Both methods extract the exome from the genome effectively, so their sensitivity is considered high enough. The problem is specificity. Some non-exonic parts of the genome share the same sequence as certain exons. Those regions may also bind to the microarray or magnetic beads, resulting in false positives (off-target capture).
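To make the capture principle and the specificity problem concrete, here is a minimal Python sketch. It is a toy model, not a real capture protocol: the sequences, the fragment names, and the `capture` function are all invented for illustration, and "hybridization" is reduced to an exact match against the probe's reverse complement.

```python
# Toy model of hybridization capture; every sequence here is made up.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def capture(fragments, probes):
    """Return the names of fragments that contain a site complementary to any probe."""
    captured = []
    for name, seq in fragments.items():
        # A probe "hybridizes" if the fragment contains the probe's reverse complement.
        if any(reverse_complement(probe) in seq for probe in probes):
            captured.append(name)
    return captured


exon_target = "ATGGCCATTGTA"                   # hypothetical exon sequence
probes = [reverse_complement(exon_target)]     # probe designed against that exon

fragments = {
    "exon_fragment": "CC" + exon_target + "GG",
    "intergenic_fragment": "TTTTGACGACGATTTT",
    # Non-exonic region that happens to share the exon's sequence.
    "pseudogene_fragment": "AA" + exon_target + "TT",
}

captured = capture(fragments, probes)
print(captured)
```

The intergenic fragment is left behind, but the fragment that merely shares the exon's sequence is captured along with the true exon, which is the off-target effect described above.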
Sequencing is the process of determining the order of all the deoxyribonucleotides in the exome, which may help us understand the pathophysiological alterations underlying some diseases. Its lower cost is what makes whole exome sequencing so attractive: sequencing a whole human genome costs approximately two to three times as much as sequencing an exome. So why not run more samples with whole exome sequencing to obtain a more statistically robust result?
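The arithmetic behind this argument can be sketched in a few lines of Python. The genome and exome sizes are the approximate figures quoted in this article; the budget and the cost unit are hypothetical placeholders, and only the two-to-three-fold cost ratio comes from the text.

```python
GENOME_BP = 3_000_000_000  # approximate human genome size quoted in the text
EXOME_BP = 30_000_000      # approximate exome size quoted in the text

exome_fraction = EXOME_BP / GENOME_BP
print(f"Exome as a fraction of the genome: {exome_fraction:.1%}")


def samples_within_budget(budget, cost_per_sample):
    """Number of whole samples that fit into a fixed budget."""
    return int(budget // cost_per_sample)


WES_COST = 1.0   # hypothetical cost unit for one exome
BUDGET = 60.0    # hypothetical budget in the same unit

wes_samples = samples_within_budget(BUDGET, WES_COST)
for ratio in (2, 3):  # whole genome costs two to three times as much
    wgs_samples = samples_within_budget(BUDGET, WES_COST * ratio)
    print(f"WGS at {ratio}x the cost: {wgs_samples} genomes vs {wes_samples} exomes")
```

Under these assumptions, the same budget covers two to three times as many exomes as genomes, which is where the gain in statistical power comes from.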
General workflow of exome sequencing
- Prepare your DNA samples: DNA fragmentation
Almost all DNA sequencing experiments begin with DNA fragmentation. DNA extracted from tissues or cells is usually too long to be sequenced directly, so it must be sheared into fragments of suitable length. This shearing process is called DNA fragmentation. The target fragment length is determined by the sequencing instrument you choose. For whole exome sequencing, there are several major ways to fragment DNA samples.
Physical fragmentation. Physical fragmentation includes acoustic shearing, sonication, and hydrodynamic shearing. Of these, acoustic shearing and sonication are the main methods. When DNA samples are exposed to ultrasound, acoustic cavitation and the resulting hydrodynamic shear forces break them into smaller pieces.
Enzymatic methods. Enzymes used to break DNA into small pieces include nucleases and transposases. A nuclease cleaves the phosphodiester bonds between nucleotides, breaking down the DNA. Restriction endonucleases, specifically, cleave DNA at defined restriction sites. Transposases normally mediate transposition, the process by which a DNA segment can "move around" the genome. They can also be used for fragmentation if the DNA sample is prepared appropriately: the transposase cuts the DNA and attaches adapters to the cut ends instead of inserting a segment, which leaves the DNA fragmented and already tagged with adapters.
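As an illustration of cleavage at a restriction site, the following Python sketch cuts a sequence at every EcoRI site (recognition sequence GAATTC, cut between G and A). It is a simplification that models only the top strand and ignores the staggered cut on the complementary strand; the `digest` function and the example sequence are invented for this sketch.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Cut seq at every occurrence of site, cut_offset bases into the site.

    Models the top strand only; the defaults correspond to EcoRI (G^AATTC).
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # remainder after the last cut
    return fragments


dna = "ATATGAATTCGGCCGAATTCTTAA"  # made-up sequence containing two EcoRI sites
fragments = digest(dna)
print(fragments)
```

Because a restriction enzyme only cuts at its recognition sequence, fragment sizes depend on where those sites happen to fall, which is one reason physical shearing or transposase-based methods are often chosen when more uniform fragment lengths are wanted.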
After fragmentation, your DNA samples are ready for the target enrichment process.
- Isolation of exomes: Target enrichment methods
The exome has to be isolated from the rest of the human genome before sequencing, as it accounts for only about 1% of the genome. The process of capturing the target genomic regions is called target enrichment. The basic idea of target enrichment is to separate the material of interest from everything else by exploiting the physicochemical differences between them. Several commercial kits are available for target enrichment. Whichever kit you choose, variability in capture will influence your exome sequencing results, so pay attention to the quality, quantity, and fragment size of your DNA samples.
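One common way to put a number on capture performance is the on-target rate and the fold enrichment derived from it. The Python sketch below assumes the approximate genome and exome sizes used in this article; the base counts are invented example values, not measurements from any kit.

```python
def on_target_rate(on_target_bases, total_bases):
    """Fraction of sequenced bases that fall inside the target regions."""
    return on_target_bases / total_bases


def fold_enrichment(on_target_bases, total_bases, target_bp, genome_bp):
    """On-target rate relative to what random sampling of the genome would give."""
    expected_by_chance = target_bp / genome_bp
    return on_target_rate(on_target_bases, total_bases) / expected_by_chance


# Invented example run: 10 Gb sequenced, 7 Gb of it inside the exome.
rate = on_target_rate(7_000_000_000, 10_000_000_000)
fold = fold_enrichment(7_000_000_000, 10_000_000_000,
                       target_bp=30_000_000, genome_bp=3_000_000_000)
print(f"On-target rate: {rate:.0%}, fold enrichment: {fold:.0f}x")
```

Comparing these two numbers across samples is a simple way to spot the capture variability mentioned above.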
- Harvest your products: washing and elution
After the exome has been separated from the other parts of the genome, several washes are required. Washing means exactly what the word suggests: rinsing away everything we do not want while keeping the material of interest. Here, the unwanted substances include the remaining parts of the genome, proteins, and electrolytes. The captured exome is then eluted. The eluent is the reagent that releases the exome from the microarray or magnetic beads by breaking the bond between the exome and the binding substance. Distilled water is usually used as the eluent, but some reagent kits require a specific one. Both washing and elution can be repeated several times to obtain a purer exome. In some cases, a second round of target enrichment is performed to improve the result further. Follow the instructions of the reagent kit you use and adjust the protocol to your actual situation.
- Sequencing technology
Because Sanger sequencing is slow and low in throughput, large-scale sequencing remained impractical for most biological and clinical studies until next-generation sequencing (NGS) technologies were invented. NGS technologies build on the sequencing-by-synthesis idea behind the Sanger method; Illumina platforms, for example, use fluorescently labeled reversible terminators in place of Sanger's dye-labeled ddNTPs. The improvement is that NGS allows a very large number of DNA fragments to be bound, amplified, and detected in parallel, leading to a dramatic increase in throughput and efficiency. To simplify, the principle of NGS is to attach the exome fragments to a solid support (such as a flow cell in the Illumina HiSeq or beads in the Roche 454) and amplify them in place by PCR, so that the signal in every round of elongation is strong enough to detect. The nucleotide incorporated in each round of elongation is then detected. Finally, the complete sequence is reconstructed using bioinformatics algorithms. NGS greatly improves efficiency and allows higher-throughput detection, which is why it is also called high-throughput sequencing and is so widely used.
Beyond NGS, third-generation sequencing is developing rapidly. Its key feature is single-molecule sequencing, which produces much longer reads and promises to further reduce the time and cost of whole genome sequencing. Companies such as Pacific Biosciences (PacBio) and Oxford Nanopore have shown that their methods work, and third-generation sequencing could lead to a revolution in exome sequencing.
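The cycle-by-cycle principle can be sketched as a toy simulation in Python. This is a strong simplification under stated assumptions: the template is written 3' to 5' so that it can be read left to right, every copy in the cluster is error-free and in phase, and the "signal" is simply a count of incorporated bases. The function name and the sequence are invented for illustration and do not correspond to any instrument's software.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_cluster(template_3to5, cluster_size, cycles):
    """Simulate sequencing by synthesis on one clonal cluster.

    Each cycle, every copy incorporates one labeled base; the base with
    the strongest summed signal is called and appended to the read.
    """
    copies = [template_3to5] * cluster_size  # result of in-place amplification
    read = []
    for cycle in range(min(cycles, len(template_3to5))):
        signal = Counter(COMPLEMENT[copy[cycle]] for copy in copies)
        called_base, _ = signal.most_common(1)[0]
        read.append(called_base)
    return "".join(read)


read = sequence_cluster("TACCGGTAAC", cluster_size=1000, cycles=6)
print(read)
```

Amplification matters here because one labeled base on a single molecule gives a very weak signal, whereas a thousand identical copies incorporating the same base in the same cycle give a signal that is easy to detect.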
Raw sequencing data cannot be interpreted until it has gone through bioinformatics analysis, because most sequencing methods produce short reads that must be assembled or aligned to a reference genome before the final result emerges. Researchers interested in WES analysis for variant calling in genetic diseases typically use a pipeline that aligns the reads to a reference genome, calls variants, and annotates them.
We have benefited a lot from exome sequencing in both academic research and clinical diagnosis. Thanks to exome sequencing, our understanding of the genome has reached a new level. Many diseases that used to be mysteries, such as neurological disorders in infants, can now be predicted. Furthermore, many diseases with few treatment options, such as carcinoma, can now be treated with targeted therapy. A fourth generation of sequencing technology is said to be under development. I hope it will drive another revolution in biological and medical research.
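As a rough illustration of what such an analysis does at its core, the Python sketch below places short reads on a reference by ungapped comparison, piles up the bases at each position, and reports positions where the reads consistently disagree with the reference. It is a toy, not a substitute for real aligners and variant callers: the reference, the reads, the thresholds, and both function names are invented for this example, and positions are 0-based.

```python
from collections import Counter, defaultdict


def best_offset(read, reference):
    """Return the reference offset with the fewest mismatches (ungapped)."""
    best, best_mismatches = 0, len(read) + 1
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(r != g for r, g in zip(read, window))
        if mismatches < best_mismatches:
            best, best_mismatches = offset, mismatches
    return best


def call_variants(reads, reference, min_depth=2, min_fraction=0.8):
    """Pile up aligned reads and report (position, ref_base, alt_base) tuples."""
    pileup = defaultdict(Counter)
    for read in reads:
        offset = best_offset(read, reference)
        for i, base in enumerate(read):
            pileup[offset + i][base] += 1
    variants = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        base, support = counts.most_common(1)[0]
        if (base != reference[pos] and depth >= min_depth
                and support / depth >= min_fraction):
            variants.append((pos, reference[pos], base))
    return variants


reference = "ACGTTAGCCATGCAAGTCGA"                    # made-up reference
reads = ["GTTAGCGATG", "AGCGATGCAA", "CGATGCAAGT"]    # made-up overlapping reads
variants = call_variants(reads, reference)
print(variants)
```

Requiring a minimum depth and a minimum supporting fraction is a crude stand-in for the statistical models that real variant callers use to separate true variants from sequencing errors.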