Use of Next-Generation Sequencing (NGS) in Understanding Disease
Decoding the information in an individual’s genome has led to a greater understanding of the interpersonal variability in disease progression and treatment response. This expanded knowledge has led to the proliferation of personalized medicine strategies to prevent, diagnose, and treat diseases based on each individual’s unique genetic (and molecular) profile. Therefore, being able to sequence an individual’s genome (wholly or in part) is the first step in personalized medicine.
The first commercialized method of DNA sequencing was the Sanger method. Sanger sequencing involves three main steps:
- A chain-termination PCR reaction
- An electrophoresis step to separate the amplified fragments
- Analysis to determine the sequences
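The three steps above can be illustrated with a deliberately simplified sketch. It assumes an idealized reaction in which exactly one terminated fragment is produced for every position of the newly synthesized strand, and that each fragment carries a label identifying its final base. The function names and the example template are hypothetical and chosen only for illustration.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def synthesized_strand(template):
    """Strand copied from the template (reverse complement, written 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(template))

def terminated_fragments(template):
    """Step 1 (idealized): one chain-terminated fragment per position.

    Each fragment is recorded as (length, label of its terminating base).
    """
    strand = synthesized_strand(template)
    return [(i + 1, strand[i]) for i in range(len(strand))]

def read_by_size(fragments):
    """Steps 2 and 3: order fragments by length, then read the end labels."""
    return "".join(label for _, label in sorted(fragments))

template = "GATTACA"                      # hypothetical template
fragments = terminated_fragments(template)
random.Random(1).shuffle(fragments)       # fragments start out as a mixture
sequence = read_by_size(fragments)
print(sequence)
```

Sorting by length stands in for electrophoresis, and reading the labels in that order recovers the synthesized strand. A real reaction produces the fragments stochastically and with uneven abundance, which this sketch ignores.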
Sequencing the human genome using Sanger sequencing took 13 years and cost ~2.7 billion US dollars.
To speed up the sequencing process, new technologies were developed that sequence many DNA fragments in parallel, using chemical reactions and various optical detection techniques. These technologies are called next-generation sequencing (NGS) or second-generation sequencing, and they dramatically shortened the time needed to sequence whole genomes, with a concomitant drop in cost. Today, sequencing a whole genome costs less than $1000 and takes a matter of hours.
NGS is an umbrella term for various modern sequencing technologies that allow high-throughput DNA and RNA sequencing that is faster and cheaper than Sanger sequencing. The advent of NGS has driven advances in molecular biology, genomics, oncology, and other disease areas.
There are several NGS platforms available commercially. The current market leader is Illumina, which offers multiple platforms depending on need, including the MiSeq, which can produce same-day sequencing results for very small panels. Thermo Fisher Scientific’s platforms are the Ion Torrent and Ion Proton, which sequence by detecting the pH change that occurs when a nucleotide is incorporated. Pacific Biosciences’ platform offers single-molecule, real-time technology, which uses a chip on which individual DNA molecules are sequenced. Zero-mode waveguides confine detection to a single DNA polymerase, so that each fluorescently labeled nucleotide can be detected as it is incorporated. This technology offers average read lengths of more than 10,000 base pairs.
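To make the pH-based detection mentioned above more concrete, the sketch below models an idealized, noise-free "flowgram". It assumes that nucleotides are flowed over the template one type at a time and that the signal in each flow is proportional to the number of bases incorporated. The flow order "TACG", the function names, and the example read are assumptions made for illustration, not the behavior of any specific instrument.

```python
def ideal_flowgram(read, flow_order="TACG", max_flows=100):
    """Return (nucleotide, signal) for each flow of an idealized run.

    `read` is the strand being synthesized. The signal is the number of
    bases incorporated in that flow, so a homopolymer such as "TT" gives
    a signal of 2 in a single T flow.
    """
    flows = []
    pos = 0
    for k in range(max_flows):
        if pos >= len(read):
            break
        nucleotide = flow_order[k % len(flow_order)]
        run = 0
        while pos < len(read) and read[pos] == nucleotide:
            run += 1
            pos += 1
        flows.append((nucleotide, run))
    return flows

def call_bases(flows):
    """Turn per-flow signals back into a base sequence."""
    return "".join(nucleotide * round(signal) for nucleotide, signal in flows)

flows = ideal_flowgram("TTAGC")   # hypothetical read
print(flows)
print(call_bases(flows))
```

Flows with a signal of zero are those in which the offered nucleotide did not match the next template position. With real, noisy signals, distinguishing long homopolymer runs becomes harder, which this idealized model does not capture.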
Whole Genome Sequencing (WGS)
WGS, as the name implies, refers to sequencing the entire genome. This includes chromosomal as well as mitochondrial DNA (and chloroplast DNA in the case of plants). Microorganisms can also have their genomes sequenced using WGS methodologies. The advantages of WGS are:
- Providing the highest resolution of all the NGS protocols
- Detection of both large and small variants that might be missed with targeted sequencing approaches
- Identification of variants not in coding regions that may be implicated in disease
- Sequencing and assembling novel genomes
Of the current NGS approaches, WGS is still the costliest, albeit still relatively affordable. It also takes the longest to sequence, since the entire genome has to be covered.
RNA sequencing (RNAseq)
RNAseq assays the quantity and sequences of the transcriptome (total transcribed RNA, including mRNA, rRNA and tRNA) within a sample. Understanding the transcriptome gives us information on gene expression levels and on which genes are turned on or off in a sample, and when. RNAseq also captures information about alternative splicing events, which would not be identified by DNA sequencing methods. This technique can also identify certain post-transcriptional modifications such as polyadenylation and 5’ capping. Applications that utilize RNAseq include SNP identification and analysis, transcriptional profiling, RNA editing analysis, and differential gene expression analysis.
The typical RNAseq protocol involves four steps:
- RNA extraction and purification
- cDNA library construction. Converting the extracted RNA into a cDNA library allows for the addition of the adapters required for NGS techniques for sequencing.
- Sequencing of the cDNA library. There are various options and platforms that can be used for sequencing. Single-read sequencing sequences the cDNA from just one end, whilst paired-end sequencing sequences it from both ends. This makes single-read sequencing the cheaper and faster option of the two. In addition, there are strand-specific and non-strand-specific protocols. Strand-specific methods determine which DNA strand the RNA was transcribed from.
- Data analysis. At the end of sequencing, there will be millions of reads. Software is then used to align these reads to a reference genome, and an RNA sequence map is produced. Various software packages and methods are available to analyze and output the results.
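As a rough illustration of the data analysis step, the sketch below counts aligned reads per gene and scales the counts to counts per million (CPM), one simple way of comparing expression between samples of different sequencing depth. It assumes the reads have already been aligned and reduced to (chromosome, position) pairs. The gene names, coordinates, and reads are hypothetical.

```python
def counts_per_gene(aligned_reads, genes):
    """Count reads whose aligned position falls inside each gene.

    aligned_reads: iterable of (chromosome, position)
    genes: dict mapping gene name -> (chromosome, start, end), inclusive
    Reads that fall outside every gene are ignored.
    """
    counts = {name: 0 for name in genes}
    for chrom, pos in aligned_reads:
        for name, (gene_chrom, start, end) in genes.items():
            if chrom == gene_chrom and start <= pos <= end:
                counts[name] += 1
                break
    return counts

def counts_per_million(counts):
    """Scale raw counts so that they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: count * 1e6 / total for name, count in counts.items()}

# Hypothetical annotation and alignments
genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 800, 1200)}
reads = [("chr1", 150), ("chr1", 480), ("chr1", 900), ("chr2", 150), ("chr1", 300)]

raw = counts_per_gene(reads, genes)
cpm = counts_per_million(raw)
print(raw)
print(cpm)
```

Real pipelines work with full alignment records, handle reads that span exon junctions, and usually also correct for gene length. This sketch leaves all of that out.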
Whole Exome Sequencing (WES)
Exome sequencing, or whole exome sequencing (WES), involves sequencing only the protein-coding regions of the genome, known as the exome. The human exome makes up only about 1-2% of the human genome; however, it is thought that around 85% of disease-related mutations occur in the exome.
The advantages of WES compared to WGS are:
- WES focuses on the coding regions of the genome instead of the entire genome
- WES provides a cost-effective alternative to WGS, provided you are researching changes in coding sequences (SNVs, Indels, etc.)
- The data size from WES is smaller (4-5 Gb) vs WGS (approximately 90 Gb) and therefore more manageable with faster analysis
- Sequencing is faster, because far less DNA has to be sequenced to reach the required coverage
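The difference in sequencing effort between WES and WGS can be estimated with simple coverage arithmetic: mean coverage is the number of reads multiplied by the read length, divided by the size of the region being sequenced. The sketch below uses assumed values throughout: a genome of about 3.1 billion base pairs, a hypothetical exome capture target of 50 million base pairs, 150 bp reads, and target depths of 30x for WGS and 100x for WES.

```python
import math

def mean_coverage(n_reads, read_length, target_size):
    """Average number of times each base of the target is sequenced."""
    return n_reads * read_length / target_size

def reads_needed(coverage, read_length, target_size):
    """Smallest number of reads that reaches the requested mean coverage."""
    return math.ceil(coverage * target_size / read_length)

# All values below are illustrative assumptions, not measured figures.
GENOME_BP = 3_100_000_000       # approximate human genome size
EXOME_TARGET_BP = 50_000_000    # hypothetical exome capture target
READ_LENGTH = 150               # bases per read

wgs_reads = reads_needed(30, READ_LENGTH, GENOME_BP)
wes_reads = reads_needed(100, READ_LENGTH, EXOME_TARGET_BP)

print(f"WGS at 30x:  {wgs_reads:,} reads")
print(f"WES at 100x: {wes_reads:,} reads")
print(f"Ratio: {wgs_reads / wes_reads:.1f}x more reads for WGS")
```

Under these assumptions WES needs far fewer reads even at a higher depth, which is where the smaller data size and faster analysis come from. The estimate ignores off-target reads, duplicates, and uneven capture efficiency.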
Third Generation Sequencing (TGS)
Third-generation sequencing platforms seek to move away from the amplification step used in current next-generation sequencing platforms. TGS platforms promise single-molecule sequencing by exploiting the physical properties of DNA. Oxford Nanopore Technologies is one of the industry-leading companies that commercialized the technology. Its MinION is a USB device that allows sequencing to be run from a desktop computer. DNA is passed through a protein nanopore embedded in a membrane, and bases are identified from the changes in ionic current that the passing nucleotides cause.