Factors to Consider When Selecting a Next Generation Sequencing (NGS) Technology
How to Choose the Right Next Generation Sequencing (NGS) Technology
Many factors need to be considered when selecting the best NGS technology to use for your study. Different technology features not only affect the cost and time it will take to complete your project, but can also affect the chances that your project will succeed.
The main considerations for selecting an NGS technology discussed in this post are:
- Strategies for using NGS
- Depth of sequencing coverage
- Length of sequencing reads
- Single-end versus paired-end sequencing
Strategies for Employing NGS
Traditional single-gene sequencing approaches have largely been replaced by NGS, which offers ultra-high-throughput DNA/RNA sequencing that is faster and less expensive while providing very high sequence coverage. The most common strategies are:
- Whole Exome Sequencing (WES): This approach is used to read the protein-coding regions of all genes, collectively known as the exome. WES allows for the detection of variants in candidate genes that might not be covered in a targeted sequencing approach. It can also detect novel mutations that have not previously been associated with the disease in question.
- Targeted Sequencing (Panels or Regions of Interest): Typically used to read a limited number of genomic regions of interest, usually well-described genes and mutations.
- Whole Genome Sequencing (WGS): Entails the sequencing of the complete genome, including regulatory regions, introns, and even mitochondrial DNA. WGS is very powerful since it allows for the identification of complex structural variations at very high resolution and is often used to detect pathogenic mutations in novel genes or intronic regions.
- RNA Sequencing (RNA-Seq): This strategy is used to directly sequence and quantify the number of mRNA molecules in the entire transcriptome.
Depth of Sequencing Coverage
The depth of sequencing coverage is the average number of sequencing reads that align to, or "cover", each base in a sequenced sample. This is an important parameter because sequencing is an inherently error-prone process: the higher the coverage, the higher the confidence you can have in each sequenced base. The recommended coverage depends on several factors, including your research question and sequencing application. For instance, the recommended coverage for WGS and WES applications is lower than for other applications such as transcriptomic or DNA target-based sequencing.
Coverage (C) is commonly calculated using the Lander/Waterman equation, which takes into consideration the read length (L), the number of reads (N), and the haploid genome length (G): C = LN / G
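As a rough illustration, the Lander/Waterman equation can be expressed in a few lines of Python, along with its rearrangement to estimate how many reads a target coverage requires. The function names and the example figures (150 bp reads, a genome of roughly 3.1 Gbp) are assumptions chosen for illustration and are not taken from any particular platform.

```python
import math


def coverage(read_length: int, num_reads: int, genome_length: int) -> float:
    """Lander/Waterman coverage: C = L * N / G."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return read_length * num_reads / genome_length


def reads_needed(target_coverage: float, read_length: int, genome_length: int) -> int:
    """Rearranged equation, N = C * G / L, rounded up to a whole read."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_length / read_length)


if __name__ == "__main__":
    # Illustrative values only: ~3.1 Gbp haploid genome, 150 bp reads.
    genome = 3_100_000_000
    print(coverage(150, 620_000_000, genome))   # 30.0
    print(reads_needed(30, 150, genome))        # 620000000
```

Note that this is an average: actual coverage varies from region to region, so projects usually plan for some margin above the calculated minimum.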
Length of Sequencing Reads
Read length refers to the number of base pairs sequenced from a DNA/RNA fragment. Overlapping regions between reads are later used to assemble the reads and align them to a reference genome, reconstructing the full genomic sequence.
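To make the idea of overlapping reads concrete, here is a minimal sketch that finds the longest exact suffix-to-prefix overlap between two reads and merges them. It is a deliberate simplification: the function names are hypothetical, the reads are toy sequences, and real assemblers and aligners tolerate sequencing errors and use far more efficient data structures.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that exactly matches a
    prefix of `right`; returns 0 if it is shorter than `min_overlap`."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest_possible = min(len(left), len(right))
    # Try the longest candidate overlap first and shrink until one matches.
    for k in range(longest_possible, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads on their overlap, or return None if they do not overlap."""
    k = overlap_length(left, right, min_overlap)
    return left + right[k:] if k else None


if __name__ == "__main__":
    read_a = "ACGTTGCA"   # toy reads, not real data
    read_b = "TGCAGGTC"
    print(overlap_length(read_a, read_b))  # 4
    print(merge_reads(read_a, read_b))     # ACGTTGCAGGTC
```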
There are both short-read sequencing (SRS) and long-read sequencing (LRS) technologies available.
With high-throughput SRS technologies, millions of short DNA strands are read in parallel. SRS is the most widely used high-throughput sequencing approach and is supported by a wide range of bioinformatics tools. SRS methods generally provide low-cost, high-accuracy data that are used for a variety of applications, including variant discovery.
In contrast, high-throughput LRS technologies can generate reads that are tens to hundreds of thousands of base pairs in length (typically averaging ~10-100 kbp). While LRS can be more expensive and slower than SRS approaches, it allows for greater resolution of the genome since its reads can span complex genomic features. Furthermore, since LRS does not require PCR amplification, the DNA/RNA remains in its native state, which also enables LRS to detect base modifications such as methylation.
Since LRS provides read lengths that can span repetitive regions, it is useful for resolving these regions (including those with high GC content) and for de novo genome assembly. Additional applications include identifying disease-causing structural variants (large genomic alterations such as deletions, duplications, insertions, inversions, and translocations, which describe different combinations of DNA gains, losses, or rearrangements) and determining specific isoforms of RNA transcripts, among other applications that are difficult or not possible with SRS. However, LRS is not suitable for every application, for example when the sample DNA is highly fragmented.
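The reasoning behind "spanning" a repeat can be sketched as a simple length check: a read can only place a repeat unambiguously if it covers the whole repeat plus some unique sequence on each side. The function name, the 50 bp flank, and the 5,000 bp repeat below are hypothetical values used purely for illustration.

```python
def read_can_span(read_length: int, repeat_length: int, min_flank: int = 50) -> bool:
    """True if one read could cover the entire repeat plus `min_flank`
    unique bases on each side (a simplified, idealized criterion)."""
    return read_length >= repeat_length + 2 * min_flank


if __name__ == "__main__":
    repeat = 5_000  # hypothetical repeat length in bp
    for label, length in [("short read", 150), ("long read", 15_000)]:
        print(label, read_can_span(length, repeat))
    # short read False
    # long read True
```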
Determining whether to use an SRS or LRS method is not always a straightforward decision and some problems may actually require a combination of these approaches. Selecting the sequencing read length is highly contextual and depends on the sample type, application, and desired coverage.
Single-End versus Paired-End Sequencing
Single-end sequencing reads each DNA fragment from one end only and tends to be used for specific applications, such as RNA-seq. In general, this method is rapid and very cost-effective.
Paired-end sequencing is the most widely used NGS approach. This method sequences each DNA fragment from both ends, providing twice the number of sequencing reads. Because the two reads in a pair come from the same fragment, their approximate distance and orientation are known. This provides high-confidence read alignments and improves the ability to determine the relative position of sequencing reads and to identify insertions, deletions, repetitive sequences, and other rearrangements.
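One way paired-end data helps flag insertions and deletions is by comparing the distance between the two mapped reads of a pair with the expected fragment (insert) size. The sketch below shows that idea in a highly simplified form; the function name, the 400 bp expected insert size, and the 100 bp tolerance are assumptions for illustration, and real variant callers rely on statistical models over many read pairs rather than a single threshold.

```python
def classify_pair(observed_span: int, expected_insert: int = 400, tolerance: int = 100) -> str:
    """Classify a read pair by the distance its two reads map apart on the reference.

    Pairs mapping much farther apart than expected suggest the sample is
    missing sequence present in the reference (a deletion); pairs mapping
    much closer together suggest extra sequence in the sample (an insertion).
    """
    if observed_span > expected_insert + tolerance:
        return "possible deletion"
    if observed_span < expected_insert - tolerance:
        return "possible insertion"
    return "concordant"


if __name__ == "__main__":
    for span in (410, 1_900, 150):  # hypothetical mapped distances in bp
        print(span, classify_pair(span))
    # 410 concordant
    # 1900 possible deletion
    # 150 possible insertion
```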
Overview of Several Commercially Available Sequencing Platforms
There are many commercially available NGS platforms that vary in capability and underlying technology, with some of the major players being Illumina, BGI Group, Thermo Fisher, Pacific Biosciences, Oxford Nanopore Technologies, and Roche (among others).
The SRS market is largely dominated by Illumina (e.g. NextSeq, NovaSeq), but other important players include BGI Group (e.g. MGISEQ) and Thermo Fisher (Ion Torrent). The LRS market is currently dominated by Pacific Biosciences’ (PacBio) single-molecule real-time (SMRT) sequencing and Oxford Nanopore Technologies’ (ONT) nanopore sequencing.
Other technologies, such as Bionano Genomics’ Saphyr System, focus on structural variants and genome assembly. This system uses non-sequencing-based optical mapping to analyze long strands of genomic DNA.
The following table provides a snapshot of four commercially available solutions to provide a sense of their capabilities and typical applications.
|Name of Commercial Sequencer||Description||Typical Applications|
|Illumina NovaSeq 6000||Uses patterned flow cells and Illumina’s 2-channel sequencing by synthesis chemistry||WGS, WES, targeted sequencing, transcriptomics|
|BGI Group (DNBseq)||Uses DNA nanoball nanoarrays with polymerase-based stepwise sequencing (DNBseq) for short reads||WGS, WES, targeted sequencing, transcriptomics|
|PacBio® Sequel II||Single Molecule, Real-Time (SMRT®) Sequencing technology produces highly accurate long reads||Whole genome de novo assemblies, full-length transcriptomes|
|Bionano® Genomics Saphyr System||Imaging of extremely long, high-molecular-weight DNA in its native state||Structural variant analysis, genome mapping|
Next generation sequencing is a powerful tool that has revolutionized many aspects of how basic, applied, and clinical research is conducted. Selecting the right NGS method and technology depends on a variety of factors, and the options should be weighed carefully so that the optimal choice is made for your specific research program. The opportunities for using NGS in preclinical research are broad and can help maximize the success of your preclinical development program so that your investigational agent is well-positioned for clinical success.