Next-Generation Sequencing in Cell Line and Biosample Authentication
by Yan Han, August 16, 2021 at 05:25 PM
While the authentication of biosamples (i.e., cell lines, organoids, xenograft and homograft models) has long been recommended, misidentification and contamination remain a problem. In this post, we explore two of the traditional (low throughput) genomic-based assays used for biosample authentication and contamination detection: short tandem repeat (STR) and single nucleotide polymorphism (SNP) analysis. In addition, we highlight the value of deep next-generation sequencing (NGS) as a high throughput, multifunctional approach for biosample authentication and contamination detection, which has been shown to have significantly higher sensitivity than the two traditional methods.
Why Does Biosample Authentication Matter?
While cell lines have long played an important role in life science research, studies show they are often misidentified or contaminated. For example, cell lines can be misidentified through cross-contamination and mislabeling. The International Cell Line Authentication Committee (ICLAC) has documented a growing list of more than 530 misidentified cell lines (updated in June 2021) that have no known authentic stock.
The use of misidentified cell lines is a big and expanding problem, with one study finding more than 32,500 papers referring to data from cells the ICLAC had classified as misidentified.
Contamination is also a growing problem that can arise from a variety of sources including cross-contamination with other cell lines and microorganisms (e.g., mycoplasma and/or viruses). Overall, it is estimated that:
- Approximately 15-20% of cell lines have been cross-contaminated with, or misidentified as, another cell line
- 5-30% of cell lines may be contaminated with mycoplasma
- More than 25% may be contaminated with at least one viral species
Irrespective of the source of the problem, using misidentified and/or contaminated cell lines can produce unreliable research results, leading to erroneous conclusions, wasted time, effort, and money, and damaged reputations. This is why funding agencies, and a growing list of publishers, are requiring researchers to provide authentication of their cell lines and other biosamples.
Given the importance of using authentic and clean biosamples, researchers should consider authenticating their samples:
- When a new biosample is acquired or developed
- As a standard quality-control measure
- When in doubt!
Conventional Methods and their Limitations for Biosample Authentication
Short Tandem Repeat (STR) Profiling
STR assays use primers designed to recognize repeated DNA segments of 2-6 base pairs in length. Since the targeted segments vary within a given population, a DNA ‘fingerprint’ can be generated. This technology has been extremely valuable for authenticating cell lines and tracking the identity of human tumor samples that are derived from either patient xenografts or cell lines.
The American Type Culture Collection (ATCC) has published detailed guidelines that describe how to standardize STR analysis for the purpose of human cell line authentication. The sensitivity of STR assays is reported to be 5-10%, though this is dependent on a variety of factors including the nature of the cell lines being tested and the quality of the data.
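Comparing two STR profiles largely comes down to counting shared alleles. The snippet below is a minimal sketch, assuming a Tanabe-style score (twice the number of shared alleles divided by the total number of alleles in both profiles). The function name and the allele values are invented for illustration, and the cutoff for calling a match should come from the guideline in use rather than from this example.

```python
def tanabe_match(query, reference):
    """Percent match between two STR profiles.

    Each profile maps a locus name to a set of allele repeat counts.
    Only loci present in both profiles are compared.
    """
    shared = 0
    total = 0
    for locus in query.keys() & reference.keys():
        q, r = set(query[locus]), set(reference[locus])
        shared += len(q & r)        # alleles seen in both profiles
        total += len(q) + len(r)    # all alleles at this locus
    return 100.0 * 2 * shared / total if total else 0.0


# Hypothetical profiles, invented for illustration only.
reference = {"D5S818": {11, 12}, "TH01": {7}, "TPOX": {8, 12}, "vWA": {16, 18}}
query = {"D5S818": {11, 12}, "TH01": {7}, "TPOX": {8}, "vWA": {16, 18}}
score = tanabe_match(query, reference)  # 12 of 13 alleles agree, about 92.3%
```

A single dropped allele already moves the score, which hints at why drifting or mixed cultures are hard to call with STR alone.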
STR is a commonly used technique and is considered the current “gold standard” for biosample authentication, but there are some well-known limitations with its use:
- Low throughput: Makes it cumbersome to authenticate large batches of samples
- Monofunctional: Requires other assays to assess other parameters (e.g., checking a biosample for cell line and mycoplasma contamination requires different assays)
- Labor intensive: Makes it costly to authenticate large batches of samples
- Variable sensitivity for detecting contamination: Inconsistent ability to detect contamination
Further, biosamples with mutations in mismatch repair (MMR) genes are known to exhibit microsatellite instability and a hypermutator phenotype. Consequently, this can lead to genetic drift and/or outgrowth of contaminating cells and STR misclassification. The accuracy of the STR assay can also be low in cases of close genetic relationships such as the authentication of different tumor cell lineages from the same human donor or murine cell lines derived from specific strains of inbred animals that lack unique genetic markers.
Multiplex PCR/qPCR-Based SNP Profiling
In contrast to STR analysis, which relies on nucleotide repeats, SNP analysis evaluates variation at the level of a single DNA nucleotide. A SNP refers to a single base pair mutation or single nucleotide polymorphism at a specific locus, usually consisting of two alleles. Although these are naturally occurring variants, some SNPs have been found to be involved in the etiology of many human diseases and are becoming of particular interest in pharmacogenomics.
In recent years, SNP genotyping has been increasingly used for cell line and biosample authentication owing to its improved accuracy, sensitivity, and reduced cost. It can be used to overcome some of the limitations associated with STR analysis. For example, it can be used to authenticate species-specific tumor models and even MMR-deficient human cancer cell lines.
However, like STR, conventional SNP assays, such as multiplex PCR/qPCR approaches, are relatively low throughput and cumbersome for the authentication of large numbers of samples. In addition, expanding the number of SNPs that can be surveyed (coverage) becomes more challenging as the level of multiplexing increases. Although these SNP assays are reported to have a general sensitivity of ∼3–5% (similar to STR), studies suggest this is sample dependent and that real-world sensitivity is often poorer than claimed.
While STR and conventional SNP assays continue to be useful for a variety of applications related to biosample authentication, advances in sequencing technologies are now allowing researchers to conduct high throughput analyses with very high sensitivity. This approach is especially valuable in the field of oncology which has been at the center of the reproducibility crisis and has also witnessed a rapid growth in the use of large biobanks of biosamples (such as organoids and patient-derived xenografts [PDXs]), which can pose unique pitfalls when using traditional STR- and SNP-based authentication methods.
NGS-Based SNP Profiling
NGS technologies have revolutionized the field of genomics and they overcome many of the common problems associated with conventional biosample authentication methods.
Barcode deep NGS involves the use of multiplex PCR to amplify targeted DNA regions (‘barcodes’) for sequencing at high depth. When combined with robust statistical analyses, unique SNP fingerprints can be identified which can distinguish samples from one another. Not only can NGS detect large numbers of SNPs, but recent advances have also brought about enhanced cost-efficiency and technical accuracy.
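To make the idea of a SNP fingerprint concrete, here is a minimal sketch of calling genotypes from allele read counts and matching them against a reference library. It is not the statistical method used in any published assay: the function names, the 0.2/0.8 allele-fraction cutoffs, and the read counts are all assumptions chosen for illustration.

```python
def call_genotype(ref_reads, alt_reads, het_low=0.2, het_high=0.8):
    """Call 0 (hom-ref), 1 (het) or 2 (hom-alt) from allele read counts.

    The 0.2/0.8 cutoffs are illustrative assumptions, not published values.
    Returns None when the site has no reads.
    """
    depth = ref_reads + alt_reads
    if depth == 0:
        return None
    alt_fraction = alt_reads / depth
    if alt_fraction < het_low:
        return 0
    if alt_fraction > het_high:
        return 2
    return 1


def concordance(sample, reference):
    """Fraction of jointly called SNP sites with identical genotypes."""
    pairs = [(s, r) for s, r in zip(sample, reference)
             if s is not None and r is not None]
    if not pairs:
        return 0.0
    return sum(s == r for s, r in pairs) / len(pairs)


def best_match(sample, library):
    """Return (name, concordance) of the closest reference fingerprint."""
    scored = ((name, concordance(sample, ref)) for name, ref in library.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical (ref, alt) read counts at five SNP sites; the last site failed.
counts = [(3000, 0), (1480, 1520), (5, 2995), (2990, 10), (0, 0)]
sample = [call_genotype(r, a) for r, a in counts]

# Hypothetical reference fingerprints.
library = {"line_A": [0, 1, 2, 0, 1], "line_B": [2, 1, 0, 0, 1]}
match = best_match(sample, library)  # line_A agrees at every called site
```

With a few hundred sites instead of five, unrelated samples rarely agree by chance, which is what lets a fingerprint separate one sample from another.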
As summarized in the schematic below, there are three levels of biosample authentication using CrownBio’s NGS-based method:
- Level 1: Matches a sample to a reference (e.g., cancer cell lines). Our NGS-based method provides high-depth (3000×) sequencing of 200 SNP sites for human samples and has shown 100% accuracy in identifying a sample or the major component of contaminated samples, which is a significant improvement over the conventional STR/SNP assays.
- Level 2: Detects contamination in biosamples. Our method consistently reaches 2% sensitivity when using a heterogeneity ratio, and sensitivity can reach ≤1% if the contaminant is in a library of reference samples with an SNP fingerprint.
- Level 3: Identifies the contaminant in a contaminated sample. Cross-contamination of cell lines is common in biobanks and the composition of a contaminated culture changes over time due to different growth rates of cell lines. Our SNP fingerprint library consists of over 1000 cancer cell lines, enabling a contaminating cell line to be confidently identified. Furthermore, our method also provides an accurate estimate of the contamination ratio.
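Levels 2 and 3 can be illustrated with a simple linear mixing model. The source does not spell out its heterogeneity ratio or its statistics, so the sketch below is an assumption throughout: it supposes that the alt-allele read fraction at each site is a weighted average of the host and contaminant genotypes, fits the contaminant fraction by least squares, and picks the library entry with the smallest residual. All names and numbers are hypothetical.

```python
def estimate_fraction(alt_fractions, host, contaminant):
    """Least-squares estimate of the contaminant fraction c.

    Model (an assumption of this sketch): at each site the expected alt-allele
    read fraction is (1 - c) * host/2 + c * contaminant/2, where genotypes are
    coded as alt-allele counts 0, 1 or 2.
    """
    num = 0.0
    den = 0.0
    for f, h, g in zip(alt_fractions, host, contaminant):
        d = (g - h) / 2.0   # shift in alt fraction per unit of c
        y = f - h / 2.0     # observed shift away from the pure host
        num += d * y
        den += d * d
    if den == 0:
        raise ValueError("host and contaminant genotypes are identical")
    return num / den


def identify_contaminant(alt_fractions, host, library):
    """Return (name, fraction, residual) for the best-fitting library entry."""
    best = None
    for name, genotype in library.items():
        try:
            c = estimate_fraction(alt_fractions, host, genotype)
        except ValueError:
            continue  # candidate is indistinguishable from the host
        residual = sum(
            (f - ((1 - c) * h / 2.0 + c * g / 2.0)) ** 2
            for f, h, g in zip(alt_fractions, host, genotype)
        )
        if best is None or residual < best[2]:
            best = (name, c, residual)
    return best


# Hypothetical data: a host line carrying 2% of "line_B".
host = [0, 2, 1, 0]
library = {"line_B": [2, 0, 1, 1], "line_C": [0, 2, 1, 2]}
observed = [0.02, 0.98, 0.50, 0.01]
name, fraction, _ = identify_contaminant(observed, host, library)
```

A real pipeline would also constrain the fraction to lie between 0 and 1, weight sites by read depth, and account for copy number changes, none of which this sketch attempts.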
Achieve Three-Level Model Authentication
A recent study has confirmed the advantages of NGS-based authentication compared to traditional STR- and SNP-based methods.
For level 1 authentication, the study showed that sequencing 200 SNP sites at high depth (3000× coverage) had 100% accuracy in sample or major component identification. Since hundreds of samples can be profiled in a single run, this represents a major improvement over conventional STR/SNP assays, which are low throughput.
For level 2 authentication, sensitivity consistently reached 2% and in some cases below or equal to 1%, which is near the theoretical detection limit for this type of assay and is less variable as compared to traditional STR/SNP methods.
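A rough back-of-the-envelope calculation shows why read depth matters at these sensitivities. The sketch below assumes a site where the host carries no alt allele, independent sequencing errors spread evenly over the three wrong bases, and a placeholder error rate of 0.1%. Real error rates depend on the platform and chemistry, so treat the numbers as illustrative only.

```python
from math import comb


def expected_alt_reads(depth, fraction, alt_copies):
    """Expected contaminant-derived reads at a site where the host has none.

    alt_copies is the contaminant genotype at that site: 1 (het) or 2 (hom).
    """
    return depth * fraction * alt_copies / 2.0


def prob_errors_reach(depth, error_rate, k):
    """P(at least k of `depth` reads show one specific wrong base by error).

    Assumes independent errors spread evenly over the three wrong bases.
    """
    p = error_rate / 3.0
    below = sum(comb(depth, i) * p**i * (1 - p) ** (depth - i) for i in range(k))
    return max(0.0, 1.0 - below)


depth = 3000
signal_hom = expected_alt_reads(depth, 0.01, 2)   # about 30 reads
signal_het = expected_alt_reads(depth, 0.01, 1)   # about 15 reads
noise_only = prob_errors_reach(depth, 0.001, 15)  # error rate is a placeholder
```

At 3000× a 1% contaminant contributes only a few dozen reads per informative site, so the margin over background noise shrinks quickly as the contaminant fraction or the depth drops.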
Level 3 authentication involves identification of contaminants and is a feature not currently possible with other methods. This has been made possible by NGS-based SNP profiling and advanced statistical methods. Further, NGS can also detect human-mouse interspecies contamination. Since PDXs involve human tumor cells implanted into mice, the resulting tumors contain mouse stromal cells, which makes STR analysis problematic. With NGS, divergent DNA segments with identical flanking sequences can be used instead of SNPs, allowing for the accurate detection of interspecies contaminants.
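For the interspecies case, the idea can be sketched very simply. Because the flanking sequences are identical, one primer pair amplifies both species, and reads can then be assigned to human or mouse by the divergent sequence in between. The function below is a hypothetical illustration that takes per-segment read counts as given and ignores differences in amplification efficiency and copy number.

```python
from statistics import median


def mouse_fraction(segment_counts):
    """Median per-segment share of reads carrying the mouse-specific sequence.

    segment_counts is a list of (human_reads, mouse_reads) tuples, one per
    divergent segment. Segments with no reads are skipped.
    """
    shares = [m / (h + m) for h, m in segment_counts if h + m > 0]
    if not shares:
        raise ValueError("no reads at any segment")
    return median(shares)


# Hypothetical read counts at three divergent segments in a PDX sample.
counts = [(2700, 300), (2850, 150), (2760, 240)]
fraction = mouse_fraction(counts)  # per-segment shares are 0.10, 0.05, 0.08
```

Taking the median rather than the mean keeps one poorly behaved segment from dominating the estimate.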
Another advantage of NGS lies in the detection of mycoplasma contamination, which cannot be carried out with STR or conventional SNP assays. Using universal or species-specific primers, additional targeted sequencing of all or individual mycoplasma species can be done with relative ease.
The following table compares the two traditional authentication technologies with NGS-based profiling.
| Assay Comparison | NGS-based SNP Profiling | STR Profiling | SNP Profiling |
| --- | --- | --- | --- |
| Technology | Barcode deep NGS | Multiplex PCR & capillary electrophoresis | Multiplex PCR/qPCR |
| Readout Type | Digital (clean, near-zero quantification error) | Analog (noisy, high quantification error) | Analog (noisy, high quantification error) |
| Human Sample Authentication | Yes | Yes | Yes |
| Mouse Sample Authentication | Yes | Limited | No |
| MMR-Deficient Cell Line Identification | Yes | No | Yes |
| Contamination Detection Sensitivity | High (1%) | Low to medium (5-20%) | Low to medium (3-20%) |
| Accuracy | High | Low to medium | Low to medium |
| Quantification of Contamination Ratio | Yes | No | No |
| Suitable for Large Biobanks | Yes | No | No |
| Interspecies Contamination Detection | Yes | Limited | Limited |
| Intraspecies Contamination Detection | Yes | Limited | Limited |
| Detecting Contamination w/o Reference | Yes | No | No |
| Estimating Mix Ratios for 3+ Cell Lines | Yes (1% sensitivity) | No | No |
Overall, NGS-based SNP profiling is currently the only commercial assay achieving level 3 authentication and it consistently outperforms conventional STR/SNP assays in both level 1 and 2 authentications. NGS profiling is more comprehensive and high throughput as compared to traditional STR and SNP genotyping.
The authentication of mouse and human biosamples (including cell lines, organoids, and xenograft and homograft models) is becoming increasingly important as funding agencies, journals, and the scientific community require researchers to demonstrate the authenticity of their biosamples.
While the STR- and SNP-based traditional methods of authentication have been useful and will continue to serve a role in biosample authentication, it is clear that novel technologies that have higher sensitivity and higher throughput are needed to overcome some of the limitations, and to meet the needs of biobanks and advanced research models.
NGS-based SNP profiling is a high throughput and relatively low-cost method for cell line and model authentication. It is ideally suited for building and maintaining biobanks as well as advanced analysis of interspecies contamination and microbiome contaminants.
CrownBio’s unique NGS-based SNP genotyping panel encompasses over 600 SNPs and chromosome segments to accurately characterize mouse and human samples including cell lines, organoids, xenograft models, and patient tissues. We have also generated unique DNA fingerprints for the most commonly used oncology research models, covering over 1,200 human cancer and 30 syngeneic cell lines.
If you are interested in learning more about how to use NGS-based profiling for your biosample authentication needs, watch our recent webinar, "Advancing Authentication: Next Generation Technology for Cell Line and Biosample Verification," or contact us today.