Cell lines have been the cornerstone of life science research for decades. Yet they’ve been misidentified and misused almost as long as they’ve been in existence. The figures are staggering: the German DSMZ cell repository at one time reported that 18% of its human cell line stocks were cross-contaminated with other cell lines.
In 2015, the China Center for Type Culture Collection at Wuhan University reported that 25% of its repository was cross-contaminated. Its analysis also showed that 85% of the cell lines in the repository that had supposedly been established from primary isolates were actually HeLa cells, one of the oldest and most commonly used cell lines, originally established in the US in 1951.
The American Type Culture Collection (ATCC) currently lists more than 30 cell lines flagged as problematic after testing. All told, the International Cell Line Authentication Committee (ICLAC), a voluntary group formed to catalog misidentified cell lines for the research community, has documented an astounding 451 misidentified cell lines with no known authentic stock.
The Huge Impact of Misidentification on Published Research
Yet scientists routinely use these lines in their research and publications. A recent paper in PLOS ONE measured the impact of this problem by looking specifically at published papers in which researchers used cell lines that ICLAC lists as misidentified. Labs throughout the worldwide scientific community are using contaminated cell lines, with researchers from the US and Japan topping the list.
Overall, the authors found 32,755 papers reporting data from cell lines that ICLAC had documented as misidentified. Even worse, the problem compounds: more than half a million subsequent studies cite that original pool of papers. Incredibly, the authors consider this a conservative figure, likely to grow much larger as new cases of misidentification emerge.
Where Do All These Misidentified Cell Lines Come From?
In some cases, the recorded cell type or tissue of origin is wrong from the outset. Researchers extensively used the MDA-MB-435 cell line as a breast cancer model, only to learn later that it was originally derived from a melanoma. In other cases, researchers accidentally contaminate a “new” cell line with an already established one while creating it. Remember the HeLa cells behind 85% of the Chinese repository’s supposedly primary-derived lines? They’re common culprits, since they are hardy enough to survive temperature changes and quickly overtake other cultures with their rapid growth.
How to Correctly Identify Cell Lines
Researchers use several methods to identify cell lines properly. Early efforts focused on chromosomal analysis, or karyotyping, which evaluates the size, number, and shape of chromosomes. In cancer cell lines, aberrant chromosomes can be quite distinctive to a particular line: the notorious HeLa cells carry 76–80 chromosomes, compared with the 46 found in a normal human cell. However, the method is rather crude, because it cannot distinguish between cell lines that lack a distinctive chromosomal abnormality. It’s also time-consuming, which makes high-throughput analysis challenging.
Multilocus DNA fingerprint analysis separates restriction digests of genomic DNA by electrophoresis and then analyzes the fragments with specific probes. It can easily differentiate between cell lines because it generates a unique pattern for each one; however, multilocus DNA fingerprinting requires a high degree of expertise and is difficult to perform in a high-throughput format.
Single nucleotide polymorphism (SNP) analysis detects variations in the DNA sequence at positions where a single nucleotide differs. Each person (and cell line) carries many SNPs that together create a unique DNA pattern. This method can efficiently detect differences among cell lines, but it cannot detect interspecies contaminants. It may, however, be the best method for tracking a cell line across passages, as SNP analysis can measure genetic drift.
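To make the passage-tracking idea concrete, here is a minimal Python sketch that scores genotype concordance between two SNP profiles of the same line, for example an early and a late passage. It assumes each profile is a plain dictionary mapping a SNP identifier to an unordered two-letter genotype; the SNP names, genotypes, and the helper names `normalize` and `snp_concordance` are hypothetical, and real SNP authentication panels and their match thresholds vary by vendor and laboratory.

```python
# Minimal sketch: genotype concordance between two SNP profiles.
# Assumption: each profile maps a SNP identifier to an unordered genotype,
# written as a two-letter string such as "AG". All SNP IDs and genotypes
# used here are hypothetical, for illustration only.


def normalize(genotype: str) -> str:
    """Treat "GA" and "AG" as the same unordered genotype."""
    return "".join(sorted(genotype.upper()))


def snp_concordance(profile_a: dict[str, str], profile_b: dict[str, str]) -> float:
    """Fraction of SNPs called in both profiles whose genotypes agree."""
    shared = [snp for snp in profile_a if snp in profile_b]
    if not shared:
        raise ValueError("the profiles share no SNPs")
    matches = sum(
        normalize(profile_a[snp]) == normalize(profile_b[snp]) for snp in shared
    )
    return matches / len(shared)


if __name__ == "__main__":
    early_passage = {"snp1": "AG", "snp2": "CC", "snp3": "TT", "snp4": "AC"}
    late_passage = {"snp1": "GA", "snp2": "CT", "snp3": "TT", "snp4": "AC"}
    print(snp_concordance(early_passage, late_passage))  # 0.75
```

A falling concordance between successive passages would be one signal worth investigating for drift or contamination; the sketch deliberately leaves the decision threshold to the user.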
Isoenzyme analysis examines a specific set of mammalian cell enzymes by electrophoresis and/or isoelectric focusing (IEF). It is much faster and easier than chromosomal analysis and is particularly good at identifying interspecies contamination. However, it’s not as sensitive as some other techniques: a contaminant may need to make up as much as 25% of the culture before it is detected. Its results are also cumbersome to interpret, so it’s rarely used alone, but rather alongside at least one other authentication method.
The first 648 base pairs of the mitochondrial gene cytochrome c oxidase subunit I (COI) show remarkably low variation within a species but differ markedly between species. Because it is PCR based, COI analysis is much more sensitive than isoenzyme analysis for detecting contaminants from other species, although it cannot differentiate between cell lines derived from the same species.
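Barcode comparison ultimately asks how different a query sequence is from reference sequences of known species. The Python sketch below uses the uncorrected p-distance (the proportion of aligned sites that differ) to pick the closest reference. It assumes the sequences are already aligned to equal length; the 20-base toy sequences and the reference names are hypothetical stand-ins, not real COI barcodes, and production workflows typically rely on dedicated alignment and search tools against curated barcode libraries rather than a hand-rolled comparison like this one.

```python
# Minimal sketch: closest-reference assignment by uncorrected p-distance.
# Assumptions: sequences are pre-aligned to the same length; gaps ("-") and
# ambiguous bases ("N") are skipped. All sequences here are hypothetical toys.


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of comparable aligned positions at which the bases differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip gaps and ambiguous calls
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


def closest_reference(query: str, references: dict[str, str]) -> tuple[str, float]:
    """Return (name, distance) for the reference with the smallest p-distance."""
    scored = ((name, p_distance(query, seq)) for name, seq in references.items())
    return min(scored, key=lambda item: item[1])


if __name__ == "__main__":
    references = {
        "human_ref": "ACGTACGTACGTACGTACGT",  # hypothetical toy sequence
        "mouse_ref": "ACGAACTTACCTAGGTACAT",  # hypothetical toy sequence
    }
    print(closest_reference("ACGTACGTACGTACGTACGA", references))
```

Because COI varies little within a species, a small distance to one reference and large distances to the others is the pattern that points to a sample’s species of origin.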
Method | Description | Sensitivity | Detects Interspecies Contamination | Detects Intraspecies Contamination (Misidentification) |
---|---|---|---|---|
Chromosomal analysis (karyotyping) | Examines chromosomal profile | Low | Yes | In some cases |
Multilocus DNA fingerprint analysis | Analyzes restriction digests of genomic DNA | Medium | Yes | Yes, but with low sensitivity |
Isoenzyme analysis | Looks at a specific set of enzymes | Low | Yes, but with low sensitivity | Yes, but with low sensitivity |
Cytochrome c oxidase subunit I (COI) analysis | Sequences a mitochondrial gene | High | Yes, with high sensitivity | No |
Single nucleotide polymorphism (SNP) analysis | Detects single nucleotide variations in the genome | NA | No | Yes |
Short tandem repeat (STR) profiling | Amplifies microsatellite regions in the genome | High | No, unless extra primers are included | Yes |
Short Tandem Repeat Profiling is the Gold Standard for Cell Line Authentication
Short Tandem Repeat (STR) profiling is the gold standard for human cell line authentication. It uses PCR to amplify highly variable microsatellite regions of the cell’s genomic DNA. Several distinct repeat regions, known as STR loci, are amplified simultaneously to generate a unique fingerprint for each cell line. That fingerprint can then be compared against databases of verified STR profiles maintained by cell repositories.
Sensitive, reproducible, and relatively easy to perform, STR profiling has been widely used since the 1990s, not only in research but also in practical applications such as the FBI Laboratory’s Combined DNA Index System (CODIS). Both academia and industry rely on STR profiling for human cell authentication because it delivers reproducible, rapid, and economical results.
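Repository search tools need a rule for scoring how closely two STR fingerprints agree. One widely used option is the Tanabe percent-match score, which counts the alleles two profiles share across the loci typed in both; a score of 80% or more is a commonly cited threshold for concluding that two profiles are related. The worked numbers below are hypothetical, chosen only to show the arithmetic.

```latex
\text{Match}\,(\%) = \frac{2 \times n_{\text{shared}}}{n_{\text{query}} + n_{\text{reference}}} \times 100

% Hypothetical example: 16 alleles in each profile, 14 of them shared
\text{Match} = \frac{2 \times 14}{16 + 16} \times 100 = \frac{28}{32} \times 100 = 87.5\%
```

Here $n_{\text{shared}}$ is the number of alleles common to both profiles, and $n_{\text{query}}$ and $n_{\text{reference}}$ are the total allele counts of each profile. In this example, 87.5% would clear the 80% threshold.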
Short Tandem Repeat Profiling Should Be Routinely Used to Ensure Cell Collection Integrity
To ensure the integrity of data and results, STR profiling should be performed across the industry, especially by contract research organizations (CROs) that run cell line assays for pharmaceutical clients. STR profiling should be performed whenever a new cell line is acquired, as well as each time a company’s or institution’s cell bank is expanded.
The STR profile can be compared not only against previously logged internal results but also against the profiles held by all the major cell repositories, including ATCC and DSMZ. This check provides the quality control that any cell line collection needs, and it weeds out cross-contaminated, commercially acquired cell lines before they can generate questionable data.
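The comparison step can be sketched in Python. The example below scores a query profile against a small dictionary of reference profiles, standing in for an internal log and repository entries, using the Tanabe percent-match score (twice the shared alleles divided by the total alleles in both profiles, times 100), and reports every reference at or above a threshold. It assumes profiles map STR locus names to sets of allele calls and that only loci typed in both profiles are scored; the allele values, reference names, and the 80% default threshold are illustrative assumptions, not data from ATCC or DSMZ.

```python
# Minimal sketch: rank reference STR profiles against a query profile
# using the Tanabe percent-match score. Assumptions (illustrative only):
# - a profile maps a locus name to the set of allele calls at that locus;
# - only loci typed in both profiles are scored;
# - the allele values and reference names below are hypothetical.

Profile = dict[str, set[str]]


def tanabe_match(query: Profile, reference: Profile) -> float:
    """Return 2 * shared alleles / (query alleles + reference alleles) * 100."""
    common_loci = query.keys() & reference.keys()
    shared = sum(len(query[locus] & reference[locus]) for locus in common_loci)
    total = sum(len(query[locus]) + len(reference[locus]) for locus in common_loci)
    if total == 0:
        raise ValueError("the profiles have no typed loci in common")
    return 200.0 * shared / total


def search(query: Profile, references: dict[str, Profile],
           threshold: float = 80.0) -> list[tuple[str, float]]:
    """Return (name, score) pairs at or above the threshold, best match first."""
    scored = [(name, tanabe_match(query, ref)) for name, ref in references.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    query = {"TH01": {"7", "9.3"}, "TPOX": {"8", "11"},
             "vWA": {"16", "18"}, "CSF1PO": {"10", "12"}}
    references = {
        "internal_bank_entry": {"TH01": {"7", "9.3"}, "TPOX": {"8", "11"},
                                "vWA": {"16", "18"}, "CSF1PO": {"10", "12"}},
        "repository_entry_a": {"TH01": {"7", "9.3"}, "TPOX": {"8"},
                               "vWA": {"16", "18"}, "CSF1PO": {"10", "12"}},
        "repository_entry_b": {"TH01": {"6"}, "TPOX": {"9", "10"},
                               "vWA": {"14", "15"}, "CSF1PO": {"11", "13"}},
    }
    for name, score in search(query, references):
        print(f"{name}: {score:.1f}%")
```

A real search would also need conventions this sketch ignores, such as how to handle the sex-determining Amelogenin marker and loci missing from one profile.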
As a PCR test, STR profiling is species specific, so contaminants from other species simply fail to amplify. For any cell line that is passaged in vivo (for example, grown as a xenograft in mice), the COI test should therefore also be performed, as it efficiently detects DNA from other species. Together, these tests help ensure that the right data come from the right cell line.
Published Data Needs to be Re-Examined Due to Cell Line Errors
Misidentification does not automatically invalidate every conclusion, but the data in each affected paper certainly need to be scrutinized and re-evaluated in light of the cell line’s true identity. Misidentified cell lines cause all sorts of problems, from manuscript corrections and retractions to wasted grant money.
A white paper by Dr. Christopher Korch of the University of Colorado estimated the damage done by the misidentification of just two lines, HEp-2 and INT 407, both of which were identified as HeLa cells in 1967. Dr. Korch concluded that $713 million was spent on the experiments published in the 7,125 papers using these two cell lines alone, and that a further $3.5 billion was spent on work building on those papers.
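Dividing Dr. Korch’s headline figures gives a rough sense of scale. This is simple arithmetic on the numbers quoted in his estimate, not a figure he reports, and it assumes the $713 million was spread evenly across the 7,125 papers.

```latex
\frac{\$713{,}000{,}000}{7{,}125 \text{ papers}} \approx \$100{,}000 \text{ per paper}
\qquad
\frac{\$3.5 \text{ billion}}{\$713 \text{ million}} \approx 4.9
```

In other words, the follow-on work cost roughly five times as much as the original misidentified experiments.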
Cell Line Misidentification Hinders Drug Discovery
The potential for catastrophe is just as serious in industry as in academia. An untold number of industry researchers have incorrectly prioritized programs based on erroneous data from misidentified cell lines, which could spell disaster for any biotech company focused on filling its pipeline. Genentech, fearing such consequences and hoping to minimize potential damage, created its own cell line database, combining STR profiling and single nucleotide polymorphism (SNP) data, to confirm the identity of more than 3,500 cell lines.
Moving Forward: More Cell Line Authentication Required for Publications and Grants
This widespread cell line contamination deserves serious attention. Researchers who want to avoid working with known misidentified cell lines can consult the ICLAC website. They should also authenticate their cell lines frequently, particularly in labs that use multiple cell lines simultaneously. Several CROs offer STR testing to eliminate this potentially catastrophic risk.
A large and growing number of reputable journals, hoping to rectify the problem, now require cell line authentication before publication. Some private foundations require it of grant recipients, and federal grant agencies such as the NIH will likely make similar demands before long. Informed investigators, armed with new resources and a real awareness of this pervasive problem, are now well placed to help solve it.
Additional resources on this topic include:
The ICLAC website
Horbach SPJM, Halffman W (2017) The ghosts of HeLa: How cell line misidentification contaminates the scientific literature. PLOS ONE 12(10): e0186281. https://doi.org/10.1371/journal.pone.0186281