Rapid and Automated Analysis of Short Tandem Repeat Loci Using Time-of-Flight Mass Spectrometry
John M. Butler, Jia Li, Joseph Monforte, Christopher Becker, and Steven Lee*
GeneTrace Systems Inc., 333 Ravenswood Avenue, Menlo Park, CA 94025
*California Department of Justice, DNA Laboratory, 626 Bancroft Way, Berkeley, CA 94710
ABSTRACT
DNA separations that have traditionally been performed by slab gel or capillary electrophoresis may now be conducted via time-of-flight mass spectrometry (TOF-MS). The advantages of a mass spectrometric approach to short tandem repeat (STR) characterization include a dramatic increase in both the speed of analysis (~5 seconds per sample) and the accuracy of mass measurements (compared with the relative mobilities used in electrophoresis). These features lend themselves to high-throughput sample processing with excellent accuracy in allele designations. STR microvariants are readily resolved (e.g., single base deletions such as the TH01 alleles 9.3 and 10). In addition, because the different bases differ in molecular mass, microvariants of STR systems that vary by as little as a single base substitution can be detected.
INTRODUCTION
At least two requirements must be met to reliably type short tandem repeat (STR) DNA markers. First, the DNA separation instrumentation must be able to resolve and detect all possible alleles for a given STR system. For tetranucleotide repeats, alleles that differ by 4 bp, or by less in the case of microvariants such as the TH01 9.3 and 10 alleles, need to be distinguishable. Second, the PCR-amplified DNA fragment must be sized and the number of repeats calculated so that a genotype may be assigned to the sample. This second requirement is most commonly met with allelic ladders, which contain a mixture of common alleles for a particular STR locus.
Slab gel electrophoresis with silver staining (1) or fluorescent detection (2) and, more recently, capillary electrophoresis (3, 4) are commonly used to type STR alleles. We are developing an effective procedure for testing STR markers using time-of-flight mass spectrometry (TOF-MS). A year ago, at the 1996 Promega Conference, GeneTrace Systems showed for the first time that PCR-amplified STRs could be detected by TOF-MS (5). Since that time we have obtained results from over two dozen tetranucleotide repeat STR loci. This paper discusses the high degree of accuracy and precision that may be obtained with mass spectrometry as well as the rapid and automated nature of our process for STR typing.
Mass spectrometry offers unprecedented analysis times, on the order of seconds per sample, with excellent accuracy in measuring DNA fragment size. Substantial improvements have been made in recent years with the development of an effective ionization procedure, known as matrix-assisted laser desorption ionization (MALDI), and the discovery of new matrices, such as 3-hydroxypicolinic acid (6). In MALDI, DNA samples are mixed with an organic matrix and allowed to co-crystallize in a spatial array on a sample plate, with each assay at a separate location. After the sample plate is placed in the mass spectrometer, which is under vacuum, a pulse of laser energy liberates a small portion of the DNA sample. Although the generated ions travel to the detector in a matter of microseconds, multiple spectra are averaged for signal processing, which extends the measurement time to a few seconds. The DNA size is calculated from the time of flight to the detector by comparison with mass standards. Because of the increased accuracy of mass spectrometry, STR alleles may be reliably typed without comparison to allelic ladders. We show here that STR results may be obtained by time-of-flight mass spectrometry more accurately than by gel electrophoresis and orders of magnitude faster.
MATERIALS AND METHODS
Genomic DNA from the K562 cell line (Promega Corporation, Madison, WI) was tested to ensure reliable PCR amplification on all STR markers. Human genomic DNA samples representing several ethnic groups were purchased from Bios Laboratories (New Haven, CT). Allelic ladders were reamplified using a 1:1000 dilution of AmpFlSTR™ Green I (CSF1PO, TPOX, TH01, amelogenin) and AmpFlSTR™ Blue (D3S1358, VWA, FGA) allelic ladders (PE Applied Biosystems, Foster City, CA).
PCR primers were designed for each STR locus using Gene Runner software (Hastings Software, Inc., Hastings, NY) and sequence information from GenBank (http://www.ncbi.nlm.nih.gov). We successfully designed and tested primers for the following commonly used STR loci: CD4, CSF1PO, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, D21S11, DYS19, F13A1, F13B, FES/FPS, FGA, HPRTB, LPL, TH01, TPOX, and VWA, as well as the sex-typing marker amelogenin. In addition, we examined three new tetranucleotide STR loci: GATA132B04 (chromosome 5), D16S2622, and D22S445. To improve sensitivity and resolution in the mass spectrometer, primers were placed close to the repeat region to keep PCR product sizes under 140 bp when possible. In the case of CD4, LPL, and amelogenin, previously published primers were used. Primers were purchased from Biosource/Keystone (Menlo Park, CA) or synthesized in-house. One primer in each locus-specific set was biotinylated at the 5'-end for use in a solid-phase purification procedure.
PCR was performed using either Taq polymerase (Promega) or AmpliTaq Gold® (PE Applied Biosystems) as will be described in a future communication (7). Typically 1-10 ng of human genomic DNA template was tested. For high-throughput studies, PCR was conducted in a 384-well format. Samples prepared on the robotic workstation were amplified using AmpliTaq Gold® to reduce the possibility of primer dimer formation.
Parallel sample preparation was conducted on a robotic workdeck operated with a 96-pipet tip head designed by GeneTrace scientists. A purification procedure involving solid-phase capture and release from streptavidin-coated magnetic beads was then used (US Patent Application No. 08/639,363) to remove salts that interfere with the mass spectrometry process (8). Each DNA sample was spotted with a 5:1 mixture of 3-hydroxypicolinic acid (3-HPA) and picolinic acid in 25 mM ammonium citrate and 25% acetonitrile.
A linear time-of-flight mass spectrometer of GeneTrace's design was used for all of these experiments, as will be described in a future communication (7). The mass spectrometer was calibrated daily with two oligonucleotide mass markers.
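The calibration function used on the instrument is not described here. As a minimal sketch, assuming the idealized linear TOF relation in which flight time grows with the square root of mass for singly charged ions, a two-marker calibration could look like the following. The function names and marker values are hypothetical and chosen only for illustration.

```python
import math


def calibrate(marker_a, marker_b):
    """Fit t = slope * sqrt(m) + offset through two (mass_da, flight_time) markers.

    Assumes singly charged ions and an idealized linear TOF instrument.
    """
    (m1, t1), (m2, t2) = marker_a, marker_b
    slope = (t2 - t1) / (math.sqrt(m2) - math.sqrt(m1))
    offset = t1 - slope * math.sqrt(m1)
    return slope, offset


def mass_from_flight_time(flight_time, slope, offset):
    """Invert the calibration to convert a flight time into a mass in Da."""
    return ((flight_time - offset) / slope) ** 2


# Hypothetical markers: (mass in Da, flight time in arbitrary units).
slope, offset = calibrate((3600.0, 62.0), (10000.0, 102.0))
print(mass_from_flight_time(82.0, slope, offset))
```

Two markers are enough to fix the two constants of this relation, which is consistent with, but not confirmed as, the daily two-marker calibration mentioned above.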
Expected masses for STR alleles were calculated from the GenBank® sequence and other known alleles by adding or subtracting the mass of the repeat sequence. Known alleles as described in STRBase (http://ibm4.carb.nist.gov:8800/dna/home.htm) were used. An additional mass of 313.2 Da was added to each allele to account for the nontemplate addition of adenine (2).
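The calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' software: the function name and the reference mass of 20000.0 Da are hypothetical, while the 313.2 Da adenylation mass and the 1259.8 Da AATG repeat mass are the values quoted in this paper. Microvariants with partial repeats (such as TH01 9.3) are not handled.

```python
ADENYLATION_MASS_DA = 313.2  # nontemplated adenine addition, as given in the text


def expected_allele_mass(reference_mass_da, reference_repeats, allele_repeats,
                         repeat_mass_da, adenylated=True):
    """Expected mass of an allele, derived from a reference allele.

    reference_mass_da is the template-only mass of the measured strand for the
    reference allele (hypothetical input; in practice computed from sequence).
    """
    mass = reference_mass_da + (allele_repeats - reference_repeats) * repeat_mass_da
    if adenylated:
        mass += ADENYLATION_MASS_DA
    return mass


# Hypothetical reference: an 8-repeat allele with a template-only mass of 20000.0 Da.
print(round(expected_allele_mass(20000.0, 8, 11, 1259.8), 1))
```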
RESULTS AND DISCUSSION
In the mass spectrometer, both sensitivity and resolution are improved when PCR amplicons are under 140 bp in length. For most of the STR systems tested, we have therefore designed new PCR primers which are closer to the repeat region than those commonly used with electrophoretic separations. Our approach does not compromise the ability to reliably type STRs since the information content is in the number of repeats. Additionally, we convert our results from mass to genotype for each measured allele by comparing the observed mass for each peak to the expected mass of each allele in an STR system. Thus, our results can be correlated to other separation technologies already established for STR typing.
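A minimal sketch of this mass-to-genotype conversion, assuming a simple nearest-match rule: each observed peak is assigned to the allele whose expected mass is closest, and rejected if the gap exceeds a tolerance. The function name, the tolerance of 100 Da, and the table of expected masses are all hypothetical; the matching rule actually used is not specified here.

```python
def call_allele(observed_mass_da, expected_masses_da, tolerance_da=100.0):
    """Return the allele whose expected mass is closest to the observed peak.

    Returns None when no expected mass lies within tolerance_da
    (the tolerance is an illustrative assumption).
    """
    allele, expected = min(
        expected_masses_da.items(),
        key=lambda item: abs(item[1] - observed_mass_da),
    )
    if abs(expected - observed_mass_da) > tolerance_da:
        return None
    return allele


# Hypothetical expected masses (Da) for four alleles spaced by one AATG repeat.
expected_masses = {"8": 20000.0, "9": 21259.8, "10": 22519.6, "11": 23779.4}

peaks = [20005.0, 23773.4]
print([call_allele(peak, expected_masses) for peak in peaks])
```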
Figure 1 illustrates the mass accuracy that may be obtained using our TOF-MS technique. Two peaks were obtained from a DNA sample when it was amplified with PCR primers specific for the TPOX locus. The first peak was +5 Daltons (Da) from the expected mass for allele 8, while the second peak was -6 Da from the expected mass for allele 11. Since a single nucleotide is approximately 300 Da in mass, in this case we observed a mass accuracy of about 0.02 nucleotides or better for each allele. It is important to point out that this high degree of accuracy was obtained without comparison of the sample to an allelic ladder.
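The conversion from a mass error in Daltons to nucleotide units is a single division. The short sketch below applies it to the two TPOX peak errors quoted above, using the approximate value of 300 Da per nucleotide from the text; the function name is hypothetical.

```python
APPROX_NUCLEOTIDE_MASS_DA = 300.0  # approximate mass of one nucleotide, per the text


def mass_error_in_nucleotides(error_da):
    """Express an absolute mass error (Da) as a fraction of one nucleotide."""
    return abs(error_da) / APPROX_NUCLEOTIDE_MASS_DA


for allele, error_da in (("8", 5.0), ("11", -6.0)):
    print(allele, round(mass_error_in_nucleotides(error_da), 3))
```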
The mass difference between heterozygous peaks may also be used to confirm a genotype. In the TPOX sample described above, we observed a mass difference of 3768 Da between the two alleles. This difference is 11 Da less than the expected 3779.4 Da for three repeats, since each AATG repeat has an expected mass of 1259.8 Da. Dividing the observed mass difference of 3768 Da by the expected mass of each repeat (1260 Da) gives 2.99 repeats between the two alleles, as expected for alleles 8 and 11. Thus, reliable genotyping may be performed by comparing observed peak masses to expected allele masses and further confirmed with heterozygous samples by calculating the number of repeats between the two observed alleles. In addition, we can multiplex two or more STR loci per assay and are evaluating various combinations of STR markers.
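The heterozygote check can be written out as a short calculation. The sketch below uses the numbers from the TPOX example (3768 Da observed difference, 1259.8 Da per AATG repeat); the function name is hypothetical.

```python
AATG_REPEAT_MASS_DA = 1259.8  # expected mass of one AATG repeat, per the text


def repeats_between_alleles(mass_difference_da, repeat_mass_da=AATG_REPEAT_MASS_DA):
    """Number of repeat units implied by the mass difference between two peaks."""
    return mass_difference_da / repeat_mass_da


observed_difference_da = 3768.0
repeats = repeats_between_alleles(observed_difference_da)
expected_difference_da = round(repeats) * AATG_REPEAT_MASS_DA

print(round(repeats, 2))  # fractional repeat count
print(round(repeats))     # nearest whole number of repeats
print(round(observed_difference_da - expected_difference_da, 1))  # deviation in Da
```

A fractional count close to a whole number supports the genotype call; a value far from any integer would suggest a microvariant or a miscalled peak.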
A study of the precision of our TOF-MS measurements found that the standard deviation for each allele peak in a TPOX allelic ladder was better than 27 Da. In more familiar terms, this variation correlates to approximately 0.09 nucleotides, which is much better than similar precision measurements made with gel electrophoresis (2). We expect to be able to improve the precision of our measurements even further.
While we do not use allelic ladders for genotyping purposes, we have found them useful to demonstrate the resolution and accuracy obtained in our mass spectrometer between common STR alleles. For example, the single nucleotide difference between HUMTH01 alleles 9.3 and 10 can be fully resolved with our technique as demonstrated previously (5). Figure 2 shows several samples amplified with TPOX-specific PCR primers. The alleles are all well resolved and distinguishable.
Different repeat sequences may also be distinguished with TOF-MS. For example, if measuring the top strand of HUMTH01 (as designated from the GenBank® sequence; accession number D00269), the first complete repeat is TCAT, which has a mass of 1210.8 Da. On the other hand, if the bottom strand is examined, the first complete repeat is AATG, which has a mass of 1259.8 Da. We have examined both strands from HUMTH01 and have been able to distinguish them based upon the mass differences between the alleles (Figure 3).
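The two repeat masses can be reproduced by summing per-nucleotide masses. In the sketch below, the values for A (313.2 Da) and T (304.2 Da) are those quoted in this paper; the values for C (289.2 Da) and G (329.2 Da) are standard average residue masses supplied here as an assumption, and the function name is hypothetical.

```python
# Average mass (Da) of each nucleotide residue within a DNA strand.
# A and T are quoted in this paper; C and G are assumed standard values.
RESIDUE_MASS_DA = {"A": 313.2, "C": 289.2, "G": 329.2, "T": 304.2}


def repeat_unit_mass(sequence):
    """Sum the residue masses of a repeat unit such as 'TCAT'."""
    return sum(RESIDUE_MASS_DA[base] for base in sequence.upper())


for unit in ("TCAT", "AATG"):
    print(unit, round(repeat_unit_mass(unit), 1))
```

The difference of about 49 Da per repeat between the two strands accumulates with each added repeat, which is what lets the strands be told apart by the spacing of their allele peaks.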
Mass spectrometry also allows us to measure the nucleotide added from nontemplated addition during PCR. While adenine is often preferentially added by Taq polymerase (2,9), we have observed the addition of other nucleotides, such as thymine (Figure 4). We are studying the effect of various primer sequences, particularly the 5'-end of the reverse primer, to reduce nontemplated addition.
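As an illustration of how the added nucleotide might be inferred, the sketch below matches the observed mass shift between the full-length product and the extended product to the closest nucleotide mass. The A and T masses and the 305 Da shift are taken from the Figure 4 caption; the C and G masses are assumed standard values, and the function name and nearest-match rule are hypothetical.

```python
# Average residue masses (Da). A and T are quoted in this paper;
# C and G are assumed standard values.
RESIDUE_MASS_DA = {"A": 313.2, "C": 289.2, "G": 329.2, "T": 304.2}


def closest_added_nucleotide(observed_shift_da):
    """Return the nucleotide whose residue mass is nearest the observed shift."""
    return min(
        RESIDUE_MASS_DA,
        key=lambda base: abs(RESIDUE_MASS_DA[base] - observed_shift_da),
    )


# Mass shift reported for the D16S2622 sample in Figure 4.
print(closest_added_nucleotide(305.0))
```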
While the high degree of reliability of mass spectrometry is important, the speed of the technique is what makes it such an effective tool for high-throughput DNA analysis. In our high-throughput approach, we amplify and purify 384 samples in parallel using a robotic workstation containing a 96-tip pipetting head of our own design. The samples are then transferred to the mass spectrometer, where each sample can be analyzed in a few seconds. In this fashion, several thousand samples may be processed daily with a single robotic workstation and automated mass spectrometer. Our approach offers a way to meet the demands of the large DNA databases that are being generated from convicted offenders (10).
ACKNOWLEDGMENTS
We thank Dan Pollart, Mike Abbott, Gordon Haupt, Tom Shaler, Joanna Hunter, and Hua Lin of GeneTrace and John Tonkin of the California Department of Justice for technical assistance and helpful discussions. This research was supported in part by a grant from the National Institute of Justice (97-LB-VX-0003).
REFERENCES
1. Wiegand, P., Budowle, B., Brinkmann, B. (1993) Forensic validation of the STR systems SE33 and TC11. Int. J. Legal Med. 105: 315-320.
2. Kimpton, C.P., Gill, P., Walton, A., Urquhart, A., Millican, E.S., Adams, M. (1993) Automated DNA profiling employing multiplex amplification of short tandem repeat loci. PCR Meth. Appl. 3: 13-22.
3. Butler, J.M., McCord, B.R., Jung, J.M., Allen, R.O. (1994) Rapid analysis of the short tandem repeat HUMTH01 by capillary electrophoresis. BioTechniques 17: 1062-1070.
4. Wang, Y., Ju, J., Carpenter, B.A., Atherton, J.M., Sensabaugh, G.F., Mathies, R.A. (1995) Rapid sizing of short tandem repeat alleles using capillary array electrophoresis and energy-transfer fluorescent primers. Anal. Chem. 67: 1197-1203.
5. Becker, C.H., Li, J., Shaler, T.A., Hunter, J.M., Lin, H., Monforte, J.A. (1997) Genetic analysis of short tandem repeat loci by time-of-flight mass spectrometry. Proceedings from the Seventh International Symposium on Human Identification 1996, pp. 158-162.
6. Wu, K.J., Steding, A., Becker, C.H. (1993) Matrix-assisted laser desorption time-of-flight mass spectrometry of oligonucleotides using 3-hydroxypicolinic acid as an ultraviolet-sensitive matrix. Rapid Commun. Mass Spectrom. 7: 142-146.
7. Butler, J.M., Li, J., Shaler, T.A., Monforte, J.A., Becker, C.H. (1997) Reliable genotyping of short tandem repeat loci without an allelic ladder using time-of-flight mass spectrometry. Int. J. Legal Med., in preparation.
8. Shaler, T.A., Wickham, J.N., Sannes, K.A., Wu, K.J., Becker, C.H. (1996) Effect of impurities on the matrix-assisted laser desorption mass spectra of single-stranded oligodeoxynucleotides. Anal. Chem. 68: 576-579.
9. Clark, J.M. (1988) Novel non-templated nucleotide addition reactions catalyzed by procaryotic and eucaryotic DNA polymerases. Nucleic Acids Res. 16: 9677-9686.
10. Gill, P., Urquhart, A., Millican, E., Oldroyd, N., Watson, S., Sparkes, R., Kimpton, C.P. (1996) A new method of STR interpretation using inferential logic--development of a criminal intelligence database. Int. J. Legal Med. 109: 14-22.
Figure 1. Mass spectrum of a TPOX sample (genotype 8,11). The mass scale was converted into PCR product size in base pairs to illustrate that smaller PCR products are used with our mass spectrometry technique. Reliable genotyping can be performed without an allelic ladder by comparing obtained peak masses to expected allele masses. Additionally, the number of repeats between observed alleles may be calculated by dividing the mass difference between peaks by the repeat mass.

Figure 2. Multiple samples from the STR locus TPOX. The allelic ladder in the top frame demonstrates that all common alleles may be detected and fully resolved from one another. The genotype for each sample is listed on the left side of the figure. The small peaks which appear immediately in front of the allele peaks result from a purine base loss (-117 Da) during the ionization process in the mass spectrometer. The small peak that immediately follows some of the major allele peaks results from a matrix-adduct (+139 Da).

Figure 3. Comparison of mass spectra from the upper and the lower strand of HUMTH01 (as designated from the GenBank® sequence; accession number D00269). The mass scales have been shifted to align alleles 5 and 6 from an allelic ladder.

Figure 4. Mass spectrum of a homozygous individual at the tetranucleotide STR D16S2622. The mass difference between the full-length product and the product plus nontemplated "adenylation" was 305 Da. Nontemplated addition in this case was with a thymine (304.2 Da) rather than adenine (313.2 Da).
