James W. Schumm, Ann M. Lins, Cynthia J. Sprecher, and Katherine A. Micka
Promega Corporation, 2800 Woods Hollow Road, Madison, WI 53711
INTRODUCTION
MATERIALS AND METHODS
CHOICE OF STR SYSTEMS AND CONSTRUCTION OF ALLELIC LADDERS
MULTIPLEX STR SYSTEMS
POPULATION STATISTICS OF STR SYSTEMS
SUMMARY AND FUTURE PLANS
ACKNOWLEDGMENTS
REFERENCES
TABLES and FIGURES
Short tandem repeat (STR) loci are DNA sequences that contain tandemly repeated motifs of 3 to 7 base pairs (Edwards et al. 1991, 1992; Polymeropoulos et al. 1991a, 1991b). Many individual STR loci are polymorphic because the number of copies of the repeated motif varies at the locus. Thousands of polymorphic STR loci have been described (Murray et al. 1994). STR loci may be amplified by PCR with flanking sequence primers to generate products ranging from 100 to 400 bp long. These short product lengths allow the use of partially degraded DNA samples. STR loci are also attractive as genetic markers for forensic and paternity analysis because, in conjunction with the amplification process, they require as little as 0.5 ng of DNA template.
Various STR loci have been classified as either simple, compound, or complex based on the faithfulness of the repeats present at each locus (Moller et al. 1994a, 1994b). Heterogeneity observed with the simple systems results almost exclusively from variation in the number of repeat motifs present in the alleles. By contrast, complex STR loci contain many microvariant alleles, revealing a significant portion of their polymorphism through one- or two-base insertions or deletions either within or outside the region of repeats.
This paper describes the development of nine tetrameric STR loci, eight classified as simple STRs and one as a compound system. The regularity of the variation in these systems has allowed the development of allelic ladders as size markers. The limited size range of alleles in each of the selected STR systems has led to the development of STR multiplex sets in which alleles of each system do not overlap with alleles of others in the multiplex set. These two developments support rapid, efficient, and accurate allele identification. All the developed systems are compatible with a variety of detection formats and contain features which allow inter-laboratory comparisons to be made with precision and confidence.
Oligonucleotide primers, allelic ladders, and buffers for silver stain procedures were derived from GenePrint STR Systems kits (Promega, Madison, WI). The corresponding materials for fluorescent procedures were derived from GenePrint Fluorescent STR Systems kits (Promega, Madison, WI). All purification, amplification, separation, and detection methods are detailed in the GenePrint STR Systems and GenePrint Fluorescent STR Systems users manuals (TMD#004 and TMD#006, respectively; Promega, Madison WI).
CHOICE OF STR SYSTEMS AND CONSTRUCTION OF ALLELIC LADDERS
We surveyed over 35 STR loci, selecting the nine systems which demonstrated high heterozygosity resulting almost exclusively from tetranucleotide repeat variation (i.e., rarely including one-base or two-base variants). These systems generated good amplification product yield with as little as 0.5 ng of template, and revealed easily interpreted alleles with either the silver or fluorescent detection formats. The selected STR systems (Table 1) are also located on separate chromosomes to increase the chance of independent inheritance of the loci. The amelogenin locus (Akane et al. 1991; Sullivan et al. 1993), which is not an STR, was included in the set because it generates a 212 bp fragment from the X chromosome and a 218 bp fragment from the Y chromosome, thus allowing its application to gender identification.
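The amelogenin logic described above can be illustrated with a minimal sketch. The function name `infer_sex` is hypothetical, and the sketch assumes that fragment sizes have already been called as whole numbers of bases by comparison with the allelic ladder; it is not part of any kit software.

```python
# Fragment sizes (bases) reported in the text for the amelogenin locus.
AMELOGENIN_X = 212
AMELOGENIN_Y = 218


def infer_sex(fragment_sizes):
    """Interpret observed amelogenin fragments (hypothetical helper).

    Only X present      -> "female"
    Both X and Y present -> "male"
    Anything else        -> "inconclusive" (e.g. no product or unexpected sizes)
    """
    observed = set(fragment_sizes)
    if observed == {AMELOGENIN_X}:
        return "female"
    if observed == {AMELOGENIN_X, AMELOGENIN_Y}:
        return "male"
    return "inconclusive"


if __name__ == "__main__":
    for sizes in ([212], [212, 218], [218], []):
        print(sizes, infer_sex(sizes))
```

A Y-only or empty result is reported as inconclusive rather than guessed, which seems the safer choice for a sketch of this kind.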
Amplifiable systems containing short tandem repeats are prone to a process of repeat slippage (Weber and May 1989; Sprecher et al. 1996) which often results in a "shadow band" one repeat unit smaller than the authentic allele. We generally rejected STR loci displaying this problem. However, we included the vWA locus (Fig. 1) in which this phenomenon was most prominently observed because of the general acceptance of this locus in the forensic community.
A second artifact of the PCR process is the appearance of a light band one base below the primary allele fragment, as seen with TPOX, vWA, LPL and F13B in the silver stain representation of Figure 1. Most likely this is caused by the terminal transferase activity of the Taq DNA polymerase adding a single adenine nucleotide at the 3' termini of the amplified PCR products (Clark 1988; Kimpton et al. 1993). The efficiency of this process varies among loci, as indicated by the relative intensities of the corresponding extra bands for the loci described. A recent protocol modification (i.e., adding a single extension step of 30 minutes at 60°C at the end of the manufacturer's recommended protocol) decreases the amount of product without the adenine nucleotide and increases the amount with it (data not shown).
The genetic constitution of the STR loci was also considered in their selection. We selected loci which contain very few length microvariants. Among the more than 80 alleles of the nine STR loci described, only three differ from the others by fewer than 4 bases: TH01 allele 9.3 (Puers et al. 1993), F13A01 allele 3.2 (Puers et al. 1994), and HPRTB allele 11+ (Edwards et al. 1992; Micka et al. 1996). Thus, if the TH01 alleles 10 and 9.3 are binned, the separation and detection methods employed with these STR systems do not require one-base resolution.
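The relationship between allele names and fragment sizes can be made concrete with a short sketch. It assumes the usual convention that a name such as 9.3 means nine complete four-base repeats plus three extra bases, and it takes the smallest ladder allele and its size for TH01 and F13A01 from Table 1. The names `LADDER_START` and `allele_size` are illustrative only.

```python
REPEAT_LENGTH = 4  # all nine loci are tetranucleotide repeat systems

# (smallest ladder allele, its size in bases), read from Table 1.
LADDER_START = {
    "TH01": (5, 179),
    "F13A01": (4, 283),
}


def allele_size(locus, allele):
    """Expected fragment size in bases for an allele name such as "9" or "9.3"."""
    whole, _, extra = allele.partition(".")
    first_allele, first_size = LADDER_START[locus]
    extra_bases = int(extra) if extra else 0
    return first_size + REPEAT_LENGTH * (int(whole) - first_allele) + extra_bases


if __name__ == "__main__":
    # TH01 alleles 9.3 and 10 differ by a single base, which is why they
    # may need to be binned when one-base resolution is not available.
    print(allele_size("TH01", "9.3"), allele_size("TH01", "10"))
    print(allele_size("F13A01", "3.2"), allele_size("F13A01", "10"))
```

Under these assumptions the sketch reproduces the sizes given in the Table 1 footnote: 198 bases for TH01 allele 9.3, and 281 and 307 bases for F13A01 alleles 3.2 and 10.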
Finally, separation of amplification products is performed in a denaturing gel system. When the separated fragments are stained with silver, two distinct bands are sometimes observed for each allele (Fig. 1 loci F13A01, F13B, FESFPS, LPL, vWA and TH01). These represent the opposing strands of a single PCR product which are the same length, but migrate differentially because they contain different DNA sequences (Frank and Koster 1979; Sprecher et al. 1996). Different loci show different degrees of this separation. The opposing strands of CSF1PO, HPRTB, and TPOX coincide with one another, appearing as a single fragment. When the upper and lower fragments of a single allele are separated by more than the distance between neighboring alleles, interpretation becomes complicated. For this reason, we have rejected the use of STR loci in which the complementary strands migrate very far apart from one another (e.g. D18S51, D21S11, data not shown).
For each of the nine STR loci and the amelogenin locus, we constructed an allelic ladder, i.e. a mixture of many or all of the possible amplified alleles for the individual locus. Allelic ladders serve as size standards allowing rapid and precise comparison of amplified sample DNAs with well-characterized allelic ladder components (Budowle et al. 1991; Puers et al. 1993, 1994; Sprecher et al. 1996). Using these size standards, there is no need for measurement of migration distance or calculation to determine the size of each allele. Because the allelic ladders share both length and sequence identity with amplified test sample products, ladder components co-migrate with the amplified samples regardless of the constitution of the gel used for separation or the labeled (e.g. fluorescent dye) or unlabeled nature of the detection system employed. The components of the allelic ladder for each locus and the size ranges for these fragments are listed in Table 1.
To achieve high throughput with the STR systems, we have developed two triplex sets for use with silver stain or other post-electrophoresis staining technologies (e.g. SYBR green) and two related quadriplex systems which can be used in conjunction with fluorescence detection equipment. The CTT triplex (Schumm et al. 1994; Lins et al. 1996) allows co-amplification of three loci, CSF1PO, TPOX, and TH01, while the FFv triplex (Lins et al. 1996) includes the loci F13A01, FESFPS, and vWA. The amplification products of K562 DNA with each triplex are displayed in Figure 1 alongside allelic ladders for each corresponding locus.
The same six loci plus the STR loci F13B and LPL have been incorporated into two quadriplex systems for fluorescence detection. The CTTv (CSF1PO, TPOX, TH01, vWA) and FFFL (F13A01, FESFPS, F13B, LPL) quadriplex systems (Schumm et al. 1994; Lins et al. 1996) contain one fluoresceinated primer in each primer pair. Figure 2 illustrates the products of each of these systems detected with the Molecular Dynamics FluorImager SI fluorescent scanner (Sunnyvale, CA). With this instrument, fluorescent STR amplification products are separated by electrophoresis in standard gel rigs; the gel is then placed within the instrument and scanned for three minutes. A single instrument can therefore accommodate analysis of 30 or more gels in a single day, allowing enormous throughput. As with the silver detection methods, allelic ladders for each locus are included to allow rapid allele identification (Fig. 2).
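The requirement that alleles of co-amplified loci not overlap in size can be checked directly against the known allele size ranges in Table 1. The sketch below is only an illustration: the helper name `overlapping_pairs` is hypothetical, and the FESFPS upper bound is read as 250 bases.

```python
from itertools import combinations

# Size range of known alleles in bases, from Table 1
# (FESFPS upper bound read as 250).
KNOWN_ALLELE_RANGE = {
    "CSF1PO": (295, 327), "TPOX": (224, 252), "TH01": (179, 203),
    "vWA": (131, 171), "F13A01": (281, 331), "FESFPS": (222, 250),
    "F13B": (169, 193), "LPL": (105, 133),
}

MULTIPLEXES = {
    "CTT": ["CSF1PO", "TPOX", "TH01"],
    "FFv": ["F13A01", "FESFPS", "vWA"],
    "CTTv": ["CSF1PO", "TPOX", "TH01", "vWA"],
    "FFFL": ["F13A01", "FESFPS", "F13B", "LPL"],
}


def overlapping_pairs(loci):
    """Return pairs of loci whose known allele size ranges intersect."""
    clashes = []
    for a, b in combinations(loci, 2):
        low_a, high_a = KNOWN_ALLELE_RANGE[a]
        low_b, high_b = KNOWN_ALLELE_RANGE[b]
        if low_a <= high_b and low_b <= high_a:
            clashes.append((a, b))
    return clashes


if __name__ == "__main__":
    for name, loci in MULTIPLEXES.items():
        print(name, overlapping_pairs(loci))
    # A pairing that would not work in one color: TPOX and FESFPS overlap.
    print(overlapping_pairs(["TPOX", "FESFPS"]))
```

With the Table 1 ranges, none of the four multiplexes contains an overlapping pair, whereas TPOX and FESFPS, which sit in different multiplexes, would clash if combined under a single label.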
The same quadriplexes may be analyzed in a similar fashion using the FMBIO fluorescent scanner manufactured by Hitachi Software, Inc. (San Bruno, CA). However, this instrument employs a 532 nm wavelength YAG laser which, with appropriate filters, allows detection of both fluorescein and rhodamine-like fluorescent tags. We have created a new CTTv quadriplex which contains one rhodamine-like fluorescent primer in each primer pair. Figure 3C illustrates the simultaneous detection of the rhodamine-like CTTv quadriplex and the fluorescein-FFFL quadriplex, both in a single lane for each DNA sample. The images generated by the two separate dyes are shown in Figures 3A and 3B for the CTTv and FFFL quadriplexes, respectively. This approach cuts in half the number of gels which must be poured, run, and analyzed.
Finally, the ABI 373 DNA Sequencer and the ABI Prism 377 DNA Sequencer (Applied Biosystems Division, Perkin Elmer, Foster City, CA) are commonly employed pieces of equipment for DNA sequence analysis. These instruments are compatible with the one-color and two-color fluorescent STR quadriplex systems we have developed. Figure 4 illustrates the fluorescein-labeled FFFL STR quadriplex plus the rhodamine-like-labeled CTTv STR quadriplex subjected to electrophoresis together. Correlation of migration distances of allelic ladder components in other gel lanes (data not shown) allows calibration and allele size determination. The ABI Prism 377 instrument is designed to pre-run and run gels within it. Thus, the practical limit of two gels per eight-hour day, plus one overnight, is less than the high throughput achieved with the Hitachi or Molecular Dynamics fluorescent scanners.
POPULATION STATISTICS OF STR SYSTEMS
Studies defining various population statistics for the STR locus combinations described in this paper will be submitted soon for publication. Steve Creacy and Robert A. Bever of Genetic Design, Inc. (Greensboro, NC) and Cynthia J. Sprecher and James W. Schumm of Promega Corporation have generated a data set including at least 200 individuals in each of the African-American, Caucasian-American, and Hispanic-American populations. Table 2 lists the average matching probabilities (Jones 1972) which are achieved using these loci either as triplexes or quadriplexes, or in combinations of six or eight loci. The combined use of the two triplexes provides matching probabilities ranging from 1 in 385,000 to 1 in 4,565,000 depending on the race being analyzed. Addition of the F13B and LPL loci, as achieved with the two quadriplexes, produces matching probabilities of 1 in 430,000,000 for African-Americans, 1 in 17,400,000 for Caucasian-Americans, and 1 in 23,600,000 for Hispanic-Americans.
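The text does not state how the combined values were derived, but they are consistent with simply multiplying the per-multiplex "1 in N" values from Table 2, which is what one would expect if the loci are inherited independently. The following sketch works through that arithmetic; the function name `combined_one_in` is hypothetical.

```python
from math import prod


def combined_one_in(*one_in_values):
    """Combine "1 in N" matching probabilities for independent sets of loci.

    If set A matches 1 in a and set B matches 1 in b, the chance of
    matching at both is (1/a) * (1/b), i.e. 1 in a*b.
    """
    return prod(one_in_values)


if __name__ == "__main__":
    # Per-multiplex values from Table 2 (Caucasian-American column).
    ctt, ffv = 424, 909
    cttv, fffl = 6623, 2632
    print("Both triplexes: 1 in", combined_one_in(ctt, ffv))
    print("Both quadriplexes: 1 in", combined_one_in(cttv, fffl))
```

Rounded to three significant figures, these products agree with the combined entries in Table 2 (1 in 385,000 and 1 in 17,400,000 for this population).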
The average power of exclusion (PE) (Brenner and Morris 1990; Endean 1990) is a statistic of greater interest to those performing paternity determinations. The PE values for each multiplex and various multiplex combinations are described in Table 3. The combined PE values for the two triplex sets range from 0.980 to 0.994, depending on the racial group. Higher values of 0.992 to 0.998 are observed for the combination of the two quadriplexes, depending on the racial group. A panel of 9 to 12 STR systems with polymorphic content similar to these systems will be required to provide PE values greater than 0.999 in all races.
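The combined PE values in Table 3 are consistent with the standard rule for independent systems, in which a falsely accused man escapes exclusion only if every system fails to exclude him. The sketch below illustrates that calculation under the independence assumption; the function name `combined_pe` is hypothetical and the source does not confirm that this is how the tabulated values were computed.

```python
def combined_pe(*pe_values):
    """Combined power of exclusion for independent systems.

    PE_combined = 1 - product over systems of (1 - PE_i)
    """
    not_excluded = 1.0
    for pe in pe_values:
        not_excluded *= 1.0 - pe
    return 1.0 - not_excluded


if __name__ == "__main__":
    # Triplex values from Table 3 as (CTT, FFv) pairs.
    triplexes = {
        "Caucasian-American": (0.878, 0.907),
        "African-American": (0.906, 0.936),
        "Hispanic-American": (0.830, 0.881),
    }
    for group, (ctt, ffv) in triplexes.items():
        print(group, round(combined_pe(ctt, ffv), 3))
```

The same function applied to the quadriplex values gives the eight-locus figures, and it shows why returns diminish: each added system removes only a fraction of the small remaining non-exclusion probability.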
In summary, we have developed 9 STR systems plus amelogenin for analysis using silver stain detection. An allelic ladder has been developed for use with each system. Two triplex sets, CTT and FFv, have been developed to allow high throughput applications. Validation of the CTT triplex has been extensive (Micka et al. 1996).
The same 9 STR loci plus amelogenin have been developed for fluorescent analyses. Two STR quadriplexes have been developed with these loci. Detection of STR quadriplexes with the Hitachi FMBIO and Molecular Dynamics FluorImager fluorescent scanners as well as the ABI DNA Sequencers has been demonstrated. Recently, a two-color system has been developed which allows simultaneous detection of the CTTv and FFFL quadriplexes in a single gel lane when detected either with the Hitachi FMBIO fluorescent scanner or ABI DNA Sequencers.
The matching probabilities and powers of exclusion have been determined with various combinations of these STR systems. Use of the two quadriplex systems provides matching probabilities exceeding 1 in 17,000,000 for each of the races evaluated. Values for the power of exclusion range from 0.992 for Hispanic-Americans to 0.998 for Caucasian-Americans and African-Americans. We will be supporting and participating in additional validation studies as new multiplexes are developed.
We are currently evaluating an additional 100 STR systems for the development of new multiplexes to assist with paternity analyses. In addition to the two-color system which is currently being optimized, we are considering development of a third color for detection of a new multiplex by the Hitachi FMBIO fluorescent scanner and the ABI DNA Sequencers.
We also wish to generate an Internet-based population database that would support published STR population data. Most published population data include allele frequencies, but lack the underlying genotype frequencies which allow additional statistical calculations. This Internet site, http://www.promega.com, would be available for retrieval of the actual genotype frequencies from a broad variety of sources as a service to the community.
We wish to thank Ms. Terry Spear of the California Criminalistics Institute (Sacramento, CA) for providing forensic DNA samples prepared from blood stains which we employed as amplification templates in Figures 2, 3 and 4.
Akane A., Shiono H., Matsubara K., Nakahori Y., Seki S., Nagafuchi S., Yamada M. and Nakagome Y. (1991) Sex identification of forensic specimens by polymerase chain reaction (PCR): two alternative methods. Forensic Sci. Int. 49:81-88.
Brenner C. and Morris J.W. (1989) Paternity index calculations in single locus hypervariable DNA probes: validation and other studies. In: Proceedings for the International Symposium on Human Identification. Promega Corporation, 1990: 21-53.
Budowle B., Chakraborty R., Giusti A.M., Eisenberg A.J. and Allen R.C. (1991) Analysis of the VNTR Locus D1S80 by the PCR Followed by High-Resolution PAGE. Am. J. Hum. Genet. 48:137-144.
Clark J.M. (1988) Novel non-templated nucleotide addition reactions catalyzed by procaryotic and eucaryotic DNA polymerases. Nucl. Acids Res. 16:9677-9686.
Edwards A., Civitello A., Hammond H.A. and Caskey C.T. (1991) DNA typing and genetic mapping with trimeric and tetrameric tandem repeats. Am. J. Hum. Genet. 49:746-756.
Edwards A., Hammond H.A., Jin L., Caskey C.T. and Chakraborty R. (1992) Genetic variation at five trimeric and tetrameric tandem repeat loci in four human population groups. Genomics 12:241-253.
Endean D.J. (1989) RFLP analysis for paternity testing: observations and caveats. In: Proceedings for the International Symposium on Human Identification. Promega Corporation, 1990: 55-76.
Frank R. and Koster H. (1979) DNA chain length markers and the influence of base composition on electrophoretic mobility of oligodeoxyribonucleotides in polyacrylamide gels. Nucl. Acids Res. 6:2069-2087.
Jones D.A. (1972) Blood samples: probability of discriminations. J. Forensic Sci. Soc. 12:355-359.
Kimpton C.P., Gill P., Walton A., Urquhart A., Millican E.S. and Adams M. (1993) Automated DNA profiling employing multiplex amplification of short tandem repeat loci. PCR Methods and Applications 3:13-22.
Lins A.M., Sprecher C.J., Puers C. and Schumm J.W. (1996) Multiplex sets for the amplification of polymorphic short tandem repeat loci--silver stain and fluorescent detection. Biotechniques (forthcoming).
Micka K., Sprecher C.J., Lins A.M., Comey C., Coons B., Crouse C., Endean D., Zold K., Lee S., Duda N., Ma M. and Schumm J.W. (1996) Validation of Multiplex Polymorphic STR Amplification Sets Developed for Personal Identification Applications. J. Forensic Sci. (forthcoming).
Moller A., Meyer E. and Brinkmann B. (1994a) Different types of structural variation in STRs: HumFES/FPS, HumVWA and HumD21S11. Int. J. Leg. Med. 106:319-323.
Moller A., Wiegand P., Gruschow C., Seuchter S. A., Baur M.P. and Brinkmann B. (1994b) Population data and Forensic Efficiency Values for the STR Systems HumVWA, HumMBP and HumFABP. Int. J. Leg. Med. 106:183-189.
Murray J.C., Buetow K.H., Weber J.L., Ludwigsen S., Scherpbier-Heddema T., Manion F. et al. (1994) A comprehensive human linkage map with centimorgan density. Science 265:2049-2054.
Polymeropoulos M.H., Rath D.S., Xiao H. and Merril C.R. (1991a) Tetranucleotide repeat polymorphism at the human c-fes/fps proto-oncogene (FES). Nucl. Acids Res. 19:4018.
Polymeropoulos M.H., Rath D.S., Xiao H. and Merril C.R. (1991b) Tetranucleotide repeat polymorphism at the human coagulation factor XIII A subunit gene (F13A1). Nucl. Acids Res. 19:4306.
Polymeropoulos M.H., Xiao H., Rath D.S. and Merril C.R. (1991c) Tetranucleotide repeat polymorphism at the human tyrosine hydroxylase gene (TH). Nucl. Acids Res. 19:3753.
Puers C., Hammond H.A., Caskey C.T., Lins A.M., Sprecher C.J., Brinkmann B. and Schumm J.W. (1994) Allelic ladder characterization of the short tandem repeat polymorphism located in the 5' flanking region to the human coagulation factor XIII A subunit gene. Genomics 23:260-264.
Puers C., Hammond H.A., Jin L., Caskey C.T. and Schumm J.W. (1993) Identification of repeat sequence heterogeneity at the polymorphic short tandem repeat locus HUMTH01 [AATG]n and reassignment of alleles in population analysis by using a locus-specific allelic ladder. Am. J. Hum. Genet. 53:953-958.
Schumm J.W., Lins A., Puers C. and Sprecher C. (1993) Development of nonisotopic multiplex amplification sets for analysis of polymorphic STR loci. In: Proceedings from the Fourth International Symposium on Human Identification. Promega Corporation, 1994: 177-187.
Sprecher C.J., Puers C., Lins A.M. and Schumm J.W. (1996) A general approach to analysis of polymorphic short tandem repeat loci. Biotechniques 20:266-276.
Sullivan K., Mannucci A., Kimpton C.P. and Gill P. (1993) A Rapid and Quantitative DNA Sex Test: Fluorescence-Based PCR Analysis of X-Y Homologous Gene Amelogenin. BioTechniques 15:636-641.
Weber J.L. and May P.E. (1989) Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Am. J. Hum. Genet. 44:388-396.
Table 1.
| Locus | Chromosome Location | Size Range of Known Alleles (bases) | Size Range of Allelic Ladder (bases) | Allelic Ladder Component Names¹ |
|---|---|---|---|---|
| Amelogenin | X:Y | 212, 218 | 212, 218 | 212, 218 |
| CSF1PO | 5q33.5-34 | 295-327 | 299-323 | 7-15 |
| F13A01 | 6p24-25 | 281-331 | 283-331 | 4-9, 11-16 |
| F13B | 1q31-q32.1 | 169-193 | 169-189 | 6-11 |
| FESFPS | 15q25-qter | 222-250 | 226-246 | 8-13 |
| HPRTB | Xq26 | 259-303 | 259-303 | 6-17 |
| LPL | 8p22 | 105-133 | 105-133 | 7-14 |
| TH01 | 11p15.5 | 179-203 | 179-203 | 5-11 |
| TPOX | 2p23-2pter | 224-252 | 232-248 | 6-13 |
| vWA | 12p12-pter | 131-171 | 143-167 | 14-20 |
¹ Names of alleles represent the number of repeats within the alleles. The TH01 allele 9.3 (198 bases), F13A01 allele 10 (307 bases) and allele 3.2 (281 bases), F13B alleles 11 and 12, FESFPS alleles 7 and 14, and vWA alleles 11, 13, and 21 are not currently included in these allelic ladders.
Table 2. Matching Probability¹
| Multiplex | Caucasian-American | African-American | Hispanic-American |
|---|---|---|---|
| Multiplexes for Silver Detection | | | |
| CTT Triplex (CSF1PO, TPOX, TH01) | 1 in 424 | 1 in 1,639 | 1 in 546 |
| FFv Triplex (F13A01, FESFPS, vWA) | 1 in 909 | 1 in 2,785 | 1 in 1,342 |
| Both Triplexes (6 loci, above) | 1 in 385,000 | 1 in 4,565,000 | 1 in 733,000 |
| Multiplexes for Fluorescent Detection | | | |
| CTTv Quadriplex (CSF1PO, TPOX, TH01, vWA) | 1 in 6,623 | 1 in 25,575 | 1 in 7,194 |
| FFFL Quadriplex (F13A01, FESFPS, F13B, LPL) | 1 in 2,632 | 1 in 16,807 | 1 in 3,279 |
| Both Quadriplexes (8 loci, above) | 1 in 17,400,000 | 1 in 430,000,000 | 1 in 23,600,000 |
¹ Matching probabilities have been determined as part of an unpublished collaborative study among S. Creacy and R.A. Bever of Genetic Design (Greensboro, NC) and authors C.J. Sprecher and J.W. Schumm.
Table 3. Power of Exclusion¹
| Multiplex | Caucasian-American | African-American | Hispanic-American |
|---|---|---|---|
| Multiplexes for Silver Detection | | | |
| CTT Triplex (CSF1PO, TPOX, TH01) | 0.878 | 0.906 | 0.830 |
| FFv Triplex (F13A01, FESFPS, vWA) | 0.907 | 0.936 | 0.881 |
| Both Triplexes (6 loci, above) | 0.989 | 0.994 | 0.980 |
| Multiplexes for Fluorescent Detection | | | |
| CTTv Quadriplex (CSF1PO, TPOX, TH01, vWA) | 0.957 | 0.967 | 0.918 |
| FFFL Quadriplex (F13A01, FESFPS, F13B, LPL) | 0.942 | 0.945 | 0.902 |
| Both Quadriplexes (8 loci, above) | 0.998 | 0.998 | 0.992 |
¹ Powers of exclusion have been determined as part of an unpublished collaborative study among S. Creacy and R.A. Bever of Genetic Design (Greensboro, NC) and authors C.J. Sprecher and J.W. Schumm.
Figure 1. Amplification of varying concentrations of K562 template DNA at different STR loci and the Amelogenin locus. K562 DNA ranging in amount from 0.5 to 250 ng was amplified at various STR loci and the Amelogenin locus. The specific locus (or multiplex system) is indicated below each panel. For each panel, lanes 1 and 8 contain the locus-specific allelic ladder; lanes 2-6 contain amplified K562 DNA using 250, 25, 5, 1 and 0.5 ng of starting template, respectively; lane 7 contains a negative control amplification reaction (i.e. no template DNA added). We routinely amplify 1-25 ng of template DNA to yield an amount of sample alleles similar to the amount of amplified ladder alleles.
Figure 2. Fluorescent detection of CTTv and FFFL multiplex amplifications with the Molecular Dynamics FluorImager SI fluorescent scanner. For each panel, lanes (1) through (5) display amplification products of 5 ng of five different forensic DNA templates derived from blood stains. The panel on the left shows amplification with the CTTv multiplex (CSF1PO, TPOX, TH01, and vWA loci) while the panel on the right illustrates amplification with the FFFL multiplex (F13A01, FESFPS, F13B, LPL loci). One of the two primers for each locus was labeled with fluorescein. Lane (6) displays a sample without DNA subjected to the same procedures. The lanes labeled (L) contain a mixture of the allelic ladders for the CSF1PO, FESFPS, TH01, and vWA loci (left panel) and F13A01, FESFPS, F13B, LPL loci (right panel). The separated fluoresceinated amplification products were visualized with the Molecular Dynamics FluorImager SI.
Figure 3. Simultaneous detection of CTTv and FFFL multiplex amplifications using the Hitachi FMBIO fluorescent scanner. Each template was amplified separately with the rhodamine-derivative-labeled CTTv multiplex (CSF1PO, TPOX, TH01 and vWA loci) and the fluorescein-labeled FFFL multiplex (F13A01, FESFPS, F13B, LPL loci), mixed with loading solution, and subjected to electrophoresis in a single lane. Panel A displays the results of a 625 nm scan detecting the CTTv products, panel B displays the results of a 505 nm scan detecting the FFFL products, and panel C shows a two-color display of both CTTv (red) and FFFL (green) products simultaneously. Lanes (1) through (5) display amplification products of 5 ng of five different forensic DNA templates derived from blood stains. Lane (6) displays a sample without DNA subjected to the same procedures. The lanes labeled (L) contain a mixture of the allelic ladders for the CSF1PO, FESFPS, TH01 and vWA loci and F13A01, FESFPS, F13B and LPL loci. All the separated fluorescently labeled amplification products were visualized with the Hitachi FMBIO fluorescent scanner.
Figure 4. Simultaneous detection of CTATv and FFFL multiplex amplifications using the Applied Biosystems Prism 377 DNA Sequencer. Five nanograms of a forensic DNA template derived from a blood stain was amplified separately with the rhodamine-derivative-labeled CTATv multiplex (CSF1PO, TPOX, amelogenin, TH01 and vWA loci) and the fluorescein-labeled FFFL multiplex (F13A01, FESFPS, F13B, LPL loci), mixed with loading solution, and subjected to electrophoresis in a single lane. The green tracing illustrates the CTATv products, and the blue tracing displays the FFFL products. All the separated fluorescently labeled products were visualized with the Applied Biosystems Prism 377 DNA Sequencer.