
Subpopulation Heterogeneity in Mitochondrial DNA Evaluated by Analysis of Molecular Variance of Sequence-Specific Oligonucleotide Typing of Worldwide Populations

Terry Melton and Mark Stoneking
Department of Anthropology, The Pennsylvania State University, University Park, PA


ABSTRACT
INTRODUCTION
METHODS
RESULTS AND DISCUSSION
ACKNOWLEDGMENTS
REFERENCES
TABLES and FIGURES

ABSTRACT

Mitochondrial DNA (mtDNA) control region sequence-specific oligonucleotide (SSO) types have been determined for 2013 individuals in 27 worldwide populations. From this data set, a subset of nine populations possessing 252 mtDNA SSO types, comprising three African, three Asian, and three European populations, was selected for an analysis of molecular variance (AMOVA), which can detect the extent and statistical significance of heterogeneity in SSO types. AMOVA incorporates information about genetic distances between pairs of SSO types to extend a traditional analysis of variance approach for partitioning variation among populations and regions. Not surprisingly, there were statistically significant differences among African, Asian, and European SSO types. Within these regional classifications, however, pairwise population comparisons revealed both homogeneity and heterogeneity. Overall, for these data, most mtDNA SSO type variation is present within populations (77%), a moderate amount is attributable to major geographic regions (19%), and very little can be attributed to variation among populations within regions (4%). This analysis shows the usefulness of the AMOVA approach for assessing subpopulation heterogeneity in SSO type variation at the population genetics level.

INTRODUCTION

Mitochondrial DNA (mtDNA) polymorphisms provide a valuable source of information for discriminating among individuals (Cann et al. 1987; Vigilant et al. 1989; Stoneking et al. 1990; DiRienzo and Wilson 1991; Kocher and Wilson 1991; Ward et al. 1991). The use of mtDNA in forensic investigations is increasing (Stoneking et al. 1991; Ginther et al. 1992; Holland et al. 1993; Stoneking et al. 1995), and guidelines for its use are becoming available to the forensic community (Wilson et al. 1993; Stoneking and Melton, forthcoming). Although sequencing segments of the hypervariable control region of mtDNA provides the highest resolution for matching samples in a forensic investigation, sequence-specific oligonucleotide (SSO) typing is an inexpensive alternative exclusionary technique (Stoneking et al. 1991). In this system, oligonucleotide probes detect a substantial amount of nucleotide variation at 13 sites across the control region (Stoneking et al. 1991; Melton et al. 1995; Melton and Stoneking, forthcoming).

For forensic applications of DNA typing, the existence of subpopulation heterogeneity is an important factor in determining match probabilities. SSO typing has previously detected heterogeneity among worldwide populations based on the frequencies of individual nucleotide sequence variants (Stoneking et al. 1991). In addition, logistic regression methods revealed that mtDNA nucleotide variant frequencies are somewhat predictive of ethnicity (Connor and Stoneking 1994). However, because all sites of nucleotide variation are tightly linked in the non-recombining mtDNA molecule, subpopulation heterogeneity can also be evaluated using mtDNA "haplotypes," that is, each individual's profile of variants across the control region. We have applied an analysis of molecular variance (AMOVA) (Excoffier et al. 1992) to a subset of SSO type data from worldwide populations to illustrate how subpopulation heterogeneity may be detected and quantified.

METHODS

In our overall study, 2013 individuals have been typed for sequence variants at 13 nucleotide positions in 8 regions across the mtDNA control region. SSO types were either determined directly by SSO typing of purified genomic DNA or, for a few populations, inferred from DNA sequences of the mtDNA control region. The 27 populations are from Africa (9 populations), Asia (12 populations), and Europe (6 populations, including North American Caucasians), as shown in Figure 1.

For this analysis, nine populations were chosen from the larger data set to illustrate the analysis of molecular variance (Table 1). From the African sample, the Mukogodo, a hunter-gatherer group from Kenya (N=28), Mandenka tribal members of Senegal (N=116; Graven et al. 1995), and Yorubans of Nigeria (N=13; Vigilant 1990) were selected, giving a total sample size of 157. These populations were chosen because they are separated by substantial geographic distances and represent both western and eastern sub-Saharan African populations.

From the larger population of Asians, the southern Chinese (N=103), Malays (N=81), and Pakistanis (N=73) were selected. A previous in-depth analysis of Asian SSO type variation (Melton and Stoneking, forthcoming) indicated that western Asians and eastern Asians are significantly different with respect to SSO types. Thus, these three populations are illustrative of SSO type contrasts in Asian subpopulations.

European populations were broadly defined for this analysis to include Caucasian groups from North America. Two subpopulations from the United States (Midwest, N=190; Northeast, N=129) were chosen because of their forensic interest to this audience. French samples (N=81) were included to illustrate the extreme SSO type homogeneity between Western European and North American populations.

The SSO typing method has been described in detail elsewhere (Stoneking et al. 1991; Melton et al. 1995; Melton and Stoneking, forthcoming). The individual SSO typing results for each region are combined into a composite mtDNA SSO type. For example, the mtDNA type 1-1-2-1-2-1-1-0 indicates that probe variant IA1 annealed in the IA region, IB1 annealed in the IB region, and so on. A "0" denotes a blank result (no probe annealed); in this example, a blank was obtained for region IID. A blank result for IID occurs either when a substitution at a nearby site prevents probe annealing, or when a nucleotide other than A or G is present at position 247 (the IID probe-specific site). Although blanks in different individuals for a given probe region could reflect different substitutions, blanks are treated as the same variant for the purpose of analysis. In most populations blank results are uncommon (usually less than 5% of the total), so they carry little weight in the analysis. Because the mtDNA molecule does not recombine, an SSO type, or profile, can be treated as an allele at a single locus and compared among individuals; SSO type frequencies are therefore analogous to allele frequencies.
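To make the composite-type convention concrete, the Python sketch below splits a type string such as 1-1-2-1-2-1-1-0 into its eight per-region codes (0 being the blank) and tabulates type frequencies in a sample, treating each composite type like an allele. The function names and the three-person sample are hypothetical illustrations, not part of the authors' software or data.

```python
from collections import Counter

BLANK = 0       # "0" in a composite type marks a blank (no probe annealed)
N_REGIONS = 8   # eight variant regions per composite SSO type

def parse_sso_type(sso):
    """Split a composite type such as '1-1-2-1-2-1-1-0' into per-region variant codes."""
    codes = tuple(int(part) for part in sso.split("-"))
    if len(codes) != N_REGIONS:
        raise ValueError(f"expected {N_REGIONS} regions, got {len(codes)}: {sso!r}")
    return codes

def type_frequencies(sample):
    """Relative frequency of each composite SSO type, analogous to an allele frequency."""
    counts = Counter(parse_sso_type(t) for t in sample)
    n = sum(counts.values())
    return {sso: count / n for sso, count in counts.items()}

# Hypothetical sample of three individuals; the first type has frequency 2/3
sample = ["1-1-2-1-2-1-1-0", "1-1-2-1-2-1-1-0", "1-2-1-1-1-1-1-1"]
freqs = type_frequencies(sample)
```

Representing each type as a tuple keeps it hashable, so identical profiles are counted together exactly as identical alleles would be.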

An analysis of molecular variance (AMOVA) (Excoffier et al. 1992) was applied to the SSO types to measure the apportionment of diversity within and among the nine populations. AMOVA is especially useful for mtDNA data because it does not require independence of nucleotide sites. The method incorporates information about genetic distances between pairs of mtDNA types, thereby extending the more traditional computation of variance components and F-statistics from mtDNA type frequencies used to evaluate population subdivision (Cockerham 1969, 1973; Weir and Cockerham 1984). The analysis can be based on any stipulated hierarchy of populations, whether defined by geographic, linguistic, ethnic, or historical affinities. A conventional sum of squared deviations is partitioned into variance components attributable to variation among regions, among populations within regions, and within populations. At present, the available software can analyze up to 255 SSO types (L. Excoffier, pers. comm.); for this reason, 814 individuals with 252 mtDNA types in nine populations were selected for this analysis from the larger sample of 2013 individuals in 27 populations.
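A minimal Python sketch of the one-level partition (populations only, as in the "9 populations" row of Table 2) is given below, following the sum-of-squared-deviations and expected-mean-square formulas of Excoffier et al. (1992). It assumes the pairwise SSO distances between individuals are used directly as the squared distances (delta^2) in those formulas, as the number of site differences is for restriction haplotypes; the function names are illustrative and are not those of the AMOVA program.

```python
from collections import defaultdict

def ssd(dist, members):
    """Sum of squared deviations: (1 / 2n) times the sum of delta^2 over all ordered pairs."""
    members = list(members)
    return sum(dist[i][j] for i in members for j in members) / (2 * len(members))

def amova_populations(dist, pops):
    """One-level AMOVA: percentage of variance among vs. within populations.

    dist: square matrix of squared distances (delta^2) between individuals.
    pops: population label of each individual, in the same order as dist.
    """
    members = defaultdict(list)
    for idx, p in enumerate(pops):
        members[p].append(idx)
    N, P = len(pops), len(members)

    ssd_within = sum(ssd(dist, m) for m in members.values())
    ssd_among = ssd(dist, range(N)) - ssd_within

    # Mean squared deviations and the sample-size coefficient n
    msd_among = ssd_among / (P - 1)
    msd_within = ssd_within / (N - P)
    n = (N - sum(len(m) ** 2 for m in members.values()) / N) / (P - 1)

    # E[MSD_within] = sigma_w; E[MSD_among] = n * sigma_a + sigma_w
    sigma_within = msd_within
    sigma_among = (msd_among - sigma_within) / n   # may be slightly negative by chance
    total = sigma_among + sigma_within
    return {"among": 100 * sigma_among / total, "within": 100 * sigma_within / total}
```

For example, two populations of two individuals each, with identical types inside each population and a distance of 2 between them, place all of the variance among populations.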

A genetic distance between each pair of SSO types was calculated for use in the analysis of molecular variance. For each variant region, the number of nucleotide differences between the two SSO types was counted, and these counts were summed over all eight variant regions. This procedure is similar to counting the number of site differences between restriction haplotypes, with adjustments for blanks and for probes that detect two polymorphic sites instead of only one. For example, two SSO types that carry variants IA2 and IA3 differ by two for that region, since the probes for these variants differ from each other at two nucleotide positions (Figure 2). Although a number of possible substitution patterns could give a blank result in any region, DNA sequences of these areas most often indicate that a single substitution accounts for the blank; any comparison between a blank and another variant is therefore given a default distance of one.
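The distance rule can be sketched in Python as follows. Types are tuples of per-region codes in region order (index 0 = IA, last index = IID), with 0 as the blank. Only the IA2/IA3 two-difference pair comes from the text; the TWO_SITE_PAIRS table is otherwise a hypothetical placeholder that would have to be filled in from the probe sequences (Figure 2), and scoring every other pair of distinct probes as one difference is an assumption of this sketch.

```python
from itertools import combinations

BLANK = 0  # blank result (no probe annealed), treated as a single variant

# Hypothetical table of probe pairs that differ at two nucleotide positions, keyed by
# region index (0 = IA). Only the IA2/IA3 entry comes from the text; the remaining
# entries would be read off the probe sequences.
TWO_SITE_PAIRS = {0: {frozenset({2, 3})}}

def region_distance(region, a, b):
    """Nucleotide differences between two variant codes within one region."""
    if a == b:              # same variant, including blank vs. blank
        return 0
    if BLANK in (a, b):     # a blank vs. any other variant counts as one difference
        return 1
    if frozenset({a, b}) in TWO_SITE_PAIRS.get(region, set()):
        return 2
    return 1                # assumption: other distinct probes differ at one site

def sso_distance(x, y):
    """Sum of per-region differences between two composite SSO types."""
    if len(x) != len(y):
        raise ValueError("SSO types must cover the same regions")
    return sum(region_distance(r, a, b) for r, (a, b) in enumerate(zip(x, y)))

def distance_matrix(types):
    """Symmetric matrix of pairwise SSO distances, the kind of input AMOVA takes."""
    n = len(types)
    m = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = sso_distance(types[i], types[j])
    return m

# IA2 vs. IA3 (two differences) plus a blank in region IID (one difference)
print(sso_distance((2, 1, 1, 1, 1, 1, 1, 1), (3, 1, 1, 1, 1, 1, 1, 0)))  # 3
```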

To determine which pairs of populations were different or similar with respect to SSO types, pairwise genetic distances (analogous to a coefficient of coancestry) were calculated by AMOVA. Permutation procedures in AMOVA were used to test the significance of the distances and variance components. Null distributions were generated by reallocating every individual to a randomly chosen population while holding sample sizes constant, over a large number of permutations (N=1000). The reported probabilities are those of observing, under random allocation, genetic distances and variance components greater than the observed values. This method of significance testing is useful because it does not assume that the underlying variance components are normally distributed (Excoffier et al. 1992).
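The permutation scheme can be sketched in Python for a single pairwise comparison: the individuals of two populations are pooled, their population labels are shuffled (which keeps sample sizes fixed), and the statistic is recomputed for each shuffle. The sketch assumes that the pairwise distance reported by AMOVA is Phi_ST, the among-population share of a one-level AMOVA; the function names, the fixed seed, and counting ties as "at least as large" are choices of this sketch rather than details taken from the text.

```python
import random

def phi_st(dist, labels):
    """Among-population share of variance (Phi_ST) from a one-level AMOVA."""
    pops = {}
    for i, lab in enumerate(labels):
        pops.setdefault(lab, []).append(i)

    def ssd(m):
        return sum(dist[i][j] for i in m for j in m) / (2 * len(m))

    N, P = len(labels), len(pops)
    ssd_within = sum(ssd(m) for m in pops.values())
    ssd_among = ssd(range(N)) - ssd_within
    n = (N - sum(len(m) ** 2 for m in pops.values()) / N) / (P - 1)
    sigma_within = ssd_within / (N - P)
    sigma_among = (ssd_among / (P - 1) - sigma_within) / n
    return sigma_among / (sigma_among + sigma_within)

def permutation_test(dist, labels, n_perm=1000, seed=1):
    """Observed Phi_ST and the fraction of label permutations giving a value >= observed."""
    rng = random.Random(seed)
    observed = phi_st(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reallocates individuals; population sizes stay fixed
        if phi_st(dist, shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm
```

For a pairwise test, `dist` and `labels` would cover only the individuals of the two populations being compared.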

For this analysis, a hierarchy was stipulated in which the nine populations were clustered into regional groups (Africa, Asia, and Europe). Frequencies of SSO types for each population were used as input for AMOVA along with a matrix of SSO type pairwise genetic distances. AMOVA then determined what portion of the total variance was attributable to variation among regions, among populations/within regions, or within populations.
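The three-level hierarchy can be sketched in Python using the degrees of freedom, sample-size coefficients (n, n', n''), and expected mean squared deviations given by Excoffier et al. (1992), where sigma_a, sigma_b, and sigma_c are the among-region, among-population/within-region, and within-population components. As in a one-level analysis, the SSO distances are assumed to serve directly as squared distances, and the function and argument names are hypothetical.

```python
from collections import defaultdict

def ssd(dist, members):
    """(1 / 2n) times the sum of squared distances over all ordered pairs in members."""
    members = list(members)
    return sum(dist[i][j] for i in members for j in members) / (2 * len(members))

def nested_amova(dist, pops, region_of):
    """Percent of variance among regions, among populations within regions, within populations.

    dist: square matrix of squared distances (delta^2) between individuals.
    pops: population label of each individual; region_of: population -> region label.
    """
    by_pop, by_region = defaultdict(list), defaultdict(list)
    for i, p in enumerate(pops):
        by_pop[p].append(i)
        by_region[region_of[p]].append(i)
    N, P, G = len(pops), len(by_pop), len(by_region)

    ssd_total = ssd(dist, range(N))
    ssd_within_regions = sum(ssd(dist, m) for m in by_region.values())
    ssd_within_pops = sum(ssd(dist, m) for m in by_pop.values())
    msd_ag = (ssd_total - ssd_within_regions) / (G - 1)
    msd_apwg = (ssd_within_regions - ssd_within_pops) / (P - G)
    msd_wp = ssd_within_pops / (N - P)

    # Sample-size coefficients n, n' (n1) and n'' (n2)
    s_pr = sum(len(by_pop[p]) ** 2 / len(by_region[region_of[p]]) for p in by_pop)
    s_p = sum(len(m) ** 2 for m in by_pop.values()) / N
    s_r = sum(len(m) ** 2 for m in by_region.values()) / N
    n = (N - s_pr) / (P - G)
    n1 = (s_pr - s_p) / (G - 1)
    n2 = (N - s_r) / (G - 1)

    # Solve E[MSD_wp] = sigma_c, E[MSD_apwg] = n sigma_b + sigma_c,
    # E[MSD_ag] = n2 sigma_a + n1 sigma_b + sigma_c
    sigma_c = msd_wp
    sigma_b = (msd_apwg - sigma_c) / n
    sigma_a = (msd_ag - sigma_c - n1 * sigma_b) / n2
    total = sigma_a + sigma_b + sigma_c
    return {
        "among_regions": 100 * sigma_a / total,
        "among_pops_within_regions": 100 * sigma_b / total,
        "within_pops": 100 * sigma_c / total,
    }
```

Applied to the nine populations grouped into Africa, Asia, and Europe, a function of this kind would produce the three percentages of the "Nested analysis" row of Table 2.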

RESULTS AND DISCUSSION

Pairwise interpopulation distances (analogous to a coancestry coefficient) were generated for the subpopulations within each regional group of Africans, Asians, and Europeans. Within Africa, the Senegalese and Nigerian populations were not significantly different from each other (p=0.059), while the Kenyan Mukogodo population was significantly different from both the Senegalese (p<0.001) and the Nigerians (p=0.001).

In Asia, Chinese and Malays were not significantly different (p=0.511), although Pakistanis were significantly different from both (p<0.001). These results are consistent with AMOVA analyses on a larger Asian data set (Melton and Stoneking, forthcoming) which show significant heterogeneity between western Asian and eastern Asian SSO types, but homogeneity between geographically proximal Asian populations.

European populations, defined to include two North American groups, were extremely homogeneous with respect to SSO types. Northeast U.S. and Midwest U.S. SSO types were virtually indistinguishable (p=0.986), even though the two samples were collected by different organizations using unknown sampling strategies. Neither the Midwest U.S. nor the Northeast U.S. SSO types differed significantly from those of the French (p=0.521 and p=0.441, respectively).

All other pairwise population comparisons showed significant differences (p<0.05). Pairwise population genetic distances were used to construct a neighbor-joining tree (Saitou and Nei 1987) illustrating the relationships among the populations described above (Figure 3). Populations cluster in the tree in concordance with the major geographic subdivisions; the genetic distance between the Pakistani and European populations suggests a degree of relatedness with respect to SSO types.
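For readers unfamiliar with neighbor joining, a compact Python version of the Saitou and Nei (1987) algorithm is sketched below; it is not the program used to draw Figure 3. It repeatedly joins the pair of clusters that minimizes the Q criterion, computes branch lengths to the new node, and returns a Newick string. The input would be a matrix of pairwise population distances; the usage example is a small five-taxon additive matrix rather than the study's data, and ties in Q are broken by iteration order, an arbitrary choice of this sketch.

```python
def neighbor_joining(names, d):
    """Neighbor-joining tree (Saitou and Nei 1987) returned as a Newick string.

    names: labels of the taxa (here, populations); d: symmetric distance matrix.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    # D[x][y]: current distance between clusters x and y, keyed by their Newick text
    D = {x: {y: float(d[i][j]) for j, y in enumerate(names)} for i, x in enumerate(names)}
    while len(D) > 3:
        r = len(D)
        totals = {x: sum(D[x].values()) for x in D}
        pairs = [(x, y) for x in D for y in D if x < y]
        # Join the pair minimising Q(x, y) = (r - 2) d(x, y) - total(x) - total(y)
        a, b = min(pairs, key=lambda p: (r - 2) * D[p[0]][p[1]] - totals[p[0]] - totals[p[1]])
        la = D[a][b] / 2 + (totals[a] - totals[b]) / (2 * (r - 2))
        lb = D[a][b] - la
        u = f"({a}:{la:.4f},{b}:{lb:.4f})"
        # Distance from the new node u to every remaining cluster
        new = {c: (D[a][c] + D[b][c] - D[a][b]) / 2 for c in D if c not in (a, b)}
        del D[a], D[b]
        for c in D:
            del D[c][a], D[c][b]
            D[c][u] = new[c]
        D[u] = {**new, u: 0.0}
    # Join the last three clusters at a single internal node
    a, b, c = D
    la = (D[a][b] + D[a][c] - D[b][c]) / 2
    return f"({a}:{la:.4f},{b}:{D[a][b] - la:.4f},{c}:{D[a][c] - la:.4f});"

names = ["a", "b", "c", "d", "e"]
d = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7], [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
print(neighbor_joining(names, d))
# (c:4.0000,(a:2.0000,b:3.0000):3.0000,(d:2.0000,e:1.0000):2.0000);
```

Because the example matrix is additive, the recovered branch lengths reproduce the input distances exactly; with real population distances, small or even negative branch lengths can occur.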

AMOVA was used to evaluate the nine populations in three ways (Table 2). An analysis among all nine populations yielded variance components indicating that 82% of the variation is attributable to SSO type variability within populations, while 18% is attributable to differences among populations. When the nine populations were pooled into three continental groups (Africa, Asia, Europe), the among-groups variance component was 20%, with 80% of the variation attributable to variability within groups. A nested analysis that also included the three subpopulations within each continental group showed that variance among continents was 18.56%, among populations within continents 3.89%, and within populations 77.55%. All variance components were significant (p<0.001), demonstrating the existence of heterogeneity. Therefore, while most of the variation is due to the large amount of variability within populations themselves, a moderate amount is due to the continental groupings, and within those large regions population differences account for only a small amount of the variance. In other words, each population contains about 78% of the average total variation, while the average differences between populations are largely due to the African-Asian-European division. Figure 4 shows the null distribution generated for the among-populations/within-groups component (3.89%). To test significance, the SSO types of the 814 individuals were randomly reassigned to nine populations with sizes identical to those in the nested analysis, and the variance components were recalculated for 1000 different random assignments. The probability of observing the value obtained in the nested analysis (0.085) was 0.001, indicating statistically significant heterogeneity among subpopulations within the major geographic groups.
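As a worked illustration derived from Table 2 (not a result reported by the authors), the nested percentages can be converted into the Φ-statistics defined by Excoffier et al. (1992). Writing σ²_a, σ²_b, and σ²_c for the among-continent, among-population/within-continent, and within-population components and σ²_T for their sum, the percentages are each component divided by σ²_T, so the ratios follow directly:

```latex
\Phi_{CT} = \frac{\sigma^2_a}{\sigma^2_T} = 0.1856, \qquad
\Phi_{SC} = \frac{\sigma^2_b}{\sigma^2_b + \sigma^2_c} = \frac{3.89}{3.89 + 77.55} \approx 0.0478, \qquad
\Phi_{ST} = \frac{\sigma^2_a + \sigma^2_b}{\sigma^2_T} = 0.2245

1 - \Phi_{ST} = (1 - \Phi_{SC})(1 - \Phi_{CT}): \qquad 0.7755 \approx 0.9522 \times 0.8144
```

The last line is a consistency check on the three components: the within-population share (77.55%) equals the product of the complements of the two among-unit statistics.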

In a review of classical loci, Lewontin (1972) found that most genetic diversity (85.4%) occurs among individuals within populations. Variation among subpopulations within major ethnic (racial) groups accounted for 8.3% of the total, while variation among major ethnic groups accounted for 6.3%. For the mtDNA SSO types in this study, the portion of variation due to major geographic grouping is nearly three times as large (18.56%), while the variance due to differences among subpopulations within these major groups is less than 4% of the total. These results are broadly consistent with other investigators' studies of nuclear loci, including VNTRs (Devlin et al. 1993).

Adapting the AMOVA software to handle more than 255 haplotypes will allow analysis of all 2013 individuals typed from 27 populations (>450 SSO types). However, including more types and populations is not expected to change the conclusions of this study substantially. Although subpopulation heterogeneity is observed within continental samples, the portion of variation it accounts for is expected to be small compared with the variation that distinguishes continents. Thus far, AMOVA has proved a useful tool for studying the population genetics of mtDNA SSO types.

ACKNOWLEDGMENTS

We gratefully acknowledge the contribution of samples from China, Malaysia, and Pakistan by N. Saha (National University Hospital, Singapore). Samples from France and the midwestern U.S. were provided by Mark Batzer (Human Genome Center, Lawrence Livermore National Laboratory), and samples from the northeastern U.S. by the Pennsylvania State Crime Lab. Lee Cronk (Texas A&M University) provided the Kenyan Mukogodo samples. This project was supported by NIJ grant 92-IJ-CX-K040 to Mark Stoneking.

REFERENCES

Cann R.L., Stoneking M. and Wilson A.C. (1987) Mitochondrial DNA and human evolution. Nature 325: 31-36.

Cockerham C.C. (1969) Variance of gene frequencies. Evolution 23:72-84.

Cockerham C.C. (1973) Analyses of gene frequencies. Genetics 74:679-700.

Connor A. and Stoneking M. (1994) Assessing ethnicity from human mitochondrial DNA types determined by hybridization with sequence-specific oligonucleotides. J. Forensic Sci. 39:1360-1371.

Devlin B., Risch N. and Roeder K. (1993) Statistical evaluation of DNA fingerprinting: a critique of the NRC's report. Science 259:748-749,837.

DiRienzo A. and Wilson A.C. (1991) Branching pattern in the evolutionary tree for human mitochondrial DNA. Proc. Natl. Acad. Sci. U.S.A. 88:1597-1601.

Excoffier L., Smouse P.E. and Quattro J.M. (1992) Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 131:479-491.

Ginther C., Issel-Tarver L. and King M.-C. (1992) Identifying individuals by sequencing mitochondrial DNA from teeth. Nature Genetics 2:135-138.

Graven L., Passarino G., Semino O., Boursot P., Santachiara-Benerecetti A.S., Langaney A. and Excoffier L. (1995) Evolutionary correlation between control region sequence and restriction polymorphisms in the mitochondrial genome of a large Senegalese Mandenka sample. Mol. Biol. Evol. 12:334-345.

Holland M.M., Fisher D.L., Mitchell L.G., Rodriquez W.C., Canik J.J., Merril C.R. and Weedn V. (1993) Mitochondrial DNA sequence analysis of human skeletal remains: identification of remains from the Vietnam War. J. Forensic Sci. 38:542-553.

Kocher T. and Wilson A.C. (1991) Sequence evolution of mitochondrial DNA in humans and chimpanzees: control region and a protein-coding region. In: Evolution of life: fossils, molecules, and culture. Osawa S. and Honjo T. (eds.). New York:Springer-Verlag, 391-413.

Lewontin R.C. (1972) The apportionment of human diversity. Evolutionary Biology 6:381-398.

Melton T., Peterson R., Redd A.J., Saha N., Sofro A.S.M., Martinson J. and Stoneking M. (1995) Polynesian genetic affinities with southeast Asian populations as identified by mtDNA analysis. Am. J. Hum. Genet. 57:404-414.

Melton T. and Stoneking M. Extent of heterogeneity in mitochondrial DNA for ethnic Asian populations. J. Forensic Sci. (forthcoming).

Saitou N. and Nei M. (1987) The neighbor-joining method: a new method for constructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.

Stoneking M., Hedgecock D., Higuchi R.G., Vigilant L. and Erlich H.A. (1991) Population variation of human mtDNA control region sequences detected by enzymatic amplification and sequence-specific oligonucleotide probes. Am. J. Hum. Genet. 48:370-382.

Stoneking M., Jorde L.B., Bhatia K. and Wilson A.C. (1990) Geographic variation in human mitochondrial DNA from Papua New Guinea. Genetics 124:717-733.

Stoneking M., Melton T., Nott J., Barritt S., Roby R., Holland M., Weedn V., Gill P., Kimpton C., Aliston-Greiner R. and Sullivan K. (1995) Establishing the identity of Anna Anderson Manahan. Nature Genetics 9:9-10.

Stoneking M. and Melton T. Forensic applications of mitochondrial DNA analysis. In: Forensic Applications of PCR. Budowle B. (ed.). (forthcoming).

Vigilant L. (1990) Control region sequences from African populations and the evolution of human mitochondrial DNA. PhD dissertation, University of California, Berkeley.

Vigilant L., Pennington R., Harpending H., Kocher T.D. and Wilson A.C. (1989) Mitochondrial DNA sequences in single hairs from a southern African population. Proc. Natl. Acad. Sci. U.S.A. 86:9350-9354.

Ward R.H., Frazier B.L., Dew-Jager K. and Pääbo S. (1991) Extensive mitochondrial diversity within a single Amerindian tribe. Proc. Natl. Acad. Sci. U.S.A. 88:8720-8724.

Weir B.S. and Cockerham C.C. (1984) Estimating F-statistics for the analysis of population structure. Evolution 38:1358-1370.

Wilson M.R., Stoneking M., Holland M.M., DiZinno J.A. and Budowle B. (1993) Guidelines for the use of mitochondrial DNA sequencing in forensic science. Crime Lab. Dig. 20:68-77.


Table 1. Populations and sample sizes in this study.

POPULATION                     N
African                      157
   Kenya                      28
   Nigeria                    13
   Senegal                   116
Asian                        257
   China                     103
   Malay                      81
   Pakistan                   73
European                     400
   France                     81
   Northeast United States   129
   Midwest United States     190
TOTAL                        814

Table 2. AMOVA analysis of SSO type variation for 814 individuals from 9 worldwide populations.

                  Variance   Variance   Variance among        Variance      Variance
Analysis          among      within     populations/          among         within
                  groups     groups     within groups         populations   populations
9 populations     -          -          -                     18.32%        81.68%
3 groups *        20.16%     79.84%     -                     -             -
Nested analysis   18.56%     -          3.89%                 -             77.55%

* Africans, Asians and Europeans.


Figure 1. Locations and overall sample sizes of SSO-typed populations

Figure 2. Example showing how a genetic distance is calculated between two SSO variants in region IA.

The box indicates the sequence region which is hybridized by the probes. The number of nucleotide differences is indicated on the right for each pairwise comparison. The difference between any "blank" type and any other type is always given a value of 1.

Figure 3. Neighbor-joining tree of nine SSO-typed populations.

Figure 4. Null distribution of the variance component (among populations/within groups).

