Clarification of Additional Issues Regarding Statistics and Population Substructure Effects on Forensic DNA Profile Frequency Estimates

B. Budowle and K.L. Monson
Forensic Science Research Unit, Laboratory Division, FBI Academy, Quantico, VA 22135


Krane et al. (1992) published an article which purported, among other things, that the rarity of a person's DNA profile might be underestimated if a database to which he did not belong were used, even though the alternate database were within the same broad "racial/ethnic" population group. They specifically evaluated the frequency of Finnish profiles estimated using an Italian database, and vice versa. Budowle et al. (1994a) reanalyzed the Krane et al. (1992) data and demonstrated, among other things, that the observations by Krane et al. (1992) are attributable to sampling error.

A letter to the Editor of The American Journal of Human Genetics from S. Sawyer (Washington University), D. Krane (Wright State University), A. Podlecki (Washington University) and D. Hartl (Harvard University) was provided to us by the Editor on 2/28/95. In their letter, Sawyer et al. attempted to rebut the findings of Budowle et al. (1994a). Their assertions have no basis and were addressed by FBI scientists on 3/25/95 (i.e., the date the response was sent to the Journal). On 8/14/95 (approximately five months later), we were informed by the Editor that the Journal had yet to determine whether or not to publish the two letters. However, the Editor also conveyed that, since the Journal's receipt of the FBI's response to the Sawyer et al. letter, Sawyer and his colleagues had revised their letter twice and that now only the third and final version of their letter should be addressed. To date (almost an additional two months later), no copy of any of the revised letters has been provided to us. This seems an unusual practice for addressing and accepting letters to the Editor, but under usual circumstances one might be resigned to the editor's prerogative. However, the points raised by Sawyer et al. have been raised in the courtroom by experts (including some of the authors of the Sawyer et al. letter). Moreover, some authors (e.g. Balding and Nichols 1995) still rely on the Krane et al. (1992) study even though it has been demonstrated soundly that the findings by Krane et al. are due to statistical artifacts (Budowle et al. 1994a). Therefore, it is important to inform the forensic community of the issues raised by Sawyer and his colleagues (both in court and in their first letter, the only copy to which we have access) and of the responses that demonstrate that their assertions are erroneous. In addition, some issues not covered by the reanalysis of Krane et al.'s data by Budowle et al. (1994a) will be addressed.

For this paper the issues raised in the Sawyer et al. letter will be considered a personal communication (especially since they have been proffered by some in the courtroom; see People v. Pizarro, 10 Cal. App. 4th 50 (1992), letter from Hartl to Blaiser dated 10/10/94). We are confident that Sawyer and his colleagues would not hesitate to provide a copy of their letter and/or the Hartl letter to interested parties, since it represents their opinions. In addition, one should read the articles by Krane et al. (1992) and Budowle et al. (1994a) to familiarize oneself with the prior arguments regarding population substructure effects (based on Finns and Italians) on forensic DNA profile frequency estimates.

Sawyer et al. maintain that Krane et al.'s (1992) results were not statistical artifacts, contrary to the findings by Budowle et al. (1994a). In fact, Budowle et al. (1994a) demonstrated that the method by Krane et al. (1992) of leaving the target profile in their cognate database induces a large bias in the number of times the ratio of profile probabilities (cognate/noncognate) was greater than one (or greater than any other, arbitrarily larger, bound one wishes to choose). Sawyer et al.'s new arguments about statistical significance do not alter that fact.
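
The direction of this bias can be illustrated with a small simulation. The Python sketch below is not the analysis of Budowle et al. (1994a). It assumes, purely for illustration, equally frequent alleles, three loci, two databases of 70 people drawn from one and the same population, and a floor of one count for alleles absent from the noncognate database. The function name and all parameter values are hypothetical.

```python
import random
from collections import Counter


def fraction_ratio_above_one(n_people=70, n_alleles=20, n_loci=3,
                             n_reps=50, seed=1):
    """Fraction of cognate/noncognate profile ratios that exceed one when
    both databases are drawn from the SAME population."""
    rng = random.Random(seed)

    def draw_database():
        # Each person carries two alleles per locus, drawn uniformly
        # (illustrative assumption: all alleles equally frequent).
        return [[(rng.randrange(n_alleles), rng.randrange(n_alleles))
                 for _ in range(n_loci)]
                for _ in range(n_people)]

    def allele_counts(database):
        # One Counter of allele counts per locus.
        return [Counter(a for person in database for a in person[locus])
                for locus in range(n_loci)]

    above = total = 0
    for _ in range(n_reps):
        cognate, noncognate = draw_database(), draw_database()
        counts_c = allele_counts(cognate)
        counts_n = allele_counts(noncognate)
        for person in cognate:  # target profile is left in its own database
            ratio = 1.0
            for locus, genotype in enumerate(person):
                for allele in genotype:
                    # Equal database sizes, so count ratio = frequency ratio.
                    # The factor 2 for heterozygotes cancels in the ratio.
                    # Floor of 1 avoids division by zero (assumption).
                    ratio *= (counts_c[locus][allele]
                              / max(counts_n[locus][allele], 1))
            above += ratio > 1.0
            total += 1
    return above / total


# With no population difference at all, an unbiased method would give
# about 0.5; leaving the target in its own database pushes this higher.
print(fraction_ratio_above_one())
```

Because every allele of the target profile is guaranteed at least one count in its own database, the cognate estimate is inflated relative to the noncognate one even though the two databases describe the same population.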

It should be well-understood that no human geneticist would dispute that Finnish and Italian populations are genetically differentiated. The point at issue for forensic DNA analyses is not whether or not Finns and Italians are different, but the degree of differentiation between relevant databases. Sawyer et al. imply that Budowle et al. (1994a) claimed the degree of differentiation between subgroups is zero, while they suggest that the point of the Krane et al. (1992) paper was that the two populations are significantly different. Their characterizations do not accurately represent the major points of either Krane et al. (1992) or Budowle et al. (1994a).

In fact, Sawyer et al. assert now that the major point of Krane et al. (1992) was to demonstrate statistically significant differences between Finns and Italians. If that had been their point, and had they done the statistical analyses correctly, there would have been no reason to write our paper (Budowle et al. 1994a). Moreover, the points made in our paper apply equally well to Sawyer et al.'s new analyses.

Budowle et al. (1994a) made four points:

  • The degree of differentiation between Finnish and Italian populations (which has been found to be small for subpopulations when assessed by FST or other conventional population genetics measures, e.g. Morton 1992) cannot be ascertained by the analysis conducted by Krane et al. (1992) because of the demonstrable bias induced by their analyses.
  • Consequently, there is no scientific basis for Krane et al.'s (1992) assertion that their analyses substantiate the need for the use of the National Research Council's (1992) ceiling principle.
  • A comparison of Finns and Italians, estimated from quite small databases, does not represent most population substructure in the United States or describe forensic practices in the United States.
  • The fixed bin method, employed by most forensic laboratories in North America, is far more conservative than Krane et al.'s (1992) ±2.5% floating window.

Lewontin and Hartl (1991) posited that "the proper approach [for comparisons] is the straight forward one of sampling individual subgroups and examining the differences in the genotype frequencies among them" (emphasis added). Such studies have shown little difference in estimates (Budowle et al. 1994b, 1994c; Chakraborty and Kidd 1991; Devlin and Risch 1992b; Hartmann et al. 1994; Weir 1992). Although they cite Lewontin and Hartl (1991), Krane et al. (1992) and Sawyer, Hartl and colleagues tacitly recant this position and now assert that differences in allele frequencies will result in product rule estimates that on average will bias against a defendant. A simple example will illustrate the consequences of this subtle change in position. Krane et al. (1992) observed that the "most pronounced" differences between Finns and Italians at the D10S28 locus were four-fold differences for two DNA fragments. The allele frequency estimates for the 5.43 kb fragment were 0.134 and 0.034, respectively, and for the 3.94 kb fragment the frequencies were 0.037 and 0.137, respectively. While these allele frequencies might be significantly different between the two sample populations, the difference in the heterozygous genotype frequency for these two allelic fragments would be negligible: 9.9 × 10^-3 for Finns and 9.3 × 10^-3 for Italians. Genotype frequency estimates demonstrate the minimal effects of the use of different databases for forensic purposes (Chakraborty and Kidd 1991; Budowle et al. 1994b, 1994c; Lewontin and Hartl 1991; Weir 1992). Of course, there can be situations where the differences between the genotype frequencies may be greater, but when multiplying across several loci the effects are minimized (Weir 1992).
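
The heterozygote frequencies quoted above follow from the usual Hardy-Weinberg expression 2pq. A minimal Python check, using only the allele frequencies given in the text (the function and variable names are illustrative):

```python
def heterozygote_frequency(p, q):
    """Hardy-Weinberg frequency of a heterozygote whose two alleles
    have frequencies p and q."""
    return 2.0 * p * q


# D10S28 allele frequencies quoted from Krane et al. (1992):
# 5.43 kb fragment, then 3.94 kb fragment.
finns = heterozygote_frequency(0.134, 0.037)
italians = heterozygote_frequency(0.034, 0.137)

print(f"Finns: {finns:.4f}, Italians: {italians:.4f}, "
      f"ratio: {finns / italians:.2f}")
# prints: Finns: 0.0099, Italians: 0.0093, ratio: 1.06
```

Two four-fold differences in allele frequency, running in opposite directions, thus cancel almost completely in the genotype frequency.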

To support their contention that Finns and Italians are significantly different at several VNTR loci, Sawyer et al. again apply the methodology first suggested by Lewontin and Hartl (1991) and then used by Krane et al. (1992). They calculate, as long as one has faith in their analyses and in their databases, that the ratios of cognate/noncognate three-locus profile probabilities are significantly different, at least for some portions of the distribution. Their readers are invited further to believe that, because some of the ratios of profile probabilities are large, the populations are very different. Their argument is unconvincing.

Sawyer et al. fail to recognize that their procedures, combined with the small databases they chose to analyze, create a hopeless tangle of real population differences and methodological bias. The differences they report are due partially to real differences between the sample populations, but most are due undoubtedly to sampling error (see Devlin et al. 1994), and some are due to inherent problems in their databases (see discussion below). Within the scope of their methods the ratios can be interpreted such that there are not substantial differences between Finns and Italians. Consider these facts: for 1000 simulation experiments in which Budowle et al. (1994a) generated pseudopopulations representing "Finns and Italians" by sampling from a large U.S. Caucasian population database from Krane et al. (1992), 73.4% of the probability ratios were greater than one; from the single sample of Finns and Italians, Krane et al. (1992) found that 77.0% of the ratios were greater than one; while from 1000 simulation experiments, generated by Budowle et al. (1994a) using African American and U.S. Caucasian databases as proxies for Finns and Italians, 93.1% of the ratios were greater than one. Comparing the differences in ratios between each "subgroup" and the general groups, would this suggest that Finns and Italians are only about one-fifth, i.e., (77.0-73.4)/(93.1-73.4), as differentiated as African Americans and U.S. Caucasians? Sawyer et al. want to concentrate on ten-fold or greater ratios. In this situation, for the simulation experiments sampling from a Caucasian population, 12.8% of the probability ratios were greater than ten; from the single sample of Finns and Italians, 30.1% of the ratios were greater than ten; while for the simulation experiments using African American and Caucasian databases as proxies, 75.4% of the ratios were greater than ten. Perhaps one should interpret these findings as showing that Finns and Italians are only about one-fourth, i.e., (30.1-12.8)/(75.4-12.8), as differentiated as African Americans and Caucasians. Or should the difference between one-fifth and one-fourth be considered?
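
The two fractions quoted above are simple arithmetic on the reported percentages. The short Python sketch below reproduces them; the function name is illustrative only and no claim is made that this quantity is a valid measure of differentiation.

```python
def relative_excess(observed, same_population, different_groups):
    """Position of the observed percentage between the same-population
    baseline and the between-group proxy, as a fraction of that range."""
    return (observed - same_population) / (different_groups - same_population)


# Percentages of probability ratios greater than one.
above_one = relative_excess(77.0, 73.4, 93.1)
# Percentages of probability ratios greater than ten.
above_ten = relative_excess(30.1, 12.8, 75.4)

print(f"ratios > 1:  {above_one:.3f}")   # prints: ratios > 1:  0.183
print(f"ratios > 10: {above_ten:.3f}")   # prints: ratios > 10: 0.276
```

The answer depends on which arbitrary threshold is chosen, which is the point of the rhetorical question above.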

Such interpretations may be tempting to support the claim that Finns and Italians are not substantially different, particularly in light of demonstrations that African Americans and Caucasians show a high degree of similarity in terms of profile probabilities and FST-type statistics (Morton et al. 1993; Weir 1992; Weir 1995). Standard population genetic methods and the studies cited above show that Caucasian subpopulations are indeed substantially less differentiated than are "racial" populations, which themselves are substantially similar at VNTR loci. Some of these standard analyses explicitly account for sampling error, and most, if not all, of them have been based on larger databases. Consequently, such studies lend themselves to straightforward interpretations.

Moreover, proper analyses of the impact of population heterogeneity (Li 1969; Chakraborty and Kidd 1991; Devlin and Risch 1992; Morton 1992; Budowle et al. 1994b, 1994c; Devlin et al. 1993a; Chakraborty 1993) show that errors introduced into profile probability estimates are not large; these are all contrary to Sawyer et al.'s suggestion. Sawyer et al.'s assertions, like those of Krane et al. (1992) and Lewontin and Hartl (1991) before them, lack scientific basis.

Generally, forensic scientists use major population group databases and invoke independence within and between the loci (practices substantiated by Brookfield 1992, 1994; Budowle et al. 1994b, 1994c; Chakraborty and Kidd 1991; Chakraborty et al. 1994; Devlin and Risch 1992a, 1992b; Li and Chakravarti 1994; Morton 1992; Morton et al. 1993; Risch and Devlin 1992; Roeder 1994; Weir 1992). Therefore, it is misleading to suggest, as Krane et al. (1992) did and Sawyer et al. do now, that the evaluation of profile frequency ratios between Finns and Italians accurately reflects either population substructure or the consequences for forensic practices in the United States. The ratio of genotype frequencies between Finns and Italians is not a measure of the potential error that might occur in the general forensic case. Major population databases are used for the majority of cases in the United States, because usually there is no reason to suppose that a single subpopulation is the only possible source of the evidentiary material. The proper comparison under the unlikely single subpopulation scenario would be between the subpopulation and the database used by the forensic scientist. These ratios would be less, and the profile probabilities rarely would be substantially different (see Budowle et al. 1994b, 1994c; Chakraborty and Kidd 1991; Roeder 1994; Roeder et al. 1995; Weir 1992).

Sawyer et al. also charge that the findings by Krane et al. (1992) of multiple three-probe matches were not discussed by Budowle et al. (1994a). Although Budowle et al. (1994a) did not discuss the issue because it was not relevant to the paper, the opportunity to discuss the implications of the matches now is welcomed. Krane et al. (1992) found two pairs of matching three-locus profiles in their Italian database consisting of 70 three-locus profiles. That the number of three-locus matches exceeds what would be expected in a similar size sample consisting entirely of randomly chosen sib pairs should give one pause to question Krane et al.'s (1992) sampling.

Further, a misleading portion of the Krane et al. (1992) study addressed in the 1994 Hartl to Blaiser letter is the comparison of their observed three-locus matching rate with the observed rate in FBI databases. In Krane et al.'s (1992) large U.S. Caucasian database (containing 1349 complete three-locus profiles), there were two matches in 909,226 pairwise comparisons. This observation agrees well with the expected number of three-locus matches, 1.4 (the expected value was calculated from the observed single-locus matching rates; Devlin, personal communication). Interestingly, Krane et al. (1992) never reported the expected values in their paper. Instead, they compared their observed three-locus matching rate with a smaller three-locus matching rate for FBI databases (1 in 7.6 × 10^6 pairwise comparisons) and implied that there was a problem with the FBI databases. Obviously, their discrepancy disappears when their own observed and expected matching rates are compared, and the observed and expected matching rates show good agreement within each database. The different rates between databases should not be unexpected and apparently are due to the different VNTR loci being compared between databases and the different restriction enzymes used to digest the DNA.
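
The number of pairwise comparisons and the agreement between observed and expected matches can be checked directly. In the Python sketch below, the Poisson model for the number of matches is an assumption introduced here for illustration; it is not stated in the sources discussed above.

```python
import math

n_profiles = 1349
n_pairs = math.comb(n_profiles, 2)   # distinct pairs of profiles
print(n_pairs)                       # prints: 909226

observed_matches = 2
expected_matches = 1.4               # value quoted in the text

# Assumed model: the number of matches is Poisson with mean 1.4.
# Probability of seeing at least the observed number of matches.
p_at_least_observed = 1.0 - sum(
    math.exp(-expected_matches) * expected_matches ** k / math.factorial(k)
    for k in range(observed_matches)
)
print(f"{p_at_least_observed:.2f}")  # prints: 0.41
```

Under that assumed model, two or more matches would be seen in roughly four of every ten such databases, so an observation of two is unremarkable against an expectation of 1.4.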

Besides the shortcomings described above and in Budowle et al. (1994a), there is evidence that the Krane et al. (1992) data contained transcription errors, which resulted in an erroneous interpretation of substantial population heterogeneity (see Devlin et al. 1993b; Roeder 1994), a point that Sawyer et al. neglect to convey. Perhaps little should be gleaned from the Krane et al. (1992) data.

Sawyer et al. also object to a statement by Budowle et al. (1994a) that there would be unusually large correlation of alleles in their small VNTR databases. Previously, Lewontin and Hartl (1991) stated that "statistical tests for HWE [Hardy-Weinberg expectations] are virtually useless as indicators of population substructure." Presumably, they believed that many correlations could not be detected by HWE tests. Apparently, Sawyer, Hartl and colleagues now disclaim that position; they employ HWE tests to assert that there are not large correlations. If they truly believe HWE tests are sufficiently powerful for their small databases, then one would expect that they believe the use of HWE tests is valid for analyses of larger databases, given that power is a function of sample size. For succinctness, the point will be expressed in terms of covariances; correlations follow directly. Assuming HWE, the expected covariance between alleles i and j in a database of size n alleles is -p_i p_j/n, where p_i denotes the probability of allele i. Because of unobserved (rare) alleles, the observed allele frequencies are known to be upwardly biased in small databases consisting of highly polymorphic loci, so the covariances among observed alleles (-p_i p_j/n, computed from the observed frequencies) also will be inflated over their expected values. Naturally, these biases become larger as the sample size diminishes. Also, the significance of one pairwise linkage equilibrium test was not deemed particularly noteworthy by Sawyer et al. Therefore, Sawyer et al. now appear to endorse the view that observing no more departures than would be expected is acceptable for independence tests between loci.
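
The covariance expression above, and the correlation that follows from it, can be written out as a short Python sketch. It assumes simple multinomial sampling of n alleles; the function names and the two database sizes are hypothetical, and the allele frequencies are the Finnish D10S28 values quoted earlier, used only as example inputs.

```python
import math


def allele_covariance(p_i, p_j, n):
    """Covariance between the sample frequencies of two distinct alleles
    in a multinomial sample of n alleles."""
    return -p_i * p_j / n


def allele_correlation(p_i, p_j):
    """Correlation implied by the covariance above, using the multinomial
    variance p(1 - p)/n for each sample frequency."""
    return -math.sqrt(p_i * p_j / ((1.0 - p_i) * (1.0 - p_j)))


p_i, p_j = 0.134, 0.037
for n_alleles in (140, 2698):        # hypothetical small and large databases
    print(n_alleles, allele_covariance(p_i, p_j, n_alleles))
print(allele_correlation(p_i, p_j))
```

The magnitude of the covariance grows as the database shrinks, and any upward bias in the estimated frequencies enters it as a product.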

Another statistical claim put forth by Sawyer et al. is that "in the FBI's own 'worldwide survey' of binned (emphasis added) VNTR frequencies (1993), approximately 70% of the comparisons between Caucasian subpopulations are statistically significant," which they assessed by standard contingency table analysis. This argument, like their Finn/Italian experiment, is irrelevant. As discussed earlier, for forensic applications it is the degree of differentiation that is important. However, it is puzzling that Sawyer et al. appear unaware of the statistical literature. For even moderately large sample sizes, standard contingency table analysis, as employed by Sawyer et al., exhibits extreme sensitivity to small perturbations (i.e. it frequently rejects the null hypothesis of no difference even if the difference is of little consequence). This is an often-discussed topic in statistics. For recent work on this issue, readers may consult Rudas et al. (1994). Traditional population genetic approaches that describe the amount of heterogeneity among populations are much more informative than are significance tests. Budowle (1995) found, for loci less polymorphic than VNTR loci, that an FST estimate (over all loci) was approximately 0.003 for subgroups as different as French Basques and Israelis (FST estimates would be expected generally to be larger for VNTR loci). This estimate is similar to those described by Weir (1994) for U.S. geographic samples for VNTR loci. Additionally, Weir (1992) observed that the few differences in bin frequencies had little impact on final profile frequency estimates between subgroups.
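
The sensitivity of contingency table tests to sample size can be seen in a minimal Python sketch. The bin frequencies (0.10 versus 0.12) and sample sizes are invented for illustration and are not taken from any database discussed here; the statistic is the ordinary Pearson chi-square for a 2 × 2 table, evaluated at expected counts.

```python
def chi_square_2x2(p1, p2, n):
    """Pearson chi-square for two samples of n alleles each, comparing
    one bin (frequencies p1 and p2) against all other bins combined."""
    a, b = n * p1, n * (1.0 - p1)    # sample 1: in bin, not in bin
    c, d = n * p2, n * (1.0 - p2)    # sample 2: in bin, not in bin
    total = a + b + c + d
    return total * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d))


CRITICAL_5PCT_DF1 = 3.841            # chi-square, 1 d.f., 5% level

for n in (100, 1000, 10000):
    stat = chi_square_2x2(0.10, 0.12, n)
    print(n, round(stat, 3), stat > CRITICAL_5PCT_DF1)
# prints:
# 100 0.204 False
# 1000 2.043 False
# 10000 20.429 True
```

The same two-percentage-point difference moves from unremarkable to highly significant purely because the sample grows, which is why a significance test alone says little about the degree of differentiation.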

Sawyer et al. declare that they requested, in a letter dated October 26, 1994, more information on the bins used and referenced in the Budowle et al. (1994a) study. They protest that the lack of information on the dimensions of fixed bins hampered their ability to perform statistical analyses adequately. No letter was received. Usually, when someone requests information and does not receive an answer in a reasonable time, he/she follows up with a second request. However, this unfortunate communication failure did not preclude Sawyer et al. from performing their own analyses on bin data. Sawyer et al. are obviously familiar with the bin dimensions; they state in the last paragraph of their letter that they compared binned data using the worldwide survey of binned VNTR frequencies (Worldwide Study 1993). Each page of tabular bin frequency data contains the bin dimensions, which are the same on each page and are the same as described in Budowle et al. (1991). Finally, Hartl and Krane must be familiar with the fixed bin dimensions; both have testified in several court cases where fixed bin data have been used.

Sawyer et al. claim that "forensic significance" is a slang phrase. A forensically significant difference has been defined as one for which the estimates would be considered substantially different (Chakraborty and Kidd 1991). Interestingly, similar phraseology was deemed appropriate by Hartl and Lewontin (1993) when they employed the terminology "biologically significant" in lieu of "forensically significant". They suggest that it should be left "to the reader to judge" what is "biologically significant." Lay persons, i.e. jurors, can readily understand, from their own experiences, what is substantially different and what is not. Based on extant data, general databases are appropriate and would not yield undue bias. Hartl (1994) has queried, "guess who decides whether differences are meaningfully different?" It is obvious that responsibility belongs to the jury.

Finally, when Krane et al. (1992) compared the interim ceiling estimates (National Research Council 1992) with the product of allele frequencies (where allele frequencies were determined using a ±2.5% floating window), the ceiling approach provided a conservative estimate usually by a factor of ten or more. They concluded that "the interim ceiling principle has a sufficient margin of safety" for subpopulation concerns. As noted above, their subpopulation concerns have no scientific basis for forensic applications. Nevertheless, for heuristic purposes, Budowle et al. (1994a) compared standard fixed bin methods to the ±2.5% floating window approach advocated by Krane et al. (1992). In Table 5 of Budowle et al. (1994a) it was demonstrated that the fixed bin method, and particularly the rebinning approach, yielded the same level of conservativeness as their methods. Now, Sawyer et al. seem to be advocating that the degree of conservatism generated by their group, using methods they endorse, has a sufficient margin of safety; however, their concerns cannot be mollified (or ameliorated) when the same degree of conservatism is generated, but by another group, e.g. the FBI. There does not appear to be any justification, scientific or otherwise, for such a position.

This is publication number 96-01 of the Laboratory Division of the Federal Bureau of Investigation. Names of commercial manufacturers are provided for identification only, and inclusion does not imply endorsement by the Federal Bureau of Investigation.

REFERENCES

Balding D.J. and Nichols R.A. (1995) A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity. Genetica 96:3-12.

Brookfield J. (1992) The effect of population subdivision on estimates of the likelihood ratio in criminal cases using single-locus DNA probes. Heredity 69:97-100.

Brookfield J. (1994) The effect of relatives on the likelihood ratio associated with DNA profile evidence in criminal cases. J. Forensic Sci. Soc. 34:193-197.

Budowle B., Giusti A.M., Waye J.S., Baechtel F.S., Fourney R.M., Adams D.E., Presley L.A., Deadman H.A. and Monson K.L. (1991) Fixed-bin analysis for statistical evaluation of continuous distributions of allelic data from VNTR loci, for use in forensic comparisons. Am. J. Hum. Genet. 48:841-855.

Budowle B., Monson K.L. and Giusti A.M. (1994a) A reassessment of frequency estimates of Pvu II-generated VNTR profiles in a Finnish, an Italian, and a general United States Caucasian database: No evidence for ethnic subgroups affecting forensic estimates. Am. J. Hum. Genet. 55:533-539.

Budowle B., Monson K.L., Giusti A.M. and Brown B.L. (1994b) The assessment of frequency estimates of Hae III-generated VNTR profiles in various reference databases. J. Forensic Sci. 39:319-352.

Budowle B., Monson K.L., Giusti A.M. and Brown B.L. (1994c) Evaluation of Hinf I-generated VNTR profile frequencies determined using various ethnic databases. J. Forensic Sci. 39:988-1008.

Budowle B. (1995) The effects of inbreeding on DNA profile frequency estimates using PCR-based loci. Genetica (forthcoming).

Chakraborty R. and Kidd K.K. (1991) The utility of DNA typing in forensic work. Science 254:1735-1739.

Chakraborty R. (1993) NRC Report on DNA typing. Science 260:1059-1060.

Chakraborty R., Zhong Y., Jin L. and Budowle B. (1994) Nondetectability of restriction fragments and independence of DNA fragment sizes within and between loci in RFLP typing of DNA. Am. J. Hum. Genet. 55:391-401.

Devlin B. and Risch N. (1992a) A note on Hardy-Weinberg equilibrium of VNTR data using the FBI's fixed bin method. Am. J. Hum. Genet. 51:549-553.

Devlin B. and Risch N. (1992b) Ethnic differentiation at VNTR loci, with special reference to forensic applications. Am. J. Hum. Genet. 51:534-548.

Devlin B., Risch N. and Roeder K. (1993a) NRC Report on DNA typing. Science 260:1057-1059.

Devlin B., Krontiris T. and Risch N. (1993b) Population genetics of the HRAS1 minisatellite locus. Am. J. Hum. Genet. 53:1298-1305.

Devlin B., Risch N. and Roeder K. (1994) Comments on statistical aspects on NRC's Report on DNA typing. J. Forensic Sci. 39:28-40.

Federal Bureau of Investigation (1993) VNTR Population Data: A Worldwide Study, Volumes I-IV.

Hartl D.L. (1994) Forensic DNA typing dispute-Letter. Nature 372:398-399.

Hartl D.L. and Lewontin R.C. (1993) Response to Devlin et al. Science 260:473-474.

Hartmann J., Keister R., Houlihan B., Thompson L., Baldwin R., Buse E., Driver B. and Kuo M. (1994) Diversity of ethnic and racial VNTR RFLP fixed-bin frequency distributions. Am. J. Hum. Genet. 55:1268-1278.

Krane D.E., Allen R.W., Sawyer S.A., Petrov D.A. and Hartl D.L. (1992) Genetic differences at four DNA typing loci in Finnish, Italian, and mixed Caucasian populations. Proc. Natl. Acad. Sci. U.S.A. 89:10583-10587.

Lewontin R.C. and Hartl D.L. (1991) Population genetics in forensic DNA typing. Science 254:1745-1750.

Li C.C. (1969) Population subdivision with respect to multiple alleles. Ann. Hum. Genet. 33:23-29.

Li C.C. and Chakravarti A. (1994) DNA profile similarity in a subdivided population. Human Heredity 44:100-109.

Morton N.E. (1992) Genetic structure of forensic populations. Proc. Natl. Acad. Sci. U.S.A. 89:2556-2560.

Morton N.E., Collins A. and Balazs I. (1993) Bioassay of kinship for hypervariable loci in Blacks and Caucasians. Proc. Natl. Acad. Sci. U.S.A. 90:1892-1896.

Risch N. and Devlin B. (1992) On the probability of matching DNA fingerprints. Science 255:717-720.

Roeder K. (1994) DNA fingerprinting: A review of the controversy. Statistical Science 9:222-278.

Rudas T., Clogg C.C. and Lindsay B.G. (1994) A new index of fit based on mixture methods for the analysis of contingency tables. J. Roy. Stat. Soc. (Series B) 56:623-639.

Weir B.S. (1992) Independence of VNTR alleles defined by fixed bins. Genetics 130:873-887.

Weir B.S. (1994) Effect of inbreeding on forensic calculations. Ann. Rev. Genet. 28:597-621.

