Next-generation sequencing (NGS) is a powerful technology that can provide forensic DNA profiles compatible with current databases and deliver additional levels of genetic data that will open new doors to investigations, mixture interpretation, missing persons cases, and more. The earliest published reports on the use of NGS for analysis of forensic markers appeared as recently as 2011–12. Since then, more than 60 peer-reviewed studies using NGS laboratory methods and/or custom software tools have been reported for forensic examination of short tandem repeats (STRs), mitochondrial DNA and single nucleotide polymorphisms (SNPs). A full list can be viewed online at: www.genomicidlab.com/opendata
Promega Corporation developed an autosomal STR kit optimized for downstream sequencing with NGS, called PowerSeq® Auto System, that amplifies 23 loci, including the full CODIS extended STR panel. Developmental validation studies with PowerSeq® Auto System have been performed, demonstrating backwards compatibility with existing STR databases and extremely high sequence diversity in some CODIS core loci. These sequence data may be used to augment STR analysis in samples for which capillary electrophoresis methods are inconclusive (e.g., degradation and mixtures) or for which additional information could assist interpretation and reporting (e.g., familial searching and kinship). PowerSeq® Auto System has undergone further development to include Y-chromosome STRs and the control region of the mitochondrial genome. This multiplex forensic NGS kit is called PowerSeq® Auto/Mito/Y System and features small amplicons (129–303bp), high sensitivity (~100pg DNA) and data for three forensic panels (22 autosomal STRs, 23 Y-STRs and 10 amplicons covering the mitochondrial control region), plus amelogenin. Herein we report on the evaluation of PowerSeq® Auto/Mito/Y System for analysis of reference samples.
Materials and Methods
One-half nanogram of each single-source genomic DNA from Standard Reference Material SRM2391c (National Institute of Standards and Technology) and 2800M Control DNA (Promega Corporation) was amplified using PowerSeq® Auto/Mito/Y System, according to the manufacturer’s protocol. Five hundred nanograms of column-purified amplification product were used to construct Illumina sequencing libraries with KAPA Hyper Prep Kit (Kapa Biosystems) using barcoded adapters. Individual libraries were quantified using KAPA Library Quantification kit (Kapa Biosystems), pooled without normalization and diluted to 4nM. Pooled libraries were sequenced (300bp single-end) with Illumina MiSeq (NC State University Genomic Sciences Laboratory) using MiSeq Reagent kit V2 (Illumina). Raw data (FASTQ) were generated for each indexed sample and may be downloaded at: www.genomicidlab.com/opendata
FASTQ files were adapter- and quality-trimmed using the Trimmomatic v0.33 single-end module with the following arguments: phred33, SLIDINGWINDOW:4:15, MINLEN:40. Autosomal and Y-chromosome STR data were analyzed using a modified version of STRaitRazor 2.0 in the authors’ custom Forensic Cloud Environment called Altius (Amazon Web Services). Fragment allele data for autosomal STRs were used to calculate Random Match Probabilities (RMPs) with the Federal Bureau of Investigation extended-set allele frequencies according to the NRC II guidelines. Mitochondrial data were analyzed with CLCBio Genomics Workbench v8.0.2 (Qiagen) using published parameters. Variants relative to the revised Cambridge Reference Sequence (rCRS) within the control region were identified using the software’s variant caller and manually reviewed. Mitochondrial haplotype frequencies and haplogroup estimates were generated using EMPOP3 (n=26,127), and haplogroup estimates were confirmed using Phylotree Build 16.
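To illustrate what the trimming arguments listed above mean, the following is a minimal Python sketch of a sliding-window quality trim followed by a minimum-length filter. It is a simplified approximation for explanation only, not Trimmomatic's actual implementation (which differs in detail, for example in how it handles the bases at the cut point). The function names `phred33_scores`, `sliding_window_trim` and `passes_min_length` are hypothetical.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string that uses the Phred+33 offset."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_trim(sequence, quality_string, window=4, min_avg_quality=15):
    """Cut the read at the first window whose mean quality is below the threshold.

    Simplified analogue of SLIDINGWINDOW:4:15: scan from the 5' end and
    keep only the bases before the first failing window.
    """
    scores = phred33_scores(quality_string)
    keep = len(scores)
    for start in range(0, len(scores) - window + 1):
        window_scores = scores[start:start + window]
        if sum(window_scores) / window < min_avg_quality:
            keep = start
            break
    return sequence[:keep], quality_string[:keep]


def passes_min_length(sequence, min_length=40):
    """Analogue of MINLEN:40: discard reads shorter than min_length bases."""
    return len(sequence) >= min_length


if __name__ == "__main__":
    # Toy read: 44 high-quality bases ('I' = Q40) followed by 6 poor ones ('#' = Q2).
    seq = "A" * 50
    qual = "I" * 44 + "#" * 6
    trimmed_seq, trimmed_qual = sliding_window_trim(seq, qual)
    print(len(trimmed_seq), passes_min_length(trimmed_seq))
```

In this simplified version, the read is cut at the start of the first failing window, so a good base at the leading edge of that window can be lost along with the poor-quality tail.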
FASTQ data from two separate sequencing runs demonstrated that each sample provided high-quality sequence information (92.58 ± 0.1% of reads passed filter), and less than 1% of the sequence reads were removed due to quality filtering. Data sets of 250,000 reads were analyzed using Altius, and each data file was processed in less than 2 minutes. For each sample, 44–56% of the reads were identified as matching autosomal or Y-STRs, while the remainder of the reads aligned to the control region of the mitochondrial genome. Interlocus balance, heterozygous intralocus balance and sensitivity for STRs were similar to previously reported values. Thus, the PowerSeq® Auto/Mito/Y System reproducibly generated high-quality data for reference samples when using NGS workflows.
For the four reference samples evaluated, full autosomal STR profiles were produced for 22 loci (Table 1). The fragment sizes and sequences were concordant with capillary electrophoresis data as reported in the SRM2391c Certificate of Analysis (NIST), and fragment sizes were concordant with 2800M product literature (Promega). For sample SRM2391c Component A, the D2S441 marker was observed to be homozygous by fragment size but heterozygous by sequence (Tables 1 and 2). Within this cohort, four autosomal STR loci (D2S441, D8S1179, D12S391 and vWA) showed alleles of the same length (fragment size) that could be distinguished as separate sequence alleles (Table 2).
Table 1. Summary of PowerSeq® Auto/Mito/Y System Results for Four Reference Samples
Table 2. Shared Fragment Alleles with Sequence Differences.
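The distinction between length-based and sequence-based alleles described above can be sketched in a few lines of Python. This is an illustrative example under stated assumptions: the repeat sequences are toy data (not the alleles observed in this study), the allele designation simply counts complete 4-base repeat units and ignores flanking regions and partial repeats, and the function names `length_allele`, `zygosity` and `shared_length_alleles` are hypothetical.

```python
from collections import defaultdict


def length_allele(sequence, repeat_unit_length=4):
    """Toy length-based designation: number of complete repeat units."""
    return len(sequence) // repeat_unit_length


def zygosity(allele_sequences):
    """Return (zygosity by length, zygosity by sequence) for one sample at one locus."""
    lengths = {length_allele(seq) for seq in allele_sequences}
    sequences = set(allele_sequences)
    by_length = "homozygous" if len(lengths) == 1 else "heterozygous"
    by_sequence = "homozygous" if len(sequences) == 1 else "heterozygous"
    return by_length, by_sequence


def shared_length_alleles(samples):
    """Find length alleles that correspond to more than one distinct sequence.

    samples maps a sample name to the list of allele sequences at one locus.
    """
    groups = defaultdict(set)
    for allele_sequences in samples.values():
        for seq in allele_sequences:
            groups[length_allele(seq)].add(seq)
    return {length: sorted(seqs) for length, seqs in groups.items() if len(seqs) > 1}


if __name__ == "__main__":
    # Hypothetical alleles: same length (10 repeats), one with an internal variant.
    allele_a = "TCTA" * 10
    allele_b = "TCTA" * 9 + "TCAA"
    allele_c = "TCTA" * 11
    print(zygosity([allele_a, allele_b]))
    print(shared_length_alleles({"S1": [allele_a, allele_b], "S2": [allele_a, allele_c]}))
```

A sample carrying the two 10-repeat toy alleles appears homozygous by size alone but heterozygous once the sequences are compared, which is the pattern reported for D2S441 in SRM2391c Component A.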
The RMP was calculated for all 22 autosomal STRs based on fragment size and ranged from 1E-29 to 1E-37, depending on population group. Sequence allele frequency databases are currently in development. Discriminatory power is expected to increase once the RMP can be calculated using sequence allele frequencies from appropriate databases. This increase in power may be especially valuable when analyzing samples with partial profiles or interpreting mixtures with likelihood ratios.
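As a worked illustration of how an RMP of this kind is assembled, the sketch below applies the commonly cited NRC II genotype frequency formulas (2pq for heterozygotes, p² + p(1 − p)θ for homozygotes) and multiplies across loci. It is an assumption that these are the exact formulas and θ value used in this study; the allele frequencies are invented for the example and are not taken from the FBI data set, and the function names are hypothetical.

```python
def genotype_frequency(p, q=None, theta=0.01):
    """Expected genotype frequency at one locus.

    Heterozygote (p and q given): 2pq.
    Homozygote (only p given): p^2 + p(1 - p) * theta.
    theta = 0.01 is an assumed value for this illustration.
    """
    if q is None:
        return p * p + p * (1 - p) * theta
    return 2 * p * q


def random_match_probability(profile, theta=0.01):
    """Multiply genotype frequencies across loci (product rule).

    profile maps a locus name to a tuple of allele frequencies:
    one value for a homozygote, two values for a heterozygote.
    """
    rmp = 1.0
    for locus, freqs in profile.items():
        if len(freqs) == 1:
            rmp *= genotype_frequency(freqs[0], theta=theta)
        elif len(freqs) == 2:
            rmp *= genotype_frequency(freqs[0], freqs[1], theta=theta)
        else:
            raise ValueError(f"{locus}: expected 1 or 2 allele frequencies")
    return rmp


if __name__ == "__main__":
    # Hypothetical frequencies for two loci (not real population data).
    by_length = {"LocusA": (0.10,), "LocusB": (0.10, 0.20)}
    # If the length allele at LocusA splits into two rarer sequence alleles,
    # the same sample is scored as a heterozygote with smaller frequencies.
    by_sequence = {"LocusA": (0.06, 0.04), "LocusB": (0.10, 0.20)}
    print(random_match_probability(by_length))
    print(random_match_probability(by_sequence))
```

With these toy numbers, splitting one length allele into two rarer sequence alleles lowers the genotype frequency at that locus, which is the mechanism by which sequence-based frequencies are expected to increase discriminatory power.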
For the male samples (SRM2391c Component B, SRM2391c Component C and 2800M Control DNA), full Y-STR profiles for 23 loci were produced for both size and sequence. The fragment size and sequence data were concordant with the certified haplotypes in the SRM2391c Certificate of Analysis (NIST). Thus, the PowerSeq® Auto/Mito/Y System may be used to generate full fragment and sequence profiles for autosomal and Y-STRs in casework applications such as familial searching.
To date, mitochondrial DNA data remain underutilized by most forensic laboratories. Here we show that the PowerSeq® Auto/Mito/Y System concurrently generates high-resolution, complete coverage of the mitochondrial control region (Figure 1). Four distinct mitochondrial haplotypes were observed in our reference samples (Table 1). The mitochondrial control region haplotypes were concordant with Sanger sequencing-based analysis of the samples (data not shown). Further, haplogroup estimations and population frequencies could be determined from the data, providing an additional level of information from the reference samples.
Figure 1. Coverage across the control region (16024–576) for sample SRM2391c Component A.
Forensic laboratories not currently generating mitochondrial DNA data may benefit from the additional information generated with the PowerSeq® Auto/Mito/Y System. The data gained might prove useful for samples with low-quantity or fragmented DNA, as well as for cases involving closed or small populations, extended kinship and missing persons identification.
The PowerSeq® Auto/Mito/Y System is a powerful new system that can be added to the forensic DNA analysis toolkit to help meet the needs of complex sample analysis. The ability to simultaneously analyze autosomal and Y-STRs along with mitochondrial data from one sample will likely add value to many forensic casework, databasing, and missing persons laboratories. However, for the potential of this technology to be fully realized, some aspects need to be addressed. First, forensic sequence databases must be developed to statistically interpret profiles. We are currently addressing this need through a sequence population database to be presented to the Criminal Justice community in late 2016 (NIJ Award 2015-DN-BX-K062). Second, laboratory methods need to be streamlined, optimized and validated in simple, low-cost workflows. Third, the software and analysis tools need to be fully developed and validated for the forensic laboratory. Lastly, NGS standards and guidelines need to be defined by governing bodies to allow implementation in accredited laboratory systems. Once these items are addressed, a system such as the PowerSeq® Auto/Mito/Y System can routinely be used by crime laboratories to maximize DNA data output from challenging forensic samples.
Portions of this work were funded by the Kenan Collaboratory Fund through the Kenan Institute for Ethics at Duke University and the NC State University Chancellor’s Faculty Excellence Program in Forensic Sciences.