Beyond Genetic Genealogy: Building Family Trees to Investigate Crime
By Ken Doyle
“Class! Class!” Diahan Southard raises her voice and claps twice, cutting through the buzz of conversation that fills the room.
Dutifully, her class responds. “Yes! Yes!”
It’s a technique that Southard has learned from raising three children and from over 20 years of teaching. Her audience is split into small groups, gathered around tables in a ballroom at the Palm Springs Convention Center in California. Palm Springs is a trendy resort town in the Coachella Valley, in the middle of the Colorado Desert and surrounded by mountains. Frequented by Hollywood stars, the town seems at odds with the deadly seriousness of many of the sessions at the 30th International Symposium on Human Identification (ISHI 30), held in September 2019. Topics at the conference ranged from improving the response to mass fatalities to forensic taphonomy, the study of how human bodies decompose. Southard’s class, titled “Can You Solve Your Case Using Genetic Genealogy?”, closed out the final day of the conference.
Diahan Southard's genetic genealogy workshop was presented at the 30th International Symposium on Human Identification, held at the Palm Springs Convention Center, California, September 23–26, 2019. Photo credit: Tara Luther
Genetic Genealogy and Forensics
Genetic genealogy combines the traditional discipline of genealogy with modern DNA analysis. At a fundamental level, it reconstructs relationships among individuals, in the form of family trees, by analyzing their DNA to determine how much of the genome two people share. The technique itself isn’t new. What has gained recent attention is its use by law enforcement to solve crimes, known as investigative genetic genealogy or forensic genetic genealogy (FGG). That application was the focus of Southard’s class.
The case that put the spotlight squarely on FGG had remained unsolved for over forty years. A serial killer and rapist, known by several names but dubbed the Golden State Killer by the media, left a trail of victims across ten California counties from 1976 through 1986. Ultimately, FGG delivered the breakthrough that helped Paul Holes, the lead investigator, crack the case in April 2018.
However, the publicity generated by the Golden State Killer case had an unintended consequence. Together with the media attention surrounding another case that used FGG later that year, it prompted GEDmatch, the genetic genealogy research site that Holes and his team used, to change its terms of service regarding law enforcement use of its data. Previously, over a million DNA profiles in GEDmatch had been automatically opted in for law enforcement use. Under the new policy, all DNA profiles were opted out by default, and people uploading their profiles had to opt in explicitly to give law enforcement access to their data. As of October 1, 2019, only 163,000 of 1.3 million users had opted in, making the database considerably less useful for FGG. Even so, the number of opt-ins is growing, thanks to appeals from key figures in the genetic genealogy field.
The Origin Story
Southard’s fascination with genetic genealogy has its roots in high school. Her biology teacher managed to scrounge up used pipettes and leftover supplies from a local laboratory so that his students could gain hands-on technical experience. As a result, everyone in Southard’s biology class learned basic DNA cloning techniques.
“Having a background in science means that I know that science isn’t scary,” Southard says. She cites a fear of the underlying science as one of the biggest obstacles to those who are new to the field. In addition, she says, her undergraduate education taught her the importance of the scientific method. “You make a hypothesis and then generate data, without bias toward or against your hypothesis. Then you evaluate how your data fit into that hypothesis.”
Further, Southard credits her high school English teacher with setting her on the career path that eventually led her to build her own genetic genealogy education and consulting company, Your DNA Guide. “He told all of us graduating seniors,” she says, “that the best thing we could do when we got to college was find a professor who was researching something we were interested in and get involved.” For Southard, that “something” turned out to be the archaeogenetics laboratory of Dr. Scott Woodward at Brigham Young University. “It’s studying the genetics of mummies…and dead things,” she explains. Her initial project involved analyzing teeth and bone samples from bodies found in an ancient Egyptian cemetery outside Cairo. The analysis relied on mitochondrial DNA testing to identify maternal family relationships. Although the group obtained a large collection of mitochondrial DNA profiles, it had no reference data against which to build genealogical networks. The need for a treasure trove of DNA data from across the world soon became apparent.
In 1999, Woodward’s research led to the formation of the Sorenson Molecular Genealogy Foundation (SMGF), named after local philanthropist James Sorenson. “I still distinctly remember sitting in the basement of the Benson building (affectionately called the Fishbowl) on the BYU campus,” Southard recalls, “where Dr. Woodward explained how we could create a database of DNA and genealogy. At some point in the not-too-distant future, we would be able to tell where someone, anyone, came from, by just looking at their DNA.”
The SMGF began building the first database of its kind by collecting DNA samples from students at Brigham Young University and developing their family trees. Its reach soon grew to include samples from across the globe. While other college students were partying on the weekends, Southard says, she and her colleagues were traveling across the US and around the world, educating people about genetic genealogy and collecting blood samples from volunteers. “I would carry home a cooler of blood [samples] on the airplane,” Southard says, “and Monday morning, I was back in the lab.”
By 2012, the Sorenson Database contained over 100,000 DNA samples and familial pedigrees, encompassing 2.8 million genealogical records and 2.4 million genotypes. The public database contained both Y-chromosome data (for tracing paternal lineage) and mitochondrial DNA information (for tracing maternal lineage). Although the database also contained a repository of autosomal DNA information, those data were not made publicly available.
After the death of James Sorenson in 2008, enthusiasm for the project waned. In 2012, the genealogy company Ancestry.com, whose founders were also graduates of Brigham Young University, acquired the Sorenson Database. In 2015, the data drew negative media and public attention after their use by law enforcement produced a false lead in the investigation of the 1996 murder of a young woman named Angie Dodge. Ancestry.com then took down the Sorenson Database, saying it had been used by law enforcement in a manner that violated the principles on which the SMGF was established.
“When the database was taken down, it did feel like the end of an era,” Southard says. During the data collection, there had been an understanding that the information would always be free and available. Southard felt personally responsible for all of the samples she had collected and for all of the volunteers she had encouraged to participate in the project.
From Genealogist to Entrepreneur
At Southard’s ISHI 30 workshop, the participants take on the role of an FGG investigator trying to solve a cold case in which a woman was murdered over 30 years ago. Southard doles out information a little at a time, starting with an analysis of a DNA sample from the crime scene. The goal of the exercise is to identify people who may be related to the killer by building genetic networks around common ancestors. Just as in the Angie Dodge case, Southard leads the class through several twists and turns, identifying second and third cousins of the suspect. Although the process appears complex, it rests on a simple principle involving a unit of measure called the centimorgan (cM). Technically, a centimorgan is a measure of recombination frequency: two positions on a chromosome 1 cM apart have about a 1% chance of being separated by a crossover in a single generation. In practice, however, it is often treated as a length of DNA; in humans, 1 cM corresponds on average to about 1 million base pairs. As an article on Southard’s website explains, “Your total shared cM tells you how much DNA you share with another match. In general, the more DNA you share with a match, the higher the cM number will be and the more closely related you are.”
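The arithmetic behind a “total shared cM” figure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any testing company computes its reports: the segment list is made-up example data, the 7 cM minimum segment length is an assumed cutoff, and Haldane’s map function (a standard textbook model that assumes crossovers occur independently) stands in for the “recombination frequency” definition above. The function names are hypothetical.

```python
import math

# Rule of thumb from the text: in humans, 1 cM spans roughly 1 million base
# pairs on average. Real ratios vary along and between chromosomes.
BP_PER_CM = 1_000_000


def recombination_fraction(distance_cm: float) -> float:
    """Haldane's map function: r = (1 - exp(-2d)) / 2, with d in Morgans.

    For short distances r is close to d, so 1 cM is about a 1% chance of
    recombination per generation; for long distances r approaches 0.5.
    """
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def total_shared_cm(segments, min_segment_cm=7.0):
    """Sum the genetic lengths of shared segments.

    Each segment is a hypothetical (chromosome, start_cm, end_cm) tuple.
    Segments shorter than min_segment_cm (an assumed cutoff) are skipped,
    since very short matches are often unreliable.
    """
    total = 0.0
    for _chromosome, start_cm, end_cm in segments:
        length = end_cm - start_cm
        if length >= min_segment_cm:
            total += length
    return total


def approx_base_pairs(cm: float) -> int:
    """Convert cM to an approximate physical length using the rule of thumb."""
    return round(cm * BP_PER_CM)


if __name__ == "__main__":
    # Made-up segments shared between two people (illustration only).
    segments = [(1, 10.0, 52.5), (7, 100.0, 125.0), (12, 40.0, 44.0)]
    shared = total_shared_cm(segments)
    print(shared)                       # 67.5 (the 4 cM segment is dropped)
    print(approx_base_pairs(shared))    # 67500000
    print(f"{recombination_fraction(1.0):.4f}")  # 0.0099
```

Dropping short segments is a common design choice because tiny matches are more likely to be noise than true shared ancestry; the exact cutoff differs between services.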
Genetic genealogy services, such as those offered by 23andMe and AncestryDNA, provide a report that includes the total amount of DNA you share with other people in the database. A free online tool, DNA Painter, includes an option that uses data from the Shared centiMorgan Project to predict relationships. For example, entering 200 cM of shared DNA into the tool returns a range of possibilities: there is a 45% probability that the match is a half second cousin, a second cousin once removed, or even a half great-great aunt or uncle. At the top of the chart, a parent and child typically share 3,300–3,720 cM of DNA. At the other end, at around 20 cM, the possible relationships become much harder to trace, stretching to sixth or seventh cousins.
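The probabilities DNA Painter reports come from observed data, but the averages those observations cluster around can be estimated from pedigree theory. The Python sketch below is an illustration under stated assumptions, not DNA Painter’s method: it takes roughly 3,500 cM (near the middle of the parent/child range quoted above) as the length of one haploid set of autosomes, computes a kinship coefficient for each relationship, and multiplies four times that coefficient by the haploid length to estimate expected single-copy sharing. The `Relationship` class and its fields are hypothetical names chosen for this example.

```python
from dataclasses import dataclass

# Assumption: a parent and child share about one full haploid copy of the
# autosomes. 3,500 cM sits near the middle of the 3,300-3,720 cM parent/child
# range quoted in the text; it is illustrative, not a standard constant.
HAPLOID_MAP_CM = 3500.0


@dataclass(frozen=True)
class Relationship:
    name: str
    gens_up_a: int  # generations from person A up to the common ancestor(s)
    gens_up_b: int  # generations from person B up to the common ancestor(s)
    full: bool      # True: share an ancestral couple; False: one shared ancestor
                    # (a "half" relationship, or a direct line of descent)


def kinship(rel: Relationship) -> float:
    """Kinship coefficient for a non-inbred pedigree: each shared ancestor
    contributes (1/2) ** (gens_up_a + gens_up_b + 1)."""
    n_ancestors = 2 if rel.full else 1
    return n_ancestors * 0.5 ** (rel.gens_up_a + rel.gens_up_b + 1)


def expected_shared_cm(rel: Relationship) -> float:
    """Expected single-copy (IBD1) sharing: 4 * kinship * haploid length.

    Ignores regions shared on both copies (IBD2), so it does not suit full
    siblings, and it gives an average only: real totals vary widely.
    """
    return 4 * kinship(rel) * HAPLOID_MAP_CM


if __name__ == "__main__":
    candidates = [
        Relationship("parent/child", 0, 1, full=False),
        Relationship("first cousin", 2, 2, full=True),
        Relationship("second cousin", 3, 3, full=True),
        Relationship("half second cousin", 3, 3, full=False),
        Relationship("second cousin once removed", 3, 4, full=True),
        Relationship("half great-great aunt/uncle", 4, 1, full=False),
    ]
    for rel in candidates:
        print(f"{rel.name}: about {expected_shared_cm(rel):.0f} cM expected")
```

Under these assumptions, the three relationships DNA Painter lists for 200 cM have expected values between roughly 109 and 219 cM. Because recombination makes actual sharing scatter widely around such averages, the ranges for different relationships overlap, which is why tools built on observed data report probabilities rather than a single answer.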
In the genetic genealogy workshop, the search for the elusive killer involves building several genetic networks from shared cM data, which Southard has her class plot on traditional family tree diagrams. An FGG investigation almost never turns up a single match; more often than not, the process uncovers tens or even hundreds of possibilities. Narrowing down the list involves a lot of traditional investigative work: searching through obituaries, newspaper records, census information, or even old store receipts. It’s certainly not as glamorous as the process often depicted in television shows.
A Path Forward
Forensic investigators who are new to FGG may initially be daunted by the learning curve. The first piece of advice Southard offers is to get your own DNA analyzed and map out your family tree. “Hands down the best way to learn,” she says, “is to watch it work in your own family, with people you know. The more you can understand about your known relationships, the more you will be able to tackle the unknown.”
Southard notes that it’s important, particularly when using FGG in a criminal case, to respect the space and follow the rules. She says that the leads provided by genetic genealogy databases are often compared to a concerned citizen calling a tip line to report suspicious activity in the neighborhood. However, DNA information is far more personal and sensitive. Southard urges users of the technique to proceed cautiously. “While the guidelines around how to use these data are still evolving, honor what has been established. That’s the best way to ensure the longevity of this technology.”
Southard's classes and workshops provide genetic genealogy education to audiences around the world. Photo credit: Diahan Southard
Southard is optimistic about the future of FGG. She sees the technique becoming easier to use with the development of software tools that can automate some of the labor-intensive tasks involved in finding matches and building family trees. “23andMe recently released a tool that can reconstruct a tree for a group of individuals who are second cousins or closer,” she says. “This kind of tool can certainly speed up the work that we’re doing.”
In the end, Southard’s class exercise at ISHI 30 did not yield a definitive result. Addressing the class at the close of the session, Southard says she was torn over whether to end the investigative trail with a positive identification of the killer. The open ending, however, reflects the reality that FGG isn’t a magic wand, and some real-world cases that employ FGG remain unsolved.
That reality, however, shouldn’t stop an investigator from using FGG in a case that could benefit from the technique. Nor should the limited data set available to law enforcement be a deterrent. As Southard says, “You don’t need a million samples to solve your case. You only need a few. And perhaps the few you need are already there in the database.”