

Title: The effectiveness of various strategies to improve DNA analysis of formaldehyde‐damaged tissues from embalmed cadavers for human identification purposes
Abstract

Formalin‐fixed tissues provide the medical and forensic communities with alternative and often last resort sources of DNA for identification or diagnostic purposes. The DNA in these samples can be highly degraded and chemically damaged, making downstream genotyping using short tandem repeats (STRs) challenging. Therefore, the use of alternative genetic markers, methods that pre‐amplify the low amount of good quality DNA present, or methods that repair the damaged DNA template may provide more probative genetic information. This study investigated whether whole genome amplification (WGA) and DNA repair could improve STR typing of formaldehyde‐damaged (FD) tissues from embalmed cadavers. Additionally, comparative genotyping success using bi‐allelic markers, including INDELs and SNPs, was explored. Calculated random match probabilities (RMPs) using traditional STRs, INDEL markers, and two next generation sequencing (NGS) panels were compared across all samples. Overall, results showed that neither WGA nor DNA repair substantially improved STR success rates from formalin‐fixed tissue samples. However, when DNA from FD samples was genotyped using INDEL and SNP‐based panels, the RMP of each sample was markedly lower than the RMPs calculated from partial STR profiles. Therefore, the results of this study suggest that rather than attempting to improve the quantity and quality of severely damaged and degraded DNA prior to STR typing, a more productive approach may be to target smaller amplicons to provide more discriminatory DNA identifications. Furthermore, an NGS panel with fewer loci may yield better results when examining FD samples, due to more optimized chemistries that result in greater allelic balance and amplicon coverage.
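
The RMP comparison described in the abstract can be illustrated with a minimal sketch. It assumes Hardy-Weinberg genotype frequencies and the product rule across independent loci, which is a common textbook formulation and not necessarily the calculation used in the study. All allele frequencies, locus counts, and function names below are hypothetical and chosen only to show how many low-information bi-allelic markers can yield a lower RMP than a few typed STR loci.

```python
from math import prod


def genotype_frequency(p, q=None):
    """Hardy-Weinberg genotype frequency (assumed model).

    One allele frequency -> homozygote, p^2.
    Two allele frequencies -> heterozygote, 2pq.
    """
    if q is None:
        return p * p
    return 2 * p * q


def random_match_probability(genotypes):
    """Product rule across loci, assuming the loci are independent."""
    return prod(genotype_frequency(*g) for g in genotypes)


# Hypothetical partial STR profile: only 4 loci typed successfully.
partial_str = [(0.10, 0.20), (0.15,), (0.05, 0.30), (0.25, 0.10)]

# Hypothetical bi-allelic panel: 30 heterozygous loci, allele frequency 0.5 each.
biallelic_panel = [(0.5, 0.5)] * 30

rmp_str = random_match_probability(partial_str)
rmp_biallelic = random_match_probability(biallelic_panel)

print(f"partial STR RMP:     {rmp_str:.3e}")
print(f"bi-allelic panel RMP: {rmp_biallelic:.3e}")
```

In this made-up example each bi-allelic locus contributes little on its own, but the larger number of recovered loci gives the lower overall RMP.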

 
Award ID(s):
1719472
NSF-PAR ID:
10494213
Publisher / Repository:
PubMed
Date Published:
Journal Name:
Journal of Forensic Sciences
Volume:
68
Issue:
2
ISSN:
0022-1198
Page Range / eLocation ID:
596 to 607
Subject(s) / Keyword(s):
["DNA repair","formaldehyde-damaged","formalin-fixed","insertions/deletions","next generation sequencing","paraffin-embedded","single nucleotide polymorphisms","whole genome amplification"]
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Short tandem repeats (STRs), also known as microsatellites, are commonly used to noninvasively genotype wild‐living endangered species, including African apes. Until recently, capillary electrophoresis has been the method of choice to determine the length of polymorphic STR loci. However, this technique is labor intensive, difficult to compare across platforms, and notoriously imprecise. Here we developed a MiSeq‐based approach and tested its performance using previously genotyped fecal samples from long‐term studied chimpanzees in Gombe National Park, Tanzania. Using data from eight microsatellite loci as a reference, we designed a bioinformatics platform that converts raw MiSeq reads into locus‐specific files and automatically calls alleles after filtering stutter sequences and other PCR artifacts. Applying this method to the entire Gombe population, we confirmed previously reported genotypes, but also identified 31 new alleles that had been missed due to sequence differences and size homoplasy. The new genotypes, which increased the allelic diversity and heterozygosity in Gombe by 61% and 8%, respectively, were validated by replicate amplification and pedigree analyses. This demonstrated inheritance and resolved one case of an ambiguous paternity. Using both singleplex and multiplex locus amplification, we also genotyped fecal samples from chimpanzees in the Greater Mahale Ecosystem in Tanzania, demonstrating the utility of the MiSeq‐based approach for genotyping nonhabituated populations and performing comparative analyses across field sites. The new automated high‐throughput analysis platform (available at https://github.com/ShawHahnLab/chiimp) will allow biologists to more accurately and effectively determine wildlife population size and structure, and thus obtain information critical for conservation efforts.

     
  2. Abstract

    The development of high‐throughput sequencing technologies is dramatically increasing the use of single nucleotide polymorphisms (SNPs) across the field of genetics, but most parentage studies of wild populations still rely on microsatellites. We developed a bioinformatic pipeline for identifying SNP panels that are informative for parentage analysis from restriction site‐associated DNA sequencing (RADseq) data. This pipeline includes options for analysis with or without a reference genome, and provides methods to maximize genotyping accuracy and select sets of unlinked loci that have high statistical power. We test this pipeline on small populations of Mexican gray wolf and bighorn sheep, for which parentage analyses are expected to be challenging due to low genetic diversity and the presence of many closely related individuals. We compare the results of parentage analysis across SNP panels generated with or without the use of a reference genome, and between SNPs and microsatellites. For Mexican gray wolf, we conducted parentage analyses for 30 pups from a single cohort where samples were available from 64% of possible mothers and 53% of possible fathers, and the accuracy of parentage assignments could be estimated because true identities of parents were known a priori based on field data. For bighorn sheep, we conducted maternity analyses for 39 lambs from five cohorts where 77% of possible mothers were sampled, but true identities of parents were unknown. Analyses with and without a reference genome produced SNP panels with ≥95% parentage assignment accuracy for Mexican gray wolf, outperforming microsatellites at 78% accuracy. Maternity assignments were completely consistent across all SNP panels for the bighorn sheep, and were 74.4% consistent with assignments from microsatellites. Accuracy and consistency of parentage analysis were not reduced when using as few as 284 SNPs for Mexican gray wolf and 142 SNPs for bighorn sheep, indicating our pipeline can be used to develop SNP genotyping assays for parentage analysis with relatively small numbers of loci.

     
  3. Short tandem repeats (STRs) represent an important class of genetic variation that can contribute to phenotypic differences. Although millions of single nucleotide variants (SNVs) and short indels have been identified among wild Caenorhabditis elegans strains, the natural diversity in STRs remains unknown. Here, we characterized the distribution of 31,991 STRs with motif lengths of 1–6 bp in the reference genome of C. elegans. Of these STRs, 27,667 harbored polymorphisms across 540 wild strains and only 9,691 polymorphic STRs (pSTRs) had complete genotype data for more than 90% of the strains. Compared with the reference genome, the pSTRs showed more contraction than expansion. We found that STRs with different motif lengths were enriched in different genomic features, among which coding regions showed the lowest STR diversity and constrained STR mutations. STR diversity also showed similar genetic divergence and selection signatures among wild strains as in previous studies using SNVs. We further identified STR variation in two mutation accumulation line panels that were derived from two wild strains and found background-dependent and fitness-dependent STR mutations. We also performed the first genome-wide association analyses between natural variation in STRs and organismal phenotypic variation among wild C. elegans strains. Overall, our results delineate the first large-scale characterization of STR variation in wild C. elegans strains and highlight the effects of selection on STR mutations.
  4. Abstract

    The development of next-generation sequencing (NGS) enabled a shift from array-based genotyping to directly sequencing genomic libraries for high-throughput genotyping. Even though whole-genome sequencing was initially too costly for routine analysis in large populations such as breeding or genetic studies, continued advancements in genome sequencing and bioinformatics have provided the opportunity to capitalize on whole-genome information. As new sequencing platforms can routinely provide high-quality sequencing data for sufficient genome coverage to genotype various breeding populations, a limitation comes in the time and cost of library construction when multiplexing a large number of samples. Here we describe a high-throughput whole-genome skim-sequencing (skim-seq) approach that can be utilized for a broad range of genotyping and genomic characterization. Using optimized low-volume Illumina Nextera chemistry, we developed a skim-seq method and combined up to 960 samples in one multiplex library using dual index barcoding. With the dual-index barcoding, the number of samples for multiplexing can be adjusted depending on the amount of data required, and could be extended to 3,072 samples or more. Panels of doubled haploid wheat lines (Triticum aestivum, CDC Stanley x CDC Landmark), wheat-barley (T. aestivum x Hordeum vulgare) and wheat-wheatgrass (Triticum durum x Thinopyrum intermedium) introgression lines as well as known monosomic wheat stocks were genotyped using the skim-seq approach. Bioinformatics pipelines were developed for various applications where sequencing coverage ranged from 1× down to 0.01× per sample. Using reference genomes, we detected chromosome dosage, identified aneuploidy, and karyotyped introgression lines from the skim-seq data. Leveraging the recent advancements in genome sequencing, skim-seq provides an effective and low-cost tool for routine genotyping and genetic analysis, which can track and identify introgressions and genomic regions of interest in genetics research and applied breeding programs.

     
  5. Abstract

    Genetic biodiversity contributes to individual fitness, species' evolutionary potential, and ecosystem stability. Temporal monitoring of the genetic status and trends of wild populations' genetic diversity can provide vital data to inform policy decisions and management actions. However, there is a lack of knowledge regarding which genetic metrics, temporal sampling protocols, and genetic markers are sufficiently sensitive and robust, on conservation‐relevant timescales. Here, we tested six genetic metrics and various sampling protocols (number and arrangement of temporal samples) for monitoring genetic erosion following demographic decline. To do so, we utilized individual‐based simulations featuring an array of different initial population sizes, types and severity of demographic decline, and DNA markers [single nucleotide polymorphisms (SNPs) and microsatellites] as well as decline followed by recovery. Number of alleles markedly outperformed other indicators across all situations. The type and severity of demographic decline strongly affected power, while the number and arrangement of temporal samples had small effect. Sampling 50 individuals at as few as two time points with 20 microsatellites performed well (good power), and could detect genetic erosion while 80–90% of diversity remained. This sampling and genotyping effort should often be affordable. Power increased substantially with more samples or markers, and we observe that power of 2500 SNPs was nearly equivalent to 250 microsatellites, a result of theoretical and practical interest. Our results suggest high potential for using historic collections in monitoring programs, and demonstrate the need to monitor genetic as well as other levels of biodiversity.

     