skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: The effectiveness of various strategies to improve DNA analysis of formaldehyde‐damaged tissues from embalmed cadavers for human identification purposes
Abstract Formalin‐fixed tissues provide the medical and forensic communities with alternative and often last resort sources of DNA for identification or diagnostic purposes. The DNA in these samples can be highly degraded and chemically damaged, making downstream genotyping using short tandem repeats (STRs) challenging. Therefore, the use of alternative genetic markers, methods that pre‐amplify the low amount of good quality DNA present, or methods that repair the damaged DNA template may provide more probative genetic information. This study investigated whether whole genome amplification (WGA) and DNA repair could improve STR typing of formaldehyde‐damaged (FD) tissues from embalmed cadavers. Additionally, comparative genotyping success using bi‐allelic markers, including INDELs and SNPs, was explored. Calculated random match probabilities (RMPs) using traditional STRs, INDEL markers, and two next generation sequencing (NGS) panels were compared across all samples. Overall, results showed that neither WGA nor DNA repair substantially improved STR success rates from formalin‐fixed tissue samples. However, when DNA from FD samples was genotyped using INDEL and SNP‐based panels, the RMP of each sample was markedly lower than the RMPs calculated from partial STR profiles. Therefore, the results of this study suggest that rather than attempting to improve the quantity and quality of severely damaged and degraded DNA prior to STR typing, a more productive approach may be to target smaller amplicons to provide more discriminatory DNA identifications. Furthermore, an NGS panel with less loci may yield better results when examining FD samples, due to more optimized chemistries that result in greater allelic balance and amplicon coverage.  more » « less
Award ID(s):
1719472
PAR ID:
10494213
Author(s) / Creator(s):
; ; ; ; ; ;
Publisher / Repository:
PubMed
Date Published:
Journal Name:
Journal of Forensic Sciences
Volume:
68
Issue:
2
ISSN:
0022-1198
Page Range / eLocation ID:
596 to 607
Subject(s) / Keyword(s):
DNA repair formaldehyde-damaged formalin-fixed insertions/deletions next generation sequencing paraffin-embedded single nucleotide polymorphisms whole genome amplification.
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Short tandem repeats (STRs) represent an important class of genetic variation that can contribute to phenotypic differences. Although millions of single nucleotide variants (SNVs) and short indels have been identified among wild Caenorhabditis elegans strains, the natural diversity in STRs remains unknown. Here, we characterized the distribution of 31,991 STRs with motif lengths of 1–6 bp in the reference genome of C. elegans . Of these STRs, 27,667 harbored polymorphisms across 540 wild strains and only 9691 polymorphic STRs (pSTRs) had complete genotype data for more than 90% of the strains. Compared with the reference genome, the pSTRs showed more contraction than expansion. We found that STRs with different motif lengths were enriched in different genomic features, among which coding regions showed the lowest STR diversity and constrained STR mutations. STR diversity also showed similar genetic divergence and selection signatures among wild strains as in previous studies using SNVs. We further identified STR variation in two mutation accumulation line panels that were derived from two wild strains and found background-dependent and fitness-dependent STR mutations. We also performed the first genome-wide association analyses between natural variation in STRs and organismal phenotypic variation among wild C. elegans strains. Overall, our results delineate the first large-scale characterization of STR variation in wild C. elegans strains and highlight the effects of selection on STR mutations. 
    more » « less
  2. Abstract The development of high‐throughput sequencing technologies is dramatically increasing the use of single nucleotide polymorphisms (SNPs) across the field of genetics, but most parentage studies of wild populations still rely on microsatellites. We developed a bioinformatic pipeline for identifyingSNPpanels that are informative for parentage analysis from restriction site‐associatedDNAsequencing (RADseq) data. This pipeline includes options for analysis with or without a reference genome, and provides methods to maximize genotyping accuracy and select sets of unlinked loci that have high statistical power. We test this pipeline on small populations of Mexican gray wolf and bighorn sheep, for which parentage analyses are expected to be challenging due to low genetic diversity and the presence of many closely related individuals. We compare the results of parentage analysis acrossSNPpanels generated with or without the use of a reference genome, and betweenSNPs and microsatellites. For Mexican gray wolf, we conducted parentage analyses for 30 pups from a single cohort where samples were available from 64% of possible mothers and 53% of possible fathers, and the accuracy of parentage assignments could be estimated because true identities of parents were known a priori based on field data. For bighorn sheep, we conducted maternity analyses for 39 lambs from five cohorts where 77% of possible mothers were sampled, but true identities of parents were unknown. Analyses with and without a reference genome producedSNPpanels with ≥95% parentage assignment accuracy for Mexican gray wolf, outperforming microsatellites at 78% accuracy. Maternity assignments were completely consistent across allSNPpanels for the bighorn sheep, and were 74.4% consistent with assignments from microsatellites. Accuracy and consistency of parentage analysis were not reduced when using as few as 284SNPs for Mexican gray wolf and 142SNPs for bighorn sheep, indicating our pipeline can be used to developSNPgenotyping assays for parentage analysis with relatively small numbers of loci. 
    more » « less
  3. Next-generation sequencing (NGS) technologies have revolutionized phylogenomics by decreasing the cost and time required to generate sequence data from multiple markers or whole genomes. Further, the fragmented DNA of biological specimens collected decades ago can be sequenced with NGS, reducing the need for collecting fresh specimens. Sequence capture, also known as anchored hybrid enrichment, is a method to produce reduced representation libraries for NGS sequencing. The technique uses single-stranded oligonucleotide probes that hybridize with pre-selected regions of the genome that are sequenced via NGS, culminating in a dataset of numerous orthologous loci from multiple taxa. Phylogenetic analyses using these sequences have the potential to resolve deep and shallow phylogenetic relationships. Identifying the factors that affect sequence capture success could save time, money, and valuable specimens that might be destructively sampled despite low likelihood of sequencing success. We investigated the impacts of specimen age, preservation method, and DNA concentration on sequence capture (number of captured sequences and sequence quality) while accounting for taxonomy and extracted tissue type in a large-scale butterfly phylogenomics project. This project used two probe sets to extract 391 loci or a subset of 13 loci from over 6,000 butterfly specimens. We found that sequence capture is a resilient method capable of amplifying loci in samples of varying age (0–111 years), preservation method (alcohol, papered, pinned), and DNA concentration (0.020 ng/μl - 316 ng/ul). Regression analyses demonstrate that sequence capture is positively correlated with DNA concentration. However, sequence capture and DNA concentration are negatively correlated with sample age and preservation method. Our findings suggest that sequence capture projects should prioritize the use of alcohol-preserved samples younger than 20 years old when available. In the absence of such specimens, dried samples of any age can yield sequence data, albeit with returns that diminish with increasing age. 
    more » « less
  4. Larracuente, Amanda (Ed.)
    Abstract Short tandem repeats (STRs) have orders of magnitude higher mutation rates than single nucleotide variants (SNVs) and have been proposed to accelerate evolution in many organisms. However, only few studies have addressed the impact of STR variation on phenotypic variation at both the organismal and molecular levels. Potential driving forces underlying the high mutation rates of STRs also remain largely unknown. Here, we leverage the recently generated expression and STR variation data among wild Caenorhabditis elegans strains to conduct a genome-wide analysis of how STRs affect gene expression variation. We identify thousands of expression STRs (eSTRs) showing regulatory effects and demonstrate that they explain missing heritability beyond SNV-based expression quantitative trait loci. We illustrate specific regulatory mechanisms such as how eSTRs affect splicing sites and alternative splicing efficiency. We also show that differential expression of antioxidant genes and oxidative stresses might affect STR mutations systematically using both wild strains and mutation accumulation lines. Overall, we reveal the interplay between STRs and gene expression variation by providing novel insights into regulatory mechanisms of STRs and highlighting that oxidative stress could lead to higher STR mutation rates. 
    more » « less
  5. Abstract The development of next-generation sequencing (NGS) enabled a shift from array-based genotyping to directly sequencing genomic libraries for high-throughput genotyping. Even though whole-genome sequencing was initially too costly for routine analysis in large populations such as breeding or genetic studies, continued advancements in genome sequencing and bioinformatics have provided the opportunity to capitalize on whole-genome information. As new sequencing platforms can routinely provide high-quality sequencing data for sufficient genome coverage to genotype various breeding populations, a limitation comes in the time and cost of library construction when multiplexing a large number of samples. Here we describe a high-throughput whole-genome skim-sequencing (skim-seq) approach that can be utilized for a broad range of genotyping and genomic characterization. Using optimized low-volume Illumina Nextera chemistry, we developed a skim-seq method and combined up to 960 samples in one multiplex library using dual index barcoding. With the dual-index barcoding, the number of samples for multiplexing can be adjusted depending on the amount of data required, and could be extended to 3,072 samples or more. Panels of doubled haploid wheat lines (Triticum aestivum, CDC Stanley x CDC Landmark), wheat-barley (T.aestivumxHordeum vulgare) and wheat-wheatgrass (Triticum durum x Thinopyrum intermedium) introgression lines as well as known monosomic wheat stocks were genotyped using the skim-seq approach. Bioinformatics pipelines were developed for various applications where sequencing coverage ranged from 1 × down to 0.01 × per sample. Using reference genomes, we detected chromosome dosage, identified aneuploidy, and karyotyped introgression lines from the skim-seq data. Leveraging the recent advancements in genome sequencing, skim-seq provides an effective and low-cost tool for routine genotyping and genetic analysis, which can track and identify introgressions and genomic regions of interest in genetics research and applied breeding programs. 
    more » « less