Title: A comparison of proteomic, genomic, and osteological methods of archaeological sex estimation
Abstract

Sex estimation of skeletons is fundamental to many archaeological studies. Currently, three approaches are available to estimate sex: osteology, genomics, and proteomics. However, little is known about the relative reliability of these methods in applied settings. We present matching osteological, shotgun-genomic, and proteomic data to estimate the sex of 55 individuals, each with an independent radiocarbon date between 2,440 and 100 cal BP, from two ancestral Ohlone sites in Central California. Sex estimation was possible in 100% of this burial sample using proteomics, in 91% using genomics, and in 51% using osteology. Agreement between the methods was high; however, conflicts did occur. Genomic sex estimates were 100% consistent with proteomic and osteological estimates when DNA reads were above 100,000 total sequences. However, more than half the samples had DNA read numbers below this threshold, producing high rates of conflict with osteological and proteomic data: nine out of twenty conditional DNA sex estimates conflicted with proteomics. While the DNA signal decreased by an order of magnitude in the older burial samples, there was no decrease in proteomic signal. We conclude that proteomics provides an important complement to osteological and shotgun-genomic sex estimation.

 
Award ID(s):
1825022
NSF-PAR ID:
10172642
Author(s) / Creator(s):
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Scientific Reports
Volume:
10
Issue:
1
ISSN:
2045-2322
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Genomics is narrowing uncertainty in the phylogenetic structure for many amniote groups. For one of the most diverse and species-rich groups, the squamate reptiles (lizards, snakes, and amphisbaenians), an inverse correlation between the number of taxa and loci sampled still persists across all publications using DNA sequence data and reaching a consensus on the relationships among them has been highly problematic. In this study, we use high-throughput sequence data from 289 samples covering 75 families of squamates to address phylogenetic affinities, estimate divergence times, and characterize residual topological uncertainty in the presence of genome-scale data. Importantly, we address genomic support for the traditional taxonomic groupings Scleroglossa and Macrostomata using novel machine-learning techniques. We interrogate genes using various metrics inherent to these loci, including parsimony-informative sites (PIS), phylogenetic informativeness, length, gaps, number of substitutions, and site concordance to understand why certain loci fail to find previously well-supported molecular clades and how they fail to support species-tree estimates. We show that both incomplete lineage sorting and poor gene-tree estimation (due to a few undesirable gene properties, such as an insufficient number of PIS), may account for most gene and species-tree discordance. We find overwhelming signal for Toxicofera, and also show that none of the loci included in this study supports Scleroglossa or Macrostomata. We comment on the origins and diversification of Squamata throughout the Mesozoic and underscore remaining uncertainties that persist in both deeper parts of the tree (e.g., relationships between Dibamia, Gekkota, and remaining squamates; among the three toxicoferan clades Iguania, Serpentes, and Anguiformes) and within specific clades (e.g., affinities among gekkotan, pleurodont iguanians, and colubroid families).

     
  2. Abstract Background

    Estimation of genetic relatedness, or kinship, is used occasionally for recreational purposes and in forensic applications. While numerous methods have been developed to estimate kinship, they suffer from high computational requirements and often make an untenable assumption of homogeneous population ancestry of the samples. Moreover, genetic privacy is generally overlooked in the usage of kinship estimation methods. There can be ethical concerns about finding unknown familial relationships in third-party databases. Similar ethical concerns may arise when estimating and reporting sensitive population-level statistics such as inbreeding coefficients, given the risk of marginalization and stigmatization.

    Results

    Here, we present SIGFRIED, which makes use of existing reference panels with a projection-based approach that simplifies kinship estimation in the admixed populations. We use simulated and real datasets to demonstrate the accuracy and efficiency of kinship estimation. We present a secure federated kinship estimation framework and implement a secure kinship estimator using homomorphic encryption-based primitives for computing relatedness between samples in two different sites while genotype data are kept confidential. Source code and documentation for our methods can be found at https://doi.org/10.5281/zenodo.7053352.

    Conclusions

    Analysis of relatedness is fundamentally important for identifying relatives, in association studies, and for population-level estimates of inbreeding. As awareness of individual and group genomic privacy grows, privacy-preserving methods for the estimation of relatedness are needed. The presented methods alleviate the ethical and privacy concerns in the analysis of relatedness in admixed, historically isolated, and underrepresented populations.

    Short Abstract

    Genetic relatedness is a central quantity used for finding relatives in databases, correcting biases in genome wide association studies and for estimating population-level statistics. Methods for estimating genetic relatedness have high computational requirements, and occasionally do not consider individuals from admixed ancestries. Furthermore, the ethical concerns around using genetic data and calculating relatedness are not considered. We present a projection-based approach that can efficiently and accurately estimate kinship. We implement our method using encryption-based techniques that provide provable security guarantees to protect genetic data while kinship statistics are computed among multiple sites.

     
  3. Rationale

    Protein studies in archaeology and paleontology have been dominated by stable isotope studies to understand diet and trophic levels, but recent applications of proteomic techniques have resulted in a more complete understanding of protein diagenesis than stable isotopes alone. In stable isotope analyses, samples are retained or discarded based on their properties. Proteomics can directly determine what proteins are present within the sample and may be able to allow previously discarded samples to be analyzed.

    Methods

    Protein samples that had been previously analyzed for stable isotopes, including those with marginal and poor sample quality, were characterized by liquid chromatography/mass spectrometry using an LTQ Orbitrap Velos mass spectrometer after separation on a Dionex Ultimate 3000 LC system. Data were analyzed using MetaMorpheus and custom R scripts.

    Results

    We found a variety of proteins in addition to collagen, although collagen I was found in the majority of the samples (most samples >80%). We also found a positive correlation between total deamidation and wt% N, suggesting that deamidation may impact the overall nitrogen signal in bulk analyses. The amino acid profiles of samples, including those of marginal or poor stable isotope quality, reflect the expected collagen I percentages, allowing their use in single amino acid stable isotope analyses.

    Conclusions

    All the samples regardless of quality were found to have high concentrations of collagen I, making interpretations of dietary routing based on collagen I reasonably valid. The amino acid profiles on the marginal and poor samples reflect an expected collagen I profile and allow these samples to be recovered for single amino acid analyses.

     
  4. Abstract

    The genomics revolution continues to change how ecologists and evolutionary biologists study the evolution and maintenance of biodiversity. It is now easier than ever to generate large molecular data sets consisting of hundreds to thousands of independently evolving nuclear loci to estimate a suite of evolutionary and demographic parameters. However, any inferences will be incomplete or inaccurate if incorrect taxonomic identities are perpetuated throughout the analytical pipeline. Due to decades of research and comprehensive online databases, sequencing and analysis of mitochondrial DNA (mtDNA), chloroplast DNA (cpDNA) and select nuclear genes can provide researchers with a cost-effective and simple means to verify the species identity of samples prior to subsequent phylogeographic and population genomic analysis. The addition of these sequences to genomic studies can also shed light on other important evolutionary questions such as explanations for gene tree‐species tree discordance, species limits, sex‐biased dispersal patterns, adaptation, and mtDNA introgression. Although the mtDNA and cpDNA genomes often should not be used exclusively to make historical inferences given their well‐known limitations, the addition of these data to modern genomic studies adds little cost and effort while simultaneously providing a wealth of useful data that can have significant implications for both basic and applied research.

     
  5. Abstract Motivation

    Metagenomic binning aims to retrieve microbial genomes directly from ecosystems by clustering metagenomic contigs assembled from short reads into draft genomic bins. Traditional shotgun-based binning methods depend on the contigs’ composition and abundance profiles and are impaired when too few samples are available to construct reliable co-abundance profiles. When applied to a single sample, shotgun-based binning methods struggle to distinguish closely related species using only composition information. As an alternative binning approach, Hi-C-based binning employs the metagenomic Hi-C technique to measure the proximity contacts between metagenomic fragments. However, spurious inter-species Hi-C contacts, inevitably generated by incorrect ligations of DNA fragments between species, link contigs from different genomes and reduce the purity of the final draft genomic bins. Therefore, it is imperative to develop a binning pipeline that overcomes the shortcomings of both types of binning methods on a single sample.

    Results

    We develop HiFine, a novel binning pipeline to refine the binning results of metagenomic contigs by integrating both Hi-C-based and shotgun-based binning tools. HiFine designs a strategy of fragmentation for the original bin sets derived from the Hi-C-based and shotgun-based binning methods, which considerably increases the purity of initial bins, followed by merging fragmented bins and recruiting unbinned contigs. We demonstrate that HiFine significantly improves the existing binning results of both types of binning methods and achieves better performance in constructing species genomes on publicly available datasets. To the best of our knowledge, HiFine is the first pipeline to integrate different types of tools for the binning of metagenomic contigs.

    Availability and implementation

    HiFine is available at https://github.com/dyxstat/HiFine.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     