skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Quality and quantity of genetic relatedness data affect the analysis of social structure
Abstract Kinship plays a fundamental role in the evolution of social systems and is considered a key driver of group living. To understand the role of kinship in the formation and maintenance of social bonds, accurate measures of genetic relatedness are critical. Genotype‐by‐sequencing technologies are rapidly advancing the accuracy and precision of genetic relatedness estimates for wild populations. The ability to assign kinship from genetic data varies depending on a species’ or population's mating system and pattern of dispersal, and empirical data from longitudinal studies are crucial to validate these methods. We use data from a long‐term behavioural study of a polygynandrous, bisexually philopatric marine mammal to measure accuracy and precision of parentage and genetic relatedness estimation against a known partial pedigree. We show that with moderate but obtainable sample sizes of approximately 4,235 SNPs and 272 individuals, highly accurate parentage assignments and genetic relatedness coefficients can be obtained. Additionally, we subsample our data to quantify how data availability affects relatedness estimation and kinship assignment. Lastly, we conduct a social network analysis to investigate the extent to which accuracy and precision of relatedness estimation improve statistical power to detect an effect of relatedness on social structure. Our results provide practical guidance for minimum sample sizes and sequencing depth for future studies, as well as thresholds for post hoc interpretation of previous analyses.  more » « less
Award ID(s):
1755229 1559380
PAR ID:
10455749
Author(s) / Creator(s):
 ;  ;  ;  
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
Molecular Ecology Resources
Volume:
19
Issue:
5
ISSN:
1755-098X
Page Range / eLocation ID:
p. 1181-1194
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Kinship relationship estimation plays a significant role in today's genome studies. Since genetic data are mostly stored and protected in different silos, retrieving the desirable kinship relationships across federated data warehouses is a non-trivial problem. The ability to identify and connect related individuals is important for both research and clinical applications. In this work, we propose a new privacy-preserving kinship relationship estimation framework: Incremental Update Kinship Identification (INK). The proposed framework includes three key components that allow us to control the balance between privacy and accuracy (of kinship estimation): an incremental process coupled with the use of auxiliary information and informative scores. Our empirical evaluation shows that INK can achieve higher kinship identification correctness while exposing fewer genetic markers. 
    more » « less
  2. Abstract Researchers have long debated which estimator of relatedness best captures the degree of relationship between two individuals. In the genomics era, this debate continues, with relatedness estimates being sensitive to the methods used to generate markers, marker quality, and levels of diversity in sampled individuals. Here, we compare six commonly used genome‐based relatedness estimators (kinship genetic distance [KGD], Wang maximum likelihood [TrioML], Queller and Goodnight [Rxy], Kinship INference for Genome‐wide association studies [KING‐robust), and pairwise relatedness [RAB], allele‐sharing coancestry [AS]) across five species bred in captivity–including three birds and two mammals–with varying degrees of reliable pedigree data, using reduced‐representation and whole genome resequencing data. Genome‐based relatedness estimates varied widely across estimators, sequencing methods, and species, yet the most consistent results for known first order relationships were found usingRxy,RAB, and AS. However, AS was found to be less consistently correlated with known pedigree relatedness than eitherRxyorRAB. Our combined results indicate there is not a single genome‐based estimator that is ideal across different species and data types. To determine the most appropriate genome‐based relatedness estimator for each new data set, we recommend assessing the relative: (1) correlation of candidate estimators with known relationships in the pedigree and (2) precision of candidate estimators with known first‐order relationships. These recommendations are broadly applicable to conservation breeding programmes, particularly where genome‐based estimates of relatedness can complement and complete poorly pedigreed populations. Given a growing interest in the application of wild pedigrees, our results are also applicable to in situ wildlife management. 
    more » « less
  3. Abstract Non‐random mating among individuals can lead to spatial clustering of genetically similar individuals and population stratification. This deviation from panmixia is commonly observed in natural populations. Consequently, individuals can have parentage in single populations or involving hybridization between differentiated populations. Accounting for this mixture and structure is important when mapping the genetics of traits and learning about the formative evolutionary processes that shape genetic variation among individuals and populations. Stratified genetic relatedness among individuals is commonly quantified using estimates of ancestry that are derived from a statistical model. Development of these models for polyploid and mixed‐ploidy individuals and populations has lagged behind those for diploids. Here, we extend and test a hierarchical Bayesian model, calledentropy, which can use low‐depth sequence data to estimate genotype and ancestry parameters in autopolyploid and mixed‐ploidy individuals (including sex chromosomes and autosomes within individuals). Our analysis of simulated data illustrated the trade‐off between sequencing depth and genome coverage and found lower error associated with low‐depth sequencing across a larger fraction of the genome than with high‐depth sequencing across a smaller fraction of the genome. The model has high accuracy and sensitivity as verified with simulated data and through analysis of admixture among populations of diploid and tetraploidArabidopsis arenosa. 
    more » « less
  4. Abstract The development of high‐throughput sequencing technologies is dramatically increasing the use of single nucleotide polymorphisms (SNPs) across the field of genetics, but most parentage studies of wild populations still rely on microsatellites. We developed a bioinformatic pipeline for identifyingSNPpanels that are informative for parentage analysis from restriction site‐associatedDNAsequencing (RADseq) data. This pipeline includes options for analysis with or without a reference genome, and provides methods to maximize genotyping accuracy and select sets of unlinked loci that have high statistical power. We test this pipeline on small populations of Mexican gray wolf and bighorn sheep, for which parentage analyses are expected to be challenging due to low genetic diversity and the presence of many closely related individuals. We compare the results of parentage analysis acrossSNPpanels generated with or without the use of a reference genome, and betweenSNPs and microsatellites. For Mexican gray wolf, we conducted parentage analyses for 30 pups from a single cohort where samples were available from 64% of possible mothers and 53% of possible fathers, and the accuracy of parentage assignments could be estimated because true identities of parents were known a priori based on field data. For bighorn sheep, we conducted maternity analyses for 39 lambs from five cohorts where 77% of possible mothers were sampled, but true identities of parents were unknown. Analyses with and without a reference genome producedSNPpanels with ≥95% parentage assignment accuracy for Mexican gray wolf, outperforming microsatellites at 78% accuracy. Maternity assignments were completely consistent across allSNPpanels for the bighorn sheep, and were 74.4% consistent with assignments from microsatellites. Accuracy and consistency of parentage analysis were not reduced when using as few as 284SNPs for Mexican gray wolf and 142SNPs for bighorn sheep, indicating our pipeline can be used to developSNPgenotyping assays for parentage analysis with relatively small numbers of loci. 
    more » « less
  5. Understanding the relationship between genotypes and phenotypes is crucial for advancing personalized medicine. Expression quantitative trait loci (eQTL) mapping plays a significant role by correlating genetic variants to gene expression levels. Despite the progress made by large-scale projects, eQTL mapping still faces challenges in statistical power and privacy concerns. Multi-site studies can increase sample sizes but are hindered by privacy issues. We present privateQTL, a novel framework leveraging secure multi-party computation for secure and federated eQTL mapping. When tested in a real-world scenario with data from different studies, privateQTL outperformed meta-analysis by accurately correcting for covariates and batch effect and retaining higher accuracy and precision for both eGene-eVariant mapping and effect size estimation. In addition, privateQTL is modular and scalable, making it adaptable for other molecular phenotypes and large-scale studies. Our results indicate that privateQTL is a practical solution for privacy-preserving collaborative eQTL mapping. 
    more » « less