

Title: Privacy-aware estimation of relatedness in admixed populations
Abstract Background

Estimation of genetic relatedness, or kinship, is used occasionally for recreational purposes and in forensic applications. While numerous methods have been developed to estimate kinship, they suffer from high computational requirements and often make an untenable assumption of homogeneous population ancestry of the samples. Moreover, genetic privacy is generally overlooked in the usage of kinship estimation methods. There can be ethical concerns about finding unknown familial relationships in third-party databases. Similar ethical concerns may arise when estimating and reporting sensitive population-level statistics, such as inbreeding coefficients, because of the risk of marginalization and stigmatization.

Results

Here, we present SIGFRIED, which makes use of existing reference panels with a projection-based approach that simplifies kinship estimation in admixed populations. We use simulated and real datasets to demonstrate the accuracy and efficiency of kinship estimation. We present a secure federated kinship estimation framework and implement a secure kinship estimator using homomorphic encryption-based primitives for computing relatedness between samples in two different sites while genotype data are kept confidential. Source code and documentation for our methods can be found at https://doi.org/10.5281/zenodo.7053352.

Conclusions

Analysis of relatedness is fundamentally important for identifying relatives, in association studies, and for population-level estimates of inbreeding. As the awareness of individual and group genomic privacy is growing, privacy-preserving methods for the estimation of relatedness are needed. The presented methods alleviate the ethical and privacy concerns in the analysis of relatedness in admixed, historically isolated and underrepresented populations.

Short Abstract

Genetic relatedness is a central quantity used for finding relatives in databases, correcting biases in genome-wide association studies and estimating population-level statistics. Methods for estimating genetic relatedness have high computational requirements, and occasionally do not consider individuals from admixed ancestries. Furthermore, the ethical concerns around using genetic data and calculating relatedness are not considered. We present a projection-based approach that can efficiently and accurately estimate kinship. We implement our method using encryption-based techniques that provide provable security guarantees to protect genetic data while kinship statistics are computed among multiple sites.

 
NSF-PAR ID:
10380546
Publisher / Repository:
Oxford University Press
Journal Name:
Briefings in Bioinformatics
Volume:
23
Issue:
6
ISSN:
1467-5463
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Kinship plays a fundamental role in the evolution of social systems and is considered a key driver of group living. To understand the role of kinship in the formation and maintenance of social bonds, accurate measures of genetic relatedness are critical. Genotype‐by‐sequencing technologies are rapidly advancing the accuracy and precision of genetic relatedness estimates for wild populations. The ability to assign kinship from genetic data varies depending on a species’ or population's mating system and pattern of dispersal, and empirical data from longitudinal studies are crucial to validate these methods. We use data from a long‐term behavioural study of a polygynandrous, bisexually philopatric marine mammal to measure accuracy and precision of parentage and genetic relatedness estimation against a known partial pedigree. We show that with moderate but obtainable sample sizes of approximately 4,235 SNPs and 272 individuals, highly accurate parentage assignments and genetic relatedness coefficients can be obtained. Additionally, we subsample our data to quantify how data availability affects relatedness estimation and kinship assignment. Lastly, we conduct a social network analysis to investigate the extent to which accuracy and precision of relatedness estimation improve statistical power to detect an effect of relatedness on social structure. Our results provide practical guidance for minimum sample sizes and sequencing depth for future studies, as well as thresholds for post hoc interpretation of previous analyses.

     
  2. Abstract Aim

    Quantifying abundance distributions is critical for understanding both how communities assemble, and how community structure varies through time and space, yet estimating abundances requires considerable investment in fieldwork. Community‐level population genetic data potentially offer a powerful way to indirectly infer richness, abundance and the history of accumulation of biodiversity within a community. Here we introduce a joint model linking neutral community assembly and comparative phylogeography to generate both community‐level richness, abundance and genetic variation under a neutral model, capturing both equilibrium and non‐equilibrium dynamics.

    Location

    Global.

    Methods

    Our model combines a forward‐time individual‐based community assembly process with a rescaled backward‐time neutral coalescent model of multi‐taxa population genetics. We explore general dynamics of genetic and abundance‐based summary statistics and use approximate Bayesian computation (ABC) to estimate parameters underlying the model of island community assembly. Finally, we demonstrate two applications of the model using community‐scale mtDNA sequence data and densely sampled abundances of an arachnid community on La Réunion. First, we use genetic data alone to estimate a summary of the abundance distribution, ground‐truthing this against the observed abundances. Then, we jointly use the observed genetic data and abundances to estimate the proximity of the community to equilibrium.

    Results

    Simulation experiments of our ABC procedure demonstrate that coupling abundance with genetic data leads to improved accuracy and precision of model parameter estimates compared with using abundance‐only data. We further demonstrate reasonable precision and accuracy in estimating a metric underlying the shape of the abundance distribution, temporal progress towards local equilibrium and several key parameters of the community assembly process. For the insular arachnid assemblage, we find the joint distribution of genetic diversity and abundance approaches equilibrium expectations, and that the Shannon entropy of the observed abundances can be estimated using genetic data alone.

    Main conclusions

    The framework that we present unifies neutral community assembly and comparative phylogeography to characterize the community‐level distribution of both abundance and genetic variation through time, providing a resource that should greatly enhance understanding of both the processes structuring ecological communities and the associated aggregate demographic histories.

     
  3. Abstract Aim

    Intraspecific genetic variation is key for adaptation and survival in changing environments and is known to be influenced by many factors, including population size, dispersal and life‐history traits. We investigated genetic variation within Neotropical amphibian species to provide insights into how natural history traits, phylogenetic relatedness, climatic and geographic characteristics can explain intraspecific genetic diversity.

    Location

    Neotropics.

    Taxon

    Amphibians.

    Methods

    We assembled data sets using open‐access databases for natural history traits, genetic sequences, phylogenetic trees, climatic and geographic data. For each species, we calculated overall nucleotide diversity (π) and tested for isolation by distance (IBD) and isolation by environment (IBE). We then identified predictors of π, IBD and IBE using random forest (RF) regression or RF classification. We also fitted phylogenetic generalized linear mixed models (PGLMMs) to predict π, IBD and IBE.

    Results

    We compiled 4052 mitochondrial DNA sequences from 256 amphibian species (230 frogs and 26 salamanders), georeferencing 2477 sequences from 176 species that were not linked to occurrence data. RF regressions and PGLMMs were congruent in identifying range size and precipitation (σ) as the most important predictors of π, influencing it positively. RF classification and PGLMMs identified minimum elevation as an important predictor of IBD; most species without IBD tended to occur at higher elevations. Maximum latitude and precipitation (σ) were the best predictors of IBE, and most species without IBE occur at lower latitudes and in areas with more variable precipitation.

    Main Conclusions

    This study identified predictors of genetic variation in Neotropical amphibians using both machine learning and phylogenetic methods. This approach was valuable to determine which predictors were congruent between methods. We found that species with small ranges or living in zones with less variable precipitation tended to have low genetic diversity. We also showed that Western Mesoamerica, Andes and Atlantic Forest biogeographic units harbour high diversity across many species that should be prioritized for protection. These results could play a key role in the development of conservation strategies for Neotropical amphibians.

     
  4. Abstract

    Understanding the genomic consequences of population decline is important for predicting species' vulnerability to intensifying global change. Empirical information about genomic changes in populations in the early stages of decline, especially for those still experiencing immigration, remains scarce. We used 7834 autosomal SNPs and demographic data for 288 Florida scrub jays (Aphelocoma coerulescens; FSJ) sampled in 2000 and 2008 to compare levels of genetic diversity, inbreeding, relatedness, and lengths of runs of homozygosity (ROH) between two subpopulations that are within dispersal distance of one another but have experienced contrasting demographic trajectories. At Archbold Biological Station (ABS), the FSJ population has been stable because of consistent habitat protection and management, while at nearby Placid Lakes Estates (PLE), the population declined precipitously due to suburban development. By the onset of our sampling in 2000, birds in PLE were already less heterozygous, more inbred, and on average more related than birds in ABS. No significant changes occurred in heterozygosity or inbreeding across the 8‐year sampling interval, but average relatedness among individuals decreased in PLE, so that by 2008 average relatedness did not differ between sites. PLE harbored a similar proportion of short ROH but a greater proportion of long ROH than ABS, suggesting one continuous population of shared demographic history in the past, which is now experiencing more recent inbreeding. These results broadly uphold the predictions of simple population genetic models based on inferred effective population sizes and rates of immigration. Our study highlights how, in just a few generations, formerly continuous populations can diverge in heterozygosity and levels of inbreeding with severe local population decline despite ongoing gene flow.

     
  5. Abstract Objectives

    We examined autosomal genome‐wide SNPs and Y‐chromosome data from 15 Siberian and 12 reference populations to study the affinities of Siberian populations, and to address hypotheses about the origin of the Samoyed peoples.

    Methods

    Samples were genotyped for 567 096 autosomal SNPs and 147 Y‐chromosome polymorphic sites. For several analyses, we used 281 093 SNPs from the intersection of our data with publicly available ancient Siberian samples. To examine genetic relatedness among populations, we applied PCA, FST, TreeMix, and ADMIXTURE analyses. To explore the potential effect of demography and evolutionary processes, the distributions of ROH and IBD sharing within populations were studied.

    Results

    Analyses of autosomal and Y‐chromosome data reveal high differentiation of the Siberian groups. The Siberian populations have a large proportion of their genome in ROH and IBD segments. Several populations (i.e., Nganasans, Evenks, Yukagirs, and Koryaks) do not appear to have experienced admixture with other Siberian populations (i.e., producing only positive f3), while for the other tested populations the composition of mixing sources always included Nganasans or Evenks. The Nganasans from the Taymyr Peninsula demonstrate the greatest level of shared shorter ROH and IBD with nearly all other Siberian populations.

    Conclusions

    Autosomal SNP and Y‐chromosome data demonstrate that Samoyedic populations differ significantly in their genetic composition. Genetic relationship is observed only between Forest and Tundra Nentsi. Selkups are affiliated with the Kets from the Yenisey River, while the Nganasans are separated from their linguistic neighbors, showing closer affinities with the Evenks and Yukagirs.

     