Title: Privacy-aware estimation of relatedness in admixed populations
Abstract

Background

Estimation of genetic relatedness, or kinship, is used occasionally for recreational purposes and in forensic applications. While numerous methods have been developed to estimate kinship, they suffer from high computational requirements and often make the untenable assumption that the samples share a homogeneous population ancestry. Moreover, genetic privacy is generally overlooked when kinship estimation methods are used. There can be ethical concerns about finding unknown familial relationships in third-party databases. Similar concerns may arise when estimating and reporting sensitive population-level statistics, such as inbreeding coefficients, because of the risk of marginalization and stigmatization.

Results

Here, we present SIGFRIED, which uses existing reference panels with a projection-based approach that simplifies kinship estimation in admixed populations. We use simulated and real datasets to demonstrate the accuracy and efficiency of kinship estimation. We present a secure federated kinship estimation framework and implement a secure kinship estimator using homomorphic encryption-based primitives for computing relatedness between samples at two different sites while genotype data are kept confidential. Source code and documentation for our methods can be found at https://doi.org/10.5281/zenodo.7053352.
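To make the idea of ancestry-adjusted kinship concrete, the following is a minimal plaintext sketch and not the SIGFRIED implementation. It assumes that per-individual ancestry proportions and reference-panel allele frequencies are already available, and it uses a generic ratio-of-sums estimator with individual-specific allele frequencies. The function names `individual_allele_frequencies` and `kinship`, the `eps` clipping value, and the toy data are all hypothetical choices made for illustration.

```python
import math


def individual_allele_frequencies(ancestry, pop_freqs, eps=1e-6):
    """Admixture-weighted allele frequency at each variant for one individual.

    ancestry  -- K ancestry proportions summing to 1 (assumed to be given)
    pop_freqs -- K lists, each holding M reference-panel allele frequencies
    eps       -- hypothetical clipping bound that keeps frequencies off 0 and 1
    """
    n_variants = len(pop_freqs[0])
    mu = []
    for s in range(n_variants):
        f = sum(q * freqs[s] for q, freqs in zip(ancestry, pop_freqs))
        mu.append(min(max(f, eps), 1.0 - eps))
    return mu


def kinship(g_i, g_j, mu_i, mu_j):
    """Ratio-of-sums kinship estimate from genotypes coded 0/1/2.

    Each genotype is centered by twice the individual-specific allele
    frequency; the denominator scales by the expected genotype variance.
    """
    num = sum((a - 2.0 * ma) * (b - 2.0 * mb)
              for a, b, ma, mb in zip(g_i, g_j, mu_i, mu_j))
    den = 4.0 * sum(math.sqrt(ma * (1.0 - ma) * mb * (1.0 - mb))
                    for ma, mb in zip(mu_i, mu_j))
    return num / den


# Toy example: two reference populations, four variants, one admixed individual.
pop_freqs = [[0.2, 0.5, 0.8, 0.5], [0.8, 0.5, 0.2, 0.5]]
mu = individual_allele_frequencies([0.5, 0.5], pop_freqs)
genotypes = [0, 2, 0, 2]
self_kinship = kinship(genotypes, genotypes, mu, mu)
```

Centering by individual-specific rather than pooled allele frequencies is what keeps shared ancestry from being mistaken for recent relatedness. In a federated setting, the sums in `kinship` are the quantities that would have to be computed without revealing the genotypes themselves.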

Conclusions

Analysis of relatedness is fundamentally important for identifying relatives, for association studies, and for population-level estimation of inbreeding. As awareness of individual and group genomic privacy grows, privacy-preserving methods for the estimation of relatedness are needed. The presented methods alleviate ethical and privacy concerns in the analysis of relatedness in admixed, historically isolated and underrepresented populations.

Short Abstract

Genetic relatedness is a central quantity used for finding relatives in databases, correcting biases in genome-wide association studies and estimating population-level statistics. Methods for estimating genetic relatedness have high computational requirements, and occasionally do not consider individuals with admixed ancestry. Furthermore, the ethical concerns around using genetic data and calculating relatedness are not considered. We present a projection-based approach that can efficiently and accurately estimate kinship. We implement our method using encryption-based techniques that provide provable security guarantees to protect genetic data while kinship statistics are computed among multiple sites.

 
NSF-PAR ID:
10380546
Publisher / Repository:
Oxford University Press
Journal Name:
Briefings in Bioinformatics
Volume:
23
Issue:
6
ISSN:
1467-5463
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Kinship plays a fundamental role in the evolution of social systems and is considered a key driver of group living. To understand the role of kinship in the formation and maintenance of social bonds, accurate measures of genetic relatedness are critical. Genotype‐by‐sequencing technologies are rapidly advancing the accuracy and precision of genetic relatedness estimates for wild populations. The ability to assign kinship from genetic data varies depending on a species’ or population's mating system and pattern of dispersal, and empirical data from longitudinal studies are crucial to validate these methods. We use data from a long‐term behavioural study of a polygynandrous, bisexually philopatric marine mammal to measure accuracy and precision of parentage and genetic relatedness estimation against a known partial pedigree. We show that with moderate but obtainable sample sizes of approximately 4,235 SNPs and 272 individuals, highly accurate parentage assignments and genetic relatedness coefficients can be obtained. Additionally, we subsample our data to quantify how data availability affects relatedness estimation and kinship assignment. Lastly, we conduct a social network analysis to investigate the extent to which accuracy and precision of relatedness estimation improve statistical power to detect an effect of relatedness on social structure. Our results provide practical guidance for minimum sample sizes and sequencing depth for future studies, as well as thresholds for post hoc interpretation of previous analyses.

     
  2. Kinship relationship estimation plays a significant role in today's genome studies. Since genetic data are mostly stored and protected in different silos, retrieving the desirable kinship relationships across federated data warehouses is a non-trivial problem. The ability to identify and connect related individuals is important for both research and clinical applications. In this work, we propose a new privacy-preserving kinship relationship estimation framework: Incremental Update Kinship Identification (INK). The proposed framework includes three key components that allow us to control the balance between privacy and accuracy (of kinship estimation): an incremental process coupled with the use of auxiliary information and informative scores. Our empirical evaluation shows that INK can achieve higher kinship identification correctness while exposing fewer genetic markers. 
  3. Abstract

    The degree to which individuals inbreed is a fundamental aspect of population biology shaped by both passive and active processes. Yet, the relative influences of random and non‐random mating on the overall magnitude of inbreeding are not well characterized for many taxa. We quantified variation in inbreeding among qualitatively accessible and isolated populations of a sessile marine invertebrate (the colonial ascidian Lissoclinum verrilli) in which hermaphroditic colonies cast sperm into the water column for subsequent uptake and internal fertilization. We compared estimates of inbreeding to simulations predicting random mating within sites to evaluate if levels of inbreeding were (1) less than expected because of active attempts to limit inbreeding, (2) as predicted by genetic subdivision and passive inbreeding tolerance, or (3) greater than simulations due to active attempts to promote inbreeding via self‐fertilization or a preference for related mates. We found evidence of restricted gene flow and significant differences in the genetic diversity of L. verrilli colonies among sites, indicating that on average colonies were weakly related in accessible locations, but their levels of relatedness matched that of first cousins or half‐siblings on isolated substrates. Irrespective of population size, progeny arrays revealed variation in the magnitude of inbreeding across sites that tracked with the mean relatedness of conspecifics. Biparental reproduction was confirmed in most offspring (86%) and estimates of total inbreeding largely overlapped with simulations of random mating, suggesting that interpopulation variation in mother–offspring resemblance was primarily due to genetic subdivision and passive tolerance of related mates.
Our results highlight the influence of demographic isolation on the genetic composition of populations, and support theory predicting that tolerance of biparental inbreeding, even when mates are closely related, may be favoured under a broad set of ecological and evolutionary conditions.

     
  4. Abstract

    Understanding the genomic consequences of population decline is important for predicting species' vulnerability to intensifying global change. Empirical information about genomic changes in populations in the early stages of decline, especially for those still experiencing immigration, remains scarce. We used 7834 autosomal SNPs and demographic data for 288 Florida scrub jays (Aphelocoma coerulescens; FSJ) sampled in 2000 and 2008 to compare levels of genetic diversity, inbreeding, relatedness, and lengths of runs of homozygosity (ROH) between two subpopulations that are within dispersal distance of one another but have experienced contrasting demographic trajectories. At Archbold Biological Station (ABS), the FSJ population has been stable because of consistent habitat protection and management, while at nearby Placid Lakes Estates (PLE), the population declined precipitously due to suburban development. By the onset of our sampling in 2000, birds in PLE were already less heterozygous, more inbred, and on average more related than birds in ABS. No significant changes occurred in heterozygosity or inbreeding across the 8‐year sampling interval, but average relatedness among individuals decreased in PLE, so that by 2008 average relatedness did not differ between sites. PLE harbored a similar proportion of short ROH but a greater proportion of long ROH than ABS, suggesting one continuous population of shared demographic history in the past, which is now experiencing more recent inbreeding. These results broadly uphold the predictions of simple population genetic models based on inferred effective population sizes and rates of immigration. Our study highlights how, in just a few generations, formerly continuous populations can diverge in heterozygosity and levels of inbreeding with severe local population decline despite ongoing gene flow.

     
  5. Abstract

    For species of management concern, accurate estimates of inbreeding and associated consequences on reproduction are crucial for predicting their future viability. However, few studies have partitioned this aspect of genetic viability with respect to reproduction in a group-living social mammal. We investigated the contributions of foundation stock lineages, putative fitness consequences of inbreeding, and genetic diversity of the breeding versus nonreproductive segment of the Yellowstone National Park gray wolf population. Our dataset spans 25 years and seven generations since reintroduction, encompassing 152 nuclear families and 329 litters. We found more than 87% of the pedigree foundation genomes persisted and report influxes of allelic diversity from two translocated wolves from a divergent source in Montana. As expected for group-living species, mean kinship significantly increased over time but with minimal loss of observed heterozygosity. Strikingly, the reproductive portion of the population carried significantly lower genome-wide inbreeding coefficients and autozygosity, and showed more rapid decay of linkage disequilibrium, relative to the nonbreeding population. Breeding wolves had significantly longer lifespans and lower inbreeding coefficients than nonbreeding wolves. Our model revealed that the number of litters was significantly negatively associated with heterozygosity (R = −0.11). Our findings highlight genetic contributions to fitness, and the importance of the reproductively active individuals in a population to counteract loss of genetic variation in a wild, free-ranging social carnivore. It is crucial for managers to mitigate factors that significantly reduce effective population size and genetic connectivity, which supports the dispersion of genetic variation that aids in rapid evolutionary responses to environmental challenges.

     