Title: Recurrent mutation in the ancestry of a rare variant
Abstract Recurrent mutation produces multiple copies of the same allele which may be co-segregating in a population. Yet, most analyses of allele-frequency or site-frequency spectra assume that all observed copies of an allele trace back to a single mutation. We develop a sampling theory for the number of latent mutations in the ancestry of a rare variant, specifically a variant observed in relatively small count in a large sample. Our results follow from the statistical independence of low-count mutations, which we show to hold for the standard neutral coalescent or diffusion model of population genetics as well as for more general coalescent trees. For populations of constant size, these counts are distributed like the number of alleles in the Ewens sampling formula. We develop a Poisson sampling model for populations of varying size and illustrate it using new results for site-frequency spectra in an exponentially growing population. We apply our model to a large data set of human SNPs and use it to explain dramatic differences in site-frequency spectra across the range of mutation rates in the human genome.
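As an illustration of the constant-size result mentioned in the abstract, the following is a minimal sketch of the standard distribution of the number of alleles K under the Ewens sampling formula. It uses the well-known representation of K as a sum of independent Bernoulli trials with success probabilities theta / (theta + i - 1) for i = 1, ..., n. The function name `ewens_num_alleles_pmf` and the reading of `n` as the variant's observed count and `theta` as a scaled mutation rate are assumptions made for illustration, not the authors' stated parameterization.

```python
def ewens_num_alleles_pmf(n, theta):
    """Return [P(K=0), ..., P(K=n)] for the number of alleles K
    in a sample of size n under the Ewens sampling formula.

    Hypothetical helper for illustration: K is built up as a sum of
    independent Bernoulli(theta / (theta + i - 1)) trials, i = 1..n.
    """
    if n < 1 or theta <= 0:
        raise ValueError("n must be >= 1 and theta must be > 0")
    pmf = [1.0]  # before any gene is sampled, K = 0 with probability 1
    for i in range(1, n + 1):
        p_new = theta / (theta + i - 1)  # chance the i-th gene is a new allele
        nxt = [0.0] * (len(pmf) + 1)
        for k, prob in enumerate(pmf):
            nxt[k] += prob * (1.0 - p_new)  # i-th gene copies an existing allele
            nxt[k + 1] += prob * p_new      # i-th gene adds a new allele
        pmf = nxt
    return pmf


if __name__ == "__main__":
    # Example: sample size 2, theta = 1
    print(ewens_num_alleles_pmf(2, 1.0))  # prints [0.0, 0.5, 0.5]
```

The first trial always succeeds (probability theta / theta = 1), so K is at least 1; small values of theta concentrate the distribution on K = 1, which corresponds to the usual single-mutation assumption.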
Award ID(s):
2152103 2534011
PAR ID:
10429652
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
GENETICS
Volume:
224
Issue:
3
ISSN:
1943-2631
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Synopsis Understanding recent population trends is critical to quantifying species vulnerability and implementing effective management strategies. To evaluate the accuracy of genomic methods for quantifying recent declines (beginning <120 generations ago), we simulated genomic data using forward-time methods (SLiM) coupled with coalescent simulations (msprime) under a number of demographic scenarios. We evaluated both site frequency spectrum (SFS)-based methods (momi2, Stairway Plot) and methods that employ linkage disequilibrium information (NeEstimator, GONE) with a range of sampling schemes (contemporary-only samples, sampling two time points, and serial sampling) and data types (RAD-like data and whole-genome sequencing). GONE and momi2 performed best overall, with >80% power to detect severe declines with large sample sizes. Two-sample and serial sampling schemes could accurately reconstruct changes in population size, and serial sampling was particularly valuable for making accurate inferences when genotyping errors or minor allele frequency cutoffs distort the SFS or under model mis-specification. However, sampling only contemporary individuals provided reliable inferences about contemporary size and size change using either site frequency or linkage-based methods, especially when large sample sizes or whole genomes from contemporary populations were available. These findings provide a guide for researchers designing genomics studies to evaluate recent demographic declines. 
  2. Abstract In studying allele-frequency variation across populations, it is often convenient to classify an allelic type as “rare,” with nonzero frequency less than or equal to a specified threshold, “common,” with a frequency above the threshold, or entirely unobserved in a population. When sample sizes differ across populations, however, especially if the threshold separating “rare” and “common” corresponds to a small number of observed copies of an allelic type, discreteness effects can lead a sample from one population to possess substantially more rare allelic types than a sample from another population, even if the two populations have extremely similar underlying allele-frequency distributions across loci. We introduce a rarefaction-based sample-size correction for use in comparing rare and common variation across multiple populations whose sample sizes potentially differ. We use our approach to examine rare and common variation in worldwide human populations, finding that the sample-size correction introduces subtle differences relative to analyses that use the full available sample sizes. We introduce several ways in which the rarefaction approach can be applied: we explore the dependence of allele classifications on subsample sizes, we permit more than two classes of allelic types of nonzero frequency, and we analyze rare and common variation in sliding windows along the genome. The results can assist in clarifying similarities and differences in allele-frequency patterns across populations. 
  3. Abstract The demographic history of a population is important for conservation and evolution, but this history is unknown for many populations. Methods that use genomic data have been developed to infer demography, but they can be challenging to implement and interpret, particularly for large populations. Thus, understanding if and when genetic estimates of demography correspond to true population history is important for assessing the performance of these genetic methods. Here, we used double‐digest restriction‐site associated DNA (ddRAD) sequencing data from archived collections of larval summer flounder (Paralichthys dentatus, n = 279) from three cohorts (1994–1995, 1997–1998 and 2008–2009) along the U.S. East coast to examine how contemporary effective population size and genetic diversity responded to changes in abundance in a natural population. Despite little to no detectable change in genetic diversity, coalescent‐based demographic modelling from site frequency spectra revealed that summer flounder effective population size declined dramatically in the early 1980s. The timing and direction of change corresponded well with the observed decline in spawning stock census abundance in the late 1980s from independent fish surveys. Census abundance subsequently recovered and achieved the prebottleneck size. Effective population size also grew following the bottleneck. Our results for summer flounder demonstrate that genetic sampling and site frequency spectra can be useful for detecting population dynamics, even in species with large effective sizes.
  4. Interpretations of values of the FST measure of genetic differentiation rely on an understanding of its mathematical constraints. Previously, it has been shown that FST values computed from a biallelic locus in a set of multiple populations and FST values computed from a multiallelic locus in a pair of populations are mathematically constrained as a function of the frequency of the allele that is most frequent across populations. We generalize from these cases to report here the mathematical constraint on FST given the frequency M of the most frequent allele at a multiallelic locus in a set of multiple populations. Using coalescent simulations of an island model of migration with an infinitely-many-alleles mutation model, we argue that the joint distribution of FST and M helps in disentangling the separate influences of mutation and migration on FST. Finally, we show that our results explain a puzzling pattern of microsatellite differentiation: the lower FST in an interspecific comparison between humans and chimpanzees than in the comparison of chimpanzee populations. We discuss the implications of our results for the use of FST. This article is part of the theme issue ‘Celebrating 50 years since Lewontin's apportionment of human diversity’.
  5. Barton, N (Ed.)
    Abstract In rapidly evolving populations, numerous beneficial and deleterious mutations can arise and segregate within a population at the same time. In this regime, evolutionary dynamics cannot be analyzed using traditional population genetic approaches that assume that sites evolve independently. Instead, the dynamics of many loci must be analyzed simultaneously. Recent work has made progress by first analyzing the fitness variation within a population, and then studying how individual lineages interact with this traveling fitness wave. However, these “traveling wave” models have previously been restricted to extreme cases where selection on individual mutations is either much faster or much slower than the typical coalescent timescale Tc. In this work, we show how the traveling wave framework can be extended to intermediate regimes in which the scaled fitness effects of mutations (Tcs) are neither large nor small compared to one. This enables us to describe the dynamics of populations subject to a wide range of fitness effects, and in particular, in cases where it is not immediately clear which mutations are most important in shaping the dynamics and statistics of genetic diversity. We use this approach to derive new expressions for the fixation probabilities and site frequency spectra of mutations as a function of their scaled fitness effects, along with related results for the coalescent timescale Tc and the rate of adaptation or Muller’s ratchet. We find that competition between linked mutations can have a dramatic impact on the proportions of neutral and selected polymorphisms, which is not simply summarized by the scaled selection coefficient Tcs. We conclude by discussing the implications of these results for population genetic inferences. 