Title: A rarefaction approach for measuring population differences in rare and common variation
Abstract: In studying allele-frequency variation across populations, it is often convenient to classify an allelic type as “rare,” with nonzero frequency less than or equal to a specified threshold, “common,” with a frequency above the threshold, or entirely unobserved in a population. When sample sizes differ across populations, however, especially if the threshold separating “rare” and “common” corresponds to a small number of observed copies of an allelic type, discreteness effects can lead a sample from one population to possess substantially more rare allelic types than a sample from another population, even if the two populations have extremely similar underlying allele-frequency distributions across loci. We introduce a rarefaction-based sample-size correction for use in comparing rare and common variation across multiple populations whose sample sizes potentially differ. We use our approach to examine rare and common variation in worldwide human populations, finding that the sample-size correction introduces subtle differences relative to analyses that use the full available sample sizes. We introduce several ways in which the rarefaction approach can be applied: we explore the dependence of allele classifications on subsample sizes, we permit more than two classes of allelic types of nonzero frequency, and we analyze rare and common variation in sliding windows along the genome. The results can assist in clarifying similarities and differences in allele-frequency patterns across populations.
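To make the idea of a rarefaction-based sample-size correction concrete, the following is a minimal sketch and not the authors' implementation. It assumes that rarefaction means drawing a subsample of fixed size without replacement from the observed allele copies, so that the number of copies of an allelic type in the subsample is hypergeometrically distributed. The function name `class_probabilities` and the parameter `rare_max` (the largest subsample count still treated as "rare") are hypothetical and chosen only for illustration.

```python
from math import comb


def class_probabilities(count, sample_size, subsample_size, rare_max):
    """Return (P(unobserved), P(rare), P(common)) for one allelic type.

    count          -- copies of the allelic type in the full sample
    sample_size    -- total allele copies sampled from the population
    subsample_size -- common size to which every population is rarefied
    rare_max       -- assumed threshold: 1..rare_max copies count as "rare"
    """
    if not 0 <= count <= sample_size:
        raise ValueError("count must lie between 0 and sample_size")
    if not 0 < subsample_size <= sample_size:
        raise ValueError("subsample_size must lie between 1 and sample_size")

    total = comb(sample_size, subsample_size)

    def p_exact(k):
        # Hypergeometric probability of exactly k copies in the subsample.
        return comb(count, k) * comb(sample_size - count, subsample_size - k) / total

    p_unobserved = p_exact(0)
    upper = min(rare_max, count, subsample_size)
    p_rare = sum(p_exact(k) for k in range(1, upper + 1))
    # Clamp at zero to guard against tiny negative floating-point residue.
    p_common = max(0.0, 1.0 - p_unobserved - p_rare)
    return p_unobserved, p_rare, p_common


if __name__ == "__main__":
    # Same sample frequency (1%) in two samples of different size,
    # both rarefied to 50 copies with "rare" meaning a single copy.
    print(class_probabilities(count=1, sample_size=100, subsample_size=50, rare_max=1))
    print(class_probabilities(count=2, sample_size=200, subsample_size=50, rare_max=1))
```

Under these assumptions, summing the three probabilities over loci and allelic types would give expected numbers of unobserved, rare, and common types at a common subsample size, which is one way to compare populations whose original sample sizes differ.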
Award ID(s): 2116322
PAR ID: 10416153
Publisher / Repository: Oxford University Press
Journal Name: GENETICS
Volume: 224
Issue: 2
ISSN: 1943-2631
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract Recurrent mutation produces multiple copies of the same allele which may be co-segregating in a population. Yet, most analyses of allele-frequency or site-frequency spectra assume that all observed copies of an allele trace back to a single mutation. We develop a sampling theory for the number of latent mutations in the ancestry of a rare variant, specifically a variant observed in relatively small count in a large sample. Our results follow from the statistical independence of low-count mutations, which we show to hold for the standard neutral coalescent or diffusion model of population genetics as well as for more general coalescent trees. For populations of constant size, these counts are distributed like the number of alleles in the Ewens sampling formula. We develop a Poisson sampling model for populations of varying size and illustrate it using new results for site-frequency spectra in an exponentially growing population. We apply our model to a large data set of human SNPs and use it to explain dramatic differences in site-frequency spectra across the range of mutation rates in the human genome. 
  2. Abstract Allele-sharing statistics for a genetic locus measure the dissimilarity between two populations as a mean of the dissimilarity between random pairs of individuals, one from each population. Owing to within-population variation in genotype, allele-sharing dissimilarities can have the property that they have a nonzero value when computed between a population and itself. We consider the mathematical properties of allele-sharing dissimilarities in a pair of populations, treating the allele frequencies in the two populations parametrically. Examining two formulations of allele-sharing dissimilarity, we obtain the distributions of within-population and between-population dissimilarities for pairs of individuals. We then mathematically explore the scenarios in which, for certain allele-frequency distributions, the within-population dissimilarity – the mean dissimilarity between randomly chosen members of a population – can exceed the dissimilarity between two populations. Such scenarios assist in explaining observations in population-genetic data that members of a population can be empirically more genetically dissimilar from each other on average than they are from members of another population. For a population pair, however, the mathematical analysis finds that at least one of the two populations always possesses smaller within-population dissimilarity than the value of the between-population dissimilarity. We illustrate the mathematical results with an application to human population-genetic data. 
  3. The ways in which genetic variation is distributed within and among populations is a key determinant of the evolutionary features of a species. However, most comprehensive studies of these features have been restricted to studies of subdivision in settings known to have been driven by local adaptation, leaving our understanding of the natural dispersion of allelic variation less than ideal. Here, we present a geographic population-genomic analysis of 10 populations of the freshwater microcrustacean Daphnia pulex, an emerging model system in evolutionary genomics. These populations exhibit a pattern of moderate isolation-by-distance, with an average migration rate of 0.6 individuals per generation, and average effective population sizes of ∼650,000 individuals. Most populations contain numerous private alleles, and genomic scans highlight the presence of islands of excessively high population subdivision for more common alleles. A large fraction of such islands of population divergence likely reflect historical neutral changes, including rare stochastic migration and hybridization events. The data do point to local adaptive divergence, although the precise nature of the relevant variation is diffuse and cannot be associated with particular loci, despite the very large sample sizes involved in this study. In contrast, an analysis of between-species divergence highlights positive selection operating on a large set of genes with functions nearly nonoverlapping with those involved in local adaptation, in particular ribosome structure, mitochondrial bioenergetics, light reception and response, detoxification, and gene regulation. These results set the stage for using D. pulex as a model for understanding the relationship between molecular and cellular evolution in the context of natural environments. 
  4. Abstract Deleterious variants are selected against but can linger in populations at low frequencies for long periods of time, decreasing fitness and contributing to disease burden in humans and other species. Deleterious variants occur at low frequency but distinguishing deleterious variants from low‐frequency neutral variation is challenging based on population genomics data alone. As a result, we have little sense of the number and identity of deleterious variants in wild populations. For haplodiploid species, it has been hypothesised that deleterious alleles will be directly exposed to selection in haploid males, but selection can be masked in diploid females when deleterious variants are recessive, resulting in more efficient purging of deleterious mutations in males. Therefore, comparisons of the differences between haploid and diploid genomes from the same population may be a useful method for inferring rare deleterious variants. This study provides the first formal test of this hypothesis. Using wild populations of Northern paper wasps (Polistes fuscatus), we find that males have fewer missense and nonsense variants per generation than females from the same population. Allele frequency differences are especially pronounced for rare missense and nonsense variants and these differences lead to a lower mutational load in males than females. Based on these data we infer that many highly deleterious mutations are segregating in the paper wasp population. Stronger selection against deleterious alleles in haploid males may have implications for adaptation in other haplodiploid insects and provides evidence that wild populations harbour abundant deleterious variants. 
  5. Abstract Effective population size affects the efficacy of selection, rate of evolution by drift and neutral diversity levels. When species are subdivided into multiple populations connected by gene flow, evolutionary processes can depend on global or local effective population sizes. Theory predicts that high levels of diversity might be maintained by gene flow, even very low levels of gene flow, consistent with species long‐term effective population size, but tests of this idea are mostly lacking. Here, we show that Lycaeides butterfly populations maintain low contemporary (variance) effective population sizes (e.g. ~200 individuals) and thus evolve rapidly by genetic drift. However, populations harboured high levels of genetic diversity consistent with an effective population size several orders of magnitude larger. We hypothesized that the differences in the magnitude and variability of contemporary versus long‐term effective population sizes were caused by gene flow of sufficient magnitude to maintain diversity but only subtly affect evolution on generational timescales. Consistent with this hypothesis, we detected low but nontrivial gene flow among populations. Furthermore, using short‐term population‐genomic time‐series data, we documented patterns consistent with predictions from this hypothesis, including a weak but detectable excess of evolutionary change in the direction of the mean (migrant gene pool) allele frequencies across populations and consistency in the direction of allele frequency change over time. The documented decoupling of diversity levels and short‐term change by drift in Lycaeides has implications for our understanding of contemporary evolution and the maintenance of genetic variation in the wild.