Coalescent methods are proven and powerful tools for population genetics, phylogenetics, epidemiology, and other fields. Coalescent hidden Markov model (coalHMM) methods are a promising avenue for the analysis of large genomic alignments, which are increasingly common, but these methods have lacked general usability and flexibility. We introduce a novel method for automatically learning a coalHMM and inferring the posterior distributions of evolutionary parameters using black-box variational inference, with the transition rates between local genealogies derived empirically by simulation. This derivation enables our method to work directly with three or four taxa and through a divide-and-conquer approach with more taxa. Using a simulated data set resembling a human–chimp–gorilla scenario, we show that our method has accuracy comparable to or better than that of previous coalHMM methods. Both species divergence times and population sizes were accurately inferred. The method also infers local genealogies, and we report on their accuracy. Furthermore, we discuss a potential direction for scaling the method to larger data sets through a divide-and-conquer approach. This accuracy means our method is useful now, and because it derives transition rates by simulation, it is flexible enough to enable future implementations of various population models.
Population divergence time estimation using individual lineage label switching
Divergence time estimation from multilocus genetic data has become common in population genetics and phylogenetics. We present a new Bayesian inference method that treats the divergence time as a random variable. The divergence time is calculated from an assembly of splitting events on individual lineages in a genealogy. The time for such a splitting event is drawn from a hazard function of the truncated normal distribution. This allows easy integration into the standard coalescence framework used in programs such as Migrate. We explore the accuracy of the new inference method with simulated population splittings over a wide range of divergence time values and with a reanalysis of a dataset of 5 populations consisting of 3 present-day populations (Africans, Europeans, Asians) and 2 archaic samples (Altai and Ust'-Ishim). Evaluations of simple divergence models without subsequent gene flow show high accuracy, whereas the accuracy of the results of isolation with migration models depends on the magnitude of the immigration rate. High immigration rates lead to a time of the most recent common ancestor of the sample that, looking backward in time, predates the divergence time. Even with many independent loci, accurate estimation of the divergence time with high immigration rates becomes problematic. Our comparison to other software tools reveals that our lineage-switching method, implemented in Migrate, is comparable to IMa2p. The software Migrate can run large numbers of sequence loci (>1,000) on computer clusters in parallel.
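As a rough illustration of the hazard function mentioned in the abstract, the sketch below evaluates h(t) = f(t) / S(t) for a normal distribution truncated to t >= lower. It is not taken from Migrate. The function name, the parameters mu, sigma and lower, and the choice of truncation from below at zero are all assumptions made for this example; the abstract does not specify how the distribution is parameterized or truncated.

```python
import math


def truncated_normal_hazard(t, mu, sigma, lower=0.0):
    """Hazard h(t) = f(t) / S(t) of a normal(mu, sigma) truncated to [lower, inf).

    Hypothetical helper for illustration only; the parameterization is assumed.
    """
    if t < lower:
        # No probability mass below the truncation point.
        return 0.0
    z = (t - mu) / sigma
    # Density and survival of the untruncated normal. The truncation
    # constant 1 - F(lower) scales both by the same factor, so it
    # cancels in the ratio for every t >= lower.
    density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    survival = 0.5 * math.erfc(z / math.sqrt(2.0))
    if survival == 0.0:
        # Numerical underflow far in the upper tail.
        return math.inf
    return density / survival


if __name__ == "__main__":
    # Print the hazard at a few time points for an assumed mean split time of 1.0.
    for time_point in (0.5, 1.0, 2.0):
        print(time_point, truncated_normal_hazard(time_point, mu=1.0, sigma=1.0))
```

Under these assumptions the hazard rises with t, so a lineage that has not yet switched populations becomes increasingly likely to do so as time passes the assumed mean.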
- Award ID(s): 2109989
- PAR ID: 10365402
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: G3 Genes|Genomes|Genetics
- Volume: 12
- Issue: 4
- ISSN: 2160-1836
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- We present an integrated open population model where the population dynamics are defined by a differential equation, and the related statistical model utilizes a Poisson binomial convolution likelihood. Key advantages of the proposed approach over existing open population models include the flexibility to predict related, but unobserved quantities such as total immigration or emigration over a specified time period, and more computationally efficient posterior simulation by elimination of the need to explicitly simulate latent immigration and emigration. The viability of the proposed method is shown in an in-depth analysis of outdoor recreation participation on public lands, where the surveyed populations changed rapidly and demographic population closure cannot be assumed even within a single day.
- Evolution by natural selection may be effective enough to allow for recurrent, rapid adaptation to distinct niche environments within a well-mixed population. For this to occur, selection must act on standing genetic variation such that mortality, i.e., genetic load, is minimized while polymorphism is maintained. Selection on multiple, redundant loci of small effect provides a potentially inexpensive solution. Yet, demonstrating adaptation via redundant, polygenic selection in the wild remains extremely challenging because low per-locus effect sizes and high genetic redundancy severely reduce statistical power. One approach to facilitate identification of loci underlying polygenic selection is to harness natural replicate populations experiencing similar selection pressures that harbor high within-, yet negligible among-population genetic variation. Such populations can be found among the teleost Fundulus heteroclitus. F. heteroclitus inhabits salt marsh estuaries that are characterized by high environmental heterogeneity, e.g., tidal ponds, creeks, and coastal basins. Here, we sample four of these heterogeneous niches (one coastal basin and three replicate tidal ponds) at two time points from among a single, panmictic F. heteroclitus population. We identify 10,861 single nucleotide polymorphisms using a genotyping-by-sequencing approach and quantify temporal allele frequency change within, as well as spatial divergence among, subpopulations residing in these niches. We find a significantly elevated number of concordant allele frequency changes among all subpopulations, suggesting ecosystem-wide adaptation to a common selection pressure. Remarkably, we also find an unexpected number of temporal allele frequency changes that generate fine-scale divergence among subpopulations, suggestive of local adaptation to distinct niche environments. Both patterns are characterized by a lack of large-effect loci yet an elevated total number of significant loci. Adaptation via redundant, polygenic selection offers a likely explanation for these patterns as well as a potential mechanism for polymorphism maintenance in the F. heteroclitus system.
- A challenge to understanding biological diversification is accounting for community-scale processes that cause multiple, co-distributed lineages to co-speciate. Such processes predict non-independent, temporally clustered divergences across taxa. Approximate-likelihood Bayesian computation (ABC) approaches to inferring such patterns from comparative genetic data are very sensitive to prior assumptions and often biased toward estimating shared divergences. We introduce a full-likelihood Bayesian approach, ecoevolity, which takes full advantage of information in genomic data. By analytically integrating over gene trees, we are able to directly calculate the likelihood of the population history from genomic data, and efficiently sample the model-averaged posterior via Markov chain Monte Carlo algorithms. Using simulations, we find that the new method is much more accurate and precise at estimating the number and timing of divergence events across pairs of populations than existing approximate-likelihood methods. Our full Bayesian approach also requires several orders of magnitude less computational time than existing ABC approaches. We find that despite assuming unlinked characters (e.g., unlinked single-nucleotide polymorphisms), the new method performs better if this assumption is violated in order to retain the constant characters of whole linked loci. In fact, retaining constant characters allows the new method to robustly estimate the correct number of divergence events with high posterior probability in the face of character-acquisition biases, which commonly plague loci assembled from reduced-representation genomic libraries. We apply our method to genomic data from four pairs of insular populations of Gekko lizards from the Philippines that are not expected to have co-diverged. Despite all four pairs diverging very recently, our method strongly supports that they diverged independently, and these results are robust to very disparate prior assumptions.
- In today’s rapidly changing world, it is critical to examine how animal populations will respond to severe environmental change. Following events such as pollution or deforestation that cause populations to decline, extinction will occur unless populations can adapt in response to natural selection, a process called evolutionary rescue. Theory predicts that immigration can delay extinction and provide novel genetic material that can prevent inbreeding depression and facilitate adaptation. However, when potential source populations have not experienced the new environment before (i.e., are naive), immigration can counteract selection and constrain adaptation. This study evaluated the effects of immigration of naive individuals on evolutionary rescue using the red flour beetle, Tribolium castaneum, as a model system. Small populations were exposed to a challenging environment, and 3 immigration rates (0, 1, or 5 migrants per generation) were implemented with migrants from a benign environment. Following an initial decline in population size across all treatments, populations receiving no immigration gained a higher growth rate one generation earlier than those with immigration, illustrating the constraining effects of immigration on adaptation. After 7 generations, a reciprocal transplant experiment found evidence for adaptation regardless of immigration rate. Thus, while the immigration of naive individuals briefly delayed adaptation, it did not increase extinction risk or prevent adaptation following environmental change.
