Abstract We consider a simple diploid population-genetic model with potentially high variability of offspring numbers among individuals. Specifically, against a backdrop of Wright–Fisher reproduction and no selection, there is an additional probability that a big family occurs, meaning that a pair of individuals has a number of offspring on the order of the population size. We study how the pedigree of the population generated under this model affects the ancestral genetic process of a sample of size two at a single autosomal locus without recombination. Our population model is of the type for which multiple-merger coalescent processes have been described. We prove that the conditional distribution of the pairwise coalescence time given the random pedigree converges to a limit law as the population size tends to infinity. This limit law may or may not be the usual exponential distribution of the Kingman coalescent, depending on the frequency of big families. But because it includes the number and times of big families, it differs from the usual multiple-merger coalescent models. The usual multiple-merger coalescent models are seen as describing the ancestral process marginal to, or averaging over, the pedigree. In the limiting ancestral process conditional on the pedigree, the intervals between big families can be modeled using the Kingman coalescent but each big family causes a discrete jump in the probability of coalescence. Analogous results should hold for larger samples and other population models. We illustrate these results with simulations and additional analysis, highlighting their implications for inference and understanding of multilocus data.
This content will become publicly available on February 13, 2026
Multiple merger coalescent inference of effective population size
Variation in a sample of molecular sequence data informs about the past evolutionary history of the sample’s population. Traditionally, Bayesian modelling coupled with the standard coalescent is used to infer the sample’s bifurcating genealogy and demographic and evolutionary parameters such as effective population size and mutation rates. However, there are many situations where binary coalescent models do not accurately reflect the true underlying ancestral processes. Here, we propose a Bayesian non-parametric method for inferring effective population size trajectories from a multifurcating genealogy under the Lambda-coalescent. In particular, we jointly estimate the effective population size and the model parameter for the Beta-coalescent model, a special type of Lambda-coalescent. Finally, we test our methods on simulations and apply them to study various viral dynamics as well as Japanese sardine population size changes over time. The code and vignettes can be found in the phylodyn package. This article is part of the theme issue ‘“A mathematical theory of evolution”: phylogenetic models dating back 100 years’.
- Award ID(s):
- 2143242
- PAR ID:
- 10589235
- Editor(s):
- NA
- Publisher / Repository:
- The Royal Society Publishing
- Date Published:
- Journal Name:
- Philosophical Transactions of the Royal Society B: Biological Sciences
- Volume:
- 380
- Issue:
- 1919
- ISSN:
- 0962-8436
- Subject(s) / Keyword(s):
- Multiple mergers coalescent, Gaussian processes, Lambda-coalescent, Beta-coalescent
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
The dynamic nature of the SIV population during disease progression in the SIV/macaque model of AIDS and the factors responsible for its behavior have not been documented, largely owing to the lack of sufficient spatial and temporal sampling of both viral and host data from SIV-infected animals. In this study, we detail Bayesian coalescent inference of the changing collective intra-host viral effective population size (Ne) from various tissues over the course of infection and its relationship with what we demonstrate is a continuously changing immune cell repertoire within the blood. Although the relative contribution of these factors varied among hosts and time points, the adaptive immune response best explained the overall periodic dynamic behavior of the effective virus population. Data exposing the nature of the relationship between the virus and immune cell populations revealed the plausibility of an eco-evolutionary mathematical model, which was able to mimic the large-scale oscillations in Ne through virus escape from relatively few, early immunodominant responses, followed by slower escape from several subdominant and weakened immune populations. The results of this study suggest that SIV diversity within the untreated host is governed by a predator-prey relationship, wherein differing phases of infection are the result of adaptation in response to varying immune responses. Previous investigations into viral population dynamics using sequence data have focused on single estimates of the effective viral population size (Ne) or point estimates over sparse sampling data to provide insight into the precise impact of immune selection on virus adaptive behavior. Herein, we describe the use of the coalescent phylogenetic framework to estimate the relative changes in Ne over time in order to quantify the relationship with empirical data on the dynamic immune composition of the host.
This relationship has allowed us to expand on earlier simulations to build a predator-prey model that explains the deterministic behavior of the virus over the course of disease progression. We show that sequential viral adaptation can occur in response to phases of varying immune pressure, providing a broader picture of the viral response throughout the entire course of progression to AIDS.
-
Abstract Simultaneous molecular dating of population and species divergences is essential in many biological investigations, including phylogeography, phylodynamics and species delimitation studies. In these investigations, multiple sequence alignments consist of both intra‐ and interspecies samples (mixed samples). As a result, the phylogenetic trees contain interspecies, interpopulation and within‐population divergences. Bayesian relaxed clock methods are often employed in these analyses, but they assume the same tree prior for both inter‐ and intraspecies branching processes and require specification of a clock model for branch rates (independent vs. autocorrelated rates models). We evaluated the impact of a single tree prior on Bayesian divergence time estimates by analysing computer‐simulated data sets. We also examined the effect of the assumption of independence of evolutionary rate variation among branches when the branch rates are autocorrelated. Bayesian approach with coalescent tree priors generally produced excellent molecular dates and highest posterior densities with high coverage probabilities. We also evaluated the performance of a non‐Bayesian method, RelTime, which does not require the specification of a tree prior or a clock model. RelTime's performance was similar to that of the Bayesian approach, suggesting that it is also suitable to analyse data sets containing both populations and species variation when its computational efficiency is needed.
-
Abstract Neutrality tests such as Tajima’s D and Fay and Wu’s H are standard implements in the population genetics toolbox. One of their most common uses is to scan the genome for signals of natural selection. However, it is well understood that D and H are confounded by other evolutionary forces—in particular, population expansion—that may be unrelated to selection. Because they are not model-based, it is not clear how to deconfound these tests in a principled way. In this article, we derive new likelihood-based methods for detecting natural selection, which are robust to fluctuations in effective population size. At the core of our method is a novel probabilistic model of tree imbalance, which generalizes Kingman’s coalescent to allow certain aberrant tree topologies to arise more frequently than is expected under neutrality. We derive a frequency spectrum-based estimator that can be used in place of D, and also extend to the case where genealogies are first estimated. We benchmark our methods on real and simulated data, and provide an open source software implementation.
-
Abstract Phylodynamics is an area of population genetics that uses genetic sequence data to estimate past population dynamics. Modern state‐of‐the‐art Bayesian nonparametric methods for recovering population size trajectories of unknown form use either change‐point models or Gaussian process priors. Change‐point models suffer from computational issues when the number of change‐points is unknown and needs to be estimated. Gaussian process‐based methods lack local adaptivity and cannot accurately recover trajectories that exhibit features such as abrupt changes in trend or varying levels of smoothness. We propose a novel, locally adaptive approach to Bayesian nonparametric phylodynamic inference that has the flexibility to accommodate a large class of functional behaviors. Local adaptivity results from modeling the log‐transformed effective population size a priori as a horseshoe Markov random field, a recently proposed statistical model that blends together the best properties of the change‐point and Gaussian process modeling paradigms. We use simulated data to assess model performance, and find that our proposed method results in reduced bias and increased precision when compared to contemporary methods. We also use our models to reconstruct past changes in genetic diversity of human hepatitis C virus in Egypt and to estimate population size changes of ancient and modern steppe bison. These analyses show that our new method captures features of the population size trajectories that were missed by the state‐of‐the‐art methods.