Title: Horseshoe‐based Bayesian nonparametric estimation of effective population size trajectories
Abstract: Phylodynamics is an area of population genetics that uses genetic sequence data to estimate past population dynamics. Modern state‐of‐the‐art Bayesian nonparametric methods for recovering population size trajectories of unknown form use either change‐point models or Gaussian process priors. Change‐point models suffer from computational issues when the number of change‐points is unknown and needs to be estimated. Gaussian process‐based methods lack local adaptivity and cannot accurately recover trajectories that exhibit features such as abrupt changes in trend or varying levels of smoothness. We propose a novel, locally adaptive approach to Bayesian nonparametric phylodynamic inference that has the flexibility to accommodate a large class of functional behaviors. Local adaptivity results from modeling the log‐transformed effective population size a priori as a horseshoe Markov random field, a recently proposed statistical model that blends together the best properties of the change‐point and Gaussian process modeling paradigms. We use simulated data to assess model performance, and find that our proposed method results in reduced bias and increased precision when compared to contemporary methods. We also use our models to reconstruct past changes in genetic diversity of human hepatitis C virus in Egypt and to estimate population size changes of ancient and modern steppe bison. These analyses show that our new method captures features of the population size trajectories that were missed by the state‐of‐the‐art methods.
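As a rough illustration of the prior described in the abstract, the Python sketch below draws one trajectory of log effective population size on a grid, either from a first-order horseshoe Markov random field or from a Gaussian Markov random field with a single shared scale. The hierarchy used here (normal increments with half-Cauchy local scales, a half-Cauchy global scale, and the hyperparameter `zeta`) is an assumption based on the general horseshoe Markov random field construction and is not stated in this record. The function name and all parameter names are hypothetical.

```python
import numpy as np


def sample_log_ne_prior(n_cells, zeta=0.01, horseshoe=True, first_value=0.0, seed=None):
    """Draw one prior trajectory of log effective population size.

    Assumed hierarchy (illustrative only):
      gamma ~ half-Cauchy(0, zeta)             global scale
      tau_j ~ half-Cauchy(0, gamma)            local scales (horseshoe only)
      theta_{j+1} - theta_j ~ Normal(0, tau_j^2)
    With horseshoe=False every increment shares the scale gamma,
    which gives a Gaussian Markov random field for comparison.
    """
    rng = np.random.default_rng(seed)

    # Global scale: absolute value of a scaled standard Cauchy draw.
    gamma = abs(zeta * rng.standard_cauchy())

    if horseshoe:
        # One heavy-tailed local scale per increment allows occasional large jumps.
        tau = np.abs(gamma * rng.standard_cauchy(n_cells - 1))
    else:
        # Shared scale: uniform smoothness along the whole trajectory.
        tau = np.full(n_cells - 1, gamma)

    # Normal increments between neighbouring grid cells.
    increments = rng.normal(0.0, tau)

    # Accumulate increments, starting from the chosen first value.
    return first_value + np.concatenate(([0.0], np.cumsum(increments)))


if __name__ == "__main__":
    hs = sample_log_ne_prior(100, seed=1)
    gmrf = sample_log_ne_prior(100, horseshoe=False, seed=1)
    print("largest horseshoe jump:", np.max(np.abs(np.diff(hs))))
    print("largest GMRF jump:", np.max(np.abs(np.diff(gmrf))))
```

Because each increment has its own heavy-tailed scale, most horseshoe increments are shrunk toward zero while a few can be large, which is the local adaptivity the abstract refers to.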
Award ID(s):
1754451
PAR ID:
10456968
Author(s) / Creator(s):
 ;  ;  ;  
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Biometrics
Volume:
76
Issue:
3
ISSN:
0006-341X
Format(s):
Medium: X Size: p. 677-690
Size(s):
p. 677-690
Sponsoring Org:
National Science Foundation
More Like this
  1. Variation in a sample of molecular sequence data informs about the past evolutionary history of the sample’s population. Traditionally, Bayesian modelling coupled with the standard coalescent is used to infer the sample’s bifurcating genealogy and demographic and evolutionary parameters such as effective population size and mutation rates. However, there are many situations where binary coalescent models do not accurately reflect the true underlying ancestral processes. Here, we propose a Bayesian non-parametric method for inferring effective population size trajectories from a multifurcating genealogy under the Lambda-coalescent. In particular, we jointly estimate the effective population size and the model parameter for the Beta-coalescent model, a special type of Lambda-coalescent. Finally, we test our methods on simulations and apply them to study various viral dynamics as well as Japanese sardine population size changes over time. The code and vignettes can be found in the phylodyn package. This article is part of the theme issue ‘“A mathematical theory of evolution”: phylogenetic models dating back 100 years’.
  2. We consider the problem of online learning in the presence of distribution shifts that occur at an unknown rate and of unknown intensity. We derive a new Bayesian online inference approach to simultaneously infer these distribution shifts and adapt the model to the detected changes by integrating ideas from change point detection, switching dynamical systems, and Bayesian online learning. Using a binary ‘change variable,’ we construct an informative prior such that, if a change is detected, the model partially erases the information of past model updates by tempering to facilitate adaptation to the new data distribution. Furthermore, the approach uses beam search to track multiple change-point hypotheses and selects the most probable one in hindsight. Our proposed method is model-agnostic, applicable in both supervised and unsupervised learning settings, suitable for an environment of concept drifts or covariate drifts, and yields improvements over state-of-the-art Bayesian online learning approaches.
  3. Understanding animal movement often relies upon telemetry and biologging devices. These data are frequently used to estimate latent behavioural states to help understand why animals move across the landscape. While there are a variety of methods that make behavioural inferences from biotelemetry data, some features of these methods (e.g. analysis of a single data stream, use of parametric distributions) may limit their generality to reliably discriminate among behavioural states. To address some of the limitations of existing behavioural state estimation models, we introduce a nonparametric Bayesian framework called the mixed‐membership method for movement (M4), which is available within the open‐source bayesmove R package. This framework can analyse multiple data streams (e.g. step length, turning angle, acceleration) without relying on parametric distributions, which may capture complex behaviours more successfully than current methods. We tested our Bayesian framework using simulated trajectories and compared model performance against two segmentation methods (behavioural change point analysis (BCPA) and segclust2d), one machine learning method [expectation‐maximization binary clustering (EMbC)] and one type of state‐space model [hidden Markov model (HMM)]. We also illustrated this Bayesian framework using movements of juvenile snail kites Rostrhamus sociabilis in Florida, USA. The Bayesian framework estimated breakpoints more accurately than the other segmentation methods for tracks of different lengths. Likewise, the Bayesian framework provided more accurate estimates of behaviour than the other state estimation methods when simulations were generated from less frequently considered distributions (e.g. truncated normal, beta, uniform). Three behavioural states were estimated from snail kite movements, which were labelled as ‘encamped’, ‘area‐restricted search’ and ‘transit’. Changes in these behaviours over time were associated with known dispersal events from the nest site, as well as movements to and from possible breeding locations. Our nonparametric Bayesian framework estimated behavioural states with comparable or superior accuracy compared to the other methods when step lengths and turning angles of simulations were generated from less frequently considered distributions. Since the most appropriate parametric distributions may not be obvious a priori, methods (such as M4) that are agnostic to the underlying distributions can provide powerful alternatives to address questions in movement ecology.
  4. We develop a prior probability model for temporal Poisson process intensities through structured mixtures of Erlang densities with common scale parameter, mixing on the integer shape parameters. The mixture weights are constructed through increments of a cumulative intensity function which is modeled nonparametrically with a gamma process prior. Such model specification provides a novel extension of Erlang mixtures for density estimation to the intensity estimation setting. The prior model structure supports general shapes for the point process intensity function, and it also enables effective handling of the Poisson process likelihood normalizing term resulting in efficient posterior simulation. The Erlang mixture modeling approach is further elaborated to develop an inference method for spatial Poisson processes. The methodology is examined relative to existing Bayesian nonparametric modeling approaches, including empirical comparison with Gaussian process prior based models, and is illustrated with synthetic and real data examples.
  5. The demographic history of a population is important for conservation and evolution, but this history is unknown for many populations. Methods that use genomic data have been developed to infer demography, but they can be challenging to implement and interpret, particularly for large populations. Thus, understanding if and when genetic estimates of demography correspond to true population history is important for assessing the performance of these genetic methods. Here, we used double‐digest restriction‐site associated DNA (ddRAD) sequencing data from archived collections of larval summer flounder (Paralichthys dentatus, n = 279) from three cohorts (1994–1995, 1997–1998 and 2008–2009) along the U.S. East coast to examine how contemporary effective population size and genetic diversity responded to changes in abundance in a natural population. Despite little to no detectable change in genetic diversity, coalescent‐based demographic modelling from site frequency spectra revealed that summer flounder effective population size declined dramatically in the early 1980s. The timing and direction of change corresponded well with the observed decline in spawning stock census abundance in the late 1980s from independent fish surveys. Census abundance subsequently recovered and achieved the prebottleneck size. Effective population size also grew following the bottleneck. Our results for summer flounder demonstrate that genetic sampling and site frequency spectra can be useful for detecting population dynamics, even in species with large effective sizes.