

Title: Scalable Bayesian Divergence Time Estimation With Ratio Transformations
Abstract Divergence time estimation is crucial for providing temporal signals to date biologically important events, from species divergences to viral transmissions in space and time. With the advent of high-throughput sequencing, recent Bayesian phylogenetic studies have analyzed hundreds to thousands of sequences. Such large-scale analyses challenge divergence time reconstruction because they require inference on highly correlated internal node heights, which often becomes computationally infeasible. To overcome this limitation, we explore a ratio transformation that maps the original $N-1$ internal node heights into a space of one height parameter and $N-2$ ratio parameters. To make the analyses scalable, we develop a collection of linear-time algorithms to compute the gradient and Jacobian-associated terms of the log-likelihood with respect to these ratios. We then apply Hamiltonian Monte Carlo sampling with the ratio transform in a Bayesian framework to learn the divergence times in four pathogenic viruses (West Nile virus, rabies virus, Lassa virus, and Ebola virus) and the coralline red algae. Our method both resolves a mixing issue in the West Nile virus example and improves inference efficiency by at least 5-fold for the Lassa and rabies virus examples as well as for the algae example. Our method also makes it computationally feasible to incorporate mixed-effects molecular clock models for the Ebola virus example; this analysis confirms the findings from the original study and reveals clearer multimodal distributions of the divergence times of some clades of interest.
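The abstract states the transform only in words. The Python sketch below is one hedged reading of it, not the authors' implementation, which this record does not include. It assumes that tip heights are fixed sampling times measured back from the most recent tip, that $\eta_i$ is the height of the oldest tip below internal node $i$, and that each non-root internal node carries a ratio $r_i \in (0, 1)$ with $t_i = \eta_i + r_i\,(t_{\mathrm{pa}(i)} - \eta_i)$, while the root keeps its height as the single height parameter. Under that assumption the Jacobian of the map from (root height, ratios) to heights is triangular with diagonal entries $t_{\mathrm{pa}(i)} - \eta_i$, and the log-likelihood gradient with respect to the ratios follows from a single children-before-parents pass. All class and function names (`Node`, `anchors`, `ratios_to_heights`, `heights_to_ratios`, `log_abs_det_jacobian`, `ratio_gradient`) are hypothetical.

```python
import math


class Node:
    """Rooted tree node. Tips carry a fixed height (time before the most recent tip)."""

    def __init__(self, name, children=(), tip_height=0.0):
        self.name = name
        self.children = list(children)
        self.tip_height = tip_height  # only used for tips
        self.parent = None
        for child in self.children:
            child.parent = self

    def is_tip(self):
        return not self.children


def preorder(root):
    """Parents before children (iterative, so each node is handled once)."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(node.children))
    return order


def postorder(root):
    """Children before parents: the reverse of a parent-first traversal."""
    return list(reversed(preorder(root)))


def non_root_internal(root):
    """Internal nodes that carry a ratio (N - 2 of them in a binary tree with N tips)."""
    return [n for n in preorder(root) if n is not root and not n.is_tip()]


def anchors(root):
    """eta[i] = height of the oldest tip below node i (a lower bound for t_i)."""
    eta = {}
    for node in postorder(root):
        if node.is_tip():
            eta[node.name] = node.tip_height
        else:
            eta[node.name] = max(eta[c.name] for c in node.children)
    return eta


def ratios_to_heights(root, root_height, ratios, eta):
    """Parent-first pass: t_i = eta_i + r_i * (t_parent - eta_i)."""
    t = {root.name: root_height}
    for node in non_root_internal(root):  # preorder, so the parent height is already set
        lo = eta[node.name]
        t[node.name] = lo + ratios[node.name] * (t[node.parent.name] - lo)
    return t


def heights_to_ratios(root, t, eta):
    """Inverse map: r_i = (t_i - eta_i) / (t_parent - eta_i)."""
    return {
        n.name: (t[n.name] - eta[n.name]) / (t[n.parent.name] - eta[n.name])
        for n in non_root_internal(root)
    }


def log_abs_det_jacobian(root, t, eta):
    """log|d heights / d(root height, ratios)|: triangular, diagonal t_parent - eta_i."""
    return sum(math.log(t[n.parent.name] - eta[n.name]) for n in non_root_internal(root))


def ratio_gradient(root, t, ratios, eta, grad_t):
    """Chain rule in one children-before-parents pass.

    grad_t[i] is dL/dt_i with all other heights held fixed. total[i] is dL/dt_i
    with the ratios held fixed, so every descendant height moves with t_i:
    total[i] = grad_t[i] + sum over internal children c of r_c * total[c].
    Returns (dL/d root height, {i: dL/dr_i}).
    """
    total, grad_r = {}, {}
    for node in postorder(root):
        if node.is_tip():
            continue
        g = grad_t[node.name]
        for child in node.children:
            if not child.is_tip():
                g += ratios[child.name] * total[child.name]
        total[node.name] = g
        if node is not root:
            grad_r[node.name] = g * (t[node.parent.name] - eta[node.name])
    return total[root.name], grad_r
```

Every function visits each node a constant number of times, so each map is linear in the number of tips, consistent with the linear-time claim in the abstract. A sampler would typically also move each ratio onto the real line (for example with a logit) before running HMC; that extra transform and its Jacobian term are left out here. Comparing `ratio_gradient` against central finite differences of a toy log-likelihood such as $-\tfrac{1}{2}\sum_i t_i^2$ is a quick way to check the chain rule.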
Award ID(s):
1754142
PAR ID:
10556088
Author(s) / Creator(s):
; ; ; ; ; ; ;
Editor(s):
Yang, Ziheng
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Systematic Biology
Volume:
72
Issue:
5
ISSN:
1063-5157
Page Range / eLocation ID:
1136 to 1153
Subject(s) / Keyword(s):
Bayesian inference divergence time estimation effective sample size Hamiltonian Monte Carlo pathogens phylogenetics ratio transformation
Sponsoring Org:
National Science Foundation
More like this
  1. Abstract Motivation Precise time calibrations needed to estimate the ages of species divergences are not always available because the fossil record is incomplete. Consequently, clock calibrations available for Bayesian dating analyses can be few and diffused, i.e. phylogenies are calibration-poor, impeding reliable inference of the timetree of life. We examined the role of the speciation birth–death (BD) tree prior in Bayesian node age estimates for calibration-poor phylogenies and tested the usefulness of an informative, data-driven tree prior in enhancing the accuracy and precision of estimated times. Results We present a simple method to estimate parameters of the BD tree prior from the molecular phylogeny for use in Bayesian dating analyses. The use of a data-driven birth–death (ddBD) tree prior leads to improvement in Bayesian node age estimates for calibration-poor phylogenies. We show that the ddBD tree prior, along with only a few well-constrained calibrations, can produce excellent node ages and credibility intervals, whereas the use of an uninformative, uniform (flat) tree prior may require more calibrations. Relaxed clock dating with the ddBD tree prior also produced better results than a flat tree prior when using diffused node calibrations. We also suggest using ddBD tree priors to improve the detection of outliers and influential calibrations in cross-validation analyses. These results have practical applications because the ddBD tree prior reduces the number of well-constrained calibrations necessary to obtain reliable node age estimates. This would help address key impediments in building the grand timetree of life, revealing the process of speciation and elucidating the dynamics of biological diversification. Availability and implementation An R module for computing the ddBD tree prior, simulated datasets and empirical datasets are available at https://github.com/cathyqqtao/ddBD-tree-prior.
  2. Abstract Much of our understanding of the history of life hinges upon time calibration, the process of assigning absolute times to cladogenetic events. Bayesian approaches to time‐scaling phylogenetic trees have dramatically grown in complexity, and depend today upon numerous methodological choices. Arriving at objective justifications for all of these is difficult and time‐consuming. Thus, divergence times are routinely inferred under only one or a handful of parametric conditions, often chosen arbitrarily. Progress towards building robust biological timescales necessitates the development of better methods to visualize and quantify the sensitivity of results to these decisions. Here, we present an R package that assists in this endeavour through the use of chronospaces, that is, graphical representations summarizing variation in the node ages contained in time‐calibrated trees. We further test this approach by estimating divergence times for three empirical datasets (spanning widely differing evolutionary timeframes) using the software PhyloBayes. Our results reveal large differences in the impact of many common methodological decisions, with the choice of clock (uncorrelated vs autocorrelated) and loci having strong effects on inferred ages. Other decisions have comparatively minor consequences, including the use of the computationally intensive site‐heterogeneous model CAT‐GTR, whose effect might only be discernible for exceedingly old divergences (e.g. the deepest eukaryote nodes). The package chronospace implements a range of graphical and analytical tools that assist in the exploration of sensitivity and the prioritization of computational resources in the inference of divergence times.
  3. Abstract Pathogen spillover from wildlife to humans or domestic animals requires a series of conditions to align in space and time. Comparing these conditions between times and locations where spillover does and does not occur presents opportunities to understand the factors that shape spillover risk. Bovine rabies transmitted by vampire bats was first confirmed in 1911 and has since been detected across the distribution of vampire bats. However, Uruguay is an exception: it was free of bovine rabies until 2007, despite high cattle densities, the presence of vampire bats and a strong surveillance system. To explore why Uruguay was free of bovine rabies until recently, we review the historical literature and reconstruct the conditions that would allow rabies invasion into Uruguay. We used available historical records on the abundance of livestock and wildlife, the vampire bat distribution and occurrence of rabies outbreaks, as well as environmental modifications, to propose four alternative hypotheses to explain rabies virus emergence and spillover: bat movement, viral invasion, surveillance failure and environmental changes. While future statistical modelling efforts will be required to disentangle these hypotheses, we here show how a detailed historical analysis can be used to generate testable predictions for the conditions leading to pathogen spillover.
  4. Su, Bing (Ed.)
    Abstract Confidence intervals (CIs) depict the statistical uncertainty surrounding evolutionary divergence time estimates. They capture variance contributed by the finite number of sequences and sites used in the alignment, deviations of evolutionary rates from a strict molecular clock in a phylogeny, and uncertainty associated with clock calibrations. Reliable tests of biological hypotheses demand reliable CIs. However, current non-Bayesian methods may produce unreliable CIs because they do not incorporate rate variation among lineages and interactions among clock calibrations properly. Here, we present a new analytical method to calculate CIs of divergence times estimated using the RelTime method, along with an approach to utilize multiple calibration uncertainty densities in dating analyses. Empirical data analyses showed that the new methods produce CIs that overlap with Bayesian highest posterior density intervals. In the analysis of computer-simulated data, we found that RelTime CIs show excellent average coverage probabilities, that is, the actual time is contained within the CIs with a 94% probability. These developments will encourage broader use of computationally efficient RelTime approaches in molecular dating analyses and biological hypothesis testing. 
  5. Abstract Time‐scaled phylogenies underpin the interrogation of evolutionary processes across deep timescales, as well as attempts to link these to Earth's history. By inferring the placement of fossils and using their ages as temporal constraints, tip dating under the fossilized birth–death (FBD) process provides a coherent prior on divergence times. At the same time, it also links topological and temporal accuracy, as incorrectly placed fossil terminals should misinform divergence times. This could pose serious issues for obtaining accurate node ages, yet the interaction between topological and temporal error has not been thoroughly explored. We simulate phylogenies and associated morphological datasets using methodologies that incorporate evolution under selection, and are benchmarked against empirical datasets. We find that datasets of 300 characters and realistic levels of missing data generally succeed in inferring the correct placement of fossils on a constrained extant backbone topology, and that true node ages are usually contained within Bayesian posterior distributions. While increased fossil sampling improves the accuracy of inferred ages, topological and temporal errors do not seem to be linked: analyses in which fossils resolve less accurately do not exhibit elevated errors in node age estimates. At the same time, inferred divergence times are biased, probably due to a mismatch between the FBD prior and the shape of our simulated trees. While these results are encouraging, suggesting that even fossils with uncertain affinities can provide useful temporal information, they also emphasize that palaeontological information cannot overturn discrepancies between model priors and the true diversification history. 