Divergence time estimation is crucial for providing temporal signals to date biologically important events, from species divergences to viral transmissions in space and time. With the advent of high-throughput sequencing, recent Bayesian phylogenetic studies have analyzed hundreds to thousands of sequences. Such large-scale analyses challenge divergence time reconstruction because they require inference over highly correlated internal node heights, which often becomes computationally infeasible. To overcome this limitation, we explore a ratio transformation that maps the original $N-1$ internal node heights into a space of one height parameter and $N-2$ ratio parameters. To make the analyses scalable, we develop a collection of linear-time algorithms to compute the gradient and Jacobian-associated terms of the log-likelihood with respect to these ratios. We then apply Hamiltonian Monte Carlo sampling with the ratio transform in a Bayesian framework to learn the divergence times of four pathogenic viruses (West Nile virus, rabies virus, Lassa virus, and Ebola virus) and the coralline red algae. Our method resolves a mixing issue in the West Nile virus example and improves inference efficiency by at least 5-fold in the Lassa virus, rabies virus, and coralline red algae examples. It also makes it computationally feasible to incorporate mixed-effects molecular clock models in the Ebola virus example, where our analysis confirms the findings of the original study and reveals clearer multimodal distributions of the divergence times of some clades of interest.
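As a concrete illustration of the reparameterization described above, the following is a minimal sketch of a node-height ratio transform restricted to the simplest ultrametric case, in which all tips are sampled at the same time and each non-root internal node's ratio is its height divided by its parent's height. The data structures (heights, parent) and function names are illustrative assumptions, not the authors' implementation, which additionally handles serially sampled tips and evaluates the gradient and Jacobian terms in linear time.

```python
# A minimal sketch of a node-height ratio transform, assuming an ultrametric
# tree (all tips sampled at time 0) so each internal node's lower bound is 0
# and its ratio is simply height / parent height. Names are illustrative.
import math

def heights_to_ratios(heights, parent, root):
    """Map N-1 internal node heights to (root height, N-2 ratios in (0, 1))."""
    ratios = {n: h / heights[parent[n]] for n, h in heights.items() if n != root}
    return heights[root], ratios

def ratios_to_heights(root_height, ratios, parent, preorder):
    """Invert the transform; visit nodes parents-before-children (preorder)."""
    heights = {preorder[0]: root_height}
    for node in preorder[1:]:
        heights[node] = ratios[node] * heights[parent[node]]
    return heights

def log_abs_det_jacobian(heights, parent, root):
    """log |det J| of heights w.r.t. ratios: the transform is triangular,
    so this reduces to the sum of log parent heights over non-root nodes."""
    return sum(math.log(heights[parent[n]]) for n in heights if n != root)

# Toy tree with internal nodes 4, 5, 6: the root (node 6, height 10) has
# internal children 5 (height 4) and 4 (height 7).
parent = {5: 6, 4: 6}
heights = {6: 10.0, 5: 4.0, 4: 7.0}
root_h, r = heights_to_ratios(heights, parent, root=6)           # r = {5: 0.4, 4: 0.7}
back = ratios_to_heights(root_h, r, parent, preorder=[6, 5, 4])
assert all(abs(back[n] - heights[n]) < 1e-12 for n in heights)
print(log_abs_det_jacobian(heights, parent, root=6))             # 2 * log(10)
```

The triangular structure of the transform is what makes the Jacobian determinant a simple product of parent heights in this sketch.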
- Award ID(s): 1754142
- PAR ID: 10556088
- Editor(s): Yang, Ziheng
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: Systematic Biology
- Volume: 72
- Issue: 5
- ISSN: 1063-5157
- Page Range / eLocation ID: 1136 to 1153
- Subject(s) / Keyword(s): Bayesian inference; divergence time estimation; effective sample size; Hamiltonian Monte Carlo; pathogens; phylogenetics; ratio transformation
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract. Motivation: Precise time calibrations needed to estimate ages of species divergence are not always available due to fossil records' incompleteness. Consequently, clock calibrations available for Bayesian dating analyses can be few and diffused, i.e. phylogenies are calibration-poor, impeding reliable inference of the timetree of life. We examined the role of the speciation birth–death (BD) tree prior on Bayesian node age estimates in calibration-poor phylogenies and tested the usefulness of an informative, data-driven tree prior for enhancing the accuracy and precision of estimated times.
Results: We present a simple method to estimate parameters of the BD tree prior from the molecular phylogeny for use in Bayesian dating analyses. The use of a data-driven birth–death (ddBD) tree prior leads to improvement in Bayesian node age estimates for calibration-poor phylogenies. We show that the ddBD tree prior, along with only a few well-constrained calibrations, can produce excellent node ages and credibility intervals, whereas the use of an uninformative, uniform (flat) tree prior may require more calibrations. Relaxed clock dating with the ddBD tree prior also produced better results than a flat tree prior when using diffused node calibrations. We also suggest using ddBD tree priors to improve the detection of outliers and influential calibrations in cross-validation analyses. These results have practical applications because the ddBD tree prior reduces the number of well-constrained calibrations necessary to obtain reliable node age estimates. This would help address key impediments in building the grand timetree of life, revealing the process of speciation and elucidating the dynamics of biological diversification.
Availability and implementation: An R module for computing the ddBD tree prior, simulated datasets and empirical datasets are available at https://github.com/cathyqqtao/ddBD-tree-prior.
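The ddBD estimator itself is distributed as the R module linked above. As a loose, hypothetical illustration of the general idea (fitting a tree prior's parameters to the phylogeny being dated rather than using a flat prior), the sketch below computes the closed-form maximum-likelihood speciation rate under a pure-birth (Yule) simplification; the function name, toy numbers, and the pure-birth assumption are illustrative only, since the ddBD prior fits a full birth-death model.

```python
# A loose illustration (not the authors' ddBD estimator): under a pure-birth
# (Yule) model each lineage speciates at rate lam, so, conditional on the root
# split, the reconstructed tree's likelihood is proportional to
# lam**(n - 2) * exp(-lam * L), where L is the sum of all branch lengths.
# Maximizing over lam gives the closed-form estimate below, which could then
# seed an informative, data-driven tree prior.
def yule_rate_mle(branch_lengths, n_tips):
    """Maximum-likelihood speciation rate for a pure-birth reconstructed tree."""
    return (n_tips - 2) / sum(branch_lengths)

# Toy four-tip ultrametric tree: root at age 10, internal nodes at ages 7 and 4.
branch_lengths = [3.0, 6.0, 7.0, 7.0, 4.0, 4.0]   # hypothetical edge lengths
print(yule_rate_mle(branch_lengths, n_tips=4))     # 2 / 31, about 0.065
```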
-
Abstract. Much of our understanding of the history of life hinges upon time calibration, the process of assigning absolute times to cladogenetic events. Bayesian approaches to time-scaling phylogenetic trees have grown dramatically in complexity and today depend upon numerous methodological choices. Arriving at objective justifications for all of these is difficult and time-consuming. Thus, divergence times are routinely inferred under only one or a handful of parametric conditions, often chosen arbitrarily. Progress towards building robust biological timescales necessitates the development of better methods to visualize and quantify the sensitivity of results to these decisions.
Here, we present an R package that assists in this endeavour through the use of chronospaces, that is, graphical representations summarizing variation in the node ages contained in time‐calibrated trees. We further test this approach by estimating divergence times for three empirical datasets—spanning widely differing evolutionary timeframes—using the software PhyloBayes.
Our results reveal large differences in the impact of many common methodological decisions, with the choice of clock (uncorrelated vs autocorrelated) and loci having strong effects on inferred ages. Other decisions have comparatively minor consequences, including the use of the computationally intensive site‐heterogeneous model CAT‐GTR, whose effect might only be discernible for exceedingly old divergences (e.g. the deepest eukaryote nodes).
The package chronospace implements a range of graphical and analytical tools that assist in the exploration of sensitivity and the prioritization of computational resources in the inference of divergence times.
-
Abstract. Zoonotic pathogens pose a significant risk to human health, with spillover into human populations contributing to chronic disease, sporadic epidemics, and occasional pandemics. Despite the widely recognized burden of zoonotic spillover, our ability to identify which animal populations serve as primary reservoirs for these pathogens remains incomplete. This challenge is compounded when prevalence reaches detectable levels only at specific times of year. In these cases, statistical models designed to predict the timing of peak prevalence could guide field sampling for active infections. Here we develop a general model that leverages routinely collected serosurveillance data to optimize sampling for elusive pathogens. Using simulated data sets we show that our methodology reliably identifies times when pathogen prevalence is expected to peak. We then apply our method to two putative Ebolavirus reservoirs, straw-colored fruit bats (Eidolon helvum) and hammer-headed bats (Hypsignathus monstrosus), to predict when these species should be sampled to maximize the probability of detecting active infections. In addition to guiding future sampling of these species, our method yields predictions for the times of year that are most likely to produce future spillover events. The generality and simplicity of our methodology make it broadly applicable to a wide range of putative reservoir species where seasonal patterns of birth lead to predictable, but potentially short-lived, pulses of pathogen prevalence.
Author Summary: Many deadly pathogens, such as Ebola, Lassa, and Nipah viruses, originate in wildlife and jump to human populations. When this occurs, human health is at risk. At the extreme, this can lead to pandemics such as the West African Ebola epidemic and the COVID-19 pandemic. Despite the widely recognized risk wildlife pathogens pose to humans, identifying host species that serve as primary reservoirs for many pathogens remains challenging. Ebola is a notable example of a pathogen with an unconfirmed wildlife reservoir. A key obstacle to confirming reservoir hosts is sampling animals with active infections. Often, disease prevalence fluctuates seasonally in wildlife populations and only reaches detectable levels at certain times of year. In these cases, statistical models designed to predict the timing of peak prevalence could guide efficient field sampling for active infections. Therefore, we have developed a general model that uses serological data to predict times of year when pathogen prevalence is likely to peak. We demonstrate with simulated data that our method produces reliable predictions, and then apply our method to two hypothesized reservoirs for Ebola virus, straw-colored fruit bats and hammer-headed bats. Our method can be broadly applied to a range of potential reservoir species where seasonal patterns of birth can lead to predictable pulses of peak pathogen prevalence. Overall, our method can guide future sampling of reservoir populations and can also be used to make predictions for the times of year that future outbreaks in human populations are most likely to occur.
-
Abstract. Pathogen spillover from wildlife to humans or domestic animals requires a series of conditions to align in space and time. Comparing these conditions between times and locations where spillover does and does not occur presents opportunities to understand the factors that shape spillover risk. Bovine rabies transmitted by vampire bats was first confirmed in 1911 and has since been detected across the distribution of vampire bats. However, Uruguay is an exception. Uruguay was free of bovine rabies until 2007, despite high cattle densities, the presence of vampire bats and a strong surveillance system. To explore why Uruguay was free of bovine rabies until recently, we review the historical literature and reconstruct the conditions that would allow rabies invasion into Uruguay. We used available historical records on the abundance of livestock and wildlife, the vampire bat distribution and the occurrence of rabies outbreaks, as well as environmental modifications, to propose four alternative hypotheses to explain rabies virus emergence and spillover: bat movement, viral invasion, surveillance failure and environmental changes. While future statistical modelling efforts will be required to disentangle these hypotheses, here we show how a detailed historical analysis can be used to generate testable predictions for the conditions leading to pathogen spillover.
-
Abstract. Confidence intervals (CIs) depict the statistical uncertainty surrounding evolutionary divergence time estimates. They capture variance contributed by the finite number of sequences and sites used in the alignment, deviations of evolutionary rates from a strict molecular clock in a phylogeny, and uncertainty associated with clock calibrations. Reliable tests of biological hypotheses demand reliable CIs. However, current non-Bayesian methods may produce unreliable CIs because they do not incorporate rate variation among lineages and interactions among clock calibrations properly. Here, we present a new analytical method to calculate CIs of divergence times estimated using the RelTime method, along with an approach to utilize multiple calibration uncertainty densities in dating analyses. Empirical data analyses showed that the new methods produce CIs that overlap with Bayesian highest posterior density intervals. In the analysis of computer-simulated data, we found that RelTime CIs show excellent average coverage probabilities, that is, the actual time is contained within the CIs with a 94% probability. These developments will encourage broader use of computationally efficient RelTime approaches in molecular dating analyses and biological hypothesis testing.