Title: Unifying Phylogenetic Birth–Death Models in Epidemiology and Macroevolution
Abstract Birth–death stochastic processes are the foundations of many phylogenetic models and are widely used to make inferences about epidemiological and macroevolutionary dynamics. There are a large number of birth–death model variants that have been developed; these impose different assumptions about the temporal dynamics of the parameters and about the sampling process. As each of these variants was individually derived, it has been difficult to understand the relationships between them as well as their precise biological and mathematical assumptions. Without a common mathematical foundation, deriving new models is nontrivial. Here, we unify these models into a single framework, prove that many previously developed epidemiological and macroevolutionary models are all special cases of a more general model, and illustrate the connections between these variants. This unification includes both models where the process is the same for all lineages and those in which it varies across types. We also outline a straightforward procedure for deriving likelihood functions for arbitrarily complex birth–death(-sampling) models that will hopefully allow researchers to explore a wider array of scenarios than was previously possible. By rederiving existing single-type birth–death sampling models, we clarify and synthesize the range of explicit and implicit assumptions made by these models. [Birth–death processes; epidemiology; macroevolution; phylogenetics; statistical inference.]
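As a rough illustration of the kind of process the abstract refers to, the following minimal sketch simulates the simplest single-type case: a birth–death-sampling process with constant rates, using a Gillespie-style event loop. The function name `simulate_bds`, the constant-rate assumption, and the choice that sampling removes the lineage are all assumptions made for this example; the unified framework described above also covers time-varying parameters and other sampling schemes, and this is not the authors' implementation.

```python
import random


def simulate_bds(birth, death, sampling, t_max, seed=None, max_events=100_000):
    """Simulate a constant-rate birth-death-sampling process (hypothetical helper).

    Starts from one lineage at time 0. Each lineage independently gives birth
    at rate `birth`, dies at rate `death`, and is sampled at rate `sampling`.
    Assumption for this sketch: a sampled lineage is removed from the process.

    Returns (trajectory, n_sampled), where trajectory is a list of
    (time, number_of_lineages) pairs recorded after every event.
    """
    per_lineage = birth + death + sampling
    if min(birth, death, sampling) < 0 or per_lineage <= 0:
        raise ValueError("rates must be non-negative and not all zero")

    rng = random.Random(seed)
    t, n, n_sampled = 0.0, 1, 0
    trajectory = [(t, n)]

    for _ in range(max_events):
        if n == 0:
            break  # the process has gone extinct
        # The waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(n * per_lineage)
        if t >= t_max:
            break  # the next event would fall after the observation window
        # Choose the event type in proportion to its rate.
        u = rng.random() * per_lineage
        if u < birth:
            n += 1
        elif u < birth + death:
            n -= 1
        else:
            n -= 1
            n_sampled += 1
        trajectory.append((t, n))

    return trajectory, n_sampled


if __name__ == "__main__":
    traj, k = simulate_bds(birth=2.0, death=1.0, sampling=0.5, t_max=5.0, seed=1)
    print(f"{len(traj) - 1} events, {traj[-1][1]} lineages alive, {k} sampled")
```

A simulator of this kind produces lineage counts and sample totals only; fitting the models discussed in the abstract additionally requires the likelihood of the sampled tree, which this sketch does not compute.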
Award ID(s):
2028986
NSF-PAR ID:
10296290
Editor(s):
Albert, James
Date Published:
Journal Name:
Systematic Biology
ISSN:
1063-5157
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Crandall, Keith (Ed.)
    Abstract Viral phylogenies provide crucial information on the spread of infectious diseases, and many studies fit mathematical models to phylogenetic data to estimate epidemiological parameters such as the effective reproduction ratio (Re) over time. Such phylodynamic inferences often complement or even substitute for conventional surveillance data, particularly when sampling is poor or delayed. It remains generally unknown, however, how robust phylodynamic epidemiological inferences are, especially when there is uncertainty regarding pathogen prevalence and sampling intensity. Here, we use recently developed mathematical techniques to fully characterize the information that can possibly be extracted from serially collected viral phylogenetic data, in the context of the commonly used birth-death-sampling model. We show that for any candidate epidemiological scenario, there exists a myriad of alternative, markedly different, and yet plausible “congruent” scenarios that cannot be distinguished using phylogenetic data alone, no matter how large the data set. In the absence of strong constraints or rate priors across the entire study period, neither maximum-likelihood fitting nor Bayesian inference can reliably reconstruct the true epidemiological dynamics from phylogenetic data alone; rather, estimators can only converge to the “congruence class” of the true dynamics. We propose concrete and feasible strategies for making more robust epidemiological inferences from viral phylogenetic data. 
  2. Abstract This project is funded by the US National Science Foundation (NSF) through their NSF RAPID program under the title “Modeling Corona Spread Using Big Data Analytics.” The project is a joint effort between the Department of Computer & Electrical Engineering and Computer Science at FAU and a research group from LexisNexis Risk Solutions. The novel coronavirus Covid-19 originated in China in early December 2019 and has rapidly spread to many countries around the globe, with the number of confirmed cases increasing every day. Covid-19 is officially a pandemic. It is a novel infection with serious clinical manifestations, including death, and it has reached at least 124 countries and territories. Although the ultimate course and impact of Covid-19 are uncertain, it is not merely possible but likely that the disease will produce enough severe illness to overwhelm the worldwide health care infrastructure. Emerging viral pandemics can place extraordinary and sustained demands on public health and health systems and on providers of essential community services. Modeling the Covid-19 pandemic spread is challenging. But there are data that can be used to project resource demands. Estimates of the reproductive number (R) of SARS-CoV-2 show that at the beginning of the epidemic, each infected person spreads the virus to at least two others, on average (Emanuel et al. in N Engl J Med. 2020, Livingston and Bucher in JAMA 323(14):1335, 2020). A conservatively low estimate is that 5 % of the population could become infected within 3 months. Preliminary data from China and Italy regarding the distribution of case severity and fatality vary widely (Wu and McGoogan in JAMA 323(13):1239–42, 2020). A recent large-scale analysis from China suggests that 80 % of those infected either are asymptomatic or have mild symptoms; a finding that implies that demand for advanced medical services might apply to only 20 % of the total infected. 
Of patients infected with Covid-19, about 15 % have severe illness and 5 % have critical illness (Emanuel et al. in N Engl J Med. 2020). Overall, mortality ranges from 0.25 % to as high as 3.0 % (Emanuel et al. in N Engl J Med. 2020, Wilson et al. in Emerg Infect Dis 26(6):1339, 2020). Case fatality rates are much higher for vulnerable populations, such as persons over the age of 80 years (> 14 %) and those with coexisting conditions (10 % for those with cardiovascular disease and 7 % for those with diabetes) (Emanuel et al. in N Engl J Med. 2020). Overall, Covid-19 is substantially deadlier than seasonal influenza, which has a mortality of roughly 0.1 %. Public health efforts depend heavily on predicting how diseases such as those caused by Covid-19 spread across the globe. During the early days of a new outbreak, when reliable data are still scarce, researchers turn to mathematical models that can predict where people who could be infected are going and how likely they are to bring the disease with them. These computational methods use known statistical equations that calculate the probability of individuals transmitting the illness. Modern computational power allows these models to quickly incorporate multiple inputs, such as a given disease’s ability to pass from person to person and the movement patterns of potentially infected people traveling by air and land. This process sometimes involves making assumptions about unknown factors, such as an individual’s exact travel pattern. By plugging in different possible versions of each input, however, researchers can update the models as new information becomes available and compare their results to observed patterns for the illness. In this paper we describe the development of a model of Corona spread using innovative big data analytics techniques and tools. We leveraged our experience from research in modeling Ebola spread (Shaw et al. Modeling Ebola Spread and Using HPCC/KEL System.
In: Big Data Technologies and Applications 2016 (pp. 347-385). Springer, Cham) to successfully model Corona spread; we will obtain new results and help reduce the number of Corona patients. We closely collaborated with LexisNexis, which is a leading US data analytics company and a member of our NSF I/UCRC for Advanced Knowledge Enablement. The lack of a comprehensive view and informative analysis of the status of the pandemic can also cause panic and instability within society. Our work proposes the HPCC Systems Covid-19 tracker, which provides a multi-level view of the pandemic with informative virus-spreading indicators in a timely manner. The system embeds a classical epidemiological model known as SIR and spreading indicators based on a causal model. The data solution of the tracker is built on top of the Big Data processing platform HPCC Systems, from ingesting and tracking of various data sources to fast delivery of the data to the public. The HPCC Systems Covid-19 tracker presents the Covid-19 data on a daily, weekly, and cumulative basis, up to the global level and down to the county level. It also provides statistical analysis for each level, such as new cases per 100,000 population. The primary analyses, such as Contagion Risk and Infection State, are based on a causal model with a seven-day sliding window. Our work has been released as a publicly available website and has attracted a great volume of traffic. The project is open-sourced and available on GitHub. The system was developed on the LexisNexis HPCC Systems, which is briefly described in the paper.
  3. Abstract

    Gene flow is increasingly recognized as an important macroevolutionary process. The many mechanisms that contribute to gene flow (e.g. introgression, hybridization, lateral gene transfer) uniquely affect the diversification dynamics of species, making it important to be able to account for these idiosyncrasies when constructing phylogenetic models. Existing phylogenetic‐network simulators for macroevolution are limited in the ways they model gene flow.

    We present SiPhyNetwork, an R package for simulating phylogenetic networks under a birth–death‐hybridization process.

    Our package unifies the existing birth–death‐hybridization models while also extending the toolkit for modelling gene flow. This tool can create patterns of reticulation such as hybridization, lateral gene transfer, and introgression.

    Specifically, we model different reticulate events by allowing events to either add, remove or keep constant the number of lineages. Additionally, we allow reticulation events to be trait dependent, creating the ability to model the expanse of isolating mechanisms that prevent gene flow. This tool makes it possible for researchers to model many of the complex biological factors associated with gene flow in a phylogenetic context.

     
  4. Abstract

    Identifying along which lineages shifts in diversification rates occur is a central goal of comparative phylogenetics; these shifts may coincide with key evolutionary events such as the development of novel morphological characters, the acquisition of adaptive traits, polyploidization or other structural genomic changes, or dispersal to a new habitat and subsequent increase in environmental niche space. However, while multiple methods now exist to estimate diversification rates and identify shifts using phylogenetic topologies, the appropriate use and accuracy of these methods are hotly debated. Here we test whether five Bayesian methods—Bayesian Analysis of Macroevolutionary Mixtures (BAMM), two implementations of the Lineage-Specific Birth–Death–Shift model (LSBDS and PESTO), the approximate Multi-Type Birth–Death model (MTBD; implemented in BEAST2), and the Cladogenetic Diversification Rate Shift model (ClaDS2)—produce comparable results. We apply each of these methods to a set of 65 empirical time-calibrated phylogenies and compare inferences of speciation rate, extinction rate, and net diversification rate. We find that the five methods often infer different speciation, extinction, and net-diversification rates. Consequently, these different estimates may lead to different interpretations of the macroevolutionary dynamics. The different estimates can be attributed to fundamental differences among the compared models. Therefore, the inference of shifts in diversification rates is strongly method dependent. We advise biologists to apply multiple methods to test the robustness of the conclusions or to carefully select the method based on the validity of the underlying model assumptions to their particular empirical system.

     
  5. Abstract

    World‐wide, infectious diseases represent a major source of mortality in humans and livestock. For wildlife populations, disease‐induced mortality is likely even greater, but remains notoriously difficult to estimate—especially for endemic infections. Approaches for quantifying wildlife mortality due to endemic infections have historically been limited by an inability to directly observe wildlife mortality in nature.

    Here we address a question that can rarely be answered for endemic pathogens of wildlife: what are the population‐ and landscape‐level effects of infection on host mortality? We combined laboratory experiments, extensive field data and novel mathematical models to indirectly estimate the magnitude of mortality induced by an endemic, virulent trematode parasite (Ribeiroia ondatrae) on hundreds of amphibian populations spanning four native species.

    We developed a flexible statistical model that uses patterns of aggregation in parasite abundance to infer host mortality. Our model improves on previous approaches for inferring host mortality from parasite abundance data by (i) relaxing restrictive assumptions on the timing of host mortality and sampling, (ii) placing all mortality inference within a Bayesian framework to better quantify uncertainty and (iii) accommodating data from laboratory experiments and field sampling to allow for estimates and comparisons of mortality within and among host populations.

    Applying our approach to 301 amphibian populations, we found that trematode infection was associated with an average of between 13% and 40% population‐level mortality. For three of the four amphibian species, our models predicted that some populations experienced >90% mortality due to infection, leading to mortality of thousands of amphibian larvae within a pond. At the landscape scale, the total number of amphibians predicted to succumb to infection was driven by a few high mortality sites, with fewer than 20% of sites contributing to greater than 80% of amphibian mortality on the landscape.

    The mortality estimates in this study provide a rare glimpse into the magnitude of effects that endemic parasites can have on wildlife populations and our theoretical framework for indirectly inferring parasite‐induced mortality can be applied to other host–parasite systems to help reveal the hidden death toll of pathogens on wildlife hosts.

     