

Title: Fundamental Identifiability Limits in Molecular Epidemiology
Abstract Viral phylogenies provide crucial information on the spread of infectious diseases, and many studies fit mathematical models to phylogenetic data to estimate epidemiological parameters such as the effective reproduction ratio (Re) over time. Such phylodynamic inferences often complement or even substitute for conventional surveillance data, particularly when sampling is poor or delayed. It remains generally unknown, however, how robust phylodynamic epidemiological inferences are, especially when there is uncertainty regarding pathogen prevalence and sampling intensity. Here, we use recently developed mathematical techniques to fully characterize the information that can possibly be extracted from serially collected viral phylogenetic data, in the context of the commonly used birth-death-sampling model. We show that for any candidate epidemiological scenario, there exists a myriad of alternative, markedly different, and yet plausible “congruent” scenarios that cannot be distinguished using phylogenetic data alone, no matter how large the data set. In the absence of strong constraints or rate priors across the entire study period, neither maximum-likelihood fitting nor Bayesian inference can reliably reconstruct the true epidemiological dynamics from phylogenetic data alone; rather, estimators can only converge to the “congruence class” of the true dynamics. We propose concrete and feasible strategies for making more robust epidemiological inferences from viral phylogenetic data.
Award ID(s): 2028986
NSF-PAR ID: 10296292
Editor(s): Crandall, Keith
Journal Name: Molecular Biology and Evolution
Volume: 38
Issue: 9
ISSN: 1537-1719
Page Range / eLocation ID: 4010 to 4024
Sponsoring Org: National Science Foundation
More like this
  1. Abstract

    Analysis of phylogenetic trees has become an essential tool in epidemiology. Likelihood-based methods fit models to phylogenies to draw inferences about the phylodynamics and history of viral transmission. However, these methods are often computationally expensive, which limits the complexity and realism of phylodynamic models and makes them ill-suited for informing policy decisions in real time during rapidly developing outbreaks. Likelihood-free methods using deep learning are pushing the boundaries of inference beyond these constraints. In this paper, we extend, compare, and contrast a recently developed deep learning method for likelihood-free inference from trees. We trained multiple deep neural networks using phylogenies from simulated outbreaks that spread among 5 locations and found that they achieve nearly the same accuracy as Bayesian inference under the true simulation model. We compared the robustness of a trained neural network to model misspecification with that of a Bayesian method and found that both methods performed comparably, converging on similar biases. We also implemented an uncertainty-quantification method called conformalized quantile regression. We show that its intervals have patterns of sensitivity to model misspecification similar to those of Bayesian highest posterior density (HPD) intervals and overlap greatly with them, but are less precise (more conservative). Finally, we trained and tested a neural network on phylogeographic data from a recent study of the SARS-CoV-2 pandemic in Europe and obtained similar estimates of region-specific epidemiological parameters and of the location of the common ancestor in Europe. Along with being as accurate and robust as likelihood-based methods, our trained neural networks are on average over 3 orders of magnitude faster after training.
    Our results support the notion that neural networks can be trained with simulated data to accurately mimic the good and bad statistical properties of the likelihood functions of generative phylogenetic models.
  2. Abstract We investigated SARS-CoV-2 transmission dynamics in Italy, one of the countries hit hardest by the pandemic, using phylodynamic analysis of viral genetic and epidemiological data. We observed the co-circulation of multiple SARS-CoV-2 lineages over time, which were linked to multiple importations and characterized by large transmission clusters concomitant with a high number of infections. Subsequent implementation of a three-phase nationwide lockdown strategy greatly reduced infection numbers and hospitalizations. Yet we present evidence of sustained viral spread among sporadic clusters acting as “hidden reservoirs” during summer 2020. Mathematical modelling shows that increased mobility among residents eventually catalyzed the coalescence of such clusters, thus driving up the number of infections and initiating a new epidemic wave. Our results suggest that the efficacy of public health interventions is, ultimately, limited by the size and structure of epidemic reservoirs, which may warrant prioritization during vaccine deployment. 
  3. Albert, James (Ed.)
    Abstract Birth–death stochastic processes are the foundations of many phylogenetic models and are widely used to make inferences about epidemiological and macroevolutionary dynamics. A large number of birth–death model variants have been developed; these impose different assumptions about the temporal dynamics of the parameters and about the sampling process. Because each of these variants was derived individually, it has been difficult to understand the relationships between them as well as their precise biological and mathematical assumptions. Without a common mathematical foundation, deriving new models is nontrivial. Here, we unify these models into a single framework, prove that many previously developed epidemiological and macroevolutionary models are all special cases of a more general model, and illustrate the connections between these variants. This unification includes both models where the process is the same for all lineages and those in which it varies across types. We also outline a straightforward procedure for deriving likelihood functions for arbitrarily complex birth–death(-sampling) models that will hopefully allow researchers to explore a wider array of scenarios than was previously possible. By rederiving existing single-type birth–death sampling models, we clarify and synthesize the range of explicit and implicit assumptions made by these models. [Birth–death processes; epidemiology; macroevolution; phylogenetics; statistical inference.]
  4. Abstract

    Infectious diseases are a major threat to biodiversity conservation and can exert strong influence on wildlife population dynamics. Understanding the mechanisms driving infection rates and epidemic outcomes requires empirical data on the evolutionary trajectory of pathogens and host selective processes. Phylodynamics is a robust framework for understanding the interaction of pathogen evolutionary processes with epidemiological dynamics, providing a powerful tool to evaluate disease control strategies. Tasmanian devils have been threatened by a fatal transmissible cancer, devil facial tumour disease (DFTD), for more than two decades. Here we employ a phylodynamic approach using tumour mitochondrial genomes to assess the role of tumour genetic diversity in epidemiological and population dynamics in a devil population that has been intensively monitored for 12 years, since the beginning of the epidemic outbreak. DFTD molecular clock estimates of disease introduction mirrored observed estimates in the field, and DFTD genetic diversity was positively correlated with estimates of devil population size. However, prevalence and force of infection were lowest when devil population size and tumour genetic diversity were highest. This could be due to either differential virulence or transmissibility in tumour lineages or the development of host defence strategies against infection. Our results support the view that evolutionary processes and epidemiological trade‐offs can drive host‐pathogen coexistence, even when disease‐induced mortality is extremely high. We highlight the importance of integrating pathogen and population evolutionary interactions to better understand long‐term epidemic dynamics and evaluate disease control strategies.
  5. The use of viral sequence data to inform public health intervention has become increasingly common in epidemiology. Such methods typically utilize multiple sequence alignments and phylogenies estimated from the sequence data. Like all estimation techniques, they are error prone, yet the impacts of such imperfections on downstream epidemiological inferences are poorly understood. To address this, we executed multiple commonly used viral phylogenetic analysis workflows on simulated viral sequence data, modeling Human Immunodeficiency Virus (HIV), Hepatitis C Virus (HCV), and Ebolavirus, and we computed multiple measures of accuracy, motivated by transmission-clustering techniques. For multiple sequence alignment, MAFFT consistently outperformed MUSCLE and Clustal Omega in both accuracy and runtime. For phylogenetic inference, FastTree 2, IQ-TREE, RAxML-NG, and PhyML had similar topological accuracies, but branch lengths and pairwise distances were consistently most accurate in phylogenies inferred by RAxML-NG. However, FastTree 2 was the fastest, by orders of magnitude, and when the other tools were used to optimize branch lengths along a fixed FastTree 2 topology, the resulting phylogenies had accuracies that were indistinguishable from their original counterparts, but with a fraction of the runtime.