This content will become publicly available on May 6, 2026

Title: Accounting for reporting delays in real-time phylodynamic analyses with preferential sampling
The COVID-19 pandemic demonstrated that fast and accurate analysis of continually collected infectious disease surveillance data is crucial for situational awareness and policy making. Coalescent-based phylodynamic analysis can use genetic sequences of a pathogen to estimate changes in its effective population size, a measure of genetic diversity. These changes in effective population size can be connected to the changes in the number of infections in the population of interest under certain conditions. Phylodynamics is an important set of tools because its methods are often resilient to the ascertainment biases present in traditional surveillance data (e.g., preferentially testing symptomatic individuals). Unfortunately, it takes weeks or months to sequence and deposit the sampled pathogen genetic sequences into a database, making them available for such analyses. These reporting delays severely decrease precision of phylodynamic methods closer to present time, and for some models can lead to extreme biases. Here we present a method that affords reliable estimation of the effective population size trajectory closer to the time of data collection, allowing for policy decisions to be based on more recent data. Our work uses readily available historic times between sampling and reporting of sequenced samples for a population of interest, and incorporates this information into the sampling model to mitigate the effects of reporting delay in real-time analyses. We illustrate our methodology on simulated data and on SARS-CoV-2 sequences collected in the state of Washington in 2021.
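As a rough illustration of how historic sampling-to-reporting times might enter a sampling model, the sketch below computes an empirical probability that a sample collected a given number of days ago has already been reported, and uses it to scale down an expected sampling intensity near the present. This is not the authors' implementation: the function names, the multiplicative adjustment, and the delay values are all assumptions made for illustration only.

```python
from bisect import bisect_right


def reporting_probability(historic_delays, days_before_present):
    """Empirical probability that a sample collected `days_before_present`
    days ago has been reported by now (empirical CDF of historic delays)."""
    delays = sorted(historic_delays)
    return bisect_right(delays, days_before_present) / len(delays)


def adjust_sampling_intensity(base_intensity, historic_delays, days_before_present):
    """Scale a hypothetical sampling intensity by the reporting probability,
    so that sparse recent data are not mistaken for a lack of sampling."""
    return base_intensity * reporting_probability(historic_delays, days_before_present)


# Hypothetical delays (in days) between sample collection and database deposit.
historic_delays = [5, 7, 10, 14, 14, 21, 28, 35, 42, 60]

for days in (0, 7, 14, 30, 90):
    print(days, reporting_probability(historic_delays, days))
```

With these made-up delays, only half of the samples collected 14 days before the analysis date would be expected to be in the database, which is why an unadjusted sampling model can misread the thin recent data.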
Award ID(s):
2143242
PAR ID:
10589236
Author(s) / Creator(s):
; ;
Editor(s):
Barido-Sottani, Joëlle
Publisher / Repository:
Public Library of Science
Journal Name:
PLOS Computational Biology
Edition / Version:
1
Volume:
21
Issue:
5
ISSN:
1553-7358
Page Range / eLocation ID:
e1012970
Subject(s) / Keyword(s):
Phylodynamics preferential sampling coalescent
Format(s):
Medium: X Other: pdf
Sponsoring Org:
National Science Foundation
More Like this
  1. Crandall, Keith (Ed.)
    Abstract Viral phylogenies provide crucial information on the spread of infectious diseases, and many studies fit mathematical models to phylogenetic data to estimate epidemiological parameters such as the effective reproduction ratio (Re) over time. Such phylodynamic inferences often complement or even substitute for conventional surveillance data, particularly when sampling is poor or delayed. It remains generally unknown, however, how robust phylodynamic epidemiological inferences are, especially when there is uncertainty regarding pathogen prevalence and sampling intensity. Here, we use recently developed mathematical techniques to fully characterize the information that can possibly be extracted from serially collected viral phylogenetic data, in the context of the commonly used birth-death-sampling model. We show that for any candidate epidemiological scenario, there exists a myriad of alternative, markedly different, and yet plausible “congruent” scenarios that cannot be distinguished using phylogenetic data alone, no matter how large the data set. In the absence of strong constraints or rate priors across the entire study period, neither maximum-likelihood fitting nor Bayesian inference can reliably reconstruct the true epidemiological dynamics from phylogenetic data alone; rather, estimators can only converge to the “congruence class” of the true dynamics. We propose concrete and feasible strategies for making more robust epidemiological inferences from viral phylogenetic data. 
  2. Rogers, Rebekah (Ed.)
    Abstract The ongoing global pandemic has sharply increased the amount of data available to researchers in epidemiology and public health. Unfortunately, few existing analysis tools are capable of exploiting all of the information contained in a pandemic-scale data set, resulting in missed opportunities for improved surveillance and contact tracing. In this paper, we develop the variational Bayesian skyline (VBSKY), a method for fitting Bayesian phylodynamic models to very large pathogen genetic data sets. By combining recent advances in phylodynamic modeling, scalable Bayesian inference and differentiable programming, along with a few tailored heuristics, VBSKY is capable of analyzing thousands of genomes in a few minutes, providing accurate estimates of epidemiologically relevant quantities such as the effective reproduction number and overall sampling effort through time. We illustrate the utility of our method by performing a rapid analysis of a large number of SARS-CoV-2 genomes, and demonstrate that the resulting estimates closely track those derived from alternative sources of public health data. 
  3. Abstract Phylodynamics is an area of population genetics that uses genetic sequence data to estimate past population dynamics. Modern state‐of‐the‐art Bayesian nonparametric methods for recovering population size trajectories of unknown form use either change‐point models or Gaussian process priors. Change‐point models suffer from computational issues when the number of change‐points is unknown and needs to be estimated. Gaussian process‐based methods lack local adaptivity and cannot accurately recover trajectories that exhibit features such as abrupt changes in trend or varying levels of smoothness. We propose a novel, locally adaptive approach to Bayesian nonparametric phylodynamic inference that has the flexibility to accommodate a large class of functional behaviors. Local adaptivity results from modeling the log‐transformed effective population size a priori as a horseshoe Markov random field, a recently proposed statistical model that blends together the best properties of the change‐point and Gaussian process modeling paradigms. We use simulated data to assess model performance, and find that our proposed method results in reduced bias and increased precision when compared to contemporary methods. We also use our models to reconstruct past changes in genetic diversity of human hepatitis C virus in Egypt and to estimate population size changes of ancient and modern steppe bison. These analyses show that our new method captures features of the population size trajectories that were missed by the state‐of‐the‐art methods. 
  4. Abstract Infectious diseases are a major threat for biodiversity conservation and can exert strong influence on wildlife population dynamics. Understanding the mechanisms driving infection rates and epidemic outcomes requires empirical data on the evolutionary trajectory of pathogens and host selective processes. Phylodynamics is a robust framework to understand the interaction of pathogen evolutionary processes with epidemiological dynamics, providing a powerful tool to evaluate disease control strategies. Tasmanian devils have been threatened by a fatal transmissible cancer, devil facial tumour disease (DFTD), for more than two decades. Here we employ a phylodynamic approach using tumour mitochondrial genomes to assess the role of tumour genetic diversity in epidemiological and population dynamics in a devil population subject to 12 years of intensive monitoring, since the beginning of the epidemic outbreak. DFTD molecular clock estimates of disease introduction mirrored observed estimates in the field, and DFTD genetic diversity was positively correlated with estimates of devil population size. However, prevalence and force of infection were the lowest when devil population size and tumour genetic diversity were the highest. This could be due to either differential virulence or transmissibility in tumour lineages or the development of host defence strategies against infection. Our results support the view that evolutionary processes and epidemiological trade‐offs can drive host‐pathogen coexistence, even when disease‐induced mortality is extremely high. We highlight the importance of integrating pathogen and population evolutionary interactions to better understand long‐term epidemic dynamics and evaluate disease control strategies.
  5. Abstract Surveillance and monitoring of zoonotic pathogens is key to identifying and mitigating emerging public health threats. Surveillance is often designed to be taxonomically targeted or systematically dispersed across geography; however, those approaches may not represent the breadth of environments inhabited by a host, vector, or pathogen, leaving significant gaps in our understanding of pathogen dynamics in their natural reservoirs and environments. As a case study on the design of pathogen surveillance programs, we assess how well 20 years of small mammal surveys in Panamá have sampled available environments and propose a multistep approach to selecting survey localities in the future. We use >8,000 georeferenced mammal specimen records, collected as part of a long-term hantavirus surveillance program, to test the completeness of country-wide environmental sampling. Despite 20 years of surveillance, our analyses identified a few key environmental sampling gaps. To refine surveillance strategies, we selected a series of core historically sampled localities, supplemented with additional environmentally distinct sites to more completely represent Panamá’s environments. Based on lessons learned through decades of surveillance, we propose a series of recommendations to improve strategic sampling for zoonotic pathogen surveillance.