Background: In the wake of the COVID-19 pandemic, scientists have scrambled to collect and analyze SARS-CoV-2 genomic data to inform public health responses to COVID-19 in real time. Open-source phylogenetic and data visualization platforms for monitoring SARS-CoV-2 genomic epidemiology have rapidly gained popularity for their ability to illuminate spatiotemporal transmission patterns worldwide. However, the utility of such tools to inform public health decision-making for COVID-19 in real time remains to be explored. Objective: The objective of this study was to convene experts in public health, infectious diseases, virology, and bioinformatics, many of whom were actively engaged in the COVID-19 response at the time of their participation, to discuss the application of phylodynamic tools to inform pandemic responses. Methods: A series of four virtual focus group discussions was hosted between June 2020 and June 2021, covering the pre- and post-variant and vaccination eras of the COVID-19 crisis. Audio recordings were transcribed verbatim, and an iterative, thematic qualitative framework was used for analysis. Results: Of the 41 individuals invited, 23 (56.1%) agreed to participate. Across the four focus group sessions, 15 (65%) of the participants were female, 17 (74%) were white, and 5 (22%) were Black. Participants were described as molecular epidemiologists (ME, n=9), clinician-researchers (n=3), infectious disease experts (ID, n=4), and public health professionals (PH) at the local (n=4), state (n=2), and federal (n=1) levels. Collectively, participants felt that successful uptake of phylodynamic tools relies on the strength of academic-public health partnerships.
They called for interoperability standards in sequence data sharing and cited several resource issues that must be addressed, including timeliness and cost, as well as the need to reduce sampling bias and to improve the translation of phylodynamic findings into public health action. Conclusions: This was the first qualitative study to characterize the perspectives of key experts regarding the utility of phylodynamic tools for the public health response to COVID-19. The focus group participants identified key areas for improvement of existing and future phylogenetic and data visualization platforms for monitoring SARS-CoV-2 genomic epidemiology. This information is critical to both policymakers and developers as they consider how to handle existing and emerging SARS-CoV-2 variants during the ongoing crisis.
Variational Phylodynamic Inference Using Pandemic-scale Data
Abstract The ongoing global pandemic has sharply increased the amount of data available to researchers in epidemiology and public health. Unfortunately, few existing analysis tools are capable of exploiting all of the information contained in a pandemic-scale data set, resulting in missed opportunities for improved surveillance and contact tracing. In this paper, we develop the variational Bayesian skyline (VBSKY), a method for fitting Bayesian phylodynamic models to very large pathogen genetic data sets. By combining recent advances in phylodynamic modeling, scalable Bayesian inference and differentiable programming, along with a few tailored heuristics, VBSKY is capable of analyzing thousands of genomes in a few minutes, providing accurate estimates of epidemiologically relevant quantities such as the effective reproduction number and overall sampling effort through time. We illustrate the utility of our method by performing a rapid analysis of a large number of SARS-CoV-2 genomes, and demonstrate that the resulting estimates closely track those derived from alternative sources of public health data.
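The effective reproduction number and sampling effort mentioned above can be read off the per-interval rates of a birth-death skyline model. The sketch below assumes the commonly used parameterization, in which each piecewise-constant interval has a transmission rate λ, a rate μ of becoming non-infectious without being sampled, and a sampling rate ψ, so that R_e = λ/(μ + ψ) and the sampling proportion is p = ψ/(μ + ψ). Whether VBSKY uses exactly this form is not stated in the abstract, and the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkylineInterval:
    """Rates for one piecewise-constant interval of a birth-death skyline model.

    Hypothetical names; an assumed parameterization, not VBSKY's documented API.
    Rates are per unit time (e.g. per day).
    """
    birth: float     # transmission rate (lambda)
    death: float     # rate of becoming non-infectious without being sampled (mu)
    sampling: float  # sampling rate (psi)

    @property
    def become_uninfectious(self) -> float:
        # Total rate at which an infected lineage ends: delta = mu + psi.
        return self.death + self.sampling


def reproduction_number(iv: SkylineInterval) -> float:
    """Effective reproduction number R_e = lambda / (mu + psi)."""
    return iv.birth / iv.become_uninfectious


def sampling_proportion(iv: SkylineInterval) -> float:
    """Fraction of ended infections that were sampled: p = psi / (mu + psi)."""
    return iv.sampling / iv.become_uninfectious
```

For an interval with λ = 0.3, μ = 0.09 and ψ = 0.01 per day, R_e is 3.0 and one ended infection in ten is sampled; lowering λ to 0.08 with the same μ and ψ gives R_e = 0.8. Estimating these quantities through time amounts to estimating the rates separately in each interval.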
- Award ID(s):
- 2052653
- PAR ID:
- 10403463
- Editor(s):
- Rogers, Rebekah
- Journal Name:
- Molecular Biology and Evolution
- Volume:
- 39
- Issue:
- 8
- ISSN:
- 0737-4038
- Sponsoring Org:
- National Science Foundation
More Like this
Abstract We investigated SARS-CoV-2 transmission dynamics in Italy, one of the countries hit hardest by the pandemic, using phylodynamic analysis of viral genetic and epidemiological data. We observed the co-circulation of multiple SARS-CoV-2 lineages over time, which were linked to multiple importations and characterized by large transmission clusters concomitant with a high number of infections. Subsequent implementation of a three-phase nationwide lockdown strategy greatly reduced infection numbers and hospitalizations. Yet we present evidence of sustained viral spread among sporadic clusters acting as “hidden reservoirs” during summer 2020. Mathematical modelling shows that increased mobility among residents eventually catalyzed the coalescence of such clusters, thus driving up the number of infections and initiating a new epidemic wave. Our results suggest that the efficacy of public health interventions is, ultimately, limited by the size and structure of epidemic reservoirs, which may warrant prioritization during vaccine deployment.
-
Abstract Analysis of phylogenetic trees has become an essential tool in epidemiology. Likelihood-based methods fit models to phylogenies to draw inferences about the phylodynamics and history of viral transmission. However, these methods are often computationally expensive, which limits the complexity and realism of phylodynamic models and makes them ill-suited for informing policy decisions in real time during rapidly developing outbreaks. Likelihood-free methods using deep learning are pushing the boundaries of inference beyond these constraints. In this paper, we extend, compare, and contrast a recently developed deep learning method for likelihood-free inference from trees. We trained multiple deep neural networks using phylogenies from simulated outbreaks that spread among 5 locations and found that they achieve nearly the same accuracy as Bayesian inference under the true simulation model. We compared the robustness to model misspecification of a trained neural network with that of a Bayesian method and found that both methods performed comparably, converging on similar biases. We also implemented an uncertainty quantification method called conformalized quantile regression; we demonstrate that its intervals show patterns of sensitivity to model misspecification similar to those of Bayesian highest posterior density (HPD) intervals and overlap greatly with HPDs, but have lower precision (i.e., are more conservative). Finally, we trained and tested a neural network against phylogeographic data from a recent study of the SARS-CoV-2 pandemic in Europe and obtained similar estimates of region-specific epidemiological parameters and of the location of the common ancestor in Europe. Along with being as accurate and robust as likelihood-based methods, our trained neural networks are on average over 3 orders of magnitude faster after training.
Our results support the notion that neural networks can be trained with simulated data to accurately mimic the good and bad statistical properties of the likelihood functions of generative phylogenetic models.
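Conformalized quantile regression is a generic recipe, so its calibration step can be sketched independently of the neural networks in the study. The version below, in plain Python with hypothetical function names, assumes a trained model already outputs lower and upper quantile predictions and that a held-out calibration set with known true values is available. It is the standard split-conformal form of CQR, not the authors' implementation.

```python
import math
from typing import Sequence, Tuple


def cqr_margin(
    lower: Sequence[float],
    upper: Sequence[float],
    y: Sequence[float],
    alpha: float,
) -> float:
    """Split-conformal margin for conformalized quantile regression (CQR).

    lower/upper: a fitted model's alpha/2 and 1 - alpha/2 quantile predictions
    on a held-out calibration set; y: the true values on that set.
    """
    n = len(y)
    if n == 0 or len(lower) != n or len(upper) != n:
        raise ValueError("calibration arrays must be non-empty and equal length")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    # Conformity score: positive when y falls outside [lo, hi], negative inside.
    scores = sorted(max(lo - yi, yi - hi) for lo, hi, yi in zip(lower, upper, y))
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return math.inf
    return scores[k - 1]


def cqr_interval(lo: float, hi: float, margin: float) -> Tuple[float, float]:
    """Widen (or, if margin < 0, shrink) a predicted quantile interval."""
    return lo - margin, hi + margin
```

When calibration and test points are exchangeable, widening each new predicted interval by the returned margin gives marginal coverage of at least 1 - alpha. A negative margin means the raw quantiles were already too wide on the calibration data, and the interval is tightened instead.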
-
Barido-Sottani, Joëlle (Ed.) The COVID-19 pandemic demonstrated that fast and accurate analysis of continually collected infectious disease surveillance data is crucial for situational awareness and policy making. Coalescent-based phylodynamic analysis can use genetic sequences of a pathogen to estimate changes in its effective population size, a measure of genetic diversity. Under certain conditions, these changes in effective population size can be connected to changes in the number of infections in the population of interest. Phylodynamics is an important set of tools because its methods are often resilient to the ascertainment biases present in traditional surveillance data (e.g., preferential testing of symptomatic individuals). Unfortunately, it takes weeks or months to sequence sampled pathogen genomes and deposit them in a database, where they become available for such analyses. These reporting delays severely decrease the precision of phylodynamic methods close to the present time, and for some models they can lead to extreme biases. Here we present a method that affords reliable estimation of the effective population size trajectory closer to the time of data collection, allowing policy decisions to be based on more recent data. Our work uses readily available historical times between sampling and reporting of sequenced samples for a population of interest and incorporates this information into the sampling model to mitigate the effects of reporting delay in real-time analyses. We illustrate our methodology on simulated data and on SARS-CoV-2 sequences collected in the state of Washington in 2021.
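The abstract describes folding historical sampling-to-reporting times into the sampling model. One simple way to express that idea, and an assumption here rather than the authors' exact formulation, is to thin the sampling intensity at each time by the empirical probability that a sequence collected that long before the analysis date has already been reported. The helper names below are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence


def make_reporting_cdf(historic_delays: Sequence[float]) -> Callable[[float], float]:
    """Return the empirical CDF of past sampling-to-reporting delays (in days).

    Hypothetical helper: cdf(d) is the fraction of historical sequences that
    were reported within d days of sample collection.
    """
    ordered = sorted(historic_delays)
    if not ordered:
        raise ValueError("need at least one historic delay")
    n = len(ordered)

    def cdf(elapsed: float) -> float:
        # bisect_right counts the delays <= elapsed in O(log n).
        return bisect_right(ordered, elapsed) / n

    return cdf


def thinned_sampling_intensity(
    intensity: Sequence[float],
    days_before_present: Sequence[float],
    cdf: Callable[[float], float],
) -> List[float]:
    """Expected rate of samples that are already reported at analysis time.

    Assumed model: a sample collected `age` days before the analysis date is
    visible with probability cdf(age), so the true sampling intensity is
    scaled (thinned) by that probability. Recent times are thinned most.
    """
    if len(intensity) != len(days_before_present):
        raise ValueError("intensity and days_before_present must align")
    return [rate * cdf(age) for rate, age in zip(intensity, days_before_present)]
```

With historical delays of 2, 5, 7 and 10 days, a sample collected 6 days ago has an estimated 50% chance of having been reported, so a model that ignored the delay could misread the missing half as a real decline in sampling intensity near the present.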
-
Abstract The Household Pulse Survey (HPS), released by the US Census Bureau at the start of the coronavirus pandemic, gathers timely information about the societal and economic impacts of coronavirus. The first phase of the survey was launched in April 2020 and ran for 12 weeks. To track the immediate impact of the pandemic, individual respondents during this phase were re-sampled for up to three consecutive weeks. Motivated by expected job loss during the pandemic, this work uses public-use microdata to propose unit-level, model-based estimators that incorporate longitudinal dependence at both the response and domain levels. In particular, using a pseudo-likelihood, we consider a Bayesian hierarchical unit-level, model-based approach for both Gaussian and binary response data under informative sampling. To facilitate construction of these model-based estimates, we develop an efficient Gibbs sampler. An empirical simulation study is conducted to compare the proposed approach with models that do not account for unit-level longitudinal correlation. Finally, using public-use HPS microdata, we provide an analysis of ‘expected job loss’ that compares design- and model-based estimators and demonstrates superior performance for the proposed model-based approaches.
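The combination of a survey-weighted pseudo-likelihood with a Gibbs sampler can be illustrated on a deliberately tiny model. The sketch below assumes a single Gaussian mean with no domain effects or longitudinal dependence, a flat prior on the mean, an inverse-gamma prior on the variance, and survey weights rescaled to sum to the sample size. It is a toy with hypothetical names, not the authors' unit-level model. Raising each unit's likelihood to its scaled weight keeps both full conditionals conjugate.

```python
import numpy as np


def scaled_weights(w):
    """Rescale survey weights so they sum to the sample size n."""
    w = np.asarray(w, dtype=float)
    return w * (w.size / w.sum())


def pseudo_likelihood_gibbs(y, w, n_iter=5000, a0=1.0, b0=1.0, seed=0):
    """Toy Gibbs sampler: y_i ~ N(mu, sigma2), each term raised to its scaled weight.

    Assumed priors: flat on mu, InvGamma(a0, b0) on sigma2.
    Full conditionals under the weighted pseudo-likelihood:
      mu     | sigma2 ~ N(sum(wt * y) / W, sigma2 / W)
      sigma2 | mu     ~ InvGamma(a0 + W / 2, b0 + sum(wt * (y - mu)**2) / 2)
    where wt are the scaled weights and W = sum(wt).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    wt = scaled_weights(w)
    W = wt.sum()
    ybar_w = np.dot(wt, y) / W        # survey-weighted mean
    sigma2 = float(np.var(y)) or 1.0  # starting value; fall back to 1 if var is 0
    mu_draws = np.empty(n_iter)
    sigma2_draws = np.empty(n_iter)
    for t in range(n_iter):
        mu = rng.normal(ybar_w, np.sqrt(sigma2 / W))
        shape = a0 + W / 2.0
        rate = b0 + 0.5 * np.dot(wt, (y - mu) ** 2)
        # If G ~ Gamma(shape, scale=1/rate), then 1/G ~ InvGamma(shape, rate).
        sigma2 = 1.0 / rng.gamma(shape, 1.0 / rate)
        mu_draws[t] = mu
        sigma2_draws[t] = sigma2
    return mu_draws, sigma2_draws
```

In this toy, the posterior for mu centers on the survey-weighted mean (50/8 = 6.25 for y = [1, 2, 3, 4, 10] with weights [1, 1, 1, 1, 4]) rather than the unweighted mean of 4, which is the informative-sampling correction in miniature.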

