Title: Multi-faceted analysis and prediction for the outbreak of pediatric respiratory syncytial virus
Abstract

Objectives

Respiratory syncytial virus (RSV) is a significant cause of pediatric hospitalizations. This article uses multisource data and tensor methods to uncover distinct geographic clusters of RSV outbreaks and to develop an accurate RSV prediction model for future seasons.

Materials and Methods

This study uses 5 years of RSV data from multiple sources, including medical claims, CDC surveillance data, and Google search trends. We conduct spatiotemporal tensor analysis and prediction for pediatric RSV in the United States by designing (i) a nonnegative tensor factorization model for pediatric RSV disease and location clustering, and (ii) a recurrent neural network tensor regression model for county-level trend prediction using the disease and location features.
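As a rough illustration of the nonnegative tensor factorization step named above, the sketch below fits a rank-R nonnegative CP model with multiplicative updates in NumPy. It is not the authors' model: the tensor layout (location × disease × week), the rank, the update rule, and the helper names `khatri_rao`, `unfold`, `nonneg_cp`, `reconstruct`, and `relative_error` are all assumptions made for the example, and the data are synthetic.

```python
import numpy as np

def khatri_rao(a, b):
    """Column-wise Kronecker product: (I x R), (J x R) -> (I*J x R)."""
    r = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, r)

def unfold(x, mode):
    """Mode-n unfolding: bring `mode` to the front, flatten the rest (C order)."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def reconstruct(factors):
    """Rebuild the 3-way tensor from its CP factors."""
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)

def relative_error(x, factors):
    return np.linalg.norm(x - reconstruct(factors)) / np.linalg.norm(x)

def nonneg_cp(x, rank, n_iter=200, eps=1e-9, seed=0):
    """Nonnegative CP factorization of a 3-way tensor (multiplicative updates)."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            numer = unfold(x, mode) @ kr
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            denom = factors[mode] @ gram + eps
            # Ratio of nonnegative terms keeps every factor nonnegative.
            factors[mode] *= numer / denom
    return factors

# Synthetic stand-in for a (location x disease x week) count tensor.
rng = np.random.default_rng(1)
x = np.einsum("ir,jr,kr->ijk",
              rng.random((6, 2)), rng.random((4, 2)), rng.random((5, 2)))
factors = nonneg_cp(x, rank=2)

# One simple (assumed) clustering rule: assign each location to the
# component on which it loads most heavily.
location_cluster = factors[0].argmax(axis=1)
print(relative_error(x, factors), location_cluster)
```

Assigning each location to its dominant component is only one possible way to read clusters off the location factor; the paper's own clustering procedure is not described in this abstract.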


Results

We identify a clustering hierarchy of pediatric diseases. Three common geographic clusters of RSV outbreaks were identified from independent sources, showing an annual RSV trend that shifts across US regions: from the South and Southeast, to the Central and Northeast, and then to the West and Northwest. Precipitation and temperature were found to be correlated factors, each with a coefficient of determination R² ≈ 0.5. Our regression model accurately predicted the 2022-2023 RSV season at the county level, achieving R² ≈ 0.3, mean absolute error (MAE) < 0.4, and a Pearson correlation greater than 0.75, significantly outperforming the baselines (P < .05).
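The three metrics reported above measure different things, which is why a Pearson correlation above 0.75 can coexist with a much lower R². The sketch below computes all three with their standard textbook definitions in NumPy. The function name `regression_metrics` and the toy numbers are assumptions for illustration, and the abstract does not state exactly how the authors computed or aggregated these metrics.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R^2, mean absolute error, and Pearson correlation."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)                     # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": np.mean(np.abs(resid)),
        "pearson": np.corrcoef(y_true, y_pred)[0, 1],
    }

# Toy example: predictions follow the trend perfectly but are biased by +0.5.
metrics = regression_metrics([1, 2, 3, 4], [1.5, 2.5, 3.5, 4.5])
print(metrics)  # r2 = 0.8, mae = 0.5, pearson = 1.0 (up to rounding)
```

In the toy example the constant offset lowers R² and raises MAE, while the Pearson correlation is unaffected because it ignores shifts and rescaling of the predictions.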


Our proposed framework provides a thorough analysis of RSV disease in the United States, which enables healthcare providers to better prepare for potential outbreaks, anticipate increased demand for services and supplies, and save more lives with timely interventions.

Publisher / Repository:
Oxford University Press
Journal Name:
Journal of the American Medical Informatics Association
Medium: X Size: p. 198-208
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    Controlling the spread of infectious diseases (even when safe, transmission-blocking vaccines are available) may require the effective use of non-pharmaceutical interventions (NPIs), e.g., mask wearing, testing, limits on group sizes, venue closure. During the SARS-CoV-2 pandemic, many countries implemented NPIs inconsistently in space and time. This inconsistency was especially pronounced for policies in the United States of America (US) related to venue closure.


    Here, we investigate the impact of inconsistent policies associated with venue closure using mathematical modeling and high-resolution human mobility, Google search, and county-level SARS-CoV-2 incidence data from the USA. Specifically, we look at high-resolution location data and perform a US-county-level analysis of nearly 8 million SARS-CoV-2 cases and 150 million location visits, including 120 million church visitors across 184,677 churches, 14 million grocery visitors across 7662 grocery stores, and 13.5 million gym visitors across 5483 gyms.


    Analyzing the interaction between venue closure and changing mobility using a mathematical model shows that, across a broad range of model parameters, inconsistent or partial closure can be worse in terms of disease transmission than scenarios with no closures at all. Importantly, changes in mobility patterns due to epidemic control measures can lead to an increase in the future number of cases. In the most severe cases, individuals traveling to neighboring jurisdictions with different closure policies can result in an outbreak that would otherwise have been contained. To motivate our mathematical models, we turn to mobility data and find that while stay-at-home orders and closures decreased contacts in most areas of the USA, some specific activities and venues saw an increase in attendance and an increase in the distance visitors traveled to attend. We support this finding using search query data, which clearly shows a shift in information-seeking behavior concurrent with the changing mobility patterns.


    While coarse-grained observations are not sufficient to validate our models, taken together, they highlight the potential unintended consequences of inconsistent epidemic control policies related to venue closure and stress the importance of balancing the societal needs of a population with the risk of an outbreak growing into a large epidemic.

  2. Objective: Slaughterhouse data has recently been used to enhance animal disease surveillance in many countries but has been largely underused for syndromic surveillance in the United States. We characterize spatiotemporal patterns and system dynamics of whole carcass swine condemnations in the US. We illustrate the value of data mining and machine learning approaches to more cost-effectively identify emerging trends by condemnation reason, areas and time periods with higher than predicted condemnation rates, and regions or time periods with similar trends. Methods: Swine slaughter and condemnation data from 2005-2016 were obtained for slaughterhouses inspected by the Food Safety and Inspection Service (FSIS). Time series of condemnation rates by condemnation reason, type of pig, state, and month were generated. Dynamic time warping (DTW) and hierarchical clustering methods were used to identify states with similar patterns in the rate of condemnation cases by cause and type of pig. Spatiotemporal scan statistics were used to identify states and months with a significantly higher number of condemnation cases than expected. Clusters were compared to historic infectious disease outbreaks in the swine industry. Results: Between 2005 and 2016, 1,109,300 whole swine carcasses were condemned. The top causes for condemnation were abscess/pyemia, septicemia, pneumonia, icterus, and peritonitis, respectively. DTW and cluster analysis revealed clear spatiotemporal patterns in the rate of condemnations, many with a strong seasonal component. Several clusters were detected in timeframes where widespread outbreaks had occurred. Conclusions: Timely evaluation of spatiotemporal patterns in swine condemnations may provide critical information in predicting disease outbreaks. Identification of spatiotemporal hot spots can direct investigation of primary on-farm risk factors contributing to condemnation. Risk mitigation through targeted decision-making and improved management practices can minimize carcass condemnations and animal losses, improving economic efficiency, profitability, and sustainability of the US swine industry.
  3. Background

    Stay-at-home orders were one of the controversial interventions to curb the spread of COVID-19 in the United States. The stay-at-home orders, implemented in 51 states and territories between March 7 and June 30, 2020, impacted the lives of individuals and communities and accelerated the heavy usage of web-based social networking sites. Twitter sentiment analysis can provide valuable insight into public health emergency response measures and allow for better formulation and timing of future public health measures to be released in response to future public health emergencies.


    This study evaluated how stay-at-home orders affect Twitter sentiment in the United States. Furthermore, this study aimed to understand the feedback on stay-at-home orders from groups with different circumstances and backgrounds. In addition, we particularly focused on vulnerable groups, including older people, groups with underlying medical conditions, small and medium enterprises, and low-income groups.


    We constructed a multiperiod difference-in-differences regression model based on the Twitter sentiment geographical index, quantified from 7.4 billion geo-tagged tweets, to analyze the dynamics of sentiment feedback on stay-at-home orders across the United States. In addition, we used moderated effects analysis to assess differential feedback from vulnerable groups.


    We combed through the implementation of stay-at-home orders, Twitter sentiment geographical index, and the number of confirmed cases and deaths in 51 US states and territories. We identified trend changes in public sentiment before and after the stay-at-home orders. Regression results showed that stay-at-home orders generated a positive response, contributing to a recovery in Twitter sentiment. However, vulnerable groups faced greater shocks and hardships during the COVID-19 pandemic. In addition, economic and demographic characteristics had a significant moderating effect.


    This study showed a clear positive shift in public opinion about COVID-19, with this positive impact occurring primarily after stay-at-home orders. However, this positive sentiment is time-limited: after 14 days, people are more influenced by the status quo and trends, and feedback on the stay-at-home orders is no longer significantly positive. In particular, negative sentiment is more likely to be generated in states with a large proportion of vulnerable groups, and the policy plays a limited role. The pandemic hit older people, those with underlying diseases, and small and medium enterprises directly but hurt states with cross-cutting economic situations and more complex demographics over time. Based on large-scale Twitter data, this sociological perspective allows us to monitor the evolution of public opinion more directly, assess the impact of social events on public opinion, and understand the heterogeneity in the face of pandemic shocks.

  4. Abstract Motivation

    Polygenic risk score (PRS) has been widely exploited for genetic risk prediction due to its accuracy and conceptual simplicity. We introduce a unified Bayesian regression framework, NeuPred, for PRS construction, which accommodates varying genetic architectures and improves overall prediction accuracy for complex diseases by allowing for a wide class of prior choices. To take full advantage of the framework, we propose a summary-statistics-based cross-validation strategy to automatically select suitable chromosome-level priors, which demonstrates a striking variability of the prior preference of each chromosome, for the same complex disease, and further significantly improves the prediction accuracy.


    Simulation studies and real data applications with seven disease datasets from the Wellcome Trust Case Control Consortium cohort and eight groups of large-scale genome-wide association studies demonstrate that NeuPred achieves substantial and consistent improvements in terms of predictive r2 over existing methods. In addition, NeuPred has similar or advantageous computational efficiency compared with the state-of-the-art Bayesian methods.

    Availability and implementation

    The R package implementing NeuPred is available at

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  5. Accurate prediction of the transmission of epidemic diseases such as COVID-19 is crucial for implementing effective mitigation measures. In this work, we develop a tensor method to predict the evolution of epidemic trends for many regions simultaneously. We construct a 3-way spatio-temporal tensor (location, attribute, time) of case counts and propose a nonnegative tensor factorization with latent epidemiological model regularization named STELAR. Unlike standard tensor factorization methods, which cannot predict slabs ahead, STELAR enables long-term prediction by incorporating latent temporal regularization through a system of discrete-time difference equations of a widely adopted epidemiological model. We use latent instead of location/attribute-level epidemiological dynamics to capture common epidemic profile sub-types and improve collaborative learning and prediction. We conduct experiments using both county- and state-level COVID-19 data and show that our model can identify interesting latent patterns of the epidemic. Finally, we evaluate the predictive ability of our method and show superior performance compared to the baselines, achieving up to 21% lower root mean square error and 25% lower mean absolute error for county-level prediction.