skip to main content


Title: STELAR: Spatio-temporal Tensor Factorization with Latent Epidemiological Regularization
Accurate prediction of the transmission of epidemic diseases such as COVID-19 is crucial for implementing effective mitigation measures. In this work, we develop a tensor method to predict the evolution of epidemic trends for many regions simultaneously. We construct a 3-way spatio-temporal tensor (location, attribute, time) of case counts and propose a nonnegative tensor factorization with latent epidemiological model regularization named STELAR. Unlike standard tensor factorization methods which cannot predict slabs ahead, STELAR enables long-term prediction by incorporating latent temporal regularization through a system of discrete time difference equations of a widely adopted epidemiological model. We use latent instead of location/attribute-level epidemiological dynamics to capture common epidemic profile sub-types and improve collaborative learning and prediction. We conduct experiments using both county- and state level COVID-19 data and show that our model can identify interesting latent patterns of the epidemic. Finally, we evaluate the predictive ability of our method and show superior performance compared to the baselines, achieving up to 21% lower root mean square error and 25% lower mean absolute error for county-level prediction.  more » « less
Award ID(s):
1704074
NSF-PAR ID:
10347658
Author(s) / Creator(s):
Date Published:
Journal Name:
The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)
Page Range / eLocation ID:
4830 - 4837
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)

    Real-world spatio-temporal data is often incomplete or inaccurate due to various data loading delays. For example, a location-disease-time tensor of case counts can have multiple delayed updates of recent temporal slices for some locations or diseases. Recovering such missing or noisy (under-reported) elements of the input tensor can be viewed as a generalized tensor completion problem. Existing tensor completion methods usually assume that i) missing elements are randomly distributed and ii) noise for each tensor element is i.i.d. zero-mean. Both assumptions can be violated for spatio-temporal tensor data. We often observe multiple versions of the input tensor with different under-reporting noise levels. The amount of noise can be time- or location-dependent as more updates are progressively introduced to the tensor. We model such dynamic data as a multi-version tensor with an extra tensor mode capturing the data updates. We propose a low-rank tensor model to predict the updates over time. We demonstrate that our method can accurately predict the ground-truth values of many real-world tensors. We obtain up to 27.2% lower root mean-squared-error compared to the best baseline method. Finally, we extend our method to track the tensor data over time, leading to significant computational savings.

     
    more » « less
  2. Deep Learning for Time-series plays a key role in AI for healthcare. To predict the progress of infectious disease outbreaks and demonstrate clear population-level impact, more granular analyses are urgently needed that control for important and potentially confounding county-level socioeconomic and health factors. We forecast US county-level COVID-19 infections using the Temporal Fusion Transformer (TFT). We focus on heterogeneous time-series deep learning model prediction while interpreting the complex spatiotemporal features learned from the data. The significance of the work is grounded in a real-world COVID-19 infection prediction with highly non-stationary, finely granular, and heterogeneous data. 1) Our model can capture the detailed daily changes of temporal and spatial model behaviors and achieves better prediction performance compared to other time-series models. 2) We analyzed the attention patterns from TFT to interpret the temporal and spatial patterns learned by the model. 3) We collected around 2.5 years of socioeconomic and health features for 3142 US counties, such as observed cases, and a number of static (age distribution and health disparity) and dynamic features (vaccination, disease spread, transmissible cases, and social distancing). Using the proposed framework, we have shown that our model can learn complex interactions. Interpreting different impacts at the county level would be crucial for understanding the infection process that can help effective public health decision-making. 
    more » « less
  3. Deep Learning for Time-series plays a key role in AI for healthcare. To predict the progress of infectious disease outbreaks and demonstrate clear population-level impact, more granular analyses are urgently needed that control for important and potentially confounding county-level socioeconomic and health factors. We forecast US county-level COVID-19 infections using the Temporal Fusion Transformer (TFT). We focus on heterogeneous time-series deep learning model prediction while interpreting the complex spatiotemporal features learned from the data. The significance of the work is grounded in a real-world COVID-19 infection prediction with highly non-stationary, finely granular, and heterogeneous data. 1) Our model can capture the detailed daily changes of temporal and spatial model behaviors and achieves better prediction performance compared to other time-series models. 2) We analyzed the attention patterns from TFT to interpret the temporal and spatial patterns learned by the model. 3) We collected around 2.5 years of socioeconomic and health features for 3142 US counties, such as observed cases, and a number of static (age distribution and health disparity) and dynamic features (vaccination, disease spread, transmissible cases, and social distancing). Using the proposed framework, we have shown that our model can learn complex interactions. Interpreting different impacts at the county level would be crucial for understanding the infection process that can help effective public health decision-making. 
    more » « less
  4. Abstract

    In this work, we aim to accurately predict the number of hospitalizations during the COVID-19 pandemic by developing a spatiotemporal prediction model. We propose HOIST, an Ising dynamics-based deep learning model for spatiotemporal COVID-19 hospitalization prediction. By drawing the analogy between locations and lattice sites in statistical mechanics, we use the Ising dynamics to guide the model to extract and utilize spatial relationships across locations and model the complex influence of granular information from real-world clinical evidence. By leveraging rich linked databases, including insurance claims, census information, and hospital resource usage data across the U.S., we evaluate the HOIST model on the large-scale spatiotemporal COVID-19 hospitalization prediction task for 2299 counties in the U.S. In the 4-week hospitalization prediction task, HOIST achieves 368.7 mean absolute error, 0.6$${R}^{2}$$R2and 0.89 concordance correlation coefficient score on average. Our detailed number needed to treat (NNT) and cost analysis suggest that future COVID-19 vaccination efforts may be most impactful in rural areas. This model may serve as a resource for future county and state-level vaccination efforts.

     
    more » « less
  5. Abstract Objectives

    Respiratory syncytial virus (RSV) is a significant cause of pediatric hospitalizations. This article aims to utilize multisource data and leverage the tensor methods to uncover distinct RSV geographic clusters and develop an accurate RSV prediction model for future seasons.

    Materials and Methods

    This study utilizes 5-year RSV data from sources, including medical claims, CDC surveillance data, and Google search trends. We conduct spatiotemporal tensor analysis and prediction for pediatric RSV in the United States by designing (i) a nonnegative tensor factorization model for pediatric RSV diseases and location clustering; (ii) and a recurrent neural network tensor regression model for county-level trend prediction using the disease and location features.

    Results

    We identify a clustering hierarchy of pediatric diseases: Three common geographic clusters of RSV outbreaks were identified from independent sources, showing an annual RSV trend shifting across different US regions, from the South and Southeast regions to the Central and Northeast regions and then to the West and Northwest regions, while precipitation and temperature were found as correlative factors with the coefficient of determination R2≈0.5, respectively. Our regression model accurately predicted the 2022-2023 RSV season at the county level, achieving R2≈0.3 mean absolute error MAE < 0.4 and a Pearson correlation greater than 0.75, which significantly outperforms the baselines with P-values <.05.

    Conclusion

    Our proposed framework provides a thorough analysis of RSV disease in the United States, which enables healthcare providers to better prepare for potential outbreaks, anticipate increased demand for services and supplies, and save more lives with timely interventions.

     
    more » « less