Title: Towards using Tweet sentiment for infectious disease detection
Social media data has shown potential for identifying infectious disease outbreaks faster than official records of disease incidence. We examine spatial, temporal, and spatiotemporal relationships between COVID-19-related microblog sentiment and COVID-19 cases to investigate whether microblog-derived sentiment can be used for local early warning of infectious disease outbreaks. To this end, we measure the sentiment of 56,755,894 COVID-19-related microblogs (tweets) from the microblogging platform X. We group these tweets by county and by calendar week to investigate the spatial and temporal correlation between sentiment and observed cases in the corresponding county and week. Our temporal analysis shows a significant negative correlation between sentiment and cases between June and September 2020. During this time, tweet sentiment could have served as an early warning for new COVID-19 outbreaks. Our spatial analysis shows that the East of the United States exhibits a significant negative correlation between sentiment and cases, while the West exhibits a significant positive correlation. In these regions, tweet sentiment could have been used as an early warning signal for new outbreaks. Our spatiotemporal analysis discovers even stronger correlations in certain regions during certain time periods. If we could understand when, where, and why this correlation is strong, we may be able to leverage social media as a successful early warning system.
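The grouping and correlation step described in the abstract can be illustrated with a minimal sketch. It assumes tweets arrive as hypothetical `(county, date, sentiment_score)` tuples and that case counts are keyed by `(county, ISO year, ISO week)`. Pearson's r is used only for illustration, since the abstract does not say which correlation measure or sentiment model the authors used.

```python
from collections import defaultdict
from datetime import date
from math import sqrt


def weekly_county_mean_sentiment(tweets):
    """Average sentiment per (county, ISO year, ISO week).

    `tweets` is an iterable of (county, date, sentiment_score) tuples;
    this record layout is an assumption made for the sketch.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for county, day, score in tweets:
        iso = day.isocalendar()
        key = (county, iso[0], iso[1])
        sums[key][0] += score
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def sentiment_case_correlation(mean_sentiment, cases):
    """Correlate sentiment and cases over the keys present in both dicts."""
    keys = sorted(set(mean_sentiment) & set(cases))
    return pearson([mean_sentiment[k] for k in keys],
                   [cases[k] for k in keys])


# Toy data (made up for illustration): sentiment falls as cases rise.
toy_tweets = [
    ("A", date(2020, 6, 1), 0.4),
    ("A", date(2020, 6, 2), 0.2),
    ("A", date(2020, 6, 8), 0.1),
    ("A", date(2020, 6, 15), -0.1),
]
toy_cases = {("A", 2020, 23): 10, ("A", 2020, 24): 20, ("A", 2020, 25): 30}
toy_sentiment = weekly_county_mean_sentiment(toy_tweets)
toy_r = sentiment_case_correlation(toy_sentiment, toy_cases)
```

Restricting the keys to one county gives a temporal correlation for that county, and restricting them to one week gives a spatial correlation across counties.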
Award ID(s):
2109647 2302970 2302968
PAR ID:
10671958
Author(s) / Creator(s):
; ;
Publisher / Repository:
Public Library of Science
Date Published:
Journal Name:
PLOS One
Volume:
20
Issue:
6
ISSN:
1932-6203
Page Range / eLocation ID:
e0325166
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Background: Population mobility is closely associated with COVID-19 transmission, and it could be used as a proximal indicator to predict future outbreaks, which could inform proactive nonpharmaceutical interventions for disease control. South Carolina is one of the US states that reopened early, following which it experienced a sharp increase in COVID-19 cases. Objective: The aims of this study are to examine the spatial-temporal relationship between population mobility and COVID-19 outbreaks and to use population mobility data to predict daily new cases at both the state and county level in South Carolina. Methods: This longitudinal study used disease surveillance data and Twitter-based population mobility data from March 6 to November 11, 2020, in South Carolina and its five counties with the largest number of cumulative confirmed COVID-19 cases. Population mobility was assessed based on the number of Twitter users with a travel distance greater than 0.5 miles. A Poisson count time series model was employed for COVID-19 forecasting. Results: Population mobility was positively associated with state-level daily COVID-19 incidence as well as incidence in the top five counties (ie, Charleston, Greenville, Horry, Spartanburg, and Richland). At the state level, the final model with a time window within the last 7 days had the smallest prediction error, and the prediction accuracy was as high as 98.7%, 90.9%, and 81.6% for the next 3, 7, and 14 days, respectively. Among Charleston, Greenville, Horry, Spartanburg, and Richland counties, the best predictive models were established based on their observations in the last 9, 14, 28, 20, and 9 days, respectively. The 14-day prediction accuracy ranged from 60.3% to 74.5%. Conclusions: Using Twitter-based population mobility data could provide acceptable predictions of COVID-19 daily new cases at both the state and county level in South Carolina. Population mobility measured via social media data could inform proactive measures and resource relocations to curb disease outbreaks and their negative influences.
  2. Coronavirus SARS-CoV-2 infections continue to spread across the world, yet effective large-scale disease detection and prediction remain limited. COVID Control: A Johns Hopkins University Study is a novel syndromic surveillance approach that collects body temperature and COVID-like illness (CLI) symptoms across the US using a smartphone app and applies spatio-temporal clustering techniques and cross-correlation analysis to create publicly available maps of abnormal symptomatology incidence. The results of the cross-correlation analysis identify optimal temporal lags between symptoms and a range of COVID-19 outcomes, with new taste/smell loss showing the highest correlations. We also identified temporal clusters of change in taste/smell entries and confirmed COVID-19 incidence in Baltimore City and County. Further, we utilized an extended simulated dataset to showcase our analytics in Maryland. The resulting clusters can serve as indicators of emerging COVID-19 outbreaks, and support syndromic surveillance as an early warning system for disease prevention and control.
  3. Deep Learning for Time-series plays a key role in AI for healthcare. To predict the progress of infectious disease outbreaks and demonstrate clear population-level impact, more granular analyses are urgently needed that control for important and potentially confounding county-level socioeconomic and health factors. We forecast US county-level COVID-19 infections using the Temporal Fusion Transformer (TFT). We focus on heterogeneous time-series deep learning model prediction while interpreting the complex spatiotemporal features learned from the data. The significance of the work is grounded in a real-world COVID-19 infection prediction with highly non-stationary, finely granular, and heterogeneous data. 1) Our model can capture the detailed daily changes of temporal and spatial model behaviors and achieves better prediction performance compared to other time-series models. 2) We analyzed the attention patterns from TFT to interpret the temporal and spatial patterns learned by the model. 3) We collected around 2.5 years of socioeconomic and health features for 3142 US counties, such as observed cases, and a number of static (age distribution and health disparity) and dynamic features (vaccination, disease spread, transmissible cases, and social distancing). Using the proposed framework, we have shown that our model can learn complex interactions. Interpreting different impacts at the county level would be crucial for understanding the infection process that can help effective public health decision-making. 
  4. Deep Learning for Time-series plays a key role in AI for healthcare. To predict the progress of infectious disease outbreaks and demonstrate clear population-level impact, more granular analyses are urgently needed that control for important and potentially confounding county-level socioeconomic and health factors. We forecast US county-level COVID-19 infections using the Temporal Fusion Transformer (TFT). We focus on heterogeneous time-series deep learning model prediction while interpreting the complex spatiotemporal features learned from the data. The significance of the work is grounded in a real-world COVID-19 infection prediction with highly non-stationary, finely granular, and heterogeneous data. 1) Our model can capture the detailed daily changes of temporal and spatial model behaviors and achieves better prediction performance compared to other time-series models. 2) We analyzed the attention patterns from TFT to interpret the temporal and spatial patterns learned by the model. 3) We collected around 2.5 years of socioeconomic and health features for 3142 US counties, such as observed cases, and a number of static (age distribution and health disparity) and dynamic features (vaccination, disease spread, transmissible cases, and social distancing). Using the proposed framework, we have shown that our model can learn complex interactions. Interpreting different impacts at the county level would be crucial for understanding the infection process that can help effective public health decision-making. 
  5. The outbreaks of Coronavirus Disease 2019 (COVID-19) have impacted the world significantly. Modeling the trend of infection and real-time forecasting of cases can help decision making and control of the disease spread. However, data-driven methods such as recurrent neural networks (RNN) can perform poorly due to limited daily samples in time. In this work, we develop an integrated spatiotemporal model based on the epidemic differential equations (SIR) and RNN. The former, after simplification and discretization, is a compact model of the temporal infection trend of a region, while the latter models the effect of the nearest neighboring regions and captures latent spatial information. We trained and tested our model on COVID-19 data in Italy, and show that it outperforms existing temporal models (fully connected NN, SIR, ARIMA) in 1-day, 3-day, and 1-week ahead forecasting, especially in the regime of limited training data.
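Item 5 above mentions discretizing the SIR epidemic equations. As a rough illustration only, the sketch below applies a plain forward-Euler step to the standard SIR model. It is not the authors' simplified model and omits their RNN component; the function names and the parameter values (`beta`, `gamma`, the initial population) are assumptions chosen for the example.

```python
def sir_step(s, i, r, beta, gamma, dt=1.0):
    """Advance the standard SIR model by one forward-Euler step.

    s, i, r: susceptible, infected, and recovered counts
    beta:    transmission rate per day (assumed value in the example)
    gamma:   recovery rate per day (assumed value in the example)
    """
    n = s + i + r
    new_infections = beta * s * i / n * dt
    new_recoveries = gamma * i * dt
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)


def simulate_sir(s0, i0, r0, beta, gamma, days):
    """Return the list of (s, i, r) states for day 0 through `days`."""
    trajectory = [(s0, i0, r0)]
    for _ in range(days):
        trajectory.append(sir_step(*trajectory[-1], beta, gamma))
    return trajectory


# Hypothetical region of 1,000 people with 10 initial infections.
trajectory = simulate_sir(990.0, 10.0, 0.0, beta=0.3, gamma=0.1, days=30)
```

Each step moves people from susceptible to infected and from infected to recovered, so the total population stays constant up to floating-point error.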