Title: COVID-19 Surveiller: toward a robust and effective pandemic surveillance system based on social media mining
The outbreak of the novel coronavirus, COVID-19, has become one of the most severe pandemics in human history. In this paper, we propose to leverage social media users as social sensors to simultaneously predict the pandemic trends and suggest potential risk factors for public health experts to understand spread situations and recommend proper interventions. More precisely, we develop novel deep learning models to recognize important entities and their relations over time, thereby establishing dynamic heterogeneous graphs to describe the observations of social media users. A dynamic graph neural network model can then forecast the trends (e.g. newly diagnosed cases and death rates) and identify high-risk events from social media. Based on the proposed computational method, we also develop a web-based system for domain experts without any computer science background to easily interact with. We conduct extensive experiments on large-scale datasets of COVID-19 related tweets provided by Twitter, which show that our method can precisely predict the new cases and death rates. We also demonstrate the robustness of our web-based pandemic surveillance system and its ability to retrieve essential knowledge and derive accurate predictions across a variety of circumstances. Our system is also available at http://scaiweb.cs.ucla.edu/covidsurveiller/ . This article is part of the theme issue ‘Data science approaches to infectious disease surveillance’.
Award ID(s):
2031187
PAR ID:
10331494
Journal Name:
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
Volume:
380
Issue:
2214
ISSN:
1364-503X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Kacprzyk, Janusz; Pal, Nikhil R; Perez, Rafael B; Corchado, Emilio S; Hagras, Hani; Kóczy, László T; Kreinovich, Vladik; Lin, Chin-Teng; Lu, Jie; Melin, Patricia (Ed.)
    The COVID-19 pandemic was lived in real-time on social media. In the current project, we use machine learning to explore the relationship between COVID-19 cases and social media activity on Twitter. We were particularly interested in determining if Twitter activity can be used to predict COVID-19 surges. We also were interested in exploring features of social media, such as replies, to determine their promise for understanding the views of individual users. With the prevalence of mis/disinformation on social media, it is critical to develop a deeper and richer understanding of the relationship between social media and real-world events in order to detect and prevent future influence operations. In the current work, we explore the relationship between COVID-19 cases and social media activity (on Twitter) in three major United States cities with different geographical and political landscapes. We find that Twitter activity resulted in statistically significant correlations using the Granger causality test, with a lag of one week in all three cities. Similarly, the use of replies, which appear more likely to be generated by individual users, not bots or public relations operations, was also strongly correlated with the number of COVID-19 cases using the Granger causality test. Furthermore, we were able to build promising predictive models for the number of future COVID-19 cases using correlation data to select features for input to our models. In contrast, significant correlations were not identified when comparing the number of COVID-19 cases with mainstream media sources or with a sample of all US COVID-related tweets. We conclude that, even for an international event such as COVID-19, social media tracks closely with local conditions. We also suggest that replies can be a valuable feature within a machine learning task that is attempting to gauge the reactions of individual users. 
  2. Abstract—The COVID-19 pandemic brought to the forefront an unprecedented need for experts, as well as citizens, to visualize spatio-temporal disease surveillance data. Web application dashboards were quickly developed to fill this gap, but all of these dashboards supported a particular niche view of the pandemic (i.e., current status or specific regions). In this paper, we describe our work developing our COVID-19 Surveillance Dashboard, which offers a unique view of the pandemic while also allowing users to focus on the details that interest them. From the beginning, our goal was to provide a simple visual tool for comparing, organizing, and tracking near-real-time surveillance data as the pandemic progresses. In developing this dashboard, we also identified 6 key metrics which we propose as a standard for the design and evaluation of real-time epidemic science dashboards. Our dashboard was one of the first released to the public, and continues to be actively visited. Our own group uses it to support federal, state and local public health authorities, and it is used by individuals worldwide to track the evolution of the COVID-19 pandemic, build their own dashboards, and support their organizations as they plan their responses to the pandemic.
  3. Aboelhadid, Shawky M (Ed.)
    The COVID-19 pandemic has caused over 500 million cases and over six million deaths globally. Of these, over 12 million cases and over 250 thousand deaths have occurred on the African continent as of May 2022. Prevention and surveillance remain the cornerstone of interventions to halt the further spread of COVID-19. Google Health Trends (GHT), a free Internet tool, may be valuable to help anticipate outbreaks, identify disease hotspots, or understand the patterns of disease surveillance. We collected COVID-19 case and death incidence for 54 African countries and obtained averages for four five-month study periods in 2020–2021. Average case and death incidences were calculated during these four time periods to measure disease severity. We used GHT to characterize COVID-19 incidence across Africa, collecting numbers of searches from GHT related to COVID-19 using four terms: ‘coronavirus’, ‘coronavirus symptoms’, ‘COVID19’, and ‘pandemic’. The terms were related to weekly COVID-19 case incidences for the entire study period via multiple linear and weighted linear regression analyses. We also assembled 72 variables assessing Internet accessibility, demographics, economics, health, and others, for each country, to summarize potential mechanisms linking GHT searches and COVID-19 incidence. COVID-19 burden in Africa increased steadily during the study period. Important increases in COVID-19 death incidence were observed for Seychelles and Tunisia. Our study demonstrated a weak correlation between GHT and COVID-19 incidence for most African countries. Several variables seemed useful in explaining the pattern of GHT statistics and their relationship to COVID-19, including log of average weekly cases, log of cumulative total deaths, and log of fixed total number of broadband subscriptions in a country. Apparently, GHT may best be used for surveillance of diseases that are diagnosed more consistently.
    Overall, GHT-based surveillance showed little applicability in the studied countries. GHT for an ongoing epidemic might be useful in specific situations, such as when countries have significant levels of infection with low variability. Future studies might assess the algorithm in different epidemic contexts.
  4. Since the start of the coronavirus disease 2019 (COVID-19) pandemic, social media platforms have been filled with discussions about the global health crisis. Meanwhile, the World Health Organization (WHO) has highlighted the importance of seeking credible sources of information on social media regarding COVID-19. In this study, we conducted an in-depth analysis of Twitter posts about COVID-19 during the early days of the COVID-19 pandemic to identify influential sources of COVID-19 information and understand the characteristics of these sources. We identified influential accounts based on an information diffusion network representing the interactions of Twitter users who discussed COVID-19 in the United States over a 24-h period. The network analysis revealed 11 influential accounts that we categorized as: 1) political authorities (elected government officials), 2) news organizations, and 3) personal accounts. Our findings showed that while verified accounts with a large following tended to be the most influential users, smaller personal accounts also emerged as influencers. Our analysis revealed that other users often interacted with influential accounts in response to news about COVID-19 cases, and that strongly contested political arguments received the most interactions overall. These findings suggest that political polarization was a major factor in COVID-19 information diffusion. We discussed the implications of political polarization on social media for COVID-19 communication.
  5. Objective: Data-informed decision making is valued among school districts, but challenges remain for local health departments to provide data, especially during a pandemic. We describe the rapid planning and deployment of a school-based COVID-19 surveillance system in a metropolitan US county. Methods: In 2020, we used several data sources to construct disease- and school-based indicators for COVID-19 surveillance in Franklin County, an urban county in central Ohio. We collected, processed, analyzed, and visualized data in the COVID-19 Analytics and Targeted Surveillance System for Schools (CATS). CATS included web-based applications (public and secure versions), automated alerts, and weekly reports for the general public and decision makers, including school administrators, school boards, and local health departments. Results: We deployed a pilot version of CATS in less than 2 months (August–September 2020) and added 21 school districts in central Ohio (15 in Franklin County and 6 outside the county) into CATS during the subsequent months. Public-facing web-based applications provided parents and students with local information for data-informed decision making. We created an algorithm to enable local health departments to precisely identify school districts and school buildings at high risk of an outbreak and active SARS-CoV-2 transmission in school settings. Practice Implications: Piloting a surveillance system with diverse school districts helps scale up to other districts. Leveraging past relationships and identifying emerging partner needs were critical to rapid and sustainable collaboration. Valuing diverse skill sets is key to rapid deployment of proactive and innovative public health practices during a global pandemic.