skip to main content


Title: COVID-19 Surveiller: toward a robust and effective pandemic surveillance system based on social media mining
The outbreak of the novel coronavirus, COVID-19, has become one of the most severe pandemics in human history. In this paper, we propose to leverage social media users as social sensors to simultaneously predict the pandemic trends and suggest potential risk factors for public health experts to understand spread situations and recommend proper interventions. More precisely, we develop novel deep learning models to recognize important entities and their relations over time, thereby establishing dynamic heterogeneous graphs to describe the observations of social media users. A dynamic graph neural network model can then forecast the trends (e.g. newly diagnosed cases and death rates) and identify high-risk events from social media. Based on the proposed computational method, we also develop a web-based system for domain experts without any computer science background to easily interact with. We conduct extensive experiments on large-scale datasets of COVID-19 related tweets provided by Twitter, which show that our method can precisely predict the new cases and death rates. We also demonstrate the robustness of our web-based pandemic surveillance system and its ability to retrieve essential knowledge and derive accurate predictions across a variety of circumstances. Our system is also available at http://scaiweb.cs.ucla.edu/covidsurveiller/ . This article is part of the theme issue ‘Data science approachs to infectious disease surveillance’.  more » « less
Award ID(s):
2031187
NSF-PAR ID:
10331494
Author(s) / Creator(s):
; ; ; ; ; ; ; ;
Date Published:
Journal Name:
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
Volume:
380
Issue:
2214
ISSN:
1364-503X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract This project is funded by the US National Science Foundation (NSF) through their NSF RAPID program under the title “Modeling Corona Spread Using Big Data Analytics.” The project is a joint effort between the Department of Computer & Electrical Engineering and Computer Science at FAU and a research group from LexisNexis Risk Solutions. The novel coronavirus Covid-19 originated in China in early December 2019 and has rapidly spread to many countries around the globe, with the number of confirmed cases increasing every day. Covid-19 is officially a pandemic. It is a novel infection with serious clinical manifestations, including death, and it has reached at least 124 countries and territories. Although the ultimate course and impact of Covid-19 are uncertain, it is not merely possible but likely that the disease will produce enough severe illness to overwhelm the worldwide health care infrastructure. Emerging viral pandemics can place extraordinary and sustained demands on public health and health systems and on providers of essential community services. Modeling the Covid-19 pandemic spread is challenging. But there are data that can be used to project resource demands. Estimates of the reproductive number (R) of SARS-CoV-2 show that at the beginning of the epidemic, each infected person spreads the virus to at least two others, on average (Emanuel et al. in N Engl J Med. 2020, Livingston and Bucher in JAMA 323(14):1335, 2020). A conservatively low estimate is that 5 % of the population could become infected within 3 months. Preliminary data from China and Italy regarding the distribution of case severity and fatality vary widely (Wu and McGoogan in JAMA 323(13):1239–42, 2020). A recent large-scale analysis from China suggests that 80 % of those infected either are asymptomatic or have mild symptoms; a finding that implies that demand for advanced medical services might apply to only 20 % of the total infected. Of patients infected with Covid-19, about 15 % have severe illness and 5 % have critical illness (Emanuel et al. in N Engl J Med. 2020). Overall, mortality ranges from 0.25 % to as high as 3.0 % (Emanuel et al. in N Engl J Med. 2020, Wilson et al. in Emerg Infect Dis 26(6):1339, 2020). Case fatality rates are much higher for vulnerable populations, such as persons over the age of 80 years (> 14 %) and those with coexisting conditions (10 % for those with cardiovascular disease and 7 % for those with diabetes) (Emanuel et al. in N Engl J Med. 2020). Overall, Covid-19 is substantially deadlier than seasonal influenza, which has a mortality of roughly 0.1 %. Public health efforts depend heavily on predicting how diseases such as those caused by Covid-19 spread across the globe. During the early days of a new outbreak, when reliable data are still scarce, researchers turn to mathematical models that can predict where people who could be infected are going and how likely they are to bring the disease with them. These computational methods use known statistical equations that calculate the probability of individuals transmitting the illness. Modern computational power allows these models to quickly incorporate multiple inputs, such as a given disease’s ability to pass from person to person and the movement patterns of potentially infected people traveling by air and land. This process sometimes involves making assumptions about unknown factors, such as an individual’s exact travel pattern. By plugging in different possible versions of each input, however, researchers can update the models as new information becomes available and compare their results to observed patterns for the illness. In this paper we describe the development a model of Corona spread by using innovative big data analytics techniques and tools. We leveraged our experience from research in modeling Ebola spread (Shaw et al. Modeling Ebola Spread and Using HPCC/KEL System. In: Big Data Technologies and Applications 2016 (pp. 347-385). Springer, Cham) to successfully model Corona spread, we will obtain new results, and help in reducing the number of Corona patients. We closely collaborated with LexisNexis, which is a leading US data analytics company and a member of our NSF I/UCRC for Advanced Knowledge Enablement. The lack of a comprehensive view and informative analysis of the status of the pandemic can also cause panic and instability within society. Our work proposes the HPCC Systems Covid-19 tracker, which provides a multi-level view of the pandemic with the informative virus spreading indicators in a timely manner. The system embeds a classical epidemiological model known as SIR and spreading indicators based on causal model. The data solution of the tracker is built on top of the Big Data processing platform HPCC Systems, from ingesting and tracking of various data sources to fast delivery of the data to the public. The HPCC Systems Covid-19 tracker presents the Covid-19 data on a daily, weekly, and cumulative basis up to global-level and down to the county-level. It also provides statistical analysis for each level such as new cases per 100,000 population. The primary analysis such as Contagion Risk and Infection State is based on causal model with a seven-day sliding window. Our work has been released as a publicly available website to the world and attracted a great volume of traffic. The project is open-sourced and available on GitHub. The system was developed on the LexisNexis HPCC Systems, which is briefly described in the paper. 
    more » « less
  2. Abstract Background

    The Mexican Institute of Social Security (IMSS) is the largest health care provider in Mexico, covering about 48% of the Mexican population. In this report, we describe the epidemiological patterns related to confirmed cases, hospitalizations, intubations, and in-hospital mortality due to COVID-19 and associated factors, during five epidemic waves recorded in the IMSS surveillance system.

    Methods

    We analyzed COVID-19 laboratory-confirmed cases from the Online Epidemiological Surveillance System (SINOLAVE) from March 29th, 2020, to August 27th, 2022. We constructed weekly epidemic curves describing temporal patterns of confirmed cases and hospitalizations by age, gender, and wave. We also estimated hospitalization, intubation, and hospital case fatality rates. The mean days of in-hospital stay and hospital admission delay were calculated across five pandemic waves. Logistic regression models were employed to assess the association between demographic factors, comorbidities, wave, and vaccination and the risk of severe disease and in-hospital death.

    Results

    A total of 3,396,375 laboratory-confirmed COVID-19 cases were recorded across the five waves. The introduction of rapid antigen testing at the end of 2020 increased detection and modified epidemiological estimates. Overall, 11% (95% CI 10.9, 11.1) of confirmed cases were hospitalized, 20.6% (95% CI 20.5, 20.7) of the hospitalized cases were intubated, and the hospital case fatality rate was 45.1% (95% CI 44.9, 45.3). The mean in-hospital stay was 9.11 days, and patients were admitted on average 5.07 days after symptoms onset. The most recent waves dominated by the Omicron variant had the highest incidence. Hospitalization, intubation, and mean hospitalization days decreased during subsequent waves. The in-hospital case fatality rate fluctuated across waves, reaching its highest value during the second wave in winter 2020. A notable decrease in hospitalization was observed primarily among individuals ≥ 60 years. The risk of severe disease and death was positively associated with comorbidities, age, and male gender; and declined with later waves and vaccination status.

    Conclusion

    During the five pandemic waves, we observed an increase in the number of cases and a reduction in severity metrics. During the first three waves, the high in-hospital fatality rate was associated with hospitalization practices for critical patients with comorbidities.

     
    more » « less
  3. Background Web-based resources and social media platforms play an increasingly important role in health-related knowledge and experience sharing. There is a growing interest in the use of these novel data sources for epidemiological surveillance of substance use behaviors and trends. Objective The key aims were to describe the development and application of the drug abuse ontology (DAO) as a framework for analyzing web-based and social media data to inform public health and substance use research in the following areas: determining user knowledge, attitudes, and behaviors related to nonmedical use of buprenorphine and illicitly manufactured opioids through the analysis of web forum data Prescription Drug Abuse Online Surveillance; analyzing patterns and trends of cannabis product use in the context of evolving cannabis legalization policies in the United States through analysis of Twitter and web forum data (eDrugTrends); assessing trends in the availability of novel synthetic opioids through the analysis of cryptomarket data (eDarkTrends); and analyzing COVID-19 pandemic trends in social media data related to 13 states in the United States as per Mental Health America reports. Methods The domain and scope of the DAO were defined using competency questions from popular ontology methodology (101 ontology development). The 101 method includes determining the domain and scope of ontology, reusing existing knowledge, enumerating important terms in ontology, defining the classes, their properties and creating instances of the classes. The quality of the ontology was evaluated using a set of tools and best practices recognized by the semantic web community and the artificial intelligence community that engage in natural language processing. Results The current version of the DAO comprises 315 classes, 31 relationships, and 814 instances among the classes. The ontology is flexible and can easily accommodate new concepts. The integration of the ontology with machine learning algorithms dramatically decreased the false alarm rate by adding external knowledge to the machine learning process. The ontology is recurrently updated to capture evolving concepts in different contexts and applied to analyze data related to social media and dark web marketplaces. Conclusions The DAO provides a powerful framework and a useful resource that can be expanded and adapted to a wide range of substance use and mental health domains to help advance big data analytics of web-based data for substance use epidemiology research. 
    more » « less
  4. null (Ed.)
    The COVID-19 pandemic brought to the forefront an unprecedented need for experts, as well as citizens, to visualize spatio-temporal disease surveillance data. Web application dashboards were quickly developed to fill this gap, including those built by JHU, WHO, and CDC, but all of these dashboards supported a particular niche view of the pandemic (ie, current status or specific regions). In this paper1, we describe our work developing our own COVID-19 Surveillance Dashboard, available at https://nssac.bii.virginia.edu/covid19/dashboard/, which offers a universal view of the pandemic while also allowing users to focus on the details that interest them. From the beginning, our goal was to provide a simple visual way to compare, organize, and track near-real-time surveillance data as the pandemic progresses. Our dashboard includes a number of advanced features for zooming, filtering, categorizing and visualizing multiple time series on a single canvas. In developing this dashboard, we have also identified 6 key metrics we call the 6Cs standard which we propose as a standard for the design and evaluation of real-time epidemic science dashboards. Our dashboard was one of the first released to the public, and remains one of the most visited and highly used. Our group uses it to support federal, state and local public health authorities, and it is used by people worldwide to track the pandemic evolution, build their own dashboards, and support their organizations as they plan their responses to the pandemic. We illustrate the utility of our dashboard by describing how it can be used to support data story-telling – an important emerging area in data science. 
    more » « less
  5. Abstract—The COVID-19 pandemic brought to the forefront an unprecedented need for experts, as well as citizens, to visualize spatio-temporal disease surveillance data. Web application dashboards were quickly developed to fill t his g ap, b ut a ll of these dashboards supported a particular niche view of the pandemic (ie, current status or specific r egions). I n t his paper, we describe our work developing our COVID-19 Surveillance Dashboard, which offers a unique view of the pandemic while also allowing users to focus on the details that interest them. From the beginning, our goal was to provide a simple visual tool for comparing, organizing, and tracking near-real-time surveillance data as the pandemic progresses. In developing this dashboard, we also identified 6 key metrics which we propose as a standard for the design and evaluation of real-time epidemic science dashboards. Our dashboard was one of the first r eleased t o the public, and continues to be actively visited. Our own group uses it to support federal, state and local public health authorities, and it is used by individuals worldwide to track the evolution of the COVID-19 pandemic, build their own dashboards, and support their organizations as they plan their responses to the pandemic. 
    more » « less