

Title: SymptomID: A Framework for Rapid Symptom Identification in Pandemics Using News Reports
The ability to quickly learn fundamentals about a new infectious disease, such as how it is transmitted, its incubation period, and its symptoms, is crucial in any novel pandemic. For instance, rapid identification of symptoms can enable interventions that dampen the spread of the disease. Traditionally, symptoms are learned from research publications associated with clinical studies. However, clinical studies are often slow and time intensive, and the resulting delays can have dire consequences in a rapidly spreading pandemic, as we have seen with COVID-19. In this article, we introduce SymptomID, a modular artificial intelligence-based framework for rapid identification of symptoms associated with novel pandemics using publicly available news reports. SymptomID is built on a state-of-the-art natural language processing model, BERT (Bidirectional Encoder Representations from Transformers), to extract symptoms from publicly available news reports and to cluster related symptoms together to remove redundancy. Our proposed framework requires minimal training data because it builds on a pre-trained language model. We present a case study of SymptomID using news articles about the current COVID-19 pandemic. Our COVID-19 symptom extraction module, trained on 225 articles, achieves an F1 score of over 0.8. SymptomID can correctly identify well-established symptoms (e.g., “fever” and “cough”) and less-prevalent symptoms (e.g., “rashes,” “hair loss,” “brain fog”) associated with the novel coronavirus. We believe this framework can be extended and easily adapted in future pandemics to quickly learn relevant insights that are fundamental for understanding and combating a new infectious disease.
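As a rough illustration of the F1 metric reported in the abstract, the sketch below scores a set of extracted symptom strings against a gold set. It assumes set-level exact matching; the abstract does not say whether the authors score at the token, span, or set level, and the function name `f1_score` and the example data are hypothetical, not taken from the paper.

```python
def f1_score(predicted, gold):
    """Return (precision, recall, F1) for two collections of symptom strings.

    Assumption: a prediction counts as correct only on an exact string match.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical example data for illustration only.
gold = {"fever", "cough", "hair loss", "brain fog"}
predicted = {"fever", "cough", "rashes", "brain fog"}

precision, recall, f1 = f1_score(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Here three of the four predicted symptoms match the gold set, so precision, recall, and F1 all come out to 0.75.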
Award ID(s):
2031546
PAR ID:
10265913
Journal Name:
ACM Transactions on Management Information Systems
Volume:
12
Issue:
4
ISSN:
2158-656X
Page Range / eLocation ID:
1 to 17
Sponsoring Org:
National Science Foundation
More Like this
  1. The ongoing COVID-19 pandemic has justifiably captured the attention of people around the world since late 2019. It has produced in many people a new perspective on or, indeed, a new realization about our potential vulnerability to emerging infectious diseases. However, our species has experienced numerous catastrophic disease pandemics in the past, and in addition to concerns about the harm being produced during the pandemic and the potential long-term sequelae of the disease, what has been frustrating for many public health experts, anthropologists, and historians is awareness that many of the outcomes of COVID-19 are not inevitable and might have been preventable had we actually heeded lessons from the past. We are currently witnessing variation in exposure risk, symptoms, and mortality from COVID-19, but these patterns are not surprising given what we know about past pandemics. We review here the literature on the demographic and evolutionary consequences of the Second Pandemic of Plague (ca. fourteenth–nineteenth centuries C.E.) and the 1918 influenza pandemic, two of the most devastating pandemics in recorded human history. These both provide case studies of the ways in which sociocultural and environmental contexts shape the experiences and outcomes of pandemic disease. Many of the factors at work during these past pandemics continue to be reproduced in modern contexts, and ultimately our hope is that by highlighting the outcomes that are at least theoretically preventable, we can leverage our knowledge about past experiences to prepare for and respond to disease today.

     
  2. The spread of fake news related to COVID-19 is an infodemic that contributes to a public health crisis. Detecting fake news is therefore crucial for effective management of the COVID-19 pandemic response. Studies have shown that machine learning models can detect COVID-19 fake news based on the content of news articles. However, the use of biomedical information, which is often featured in COVID-19 news, has not been explored in the development of these models. We present a novel approach for predicting COVID-19 fake news by leveraging biomedical information extraction (BioIE) in combination with machine learning models. We analyzed 1164 COVID-19 news articles and used advanced BioIE algorithms to extract 158 novel features. These features were then used to train 15 machine learning classifiers to predict COVID-19 fake news. Among the 15 classifiers, the random forest model achieved the best performance, with an area under the ROC curve (AUC) of 0.882, which is 12.36% to 31.05% higher than that of models trained on traditional features. Furthermore, incorporating BioIE-based features improved the performance of a state-of-the-art multi-modality model (AUC 0.914 vs. 0.887). Our study suggests that incorporating biomedical information into fake news detection models improves their performance, and thus could be a valuable tool in the fight against the COVID-19 infodemic.

     
  3. The COVID-19 pandemic highlighted two critical barriers hindering rapid response to novel pathogens. These include inefficient use of existing biological knowledge about treatments, compounds, gene interactions, proteins, etc. to fight new diseases, and the lack of assimilation and analysis of the fast-growing knowledge about new diseases to quickly develop new treatments, vaccines, and compounds. Overcoming these critical challenges has the potential to revolutionize global preparedness for future pandemics. Accordingly, this article introduces a novel knowledge graph application that functions as both a repository of life science knowledge and an analytics platform capable of extracting time-sensitive insights to uncover evolving disease dynamics and, importantly, researchers' evolving understanding. Specifically, we demonstrate how to extract time-bounded key concepts, also leveraging existing ontologies, from evolving scholarly articles to create a single temporal connected source of truth specifically related to COVID-19. By doing so, current knowledge can be promptly accessed by both humans and machines, from which further understanding of disease outbreaks can be derived. We present key findings from the temporal analysis, applied to a subset of the resulting knowledge graph known as the temporal keywords knowledge graph, and delve into the detailed capabilities provided by this innovative approach. 
  4. Beginning in early 2020, the novel coronavirus was the subject of frequent and sustained news coverage. Building on prior literature on the stress-inducing effects of consuming news during a large-scale crisis, we used network analysis to investigate the association between coronavirus disease 2019 (COVID-19) news consumption, COVID-19-related psychological stress, worries about oneself and one’s loved ones getting COVID-19, and sleep quality. Data were collected in March 2020 from 586 adults (45.2% female; 72.9% White) recruited via Amazon Mechanical Turk in the U.S. Participants completed online surveys assessing attitudes and behaviors related to COVID-19 and a questionnaire assessing seven domains of sleep quality. Networks were constructed using partial regularized correlation matrices. As hypothesized, COVID-19 news consumption was positively associated with COVID-19-related psychological stress and concerns about one’s loved ones getting COVID-19. However, there were very few associations between COVID-19 news consumption and sleep quality indices, and gender did not moderate any of the observed relationships. This study replicates and extends previous findings that COVID-19 news consumption is linked with psychological stress related to the pandemic, but even under such conditions, sleep quality can be spared due to the pandemic allowing for flexibility in morning work/school schedules.
  5. Proteins are direct products of the genome and metabolites are functional products of interactions between the host and other factors such as environment, disease state, clinical information, etc. Omics data, including proteins and metabolites, are useful in characterizing biological processes underlying COVID-19 along with patient data and clinical information, yet few methods are available to effectively analyze such diverse and unstructured data. Using an integrated approach that combines proteomics and metabolomics data, we investigated the changes in metabolites and proteins in relation to patient characteristics (e.g., age, gender, and health outcome) and clinical information (e.g., metabolic panel and complete blood count test results). We found significant enrichment of biological indicators of lung, liver, and gastrointestinal dysfunction associated with disease severity using publicly available metabolite and protein profiles. Our analyses specifically identified enriched proteins that play a critical role in responses to injury or infection within these anatomical sites, but may contribute to excessive systemic inflammation within the context of COVID-19. Furthermore, we have used this information in conjunction with machine learning algorithms to predict the health status of patients presenting symptoms of COVID-19. This work provides a roadmap for understanding the biochemical pathways and molecular mechanisms that drive disease severity, progression, and treatment of COVID-19.

     