

Search for: All records

Award ID contains: 2107150

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available without a charge during the embargo period.

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Public sentiment toward the COVID-19 vaccine as expressed on social media can interfere with communication by public health agencies on the importance of getting vaccinated. We investigated Twitter data to understand differences in sentiment, moral values, and language use between political ideologies regarding the COVID-19 vaccine. Guided by the tenets of moral foundations theory (MFT), we estimated political ideology and conducted a sentiment analysis of 262,267 English-language tweets from the United States containing COVID-19 vaccine-related keywords posted between May 2020 and October 2021. We applied the Moral Foundations Dictionary and used topic modeling and Word2Vec to understand the moral values and the context of words central to the vaccine debate. A quadratic trend showed that extreme ideologies among both Liberals and Conservatives expressed more negative sentiment than Moderates, with Conservatives expressing more negative sentiment than Liberals. Compared to Conservative tweets, Liberal tweets were rooted in a wider set of moral values, associated with the moral foundations of care (getting the vaccine for protection), fairness (having access to the vaccine), liberty (related to the vaccine mandate), and authority (trusting the vaccine mandate imposed by the government). Conservative tweets were associated with harm (around the safety of the vaccine) and oppression (around the government mandate). Furthermore, political ideology was associated with the expression of different meanings for the same words, e.g., “science” and “death.” Our results inform public health outreach communication strategies for tailoring vaccine information to different groups.
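The quadratic ("U-shaped") trend described above can be illustrated with a minimal sketch. The ideology buckets and per-tweet negative-sentiment scores below are illustrative stand-ins, not the study's data; the check simply confirms that both ideological extremes score above the moderates, with conservatives highest.

```python
# A minimal sketch (illustrative values, not the study's data) of the kind of
# comparison behind the reported quadratic trend: mean negative sentiment at
# the ideological extremes versus the moderates.
from statistics import mean

# Hypothetical per-tweet negative-sentiment scores, bucketed by estimated ideology.
tweets = {
    "liberal":      [0.62, 0.55, 0.58],
    "moderate":     [0.31, 0.28, 0.35],
    "conservative": [0.71, 0.66, 0.69],
}

means = {group: mean(scores) for group, scores in tweets.items()}

# The U-shaped pattern: both extremes above the moderates, with
# conservatives the most negative, matching the abstract's finding.
u_shaped = means["liberal"] > means["moderate"] < means["conservative"]
print(means["moderate"] < means["liberal"] < means["conservative"])  # True
```

In the actual study this pattern was established by fitting a quadratic trend over a continuous ideology estimate rather than three coarse buckets.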
  2. Using GUI-based workflows for data analysis is an iterative process. During each iteration, an analyst makes changes to the workflow to improve it, generating a new version each time. The results produced by executing these versions are materialized to help users refer to them in the future. In many cases, a new version of the workflow, when submitted for execution, produces a result equivalent to that of a previous one. Identifying such equivalence can save computational resources and time by reusing the materialized result. One way to optimize the performance of executing a new version is to compare the current version with a previous one and test whether they produce the same results using a workflow version equivalence verifier. As the number of versions grows, this testing can become a computational bottleneck. In this paper, we present Raven, an optimization framework to accelerate the execution of a new version request by detecting and reusing the results of previous equivalent versions with the help of a version equivalence verifier. Raven ranks and prunes the set of prior versions to quickly identify those that may produce a result equivalent to the requested version. Additionally, when the verifier performs computation to verify the equivalence of a version pair, there may be a significant overlap with previously tested version pairs. Raven identifies and avoids such repeated computations by extending the verifier to reuse previous knowledge of equivalence tests. We evaluated the effectiveness of Raven compared to baselines on real workflows and datasets.
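The reuse idea above can be sketched in a few lines. This is a hedged toy, not Raven's implementation: the "verifier" here is just a canonical-form comparison, and all names (`normalize`, `execute`, `results_store`) are illustrative. The two ingredients it shows are (1) consulting prior versions' materialized results before executing, and (2) caching verified pairs so no equivalence test is repeated.

```python
# Toy sketch of result reuse via an equivalence verifier (assumed names,
# not Raven's API). Workflows are lists of operator names; "equivalence"
# is stand-in canonicalization; lru_cache memoizes verified version pairs.
from functools import lru_cache

results_store = {}   # version id -> materialized result
versions = {}        # version id -> workflow definition

def normalize(workflow):
    # Stand-in for a real verifier's canonical form.
    return tuple(sorted(workflow))

@lru_cache(maxsize=None)
def equivalent(vid_a, vid_b):
    # Cached, so each version pair is verified at most once.
    return normalize(versions[vid_a]) == normalize(versions[vid_b])

def execute(vid, workflow, run):
    versions[vid] = workflow
    for prior in results_store:
        if equivalent(prior, vid):
            return results_store[prior]       # reuse, skip execution
    results_store[vid] = run(workflow)        # actually execute
    return results_store[vid]

runs = []
def run(wf):
    runs.append(wf)
    return sorted(wf)

execute("v1", ["filter", "scan"], run)
execute("v2", ["scan", "filter"], run)   # equivalent to v1: reused, not re-run
print(len(runs))  # 1
```

Raven's contribution is precisely what this sketch omits: ranking and pruning the prior versions so the verifier is invoked on few candidates, and sharing partial verification work across pairs.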
  3. Data analytics using workflows is an iterative process, in which an analyst makes many iterations of changes, such as additions, deletions, and alterations of operators and their links. In many cases, the analyst wants to compare these workflow versions and their execution results to help in deciding the next iterations of changes. Moreover, the analyst needs to know which versions produced undesired results to avoid refining the workflow in those versions. To enable the analyst to get an overview of the workflow versions and their results, we introduce Drove, a framework that manages the end-to-end lifecycle of constructing, refining, and executing workflows on large data sets and provides a dashboard to monitor these execution results. In many cases, the result of an execution is the same as the result of a prior execution. Identifying such equivalence between the execution results of different workflow versions is important for two reasons. First, it can help us reduce the storage cost of the results by storing equivalent results only once. Second, stored results of early executions can be reused for future executions with the same results. Existing tools that track such executions are geared towards small-scale data and lack the means to reuse existing results in future executions. In Drove, we reason about the semantic equivalence of the workflow versions to reduce the storage space and reuse the materialized results.
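The "store equivalent results only once" idea can be sketched with content-addressed storage. This is an assumption-laden toy, not Drove's design: Drove reasons about semantic equivalence of workflow versions, whereas this sketch only deduplicates by hashing the materialized results themselves; all class and method names are hypothetical.

```python
# Hedged sketch: deduplicated result storage keyed by a content hash.
# (Illustrative names; Drove itself reasons about version equivalence
# rather than hashing raw results.)
import hashlib
import json

class ResultStore:
    def __init__(self):
        self._blobs = {}     # content hash -> result payload
        self._index = {}     # version id -> content hash

    def put(self, version_id, result):
        key = hashlib.sha256(
            json.dumps(result, sort_keys=True).encode()
        ).hexdigest()
        self._blobs.setdefault(key, result)   # identical results share one copy
        self._index[version_id] = key

    def get(self, version_id):
        return self._blobs[self._index[version_id]]

    def unique_copies(self):
        return len(self._blobs)

store = ResultStore()
store.put("v1", {"rows": [1, 2, 3]})
store.put("v2", {"rows": [1, 2, 3]})   # same result: stored once
print(store.unique_copies())  # 1
```

Hashing results requires producing them first; reasoning about version equivalence, as Drove does, can additionally skip the execution that would produce the duplicate.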
  4. We will demonstrate a prototype query-processing engine that utilizes correlations among predicates to accelerate machine learning (ML) inference queries on unstructured data. Expensive operators such as feature extractors and classifiers are deployed as user-defined functions (UDFs), which are not penetrable by classic query optimization techniques such as predicate push-down. Recent optimization schemes (e.g., Probabilistic Predicates, or PP) build a cheap proxy model for each predicate offline and inject these proxy models ahead of the expensive ML UDFs, assuming the predicates are independent. Input records that do not satisfy query predicates are filtered early by the proxy models to bypass the ML UDFs. However, enforcing the independence assumption may result in sub-optimal plans. We use correlative proxy models to better exploit predicate correlations and accelerate ML queries. We will demonstrate our query optimizer, called CORE, which builds proxy models online, allocates parameters to each model, and reorders them. We will also show end-to-end query processing with and without proxy models.
  5. Collaborative data analytics is becoming increasingly important due to the higher complexity of data science, more diverse skills from different disciplines, more common asynchronous schedules of team members, and the global trend of working remotely. In this demo we will show how Texera supports this emerging computing paradigm to achieve high productivity among collaborators with various backgrounds. Based on our active joint projects on the system, we use a scenario of social media analysis to show how a data science task can be conducted on a user-friendly yet powerful platform by a multi-disciplinary team including domain scientists with limited coding skills and experienced machine learning experts. We will present how to do collaborative editing of a workflow and collaborative execution of the workflow in Texera. We will focus on data-centric features such as synchronization of operator schemas among the users during the construction phase, and monitoring and controlling the shared runtime during the execution phase.
  6. We consider accelerating machine learning (ML) inference queries on unstructured datasets. Expensive operators such as feature extractors and classifiers are deployed as user-defined functions (UDFs), which are not penetrable with classic query optimization techniques such as predicate push-down. Recent optimization schemes (e.g., Probabilistic Predicates, or PP) assume independence among the query predicates, build a proxy model for each predicate offline, and rewrite a new query by injecting these cheap proxy models in front of the expensive ML UDFs. In such a manner, unlikely inputs that do not satisfy query predicates are filtered early to bypass the ML UDFs. We show that enforcing the independence assumption in this context may result in sub-optimal plans. In this paper, we propose CORE, a query optimizer that better exploits the predicate correlations and accelerates ML inference queries. Our solution builds the proxy models online for a new query and leverages a branch-and-bound search process to reduce the building costs. Results on three real-world text, image, and video datasets show that CORE improves the query throughput by up to 63% compared to PP and up to 80% compared to running the queries as-is.
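The proxy-model prefiltering that PP and CORE build on can be sketched simply. This is a schematic toy under assumed names, not either system's implementation: the proxies here are plain score/threshold pairs, and the correlation-aware ordering, online model building, and branch-and-bound search that distinguish CORE are abstracted away.

```python
# Simplified sketch of proxy-model prefiltering for ML inference queries:
# cheap proxies run first and discard records unlikely to satisfy the
# predicates, so the expensive UDF runs on fewer inputs. (Illustrative
# names; thresholds and model ordering are what PP/CORE actually optimize.)
def run_query(records, proxies, expensive_udf):
    """proxies: list of (score_fn, threshold); a record reaches the
    expensive UDF only if every proxy scores it at or above its threshold."""
    out = []
    for r in records:
        if all(score(r) >= t for score, t in proxies):
            if expensive_udf(r):              # ground-truth predicate
                out.append(r)
    return out

records = list(range(10))
expensive_calls = []
def udf(r):
    expensive_calls.append(r)   # count how often the costly model runs
    return r >= 8

proxies = [(lambda r: r, 5)]    # toy proxy: keep records scoring >= 5
print(run_query(records, proxies, udf), len(expensive_calls))  # [8, 9] 5
```

Note the trade-off the sketch hides: an aggressive proxy threshold can drop records the UDF would have accepted, which is why PP and CORE calibrate proxies against an accuracy target rather than picking thresholds by hand.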
  7. While it has been scientifically proven that the COVID-19 vaccine is a safe and effective measure to reduce the severity of infection and curb the spread of the SARS-CoV-2 virus, skepticism remains widespread, and in many countries vaccine mandates have been met with strong opposition. In this study, we applied machine learning-based analyses to U.S.-based tweets covering the periods leading up to and following the Biden Administration’s announcement of federal vaccine mandates, supplemented by a qualitative content analysis of a random sample of relevant tweets. The objective was to examine the beliefs held among Twitter users toward vaccine mandates, as well as the evidence they used to support their positions. The results show that while approximately 30% of the Twitter users included in the dataset supported the measure, more users expressed differing opinions. Concerns raised included questions about the political motives behind the mandates, the infringement of personal liberties, and ineffectiveness in preventing infection.
  8. Introduction Twitter represents a mainstream news source for the American public, offering a valuable vehicle for learning how citizens make sense of pandemic health threats like Covid-19. Masking as a risk mitigation measure became controversial in the US. The social amplification risk framework offers insight into how a risk event interacts with psychological, social, institutional, and cultural communication processes to shape Covid-19 risk perception. Methods Qualitative content analysis was conducted on 7,024 mask tweets reflecting 6,286 users between January 24 and July 7, 2020, to identify how citizens expressed Covid-19 risk perception over time. Descriptive statistics were computed for (a) proportion of tweets using hyperlinks, (b) mentions, (c) hashtags, (d) questions, and (e) location. Results Six themes emerged regarding how mask tweets amplified and attenuated Covid-19 risk: (a) severity perceptions (18.0%) steadily increased across 5 months; (b) mask effectiveness debates (10.7%) persisted; (c) who is at risk (26.4%) peaked in April and May 2020; (d) mask guidelines (15.6%) peaked April 3, 2020, with federal guidelines; (e) political legitimizing of Covid-19 risk (18.3%) steadily increased; and (f) mask behavior of others (31.6%) composed the largest discussion category and increased over time. Of tweets, 45% contained a hyperlink, 40% contained mentions, 33% contained hashtags, and 16.5% were expressed as a question. Conclusions Users ascribed many meanings to mask wearing in the social media information environment, revealing that COVID-19 risk was expressed in a more expanded range than objective risk. The simultaneous amplification and attenuation of COVID-19 risk perception on social media complicates public health messaging about mask wearing.