Stance detection is typically framed as predicting the sentiment in a given text towards a target entity. However, this setup overlooks the importance of the source entity, i.e., who is expressing the opinion. In this paper, we emphasize the imperative need for studying interactions among entities when inferring stances. We first introduce a new task, entity-to-entity (E2E) stance detection, which primes models to identify entities in their canonical names and discern stances jointly. To support this study, we curate a new dataset with 10,641 annotations labeled at the sentence level from news articles of different ideological leanings. We present a novel generative framework to allow the generation of canonical names for entities as well as stances among them. We further enhance the model with a graph encoder to summarize entity activities and external knowledge surrounding the entities. Experiments show that our model outperforms strong comparisons by large margins. Further analyses demonstrate the usefulness of E2E stance detection for understanding media quotation and stance landscape as well as inferring entity ideology.
more »
« less
Who Blames or Endorses Whom? Entity-to-Entity Directed Sentiment Extraction in News Text
Understanding who blames or supports whom in news text is a critical research question in computational social science. Traditional methods and datasets for sentiment analysis are, however, not suitable for the domain of political text as they do not consider the direction of sentiments expressed between entities. In this paper, we propose a novel NLP task of identifying directed sentiment relationship between political entities from a given news document, which we call directed sentiment extraction. From a million-scale news corpus, we construct a dataset of news sentences where sentiment relations of political entities are manually annotated. We present a simple but effective approach for utilizing a pretrained transformer, which infers the target class by predicting multiple question-answering tasks and combining the outcomes. We demonstrate the utility of our proposed method for social science research questions by analyzing positive and negative opinions between political entities in two major events: 2016 U.S. presidential election and COVID-19. The newly proposed problem, data, and method will facilitate future studies on interdisciplinary NLP methods and applications.
more »
« less
- Award ID(s):
- 1831848
- NSF-PAR ID:
- 10299078
- Date Published:
- Journal Name:
- Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021,
- Page Range / eLocation ID:
- 4091-4102
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Language models (LMs) are pretrained on diverse data sources—news, discussion forums, books, online encyclopedias. A significant portion of this data includes facts and opinions which, on one hand, celebrate democracy and diversity of ideas, and on the other hand are inherently socially biased. Our work develops new methods to (1) measure media biases in LMs trained on such corpora, along social and economic axes, and (2) measure the fairness of downstream NLP models trained on top of politically biased LMs. We focus on hate speech and misinformation detection, aiming to empirically quantify the effects of political (social, economic) biases in pretraining data on the fairness of high-stakes social-oriented tasks. Our findings reveal that pretrained LMs do have political leanings which reinforce the polarization present in pretraining corpora, propagating social biases into hate speech predictions and media biases into misinformation detectors. We discuss the implications of our findings for NLP research and propose future directions to mitigate unfairness.more » « less
-
Mining textual patterns in news, tweets, papers, and many other kinds of text corpora has been an active theme in text mining and NLP research. Previous studies adopt a dependency parsing-based pattern discovery approach. However, the parsing results lose rich context around entities in the patterns, and the process is costly for a corpus of large scale. In this study, we propose a novel typed textual pattern structure, called meta pattern, which is extended to a frequent, informative, and precise subsequence pattern in certain context. We propose an efficient framework, called MetaPAD, which discovers meta patterns from massive corpora with three techniques: (1) it develops a context-aware segmentation method to carefully determine the boundaries of patterns with a learnt pattern quality assessment function, which avoids costly dependency parsing and generates high-quality patterns; (2) it identifies and groups synonymous meta patterns from multiple facets—their types, contexts, and extractions; and (3) it examines type distributions of entities in the instances extracted by each group of patterns, and looks for appropriate type levels to make discovered patterns precise. Experiments demonstrate that our proposed framework discovers high-quality typed textual patterns efficiently from different genres of massive corpora and facilitates information extraction.more » « less
-
CoMe-KE: A New Transformers Based Approach for Knowledge Extraction in Conflict and Mediation DomainKnowledge discovery and extraction approaches attract special attention across industries and areas moving toward the 5V Era. In the political and social sciences, scholars and governments dedicate considerable resources to develop intelligent systems for monitoring, analyzing and predicting conflicts and affairs involving political entities across the globe. Such systems rely on background knowledge from external knowledge bases, that conflict experts commonly maintain manually. The high costs and extensive human efforts associated with updating and extending these repositories often compromise their correctness of. Here we introduce CoMe-KE (Conflict and Mediation Knowledge Extractor) to extend automatically knowledge bases about conflict and mediation events. We explore state-of-the-art natural language models to discover new political entities, their roles and status from news. We propose a distant supervised method and propose an innovative zero-shot approach based on a dynamic hypothesis procedure. Our methods leverage pre-trained models through transfer learning techniques to obtain excellent results with no need for a labeled data. Finally, we demonstrate the superiority of our method through a comprehensive set of experiments involving two study cases in the social sciences domain. CoMe-KE significantly outperforms the existing baseline, with (on average) double of the performance retrieving new political entities.more » « less
-
We present an analytic study on the language of news media in the context of political fact-checking and fake news detection. We compare the language of real news with that of satire, hoaxes, and propaganda to find linguistic characteristics of untrustworthy text. To probe the feasibility of automatic political fact-checking, we also present a case study based on PolitiFact.com using their factuality judgments on a 6-point scale. Experiments show that while media fact-checking remains to be an open research question, stylistic cues can help determine the truthfulness of text.more » « less