Title: Unsupervised Explainable Controversy Detection from Online News
Alerting users that a web page is controversial has been proposed as one method to support critical thinking about text and discourse. We propose an approach to discover controversial topics in a generic document using unsupervised training. Our approach comprises iterative training of a controversy classifier using a disagreement signal within comments and explaining the controversy of the document by generating a topic phrase describing it. Experiments show the effectiveness of our proposed training method using an EM algorithm. When controversial topic extraction is restricted to quality phrases and incorporates TextRank signals, it outperforms several baseline approaches.
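The abstract mentions TextRank signals for choosing a topic phrase. As a rough illustration only, the sketch below runs the standard word-level TextRank idea (PageRank over a co-occurrence graph) on a toy token list and then ranks candidate phrases. It is not the authors' implementation: the function names `textrank` and `score_phrase`, the window size, the damping value, the averaging of word scores into a phrase score, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def textrank(tokens, window=3, damping=0.85, iterations=50):
    """Score words by PageRank over an undirected co-occurrence graph."""
    # Link every pair of distinct words that appear within `window` tokens.
    neighbors = defaultdict(set)
    for start in range(len(tokens)):
        span = tokens[start:start + window]
        for a, b in combinations(span, 2):
            if a != b:
                neighbors[a].add(b)
                neighbors[b].add(a)

    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word in neighbors:
            # Each neighbor splits its score equally among its own neighbors.
            incoming = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            updated[word] = (1.0 - damping) + damping * incoming
        scores = updated
    return scores


def score_phrase(phrase, scores):
    """Average word score of a candidate phrase (hypothetical aggregation)."""
    words = phrase.split()
    return sum(scores.get(w, 0.0) for w in words) / len(words)


# Toy input (assumed): tokens from a document and two candidate phrases.
tokens = "gun control law debate gun control vote senate gun law".split()
scores = textrank(tokens)
candidates = ["gun control", "senate vote"]
best = max(candidates, key=lambda p: score_phrase(p, scores))
```

In this toy graph the words "gun" and "control" have the most co-occurrence links, so the phrase built from them ranks first. A real system would presumably draw its candidates from a quality-phrase miner and combine this score with the classifier's controversy signal, neither of which is shown here.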
Award ID(s):
1813662
PAR ID:
10110147
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the European Conference on Information Retrieval
Page Range / eLocation ID:
836-843
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We collect a corpus of 1554 online news articles from 23 RSS feeds and analyze it in terms of controversy and sentiment. We use several existing sentiment lexicons and lists of controversial terms to perform a number of statistical analyses that explore how sentiment and controversy are related. We conclude that negative sentiment and controversy are not necessarily positively correlated, as has been claimed in the past. In addition, we apply an information-theoretic approach and suggest that entropy might be a good predictor of controversy.
  2. Existing topic modeling and text segmentation methodologies generally require large datasets for training, limiting their capabilities when only small collections of text are available. In this work, we reexamine the inter-related problems of “topic identification” and “text segmentation” for sparse document learning, when there is a single new text of interest. In developing a methodology to handle single documents, we face two major challenges. First is sparse information: with access to only one document, we cannot train traditional topic models or deep learning algorithms. Second is significant noise: a considerable portion of words in any single document will produce only noise and not help discern topics or segments. To tackle these issues, we design an unsupervised, computationally efficient methodology called Biclustering Approach to Topic modeling and Segmentation (BATS). BATS leverages three key ideas to simultaneously identify topics and segment text: (i) a new mechanism that uses word order information to reduce sample complexity, (ii) a statistically sound graph-based biclustering technique that identifies latent structures of words and sentences, and (iii) a collection of effective heuristics that remove noise words and award important words to further improve performance. Experiments on six datasets show that our approach outperforms several state-of-the-art baselines when considering topic coherence, topic diversity, segmentation, and runtime comparison metrics.
  3. In this paper, we propose a deep learning approach to automatic summarization that incorporates topic information into the convolutional sequence-to-sequence (ConvS2S) model and uses self-critical sequence training (SCST) for optimization. Through jointly attending to topics and word-level alignment, our approach can improve coherence, diversity, and informativeness of generated summaries via a biased probability generation mechanism. In addition, reinforcement training such as SCST directly optimizes the proposed model with respect to the non-differentiable metric ROUGE, which also avoids exposure bias during inference. We carry out the experimental evaluation with state-of-the-art methods over the Gigaword, DUC-2004, and LCSTS datasets. The empirical results demonstrate the superiority of our proposed method in abstractive summarization.
  4. Bioethics is an important aspect of understanding the relationship between science and society, but studies have not yet examined undergraduate student experiences and comfort in bioethics courses. In this study, we investigated undergraduate bioethics students’ support of and comfort when learning three controversial bioethics topics: gene editing, abortion, and physician-assisted suicide (PAS). Furthermore, student identity has been shown to influence how students perceive and learn about controversial topics at the intersection of science and society. So, we explored how students’ religious affiliation, gender, or political affiliation was associated with their support of and comfort when learning about gene editing, abortion, and PAS. We found that most students entered bioethics with moderated viewpoints on controversial topics but that there were differences in students’ tendency to support each topic based on their gender, religion, and political affiliation. We also saw differences in student comfort levels based on identity: women reported lower comfort than men when learning about gene editing, religious students were less comfortable than nonreligious students when learning about abortion and PAS, and nonliberal students were less comfortable than liberal students when learning about abortion. Students cited that the controversy surrounding these topics and a personal hesitancy to discuss them caused discomfort. These findings indicate that identity impacts comfort and support in a way similar to that previously shown in the public. Thus, it may be important for instructors to consider student identity when teaching bioethics topics to maximize student comfort, ultimately encouraging thoughtful consideration and engagement with these topics. 
  5. Neural topic modeling is a scalable automated technique for text data mining. In various downstream tasks of topic modeling, it is preferred that the discovered topics align well with labels. However, due to the lack of guidance from labels, unsupervised neural topic models are less powerful in this situation. Existing supervised neural topic models often adopt a label-free prior to generate the latent document-topic distributions and use them to predict the labels, and thus achieve label-topic alignment indirectly. Such a mechanism faces the following issues: 1) the label-free prior leads to topics blending the latent patterns of multiple labels; and 2) one is unable to intuitively identify the explicit relationships between labels and the discovered topics. To tackle these problems, we develop a novel supervised neural topic model which utilizes a chain-structured graphical model with a label-conditioned prior. Soft indicators are introduced to explicitly construct the label-topic relationships. To obtain well-organized label-topic relationships, we formalize an entropy-regularized optimal transport problem on the embedding space and model them as the transport plan. Moreover, our proposed method can be flexibly integrated with most existing unsupervised neural topic models. Experimental results on multiple datasets demonstrate that our model can greatly enhance the alignment between labels and topics while maintaining good topic quality.