Title: Unsupervised Explainable Controversy Detection from Online News
Alerting users that a web page is controversial has been proposed as one method to support critical thinking about text and discourse. We propose an approach to discover controversial topics in a generic document using unsupervised training. Our approach comprises iterative training of a controversy classifier using a disagreement signal within comments and explaining the controversy of the document by generating a topic phrase describing it. Experiments show the effectiveness of our proposed training method using an EM algorithm. When controversial topic extraction is restricted to quality phrases and incorporates TextRank signals, it outperforms several baseline approaches.
Award ID(s):
1813662
PAR ID:
10110147
Journal Name:
Proceedings of the European Conference on Information Retrieval
Page Range / eLocation ID:
836-843
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We collect a corpus of 1554 online news articles from 23 RSS feeds and analyze it in terms of controversy and sentiment. We use several existing sentiment lexicons and lists of controversial terms to perform a number of statistical analyses that explore how sentiment and controversy are related. We conclude that negative sentiment and controversy are not necessarily positively correlated, as has been claimed in the past. In addition, we apply an information-theoretic approach and suggest that entropy might be a good predictor of controversy.
  2. Existing topic modeling and text segmentation methodologies generally require large datasets for training, limiting their capabilities when only small collections of text are available. In this work, we reexamine the inter-related problems of “topic identification” and “text segmentation” for sparse document learning, when there is a single new text of interest. In developing a methodology to handle single documents, we face two major challenges. First is sparse information: with access to only one document, we cannot train traditional topic models or deep learning algorithms. Second is significant noise: a considerable portion of words in any single document will produce only noise and not help discern topics or segments. To tackle these issues, we design an unsupervised, computationally efficient methodology called Biclustering Approach to Topic modeling and Segmentation (BATS). BATS leverages three key ideas to simultaneously identify topics and segment text: (i) a new mechanism that uses word order information to reduce sample complexity, (ii) a statistically sound graph-based biclustering technique that identifies latent structures of words and sentences, and (iii) a collection of effective heuristics that remove noise words and award important words to further improve performance. Experiments on six datasets show that our approach outperforms several state-of-the-art baselines when considering topic coherence, topic diversity, segmentation, and runtime comparison metrics.
  3. In this paper, we propose a deep learning approach to tackle the automatic summarization tasks by incorporating topic information into the convolutional sequence-to-sequence (ConvS2S) model and using self-critical sequence training (SCST) for optimization. Through jointly attending to topics and word-level alignment, our approach can improve coherence, diversity, and informativeness of generated summaries via a biased probability generation mechanism. On the other hand, reinforcement training, like SCST, directly optimizes the proposed model with respect to the non-differentiable metric ROUGE, which also avoids the exposure bias during inference. We carry out the experimental evaluation with state-of-the-art methods over the Gigaword, DUC-2004, and LCSTS datasets. The empirical results demonstrate the superiority of our proposed method in the abstractive summarization. 
  4. Bioethics is an important aspect of understanding the relationship between science and society, but studies have not yet examined undergraduate student experiences and comfort in bioethics courses. In this study, we investigated undergraduate bioethics students’ support of and comfort when learning three controversial bioethics topics: gene editing, abortion, and physician-assisted suicide (PAS). Furthermore, student identity has been shown to influence how students perceive and learn about controversial topics at the intersection of science and society. So, we explored how students’ religious affiliation, gender, or political affiliation was associated with their support of and comfort when learning about gene editing, abortion, and PAS. We found that most students entered bioethics with moderated viewpoints on controversial topics but that there were differences in students’ tendency to support each topic based on their gender, religion, and political affiliation. We also saw differences in student comfort levels based on identity: women reported lower comfort than men when learning about gene editing, religious students were less comfortable than nonreligious students when learning about abortion and PAS, and nonliberal students were less comfortable than liberal students when learning about abortion. Students cited that the controversy surrounding these topics and a personal hesitancy to discuss them caused discomfort. These findings indicate that identity impacts comfort and support in a way similar to that previously shown in the public. Thus, it may be important for instructors to consider student identity when teaching bioethics topics to maximize student comfort, ultimately encouraging thoughtful consideration and engagement with these topics. 
  5. Krause, Andreas; Brunskill, Emma; Cho, Kyunghyun; Engelhardt, Barbara; Sabato, Sivan; Scarlett, Jonathan (Ed.)
    Differentiable Search Index is a recently proposed paradigm for document retrieval that encodes information about a corpus of documents within the parameters of a neural network and directly maps queries to corresponding documents. These models have achieved state-of-the-art performance for document retrieval across many benchmarks, but they have a significant limitation: it is not easy to add new documents after a model is trained. We propose IncDSI, a method to add documents in real time (about 20-50ms per document), without retraining the model on the entire dataset (or even parts thereof). Instead, we formulate the addition of documents as a constrained optimization problem that makes minimal changes to the network parameters. Although orders of magnitude faster, our approach is competitive with retraining the model on the whole dataset and enables the development of document retrieval systems that can be updated with new information in real time. Our code for IncDSI is available at https://github.com/varshakishore/IncDSI.