Title: “Nice Try, Kiddo”: Investigating Ad Hominems in Dialogue Responses
Ad hominem attacks are those that target some feature of a person’s character instead of the position the person is maintaining. These attacks are harmful because they propagate implicit biases and diminish a person’s credibility. Since dialogue systems respond directly to user input, it is important to study ad hominems in dialogue responses. To this end, we propose categories of ad hominems, compose an annotated dataset, and build a classifier to analyze human and dialogue system responses to English Twitter posts. We specifically compare responses to Twitter topics about marginalized communities (#BlackLivesMatter, #MeToo) versus other topics (#Vegan, #WFH), because the abusive language of ad hominems could further amplify the skew of power away from marginalized populations. Furthermore, we propose a constrained decoding technique that uses salient n-gram similarity as a soft constraint for top-k sampling to reduce the number of ad hominems generated. Our results indicate that 1) responses from both humans and DialoGPT contain more ad hominems for discussions around marginalized communities, 2) different quantities of ad hominems in the training data can influence the likelihood of generating ad hominems, and 3) we can use constrained decoding techniques to reduce ad hominems in generated dialogue responses.
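The constrained decoding idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the scoring function `next_scores`, the Jaccard token overlap used as the similarity measure, the resampling loop, and all parameter names are assumptions made for illustration. The sketch samples each token from the top k candidates and resamples when the trailing n-gram looks too similar to a list of salient n-grams; after a fixed number of retries the candidate is accepted anyway, which is what makes the constraint soft.

```python
import random


def top_k_sample(scores, k, rng):
    """Sample one token from the k highest-scoring entries of `scores`.

    `scores` maps token -> positive weight; weights of the top k are
    renormalized implicitly by random.choices.
    """
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens = [tok for tok, _ in top]
    weights = [w for _, w in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


def ngram_similarity(a, b):
    """Jaccard overlap of two token tuples (an assumed stand-in measure)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def violates(tokens, salient_ngrams, n, threshold):
    """True if the trailing n-gram is at least `threshold`-similar to a salient one."""
    if len(tokens) < n:
        return False
    tail = tuple(tokens[-n:])
    return any(ngram_similarity(tail, s) >= threshold for s in salient_ngrams)


def constrained_generate(next_scores, salient_ngrams, length, k=3, n=2,
                         threshold=1.0, max_retries=5, seed=0):
    """Top-k sampling with a soft n-gram constraint (hypothetical helper)."""
    rng = random.Random(seed)
    out = []
    for _ in range(length):
        scores = next_scores(out)
        choice = top_k_sample(scores, k, rng)
        for _ in range(max_retries):
            if not violates(out + [choice], salient_ngrams, n, threshold):
                break
            choice = top_k_sample(scores, k, rng)  # resample a new candidate
        # Soft constraint: if retries run out, the last candidate is kept.
        out.append(choice)
    return out


def toy_scores(prefix):
    """Toy stand-in for a language model: uniform weights over four tokens."""
    return {"you": 1.0, "idiot": 1.0, "i": 1.0, "agree": 1.0}


if __name__ == "__main__":
    salient = [("you", "idiot")]
    print(constrained_generate(toy_scores, salient, length=10, k=4,
                               max_retries=50))
```

In a real system the toy scorer would be replaced by a dialogue model's next-token distribution, and the similarity measure and salient n-gram list would come from annotated data.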
Award ID(s):
1927554
PAR ID:
10294388
Journal Name:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Page Range / eLocation ID:
750 to 767
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. While research has been conducted with and in marginalized or vulnerable groups, explicit guidelines and best practices centering on specific communities are nascent. An excellent case study to engage within this aspect of research is Black Twitter. This research project considers the history of research with Black communities, combined with empirical work that explores how people who engage with Black Twitter think about research and researchers in order to suggest potential good practices and what researchers should know when studying Black Twitter or other digital traces from marginalized or vulnerable online communities. From our interviews, we gleaned that Black Twitter users feel differently about their content contributing to a research study depending on, for example, the type of content and the positionality of the researcher. Much of the advice participants shared for researchers involved an encouragement to cultivate cultural competency, get to know the community before researching it, and conduct research transparently. Aiming to improve the experience of research for both Black Twitter and researchers, this project is a stepping stone toward future work that further establishes and expands user perceptions of research ethics for online communities composed of vulnerable populations. 
  2. The proliferation of Internet-enabled smartphones has ushered in an era where events are reported on social media websites such as Twitter and Facebook. However, the short text nature of social media posts, combined with a large volume of noise present in such datasets makes event detection challenging. This problem can be alleviated by using other sources of information, such as news articles, that employ a precise and factual vocabulary, and are more descriptive in nature. In this paper, we propose Spatio-Temporal Event Detection (STED), a probabilistic model to discover events, their associated topics, time of occurrence, and the geospatial distribution from multiple data sources, such as news and Twitter. The joint modeling of news and Twitter enables our model to distinguish events from other noisy topics present in Twitter data. Furthermore, the presence of geocoordinates and timestamps in tweets helps find the spatio-temporal distribution of the events. We evaluate our model on a large corpus of Twitter and news data, and our experimental results show that STED can effectively discover events, and outperforms state-of-the-art techniques. 
  3. In the U.S., navigating STEM with marginalized identities can affect scientists' communication practices. There is a critical need for science communication training that accounts for the historical oppressions, discriminations, and inequities of marginalized communities. In this paper we analyzed 712 participant responses from ReclaimingSTEM science communication workshops to understand how marginalized scientists' identities influence their science communication practices. We found that participants' experiences of exclusion and hostility in STEM spaces influenced their engagement in science communication. Scientists from marginalized backgrounds aim to change the culture of STEM through their communication efforts to promote a sense of belonging for their communities. 
  4. Researchers using social media data want to understand the discussions occurring in and about their respective fields. These domain experts often turn to topic models to help them see the entire landscape of the conversation, but unsupervised topic models often produce topic sets that miss topics experts expect or want to see. To solve this problem, we propose Guided Topic-Noise Model (GTM), a semi-supervised topic model designed with large domain-specific social media data sets in mind. The input to GTM is a set of topics that are of interest to the user and a small number of words or phrases that belong to those topics. These seed topics are used to guide the topic generation process, and can be augmented interactively, expanding the seed word list as the model provides new relevant words for different topics. GTM uses a novel initialization and a new sampling algorithm called Generalized Polya Urn (GPU) seed word sampling to produce a topic set that includes expanded seed topics, as well as new unsupervised topics. We demonstrate the robustness of GTM on open-ended responses from a public opinion survey and four domain-specific Twitter data sets. 
  5. When natural disasters occur, various organizations and agencies turn to social media to understand who needs help and how they have been affected. The purpose of this study is twofold: first, to evaluate whether hurricane-related tweets have some consistency over time, and second, whether Twitter-derived content is thematically similar to other private social media data. Through a unique method of using Twitter data gathered from six different hurricanes, alongside private data collected from qualitative interviews conducted in the immediate aftermath of Hurricane Harvey, we hypothesize that there is some level of stability across hurricane-related tweet content over time that could be used for better real-time processing of social media data during natural disasters. We use latent Dirichlet allocation (LDA) to derive topics, and, using Hellinger distance as a metric, find that there is a detectable connection among hurricane topics. By uncovering some persistent thematic areas and topics in disaster-related tweets, we hope these findings can help first responders and government agencies discover urgent content in tweets more quickly and reduce the amount of human intervention needed. 
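For readers unfamiliar with the Hellinger distance mentioned in item 5, the following minimal sketch computes it for two discrete probability distributions, such as two topics' word distributions. The function name and the example distributions are illustrative assumptions, not taken from that study.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length.

    H(p, q) = sqrt(sum((sqrt(p_i) - sqrt(q_i))^2)) / sqrt(2), which lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total) / math.sqrt(2)


if __name__ == "__main__":
    # Two made-up topic-word distributions over a three-word vocabulary.
    topic_a = [0.7, 0.2, 0.1]
    topic_b = [0.6, 0.3, 0.1]
    print(hellinger(topic_a, topic_b))
```

A distance of 0 means the two distributions are identical, and a distance of 1 means they share no support, so small values between topics from different hurricanes would indicate thematic overlap.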