Online antisocial behavior, such as cyberbullying, harassment, and trolling, is a widespread problem that threatens free discussion and has negative physical and mental health consequences for victims and communities. While prior work has proposed automated methods to identify hostile comments in online discussions, these methods work retrospectively on comments that have already been posted, making it difficult to intervene before an interaction escalates. In this paper we instead consider the problem of forecasting future hostilities in online discussions, which we decompose into two tasks: (1) given an initial sequence of non-hostile comments in a discussion, predict whether some future comment will contain hostility; and (2) given the first hostile comment in a discussion, predict whether this will lead to an escalation of hostility in subsequent comments. Thus, we aim to forecast both the presence and intensity of hostile comments based on linguistic and social features from earlier comments. To evaluate our approach, we introduce a corpus of over 30K annotated Instagram comments from over 1,100 posts. Our approach is able to predict the appearance of a hostile comment on an Instagram post ten or more hours in the future with an AUC of .82 (task 1), and can furthermore distinguish between high and low levels of future hostility with an AUC of .91 (task 2).
Something’s Brewing! Early Prediction of Controversy-causing Posts from Discussion Features
Controversial posts are those that split the preferences of a community, receiving both significant positive and significant negative feedback. Our inclusion of the word “community” here is deliberate: what is controversial to some audiences may not be so to others. Using data from several different communities on reddit.com, we predict the ultimate controversiality of posts, leveraging features drawn from both the textual content and the tree structure of the early comments that initiate the discussion. We find that even when only a handful of comments are available, e.g., the first 5 comments made within 15 minutes of the original post, discussion features often add predictive capacity to strong content-and-rate-only baselines. Additional experiments on domain transfer suggest that conversation-structure features often generalize to other communities better than conversation-content features do.
- Award ID(s): 1741441
- PAR ID: 10113281
- Date Published:
- Journal Name: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
- Volume: 1
- Page Range / eLocation ID: 1648 to 1659
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Online discussion platforms provide a forum to strengthen and propagate belief in misinformed conspiracy theories. Yet, they also offer avenues for conspiracy theorists to express their doubts and experiences of cognitive dissonance. Such expressions of dissonance may shed light on who abandons misguided beliefs and under what circumstances. This paper characterizes self-disclosures of dissonance about QAnon (a conspiracy theory initiated by a mysterious leader, "Q", and popularized by their followers, "anons") in conspiratorial subreddits. To understand what dissonance and disbelief mean within conspiracy communities, we first characterize their social imaginaries: a broad understanding of how people collectively imagine their social existence. Focusing on 2K posts from two image boards, 4chan and 8chan, and 1.2M comments and posts from 12 subreddits dedicated to QAnon, we adopt a mixed-methods approach to uncover the symbolic language representing the movement, expectations, practices, heroes, and foes of the QAnon community. We use these social imaginaries to create a computational framework for distinguishing expressions of belief and dissonance from general discussion about QAnon in the 1.2M comments. We investigate the dissonant comments to characterize the dissonance expressed along QAnon social imaginaries. Further, analyzing user engagement with QAnon conspiracy subreddits, we find that self-disclosures of dissonance correlate with a significant decrease in user contributions and ultimately with their departure from the community. Our work offers a systematic framework for uncovering the dimensions and coded language related to QAnon social imaginaries and can serve as a toolbox for studying other conspiracy theories across different platforms. We also contribute a computational framework for identifying dissonance self-disclosures and measuring the changes in user engagement surrounding dissonance. Our work provides insights into designing dissonance-based interventions that can potentially dissuade conspiracists from engaging in online conspiracy discussion communities.
-
Budak, Ceren; Cha, Meeyoung; Quercia, Daniele; Xie, Lexing (Eds.). Despite the influence that image-based communication has on online discourse, the role played by images in disinformation is still not well understood. In this paper, we present the first large-scale study of fauxtography, analyzing the use of manipulated or misleading images in news discussion on online communities. First, we develop a computational pipeline geared to detect fauxtography, and identify over 61k instances of fauxtography discussed on Twitter, 4chan, and Reddit. Then, we study how posting fauxtography affects engagement of posts on social media, finding that posts containing it receive more interactions in the form of re-shares, likes, and comments. Finally, we show that fauxtography images are often turned into memes by Web communities. Our findings show that effective mitigation against disinformation needs to take images into account, and highlight a number of challenges in dealing with image-based disinformation.
-
Stack Overflow is commonly used by software developers to help solve problems they face while working on software tasks such as fixing bugs or building new features. Recent research has explored how the content of Stack Overflow posts affects attraction and how the reputation of users attracts more visitors. However, there is very little evidence on the effect that visual attractors and content quantity have on directing gaze toward parts of a post, and which parts hold a user's attention longer. Moreover, little is known about how these attractors help developers (students and professionals) answer comprehension questions. This paper presents an eye-tracking study of thirty developers constrained to reading only Stack Overflow (SO) posts while summarizing four open source methods or classes. Results indicate that, on average, paragraphs and code snippets were fixated upon most often and longest. When ranking pages by the number of code blocks and paragraphs they contain, we found that while the presence of more code blocks did not affect the number of fixations, increasing numbers of plain-text paragraphs significantly drove down the fixations on comments. SO posts that were looked at only by students had longer fixation times on code elements within the first ten fixations. We found that 16 developer summaries contained 5 or more meaningful terms from the SO posts the developers viewed. We discuss how our observations of reading behavior could help users structure their posts.
-
Due to challenges around low-quality comments and misinformation, many news outlets have opted to turn off commenting features on their websites. The New York Times (NYT), on the other hand, has continued to scale up its online discussion resources to reach large audiences. Through interviews with the NYT moderation team, we present examples of how moderators manage the first ~24 hours of online discussion after a story breaks, while balancing concerns about journalistic credibility. We discuss how managing comments at the NYT is not merely a matter of content regulation, but can involve reporting from the "community beat" to recognize emerging topics and synthesize the multiple perspectives in a discussion to promote community. We discuss how other news organizations, including those lacking moderation resources, might appropriate the strategies and decisions offered by the NYT. Future research should investigate strategies to share and update the information generated about topics in the news through the course of content moderation.