- Language models (LMs) are pretrained on diverse data sources: news, discussion forums, books, and online encyclopedias. A significant portion of this data includes facts and opinions that, on the one hand, celebrate democracy and diversity of ideas and, on the other, are inherently socially biased. Our work develops new methods to (1) measure media biases in LMs trained on such corpora, along social and economic axes, and (2) measure the fairness of downstream NLP models trained on top of politically biased LMs. We focus on hate speech and misinformation detection, aiming to empirically quantify the effects of political (social, economic) biases in pretraining data on the fairness of high-stakes, socially oriented tasks. Our findings reveal that pretrained LMs do have political leanings that reinforce the polarization present in pretraining corpora, propagating social biases into hate speech predictions and media biases into misinformation detectors. We discuss the implications of our findings for NLP research and propose future directions to mitigate unfairness.
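A minimal sketch of what such a leaning probe could look like, assuming the Hugging Face transformers library: a masked LM completes political statements with "agree" or "disagree", and the score gap is read as a crude leaning signal. The model choice, prompt template, and statements are illustrative placeholders, not the paper's protocol.

```python
# Illustrative probe of a masked LM's stance on political statements.
# Not the paper's method: the model, prompt template, and statements
# below are arbitrary placeholders.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
MASK = fill_mask.tokenizer.mask_token

statements = [
    "The freer the market, the freer the people.",
    "Government surveillance is necessary to keep citizens safe.",
]

for statement in statements:
    prompt = f'Please respond to this statement: "{statement}" I {MASK} with this statement.'
    # Score only the two completions we care about.
    results = fill_mask(prompt, targets=["agree", "disagree"])
    scores = {r["token_str"].strip(): r["score"] for r in results}
    leaning = scores.get("agree", 0.0) - scores.get("disagree", 0.0)
    print(f"{leaning:+.3f}  {statement}")
```

A positive gap would indicate the model prefers "agree" for that statement; aggregating such gaps over many statements along social and economic axes is one simple way to picture the kind of measurement the abstract describes.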
- NLP research on public opinion manipulation campaigns has primarily focused on detecting overt strategies such as fake news and disinformation. However, information manipulation in the ongoing Russia-Ukraine war exemplifies how governments and media also employ more nuanced strategies. We release a new dataset, VoynaSlov, containing 38M+ posts from Russian media outlets on Twitter and VKontakte, as well as public activity and responses, immediately preceding and during the 2022 Russia-Ukraine war. We apply standard and recently developed NLP models to VoynaSlov to examine agenda setting, framing, and priming, several strategies underlying information manipulation, and we reveal variation across media outlet control, social media platform, and time. Our examination of these media effects, together with an extensive discussion of current approaches' limitations, encourages further development of NLP models for understanding information manipulation in emerging crises, as well as other real-world and interdisciplinary tasks.
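One of the listed media effects, agenda setting, can be approximated by tracking how much attention different outlet groups devote to a topic over time. The sketch below, assuming pandas, hypothetical column names ("date", "outlet_control", "text"), and hand-picked keyword lists, shows that kind of tally; it is an illustrative simplification, not the models applied in the paper.

```python
# A minimal, illustrative measurement of agenda setting as relative topic
# attention. The column names and keyword lists are hypothetical stand-ins
# for fields one might derive from VoynaSlov; not the paper's models.
import pandas as pd

TOPICS = {
    "military": ["offensive", "troops", "missile", "shelling"],
    "sanctions": ["sanctions", "embargo", "swift", "ruble"],
}

def topic_attention(posts: pd.DataFrame) -> pd.DataFrame:
    """Share of posts mentioning each topic, per week and outlet type."""
    posts = posts.copy()
    text = posts["text"].str.lower()
    for topic, words in TOPICS.items():
        posts[topic] = text.str.contains("|".join(words), regex=True)
    weekly = (posts
              .assign(week=pd.to_datetime(posts["date"]).dt.to_period("W"))
              .groupby(["week", "outlet_control"])[list(TOPICS)]
              .mean())  # fraction of posts touching each topic
    return weekly

# Toy rows for demonstration; real rows would come from the released dataset.
demo = pd.DataFrame({
    "date": ["2022-02-20", "2022-02-25", "2022-02-26"],
    "outlet_control": ["state", "independent", "state"],
    "text": ["Troops massing near the border",
             "New sanctions hit the ruble",
             "Missile strikes reported overnight"],
})
print(topic_attention(demo))
```

Comparing the resulting per-week shares between state-affiliated and independent outlets, and before versus after key dates, is one simple way to surface the kind of variation across outlet control, platform, and time that the abstract reports.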
- Emotions are a central driving force of activism; they motivate participation in movements and encourage sustained involvement. We use natural language processing techniques to analyze emotions expressed or solicited in tweets about the 2020 Black Lives Matter protests. Traditional off-the-shelf emotion analysis tools often fail to generalize to new datasets and are unable to adapt to how social movements can raise new ideas and perspectives in short time spans. Instead, we use a few-shot domain adaptation approach for measuring emotions perceived in this specific domain: tweets about protests in May 2020 following the death of George Floyd. While our analysis identifies high levels of expressed anger and disgust across posts overall, it additionally reveals the prominence of positive emotions (encompassing, e.g., pride, hope, and optimism), which are more prevalent in tweets with explicit pro-BlackLivesMatter hashtags and are correlated with on-the-ground protests. The prevalence of positivity contradicts stereotypical portrayals of protesters as primarily perpetuating anger and outrage. Our work offers data, analyses, and methods to support investigations of online activism and the role of emotions in social movements.
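The few-shot adaptation step could be sketched roughly as follows, assuming PyTorch and transformers: a generic pretrained encoder is fine-tuned on a handful of labeled in-domain tweets before scoring new ones. The base model, label set, example tweets, and hyperparameters are all hypothetical placeholders, not the authors' exact recipe.

```python
# A minimal sketch of few-shot domain adaptation for emotion labeling:
# fine-tune a generic encoder on a tiny in-domain support set. The model,
# labels, examples, and hyperparameters are illustrative placeholders.
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = ["anger", "disgust", "pride", "hope", "optimism"]
few_shot = [  # tiny in-domain labeled set (hypothetical examples)
    ("We are marching together and we will not be moved.", "pride"),
    ("Change is coming, keep showing up.", "hope"),
    ("This injustice is sickening.", "disgust"),
]

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(LABELS))
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few passes over the tiny support set
    for text, label in few_shot:
        batch = tok(text, return_tensors="pt", truncation=True)
        batch["labels"] = torch.tensor([LABELS.index(label)])
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Score an unseen protest tweet with the adapted model.
model.eval()
with torch.no_grad():
    logits = model(**tok("So proud of everyone out there today.",
                         return_tensors="pt")).logits
print(LABELS[logits.argmax(-1).item()])
```

In practice the support set would contain more than a few examples and the label inventory would cover the full range of perceived emotions studied; the point of the sketch is only that a small amount of in-domain supervision adapts a generic model to protest-specific language.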