Title: Studying Reddit: A Systematic Overview of Disciplines, Approaches, Methods, and Ethics
This article offers a systematic analysis of 727 manuscripts, published between 2010 and 2020, that used Reddit as a data source. Our analysis reveals the growing use of Reddit as a data source, the range of disciplines in which this research is occurring, how researchers are getting access to Reddit data, the characteristics of the datasets researchers are using, the subreddits and topics being studied, the kinds of analysis and methods researchers are engaging in, and the emerging ethical questions of research in this space. We discuss how researchers need to consider the impact of Reddit’s algorithms, affordances, and generalizability of the scientific knowledge produced using Reddit data, as well as the potential ethical dimensions of research that draws data from subreddits with potentially sensitive populations.
Award ID(s):
1947754 1704369
Journal Name:
Social Media + Society
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Bots are increasingly being used for governance-related purposes in online communities, yet no instrumentation exists for measuring how users assess their beneficial or detrimental impacts. In order to support future human-centered and community-based research, we developed a new scale called GOVernance Bots in Online communiTies (GOV-BOTs) across two rounds of surveys on Reddit (N=820). We applied rigorous psychometric criteria to demonstrate the validity of GOV-BOTs, which contains two subscales: bot governance (4 items) and bot tensions (3 items). Whereas humans have historically expected communities to be composed entirely of humans, the social participation of bots as non-human agents now raises fundamental questions about psychological, philosophical, and ethical implications. Addressing psychological impacts, our data show that perceptions of effective bot governance positively contribute to users' sense of virtual community (SOVC), whereas perceived bot tensions may only impact SOVC if users are more aware of bots. Finally, we show that users tend to experience the greatest SOVC across groups of subreddits, rather than individual subreddits, suggesting that future research should carefully re-consider uses and operationalizations of the term community.

  2. Social media provides unique opportunities for researchers to learn about a variety of phenomena—it is often publicly available, highly accessible, and affords more naturalistic observation. However, as research using social media data has increased, so too has public scrutiny, highlighting the need to develop ethical approaches to social media data use. Prior work in this area has explored users’ perceptions of researchers’ use of social media data in the context of a single platform. In this paper, we expand on that work, exploring how platforms and their affordances impact how users feel about social media data reuse. We present results from three factorial vignette surveys, each focusing on a different platform—dating apps, Instagram, and Reddit—to assess users’ comfort with research data use scenarios across a variety of contexts. Although our results highlight different expectations between platforms depending on the research domain, purpose of research, and content collected, we find that the factor with the greatest impact across all platforms is consent—a finding which presents challenges for big data researchers. We conclude by offering a sociotechnical approach to ethical decision-making. This approach provides recommendations on how researchers can interpret and respond to platform norms and affordances to predict potential data use sensitivities. The approach also recommends that researchers respond to the predominant expectation of notification and consent for research participation by bolstering awareness of data collection on digital platforms. 
  3. Phillip Bradford, S. Andrew (Ed.)
    The stock market is as volatile as it is unpredictable: its unstable nature results in fluctuations in stock prices and, invariably, in the market performance of stocks. Understanding the underlying factors that contribute to the volatility of the stock market, and its consequences for stock prices, has become important to researchers and investors alike. Methods that researchers have used in the past to gauge market performance include analyzing economic conditions, understanding company performance, and following geopolitical events and market trends. To contribute to the vast research field of stock price prediction and to the challenge of understanding stock price fluctuations, this study will aim to find a relationship between human sentiment on the social media platform Reddit and the S&P 500 stock index. In this study, we will analyze posts from five subreddits that typically discuss the stock market and stock price fluctuations. This will form the first part of our dataset. Historical stock prices for the S&P 500 index will be obtained from Yahoo Finance. This will form our final dataset. Using VADER (Valence Aware Dictionary and sEntiment Reasoner), we will extract the sentiments within the five subreddits and categorize them into positive and negative sentiments. The historical stock prices from Yahoo Finance will be matched with the aggregate sentiments for each day, and this data will be passed through the LSTM model for training. Our findings provide strong evidence of social media’s impact on stock price predictions.
  4. Online volunteers are an uncompensated yet valuable labor force for many social platforms. For example, volunteer content moderators perform a vast amount of labor to maintain online communities. However, as social platforms like Reddit favor revenue generation and user engagement, moderators are under-supported to manage the expansion of online communities. To preserve these online communities, developers and researchers of social platforms must account for and support as much of this labor as possible. In this paper, we quantitatively characterize the publicly visible and invisible actions taken by moderators on Reddit, using a unique dataset of private moderator logs for 126 subreddits and over 900 moderators. Our analysis of this dataset reveals the heterogeneity of moderation work across both communities and moderators. Moreover, we find that analyzing only visible work – the dominant way that moderation work has been studied thus far – drastically underestimates the amount of human moderation labor on a subreddit. We discuss the implications of our results on content moderation research and social platforms. 
  5. Research has rarely examined how the COVID-19 pandemic may affect teens’ social media engagement and psychological wellbeing, and even less research has compared the difference between teens with and without mental health concerns. We collected and analyzed weekly data from January to December 2020 from teens in four Reddit communities (subreddits), including teens in r/Teenagers and teens who participated in three mental health subreddits (r/Depression, r/Anxiety, and r/SuicideWatch). The results showed that teens’ weekly subreddit participation, posting/commenting frequency, and emotion expression were related to significant pandemic events. Teen Redditors on r/Teenagers had a higher posting/commenting frequency but lower negative emotion than teen Redditors on the three mental health subreddits. When comparing posts/comments on r/Teenagers, teens who ever visited one of the three mental health subreddits posted/commented twice as frequently as teens who did not, but their emotion expression was similar. The results from the Interrupted Time Series Analysis (ITSA) indicated that both teens with and without mental health concerns reversed the trend in posting frequency and negative emotion from declining to increasing right after the pandemic outbreak, and teens with mental health concerns had a more rapidly increasing trend in posting/commenting. The findings suggest that teens’ social media engagement and emotion expression reflect the pandemic evolution. Teens with mental health concerns are more likely to reveal their emotions on specialized mental health subreddits rather than on the general r/Teenagers subreddit. In addition, the findings indicated that teens with mental health concerns had a strong social interaction desire that various barriers in the real world may inhibit. The findings call for more attention to understand the pandemic’s influence on teens by monitoring and analyzing social media data and offering adequate support to teens regarding their mental health wellbeing.