skip to main content


The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 5:00 PM ET until 11:00 PM ET on Friday, June 21 due to maintenance. We apologize for the inconvenience.

Title: A Human-Centered Systematic Literature Review of Cyberbullying Detection Algorithms
Cyberbullying is a growing problem across social media platforms, inflicting short and long-lasting effects on victims. To mitigate this problem, research has looked into building automated systems, powered by machine learning, to detect cyberbullying incidents, or the involved actors like victims and perpetrators. In the past, systematic reviews have examined the approaches within this growing body of work, but with a focus on the computational aspects of the technical innovation, feature engineering, or performance optimization, without centering around the roles, beliefs, desires, or expectations of humans. In this paper, we present a human-centered systematic literature review of the past 10 years of research on automated cyberbullying detection. We analyzed 56 papers based on a three-prong human-centeredness algorithm design framework - spanning theoretical, participatory, and speculative design. We found that the past literature fell short of incorporating human-centeredness across multiple aspects, ranging from defining cyberbullying, establishing the ground truth in data annotation, evaluating the performance of the detection models, to speculating the usage and users of the models, including potential harms and negative consequences. Given the sensitivities of the cyberbullying experience and the deep ramifications cyberbullying incidents bear on the involved actors, we discuss takeaways on how incorporating human-centeredness in future research can aid with developing detection systems that are more practical, useful, and tuned to the diverse needs and contexts of the stakeholders.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the ACM on Human-Computer Interaction
Page Range / eLocation ID:
1 to 34
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Cyberbullying is a prevalent concern within social computing research that has led to the development of several supervised machine learning (ML) algorithms for automated risk detection. A critical aspect of ML algorithm development is how to establish ground truth that is representative of the phenomenon of interest in the real world. Often, ground truth is determined by third-party annotators (i.e., “outsiders”) who are removed from the situational context of the interaction; therefore, they cannot fully understand the perspective of the individuals involved (i.e., “insiders”). To understand the extent of this problem, we compare “outsider” versus “insider” perspectives when annotating 2,000 posts from an online peer-support platform. We interpolate this analysis to a corpus containing over 2.3 million posts on bullying and related topics, and reveal significant gaps in ML models that use third-party annotators to detect bullying incidents. Our results indicate that models based on the insiders’ perspectives yield a significantly higher recall in identifying bullying posts and are able to capture a range of explicit and implicit references and linguistic framings, including person-specific impressions of the incidents. Our study highlights the importance of incorporating the victim’s point of view in establishing effective tools for cyberbullying risk detection. As such, we advocate for the adoption of human-centered and value-sensitive approaches for algorithm development that bridge insider-outsider perspective gaps in a way that empowers the most vulnerable. 
    more » « less
  2. null (Ed.)
    The prevalence of mobile phones and wearable devices enables the passive capturing and modeling of human behavior at an unprecedented resolution and scale. Past research has demonstrated the capability of mobile sensing to model aspects of physical health, mental health, education, and work performance, etc. However, most of the algorithms and models proposed in previous work follow a one-size-fits-all (i.e., population modeling) approach that looks for common behaviors amongst all users, disregarding the fact that individuals can behave very differently, resulting in reduced model performance. Further, black-box models are often used that do not allow for interpretability and human behavior understanding. We present a new method to address the problems of personalized behavior classification and interpretability, and apply it to depression detection among college students. Inspired by the idea of collaborative-filtering, our method is a type of memory-based learning algorithm. It leverages the relevance of mobile-sensed behavior features among individuals to calculate personalized relevance weights, which are used to impute missing data and select features according to a specific modeling goal (e.g., whether the student has depressive symptoms) in different time epochs, i.e., times of the day and days of the week. It then compiles features from epochs using majority voting to obtain the final prediction. We apply our algorithm on a depression detection dataset collected from first-year college students with low data-missing rates and show that our method outperforms the state-of-the-art machine learning model by 5.1% in accuracy and 5.5% in F1 score. We further verify the pipeline-level generalizability of our approach by achieving similar results on a second dataset, with an average improvement of 3.4% across performance metrics. Beyond achieving better classification performance, our novel approach is further able to generate personalized interpretations of the models for each individual. These interpretations are supported by existing depression-related literature and can potentially inspire automated and personalized depression intervention design in the future 
    more » « less
  3. Cyberbullying has become one of the most pressing online risks for young people and has raised serious concerns in society. The emerging literature identifies cyberbullying as repetitive acts that occur over time rather than one-off incidents. Yet, there has been relatively little work to model the hierarchical structure of social media sessions and the temporal dynamics of cyberbullying in online social network sessions. We propose a hierarchical attention network for cyberbullying detection that takes these aspects of cyberbullying into account. The primary distinctive characteristics of our approach include: (i) a hierarchical structure that mirrors the structure of a social media session; (ii) levels of attention mechanisms applied at the word and comment level, thereby enabling the model to pay different amounts of attention to words and comments, depending on the context; and (iii) a cyberbullying detection task that also predicts the interval of time between two adjacent comments. These characteristics allow the model to exploit the commonalities and differences across these two tasks to improve the performance of cyberbullying detection. Experiments on a real-world dataset from Instagram, the social media platform on which the highest percentage of users have reported experiencing cyberbullying, reveal that the proposed architecture outperforms the state-of-the-art method. 
    more » « less
  4. Cyberbullying has become increasingly prevalent, particularly on social media. There has also been a steady rise in cyberbullying research across a range of disciplines. Much of the empirical work from computer science has focused on developing machine learning models for cyberbullying detection. Whereas machine learning cyberbullying detection models can be improved by drawing on psychological theories and perspectives, there is also tremendous potential for machine learning models to contribute to a better understanding of psychological aspects of cyberbullying. In this paper, we discuss how machine learning models can yield novel insights about the nature and defining characteristics of cyberbullying and how machine learning approaches can be applied to help clinicians, families, and communities reduce cyberbullying. Specifically, we discuss the potential for machine learning models to shed light on the repetitive nature of cyberbullying, the imbalance of power between cyberbullies and their victims, and causal mechanisms that give rise to cyberbullying. We orient our discussion on emerging and future research directions, as well as the practical implications of machine learning cyberbullying detection models. 
    more » « less
  5. Cyberbullying has become one of the most pressing online risks for adolescents and has raised serious concerns in society. Traditional efforts are primarily devoted to building a single generic classification model for all users to differentiate bullying behaviors from the normal content [6, 3, 1, 2, 4]. Despite its empirical success, these models treat users equally and inevitably ignore the idiosyncrasies of users. Recent studies from psychology and sociology suggest that the occurrence of cyberbullying has a strong connection with the personality of victims and bullies embedded in the user-generated content, and the peer influence from like-minded users. In this paper, we propose a personalized cyberbullying detection framework PI-Bully with peer influence in a collaborative environment to tailor the prediction for each individual. In particular, the personalized classifier of each individual consists of three components: a global model that captures the commonality shared by all users, a personalized model that expresses the idiosyncratic personality of each specific user, and a third component that encodes the peer influence received from like-minded users. Most of the existing methods adopt a two-stage approach: they first apply feature engineering to capture the cyberbullying patterns and then employ machine learning classifiers to detect cyberbullying behaviors. However, building a personalized cyberbullying detection framework that is customized to each individual remains a challenging task, in large part because: (1) Social media data is often sparse, noisy and high-dimensional (2) It is important to capture the commonality shared by all users as well as idiosyncratic aspects of the personality of each individual for automatic cyberbullying detection; (3) In reality, a potential victim of cyberbullying is often influenced by peers and the influences from different users could be quite diverse. Hence, it is imperative to develop a way to encode the diversity of peer influence for cyberbullying detection. To summarize, we study a novel problem of personalized cyberbullying detection with peer influence in a collaborative environment, which is able to jointly model users' common features, unique personalities and peer influence to identify cyberbullying cases. 
    more » « less