skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: A Labeled Dataset for Investigating Cyberbullying Content Patterns in Instagram
As online communication continues to become more prevalent, instances of cyberbullying have also become more common, particularly on social media sites. Previous research in this area has studied cyberbullying outcomes, predictors of cyberbullying victimization/perpetration, and computational detection models that rely on labeled datasets to identify the underlying patterns. However, there is a dearth of work examining the content of what is said when cyberbullying occurs and most of the available datasets include only basic la-bels (cyberbullying or not). This paper presents an annotated Instagram dataset with detailed labels about key cyberbullying properties, such as the content type, purpose, directionality, and co-occurrence with other phenomena, as well as demographic information about the individuals who performed the annotations. Additionally, results of an exploratory logistic regression analysis are reported to illustrate how new insights about cyberbullying and its automatic detection can be gained from this labeled dataset.  more » « less
Award ID(s):
2227488
PAR ID:
10384704
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
The 16th International AAAI Conference on Web and Social Media (ICWSM)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Cyberbullying has become one of the most pressing online risks for adolescents and has raised serious concerns in society. Recent years have witnessed a surge in research aimed at developing principled learning models to detect cyberbullying behaviors. These efforts have primarily focused on building a single generic classification model to differentiate bullying content from normal (non-bullying) content among all users. These models treat users equally and overlook idiosyncratic information about users that might facilitate the accurate detection of cyberbullying. In this paper, we propose a personalized cyberbullying detection framework, PI-Bully, that draws on empirical findings from psychology highlighting unique characteristics of victims and bullies and peer influence from like-minded users as predictors of cyberbullying behaviors. Our framework is novel in its ability to model peer influence in a collaborative environment and tailor cyberbullying prediction for each individual user. Extensive experimental evaluations on real-world datasets corroborate the effectiveness of the proposed framework. 
    more » « less
  2. Cyberbullying has become increasingly prevalent, particularly on social media. There has also been a steady rise in cyberbullying research across a range of disciplines. Much of the empirical work from computer science has focused on developing machine learning models for cyberbullying detection. Whereas machine learning cyberbullying detection models can be improved by drawing on psychological theories and perspectives, there is also tremendous potential for machine learning models to contribute to a better understanding of psychological aspects of cyberbullying. In this paper, we discuss how machine learning models can yield novel insights about the nature and defining characteristics of cyberbullying and how machine learning approaches can be applied to help clinicians, families, and communities reduce cyberbullying. Specifically, we discuss the potential for machine learning models to shed light on the repetitive nature of cyberbullying, the imbalance of power between cyberbullies and their victims, and causal mechanisms that give rise to cyberbullying. We orient our discussion on emerging and future research directions, as well as the practical implications of machine learning cyberbullying detection models. 
    more » « less
  3. Cyberbullying has become one of the most pressing online risks for adolescents and has raised serious concerns in society. Traditional efforts are primarily devoted to building a single generic classification model for all users to differentiate bullying behaviors from the normal content [6, 3, 1, 2, 4]. Despite its empirical success, these models treat users equally and inevitably ignore the idiosyncrasies of users. Recent studies from psychology and sociology suggest that the occurrence of cyberbullying has a strong connection with the personality of victims and bullies embedded in the user-generated content, and the peer influence from like-minded users. In this paper, we propose a personalized cyberbullying detection framework PI-Bully with peer influence in a collaborative environment to tailor the prediction for each individual. In particular, the personalized classifier of each individual consists of three components: a global model that captures the commonality shared by all users, a personalized model that expresses the idiosyncratic personality of each specific user, and a third component that encodes the peer influence received from like-minded users. Most of the existing methods adopt a two-stage approach: they first apply feature engineering to capture the cyberbullying patterns and then employ machine learning classifiers to detect cyberbullying behaviors. However, building a personalized cyberbullying detection framework that is customized to each individual remains a challenging task, in large part because: (1) Social media data is often sparse, noisy and high-dimensional (2) It is important to capture the commonality shared by all users as well as idiosyncratic aspects of the personality of each individual for automatic cyberbullying detection; (3) In reality, a potential victim of cyberbullying is often influenced by peers and the influences from different users could be quite diverse. Hence, it is imperative to develop a way to encode the diversity of peer influence for cyberbullying detection. To summarize, we study a novel problem of personalized cyberbullying detection with peer influence in a collaborative environment, which is able to jointly model users' common features, unique personalities and peer influence to identify cyberbullying cases. 
    more » « less
  4. null (Ed.)
    Cyberbullying is rapidly becoming one of the most serious online risks for adolescents. This has motivated work on machine learning methods to automate the process of cyberbullying detection, which have so far mostly viewed cyberbullying as one-off incidents that occur at a single point in time. Comparatively less is known about how cyberbullying behavior occurs and evolves over time. This oversight highlights a crucial open challenge for cyberbullying-related research, given that cyberbullying is typically defined as intentional acts of aggression via electronic communication that occur repeatedly and persistently . In this article, we center our discussion on the challenge of modeling temporal patterns of cyberbullying behavior. Specifically, we investigate how temporal information within a social media session, which has an inherently hierarchical structure (e.g., words form a comment and comments form a session), can be leveraged to facilitate cyberbullying detection. Recent findings from interdisciplinary research suggest that the temporal characteristics of bullying sessions differ from those of non-bullying sessions and that the temporal information from users’ comments can improve cyberbullying detection. The proposed framework consists of three distinctive features: (1) a hierarchical structure that reflects how a social media session is formed in a bottom-up manner; (2) attention mechanisms applied at the word- and comment-level to differentiate the contributions of words and comments to the representation of a social media session; and (3) the incorporation of temporal features in modeling cyberbullying behavior at the comment-level. Quantitative and qualitative evaluations are conducted on a real-world dataset collected from Instagram, the social networking site with the highest percentage of users reporting cyberbullying experiences. Results from empirical evaluations show the significance of the proposed methods, which are tailored to capture temporal patterns of cyberbullying detection. 
    more » « less
  5. Social media cyberbullying has a detrimental effect on human life. As online social networking grows daily, the amount of hate speech also increases. Such terrible content can cause depression and actions related to suicide. This paper proposes a trustable LSTM Autoencoder Network for cyberbullying detection on social media using synthetic data. We have demonstrated a cutting-edge method to address data availability difficulties by producing machine-translated data. However, several languages such as Hindi and Bangla still lack adequate investigations due to a lack of datasets. We carried out experimental identification of aggressive comments on Hindi, Bangla, and English datasets using the proposed model and traditional models, including Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), LSTM-Autoencoder, Word2vec, Bidirectional Encoder Representations from Transformers (BERT), and Generative Pre-trained Transformer 2 (GPT-2) models. We employed evaluation metrics such as f1-score, accuracy, precision, and recall to assess the models’ performance. Our proposed model outperformed all the models on all datasets, achieving the highest accuracy of 95%. Our model achieves state-of-the-art results among all the previous works on the dataset we used in this paper. 
    more » « less