skip to main content


Title: Anti-Asian Hate Speech Detection via Data Augmented Semantic Relation Inference
With the spreading of hate speech on social media in recent years, automatic detection of hate speech is becoming a crucial task and has attracted attention from various communities. This task aims to recognize online posts (e.g., tweets) that contain hateful information. The peculiarities of languages in social media, such as short and poorly written content, lead to the difficulty of learning semantics and capturing discriminative features of hate speech. Previous studies have utilized additional useful resources, such as sentiment hashtags, to improve the performance of hate speech detection. Hashtags are added as input features serving either as sentiment-lexicons or extra context information. However, our close investigation shows that directly leveraging these features without considering their context may introduce noise to classifiers. In this paper, we propose a novel approach to leverage sentiment hashtags to enhance hate speech detection in a natural language inference framework. We design a novel framework SRIC that simultaneously performs two tasks: (1) semantic relation inference between online posts and sentiment hashtags, and (2) sentiment classification on these posts. The semantic relation inference aims to encourage the model to encode sentiment-indicative information into representations of online posts. We conduct extensive experiments on two real-world datasets and demonstrate the effectiveness of our proposed framework compared with state-of-the-art representation learning models.  more » « less
Award ID(s):
2047843 1948432
NSF-PAR ID:
10407012
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the International AAAI Conference on Web and Social Media
Volume:
16
ISSN:
2162-3449
Page Range / eLocation ID:
607 to 617
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Ideological divisions in the United States have become increasingly prominent in daily communication. Accordingly, there has been much research on political polarization, including many recent efforts that take a computational perspective. By detecting political biases in a text document, one can attempt to discern and describe its polarity. Intuitively, the named entities (i.e., the nouns and the phrases that act as nouns) and hashtags in text often carry information about political views. For example, people who use the term “pro-choice” are likely to be liberal and people who use the term “pro-life” are likely to be conservative. In this paper, we seek to reveal political polarities in social-media text data and to quantify these polarities by explicitly assigning a polarity score to entities and hashtags. Although this idea is straightforward, it is difficult to perform such inference in a trustworthy quantitative way. Key challenges include the small number of known labels, the continuous spectrum of political views, and the preservation of both a polarity score and a polarity-neutral semantic meaning in an embedding vector of words. To attempt to overcome these challenges, we propose thePolarity-awareEmbeddingMulti-task learning (PEM) model. This model consists of (1) a self-supervised context-preservation task, (2) an attention-based tweet-level polarity-inference task, and (3) an adversarial learning task that promotes independence between an embedding’s polarity component and its semantic component. Our experimental results demonstrate that ourPEMmodel can successfully learn polarity-aware embeddings that perform well at tweet-level and account-level classification tasks. We examine a variety of applications—including a study of spatial and temporal distributions of polarities and a comparison between tweets from Twitter and posts from Parler—and we thereby demonstrate the effectiveness of ourPEMmodel. We also discuss important limitations of our work and encourage caution when applying thePEMmodel to real-world scenarios.

     
    more » « less
  2. Over the last decade, research has revealed the high prevalence of cyberbullying among youth and raised serious concerns in society. Information on the social media platforms where cyberbullying is most prevalent (e.g., Instagram, Facebook, Twitter) is inherently multi-modal, yet most existing work on cyberbullying identification has focused solely on building generic classification models that rely exclusively on text analysis of online social media sessions (e.g., posts). Despite their empirical success, these efforts ignore the multi-modal information manifested in social media data (e.g., image, video, user profile, time, and location), and thus fail to offer a comprehensive understanding of cyberbullying. Conventionally, when information from different modalities is presented together, it often reveals complementary insights about the application domain and facilitates better learning performance. In this paper, we study the novel problem of cyberbullying detection within a multi-modal context by exploiting social media data in a collaborative way. This task, however, is challenging due to the complex combination of both cross-modal correlations among various modalities and structural dependencies between different social media sessions, and the diverse attribute information of different modalities. To address these challenges, we propose XBully, a novel cyberbullying detection framework, that first reformulates multi-modal social media data as a heterogeneous network and then aims to learn node embedding representations upon it. Extensive experimental evaluations on real-world multi-modal social media datasets show that the XBully framework is superior to the state-of-the-art cyberbullying detection models. 
    more » « less
  3. Analyzing gender is critical to study mental health (MH) support in CVD (cardiovascular disease). The existing studies on using social media for extracting MH symptoms consider symptom detection and tend to ignore user context, disease, or gender. The current study aims to design and evaluate a system to capture how MH symptoms associated with CVD are expressed differently with the gender on social media. We observe that the reliable detection of MH symptoms expressed by persons with heart disease in user posts is challenging because of the co-existence of (dis)similar MH symptoms in one post and due to variation in the description of symptoms based on gender. We collect a corpus of 150k items (posts and comments) annotated using the subreddit labels and transfer learning approaches. We propose GeM, a novel task-adaptive multi-task learning approach to identify the MH symptoms in CVD patients based on gender. Specifically, we adopt a knowledge-assisted RoBERTa based bi-encoder model to capture CVD-related MH symptoms. Moreover, it enhances the reliability for differentiating the gender language in MH symptoms when compared to the state-of-art language models. Our model achieves high (statistically significant) performance and predicts four labels of MH issues and two gender labels, which outperforms RoBERTa, improving the recall by 2.14% on the symptom identification task and by 2.55% on the gender identification task. 
    more » « less
  4. Analyzing gender is critical to study mental health (MH) support in CVD (cardiovascular disease). The existing studies on using social media for extracting MH symptoms consider symptom detection and tend to ignore user context, disease, or gender. The current study aims to design and evaluate a system to capture how MH symptoms associated with CVD are expressed differently with the gender on social media. We observe that the reliable detection of MH symptoms expressed by persons with heart disease in user posts is challenging because of the co-existence of (dis)similar MH symptoms in one post and due to variation in the description of symptoms based on gender. We collect a corpus of 150k items (both posts and comments) annotated using the subreddit labels and transfer learning approaches. We propose GeM, a novel task-adaptive multi-task learning approach to identify the MH symptoms in CVD patients based on gender. Specifically, we adapt a knowledge-assisted RoBERTa based bi-encoder model to capture CVD-related MH symptoms. Moreover, it enhances the reliability for differentiating the gender language in MH symptoms when compared to the state-of-art language models. Our model achieves high (statistically significant) performance and predicts four labels of MH issues and two gender labels, which outperforms RoBERTa, improving the recall by 2.14% on the symptom identification task and by 2.55% on the gender identification task. 
    more » « less
  5. Prejudice and hate directed toward Asian individuals has increased in prevalence and salience during the COVID-19 pandemic, with notable rises in physical violence. Concurrently, as many governments enacted stay-at-home mandates, the spread of anti-Asian content increased in online spaces, including social media. In the present study, we investigated temporal and geographical patterns in social media content relevant to anti-Asian prejudice during the COVID-19 pandemic. Using the Twitter Data Collection API, we queried over 13 million tweets posted between January 30, 2020, and April 30, 2021, for both negative (e.g., #kungflu) and positive (e.g., #stopAAPIhate) hashtags and keywords related to anti-Asian prejudice. In a series of descriptive analyses, we found differences in the frequency of negative and positive keywords based on geographic location. Using burst detection, we also identified distinct increases in negative and positive content in relation to key political tweets and events. These largely exploratory analyses shed light on the role of social media in the expression and proliferation of prejudice as well as positive responses online. 
    more » « less