Analyzing gender is critical to study mental health (MH) support in CVD (cardiovascular disease). The existing studies on using social media for extracting MH symptoms consider symptom detection and tend to ignore user context, disease, or gender. The current study aims to design and evaluate a system to capture how MH symptoms associated with CVD are expressed differently with the gender on social media. We observe that the reliable detection of MH symptoms expressed by persons with heart disease in user posts is challenging because of the co-existence of (dis)similar MH symptoms in one post and due to variation in the description of symptoms based on gender. We collect a corpus of 150k items (posts and comments) annotated using the subreddit labels and transfer learning approaches. We propose GeM, a novel task-adaptive multi-task learning approach to identify the MH symptoms in CVD patients based on gender. Specifically, we adopt a knowledge-assisted RoBERTa based bi-encoder model to capture CVD-related MH symptoms. Moreover, it enhances the reliability for differentiating the gender language in MH symptoms when compared to the state-of-art language models. Our model achieves high (statistically significant) performance and predicts four labels of MH issues and two gender labels, which outperforms RoBERTa, improving the recall by 2.14% on the symptom identification task and by 2.55% on the gender identification task.
more »
« less
A Computational Approach to Understand Mental Health from Reddit: Knowledge-Aware Multitask Learning Framework.
Analyzing gender is critical to study mental health (MH) support in CVD (cardiovascular disease). The existing studies on using social media for extracting MH symptoms consider symptom detection and tend to ignore user context, disease, or gender. The current study aims to design and evaluate a system to capture how MH symptoms associated with CVD are expressed differently with the gender on social media. We observe that the reliable detection of MH symptoms expressed by persons with heart disease in user posts is challenging because of the co-existence of (dis)similar MH symptoms in one post and due to variation in the description of symptoms based on gender. We collect a corpus of 150k items (both posts and comments) annotated using the subreddit labels and transfer learning approaches. We propose GeM, a novel task-adaptive multi-task learning approach to identify the MH symptoms in CVD patients based on gender. Specifically, we adapt a knowledge-assisted RoBERTa based bi-encoder model to capture CVD-related MH symptoms. Moreover, it enhances the reliability for differentiating the gender language in MH symptoms when compared to the state-of-art language models. Our model achieves high (statistically significant) performance and predicts four labels of MH issues and two gender labels, which outperforms RoBERTa, improving the recall by 2.14% on the symptom identification task and by 2.55% on the gender identification task.
more »
« less
- PAR ID:
- 10339119
- Date Published:
- Journal Name:
- Sixteenth International AAAI Conference onWeb and Social Media (ICWSM 2022)
- Page Range / eLocation ID:
- 640-650
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Artificial Intelligence (AI) systems for mental healthcare (MHCare) have been ever-growing after realizing the importance of early interventions for patients with chronic mental health (MH) conditions. Social media (SocMedia) emerged as the go-to platform for supporting patients seeking MHCare. The creation of peer-support groups without social stigma has resulted in patients transitioning from clinical settings to SocMedia supported interactions for quick help. Researchers started exploring SocMedia content in search of cues that showcase correlation or causation between different MH conditions to design better interventional strategies. User-level Classification-based AI systems were designed to leverage diverse SocMedia data from various MH conditions, to predict MH conditions. Subsequently, researchers created classification schemes to measure the severity of each MH condition. Such ad-hoc schemes, engineered features, and models not only require a large amount of data but fail to allow clinically acceptable and explainable reasoning over the outcomes. To improve Neural-AI for MHCare, infusion of clinical symbolic knowledge that clinicans use in decision making is required. An impactful use case of Neural-AI systems in MH is conversational systems. These systems require coordination between classification and generation to facilitate humanistic conversation in conversational agents (CA). Current CAs with deep language models lack factual correctness, medical relevance, and safety in their generations, which intertwine with unexplainable statistical classification techniques. This lecture-style tutorial will demonstrate our investigations into Neuro-symbolic methods of infusing clinical knowledge to improve the outcomes of Neural-AI systems to improve interventions for MHCare:(a) We will discuss the use of diverse clinical knowledge in creating specialized datasets to train Neural-AI systems effectively. (b) Patients with cardiovascular disease express MH symptoms differently based on gender differences. We will show that knowledge-infused Neural-AI systems can identify gender-specific MH symptoms in such patients. (c) We will describe strategies for infusing clinical process knowledge as heuristics and constraints to improve language models in generating relevant questions and responses.more » « less
-
Social media offer an abundant source of valuable raw data, however informal writing can quickly become a bottleneck for many natural language processing (NLP) tasks. Off-theshelf tools are usually trained on formal text and cannot explicitly handle noise found in short online posts. Moreover, the variety of frequently occurring linguistic variations presents several challenges, even for humans who might not be able to comprehend the meaning of such posts, especially when they contain slang and abbreviations. Text Normalization aims to transform online user-generated text to a canonical form. Current text normalization systems rely on string or phonetic similarity and classification models that work on a local fashion. We argue that processing contextual information is crucial for this task and introduce a social media text normalization hybrid word-character attention-based encoder-decoder model that can serve as a pre-processing step for NLP applications to adapt to noisy text in social media. Our character-based component is trained on synthetic adversarial examples that are designed to capture errors commonly found in online user-generated text. Experiments show that our model surpasses neural architectures designed for text normalization and achieves comparable performance with state-of-the-art related work.more » « less
-
The dramatic surge of health misinformation on social media platforms poses a significant threat to public health, contributing to hesitancy in vaccines, delayed medical interventions, and the adoption of untested or harmful treatments. We present a novel, hybrid AI-driven framework designed for the real-time detection of health misinformation on social media platforms while prioritizing user privacy. The framework integrates the strengths of Large Language Models (LLMs), such as DistilBERT, with domain-specific Knowledge Graphs (KGs) to enhance the detection of nuanced and contextually dependent misinformation. LLMs excel at understanding the complexities of human language, while KGs provide a structured representation of medical knowledge, allowing factual verification and identification of inconsistencies. Furthermore, the framework incorporates robust privacy-preserving mechanisms, including differential privacy and secure data pipelines, to address user privacy concerns and comply with healthcare data protection regulations. Our experimental results on a dataset of Reddit posts related to chronic health conditions demonstrate the performance of this hybrid approach compared to models that only use text or KG, highlighting the synergistic effect of combining LLMs and KGs for improved misinformation detection.more » « less
-
Over the last decade, research has revealed the high prevalence of cyberbullying among youth and raised serious concerns in society. Information on the social media platforms where cyberbullying is most prevalent (e.g., Instagram, Facebook, Twitter) is inherently multi-modal, yet most existing work on cyberbullying identification has focused solely on building generic classification models that rely exclusively on text analysis of online social media sessions (e.g., posts). Despite their empirical success, these efforts ignore the multi-modal information manifested in social media data (e.g., image, video, user profile, time, and location), and thus fail to offer a comprehensive understanding of cyberbullying. Conventionally, when information from different modalities is presented together, it often reveals complementary insights about the application domain and facilitates better learning performance. In this paper, we study the novel problem of cyberbullying detection within a multi-modal context by exploiting social media data in a collaborative way. This task, however, is challenging due to the complex combination of both cross-modal correlations among various modalities and structural dependencies between different social media sessions, and the diverse attribute information of different modalities. To address these challenges, we propose XBully, a novel cyberbullying detection framework, that first reformulates multi-modal social media data as a heterogeneous network and then aims to learn node embedding representations upon it. Extensive experimental evaluations on real-world multi-modal social media datasets show that the XBully framework is superior to the state-of-the-art cyberbullying detection models.more » « less
An official website of the United States government

