Title: A Weakly Supervised Learning Framework for Detecting Social Anxiety and Depression
Although social anxiety and depression are common, they are often underdiagnosed and undertreated, in part because of difficulties identifying and accessing individuals in need of services. Current assessments rely on client self-report and clinician judgment, which are vulnerable to social desirability and other subjective biases. Identifying objective, non-burdensome markers of these mental health problems, such as features of speech, could help advance assessment, prevention, and treatment approaches. Prior research on speech-based detection has focused on fully supervised learning with strongly labeled data. However, strong labeling of individuals high in symptoms or state affect in speech audio is impractical, in part because it is not possible to identify with high confidence which regions of a long speech sample indicate the person's symptoms or affective state. We propose a weakly supervised learning framework for detecting social anxiety and depression from long audio clips. Specifically, we present a novel feature modeling technique named NN2Vec that identifies and exploits the inherent relationship between speakers' vocal states and symptoms/affective states. Detecting speakers high in social anxiety or depression symptoms using NN2Vec features achieves F-1 scores 17% and 13% higher than those of the best available baselines. In addition, we present a new multiple instance learning adaptation of a BLSTM classifier, named BLSTM-MIL. Our novel framework using NN2Vec features with the BLSTM-MIL classifier achieves F-1 scores of 90.1% and 85.44% in detecting speakers high in social anxiety and depression symptoms, respectively.
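The multiple-instance-learning idea above, where a long clip carries only a weak speaker-level label while individual segments go unlabeled, can be sketched independently of the BLSTM. The snippet below is a minimal illustration, not the paper's BLSTM-MIL model: the segment scores are hypothetical stand-ins for instance-level classifier outputs, and top-k mean pooling is one common MIL aggregation choice assumed here.

```python
import numpy as np

def mil_bag_score(segment_scores, k=3):
    """Aggregate per-segment symptom scores into one bag-level score.

    In the MIL setting, a long audio clip (the "bag") has only a weak
    speaker-level label; each short segment (an "instance") receives its
    own score, and the bag score is the mean of the top-k instance scores,
    so a few symptomatic regions can flag the whole clip.
    """
    scores = np.sort(np.asarray(segment_scores, dtype=float))[::-1]
    return float(scores[:k].mean())

# A long clip split into 8 segments, scored by some instance-level model
# (hand-written numbers stand in for real classifier outputs).
scores = [0.05, 0.10, 0.92, 0.88, 0.07, 0.95, 0.12, 0.09]
bag = mil_bag_score(scores, k=3)
label = int(bag >= 0.5)  # 1 = speaker flagged high in symptoms
```

With max-style pooling like this, a mostly neutral clip containing a few high-scoring regions is still flagged, which matches the weak-labeling premise that only some regions of a long recording reflect the speaker's state.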
Award ID(s): 1521722
PAR ID: 10059943
Journal Name: ACM Interactive, Mobile, Wearable, and Ubiquitous Technologies
Volume: 2(2)
Issue: Article 1
Sponsoring Org: National Science Foundation
More Like This
  1. Arai, K. (Ed.)
    Individuals spend a significant portion of their time on social media. It has become a platform for expressing feelings, sharing ideas, and connecting with other individuals through video and audio posts, textual data such as comments and descriptions, and so on. Social media has a considerable impact on people's daily lives. In recent years there has been enormous growth in the number of people using Twitter and Instagram to share their emotions and sentiments, which represent their actual feelings. In this work, we apply machine learning techniques to social media data to perform a comprehensive investigation into detecting the risk of depression in people. Our work can help detect symptoms such as sadness, loneliness, and detachment, providing insight for forensic analysts and law enforcement agencies into a person's mental state. The experimental results show that the Extra Trees classifier performs significantly better than the other models in detecting people's sentiment from social media data.
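The text-classification pipeline described in item 1 can be sketched with scikit-learn's ExtraTreesClassifier. Everything below is illustrative, a toy four-post corpus with TF-IDF features standing in for whatever preprocessing and feature set the study actually used:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical mini-corpus; real work would use labeled Twitter/Instagram data.
posts = [
    "feeling so alone and empty today",
    "everything is pointless and I want to disappear",
    "had a great time with friends at the beach",
    "so excited about my new job, life is good",
]
labels = [1, 1, 0, 0]  # 1 = at-risk language, 0 = neutral/positive

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),            # unigram + bigram features
    ExtraTreesClassifier(n_estimators=100, random_state=0),
)
model.fit(posts, labels)
prediction = model.predict(["I feel so lonely and detached"])[0]
```

Extra Trees (extremely randomized trees) differs from a random forest mainly in that split thresholds are drawn at random, which reduces variance on noisy text features; with a corpus this small the prediction is only a demonstration of the API, not a meaningful screen.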
  2. Automatic assessment of depression from speech signals is affected by variabilities in acoustic content and speakers. In this study, we focused on addressing these variabilities. We used a database comprising interview recordings from a large number of female speakers: 735 individuals suffering from depressive disorders (dysthymia and major depression) and anxiety disorders (generalized anxiety disorder, panic disorder with or without agoraphobia) and 953 healthy individuals. Leveraging this unique and extensive database, we built an i-vector framework. In order to capture various aspects of speech signals, we used voice quality features in addition to conventional cepstral features. The features (F0, F1, F2, F3, H1-H2, H2-H4, H4-H2k, A1, A2, A3, and CPP) were inspired by a psychoacoustic model of voice quality [1]. An i-vector-based system using Mel Frequency Cepstral Coefficients (MFCCs) and another using voice quality features were developed. Voice quality features performed as well as MFCCs. A score-level fusion was then used to combine these two systems, resulting in a 6% relative improvement in accuracy in comparison with the i-vector system based on MFCCs alone. The system remained robust even when the duration of the utterances was shortened to 10 seconds.
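Score-level fusion as described in item 2 amounts to combining per-utterance scores from the two subsystems before thresholding. The sketch below assumes a simple weighted-sum fusion with an equal weight; the scores, weight, and threshold are all hypothetical, since the abstract does not specify the fusion rule's parameters.

```python
import numpy as np

def fuse_scores(mfcc_scores, vq_scores, w=0.5):
    """Weighted-sum score-level fusion of two detection systems:
    the MFCC-based i-vector system and the voice-quality-based one."""
    mfcc_scores = np.asarray(mfcc_scores, dtype=float)
    vq_scores = np.asarray(vq_scores, dtype=float)
    return w * mfcc_scores + (1.0 - w) * vq_scores

# Hypothetical per-utterance scores from each subsystem.
mfcc = [0.80, 0.30, 0.55]
vq   = [0.60, 0.40, 0.70]

fused = fuse_scores(mfcc, vq, w=0.5)
decisions = (fused >= 0.5).astype(int)  # 1 = depression detected
```

Fusing at the score level (rather than concatenating features) lets each subsystem keep its own front end and calibration, which is why it is a common way to combine complementary acoustic representations.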
  3. Depression is a very common mental health disorder with a devastating social and economic impact. It can be costly and difficult to detect, traditionally requiring a significant number of hours from a trained mental health professional. Recently, machine learning and deep learning models have been trained for depression screening using modalities extracted from videos of clinical interviews conducted by a virtual agent. This complex task is challenging for deep learning models because of the multiple modalities and the limited number of participants in the dataset. To address these challenges, we propose AudiFace, a multimodal deep learning model that inputs temporal facial features, audio, and transcripts to screen for depression. To incorporate all three modalities, AudiFace combines multiple pre-trained transfer learning models with a bidirectional LSTM with self-attention. When compared with state-of-the-art models, AudiFace achieves the highest F1 scores on thirteen of the fifteen different datasets. AudiFace notably improves the depression screening capabilities of general wellbeing questions. Eye gaze proved to be the most valuable of the temporal facial features, in both the unimodal and multimodal models. Our results can be used to determine the best combination of modalities, temporal facial features, and clinical interview questions for future depression screening applications.
  4. Recent studies find that existing self-supervised speech encoders contain primarily acoustic rather than semantic information. As a result, pipelined systems that feed supervised automatic speech recognition (ASR) output into a large language model (LLM) achieve state-of-the-art results on semantic spoken language tasks by utilizing the rich semantic representations of the LLM. These systems come at the cost of labeled audio transcriptions, which are expensive and time-consuming to obtain. We propose a task-agnostic, unsupervised way of incorporating semantic information from LLMs into self-supervised speech encoders without labeled audio transcriptions. By introducing semantics, we improve existing speech encoders' spoken language understanding (SLU) performance by over 5% on intent classification (IC), with modest gains in named entity recognition (NER) and slot filling (SF), and improve spoken question answering (SQA) FF1 score by over 2%. Our approach, which uses no ASR data, achieves performance similar to that of methods trained on over 100 hours of labeled audio transcripts, demonstrating the feasibility of unsupervised semantic augmentations to existing speech encoders.
  5. Emotion regulation is a powerful predictor of youth mental health and a crucial ingredient of interventions. A growing body of evidence indicates that the beliefs individuals hold about the extent to which emotions are controllable (emotion controllability beliefs) influence both the degree and the ways in which they regulate emotions. A systematic review was conducted that investigated the associations between emotion controllability beliefs and youth anxiety and depression symptoms. The search identified 21 peer-reviewed publications that met the inclusion criteria. Believing that emotions are relatively controllable was associated with fewer anxiety and depression symptoms, in part because these beliefs were associated with more frequent use of adaptive emotion regulation strategies. These findings support theoretical models linking emotion controllability beliefs with anxiety and depression symptoms via emotion regulation strategies that target emotional experience, like reappraisal. Taken together, the review findings demonstrate that emotion controllability beliefs matter for youth mental health. Understanding emotion controllability beliefs is of prime importance for basic science and practice, as it will advance understanding of mental health and provide additional targets for managing symptoms of anxiety and depression in young people.