Title: Objectively Quantifying Pediatric Psychiatric Severity Using Artificial Intelligence, Voice Recognition Technology, and Universal Emotions: Pilot Study for Artificial Intelligence-Enabled Innovation to Address Youth Mental Health Crisis
Background: Providing psychotherapy, particularly for youth, is a pressing challenge in the health care system. Traditional methods are resource-intensive, and there is a need for objective benchmarks to guide therapeutic interventions. Automated emotion detection from speech, using artificial intelligence, presents an emerging approach to address these challenges. Speech can carry vital information about emotional states, which can be used to improve mental health care services, especially when the person is suffering.

Objective: This study aims to develop and evaluate automated methods for detecting the intensity of emotions (anger, fear, sadness, and happiness) in audio recordings of patients' speech. We also demonstrate the viability of deploying the models. Our model was validated in a previous publication by Alemu et al with limited voice samples; this follow-up study used significantly more voice samples to validate the previous model.

Methods: We used audio recordings of patients, specifically children with high adverse childhood experience (ACE) scores; the average ACE score was 5 or higher, placing them at the highest risk for chronic disease and social or emotional problems (in the general population, only 1 in 6 people have a score of 4 or above). Each patient's structured voice sample was collected by having them read a fixed script. In total, 4 highly trained therapists classified the audio segments, scoring the intensity level of each of the 4 emotions. We experimented with various preprocessing methods, including denoising, voice-activity detection, and diarization. We also explored several model architectures, including convolutional neural networks (CNNs) and transformers, training emotion-specific transformer-based models and a generalized CNN-based model to predict emotion intensities.

Results: The emotion-specific transformer-based model achieved a test-set precision and recall of 86% and 79%, respectively, for binary emotional intensity classification (high or low). In contrast, the CNN-based model, generalized to predict the intensity of 4 different emotions, achieved test-set precision and recall of 83% for each.

Conclusions: Automated emotion detection from patients' speech using artificial intelligence models is feasible and achieves a high level of accuracy. The transformer-based model exhibited better performance in emotion-specific detection, while the CNN-based model showed promise in generalized emotion detection. These models can serve as valuable decision-support tools for pediatricians and mental health providers to triage youth to appropriate levels of mental health care services.
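The abstract does not include implementation details, but the transformer-based intensity classifier it describes can be sketched at a high level. Below is a minimal, hypothetical PyTorch sketch assuming log-mel spectrogram input; the layer sizes, the 16 kHz sample rate, and the mean-pooling readout are illustrative assumptions, not the authors' published configuration.

# Hypothetical sketch of an emotion-specific transformer for binary
# high/low intensity classification over log-mel frames. All sizes are
# illustrative assumptions.
import torch
import torch.nn as nn
import torchaudio

class EmotionIntensityTransformer(nn.Module):
    def __init__(self, n_mels: int = 64, d_model: int = 128,
                 n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.proj = nn.Linear(n_mels, d_model)          # mel frame -> model dim
        enc_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 1)               # logit: high vs. low intensity

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, time, n_mels)
        h = self.encoder(self.proj(mel))
        return self.head(h.mean(dim=1)).squeeze(-1)     # mean-pool over time

# Feature extraction for one clip (16 kHz sample rate assumed).
melspec = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)
waveform = torch.randn(1, 16000 * 3)                    # stand-in for a 3 s recording
mel = melspec(waveform).log1p().transpose(1, 2)         # (1, time, 64)
logit = EmotionIntensityTransformer()(mel)
prob_high = torch.sigmoid(logit)                        # P(high intensity)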
Award ID(s):
2126811
PAR ID:
10554915
Author(s) / Creator(s):
Corporate Creator(s):
Editor(s):
JMIR
Publisher / Repository:
JMIR Research Protocols
Date Published:
Journal Name:
JMIR Research Protocols
Edition / Version:
1
Volume:
12
Issue:
1
ISSN:
1929-0748
Page Range / eLocation ID:
1-11
Subject(s) / Keyword(s):
pediatric trauma; voice AI; machine learning; mental health; predictive modeling; artificial intelligence; social determinants of health; speech-recognition; adverse childhood experiences; trauma and emotional distress; voice marker; speech biomarker; pediatrics; at-risk youth
Format(s):
Medium: X; Size: 2.5 MB; Other: PDF, XML
Size(s):
2.5MB
Sponsoring Org:
National Science Foundation
More Like this
  1. Background: Even before the onset of the COVID-19 pandemic, children and adolescents were experiencing a mental health crisis, partly due to a lack of quality mental health services. The rate of suicide for Black youth has increased by 80%. By 2025, the health care system will be short 225,000 therapists, further exacerbating the current crisis. It is therefore of utmost importance for schools, youth mental health providers, and pediatric medical providers to integrate digital mental health innovations that identify problems proactively and rapidly and enable effective collaboration with other health care providers. Such approaches can help identify robust, reproducible, and generalizable predictors and digital biomarkers of treatment response in psychiatry. Among the many digital innovations currently being pursued to identify biomarkers for psychiatric diseases as part of the macrolevel digital health transformation, speech stands out as an attractive candidate for its affordability, noninvasiveness, and nonintrusiveness.

Objective: The protocol aims to develop speech-emotion recognition algorithms, leveraging artificial intelligence and machine learning, that can establish a link between trauma, stress, and voice types (including disrupted speech-based characteristics) and detect clinically relevant emotional distress and functional impairments in children and adolescents.

Methods: Informed by theoretical foundations (the Theory of Psychological Trauma Biomarkers and Archetypal Voice Categories), we developed our methodology to focus on 5 emotions: anger, happiness, fear, neutral, and sadness. Participants will be recruited from 2 local mental health centers that serve urban youths. Speech samples, along with responses to the Symptom and Functioning Severity Scale, Patient Health Questionnaire 9, and Adverse Childhood Experiences scales, will be collected using an Android mobile app. Our model development pipeline is informed by Gaussian mixture models (GMMs), recurrent neural networks, and long short-term memory (LSTM) networks.

Results: We tested our model with a public data set. The GMM with 128 clusters showed evenly distributed accuracy across all 5 emotions. Using utterance-level features, the GMM achieved an overall accuracy of 79.15%, while frame selection increased accuracy to 85.35%. This demonstrates that the GMM is a robust model for classifying all 5 emotions and that emotion frame selection enhances accuracy, which is significant for scientific evaluation. Recruitment and data collection for the study were initiated in August 2021 and are currently underway. The study results are likely to be available and published in 2024.

Conclusions: This study contributes to the literature by addressing the need for speech-focused digital health tools that detect clinically relevant emotional distress and functional impairments in children and adolescents. The preliminary results show that our algorithm has the potential to improve outcomes. The findings will contribute to the broader digital health transformation.

International Registered Report Identifier (IRRID): DERR1-10.2196/46970
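The GMM classifier this protocol describes can be illustrated with a short sketch: one mixture per emotion fit over frame-level acoustic features, with an utterance labeled by the highest average log-likelihood. The MFCC features and the librosa-based extraction below are illustrative assumptions; only the 128-component count comes from the abstract, and the frame-selection step reported there is omitted for brevity.

# Hypothetical sketch: one 128-component GMM per emotion over MFCC frames.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

EMOTIONS = ["anger", "happiness", "fear", "neutral", "sadness"]

def mfcc_frames(path: str) -> np.ndarray:
    # Frame-level MFCC features; 20 coefficients is an assumption.
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).T   # (frames, 20)

def train(frames_by_emotion: dict[str, np.ndarray]) -> dict[str, GaussianMixture]:
    # 128 components per emotion, matching the cluster count reported above.
    return {
        e: GaussianMixture(n_components=128, covariance_type="diag")
             .fit(frames_by_emotion[e])
        for e in EMOTIONS
    }

def classify(models: dict[str, GaussianMixture], path: str) -> str:
    frames = mfcc_frames(path)
    # score_samples gives per-frame log-likelihood; average over the utterance
    # and pick the emotion whose mixture explains the frames best.
    return max(EMOTIONS, key=lambda e: models[e].score_samples(frames).mean())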
  2. Emotion regulation is a powerful predictor of youth mental health and a crucial ingredient of interventions. A growing body of evidence indicates that the beliefs individuals hold about the extent to which emotions are controllable (emotion controllability beliefs) influence both the degree and the ways in which they regulate emotions. A systematic review was conducted that investigated the associations between emotion controllability beliefs and youth anxiety and depression symptoms. The search identified 21 peer-reviewed publications that met the inclusion criteria. Believing that emotions are relatively controllable was associated with fewer anxiety and depression symptoms, in part because these beliefs were associated with more frequent use of adaptive emotion regulation strategies. These findings support theoretical models linking emotion controllability beliefs with anxiety and depression symptoms via emotion regulation strategies that target emotional experience, like reappraisal. Taken together, the review findings demonstrate that emotion controllability beliefs matter for youth mental health. Understanding emotion controllability beliefs is of prime importance for basic science and practice, as it will advance understanding of mental health and provide additional targets for managing symptoms of anxiety and depression in young people.
  3. The emerging field of smart healthcare has identified emotion detection as a key component in improving patient care, diagnostics, and therapeutic interventions. This paper introduces an innovative approach to emotion detection within the healthcare domain by integrating a Convolutional Neural Network (CNN) with a Maximum A Posteriori (MAP) estimator applied to Magnitude-Squared Spectrum (MSS) analysis. Combining the CNN's advanced feature extraction capabilities with the statistical strength of MAP estimation offers a promising avenue for interpreting complex physiological signals. The proposed methodology aims to accurately discern and quantify emotional states, thus contributing to the personalization and effectiveness of healthcare services. To validate the efficacy of this approach, the work conducted extensive experiments on a diverse data set composed of physiological signals, demonstrating that the proposed model outperforms existing methods in emotion recognition tasks. The integration of MSS into CNN frameworks, coupled with MAP estimation, provides a significant improvement in the detection and analysis of emotions, resulting in more responsive and intelligent healthcare systems. This paper not only presents a novel methodological contribution but also lays the groundwork for future research at the intersection of emotional intelligence and healthcare technology.
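The abstract does not give the architecture, but the MSS front end it names is straightforward to illustrate. Below is a minimal, hypothetical PyTorch sketch: the magnitude-squared STFT of a signal feeds a toy CNN, and the softmax output is read as a posterior for an argmax (MAP-style) decision. The FFT size, layer shapes, and 4-class output are assumptions, not the paper's model.

# Hypothetical sketch: magnitude-squared spectrum (MSS) features into a toy CNN.
import torch
import torch.nn as nn

def magnitude_squared_spectrum(x: torch.Tensor, n_fft: int = 512,
                               hop: int = 128) -> torch.Tensor:
    # |STFT|^2 of a (batch, samples) signal -> (batch, 1, freq, time)
    spec = torch.stft(x, n_fft=n_fft, hop_length=hop,
                      window=torch.hann_window(n_fft), return_complex=True)
    return spec.abs().pow(2).unsqueeze(1)

cnn = nn.Sequential(                       # toy emotion classifier over MSS
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 4),  # 4 classes assumed
)

signal = torch.randn(2, 16000)             # stand-in batch of 1 s signals
logits = cnn(magnitude_squared_spectrum(signal))
posteriors = logits.softmax(dim=-1)        # argmax posterior ~ MAP-style decision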
  4.
    The problem of predicting emotional attributes from speech has often focused on predicting a single value from a sentence or short speaking turn. These methods often ignore that natural emotions are both dynamic and dependent on context. To model the dynamic nature of emotions, we can treat the prediction of emotion from speech as a time-series problem. We refer to the problem of predicting these emotional traces as dynamic speech emotion recognition. Previous studies in this area have used models that treat all emotional traces as coming from the same underlying distribution. Since emotions are dependent on contextual information, these methods might obscure the context of an emotional interaction. This paper uses a neural process model with a segment-level speech emotion recognition (SER) model for this problem. This type of model leverages information from the time-series and predictions from the SER model to learn a prior that defines a distribution over emotional traces. Our proposed model performs 21% better than a bidirectional long short-term memory (BiLSTM) baseline when predicting emotional traces for valence. 
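For context, the BiLSTM baseline this paper compares against can be sketched simply: a bidirectional LSTM maps a sequence of segment-level features to a per-step valence value. The feature dimension, hidden size, and input shapes below are illustrative assumptions; the neural process model itself is more involved and is not reproduced here.

# Hypothetical sketch of a BiLSTM baseline for dynamic speech emotion
# recognition: segment features in, a valence trace out.
import torch
import torch.nn as nn

class BiLSTMTracePredictor(nn.Module):
    def __init__(self, feat_dim: int = 40, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden, 1)    # one valence value per step

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim) -> trace: (batch, time)
        h, _ = self.lstm(feats)
        return self.out(h).squeeze(-1)

model = BiLSTMTracePredictor()
segments = torch.randn(8, 100, 40)             # 8 clips, 100 segments each
trace = model(segments)                        # predicted valence over time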
Patent applications provide insight into how inventors imagine and legitimize uses of their imagined technologies; as part of this imagining, they envision social worlds and produce sociotechnical imaginaries. Examining sociotechnical imaginaries is important for emerging technologies in high-stakes contexts, such as the case of emotion AI to address mental health care. We analyzed emotion AI patent applications (N=58) filed in the U.S. concerned with monitoring and detecting emotions and/or mental health. We examined the described technologies' imagined uses and the problems they were positioned to address. We found that inventors justified emotion AI inventions as solutions to issues surrounding data accuracy, care provision and experience, patient-provider communication, emotion regulation, and preventing harms attributed to mental health causes. We then applied an ethical speculation lens to anticipate the potential implications of the promissory emotion AI-enabled futures described in patent applications. We argue that such a future is one marked by the stigmatization of mental health conditions (or 'non-expected' emotions), the equation of mental health with a propensity for crime, and a lack of agency for data subjects. By framing individuals with mental health conditions as unpredictable and incapable of exercising their own agency, emotion AI mental health patent applications propose solutions that intervene in this imagined future: intensive surveillance, an emphasis on individual responsibility over structural barriers, and decontextualized behavioral change interventions. Using ethical speculation, we articulate the consequences of these discourses, raising questions about the role of emotion AI as positive, inherent, or inevitable in health- and care-related contexts. We discuss our findings' implications for patent review processes, and advocate for policy makers, researchers, and technologists to refer to patents and patent applications to access, evaluate, and (re)consider potentially harmful sociotechnical imaginaries before they become our reality.