Title: Psychophysiological Arousal in Young Children Who Stutter: An Interpretable AI Approach
This first-of-its-kind study identifies and visualizes second-by-second differences in the physiological arousal of preschool-age children who stutter (CWS) and children who do not stutter (CWNS) while they speak perceptually fluently in two challenging conditions: speaking in stressful situations and narration. The first condition may affect children's speech through heightened arousal; the second imposes linguistic, cognitive, and communicative demands on the speaker. We collected physiological data from 70 children in the two target conditions. First, we adopt a novel modality-wise multiple-instance-learning (MI-MIL) approach to classify CWS vs. CWNS across conditions. The evaluation of this classifier addresses four critical research questions that align with the interests of state-of-the-art speech science studies. We then leverage SHAP classifier interpretations to visualize the salient, fine-grained, temporal physiological parameters unique to CWS at both the population/group level and the personalized level. Group-level identification of distinct patterns advances our understanding of stuttering etiology and development, while personalized-level identification enables remote, continuous, real-time assessment of stuttering children's physiological arousal, which may support personalized, just-in-time interventions that improve speech fluency. The MI-MIL approach is novel, generalizable to other domains, and executable in real time. Finally, comprehensive evaluations of the presented framework against several baselines on multiple datasets yield notable insights into the physiological arousal of CWS during speech production.
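The paper's MI-MIL architecture is not reproduced here, but the SHAP-based interpretation step it describes can be illustrated with a minimal sketch. The classifier choice, file name, and feature columns below are hypothetical stand-ins, not the authors' pipeline:

```python
# A minimal sketch, not the paper's MI-MIL model: a generic classifier over
# per-second physiological feature windows, interpreted with SHAP.
import pandas as pd
import shap
import xgboost as xgb

# Hypothetical file: one row per 1-second window of physiological features
# (e.g., heart rate, skin conductance), plus an is_cws label.
X = pd.read_csv("physio_windows.csv")
y = X.pop("is_cws")

model = xgb.XGBClassifier().fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Group-level view: which physiological parameters matter across all windows.
shap.summary_plot(shap_values, X)

# Personalized view: attributions for a single child's window.
shap.force_plot(explainer.expected_value, shap_values[0], X.iloc[0])
```

The same two SHAP views map onto the group-level and personalized-level analyses the abstract describes: the summary plot aggregates attributions across children, while the force plot explains one prediction at a time.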
Award ID(s):
2124285
PAR ID:
10606587
Author(s) / Creator(s):
 ;  ;  ;  
Publisher / Repository:
Association for Computing Machinery (ACM)
Date Published:
Journal Name:
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
Volume:
6
Issue:
3
ISSN:
2474-9567
Format(s):
Medium: X
Size(s):
p. 1-32
Sponsoring Org:
National Science Foundation
More Like this
  1. Purpose: Stuttering-like disfluencies (SLDs) and typical disfluencies (TDs) are both more likely to occur as utterance length increases. However, longer and shorter utterances differ by more than the number of morphemes: They may also serve different communicative functions or describe different ideas. Decontextualized language, or language that describes events and concepts outside of the "here and now," is associated with longer utterances. Prior work has shown that language samples taken in decontextualized contexts contain more disfluencies, but averaging across an entire language sample creates a confound between utterance length and decontextualization as contributors to stuttering. We coded individual utterances from naturalistic play samples to test the hypothesis that decontextualized language leads to increased disfluencies above and beyond the effects of utterance length. Method: We used archival transcripts of language samples from 15 preschool children who stutter (CWS) and 15 age- and sex-matched children who do not stutter (CWNS). Utterances were coded as either contextualized or decontextualized, and we used mixed-effects logistic regression to investigate the impact of utterance length and decontextualization on SLDs and TDs. Results: CWS were more likely to stutter when producing decontextualized utterances, even when controlling for utterance length. An interaction between decontextualization and utterance length indicated that the effect of decontextualization was greatest for shorter utterances. TDs increased in decontextualized utterances when controlling for utterance length for both CWS and CWNS. The effect of decontextualization on TDs did not differ statistically between the two groups. Conclusions: The increased working memory demands associated with decontextualized language contribute to increased language planning effort, which leads to increased TDs in both CWS and CWNS. Under a multifactorial dynamic model of stuttering, the increased language demands may also contribute to increased stuttering in CWS due to instabilities in their speech motor systems.
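As a rough illustration of the mixed-effects logistic regression described above, here is a minimal sketch using statsmodels; the column names (child, sld, length_morphemes, decontext) are hypothetical, and the study's exact model specification may differ:

```python
# A minimal sketch of a mixed-effects logistic regression with a random
# intercept per child; column names and file are hypothetical.
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

df = pd.read_csv("utterances.csv")  # one row per utterance

# Fixed effects: utterance length, decontextualization, and their interaction;
# random intercept per child to account for repeated measures.
model = BinomialBayesMixedGLM.from_formula(
    "sld ~ length_morphemes * decontext",
    {"child": "0 + C(child)"},
    df,
)
result = model.fit_vb()  # variational Bayes fit
print(result.summary())
```

The interaction term is what lets the model detect the reported pattern that decontextualization matters most for shorter utterances.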
  2. Clinical diagnosis of stuttering requires an assessment by a licensed speech-language pathologist. However, this process is time-consuming and requires clinicians with training and experience in stuttering and fluency disorders. Unfortunately, only a small percentage of speech-language pathologists report being comfortable working with individuals who stutter, which is inadequate to accommodate the 80 million individuals who stutter worldwide. Developing machine learning models for detecting stuttered speech would enable universal and automated screening for stuttering, allowing speech-language pathologists to identify and follow up with the patients most likely to be diagnosed with a stuttering speech disorder. Previous research in this area has predominantly focused on utterance-level detection, which is not sufficient for clinical settings where word-level annotation of stuttering is the norm. In this study, we curated a stuttered speech dataset with word-level annotations and introduced a word-level stuttering speech detection model leveraging self-supervised speech models. Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering speech detection. Additionally, we conducted an extensive ablation analysis of our method, providing insight into the most important aspects of adapting self-supervised speech models for stuttered speech detection.
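A generic sketch of the approach this abstract describes: frame-level features from a self-supervised encoder, pooled per word and classified. The backbone choice, pooling strategy, and span format are assumptions, not the paper's architecture:

```python
# A generic word-level detector sketch, not the paper's model: a self-supervised
# encoder produces frame features, which are mean-pooled within each word's span.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class WordLevelStutterDetector(nn.Module):
    def __init__(self, backbone="facebook/wav2vec2-base"):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(backbone)
        self.head = nn.Linear(self.encoder.config.hidden_size, 2)  # stuttered / fluent

    def forward(self, input_values, word_spans):
        # input_values: (1, samples); word_spans: [(start_frame, end_frame), ...]
        hidden = self.encoder(input_values).last_hidden_state  # (1, T, H)
        word_vecs = torch.stack([hidden[0, s:e].mean(dim=0) for s, e in word_spans])
        return self.head(word_vecs)  # one logit pair per word
```

Word spans would come from a forced alignment between the audio and the word-level annotations, which is the part that distinguishes this setting from utterance-level detection.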
  3. The lack of authentic stuttered speech data has significantly limited the development of stuttering-friendly automatic speech recognition (ASR) models. In previous work, we collaborated with StammerTalk, a grassroots community of Chinese-speaking people who stutter (PWS), to collect the first stuttered speech dataset in Mandarin Chinese, containing 50 hours of conversational and command-recitation speech from 72 PWS. This work examines both the technical and social dimensions of the dataset. Through quantitative and qualitative analysis, as well as benchmarking and fine-tuning ASR models using the dataset, we demonstrate its technical value in capturing stuttered speech at an unprecedented scale and diversity, enabling better understanding and mitigation of fluency bias in ASR, and its social value in promoting self-advocacy and structural change for PWS in China. By foregrounding the lived experiences of PWS in their own voices, we also see the potential of this dataset to normalize speech disfluencies and cultivate deeper empathy for stuttering within the AI research community.
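A minimal sketch of the benchmarking step mentioned above, assuming an off-the-shelf Whisper model; the file paths and reference transcripts are hypothetical placeholders, not the StammerTalk pipeline:

```python
# A minimal ASR benchmarking sketch on stuttered Mandarin speech.
import jiwer
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# Hypothetical (audio path, reference transcript) pairs.
dataset = [("clip_001.wav", "今天天气很好"), ("clip_002.wav", "请打开车窗")]

refs, hyps = [], []
for wav_path, transcript in dataset:
    hyps.append(asr(wav_path)["text"])
    refs.append(transcript)

# Character error rate is the standard metric for Mandarin ASR.
print("CER:", jiwer.cer(refs, hyps))
```

Comparing CER on stuttered versus fluent speech with the same model is one direct way to quantify the fluency bias the abstract refers to.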
  4. Autistic and neurotypical children do not handle audiovisual speech in the same manner. Current evidence suggests that this difference occurs at the level of cue combination. Here, we test whether differences in autistic and neurotypical audiovisual speech perception can be explained by a neural theory of sensory perception that attributes sensory differences in autism to heightened levels of neural excitation. Through a linking hypothesis that integrates a standard probabilistic cognitive model of cue integration with representations of neural activity, we derive a model that can simulate audiovisual speech perception at a neural population level. Simulations of an audiovisual lexical identification task demonstrate that heightened levels of neural excitation at the level of cue combination cannot account for the observed differences in autistic and neurotypical children's audiovisual speech perception.
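The standard probabilistic cue-integration model referenced above can be sketched in a few lines. The Gaussian tuning curves and the single gain parameter standing in for "neural excitation" are simplifying assumptions, not the authors' full linking hypothesis:

```python
# A simplified sketch of reliability-weighted (Bayesian-optimal) cue
# integration with a gain parameter standing in for neural excitation.
import numpy as np

def fuse_cues(mu_a, var_a, mu_v, var_v):
    """Combine auditory and visual cues, weighting each by its reliability."""
    w_a = (1 / var_a) / (1 / var_a + 1 / var_v)
    mu = w_a * mu_a + (1 - w_a) * mu_v
    var = 1 / (1 / var_a + 1 / var_v)  # fused estimate beats either cue alone
    return mu, var

def population_response(mu, var, gain=1.0, prefs=np.linspace(-5, 5, 101)):
    """Gaussian-tuned population encoding of the fused estimate; raising
    `gain` scales every neuron's response without shifting the peak."""
    return gain * np.exp(-(prefs - mu) ** 2 / (2 * var))

mu, var = fuse_cues(mu_a=0.8, var_a=0.5, mu_v=1.2, var_v=1.0)
print(mu, var, population_response(mu, var, gain=2.0).argmax())
```

Note that in this toy model a higher gain leaves the location of the population peak, and hence the decoded percept, unchanged, which gives some intuition for why uniform excitation alone may fail to explain group differences.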
  5. This paper describes an original dataset of children's speech, collected through the use of JIBO, a social robot. The dataset encompasses recordings from 110 children, aged 4–7 years, who participated in a letter and digit identification task and extended oral discourse tasks requiring explanation skills, totaling 21 hours of session data. Spanning a 2-year collection period, the dataset contains a longitudinal component, with a subset of participants returning for repeat recordings. The dataset, with session recordings and transcriptions, is publicly available, providing researchers with a valuable resource to advance investigations into child language development.