Title: The effect of a social robot mediator on speech characteristics of children with autism spectrum disorder
Robot-mediated interventions have been investigated for the treatment of social skill deficits among children with Autism Spectrum Disorder (ASD). Does the use of a Nao robot as a mediator increase vocal interaction between children with ASD? The present study examined vocalization and turn-taking rates in six children with ASD (mean age = 11.4 years, SD = 0.86 years) interacting with and without a Nao robot for 10 sessions, order counterbalanced. Each session lasted nine minutes. In the Robot condition, the robot provided vocal prompts; in the No Robot condition, children interacted freely. Child vocalization rate and turn-taking rate, defined as the number of utterances and conversational turns per second, respectively, were measured. Three children produced higher vocalization and turn-taking rates when the robot was present, and two when it was absent. One participant produced a higher vocalization rate when the robot was absent but more conversational turns when it was present. The findings suggest that the use of a Nao robot as a social mediator increases vocalization and turn-taking rates among children with ASD, although with large individual variability. The effect of the robot as a mediator on the lexical diversity of child speech will also be investigated.
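As a rough illustration of the rate measures defined above (utterances and conversational turns per second), the computation could be sketched as follows. This is an assumption-laden example, not the study's actual analysis code: the event coding, speaker labels, and function name are invented for illustration.

```python
# Illustrative sketch (not the study's pipeline): vocalization and
# turn-taking rates from a sequence of coded speaker events.
def session_rates(speakers, session_seconds):
    """speakers: ordered list of speaker labels, one per utterance.
    A turn is counted whenever the speaker changes between
    consecutive utterances."""
    child_utterances = sum(1 for s in speakers if s == "child")
    turns = sum(1 for a, b in zip(speakers, speakers[1:]) if a != b)
    return child_utterances / session_seconds, turns / session_seconds

# Example: a nine-minute (540 s) session, matching the study design.
events = ["child", "robot", "child", "child", "robot"]
voc_rate, turn_rate = session_rates(events, 540)
```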
Journal Name:
The Journal of the Acoustical Society of America
Page Range / eLocation ID:
A139 to A139
Sponsoring Org:
National Science Foundation
More Like this
  1. Deep breaths are one of three breathing patterns in rodents characterized by an increased tidal volume. While humans incorporate deep breaths into vocal behavior, it was unknown whether nonhuman mammals use deep breaths for vocal production. We have utilized subglottal pressure recordings in awake, spontaneously behaving male Sprague-Dawley rats in five contexts: sleep, rest, noxious stimulation, exposure to a female in estrus, and exposure to an unknown male. Deep breaths were produced at rates ranging between 17.5 and 90.3 deep breaths per hour. While overall breathing and vocal rates were higher in social and noxious contexts, the rate of deep breaths was only increased during the male’s interaction with a female. Results also inform our understanding of vocal-respiratory integration in rats. The rate of deep breaths that were associated with a vocalization during the exhalation phase increased with vocal activity. The proportion of deep breaths that were associated with a vocalization (on average 22%) was similar to the proportion of sniffing or eupnea breaths that contain a vocalization. Therefore, vocal motor patterns appear to be entrained to the prevailing breathing rhythm, i.e., vocalization uses the available breathing pattern rather than recruiting a specific breathing pattern. Furthermore, the pattern of a deep breath was different when it was associated with a vocalization, suggesting that motor planning occurs. Finally, deep breaths are a source for acoustic variation; for example, call duration and fundamental frequency modulation were both larger in 22-kHz calls produced following a deep inhalation. NEW & NOTEWORTHY The emission of a long, deep, audible breath can express various emotions. The investigation of deep breaths, also known as sighing, in a nonhuman mammal demonstrated the occasional use of deep breaths for vocal production. Similar to the human equivalent, acoustic features of a deep breath vocalization are characteristic. 
  2. Abstract

    Turn-taking interactions are foundational to the development of social, communicative, and cognitive skills. In infants, vocal turn-taking experience is predictive of infants' socioemotional and language development. However, different forms of turn-taking interactions may have different effects on infant vocalizing. It is presently unknown how caregiver vocal, non-vocal, and multimodal responses to infant vocalizations compare in extending caregiver-infant vocal turn-taking bouts. In bouts that begin with an infant vocalization, responses that maintain versus change the communicative modality may differentially affect the likelihood of further infant vocalizing. No studies have examined how caregiver response modalities that either matched or differed from the infant acoustic (vocal) modality might affect the temporal structure of vocal turn-taking beyond the initial serve-and-return exchanges. We video-recorded free-play sessions of 51 caregivers with their 9-month-old infants. Caregivers responded to babbling most often with vocalizations. In turn, caregiver vocal responses were significantly more likely to elicit subsequent infant babbling. Bouts following an initial caregiver vocal response contained significantly more turns than those following a non-vocal or multimodal response. Thus, prelinguistic turn-taking is sensitive to the modality of caregivers' responses. Future research should investigate whether such sensitivity is grounded in attentional constraints, which may influence the structure of turn-taking interactions.

  3. Abstract

    Classroom engagement plays a crucial role in preschoolers' development, yet the correlates of engagement, especially among children with autism spectrum disorder (ASD) and developmental delays (DD), remain unknown. This study examines levels of engagement with classroom social partners and tasks among children in three groups: ASD, DD, and typical development (TD). Here, we asked whether children's vocal interactions (vocalizations to and from peers and teachers) were associated with their classroom engagement with social partners (peers and teachers) and with tasks, and whether the association between classroom engagement and vocal interactions differed between children in the ASD group and their peers in the DD and TD groups. Automated measures of vocalizations and location quantified children's vocal interactions with peers and teachers over the course of the school year. Automated location and vocalization data were used to capture both (1) children's vocal output to specific peers and teachers, and (2) the vocal input they received from those peers and teachers. Participants were 72 3-5-year-olds (Mage = 48.6 months, SD = 7.0, 43% girls) and their teachers. Children in the ASD group displayed lower engagement with peers, teachers, and tasks than children in the TD group; they also showed lower engagement with peers than children in the DD group. Overall, children's own vocalizations were positively associated with engagement with social partners. Thus, although children in the ASD group tend to have lower engagement scores than children in the TD group, active participation in vocal interactions appears to support their classroom engagement with teachers and peers.

  4. Understanding and assessing child verbal communication patterns is critical to facilitating effective language development. Typically, speaker diarization is performed to explore children's verbal engagement. Understanding which activity areas stimulate verbal communication can help promote more efficient language development. In this study, we present a two-stage child vocal engagement prediction system that consists of (1) a near-real-time, noise-robust system that measures the duration of child-to-adult and child-to-child conversations and tracks the number of conversational turn-takings, and (2) a novel child location tracking strategy that determines in which activity areas a child spends most or least of their time. The proposed child–adult turn-taking solution relies exclusively on vocal cues observed during the interaction between a child and other children and/or classroom teachers. By employing threshold-optimized speech activity detection using a linear combination of voicing measures (TO-COMBO-SAD), it is possible to achieve effective speech/non-speech segment detection prior to conversation assessment. TO-COMBO-SAD reduces classification error rates for adult–child audio by 21.34% and 27.3% compared to a baseline i-vector system and a standard Bayesian Information Criterion diarization system, respectively. In addition, this study presents a unique adult–child location tracking system that helps determine the quantity of child–adult communication in specific activity areas, and which activities stimulate voice communication engagement in a child–adult education space. We observe that our proposed location tracking solution offers unique opportunities to assess speech and language interaction for children and to quantify the location context, which could help improve verbal communication.
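The core idea behind the speech activity detection described above, scoring each frame with a linear combination of voicing measures and comparing it to a tuned threshold, can be sketched roughly as below. The specific features, weights, and threshold here are assumptions for illustration, not the TO-COMBO-SAD parameters from the paper.

```python
# Illustrative sketch: speech/non-speech decision from a linear
# combination of per-frame voicing measures. Feature choices and
# weights are assumptions, not the paper's actual configuration.
import numpy as np

def frame_features(frame):
    energy = np.log(np.sum(frame ** 2) + 1e-10)          # short-time log energy
    zcr = np.mean(np.abs(np.diff(np.sign(frame))) / 2)   # zero-crossing rate
    ac = np.correlate(frame, frame, mode="full")
    ac = ac[len(frame):]                                 # positive lags only
    periodicity = ac.max() / (np.sum(frame ** 2) + 1e-10)
    # Voiced speech tends to have high energy, low ZCR, high periodicity,
    # so ZCR enters with a negative sign.
    return np.array([energy, -zcr, periodicity])

def sad_decision(frame, weights, threshold):
    """Linear combination of voicing measures vs. a tuned threshold."""
    return float(weights @ frame_features(frame)) > threshold
```

In a real system the weights and threshold would be optimized on labeled audio; here they are simply supplied by the caller.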
  5. Recognizing the affective state of children with autism spectrum disorder (ASD) in real-world settings poses challenges due to varying head poses, illumination levels, occlusion, and a lack of datasets annotated with emotions in in-the-wild scenarios. Understanding the emotional state of children with ASD is crucial for providing personalized interventions and support. Existing methods often rely on controlled lab environments, limiting their applicability to real-world scenarios. Hence, a framework that enables the recognition of affective states in children with ASD in uncontrolled settings is needed. This paper presents a framework for recognizing the affective state of children with ASD in an in-the-wild setting using heart rate (HR) information. More specifically, an algorithm is developed that can classify a participant's emotion as positive, negative, or neutral by analyzing the heart rate signal acquired from a smartwatch. The heart rate data are obtained in real time using a smartwatch application while the child learns to code a robot and interacts with an avatar. The avatar assists the child in developing communication skills and programming the robot. We also present a semi-automated annotation technique for the heart rate data based on facial expression recognition. The HR signal is analyzed to extract features that capture the emotional state of the child. Additionally, the performance of a raw HR-signal-based emotion classification algorithm is compared with a classification approach based on features extracted from HR signals using the discrete wavelet transform (DWT). The experimental results demonstrate that the proposed method achieves comparable performance to state-of-the-art HR-based emotion recognition techniques, despite being conducted in an uncontrolled setting rather than a controlled lab environment.
The framework presented in this paper contributes to the real-world affect analysis of children with ASD using HR information. By enabling emotion recognition in uncontrolled settings, this approach has the potential to improve the monitoring and understanding of the emotional well-being of children with ASD in their daily lives.
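To make the DWT feature-extraction step concrete, one could decompose an HR window into low- and high-frequency sub-bands and summarize each. The sketch below hand-rolls a single-level Haar DWT so it needs only NumPy; the wavelet choice, decomposition depth, and feature set are assumptions for illustration and may differ from the paper's.

```python
# Illustrative sketch: DWT-based features from a heart-rate window,
# using a single-level Haar transform. Feature choices are assumptions.
import numpy as np

def haar_dwt(x):
    x = np.asarray(x, dtype=float)
    if len(x) % 2:               # Haar pairs samples; drop a trailing one
        x = x[:-1]
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low-pass sub-band
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # high-pass sub-band
    return approx, detail

def hr_features(hr_window):
    """Sub-band energies plus simple statistics for one HR window."""
    approx, detail = haar_dwt(hr_window)
    return np.array([np.sum(approx ** 2),   # low-frequency energy
                     np.sum(detail ** 2),   # high-frequency energy
                     np.mean(hr_window),
                     np.std(hr_window)])
```

Because the Haar transform is orthonormal, the two sub-band energies sum to the window's total energy, which makes them convenient, interpretable inputs to a downstream classifier.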
