Title: A Social Robot System for Modeling Children's Word Pronunciation
Autonomous educational social robots can be used to help promote literacy skills in young children. Such robots, which emulate the emotive, perceptual, and empathic abilities of human teachers, are capable of replicating some of the benefits of one-on-one tutoring from human teachers, in part by leveraging individual students' behavior and task-performance data to infer sophisticated models of their knowledge. These student models are then used to provide personalized educational experiences by, for example, determining the optimal sequencing of curricular material. In this paper, we introduce an integrated system for autonomously analyzing and assessing children's speech and pronunciation in the context of an interactive word game between a social robot and a child. We present a novel game environment and its computational formulation, an integrated pipeline for capturing and analyzing children's speech in real time, and an autonomous robot that models children's word pronunciation via Gaussian Process Regression (GPR), augmented with an Active Learning protocol that informs the robot's behavior. We show that the system is capable of autonomously assessing children's pronunciation ability, with ground truth determined by a post-experiment evaluation by human raters. We also compare phoneme- and word-level GPR models and discuss the trade-offs of each approach in modeling children's pronunciation. Finally, we describe and analyze a pipeline for automatic analysis of children's speech and pronunciation, including an evaluation of Speech Ace as a tool for future development of autonomous, speech-based language tutors.
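To make the modeling approach concrete, here is a minimal sketch of word-level GPR with uncertainty-driven Active Learning, assuming scikit-learn and simple hand-picked word features (word length, syllable count, phoneme rarity); the feature representation, the scores, and the maximum-variance query rule are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical word features: [length, syllable count, phoneme rarity];
# the paper's actual feature representation may differ.
X_seen = np.array([[3, 1, 0.2], [5, 2, 0.5], [8, 3, 0.9]])  # words already attempted
y_seen = np.array([0.9, 0.7, 0.4])                          # observed pronunciation scores in [0, 1]
X_pool = np.array([[4, 2, 0.3], [7, 3, 0.8], [6, 2, 0.6]])  # candidate words not yet presented

# Fit a GPR model of pronunciation score as a function of word features.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_seen, y_seen)

# Active Learning: present the word the model is least certain about,
# so each interaction maximally refines the student model.
mean, std = gpr.predict(X_pool, return_std=True)
next_word = int(np.argmax(std))
print(f"query word {next_word}: predicted score {mean[next_word]:.2f} ± {std[next_word]:.2f}")
```

A phoneme-level variant of this sketch would fit one such model per phoneme and aggregate the per-phoneme uncertainties when selecting the next word, which is the kind of trade-off the paper's phoneme- versus word-level comparison examines.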
Award ID(s):
1734443
NSF-PAR ID:
10072816
Journal Name:
Autonomous Agents and Multi-Agent Systems 2018
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Social robots are becoming increasingly influential in shaping the behavior of humans with whom they interact. Here, we examine how the actions of a social robot can influence human-to-human communication, and not just robot–human communication, using groups of three humans and one robot playing 30 rounds of a collaborative game (n = 51 groups). We find that people in groups with a robot making vulnerable statements converse substantially more with each other, distribute their conversation somewhat more equally, and perceive their groups more positively compared to control groups with a robot that either makes neutral statements or no statements at the end of each round. Shifts in robot speech have the power not only to affect how people interact with robots, but also how people interact with each other, offering the prospect for modifying social interactions via the introduction of artificial agents into hybrid systems of humans and machines.
  2. This paper presents the results of a pilot study that introduces social robots into kindergarten and first-grade classroom tasks. This study aims to understand 1) how effective social robots are in administering educational activities and assessments, and 2) whether these interactions with social robots can serve as a gateway into learning about robotics and STEM for young children. We administered a commonly used assessment (GFTA3) of speech production using a social robot and compared the quality of recorded responses to those obtained with a human assessor. In a comparison across 40 children, we found no significant differences in student responses between the two conditions over the three metrics used: word repetition accuracy, number of times additional help was needed, and similarity of prosody to the assessor. We also found that interactions with the robot successfully stimulated curiosity about robotics, and therefore STEM, in many of the 164 student participants.
  3. Mobile robots must navigate efficiently, reliably, and appropriately around people when acting in shared social environments. For robots to be accepted in such environments, we explore robot navigation for the social contexts of each setting. Navigating through dynamic environments while considering only a collision-free path is a long-solved problem. In human-robot environments, the challenge is no longer about efficiently navigating from one point to another. Autonomously detecting the context and adapting an appropriate social navigation strategy is vital for social robots' long-term applicability in dense human environments. As complex social environments, museums are suitable for studying such behavior because they contain many different navigation contexts in a small space. Our prior Socially-Aware Navigation model considered context classification, object detection, and pre-defined rules to define navigation behavior in more specific contexts, such as a hallway or queue. This work uses environmental context, object information, and more realistic interaction rules for complex social spaces. In the first part of the project, we convert real-world interactions into algorithmic rules for use in a robot's navigation system. Moreover, we use context recognition, object detection, and scene data for context-appropriate rule selection. We introduce our methodology for studying social behaviors in complex contexts, different analyses of our text corpus for museums, and the presentation of extracted social norms. Finally, we demonstrate applying some of the rules in scenarios in the simulation environment.
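As a loose illustration of context-appropriate rule selection, the sketch below maps a recognized scene context and detected objects to a navigation rule set; the contexts, rule names, and override logic are invented for illustration and are not the norms extracted in the study:

```python
# A minimal sketch of context-appropriate rule selection, assuming an upstream
# context classifier and object detector already exist; all rules are illustrative.
NAV_RULES = {
    "hallway": {"keep_right": True, "max_speed": 0.8},
    "queue":   {"join_at_end": True, "max_speed": 0.3},
    "exhibit": {"viewing_distance_m": 1.5, "max_speed": 0.5},
}

def select_rules(scene_context: str, detected_objects: list[str]) -> dict:
    """Pick a navigation rule set from the recognized context, with an
    object-based override (several people standing together may imply a queue)."""
    if scene_context == "hallway" and detected_objects.count("person") >= 3:
        scene_context = "queue"  # illustrative override, not the study's actual logic
    return NAV_RULES.get(scene_context, {"max_speed": 0.4})  # conservative default

print(select_rules("hallway", ["person", "person", "person", "bench"]))
```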
  4. Deployed social robots are increasingly relying on wakeword-based interaction, where interactions are human-initiated by a wakeword like “Hey Jibo”. While wakewords help to increase speech recognition accuracy and ensure privacy, there is concern that wakeword-driven interaction could encourage impolite behavior because wakeword-driven speech is typically phrased as commands. To address these concerns, companies have sought to use wakeword design to encourage interactant politeness, through wakewords like “⟨Name⟩, please”. But while this solution is intended to encourage people to use more “polite words”, researchers have found that these wakeword designs actually decrease interactant politeness in text-based communication, and that other wakeword designs could better encourage politeness by priming users to use Indirect Speech Acts. Yet there has been no previous research directly comparing these wakeword designs in in-person, voice-based human-robot interaction experiments, and previous in-person HRI studies could not effectively study carryover of wakeword-driven politeness and impoliteness into human-human interactions. In this work, we conceptually reproduced these previous studies (n=69) to assess how the wakewords “Hey ⟨Name⟩”, “Excuse me ⟨Name⟩”, and “⟨Name⟩, please” impact robot-directed and human-directed politeness. Our results demonstrate the ways that different types of linguistic priming interact in nuanced ways to induce different types of robot-directed and human-directed politeness.
  5. INTRODUCTION: Apollo-11 (A-11) was the first manned space mission to successfully bring astronauts to the moon and return them safely. Effective team-based communication is required for mission specialists to work collaboratively to learn, engage, and solve complex problems. As part of NASA’s goal of assessing team and mission success, all vital speech communications between these personnel were recorded using the multi-track SoundScriber system onto analog tapes, preserving their contribution to the success of one of the greatest achievements in human history. More than 400 personnel served as mission specialists/support who communicated across 30 audio loops, resulting in more than 9,000 hours of data for A-11. To ensure the success of this mission, it was necessary for teams to communicate, learn, and address problems in a timely manner. Previous research has found that the compatibility of individual personalities within teams is important for effective team collaboration. Hence, it is essential to identify each speaker’s role during an Apollo mission and analyze group communications for knowledge exchange and problem solving toward a common goal. Assessing and analyzing speaker roles during the mission allows for exploring engagement analysis in multi-party speaker situations.

METHOD: The UTDallas Fearless Steps Apollo data comprises 19,000 hours (A-11, A-13, A-1), possessing unique and multiple challenges, as it is characterized by severe noise and degradation as well as overlap instances over the 30 channels. For our study, we selected a subset of 100 hours manually transcribed by professional annotators for speaker labels. The 100 hours are drawn from three mission-critical events: 1. Lift-Off (25 hours), 2. Lunar-Landing (50 hours), 3. Lunar-Walking (25 hours). Of the 30 channels, the five with the most speech activity were selected; the primary speakers operating these five channels are their commanders/owners. For our analysis, we select five speaker roles: Flight Director (FD), Capsule Communicator (CAPCOM), Guidance, Navigation, and Control (GNC), Electrical, Environmental, and Consumables Manager (EECOM), and Network (NTWK). To track and tag individual speakers across our Fearless Steps audio dataset, we use the concept of “Where’s Waldo” to identify all instances of our speakers of interest across a cluster of other speakers. Also, to understand the roles of our speakers of interest, we use primary- versus secondary-speaker duration and speaker turns as our metrics to determine each speaker’s role and to understand their responsibility during the three critical phases of the mission. This enables a content-linking capability and provides a pathway to analyzing group engagement, the group dynamics of people working together in an enclosed space, psychological effects, and cognitive analysis of such individuals.

IMPACT: NASA’s Apollo Program stands as one of the most significant contributions to humankind. This collection opens new research options for recognizing team communication, group dynamics, and human engagement/psychology for future deep space missions. Analyzing team communications toward such goals would allow for the formulation of educational and training technologies for the assessment of STEM knowledge, task learning, and educational feedback. Also, identifying these personnel can help pay tribute and yield personal recognition to the hundreds of notable engineers and scientists who made this feat possible.

ILLUSTRATION: In this work, we propose to illustrate how a pre-trained speech/language network can be used to obtain powerful speaker embeddings for speaker diarization. This framework builds on these learned embeddings to label unique speakers over sustained audio streams. To train and test our system, we make use of the Fearless Steps Apollo corpus, allowing us to effectively leverage a limited labeled-data resource (100 hours of labeled data out of more than 9,000 hours). Furthermore, we use the concept of “Finding Waldo” to identify key speakers of interest (SOI) throughout the Apollo-11 mission audio across multiple channel audio streams.
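As a rough illustration of this embedding-plus-clustering idea, here is a minimal sketch in Python, assuming scikit-learn and that per-segment embeddings come from some pre-trained speaker-embedding network; the 192-dimensional embedding size, the enrollment vector, the cluster count, and the similarity threshold are all illustrative assumptions, not details from the paper:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for the outputs of a pre-trained speaker-embedding network;
# in practice each row would be the embedding of one speech segment.
rng = np.random.default_rng(0)
segment_embeddings = rng.normal(size=(20, 192))  # 20 segments, 192-dim embeddings
soi_enrollment = rng.normal(size=192)            # embedding of an enrolled speaker of interest

# 1) Diarization: cluster segment embeddings into speaker groups.
labels = AgglomerativeClustering(n_clusters=5).fit_predict(segment_embeddings)

# 2) "Finding Waldo": flag segments closest to the enrolled speaker of interest.
scores = [cosine(e, soi_enrollment) for e in segment_embeddings]
soi_segments = [i for i, s in enumerate(scores) if s > 0.5]  # illustrative threshold
print("cluster labels:", labels)
print("SOI segments:", soi_segments)
```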