Title: An Exploratory Analysis of Air Traffic Controller Speech Intelligibility Using Voice Data from a Simulation Experiment
Air Traffic Controllers (ATCs) communicate with pilots over radio, and speech intelligibility is vital to ensuring that each message is conveyed accurately. Factors such as speech rate affect intelligibility, and workload and stress have been shown to significantly affect how people communicate. In this paper, we analyze the voice data of ATCs who participated in a simulation experiment, focusing on non-verbal aspects of communication, particularly transmission length and speech rate. To better understand these characteristics, we analyzed our data at two levels: aggregate and individual. Moreover, we focused on a single participant to see how such non-verbal characteristics evolve over time. Understanding these intricacies would contribute to building real-time automated detectors for voice transmissions, leveraging technology to avert incidents brought about by stress and workload.
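The two non-verbal measures the abstract names can be computed directly from time-aligned transcripts. The sketch below is illustrative only (the paper does not publish its code); the tuple layout `(start_s, end_s, transcript)` and the words-per-second definition of speech rate are assumptions.

```python
# Illustrative sketch: transmission length and speech rate from
# time-aligned transcripts. Data layout is a hypothetical assumption,
# not the authors' actual pipeline.

from statistics import mean

def transmission_metrics(transmissions):
    """For each (start_s, end_s, transcript) tuple, return a
    (length_seconds, speech_rate_words_per_second) pair."""
    metrics = []
    for start, end, text in transmissions:
        length = end - start
        words = len(text.split())
        rate = words / length if length > 0 else 0.0
        metrics.append((length, rate))
    return metrics

def aggregate(metrics):
    """Mean transmission length and mean speech rate across all
    transmissions (the paper's 'aggregate' level of analysis)."""
    return mean(m[0] for m in metrics), mean(m[1] for m in metrics)
```

Grouping the tuples by controller before calling `aggregate` would give the per-individual view described in the abstract.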
Award ID(s):
1828010
PAR ID:
10432732
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the Human Factors and Ergonomics Society Annual Meeting
Volume:
66
Issue:
1
ISSN:
2169-5067
Page Range / eLocation ID:
1038 to 1041
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. Various technologies mediate synchronous audio-visual one-on-one communication (SAVOC) between Deaf and Hard-of-Hearing (DHH) and hearing colleagues, including automatic-captioning smartphone apps for in-person settings, or text-chat features of videoconferencing software in remote settings. Speech and non-verbal behaviors of hearing speakers, e.g. speaking too quietly, can make SAVOC difficult for DHH users, but prior work had not examined technology-mediated contexts. In an in-person study (N=20) with an automatic captioning smartphone app, variations in a hearing actor's enunciation and intonation dynamics affected DHH users' satisfaction. In a remote study (N=23) using a videoconferencing platform with text chat, variations in speech rate, voice intensity, enunciation, intonation dynamics, and eye contact produced similar differences in satisfaction. This work contributes empirical evidence that specific behaviors of hearing speakers affect the accessibility of technology-mediated SAVOC for DHH users, providing motivation for future work on detecting or encouraging useful communication behaviors among hearing individuals.
  2. Understanding and assessing child verbal communication patterns is critical in facilitating effective language development. Typically, speaker diarization is performed to explore children's verbal engagement. Understanding which activity areas stimulate verbal communication can help promote more efficient language development. In this study, we present a two-stage child vocal engagement prediction system that consists of (1) a near real-time, noise-robust system that measures the duration of child-to-adult and child-to-child conversations and tracks the number of conversational turn-takings, and (2) a novel child location tracking strategy that determines in which activity areas a child spends most/least of their time. The proposed child–adult turn-taking solution relies exclusively on vocal cues observed during the interaction between a child and other children and/or classroom teachers. By employing threshold-optimized speech activity detection using a linear combination of voicing measures, it is possible to achieve effective speech/non-speech segment detection prior to conversation assessment. This TO-COMBO-SAD reduces classification error rates for adult-child audio by 21.34% and 27.3% compared to baseline i-Vector and standard Bayesian Information Criterion diarization systems, respectively. In addition, this study presents a unique adult–child location tracking system that helps determine the quantity of child–adult communication in specific activity areas, and which activities stimulate voice communication engagement in a child–adult education space. We observe that our proposed location tracking solution offers unique opportunities to assess speech and language interaction for children and to quantify the location context, which would contribute to improving verbal communication.
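The core idea of the threshold-optimized combination SAD described above — combine several per-frame voicing measures linearly, then threshold the combined score — can be sketched as follows. This is a minimal illustration, not the authors' TO-COMBO-SAD implementation: the feature names, z-normalization step, and weight/threshold values are assumptions.

```python
# Minimal sketch of a combo-style speech activity detector:
# z-normalize per-frame voicing measures, take a weighted linear
# combination, and compare against an optimized threshold.
# Weights and threshold would be tuned on held-out labeled audio.

import numpy as np

def combo_sad(features, weights, threshold):
    """features: (n_frames, n_measures) array of voicing measures
    (e.g., harmonicity, periodicity — illustrative names only).
    Returns a boolean speech/non-speech decision per frame."""
    z = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)
    combo = z @ weights       # linear combination of voicing measures
    return combo > threshold  # frames above threshold are "speech"
```

In the full system, diarization (child vs. adult, turn-taking counts) would then run only on the frames flagged as speech.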
  3. The human-robot interaction (HRI) field has recognized the importance of enabling robots to interact with teams. Human teams rely on effective communication for successful collaboration in time-sensitive environments. Robots can play a role in enhancing team coordination through real-time assistance. Despite significant progress in human-robot teaming research, there remains an essential gap in how robots can effectively communicate with action teams using multimodal interaction cues in time-sensitive environments. This study addresses this knowledge gap in an experimental in-lab study to investigate how multimodal robot communication in action teams affects workload and human perception of robots. We explore team collaboration in a medical training scenario where a robotic crash cart (RCC) provides verbal and non-verbal cues to help users remember to perform iterative tasks and search for supplies. Our findings show that verbal cues for object search tasks and visual cues for task reminders reduce team workload and increase perceived ease of use and perceived usefulness more effectively than a robot with no feedback. Our work contributes to multimodal interaction research in the HRI field, highlighting the need for more human-robot teaming research to understand best practices for integrating collaborative robots in time-sensitive environments such as hospitals, search and rescue, and manufacturing applications.
  4. Prior work has shown that embodiment can benefit virtual agents, such as increasing rapport and conveying non-verbal information. However, it is unclear if users prefer an embodied to a speech-only agent for augmented reality (AR) headsets that are designed to assist users in completing real-world tasks. We conducted a study to examine users' perceptions and behaviors when interacting with virtual agents in AR. We asked 24 adults to wear the Microsoft HoloLens and find objects in a hidden object game while interacting with an agent that would offer assistance. We presented participants with four different agents: voice-only, non-human, full-size embodied, and a miniature embodied agent. Overall, users preferred the miniature embodied agent due to the novelty of its size and its reduced uncanniness compared with the larger agent. From our results, we draw conclusions about how agent representation matters and derive guidelines on designing agents for AR headsets.
  5.
    This paper analyzes the musical surrogate encoding of Seenku (Mande, Burkina Faso) syllable structure on the balafon, a resonator xylophone used by the Sambla ethnicity. The elements of syllable structure that are encoded include vowel length, sesquisyllabicity, diphthongs, and nasal codas. Certain elements, like vowel length and sesquisyllabicity, involve categorical encoding through conscious rules of surrogate speech, while others, like diphthongs and nasal codas, vary between being treated as simple or complex. Beyond these categorical encodings, subtler aspects of rhythmic structure find their way into the speech surrogate through durational differences; these include duration differences from phonemic distinctions like vowel length in addition to subphonemic differences due to phrasal position. I argue that these subconscious durational differences arise from a “phonetic filter”, which mediates between the musician’s inner voice and their non-verbal behavior. Specifically, syllables encoded on the balafon may be timed according to the perceptual center (p-center) of natural spoken rhythm, pointing to a degree of phonetic detail in a musician’s inner speech. 