skip to main content


Title: FEARLESS STEPS: ADVANCEMENTS IN SPEECH TECHNOLOGY AND CORPUS DEVELOPMENT FOR NATURALISTIC AUDIO
INTRODUCTION: CRSS-UTDallas initiated and oversaw the efforts to recover APOLLO mission communications by re-engineering the NASA SoundScriber playback system, and digitizing 30-channel analog audio tapes – with the entire Apollo-11, Apollo-13, and Gemini-8 missions during 2011-17 [1,6]. This vast data resource was made publicly available along with supplemental speech & language technologies meta-data based on CRSS pipeline diarization transcripts and conversational speaker time-stamps for Apollo team at NASA Mission Control Center, [2,4]. In 2021, renewed efforts over the past year have resulted in the digitization of an additional +50,000hrs of audio from Apollo 7,8,9,10,12 missions, and remaining A-13 tapes. Cumulative digitization efforts have enabled the development of the largest publicly available speech data resource with unprompted, real conversations recorded in naturalistic environments. Deployment of this massive corpus has inspired multiple collaborative initiatives such as Web resources ExploreApollo (https://app.exploreapollo.org) LanguageARC (https://languagearc.com/projects/21) [3]. ExploreApollo.org serves as the visualization and play-back tool, and LanguageARC the crowd source subject content tagging resource developed by UG/Grad. Students, intended as an educational resource for k-12 students, and STEM/Apollo enthusiasts. Significant algorithmic advancements have included advanced deep learning models that are now able to improve automatic transcript generation quality, and even extract high level knowledge such as ID labels of topics being spoken across different mission stages. Efficient transcript generation and topic extraction tools for this naturalistic audio have wide applications including content archival and retrieval, speaker indexing, education, group dynamics and team cohesion analysis. Some of these applications have been deployed in our online portals to provide a more immersive experience for students and researchers. Continued worldwide outreach in the form of the Fearless Steps Challenges has proven successful with the most recent Phase-4 of the Challenge series. This challenge has motivated research in low level tasks such as speaker diarization and high level tasks like topic identification. IMPACT: Distribution and visualization of the Apollo audio corpus through the above mentioned online portals and Fearless Steps Challenges have produced significant impact as a STEM education resource for K-12 students as well as a SLT development resource with real-world applications for research organizations globally. The speech technologies developed by CRSS-UTDallas using the Fearless Steps Apollo corpus have improved previous benchmarks on multiple tasks [1, 5]. The continued initiative will extend the current digitization efforts to include over 150,000 hours of audio recorded during all Apollo missions. ILLUSTRATION: We will demonstrate WebExploreApollo and LanguageARC online portals with newly digitized audio playback in addition to improved SLT baseline systems, the results from ASR and Topic Identification systems which will include research performed on the corpus conversational. Performance analysis visualizations will also be illustrated. We will also display results from the past challenges and their state-of-the-art system improvements.  more » « less
Award ID(s):
2016725
NSF-PAR ID:
10484504
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
NASA Human Research Program
Date Published:
Journal Name:
NASA Human Research Program Investigators Conference
Page Range / eLocation ID:
paper #WV9G7L7DV6
Format(s):
Medium: X
Location:
Galveston, TX
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent developments in deep learning strategies have revolutionized Speech and Language Technologies(SLT). Deep learning models often rely on massive naturalistic datasets to produce the necessary complexity required for generating superior performance. However, most massive SLT datasets are not publicly available, limiting the potential for academic research. Through this work, we showcase the CRSS-UTDallas led efforts to recover, digitize, and openly distribute over 50,000 hrs of speech data recorded during the 12 NASA Apollo manned missions, and outline our continuing efforts to digitize and create meta-data through diarization of the remaining 100,000hrs. We present novel deep learning-based speech processing solutions developed to extract high-level information from this massive dataset. Fearless-Steps APOLLO resource is a 50,000 hrs audio collection from 30-track analog tapes originally used to document Apollo missions 1,7,8,10,11,&13. A customized tape read-head developed to digitize all 30 channels simultaneously has been deployed to expedite digitization of remaining mission tapes. Diarized transcripts for these unlabeled audio communications have also been generated to facilitate open research from speech sciences, historical archives, education, and speech technology communities. Robust technologies developed to generate human-readable transcripts include: (i) speaker diarization, (ii) speaker tracking, and (iii) text output from speech recognition systems. 
    more » « less
  2. Fearless Steps (FS) APOLLO is a + 50,000 hr audio resource established by CRSS-UTDallas capturing all communications between NASA-MCC personnel, backroom staff, and Astronauts across manned Apollo Missions. Such a massive audio resource without metadata/unlabeled corpus provides limited benefit for communities outside Speech-and-Language Technology (SLT). Supplementing this audio with rich metadata developed using robust automated mechanisms to transcribe and highlight naturalistic communications can facilitate open research opportunities for SLT, speech sciences, education, and historical archival communities. In this study, we focus on customizing keyword spotting (KWS) and topic detection systems as an initial step towards conversational understanding. Extensive research in automatic speech recognition (ASR), speech activity, and speaker diarization using manually transcribed 125 h FS Challenge corpus has demonstrated the need for robust domain-specific model development. A major challenge in training KWS systems and topic detection models is the availability of word-level annotations. Forced alignment schemes evaluated using state-of-the-art ASR show significant degradation in segmentation performance. This study explores challenges in extracting accurate keyword segments using existing sentence-level transcriptions and proposes domain-specific KWS-based solutions to detect conversational topics in audio streams. 
    more » « less
  3. INTRODUCTION: Apollo-11 (A-11) was the first manned space mission to successfully bring astronauts to the moon and return them safely. Effective team based communications is required for mission specialists to work collaboratively to learn, engage, and solve complex problems. As part of NASA’s goal in assessing team and mission success, all vital speech communications between these personnel were recorded using the multi-track SoundScriber system onto analog tapes, preserving their contribution in the success of one of the greatest achievements in human history. More than +400 personnel served as mission specialists/support who communicated across 30 audio loops, resulting in +9k hours of data for A-11. To ensure success of this mission, it was necessary for teams to communicate, learn, and address problems in a timely manner. Previous research has found that compatibility of individual personalities within teams is important for effective team collaboration of those individuals. Hence, it is essential to identify each speaker’s role during an Apollo mission and analyze group communications for knowledge exchange and problem solving to achieve a common goal. Assessing and analyzing speaker roles during the mission can allow for exploring engagement analysis for multi-party speaker situations. METHOD: The UTDallas Fearless steps Apollo data is comprised of 19,000 hours (A-11,A-13,A-1) possessing unique and multiple challenges as it is characterized by severe noise and degradation as well as overlap instances over the 30 channels. For our study, we have selected a subset of 100 hours manually transcribed by professional annotators for speaker labels. The 100 hours are obtained from three mission critical events: 1. Lift-Off (25 hours) 2. Lunar-Landing (50 hours) 3. Lunar-Walking (25 hours). Five channels of interest, out of 30 channels were selected with the most speech activity, the primary speakers operating these five channels are command/owners of these channels. For our analysis, we select five speaker roles: Flight Director (FD), Capsule Communicator (CAPCOM), Guidance, Navigation and, Control (GNC), Electrical, environmental, and consumables manager (EECOM), and Network (NTWK). To track and tag individual speakers across our Fearless Steps audio dataset, we use the concept of ‘where’s Waldo’ to identify all instances of our speakers-of-interest across a cluster of other speakers. Also, to understand speaker roles of our speaker-of-interests, we use speaker duration of primary speaker vs secondary speaker and speaker turns as our metrics to determine the role of the speaker and to understand their responsibility during the three critical phases of the mission. This enables a content linking capability as well as provide a pathway to analyzing group engagement, group dynamics of people working together in an enclosed space, psychological effects, and cognitive analysis in such individuals. IMPACT: NASA’s Apollo Program stands as one of the most significant contributions to humankind. This collection opens new research options for recognizing team communication, group dynamics, and human engagement/psychology for future deep space missions. Analyzing team communications to achieve such goals would allow for the formulation of educational and training technologies for assessment of STEM knowledge, task learning, and educational feedback. Also, identifying these personnel can help pay tribute and yield personal recognition to the hundreds of notable engineers and scientist who made this feat possible. ILLUSTRATION: In this work, we propose to illustrate how a pre-trained speech/language network can be used to obtain powerful speaker embeddings needed for speaker diarization. This framework is used to build these learned embeddings to label unique speakers over sustained audio streams. To train and test our system, we will make use of Fearless Steps Apollo corpus, allowing us to effectively leverage a limited label information resource (100 hours of labeled data out of +9000 hours). Furthermore, we use the concept of 'Finding Waldo' to identify key speakers of interest (SOI) throughout the Apollo-11 mission audio across multiple channel audio streams. 
    more » « less
  4. Speech activity detection (SAD) serves as a crucial front-end system to several downstream Speech and Language Technology (SLT) tasks such as speaker diarization, speaker identification, and speech recognition. Recent years have seen deep learning (DL)-based SAD systems designed to improve robustness against static background noise and interfering speakers. However, SAD performance can be severely limited for conversations recorded in naturalistic environments due to dynamic acoustic scenarios and previously unseen non-speech artifacts. In this letter, we propose an end-to-end deep learning framework designed to be robust to time-varying noise profiles observed in naturalistic audio. We develop a novel SAD solution for the UTDallas Fearless Steps Apollo corpus based on NASA’s Apollo missions. The proposed system leverages spectro-temporal correlations with a threshold optimization mechanism to adjust to acoustic variabilities across multiple channels and missions. This system is trained and evaluated on the Fearless Steps Challenge (FSC) corpus (a subset of the Apollo corpus). Experimental results indicate a high degree of adaptability to out-of-domain data, achieving a relative Detection Cost Function (DCF) performance improvement of over 50% compared to the previous FSC baselines and state-of-the-art (SOTA) SAD systems. The proposed model also outperforms the most recent DL-based SOTA systems from FSC Phase-4. Ablation analysis is conducted to confirm the efficacy of the proposed spectro-temporal features. 
    more » « less
  5. Naturalistic team based speech communications requires specific protocols/procedures to be followed to allow for effective task completion for distributed team members. NASA Apollo-11 was the first manned space mission to successfully bring astronauts to the moon and return them safely. Mission specialists roles within NASA Mission Control (MOCR) are complex and reflected in their communications. In this study, we perform speaker clustering to identify speech segments uttered by the same speaker from recently recovered Fearless Steps APOLLO corpus (CRSS-UTDallas). We propose a pretrained network to obtain speaker embeddings and use a framework that builds on these learned embeddings which achieves a clustering accuracy of 73.4%. We also track/tag key speakers-of-interest across three critical mission phases and analyze speaker roles based on speech duration. NASA communication protocols dictate that information be communicated in a concise manner. In automated communication analysis, individuals higher in trait dominance generally speak more and gain more control over group processes. Hence, speaker duration of primary- versus -secondary speaker and speaker turns are metrics used to determine speaker role. This analysis provides greater understanding of communications protocol and serves as a lasting tribute to the «Heroes Behind the Heroes of Apollo» as well as preserve “words spoken in space.” 
    more » « less