NSF PAR Search | NSF Public Access Repository

Apollo’s Unheard Voices: Graph Attention Networks for Speaker Diarization and Clustering for Fearless Steps Apollo Collection

https://doi.org/10.1109/ICASSP48485.2024.10446231

Shekar, Meena_M C; Hansen, John_H L (April 2024, IEEE)

Speaker diarization has traditionally been explored using datasets that are either clean, feature a limited number of speakers, or have a large volume of data but lack the complexities of real-world scenarios. This study takes a unique approach by focusing on the Fearless Steps APOLLO audio resource, a challenging data that contains over 70,000 hours of audio data (A-11: 10k hrs), the majority of which remains unlabeled. This corpus presents considerable challenges such as diverse acoustic conditions, high levels of background noise, overlapping speech, data imbalance, and a variable number of speakers with varying utterance duration. To address these challenges, we propose a robust speaker diarization framework built on dynamic Graph Attention Network optimized using data augmentation. Our proposed framework attains a Diarization Error Rate (DER) of 19.6% when evaluated using ground truth speech segments. Notably, our work is the first to recognize, track, and perform conversational analysis on the entire Apollo-11 mission for speakers who were unidentified until now. This work stands as a significant contribution to both historical archiving and the development of robust diarization systems, particularly relevant for challenging real-world scenarios.

Full Text Available

The Fearless Steps Apollo (FS-APOLLO) resource is a collection of 150,000 hours of audio, associated meta-data, and supplemental speech technology infrastructure intended to benefit the (i) speech processing technology, (ii) communication science, team-based psychology, and (iii) education/STEM, history/preservation/archival communities. The FS-APOLLO initiative which started in 2014 has since resulted in the preservation of over 75,000 hours of NASA Apollo Missions audio. Systems created for this audio collection have led to the emergence of several new Speech and Language Technologies (SLT). This paper seeks to provide an overview of the latest advancements in the FS-Apollo effort and explore upcoming strategies in big-data deployment, outreach, and novel avenues of K-12 and STEM education facilitated through this resource.

Search for: All records