Introducing Semantics into Speech Encoders

Xu, Derek; Dong, Shuyan; Wang, Changhan; Kim, Suyoun; Lin, Zhaojiang; Liu, Bing; Shrivastava, Akshat; Li, Shang-Wen; Tseng, Liang-Hsuan; Lin, Guan-Ting; Baevski, Alexei; Lee, Hung-yi; Sun, Yizhou; Wang, Wei

doi:10.18653/v1/2023.acl-long.639

Recent studies find that existing self-supervised speech encoders contain primarily acoustic rather than semantic information. As a result, pipelined systems that feed supervised automatic speech recognition (ASR) output into a large language model (LLM) achieve state-of-the-art results on semantic spoken language tasks by utilizing rich semantic representations from the LLM. These systems come at the cost of labeled audio transcriptions, which are expensive and time-consuming to obtain. We propose a task-agnostic, unsupervised way of incorporating semantic information from LLMs into self-supervised speech encoders without labeled audio transcriptions. By introducing semantics, we improve the spoken language understanding (SLU) performance of existing speech encoders by over 5% on intent classification (IC), with modest gains in named entity resolution (NER) and slot filling (SF), and improve spoken question answering (SQA) F1 score by over 2%. Our approach, which uses no ASR data, achieves performance similar to that of methods trained on over 100 hours of labeled audio transcripts, demonstrating the feasibility of unsupervised semantic augmentations to existing speech encoders.

Award ID(s): 2211557, 1937599

PAR ID: 10464432

Date Published: 2023-07-01

Journal Name: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics

Volume: 1

Page Range / eLocation ID: 11413 to 11429

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.18653/v1/2023.acl-long.639
