ConvKT: Conversation-Level Knowledge Transfer for Context Aware End-to-End Spoken Language Understanding

Sunder, Vishal; Fosler-Lussier, Eric; Thomas, Samuel; Kuo, Hong-Kwang J; Kingsbury, Brian

doi:10.21437/Interspeech.2023-2018

Dialog history enhances downstream classification performance in both speech- and text-based dialog systems. However, there is still a gap in how well dialog history is integrated in a fully end-to-end (E2E) spoken dialog system (SDS) compared with a textual dialog system. Text-based dialog systems use large language models (LLMs) to encode long-range dependencies by attending to the entire conversation as a contiguous token sequence. This is not possible in an E2E SDS, as speech sequences can be intractably long. We propose a convolution subsampling approach to make the speech sequence of a conversation tractable and use a conformer to attend to the speech-based conversation in a fine-grained manner. This model is further enhanced via conversation-level knowledge transfer from an LLM using a token-level alignment strategy. Fine-tuning the E2E model pretrained this way gives significant gains of up to 8% over strong non-contextual baselines in the E2E dialog act classification task on two datasets.
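
As a rough illustration of why subsampling matters for attending over a whole spoken conversation, the sketch below counts how many frames remain after a stack of strided convolutions and how many query-key pairs full self-attention would then have to score. The kernel size, stride, layer counts, frame rate, and conversation length are illustrative assumptions, not values taken from the paper, and the helper names are hypothetical.

```python
def conv_output_length(length, kernel_size=3, stride=2, padding=0):
    """Frames left after one 1-D convolution (standard floor formula)."""
    if length + 2 * padding < kernel_size:
        return 0  # input too short for even one kernel window
    return (length + 2 * padding - kernel_size) // stride + 1


def subsampled_length(length, num_layers, kernel_size=3, stride=2, padding=0):
    """Frames left after a stack of identical strided convolutions."""
    for _ in range(num_layers):
        length = conv_output_length(length, kernel_size, stride, padding)
    return length


def attention_pairs(length):
    """Query-key pairs scored by full self-attention over `length` frames."""
    return length * length


# Assumed conversation: 10 turns of 5 seconds each at 100 frames per second.
frames = 10 * 5 * 100

for layers in (0, 2, 4):  # assumed subsampling depths
    remaining = subsampled_length(frames, layers)
    print(f"layers={layers} frames={remaining} pairs={attention_pairs(remaining)}")
```

Because self-attention cost grows with the square of the sequence length, each extra stride-2 layer in this toy setting cuts the number of scored pairs by roughly a factor of four.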

Award ID(s): 2008043

PAR ID: 10560472

Author(s) / Creator(s): Sunder, Vishal; Fosler-Lussier, Eric; Thomas, Samuel; Kuo, Hong-Kwang J; Kingsbury, Brian

Publisher / Repository: ISCA

Date Published: 2023-08-20

ISSN: 2958-1796

ISBN: 9781713888802

Page Range / eLocation ID: 1129 to 1133

Format(s): Medium: X

Location: Dublin, Ireland

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.21437/Interspeech.2023-2018
