Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR

Chen, Junkun; Ma, Mingbo; Zheng, Renjie; Huang, Liang

doi:10.18653/v1/2021.findings-acl.406

Citation Details

Simultaneous speech-to-text translation is useful in many scenarios. The conventional cascaded approach uses a pipeline of streaming ASR followed by simultaneous MT, but suffers from error propagation and extra latency. To alleviate these issues, recent efforts attempt to translate the source speech directly into target text simultaneously, but this is much harder because it combines two separate tasks. We instead propose a new paradigm with the advantages of both cascaded and end-to-end approaches. The key idea is to use two separate, but synchronized, decoders on streaming ASR and direct speech-to-text translation (ST), respectively; the intermediate results of ASR guide the decoding policy of ST but are not fed to it as input. During training, we use multitask learning to jointly learn these two tasks with a shared encoder. En-to-De and En-to-Es experiments on the MuST-C dataset demonstrate that our proposed technique achieves substantially better translation quality at similar levels of latency.
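
The abstract does not spell out the decoding policy, so the following is only a minimal sketch of one way ASR output could guide ST decoding without being fed to it: a wait-k-style rule, assumed here for illustration, that lets the ST decoder write a token whenever the streaming ASR transcript is at least k words ahead of the translation. The function name `asr_guided_wait_k`, the per-chunk word counts, and the fixed list of target tokens are all hypothetical stand-ins for real ASR and ST decoders.

```python
from typing import List, Tuple


def asr_guided_wait_k(
    asr_word_counts: List[int],
    st_tokens: List[str],
    k: int,
) -> List[Tuple[int, str]]:
    """Simulate a wait-k-style policy driven by streaming ASR word counts.

    asr_word_counts[t] is the number of source words the (hypothetical)
    streaming ASR decoder has recognized after speech chunk t.
    st_tokens stands in for the tokens a real ST decoder would produce.
    Returns (chunk_index, token) pairs showing when each token is written.
    Assumes asr_word_counts is non-empty.
    """
    schedule: List[Tuple[int, str]] = []
    emitted = 0
    for t, words in enumerate(asr_word_counts):
        # WRITE while the ASR transcript is at least k words ahead of the
        # translation; otherwise READ the next speech chunk.
        # Only the word count is used: the ASR text itself is never
        # passed to the ST decoder.
        while emitted < len(st_tokens) and words - emitted >= k:
            schedule.append((t, st_tokens[emitted]))
            emitted += 1
    # The source speech has ended: flush any remaining target tokens.
    last_chunk = len(asr_word_counts) - 1
    for token in st_tokens[emitted:]:
        schedule.append((last_chunk, token))
    return schedule


if __name__ == "__main__":
    counts = [0, 1, 1, 2, 3]  # made-up ASR word counts per speech chunk
    print(asr_guided_wait_k(counts, ["a", "b", "c"], k=2))
```

In this toy setting a larger k delays the first written token, which mirrors the quality/latency trade-off the abstract refers to.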

Award ID(s): 2009071; 1817231

NSF-PAR ID: 10398230

Author(s) / Creator(s): Chen, Junkun; Ma, Mingbo; Zheng, Renjie; Huang, Liang

Date Published: 2021-01-01

Journal Name: Proceedings of ACL 2021: Findings

Page Range / eLocation ID: 4618 to 4624

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.18653/v1/2021.findings-acl.406
