LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions

Agostinelli, Victor; Hong, Sanghyun; Chen, Lizhong

A promising approach to preserving model performance in linearized transformers is to employ position-based re-weighting functions. However, state-of-the-art re-weighting functions rely heavily on target sequence lengths, making it difficult or impossible to apply them to autoregressive and simultaneous tasks, where the target sequence length, and sometimes even the input sequence length, is unknown. To address this issue, we propose Learned Proportions (LeaP) and LeaPformers. Our contribution is built on two major components. First, we generalize the dependence on explicit positional representations and sequence lengths into dependence on sequence proportions for re-weighting. Second, we replace static positional representations with dynamic proportions derived via a compact module, enabling more flexible attention concentration patterns. We evaluate LeaPformer against eight representative efficient transformers on the Long-Range Arena benchmark, where LeaPformer achieves the best quality-throughput trade-off. We also apply LeaPformer to Wikitext-103b autoregressive language modeling and to simultaneous speech-to-text translation for two language pairs, achieving competitive results in both tasks.
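The two components described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a cosFormer-style cosine re-weighting, which the abstract does not name, with the position ratio replaced by a proportion in [0, 1]. The function names, the ReLU feature map, and the sigmoid-of-linear "compact module" in `learned_proportions` are all hypothetical stand-ins chosen for illustration. The sketch shows that the re-weighted attention can be computed from running key-value summaries instead of an n-by-n matrix, and that this holds whether the proportions are static or produced per token.

```python
import math

EPS = 1e-9  # guards against a zero denominator


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def relu_features(x):
    """Non-negative feature map (ReLU); an assumption, common in linear attention."""
    return [max(0.0, xi) for xi in x]


def static_proportions(n):
    """Static proportions: position divided by a known sequence length n."""
    return [(i + 1) / n for i in range(n)]


def learned_proportions(xs, w, b):
    """Hypothetical compact module: sigmoid of a linear map, one value in (0, 1) per token."""
    return [1.0 / (1.0 + math.exp(-(dot(w, x) + b))) for x in xs]


def quadratic_attention(q, k, v, pq, pk):
    """Reference O(n^2) form: weight_ij = (q_i . k_j) * cos(pi/2 * (pq_i - pk_j))."""
    out = []
    for qi, ai in zip(q, pq):
        num, den = [0.0] * len(v[0]), 0.0
        for kj, vj, bj in zip(k, v, pk):
            wij = dot(qi, kj) * math.cos(0.5 * math.pi * (ai - bj))
            den += wij
            num = [n + wij * x for n, x in zip(num, vj)]
        out.append([n / (den + EPS) for n in num])
    return out


def linear_attention(q, k, v, pq, pk):
    """Same result in O(n) via cos(a - b) = cos(a)cos(b) + sin(a)sin(b)."""
    d, dv = len(k[0]), len(v[0])
    s_cos = [[0.0] * dv for _ in range(d)]  # sum_j cos(b_j) k_j v_j^T
    s_sin = [[0.0] * dv for _ in range(d)]  # sum_j sin(b_j) k_j v_j^T
    z_cos, z_sin = [0.0] * d, [0.0] * d     # normalizer summaries
    for kj, vj, bj in zip(k, v, pk):
        c, s = math.cos(0.5 * math.pi * bj), math.sin(0.5 * math.pi * bj)
        for r in range(d):
            z_cos[r] += c * kj[r]
            z_sin[r] += s * kj[r]
            for t in range(dv):
                s_cos[r][t] += c * kj[r] * vj[t]
                s_sin[r][t] += s * kj[r] * vj[t]
    out = []
    for qi, ai in zip(q, pq):
        c, s = math.cos(0.5 * math.pi * ai), math.sin(0.5 * math.pi * ai)
        den = c * dot(qi, z_cos) + s * dot(qi, z_sin)
        num = [sum(qi[r] * (c * s_cos[r][t] + s * s_sin[r][t]) for r in range(d))
               for t in range(dv)]
        out.append([n / (den + EPS) for n in num])
    return out
```

Passing `static_proportions(n)` requires the sequence length up front, which is the limitation the abstract describes. Passing the output of `learned_proportions` removes that requirement in this sketch, because each proportion depends only on its own token. A causal or simultaneous variant would additionally need prefix sums over the key-value summaries, which is omitted here.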

Award ID(s): 2223483

PAR ID: 10539867

Author(s) / Creator(s): Agostinelli, Victor; Hong, Sanghyun; Chen, Lizhong

Publisher / Repository: International Conference on Machine Learning (ICML)

Date Published: 2024-07-24

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
Conference Proceeding: The DOI is not currently available.
