S3Attention: Improving Long Sequence Attention with Smoothed Skeleton Sketching

Wang, Xue; Zhou, Tian; Zhu, Jianqing; Liu, Jialin; Yuan, Kun; Yao, Tao; Yin, Wotao; Jin, Rong; Cai, HanQin

doi:10.1109/JSTSP.2024.3446173

Attention-based models have achieved remarkable breakthroughs in numerous applications. However, the quadratic complexity of Attention makes vanilla Attention-based models hard to apply to long-sequence tasks. Various improved Attention structures have been proposed to reduce the computation cost by inducing low-rankness and approximating the whole sequence by sub-sequences. The most challenging part of those approaches is maintaining the proper balance between information preservation and computation reduction: the longer the sub-sequences used, the better information is preserved, but at the price of introducing more noise and computational cost. In this paper, we propose a smoothed skeleton sketching based Attention structure, coined S3Attention, which significantly improves upon previous attempts to negotiate this trade-off. S3Attention has two mechanisms to effectively minimize the impact of noise while keeping complexity linear in the sequence length: a smoothing block to mix information over long sequences and a matrix sketching method that simultaneously selects columns and rows from the input matrix. We verify the effectiveness of S3Attention both theoretically and empirically. Extensive studies over Long Range Arena (LRA) datasets and six time-series forecasting tasks show that S3Attention significantly outperforms both vanilla Attention and other state-of-the-art variants of Attention structures.
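The two mechanisms named in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the moving-average smoother, the uniform random row and column sampling, and the names `smooth`, `softmax`, and `sketched_attention` are all assumptions made for illustration only. It shows only why attending to a fixed-size subset of smoothed tokens keeps the cost linear in the sequence length.

```python
import numpy as np


def smooth(x, window=3):
    """Moving average along the sequence axis (assumed stand-in for a smoothing block)."""
    assert window % 2 == 1, "odd window keeps the sequence length unchanged"
    pad = window // 2
    padded = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    # Smooth each feature column independently; output has the same shape as x.
    cols = [np.convolve(padded[:, j], kernel, mode="valid") for j in range(x.shape[1])]
    return np.stack(cols, axis=1)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def sketched_attention(q, k, v, num_rows, num_cols, rng, window=3):
    """Attention over a row/column subset of smoothed keys and values.

    Rows index tokens and columns index features. Uniform sampling is an
    assumption here; the paper's selection rule may differ.
    """
    n, d = k.shape
    rows = rng.choice(n, size=num_rows, replace=False)  # token subset
    cols = rng.choice(d, size=num_cols, replace=False)  # feature subset
    k_s = smooth(k, window)[rows]
    v_s = smooth(v, window)[rows]
    # Score matrix is (n, num_rows) instead of (n, n).
    scores = q[:, cols] @ k_s[:, cols].T / np.sqrt(num_cols)
    return softmax(scores) @ v_s


rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = sketched_attention(q, k, v, num_rows=4, num_cols=4, rng=rng)
```

With `num_rows` and `num_cols` fixed, the score matrix has `n * num_rows` entries, so the cost grows linearly with `n`. Setting `num_rows=n`, `num_cols=d`, and `window=1` recovers vanilla softmax attention, which makes the trade-off between sketch size and fidelity easy to probe.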

Award ID(s): 2304489

PAR ID: 10536045

Author(s) / Creator(s): Wang, Xue; Zhou, Tian; Zhu, Jianqing; Liu, Jialin; Yuan, Kun; Yao, Tao; Yin, Wotao; Jin, Rong; Cai, HanQin

Publisher / Repository: IEEE

Date Published: 2024-08-22

Journal Name: IEEE Journal of Selected Topics in Signal Processing

ISSN: 1932-4553

Page Range / eLocation ID: 1-18

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article: https://doi.org/10.1109/JSTSP.2024.3446173
