IRET: Incremental Resolution Enhancing Transformer

Saber_Latibari, Banafsheh; Salehi, Soheil; Homayoun, Houman; Sasan, Avesta

doi:10.1145/3649476.3660380

We introduce an approach to designing energy-aware, dynamically prunable Vision Transformers for edge applications. Our solution, denoted Incremental Resolution Enhancing Transformer (IRET), works by sequentially sampling the input image. In our case, however, the embedding size of the input tokens is considerably smaller than in prior-art solutions. This embedding is used in the first few layers of the IRET vision transformer until a reliable attention matrix is formed. The attention matrix is then used to sample additional information, through a learnable 2D lifting scheme, only for important tokens, and IRET drops the tokens that receive low attention scores. Hence, as the model pays more attention to a subset of tokens for its task, its focus and resolution also increase. This incremental attention-guided sampling of the input and dropping of unattended tokens allow IRET to significantly prune its computation tree on demand. By controlling the threshold for dropping unattended tokens and increasing the focus of attended ones, we can train a model that dynamically trades off complexity for accuracy. This is especially useful for edge devices, where accuracy and complexity can be traded dynamically based on factors such as battery life and reliability.
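
The threshold-controlled token dropping described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single-head attention with identity projections, the use of mean received attention as the per-token score, and the names `attention_scores` and `drop_unattended` are all assumptions made for illustration, and the learnable 2D lifting scheme that refines the retained tokens is left out entirely.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_scores(tokens):
    """Score each token by the attention it receives.

    Assumption: one attention head with identity query/key projections.
    A trained transformer would use learned projections instead.
    """
    d = tokens.shape[1]
    attn = softmax(tokens @ tokens.T / np.sqrt(d), axis=-1)  # (N, N), rows sum to 1
    return attn.mean(axis=0)  # column mean: attention received, averaged over queries

def drop_unattended(tokens, scores, threshold):
    """Keep only tokens whose score reaches the threshold (hypothetical helper)."""
    keep = np.flatnonzero(scores >= threshold)
    return tokens[keep], keep

# Toy input: 16 low-dimensional token embeddings (random, for illustration only).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))
scores = attention_scores(tokens)

# A higher threshold keeps fewer tokens, so later layers do less work.
for threshold in (0.0, 0.05, 0.1):
    kept, idx = drop_unattended(tokens, scores, threshold)
    print(f"threshold={threshold:.2f} -> {len(idx)} of {len(tokens)} tokens kept")
```

Because the scores sum to one across tokens, the threshold can be read as a fraction of the total attention; raising it at run time shrinks the set of tokens that later layers must process, which is the complexity-for-accuracy trade-off the abstract describes.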

Award ID(s): 2233893; 2228240

PAR ID: 10554695

Author(s) / Creator(s): Saber_Latibari, Banafsheh; Salehi, Soheil; Homayoun, Houman; Sasan, Avesta

Publisher / Repository: ACM

Date Published: 2024-06-12

ISBN: 9798400706059

Page Range / eLocation ID: 620 to 625

Subject(s) / Keyword(s): Vision Transformer, Token Dropping, Attention, Focus

Format(s): Medium: X

Location: Clearwater, FL, USA

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1145/3649476.3660380
