Searching Intrinsic Dimensions of Vision Transformers

Xue, Fanghui; Yang, Biao; Qi, Yingyong; Xin, Jack

doi:https://doi.org/10.17758/HEAIG10.H0622602

Many researchers have shown that transformers perform as well as convolutional neural networks on many computer vision tasks. However, the high computational cost of their attention modules hinders further study and deployment on edge devices. Some pruning methods have been developed to construct efficient vision transformers, but most of them consider image classification tasks only. Inspired by these results, we propose SiDT, a method for pruning vision transformer backbones on more complex vision tasks such as object detection, based on a search over transformer dimensions. Experiments on the CIFAR-100 and COCO datasets show that backbones with 20% or 40% of their dimensions/parameters pruned can perform similarly to, or even better than, the unpruned models. We also provide a complexity analysis and comparisons with previous pruning methods.

Award ID(s): 1854434, 1952644

PAR ID: 10335724

Author(s) / Creator(s): Xue, Fanghui; Yang, Biao; Qi, Yingyong; Xin, Jack

Date Published: 2022-06-01

Journal Name: The 20th International Conference on Innovations in Engineering and Sciences

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.17758/HEAIG10.H0622602
