HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models

Arif, Kazi Hasan Ibn; Yoon, JinYi; Nikolopoulos, Dimitrios S; Vandierendonck, Hans; John, Deepu; Ji, Bo

doi:10.1609/aaai.v39i2.32171

This content will become publicly available on April 11, 2026

High-resolution Vision-Language Models (VLMs) are widely used in multimodal tasks to enhance accuracy by preserving detailed image information. However, these models often generate an excessive number of visual tokens because they must encode multiple partitions of a high-resolution input image. Processing such a large number of visual tokens poses significant computational challenges, particularly for resource-constrained commodity GPUs. To address this challenge, we propose High-Resolution Early Dropping (HiRED), a plug-and-play token-dropping method designed to operate within a fixed token budget. HiRED leverages the attention of the CLS token in the vision transformer (ViT) to assess the visual content of the image partitions and allocate an optimal token budget to each partition accordingly. The most informative visual tokens from each partition, within the allocated budget, are then selected and passed to the subsequent Large Language Model (LLM). We show that HiRED achieves superior accuracy and performance compared to existing token-dropping methods. Empirically, HiRED-20% (i.e., a 20% token budget) on LLaVA-Next-7B achieves a 4.7x increase in token generation throughput, reduces response latency by 78%, and saves 14% of GPU memory for single inference on an NVIDIA Tesla P40 (24 GB). For larger batch sizes (e.g., 4), HiRED-20% prevents out-of-memory errors by cutting memory usage by 30%, while preserving throughput and latency benefits.
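
The two steps described in the abstract, budget allocation per partition followed by token selection within each partition, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes each partition's share of the budget is proportional to the sum of its CLS attention scores and that tokens are ranked by the same scores; the abstract does not specify either rule. The function names `allocate_budgets` and `select_tokens` and the input layout (one list of per-token scores per partition) are hypothetical.

```python
def allocate_budgets(partition_scores, total_budget):
    """Split total_budget across partitions.

    Assumption (not from the source): a partition's share is proportional
    to the sum of the CLS attention scores of its tokens.
    """
    masses = [sum(scores) for scores in partition_scores]
    total_mass = sum(masses)
    if total_mass <= 0:
        raise ValueError("attention scores must have a positive total")
    exact = [total_budget * m / total_mass for m in masses]
    budgets = [int(x) for x in exact]  # floor of each exact share
    # Give the leftover tokens to the largest fractional remainders.
    leftover = total_budget - sum(budgets)
    order = sorted(range(len(exact)),
                   key=lambda i: exact[i] - budgets[i], reverse=True)
    for i in order[:leftover]:
        budgets[i] += 1
    # A partition cannot keep more tokens than it contains, so the
    # sketch may end up using slightly less than the full budget.
    return [min(b, len(s)) for b, s in zip(budgets, partition_scores)]


def select_tokens(partition_scores, budget_fraction):
    """Return, per partition, the sorted indices of the tokens to keep."""
    total_tokens = sum(len(s) for s in partition_scores)
    total_budget = round(total_tokens * budget_fraction)
    budgets = allocate_budgets(partition_scores, total_budget)
    kept = []
    for scores, k in zip(partition_scores, budgets):
        # Rank tokens by score, keep the top k, restore original order.
        top = sorted(range(len(scores)),
                     key=lambda i: scores[i], reverse=True)[:k]
        kept.append(sorted(top))
    return kept


if __name__ == "__main__":
    # Two toy partitions with five tokens each and a 40% token budget.
    scores = [[4.0, 1.0, 3.0, 2.0, 0.0], [2.0, 1.0, 0.0, 0.0, 0.0]]
    print(select_tokens(scores, 0.4))  # prints [[0, 2, 3], [0]]
```

In the toy example, the first partition holds most of the attention mass and therefore keeps three of the four budgeted tokens, while the second keeps one. Returning indices in their original order preserves the positional ordering of the surviving tokens before they are passed on.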

Award ID(s): 2315851

PAR ID: 10621667

Author(s) / Creator(s): Arif, Kazi Hasan Ibn; Yoon, JinYi; Nikolopoulos, Dimitrios S; Vandierendonck, Hans; John, Deepu; Ji, Bo

Publisher / Repository: Association for the Advancement of Artificial Intelligence

Date Published: 2025-04-11

Journal Name: Proceedings of the AAAI Conference on Artificial Intelligence

Volume: 39

Issue: 2

ISSN: 2159-5399

Page Range / eLocation ID: 1773 to 1781

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Journal Article:
https://doi.org/10.1609/aaai.v39i2.32171
