NSF PAR Search | NSF Public Access Repository


S2M3: Split-and-Share Multi-Modal Models for Distributed Multi-Task Inference on the Edge

Yoon, Jinyi; Lee, Jiho; He, Ting; Choi, Nakjung; Ji, Bo (July 2025, IEEE)

Free, publicly-accessible full text available July 20, 2026
Taming heavy-tailed losses in adversarial bandits and the best-of-both-worlds setting

Cheng, Duo; Zhou, Xingyu; Ji, Bo (June 2025, NIPS '24: Proceedings of the 38th International Conference on Neural Information Processing Systems)

Free, publicly-accessible full text available June 5, 2026
Multimodal Remote Inference

https://doi.org/10.1109/MASS66014.2025.00039

Zhang, Keyuan; Sun, Yin; Ji, Bo (October 2025, IEEE)

Free, publicly-accessible full text available October 6, 2026
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models

https://doi.org/10.1609/aaai.v39i2.32171

Arif, Kazi Hasan Ibn; Yoon, JinYi; Nikolopoulos, Dimitrios S; Vandierendonck, Hans; John, Deepu; Ji, Bo (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

High-resolution Vision-Language Models (VLMs) are widely used in multimodal tasks to enhance accuracy by preserving detailed image information. However, these models often generate an excessive number of visual tokens because they must encode multiple partitions of a high-resolution input image. Processing such a large number of visual tokens poses significant computational challenges, particularly for resource-constrained commodity GPUs. To address this challenge, we propose High-Resolution Early Dropping (HiRED), a plug-and-play token-dropping method designed to operate within a fixed token budget. HiRED leverages the attention of the CLS token in the vision transformer (ViT) to assess the visual content of the image partitions and allocate an optimal token budget for each partition accordingly. The most informative visual tokens from each partition within the allocated budget are then selected and passed to the subsequent Large Language Model (LLM). We show that HiRED achieves superior accuracy and performance compared to existing token-dropping methods. Empirically, HiRED-20% (i.e., a 20% token budget) on LLaVA-Next-7B achieves a 4.7x increase in token generation throughput, reduces response latency by 78%, and saves 14% of GPU memory for single inference on an NVIDIA Tesla P40 (24 GB). For larger batch sizes (e.g., 4), HiRED-20% prevents out-of-memory errors by cutting memory usage by 30%, while preserving throughput and latency benefits.
Free, publicly-accessible full text available April 11, 2026
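
The two steps named in the HiRED abstract (splitting a fixed token budget across image partitions using CLS attention, then keeping the highest-scoring tokens in each partition) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `allocate_budgets` and `select_tokens`, the proportional-share allocation rule, the equal-length partitions, and the toy attention scores are all assumptions made for illustration. The abstract only states that budgets are allocated according to CLS attention and that the most informative tokens are kept.

```python
from math import floor


def allocate_budgets(cls_attn, total_budget):
    """Split total_budget across partitions in proportion to summed CLS attention.

    cls_attn: list of equal-length lists, one list of non-negative scores per partition.
    The proportional rule is an assumption, not the rule used in the paper.
    """
    num_parts = len(cls_attn)
    part_len = len(cls_attn[0])
    if not 0 <= total_budget <= num_parts * part_len:
        raise ValueError("total_budget must fit within the available tokens")

    mass = [sum(scores) for scores in cls_attn]
    total_mass = sum(mass)
    if total_mass > 0:
        raw = [total_budget * m / total_mass for m in mass]
    else:
        raw = [total_budget / num_parts] * num_parts

    # Start from the floored share, capped at the partition size.
    budgets = [min(floor(r), part_len) for r in raw]

    # Hand out leftover slots, largest fractional share first,
    # skipping partitions that are already full.
    order = sorted(range(num_parts), key=lambda i: raw[i] - floor(raw[i]), reverse=True)
    remaining = total_budget - sum(budgets)
    while remaining > 0:
        for i in order:
            if remaining > 0 and budgets[i] < part_len:
                budgets[i] += 1
                remaining -= 1
    return budgets


def select_tokens(cls_attn, budgets):
    """Return, per partition, the indices of the top-scoring tokens in original order."""
    kept = []
    for scores, k in zip(cls_attn, budgets):
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        kept.append(sorted(ranked[:k]))
    return kept


if __name__ == "__main__":
    # Toy scores for two partitions of four tokens each (made-up values).
    attn = [[0.5, 0.1, 0.3, 0.1], [0.1, 0.05, 0.05, 0.05]]
    budgets = allocate_budgets(attn, total_budget=4)
    print(budgets, select_tokens(attn, budgets))
```

In this toy run the first partition carries most of the attention mass and so receives three of the four slots. A setting such as HiRED-20% would correspond to choosing `total_budget` as 20% of the total number of visual tokens.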
FrameFeedback: A Closed-Loop Control System for Dynamic Offloading Real-Time Edge Inference

https://doi.org/10.1109/IPDPSW63119.2024.00116

Jackson, Matthew; Ji, Bo; Nikolopoulos, Dimitrios S (May 2024, IEEE)

Full Text Available
Distributed Linear Bandits With Differential Privacy

https://doi.org/10.1109/TNSE.2024.3362978

Li, Fengjiao; Zhou, Xingyu; Ji, Bo (May 2024, IEEE Transactions on Network Science and Engineering)

Full Text Available
Learning-Augmented Online Minimization of Age of Information and Transmission Costs

https://doi.org/10.1109/TNSE.2025.3561736

Liu, Zhongdong; Zhang, Keyuan; Li, Bin; Sun, Yin; Hou, Y Thomas; Ji, Bo (September 2025, IEEE Transactions on Network Science and Engineering)

Free, publicly-accessible full text available September 1, 2026
Motion-Prediction-Based Wireless Scheduling for Interactive Panoramic Scene Delivery

https://doi.org/10.1109/TNSE.2023.3325420

Chen, Jiangong; Qin, Xudong; Zhu, Guangyu; Ji, Bo; Li, Bin (March 2024, IEEE Transactions on Network Science and Engineering)

Full Text Available
Totoro: A Scalable Federated Learning Engine for the Edge

https://doi.org/10.1145/3627703.3629575

Ching, Cheng-Wei; Chen, Xin; Kim, Taehwan; Ji, Bo; Wang, Qingyang; Da Silva, Dilma; Hu, Liting (April 2024, ACM)

Full Text Available
Securing Bystander Privacy in Mixed Reality While Protecting the User Experience

https://doi.org/10.1109/MSEC.2023.3331649

Corbett, Matthew; David-John, Brendan; Shang, Jiacheng; Hu, Y Charlie; Ji, Bo (January 2024, IEEE Security & Privacy)

Full Text Available

