- Search Results
- Page 1 of 1
Search for: All records
Total Resources: 4
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Free, publicly-accessible full text available September 1, 2025
-
Yin, Ziyi; Ye, Muchao; Zhang, Tianrong; Wang, Jiaqi; Liu, Han; Chen, Jinghui; Wang, Ting; Ma, Fenglong (, Proceedings of the AAAI Conference on Artificial Intelligence)
Visual Question Answering (VQA) is a fundamental task in the computer vision and natural language processing fields. Although the “pre-training & fine-tuning” learning paradigm significantly improves VQA performance, the adversarial robustness of this paradigm has not been explored. In this paper, we delve into a new problem: using a pre-trained multimodal source model to create adversarial image-text pairs and then transferring them to attack target VQA models. Correspondingly, we propose a novel VQATTACK model, which can iteratively generate both image and text perturbations with two designed modules: the large language model (LLM)-enhanced image attack module and the cross-modal joint attack module. At each iteration, the LLM-enhanced image attack module first optimizes a latent representation-based loss to generate feature-level image perturbations. Then it incorporates an LLM to further enhance the image perturbations by optimizing the designed masked answer anti-recovery loss. The cross-modal joint attack module is triggered at a specific iteration, at which it updates the image and text perturbations sequentially. Notably, the text perturbation updates are based on both the learned gradients in the word embedding space and word synonym-based substitution. Experimental results on two VQA datasets with five validated models demonstrate the effectiveness of the proposed VQATTACK in the transferable attack setting, compared with state-of-the-art baselines. This work reveals a significant blind spot in the “pre-training & fine-tuning” paradigm on VQA tasks. The source code is available at https://github.com/ericyinyzy/VQAttack.
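The abstract above describes an iterative attack loop: a feature-level image step, an LLM-enhanced masked-answer anti-recovery step, and a periodically triggered cross-modal joint step that also perturbs the text. Below is a minimal sketch of that loop, assuming a PyTorch-style pre-trained source model that exposes hypothetical `encode_image` and `predict_masked_answer` interfaces; the loss choices, update rule, and hyperparameters are illustrative stand-ins, not the authors' implementation (which lives in the repository linked above).

```python
# Hypothetical sketch of the iterative image/text perturbation loop described in
# the VQATTACK abstract. Not the authors' code; `encode_image` and
# `predict_masked_answer` are assumed placeholder interfaces, and all losses and
# hyperparameters below are illustrative.
import torch
import torch.nn.functional as F


def vqattack_sketch(source_model, image, text_embeds, answer_ids,
                    steps=40, joint_every=10, eps=8 / 255, alpha=1 / 255):
    """Craft image/text perturbations against a source VQA model, hoping they
    transfer to unseen target models."""
    delta = torch.zeros_like(image, requires_grad=True)      # image perturbation
    clean_feat = source_model.encode_image(image).detach()   # fixed clean features

    for t in range(steps):
        # Image attack: (1) push the perturbed image's latent representation away
        # from the clean one; (2) a masked-answer "anti-recovery" term that
        # penalizes recovering the ground-truth answer tokens.
        adv_feat = source_model.encode_image(image + delta)
        feat_loss = -F.cosine_similarity(adv_feat, clean_feat, dim=-1).mean()
        logits = source_model.predict_masked_answer(image + delta, text_embeds)
        anti_recovery = F.cross_entropy(logits, answer_ids)

        loss = feat_loss + anti_recovery
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()     # ascend the loss to hurt the model
            delta.clamp_(-eps, eps)          # keep the perturbation bounded

        # Cross-modal joint attack, triggered only at specific iterations:
        # follow gradients in the word-embedding space (the synonym-substitution
        # step that snaps embeddings back to real words is omitted here).
        if (t + 1) % joint_every == 0:
            text_embeds = text_embeds.detach().requires_grad_(True)
            logits = source_model.predict_masked_answer(image + delta.detach(),
                                                        text_embeds)
            txt_grad, = torch.autograd.grad(F.cross_entropy(logits, answer_ids),
                                            text_embeds)
            text_embeds = (text_embeds + alpha * txt_grad.sign()).detach()

    return (image + delta).detach(), text_embeds
```

This sketch omits the LLM used to further enhance the image perturbations and the synonym-based text substitution; see the paper and repository for the full method.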
-
Ye, Muchao; Yin, Ziyi; Zhang, Tianrong; Du, Tianyu; Chen, Jinghui; Wang, Ting; Ma, Fenglong (, Annual Conference on Neural Information Processing Systems (NeurIPS’23))
-
Jin, Ze; Matteson, David S; Zhang, Tianrong (, IEEE)