NSF PAR Search | NSF Public Access Repository


LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision

Huang, Jiani; Li, Ziyang; Naik, Mayur; Lim, Ser Nam (April 2025, ICLR 2025)

Free, publicly-accessible full text available April 1, 2026
Composing Object Relations and Attributes for Image-Text Matching

https://doi.org/10.1109/CVPR52733.2024.01361

Pham, Khoi; Huynh, Chuong; Lim, Ser-Nam; Shrivastava, Abhinav (June 2024, IEEE)

Full Text Available
Video Decomposition Prior: Editing Videos Layer By Layer

Shrivastava, Gaurav; Lim, Ser-Nam; Shrivastava, Abhinav (May 2024, The Twelfth International Conference on Learning Representations (ICLR))

Full Text Available
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks

Cui, Xuanming; Aparcedo, Alejandro; Jang, Young Kyun; Lim, Ser-Nam (June 2024, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))

Recent advances in instruction tuning have led to the development of state-of-the-art Large Multimodal Models (LMMs). Given the novelty of these models, the impact of visual adversarial attacks on LMMs has not been thoroughly examined. We conduct a comprehensive study of the robustness of various LMMs against different adversarial attacks, evaluated across tasks including image classification, image captioning, and Visual Question Answering (VQA). We find that, in general, LMMs are not robust to visual adversarial inputs. However, our findings suggest that context provided to the model via prompts, such as questions in a QA pair, helps to mitigate the effects of visual adversarial inputs. Notably, the LMMs evaluated demonstrated remarkable resilience to such attacks on the ScienceQA task, with only an 8.10% drop in performance compared to their visual counterparts, which dropped 99.73%. We also propose a new approach to real-world image classification, which we term query decomposition. By incorporating existence queries into our input prompt, we observe diminished attack effectiveness and improvements in image classification accuracy. This research highlights a previously underexplored facet of LMM robustness and sets the stage for future work aimed at strengthening the resilience of multimodal systems in adversarial environments.
Full Text Available
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

https://doi.org/10.1109/CVPR52733.2024.01282

He, Bo; Li, Hengduo; Jang, Young Kyun; Jia, Menglin; Cao, Xuefei; Shah, Ashish; Lim, Ser-Nam; Shrivastava, Abhinav (June 2024, IEEE)

Full Text Available
Visual Prompt Tuning

https://doi.org/10.1007/978-3-031-19827-4_41

Jia, Menglin; Tang, Luming; Chen, Bor-Chun; Cardie, Claire; Belongie, Serge; Hariharan, Bharath; Lim, Ser-Nam (October 2022, European Conference on Computer Vision)

The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. Taking inspiration from recent advances in efficiently tuning large language models, VPT introduces only a small number of trainable parameters (less than 1% of model parameters) in the input space while keeping the model backbone frozen. Via extensive experiments on a wide variety of downstream recognition tasks, we show that VPT achieves significant performance gains compared to other parameter-efficient tuning protocols. Most importantly, VPT even outperforms full fine-tuning in many cases across model capacities and training data scales, while reducing per-task storage cost.
Full Text Available
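As a rough illustration of the "less than 1% of model parameters" figure in the Visual Prompt Tuning abstract, the sketch below counts the parameters added by learnable prompt tokens against a frozen backbone. The prompt length (50), embedding width (768), and backbone size (about 86 million parameters, typical of a ViT-B-sized model) are assumptions chosen for illustration, not values taken from this record, and the task-specific classification head is ignored.

```python
from dataclasses import dataclass


@dataclass
class PromptBudget:
    """Parameter counts for a prompt-tuned model (hypothetical helper)."""
    prompt_params: int      # trainable prompt parameters
    backbone_params: int    # frozen backbone parameters

    @property
    def trainable_fraction(self) -> float:
        # Share of all parameters that are actually updated during tuning.
        return self.prompt_params / (self.prompt_params + self.backbone_params)


def vpt_prompt_budget(num_prompts: int, embed_dim: int,
                      backbone_params: int, num_layers: int = 1) -> PromptBudget:
    """Count prompt parameters when `num_prompts` learnable tokens of width
    `embed_dim` are inserted at `num_layers` layers (1 = input layer only)."""
    if min(num_prompts, embed_dim, backbone_params, num_layers) <= 0:
        raise ValueError("all sizes must be positive")
    return PromptBudget(num_prompts * embed_dim * num_layers, backbone_params)


# Assumed sizes: 50 prompt tokens, width 768, ~86M frozen backbone parameters.
budget = vpt_prompt_budget(num_prompts=50, embed_dim=768,
                           backbone_params=86_000_000)
print(budget.prompt_params)
print(f"{budget.trainable_fraction:.4%}")
```

Under these assumed sizes the prompts add 38,400 parameters, well under 1% of the total; inserting prompts at every layer (a larger `num_layers`) scales the count linearly.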
Neural Manifold Ordinary Differential Equations

Lou, Aaron; Lim, Derek; Katsman, Isay; Huang, Leo; Jiang, Qingxuan; Lim, Ser Nam (January 2021, 2020 Advances in Neural Information Processing Systems (NeurIPS 2020))
Larochelle, Hugo; Ranzato, Marc'Aurelio; Hadsell, Raia; Balcan, Maria-Florina; Lin, Hsuan-Tien (Ed.)
To better conform to data geometry, recent deep generative modelling techniques adapt Euclidean constructions to non-Euclidean spaces. In this paper, we study normalizing flows on manifolds. Previous work has developed flow models for specific cases; however, these advancements hand-craft layers on a manifold-by-manifold basis, restricting generality and inducing cumbersome design constraints. We overcome these issues by introducing Neural Manifold Ordinary Differential Equations, a manifold generalization of Neural ODEs, which enables the construction of Manifold Continuous Normalizing Flows (MCNFs). MCNFs require only local geometry (therefore generalizing to arbitrary manifolds) and compute probabilities with continuous change of variables (allowing for a simple and expressive flow construction). We find that leveraging continuous manifold dynamics produces a marked improvement for both density estimation and downstream tasks.
Full Text Available
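For reference, the "continuous change of variables" mentioned in the Neural Manifold ODE abstract has a well-known Euclidean counterpart in continuous normalizing flows built on Neural ODEs. The notation below (state z(t), dynamics f with parameters θ, density p) is assumed for illustration and is not taken from this record:

```latex
\frac{dz(t)}{dt} = f\bigl(z(t), t; \theta\bigr),
\qquad
\frac{d \log p\bigl(z(t)\bigr)}{dt}
  = -\operatorname{tr}\!\left( \frac{\partial f}{\partial z(t)} \right)
```

In the manifold setting, the trace term is, roughly speaking, replaced by a divergence of the vector field taken with respect to the manifold's geometry; the exact form used for MCNFs should be taken from the full text rather than from this sketch.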
