NSF PAR Search | NSF Public Access Repository

R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model

https://doi.org/10.1007/978-3-031-73010-8_27

Kim, Changhoon; Min, Kyle; Yang, Yezhou (November 2024, Springer Nature Switzerland)

Full Text Available
Getting it Right: Improving Spatial Consistency in Text-to-Image Models

https://doi.org/10.1007/978-3-031-72670-5_12

Chatterjee, Agneet; Stan, Gabriela Ben Melech; Aflalo, Estelle; Paul, Sayak; Ghosh, Dhruba; Gokhale, Tejas; Schmidt, Ludwig; Hajishirzi, Hannaneh; Lal, Vasudev; Baral, Chitta; et al. (September 2024, Springer Nature Switzerland)

Full Text Available
On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation

Chatterjee, A; Gokhale, T; Baral, C; Yang, Y (June 2024, CVPR)

Recent advances in monocular depth estimation have been made by incorporating natural language as additional guidance. Although these methods yield impressive results, the impact of the language prior, particularly in terms of generalization and robustness, remains unexplored. In this paper, we address this gap by quantifying the impact of this prior and introduce methods to benchmark its effectiveness across various settings. We generate "low-level" sentences that convey object-centric, three-dimensional spatial relationships, incorporate them as additional language priors, and evaluate their downstream impact on depth estimation. Our key finding is that current language-guided depth estimators perform optimally only with scene-level descriptions and, counter-intuitively, fare worse with low-level descriptions. Despite leveraging additional data, these methods are not robust to directed adversarial attacks, and their performance declines as distribution shift increases. Finally, to provide a foundation for future research, we identify points of failure and offer insights to better understand these shortcomings. With an increasing number of methods using language for depth estimation, our findings highlight the opportunities and pitfalls that require careful consideration for effective deployment in real-world settings.
Full Text Available
ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models

https://doi.org/10.1609/aaai.v38i13.29371

Patel, Maitreya; Gokhale, Tejas; Baral, Chitta; Yang, Yezhou (March 2024, Proceedings of the AAAI Conference on Artificial Intelligence)

The ability to understand visual concepts and to replicate and compose these concepts from images is a central goal for computer vision. Recent advances in text-to-image (T2I) models have led to high-definition, realistic image generation by learning from large databases of images and their descriptions. However, the evaluation of T2I models has focused on photorealism and limited qualitative measures of visual understanding. To quantify the ability of T2I models in learning and synthesizing novel visual concepts (a.k.a. personalized T2I), we introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts and 33K composite text prompts. Along with the dataset, we propose an evaluation metric, Concept Confidence Deviation (CCD), that uses the confidence of oracle concept classifiers to measure the alignment between concepts generated by T2I generators and concepts contained in target images. We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions. Our human study shows that CCD is highly correlated with human understanding of concepts. Our results point to a trade-off between learning the concepts and preserving the compositionality, which existing approaches struggle to overcome. The data, code, and interactive demo are available at: https://conceptbed.github.io/
Full Text Available
Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model

https://doi.org/10.18653/v1/2024.findings-emnlp.211

Cheng, Sheng; Patel, Maitreya; Yang, Yezhou (January 2024, Association for Computational Linguistics)

Full Text Available
Improving Diversity with Adversarially Learned Transformations for Domain Generalization

https://doi.org/10.1109/WACV56688.2023.00051

Gokhale, Tejas; Anirudh, Rushil; Thiagarajan, Jayaraman J.; Kailkhura, Bhavya; Baral, Chitta; Yang, Yezhou (January 2023, 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV))

Full Text Available
CAVAN: Commonsense Knowledge Anchored Video Captioning

https://doi.org/10.1109/ICPR56361.2022.9956241

Shao, Huiliang; Fang, Zhiyuan; Yang, Yezhou (August 2022, 2022 26th International Conference on Pattern Recognition (ICPR))

Full Text Available
Injecting Semantic Concepts into End-to-End Image Captioning

https://doi.org/10.1109/CVPR52688.2022.01748

Fang, Zhiyuan; Wang, Jianfeng; Hu, Xiaowei; Liang, Lin; Gan, Zhe; Wang, Lijuan; Yang, Yezhou; Liu, Zicheng (June 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))

Full Text Available
Semantically Distributed Robust Optimization for Vision-and-Language Inference

https://doi.org/10.18653/v1/2022.findings-acl.118

Gokhale, Tejas; Chaudhary, Abhishek; Banerjee, Pratyay; Baral, Chitta; Yang, Yezhou (January 2022, ACL 2022 Findings)

Full Text Available
Weakly Supervised Relative Spatial Reasoning for Visual Question Answering

https://doi.org/10.1109/ICCV48922.2021.00192

Banerjee, Pratyay; Gokhale, Tejas; Yang, Yezhou; Baral, Chitta (October 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV))

Full Text Available
