NSF PAR Search | NSF Public Access Repository

Note: Clicking a Digital Object Identifier (DOI) number takes you to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo (administrative interval).

Aligned datasets improve detection of latent diffusion-generated images

Rajan, Anirudh Sundara; Ojha, Utkarsh; Schloesser, Jedidiah; Lee, Yong Jae (April 2025, ICLR)

Free, publicly accessible full text available April 24, 2026
An Investigation on LLMs' Visual Understanding Ability Using SVG for Image-Text Bridging

https://doi.org/10.1109/WACV61041.2025.00525

Cai, Mu; Huang, Zeyi; Li, Yuheng; Ojha, Utkarsh; Wang, Haohan; Lee, Yong Jae (February 2025, IEEE)

Free, publicly accessible full text available February 26, 2026
Yo’LLaVA: Your Personalized Language and Vision Assistant

Nguyen, Thao; Liu, Haotian; Li, Yuheng; Cai, Mu; Ojha, Utkarsh; Lee, Yong Jae (December 2024, NeurIPS)

Full Text Available
Edit One for All: Interactive Batch Image Editing

Nguyen, Thao; Ojha, Utkarsh; Li, Yuheng; Liu, Haotian; Lee, Yong Jae (June 2024, IEEE Conference on Computer Vision and Pattern Recognition (CVPR))

Full Text Available
Dissecting Knowledge Distillation: An Exploration of its Inner Workings and Applications

Ojha, Utkarsh; Li, Yuheng; Rajan, Anirudh S; Liang, Yingyu; Lee, Yong Jae (December 2023, NeurIPS 2023)

Full Text Available
Towards Universal Fake Image Detectors that Generalize Across Generative Models

https://doi.org/10.1109/CVPR52729.2023.02345

Ojha, Utkarsh; Li, Yuheng; Lee, Yong Jae (June 2023, IEEE)

Full Text Available
Generating Furry Cars: Disentangling Object Shape and Appearance across Multiple Domains

Ojha, Utkarsh; Singh, Krishna Kumar; Lee, Yong Jae (January 2021, International Conference on Learning Representations (ICLR))

We consider the novel task of learning disentangled representations of object shape and appearance across multiple domains (e.g., dogs and cars). The goal is to train a generative model that learns an intermediate distribution, which borrows a subset of properties from each domain, enabling the generation of images that do not exist in any single domain. This challenging problem requires accurate disentanglement of object shape, appearance, and background in each domain, so that the appearance and shape factors from the two domains can be interchanged. We augment an existing approach that can disentangle factors within a single domain but struggles to do so across domains. Our key technical contribution is to represent object appearance with a differentiable histogram of visual features, and to optimize the generator so that two images with the same latent appearance factor but different latent shape factors produce similar histograms. On multiple multi-domain datasets, we demonstrate that our method leads to accurate and consistent appearance and shape transfer across domains.
Full Text Available
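The "Generating Furry Cars" abstract above describes representing appearance with a differentiable histogram and encouraging images that share an appearance factor to produce similar histograms. The following is a minimal sketch of that idea only, not the authors' implementation: the abstract does not specify the binning scheme, so the triangular soft-binning kernel, the scalar feature values in [0, 1], the L1 comparison, and the names `soft_histogram` and `histogram_distance` are all assumptions made for illustration.

```python
def soft_histogram(values, num_bins=4, lo=0.0, hi=1.0):
    """Soft histogram with triangular kernels (an assumed binning scheme).

    Each value spreads its unit mass over the two nearest bin centers,
    so the result is piecewise linear in the inputs and has gradients
    almost everywhere, unlike a hard-count histogram.
    """
    width = (hi - lo) / (num_bins - 1)
    centers = [lo + i * width for i in range(num_bins)]
    hist = [0.0] * num_bins
    for v in values:
        v = min(max(v, lo), hi)  # clamp into the histogram range
        for i, c in enumerate(centers):
            hist[i] += max(0.0, 1.0 - abs(v - c) / width)
    total = sum(hist)
    return [h / total for h in hist]


def histogram_distance(h1, h2):
    """L1 distance between two histograms (assumed comparison metric)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Hypothetical feature responses: same values, different spatial order,
# standing in for "same appearance, different shape".
image_a = [0.1, 0.5, 0.9]
image_b = [0.9, 0.1, 0.5]
print(histogram_distance(soft_histogram(image_a), soft_histogram(image_b)))
```

Because a histogram discards where each feature value occurs, rearranging the same values (a stand-in for changing shape) leaves the distance near zero, while changing the values themselves (a stand-in for changing appearance) increases it.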
Few-shot Image Generation via Cross-domain Correspondence

https://doi.org/10.1109/cvpr46437.2021.01060

Ojha, Utkarsh; Li, Yijun; Lu, Jingwan; Efros, Alexei A.; Lee, Yong Jae; Shechtman, Eli; Zhang, Richard (June 2021, Conference on Computer Vision and Pattern Recognition (CVPR))

Training generative models, such as GANs, on a target domain containing limited examples (e.g., 10) can easily result in overfitting. In this work, we seek to utilize a large source domain for pretraining and transfer the diversity information from source to target. We propose to preserve the relative similarities and differences between instances in the source via a novel cross-domain distance consistency loss. To further reduce overfitting, we present an anchor-based strategy to encourage different levels of realism over different regions in the latent space. With extensive results in both photorealistic and non-photorealistic domains, we demonstrate qualitatively and quantitatively that our few-shot model automatically discovers correspondences between source and target domains and generates more diverse and realistic images than previous methods.
Full Text Available
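The "Few-shot Image Generation" abstract above proposes preserving the relative similarities between source instances after adaptation to the target domain. One way to sketch such a distance consistency loss is shown below. The abstract gives no formula, so everything here is an assumption for illustration: cosine similarity between feature vectors, a softmax to turn each instance's similarities into a distribution, a KL divergence (target relative to source) to compare the two domains, and the function names themselves. Feature vectors are assumed to be nonzero and generated from the same latent codes in both domains.

```python
import math


def cosine(u, v):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_distribution(feats, i):
    """Softmax over the similarities between instance i and all others."""
    sims = [cosine(feats[i], f) for j, f in enumerate(feats) if j != i]
    m = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    return [e / z for e in exps]


def distance_consistency_loss(source_feats, target_feats):
    """Mean KL divergence between target and source similarity distributions."""
    n = len(source_feats)
    total = 0.0
    for i in range(n):
        p = similarity_distribution(source_feats, i)  # source structure
        q = similarity_distribution(target_feats, i)  # adapted structure
        total += sum(qk * math.log(qk / pk) for pk, qk in zip(p, q))
    return total / n


# Hypothetical features for three latent codes in each domain.
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
collapsed_target = [[1.0, 0.0], [1.0, 0.01], [1.0, 0.02]]
print(distance_consistency_loss(source, collapsed_target))
```

In this sketch the loss is zero when the target features keep the same pairwise similarity structure as the source, and it grows when the target instances collapse toward one another, which is the overfitting behavior the abstract aims to prevent.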
MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation

https://doi.org/10.1109/CVPR42600.2020.00806

Li, Yuheng; Singh, Krishna Kumar; Ojha, Utkarsh; Lee, Yong Jae (June 2020, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))

We present MixNMatch, a conditional generative model that learns to disentangle and encode background, object pose, shape, and texture from real images with minimal supervision, for mix-and-match image generation. We build upon FineGAN, an unconditional generative model, to learn the desired disentanglement and image generator, and leverage adversarial joint image-code distribution matching to learn the latent factor encoders. MixNMatch requires bounding boxes during training to model background, but requires no other supervision. Through extensive experiments, we demonstrate MixNMatch's ability to accurately disentangle, encode, and combine multiple factors for mix-and-match image generation, including sketch2color, cartoon2img, and img2gif applications. Our code, models, and demo are available at https://github.com/Yuheng-Li/MixNMatch
Full Text Available
Elastic-InfoGAN: Unsupervised Disentangled Representation Learning in Class-Imbalanced Data

Ojha, Utkarsh; Singh, Krishna Kumar; Hsieh, Cho-Jui; Lee, Yong Jae (January 2020, Advances in neural information processing systems)

We propose a novel unsupervised generative model that learns to disentangle object identity from other low-level aspects in class-imbalanced data. We first investigate the issues surrounding the uniformity assumptions made by InfoGAN [10], and demonstrate that it fails to properly disentangle object identity in imbalanced data. Our key idea is to make the discovery of the discrete latent factor of variation invariant to identity-preserving transformations in real images, and to use that as a signal to learn the appropriate latent distribution representing object identity. Experiments on both artificial (MNIST, 3D cars, 3D chairs, ShapeNet) and real-world (YouTube-Faces) imbalanced datasets demonstrate the effectiveness of our method in disentangling object identity as a latent factor of variation.
Full Text Available
