NSF PAR Search | NSF Public Access Repository

Customizing Text-to-Image Diffusion with Object Viewpoint Control

https://doi.org/10.1145/3680528.3687564

Kumari, Nupur; Su, Grace; Zhang, Richard; Park, Taesung; Shechtman, Eli; Zhu, Jun-Yan (December 2024, ACM)

Full Text Available
One-Step Diffusion with Distribution Matching Distillation

https://doi.org/10.1109/CVPR52733.2024.00632

Yin, Tianwei; Gharbi, Michaël; Zhang, Richard; Shechtman, Eli; Durand, Frédo; Freeman, William T; Park, Taesung (June 2024, IEEE)

Full Text Available
Ablating Concepts in Text-to-Image Diffusion Models

https://doi.org/10.1109/ICCV51070.2023.02074

Kumari, Nupur; Zhang, Bingliang; Wang, Sheng-Yu; Shechtman, Eli; Zhang, Richard; Zhu, Jun-Yan (October 2023, IEEE)
Contrastive Learning for Diverse Disentangled Foreground Generation

https://doi.org/10.1007/978-3-031-19787-1_19

Li, Yuheng; Li, Yijun; Lu, Jingwan; Shechtman, Eli; Lee, Yong Jae; Singh, Krishna Kumar (January 2022, European Conference on Computer Vision (ECCV))

Full Text Available
Modulated Periodic Activations for Generalizable Local Functional Representations

https://doi.org/10.1109/ICCV48922.2021.01395

Mehta, Ishit; Gharbi, Michael; Barnes, Connelly; Shechtman, Eli; Ramamoorthi, Ravi; Chandraker, Manmohan (October 2021, IEEE/CVF International Conference on Computer Vision)

Full Text Available
Collaging Class-specific GANs for Semantic Image Synthesis

https://doi.org/10.1109/iccv48922.2021.01415

Li, Yuheng; Li, Yijun; Lu, Jingwan; Shechtman, Eli; Lee, Yong Jae; Singh, Krishna Kumar (October 2021, International Conference on Computer Vision (ICCV))

We propose a new approach for high-resolution semantic image synthesis. It consists of one base image generator and multiple class-specific generators. The base generator generates high-quality images based on a segmentation map. To further improve the quality of different objects, we create a bank of Generative Adversarial Networks (GANs) by separately training class-specific models. This has several benefits, including dedicated weights for each class; centrally aligned data for each model; additional training data from other sources; the potential for higher resolution and quality; and easy manipulation of a specific object in the scene. Experiments show that our approach can generate high-quality, high-resolution images while offering flexible object-level control through class-specific generators. Project page: https://yuheng-li.github.io/CollageGAN/
Full Text Available
Few-shot Image Generation via Cross-domain Correspondence

https://doi.org/10.1109/cvpr46437.2021.01060

Ojha, Utkarsh; Li, Yijun; Lu, Jingwan; Efros, Alexei A.; Lee, Yong Jae; Shechtman, Eli; Zhang, Richard (June 2021, Conference on Computer Vision and Pattern Recognition (CVPR))

Training generative models, such as GANs, on a target domain containing limited examples (e.g., 10) can easily result in overfitting. In this work, we seek to utilize a large source domain for pretraining and transfer the diversity information from source to target. We propose to preserve the relative similarities and differences between instances in the source via a novel cross-domain distance consistency loss. To further reduce overfitting, we present an anchor-based strategy to encourage different levels of realism over different regions in the latent space. With extensive results in both photorealistic and non-photorealistic domains, we demonstrate qualitatively and quantitatively that our few-shot model automatically discovers correspondences between source and target domains and generates more diverse and realistic images than previous methods.
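One way to read "preserve the relative similarities and differences between instances" is as a loss that compares, for each instance, how similar it is to the other instances in the source model with how similar it is in the adapted model. The sketch below is an illustration under assumptions, not the authors' implementation: the choice of cosine similarity, the softmax normalization, the KL direction, and the function names (`similarity_distribution`, `distance_consistency_loss`) are all hypothetical, and features are plain lists of floats.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_distribution(feats, i):
    """Softmax over cosine similarities between instance i and every other instance."""
    return softmax([cosine(feats[i], f) for j, f in enumerate(feats) if j != i])


def distance_consistency_loss(source_feats, target_feats):
    """Mean KL divergence between per-instance similarity distributions.

    Assumption: source_feats[i] and target_feats[i] come from the same latent
    code passed through the source and the adapted generator, respectively.
    """
    n = len(source_feats)
    total = 0.0
    for i in range(n):
        p = similarity_distribution(target_feats, i)  # adapted model
        q = similarity_distribution(source_feats, i)  # source model
        total += sum(pk * math.log(pk / qk) for pk, qk in zip(p, q))
    return total / n


source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
preserved = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]  # same relative structure
collapsed = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]  # all instances identical

print(distance_consistency_loss(source, preserved))
print(distance_consistency_loss(source, collapsed))
```

In this toy setup the loss is near zero when the adapted features keep the source's relative structure and positive when they collapse to a single point, which is the overfitting failure the abstract describes.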
Full Text Available
MakeItTalk: speaker-aware talking-head animation

https://doi.org/10.1145/3414685.3417774

Zhou, Yang; Han, Xintong; Shechtman, Eli; Echevarria, Jose; Kalogerakis, Evangelos; Li, Dingzeyu (November 2020, ACM Transactions on Graphics)

Full Text Available
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

Zhang, Richard; Isola, Phillip; Efros, Alexei; Shechtman, Eli; Wang, Oliver (January 2018, CVPR)

While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on ImageNet classification have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new dataset of human perceptual similarity judgments. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins on our dataset. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
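To make the "simple, shallow functions" remark concrete, here is a minimal sketch of PSNR using its standard definition, 10 * log10(MAX^2 / MSE). It is not taken from the paper: the nested-list image representation and the toy 4x4 images are assumptions for illustration. Because PSNR depends only on the mean squared pixel error, a uniform brightness shift and a checkerboard pattern of the same amplitude receive the same score, even though they are structurally very different distortions.

```python
import math


def psnr(img_a, img_b, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two grayscale images.

    Images are nested lists of pixel intensities with identical shape.
    """
    flat_a = [p for row in img_a for p in row]
    flat_b = [p for row in img_b for p in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


size = 4
reference = [[100] * size for _ in range(size)]

# Distortion 1: every pixel brightened by 10.
shifted = [[p + 10 for p in row] for row in reference]

# Distortion 2: checkerboard of +10 / -10 around the reference.
checker = [
    [p + (10 if (r + c) % 2 == 0 else -10) for c, p in enumerate(row)]
    for r, row in enumerate(reference)
]

print(psnr(reference, shifted))
print(psnr(reference, checker))
```

Both calls print the same value, since each distortion has a mean squared error of 100.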
Full Text Available
Toward Multimodal Image-to-Image Translation

Zhu, Jun-Yan; Zhang, Richard; Pathak, Deepak; Darrell, Trevor; Efros, Alexei; Wang, Oliver; Shechtman, Eli (January 2017, Advances in neural information processing systems)

Full Text Available
