Title: Image Synthesis From Reconfigurable Layout and Style
Despite remarkable recent progress on both unconditional and conditional image synthesis, it remains a long-standing problem to learn generative models that are capable of synthesizing realistic and sharp images from reconfigurable spatial layout (i.e., bounding boxes + class labels in an image lattice) and style (i.e., structural and appearance variations encoded by latent vectors), especially at high resolution. By reconfigurable, we mean that a model can preserve the intrinsic one-to-many mapping from a given layout to multiple plausible images with different styles, and is adaptive with respect to perturbations of a layout and style latent code. In this paper, we present a layout- and style-based architecture for generative adversarial networks (termed LostGANs) that can be trained end-to-end to generate images from reconfigurable layout and style. Inspired by the vanilla StyleGAN, the proposed LostGAN consists of two new components: (i) learning fine-grained mask maps in a weakly-supervised manner to bridge the gap between layouts and images, and (ii) learning object instance-specific layout-aware feature normalization (ISLA-Norm) in the generator to realize multi-object style generation. In experiments, the proposed method is tested on the COCO-Stuff dataset and the Visual Genome dataset, where it obtains state-of-the-art performance. The code and pretrained models are available at https://github.com/iVMCL/LostGANs
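The ISLA-Norm component named in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not spell out: each object instance contributes one channel-wise scale and shift (in the paper these would come from its class label and style code), and those parameters are spread over the image through the object's mask map. The function name `isla_norm_sketch`, the array shapes, and the per-image standardization are hypothetical choices for illustration and are not taken from the LostGANs repository.

```python
import numpy as np

def isla_norm_sketch(features, masks, gamma_obj, beta_obj, eps=1e-5):
    """Illustrative instance-specific, layout-aware normalization.

    features:  (C, H, W) feature map of a single image
    masks:     (M, H, W) soft masks in [0, 1], one per object instance
    gamma_obj: (M, C) per-object channel-wise scales (assumed given)
    beta_obj:  (M, C) per-object channel-wise shifts (assumed given)
    """
    # Standardize each channel over spatial positions (assumption: per image).
    mean = features.mean(axis=(1, 2), keepdims=True)
    var = features.var(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / np.sqrt(var + eps)

    # Average object parameters at each pixel, weighted by mask coverage.
    # Pixels covered by no object receive zero scale and zero shift.
    coverage = masks.sum(axis=0, keepdims=True)           # (1, H, W)
    weights = masks / np.maximum(coverage, eps)           # (M, H, W)
    gamma_map = np.einsum("mc,mhw->chw", gamma_obj, weights)
    beta_map = np.einsum("mc,mhw->chw", beta_obj, weights)

    # Pixel-wise affine transform: the layout decides which style applies where.
    return gamma_map * normalized + beta_map

# Usage: two objects occupying the left and right halves of a 2x4 map.
features = np.arange(8, dtype=float).reshape(1, 2, 4)
masks = np.zeros((2, 2, 4))
masks[0, :, :2] = 1.0
masks[1, :, 2:] = 1.0
out = isla_norm_sketch(features, masks,
                       gamma_obj=np.array([[1.0], [2.0]]),
                       beta_obj=np.array([[0.0], [10.0]]))
```

Changing one object's row in `gamma_obj` or `beta_obj` alters only the region under its mask, which is one way to read the "reconfigurable style" claim; moving a mask changes where that style is applied.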
Award ID(s):
1909644
PAR ID:
10122812
Author(s) / Creator(s):
;
Date Published:
Journal Name:
IEEE International Conference on Computer Vision
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Generative models have recently gained increasing attention in image generation and editing tasks. However, they often lack a direct connection to object geometry, which is crucial in sensitive domains such as computational anatomy, biology, and robotics. This paper presents a novel framework for Image Generation informed by Geodesic dynamics (IGG) in deformation spaces. Our IGG model comprises two key components: (i) an efficient autoencoder that explicitly learns the geodesic path of image transformations in the latent space; and (ii) a latent geodesic diffusion model that captures the distribution of latent representations of geodesic deformations conditioned on text instructions. By leveraging geodesic paths, our method ensures smooth, topology-preserving, and interpretable deformations, capturing complex variations in image structures while maintaining geometric consistency. We validate the proposed IGG on plant growth data and brain magnetic resonance imaging (MRI). Experimental results show that IGG outperforms the state-of-the-art image generation/editing models with superior performance in generating realistic, high-quality images with preserved object topology and reduced artifacts. Our code is publicly available at https://github.com/nellie689/IGG. 
  2. Object tracking in microscopy videos is crucial for understanding biological processes. While existing methods often require fine-tuning tracking algorithms to fit the image dataset, here we explored an alternative paradigm: augmenting the image time-lapse dataset to fit the tracking algorithm. To test this approach, we evaluated whether generative video frame interpolation can augment the temporal resolution of time-lapse microscopy and facilitate object tracking in multiple biological contexts. We systematically compared the capacity of Latent Diffusion Model for Video Frame Interpolation (LDMVFI), Real-time Intermediate Flow Estimation (RIFE), Compression-Driven Frame Interpolation (CDFI), and Frame Interpolation for Large Motion (FILM) to generate synthetic microscopy images derived from interpolating real images. Our testing image time series ranged from fluorescently labeled nuclei to bacteria, yeast, cancer cells, and organoids. We showed that the off-the-shelf frame interpolation algorithms produced bio-realistic image interpolation even without dataset-specific retraining, as judged by high structural image similarity and the capacity to produce segmentations that closely resemble results from real images. Using a simple tracking algorithm based on mask overlap, we confirmed that frame interpolation significantly improved tracking across several datasets without requiring extensive parameter tuning, and captured complex trajectories that were difficult to resolve in the original image time series. Taken together, our findings highlight the potential of generative frame interpolation to improve tracking in time-lapse microscopy across diverse scenarios, suggesting that a generalist tracking algorithm for microscopy could be developed by combining deep learning segmentation models with generative frame interpolation.
  3. We present MixNMatch, a conditional generative model that learns to disentangle and encode background, object pose, shape, and texture from real images with minimal supervision, for mix-and-match image generation. We build upon FineGAN, an unconditional generative model, to learn the desired disentanglement and image generator, and leverage adversarial joint image-code distribution matching to learn the latent factor encoders. MixNMatch requires bounding boxes during training to model background, but requires no other supervision. Through extensive experiments, we demonstrate MixNMatch's ability to accurately disentangle, encode, and combine multiple factors for mix-and-match image generation, including sketch2color, cartoon2img, and img2gif applications. Our code/models/demo can be found at https://github.com/Yuheng-Li/MixNMatch 
  4. This paper introduces a novel generative encoder (GE) framework for generative imaging and image processing tasks like image reconstruction, compression, denoising, inpainting, deblurring, and super-resolution. GE unifies the generative capacity of GANs and the stability of AEs in an optimization framework instead of stacking GANs and AEs into a single network or combining their loss functions as in existing literature. GE provides a novel approach to visualizing relationships between latent spaces and the data space. The GE framework is made up of a pre-training phase and a solving phase. In the former, a GAN with generator $G$ capturing the data distribution of a given image set, and an AE network with encoder $E$ that compresses images following the distribution estimated by $G$, are trained separately, resulting in two latent representations of the data, denoted as the generative and encoding latent space respectively. In the solving phase, given a noisy image $x = \mathcal{P}(x^*)$, where $x^*$ is the target unknown image and $\mathcal{P}$ is an operator adding an additive, multiplicative, or convolutional noise, or equivalently given such an image $x$ in the compressed domain, i.e., given $m = E(x)$, the two latent spaces are unified via solving the optimization problem $z^* = \underset{z}{\mathrm{argmin}} \|E(G(z))-m\|_2^2+\lambda\|z\|_2^2$, and the image $x^*$ is recovered in a generative way via $\hat{x} := G(z^*) \approx x^*$, where $\lambda>0$ is a hyperparameter. The unification of the two spaces allows improved performance against corresponding GAN and AE networks while visualizing interesting properties in each latent space.
  5. Adversarial training is a popular defense strategy against attack threat models with bounded Lp norms. However, it often degrades the model performance on normal images and the defense does not generalize well to novel attacks. Given the success of deep generative models such as GANs and VAEs in characterizing the underlying manifold of images, we investigate whether or not the aforementioned problems can be remedied by exploiting the underlying manifold information. To this end, we construct an "On-Manifold ImageNet" (OM-ImageNet) dataset by projecting the ImageNet samples onto the manifold learned by StyleGAN. For this dataset, the underlying manifold information is exact. Using OM-ImageNet, we first show that adversarial training in the latent space of images improves both standard accuracy and robustness to on-manifold attacks. However, since no out-of-manifold perturbations are realized, the defense can be broken by Lp adversarial attacks. We further propose Dual Manifold Adversarial Training (DMAT) where adversarial perturbations in both latent and image spaces are used in robustifying the model. Our DMAT improves performance on normal images, and achieves comparable robustness to the standard adversarial training against Lp attacks. In addition, we observe that models defended by DMAT achieve improved robustness against novel attacks which manipulate images by global color shifts or various types of image filtering. Interestingly, similar improvements are also achieved when the defended models are tested on out-of-manifold natural images. These results demonstrate the potential benefits of using manifold information in enhancing robustness of deep learning models against various types of novel adversarial attacks.
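The solving phase described in item 4 of the list above, which minimizes ||E(G(z)) - m||² + λ||z||² over z, can be sketched on a toy problem. This sketch assumes the generator and encoder are plain linear maps (the matrices `G_mat` and `E_mat` are hypothetical stand-ins for trained networks) and uses fixed-step gradient descent; the abstract does not say which solver the authors use, and the name `solve_latent` is made up for illustration.

```python
import numpy as np

def solve_latent(G_mat, E_mat, m, lam=0.1, steps=500):
    """Minimize ||E(G(z)) - m||^2 + lam * ||z||^2 for linear G and E."""
    M = E_mat @ G_mat                       # composite map z -> E(G(z))
    # Step size 1/L, where L bounds the curvature of the objective.
    lr = 1.0 / (2.0 * (np.linalg.norm(M, 2) ** 2 + lam))
    z = np.zeros(M.shape[1])
    for _ in range(steps):
        # Gradient of the data term plus gradient of the ridge penalty.
        grad = 2.0 * M.T @ (M @ z - m) + 2.0 * lam * z
        z -= lr * grad
    return z

# Toy stand-ins: G maps a 2-d latent to a 3-d "image", E compresses it to 2-d.
G_mat = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
E_mat = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])

x_true = G_mat @ np.array([0.5, -1.0])      # target image x*
m = E_mat @ x_true                          # observed code m = E(x)
z_star = solve_latent(G_mat, E_mat, m)
x_hat = G_mat @ z_star                      # recovered image G(z*)
```

In this linear setting the minimizer also has a closed form, (MᵀM + λI)⁻¹Mᵀm with M the composite of the two matrices, which is a convenient check on the iterative result. With nonlinear networks no such formula exists and an iterative solver of some kind is needed.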