skip to main content

Title: TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images
An unsupervised image-to-image translation (UI2I) task deals with learning a mapping between two domains without paired images. While existing UI2I methods usually require numerous unpaired images from different domains for training, there are many scenarios where training data is quite limited. In this paper, we argue that even if each domain contains a single image, UI2I can still be achieved. To this end, we propose TuiGAN, a generative model that is trained on only two unpaired images and amounts to one-shot unsupervised learning. With TuiGAN, an image is translated in a coarse-to-fine manner where the generated image is gradually refined from global structures to local details. We conduct extensive experiments to verify that our versatile method can outperform strong baselines on a wide variety of UI2I tasks. Moreover, TuiGAN is capable of achieving comparable performance with the state-of-the-art UI2I models trained with sufficient data.
; ; ; ;
Award ID(s):
Publication Date:
Journal Name:
European Conference on Computer Vision
Sponsoring Org:
National Science Foundation
More Like this
  1. Data fusion techniques have gained special interest in remote sensing due to the available capabilities to obtain measurements from the same scene using different instruments with varied resolution domains. In particular, multispectral (MS) and hyperspectral (HS) imaging fusion is used to generate high spatial and spectral images (HSEI). Deep learning data fusion models based on Long Short Term Memory (LSTM) and Convolutional Neural Networks (CNN) have been developed to achieve such task.In this work, we present a Multi-Level Propagation Learning Network (MLPLN) based on a LSTM model but that can be trained with variable data sizes in order achieve themore »fusion process. Moreover, the MLPLN provides an intrinsic data augmentation feature that reduces the required number of training samples. The proposed model generates a HSEI by fusing a high-spatial resolution MS image and a low spatial resolution HS image. The performance of the model is studied and compared to existing CNN and LSTM approaches by evaluating the quality of the fused image using the structural similarity metric (SSIM). The results show that an increase in the SSIM is still obtained while reducing of the number of training samples to train the MLPLN model.« less
  2. Unsupervised domain adaptation for semantic segmentation has been intensively studied due to the low cost of the pixel-level annotation for synthetic data. The most common approaches try to generate images or features mimicking the distribution in the target domain while preserving the semantic contents in the source domain so that a model can be trained with annotations from the latter. However, such methods highly rely on an image translator or feature extractor trained in an elaborated mechanism including adversarial training, which brings in extra complexity and instability in the adaptation process. Furthermore, these methods mainly focus on taking advantage ofmore »the labeled source dataset, leaving the unlabeled target dataset not fully utilized. In this paper, we propose a bidirectional style-induced domain adaptation method, called BiSIDA, that employs consistency regularization to efficiently exploit information from the unlabeled target domain dataset, requiring only a simple neural style transfer model. BiSIDA aligns domains by not only transferring source images into the style of target images but also transferring target images into the style of source images to perform high-dimensional perturbation on the unlabeled target images, which is crucial to the success in applying consistency regularization in segmentation tasks. Extensive experiments show that our BiSIDA achieves new state-of-the-art on two commonly-used synthetic-to-real domain adaptation benchmarks: GTA5-to-CityScapes and SYNTHIA-to-CityScapes. Code and pretrained style transfer model are available at:« less
  3. This paper studies the unsupervised cross-domain translation problem by proposing a generative framework, in which the probability distribution of each domain is represented by a generative cooperative network that consists of an energy based model and a latent variable model. The use of generative cooperative network enables maximum likelihood learning of the domain model by MCMC teaching, where the energy-based model seeks to fit the data distribution of domain and distills its knowledge to the latent variable model via MCMC. Specifically, in the MCMC teaching process, the latent variable model parameterized by an encoder-decoder maps examples from the source domainmore »to the target domain, while the energy-based model further refines the mapped results by Langevin revision such that the revised results match to the examples in the target domain in terms of the statistical properties, which are defined by the learned energy function. For the purpose of building up a correspondence between two unpaired domains, the proposed framework simultaneously learns a pair of cooperative networks with cycle consistency, accounting for a two-way translation between two domains, by alternating MCMC teaching. Experiments show that the proposed framework is useful for unsupervised image-to-image translation and unpaired image sequence translation.« less
  4. Image synthesis from corrupted contrasts increases the diver- sity of diagnostic information available for many neurological diseases. Recently the image-to-image translation has experienced signi cant lev- els of interest within medical research, beginning with the successful use of the Generative Adversarial Network (GAN) to the introduction of cyclic constraint extended to multiple domains. However, in current ap- proaches, there is no guarantee that the mapping between the two image domains would be unique or one-to-one. In this paper, we introduce a novel approach to unpaired image-to-image translation based on the invertible architecture. The invertible property of the ow-based architecture assuresmore »a cycle-consistency of image-to-image translation without additional loss functions. We utilize the temporal informa- tion between consecutive slices to provide more constraints to the optimization for transforming one domain to another in un- paired volumetric medical images. To capture temporal structures in the medical images, we explore the displacement between the consec- utive slices using a deformation eld. In our approach, the deformation eld is used as a guidance to keep the translated slides realistic and con- sistent across the translation. The experimental results have shown that the synthesized images using our proposed approach are able to archive a competitive performance in terms of mean squared error, peak signal- to-noise ratio, and structural similarity index when compared with the existing deep learning-based methods on three standard datasets, i.e. HCP, MRBrainS13 and Brats2019.« less
  5. High-throughput phenotyping enables the efficient collection of plant trait data at scale. One example involves using imaging systems over key phases of a crop growing season. Although the resulting images provide rich data for statistical analyses of plant phenotypes, image processing for trait extraction is required as a prerequisite. Current methods for trait extraction are mainly based on supervised learning with human labeled data or semisupervised learning with a mixture of human labeled data and unsupervised data. Unfortunately, preparing a sufficiently large training data is both time and labor-intensive. We describe a self-supervised pipeline (KAT4IA) that uses K -means clusteringmore »on greenhouse images to construct training data for extracting and analyzing plant traits from an image-based field phenotyping system. The KAT4IA pipeline includes these main steps: self-supervised training set construction, plant segmentation from images of field-grown plants, automatic separation of target plants, calculation of plant traits, and functional curve fitting of the extracted traits. To deal with the challenge of separating target plants from noisy backgrounds in field images, we describe a novel approach using row-cuts and column-cuts on images segmented by transform domain neural network learning, which utilizes plant pixels identified from greenhouse images to train a segmentation model for field images. This approach is efficient and does not require human intervention. Our results show that KAT4IA is able to accurately extract plant pixels and estimate plant heights.« less