Unsupervised image-to-image translation (UI2I) tasks aim to learn a mapping between two domains without paired images. While existing UI2I methods usually require many unpaired images from each domain for training, there are many scenarios in which training data is quite limited. In this paper, we argue that UI2I can be achieved even when each domain contains only a single image. To this end, we propose TuiGAN, a generative model that is trained on just two unpaired images, amounting to one-shot unsupervised learning. With TuiGAN, an image is translated in a coarse-to-fine manner, with the generated image progressively refined from global structure to local details. We conduct extensive experiments to verify that our versatile method outperforms strong baselines on a wide variety of UI2I tasks. Moreover, TuiGAN achieves performance comparable to state-of-the-art UI2I models trained on sufficient data.
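To make the coarse-to-fine scheme concrete, here is a minimal PyTorch-style sketch of one way a generator pyramid could be wired, with each scale refining the upsampled output of the previous one. This is an illustration of the general idea only, not the authors' architecture; the function, its arguments, and the channel-concatenation conditioning are all assumptions.

```python
import torch
import torch.nn.functional as F

def coarse_to_fine_translate(pyramid, generators):
    """Translate an image pyramid from coarsest to finest scale.

    pyramid:    list of tensors [x_N, ..., x_0], coarsest first.
    generators: list of per-scale generator networks, coarsest first.
    Each generator refines the upsampled previous output, conditioned
    on the source image at its own scale.
    """
    prev = None
    for x, G in zip(pyramid, generators):
        if prev is None:
            # Coarsest scale: nothing to refine yet.
            up = torch.zeros_like(x)
        else:
            # Upsample the previous scale's translation to the current resolution.
            up = F.interpolate(prev, size=x.shape[-2:], mode='bilinear',
                               align_corners=False)
        # Refine global structure into finer local detail at this scale.
        prev = G(torch.cat([x, up], dim=1))
    return prev
```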
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain $X$ to a target domain $Y$ in the absence of paired examples. Our goal is to learn a mapping $G: X \rightarrow Y$ such that the distribution of images from $G(X)$ is indistinguishable from the distribution $Y$ using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping $F: Y \rightarrow X$ and introduce a cycle consistency loss to push $F(G(X)) \approx X$ (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.
- PAR ID: 10040280
- Journal Name: IEEE International Conference on Computer Vision
- Sponsoring Org: National Science Foundation
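The cycle consistency idea reduces to a simple reconstruction penalty coupling the two mappings. Below is a minimal PyTorch-style sketch; the L1 norm and the weight `lam` follow common practice for this loss rather than being spelled out in the abstract itself, and all names are illustrative.

```python
import torch

def cycle_consistency_loss(G, F, real_x, real_y, lam=10.0):
    """L_cyc = E[||F(G(x)) - x||_1] + E[||G(F(y)) - y||_1], weighted by lam."""
    forward_cycle = torch.mean(torch.abs(F(G(real_x)) - real_x))   # x -> y -> x
    backward_cycle = torch.mean(torch.abs(G(F(real_y)) - real_y))  # y -> x -> y
    return lam * (forward_cycle + backward_cycle)
```

This term is added to the adversarial losses on $G$ and $F$; it penalizes translations that cannot be undone, which is what constrains the otherwise under-determined mapping.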
More Like this
This paper introduces a novel generative encoder (GE) framework for generative imaging and image processing tasks like image reconstruction, compression, denoising, inpainting, deblurring, and super-resolution. GE unifies the generative capacity of GANs and the stability of AEs in an optimization framework, instead of stacking GANs and AEs into a single network or combining their loss functions as in the existing literature. GE also provides a novel approach to visualizing relationships between latent spaces and the data space. The GE framework consists of a pre-training phase and a solving phase. In the former, a GAN with generator $G$ capturing the data distribution of a given image set, and an AE network with encoder $E$ that compresses images following the distribution estimated by $G$, are trained separately, resulting in two latent representations of the data, called the generative and encoding latent spaces respectively. In the solving phase, given a noisy image $x = \mathcal{P}(x^*)$, where $x^*$ is the target unknown image and $\mathcal{P}$ is an operator adding additive, multiplicative, or convolutional noise, or equivalently given such an image $x$ in the compressed domain, i.e., given $m = E(x)$, the two latent spaces are unified by solving the optimization problem $z^* = \underset{z}{\mathrm{argmin}} \|E(G(z)) - m\|_2^2 + \lambda \|z\|_2^2$, where $\lambda > 0$ is a hyperparameter, and the image is recovered in a generative way via $\hat{x} := G(z^*) \approx x^*$. The unification of the two spaces allows improved performance against the corresponding GAN and AE networks, while visualizing interesting properties of each latent space.
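The solving phase is an ordinary latent-space optimization. Here is a minimal gradient-descent sketch under the assumption that the pretrained $G$ and $E$ are differentiable PyTorch modules; the latent dimension, optimizer, step count, and learning rate are illustrative choices, not values from the paper.

```python
import torch

def solve_latent(G, E, m, dim_z, lam=0.1, steps=500, lr=1e-2):
    """Solving phase: z* = argmin_z ||E(G(z)) - m||^2 + lam * ||z||^2."""
    z = torch.randn(1, dim_z, requires_grad=True)   # random initialization
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Match the encoding of the generated image to the given code m,
        # with an L2 prior on z to keep the latent well-behaved.
        loss = torch.sum((E(G(z)) - m) ** 2) + lam * torch.sum(z ** 2)
        loss.backward()
        opt.step()
    return G(z.detach())   # recovered image x_hat = G(z*)
```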
Unpaired image-to-image translation (I2I) is an ill-posed problem, as infinitely many translation functions can map the source domain distribution to the target distribution. Therefore, much effort has been put into designing suitable constraints, e.g., cycle consistency (CycleGAN), geometry consistency (GCGAN), and contrastive learning-based constraints (CUTGAN), that help better pose the problem. However, these well-known constraints have limitations: (1) they are either too restrictive or too weak for specific I2I tasks; (2) these methods result in content distortion when there is significant spatial variation between the source and target domains. This paper proposes a universal regularization technique called maximum spatial perturbation consistency (MSPC), which requires a spatial perturbation function $T$ and the translation operator $G$ to commute (i.e., $T \circ G = G \circ T$). In addition, we introduce two adversarial training components for learning the spatial perturbation function. The first lets $T$ compete with $G$ to achieve maximum perturbation. The second lets $G$ and $T$ compete with discriminators to align the spatial variations caused by changes in object size, object distortion, background interruptions, etc. Our method outperforms the state-of-the-art methods on most I2I benchmarks. We also introduce a new benchmark, namely the front face to profile face dataset, to emphasize the underlying challenges of I2I for real-world applications. Finally, we perform ablation experiments to study the sensitivity of our method to the severity of spatial perturbation and its effectiveness for distribution alignment.
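The commutativity constraint can be written directly as a penalty. A minimal sketch follows; the use of an L1 norm here is an assumption on our part, and in the full method $T$ is itself learned adversarially rather than fixed.

```python
import torch

def mspc_loss(G, T, x):
    """Spatial perturbation consistency: penalize ||T(G(x)) - G(T(x))||_1,
    i.e., translating then perturbing should match perturbing then translating."""
    return torch.mean(torch.abs(T(G(x)) - G(T(x))))
```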
A function $f: \{0,1\}^n \to \{0,1\}$ is called an approximate AND-homomorphism if, choosing $x, y \in \{0,1\}^n$ uniformly at random, we have that $f(x \wedge y) = f(x) \wedge f(y)$ with probability at least $1 - \varepsilon$, where $x \wedge y = (x_1 \wedge y_1, \ldots, x_n \wedge y_n)$. We prove that if $f: \{0,1\}^n \to \{0,1\}$ is an approximate AND-homomorphism, then $f$ is $\delta$-close to either a constant function or an AND function, where $\delta(\varepsilon) \to 0$ as $\varepsilon \to 0$. This improves on a result of Nehama, who proved a similar statement in which $\delta$ depends on $n$. Our theorem implies a strong result on judgement aggregation in computational social choice. In the language of social choice, our result shows that if $f$ is $\varepsilon$-close to satisfying judgement aggregation, then it is $\delta(\varepsilon)$-close to an oligarchy (the name for the AND function in social choice theory). This improves on Nehama's result, in which $\delta$ decays polynomially with $n$. Our result follows from a more general one, in which we characterize approximate solutions to the eigenvalue equation $Tf = \lambda g$, where $T$ is the downwards noise operator $Tf(x) = \mathbb{E}_y[f(x \wedge y)]$, $f$ is $[0,1]$-valued, and $g$ is $\{0,1\}$-valued. We identify all exact solutions to this equation, and show that any approximate solution in which $Tf$ and $\lambda g$ are close is close to an exact solution.
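The defining condition is easy to test empirically. A small sketch follows, with Boolean functions represented on bitmask integers; the helper name and the example function are illustrative, not from the paper.

```python
import random

def and_homomorphism_error(f, n, trials=100_000, seed=0):
    """Estimate Pr[f(x & y) != f(x) & f(y)] over uniform x, y in {0,1}^n,
    i.e., how far f is from being an exact AND-homomorphism."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        x = rng.getrandbits(n)
        y = rng.getrandbits(n)
        if f(x & y) != (f(x) & f(y)):
            bad += 1
    return bad / trials

# An AND function (here: the AND of coordinates 0 and 3) is an exact solution,
# so its estimated error is zero, matching the theorem's characterization.
g = lambda x: (x >> 0) & (x >> 3) & 1
assert and_homomorphism_error(g, n=10) == 0.0
```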
Abstract: Let $V_* \otimes V \rightarrow \mathbb{C}$ be a non-degenerate pairing of countable-dimensional complex vector spaces $V$ and $V_*$. The Mackey Lie algebra $\mathfrak{g} = \mathfrak{gl}^M(V, V_*)$ corresponding to this pairing consists of all endomorphisms $\varphi$ of $V$ for which the space $V_*$ is stable under the dual endomorphism $\varphi^*: V^* \rightarrow V^*$. We study the tensor Grothendieck category $\mathbb{T}$ generated by the $\mathfrak{g}$-modules $V$, $V_*$ and their algebraic duals $V^*$ and $V^*_*$. The category $\mathbb{T}$ is an analogue of categories considered in prior literature, the main difference being that the trivial module $\mathbb{C}$ is no longer injective in $\mathbb{T}$. We describe the injective hull $I$ of $\mathbb{C}$ in $\mathbb{T}$, and show that the category $\mathbb{T}$ is Koszul. In addition, we prove that $I$ is endowed with a natural structure of commutative algebra. We then define another category $_I\mathbb{T}$ of objects in $\mathbb{T}$ which are free as $I$-modules. Our main result is that the category $_I\mathbb{T}$ is also Koszul, and moreover that $_I\mathbb{T}$ is universal among abelian $\mathbb{C}$-linear tensor categories generated by two objects $X$, $Y$ with fixed subobjects $X' \hookrightarrow X$, $Y' \hookrightarrow Y$ and a pairing $X \otimes Y \rightarrow \mathbf{1}$, where $\mathbf{1}$ is the monoidal unit. We conclude the paper by discussing the orthogonal and symplectic analogues of the categories $\mathbb{T}$ and $_I\mathbb{T}$.