A style mapper applies a fixed style to its input images (for example, taking faces to cartoons). This paper describes a simple procedure – JoJoGAN – to learn a style mapper from a single example of the style. JoJoGAN uses a GAN inversion procedure and StyleGAN's style-mixing property to produce a substantial paired dataset from a single example style. The paired dataset is then used to fine-tune a StyleGAN; an image can then be style-mapped by GAN inversion followed by the fine-tuned StyleGAN. JoJoGAN needs just one reference and as little as 30 seconds of training time. JoJoGAN can successfully use extreme style references (say, animal faces). Furthermore, one can control which aspects of the style are used and how much of the style is applied. Qualitative and quantitative evaluation show that JoJoGAN produces high-quality, high-resolution images that vastly outperform the current state-of-the-art.
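A minimal sketch of the fine-tuning loop described above, assuming a pretrained StyleGAN2 generator, an off-the-shelf GAN-inversion encoder, and an LPIPS perceptual loss; the names `stylegan2.Generator`, `invert_image`, and `load_reference` are placeholders, and LPIPS stands in for the paper's discriminator-feature loss.

```python
# Sketch of JoJoGAN-style fine-tuning from one style reference.
import torch
from lpips import LPIPS                 # perceptual loss (pip install lpips)

from stylegan2 import Generator         # placeholder: any StyleGAN2 port
from inversion import invert_image      # placeholder: e.g. an e4e encoder
from data import load_reference         # placeholder image loader

device = "cuda"
G = Generator(size=1024).to(device)     # pretrained FFHQ weights assumed loaded
percep = LPIPS(net="vgg").to(device)

style_img = load_reference("style.png").to(device)   # (1, 3, H, W) in [-1, 1]
w_style = invert_image(style_img)                    # W+ code, (1, 18, 512)

opt = torch.optim.Adam(G.parameters(), lr=2e-3)
for step in range(300):                              # roughly 30 s on a modern GPU
    # Style mixing: random codes replace the reference code in some layers,
    # giving many latent inputs that should all map back to the style reference.
    z = torch.randn(4, 512, device=device)
    w_rand = G.mapping(z).unsqueeze(1).repeat(1, 18, 1)
    mask = torch.zeros(1, 18, 1, device=device)
    mask[:, 7:] = 1.0                                # which layers mix is a style knob
    w_mix = mask * w_rand + (1 - mask) * w_style

    out = G.synthesis(w_mix)
    loss = percep(out, style_img.expand_as(out)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At inference time, a new face is GAN-inverted and pushed through the fine-tuned generator; the layer mask above is one way to control which aspects of the style are transferred.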
StyleGAN knows Normal, Depth, Albedo, and More
Intrinsic images, in the original sense, are image-like maps of scene properties like depth, normal, albedo, or shading. This paper demonstrates that StyleGAN can easily be induced to produce intrinsic images. The procedure is straightforward. We show that if StyleGAN produces $G(w)$ from latent $w$, then for each type of intrinsic image, there is a fixed offset $d_c$ so that $G(w + d_c)$ is that type of intrinsic image for $G(w)$. Here $d_c$ is *independent of $w$*. The StyleGAN we used was pretrained by others, so this property is not some accident of our training regime. We show that there are image transformations StyleGAN will *not* produce in this fashion, so StyleGAN is not a generic image regression engine. It is conceptually exciting that an image generator should "know" and represent intrinsic images. There may also be practical advantages to using a generative model to produce intrinsic images. The intrinsic images obtained from StyleGAN compare well both qualitatively and quantitatively with those obtained by using SOTA image regression techniques; but StyleGAN's intrinsic images are robust to relighting effects, unlike SOTA methods.
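One way such an offset could be recovered is by optimizing a single $d_c$ against pseudo-ground-truth maps from an off-the-shelf predictor; the sketch below illustrates this for depth, with `stylegan2.Generator` and `predict_depth` as placeholder modules, not the paper's exact recipe.

```python
# Sketch: fit one latent offset d_c so that G(w + d_c) approximates the
# depth map of G(w), with the generator frozen throughout.
import torch
import torch.nn.functional as F

from stylegan2 import Generator          # placeholder: any StyleGAN2 port
from depth_model import predict_depth    # placeholder: an off-the-shelf depth net

device = "cuda"
G = Generator(size=1024).to(device).eval()
for p in G.parameters():
    p.requires_grad_(False)               # the generator stays frozen

d_c = torch.zeros(1, 18, 512, device=device, requires_grad=True)  # W+ offset
opt = torch.optim.Adam([d_c], lr=1e-2)

for step in range(1000):
    z = torch.randn(8, 512, device=device)
    w = G.mapping(z).unsqueeze(1).repeat(1, 18, 1)
    with torch.no_grad():
        img = G.synthesis(w)
        depth = predict_depth(img)            # (B, 1, H, W), pseudo ground truth
        target = depth.repeat(1, 3, 1, 1)     # match the generator's 3 channels
    pred = G.synthesis(w + d_c)               # the SAME offset for every w
    loss = F.mse_loss(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the same $d_c$ is applied to every sampled $w$, a converged offset directly illustrates the paper's claim that the offset is independent of the latent code.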
- Award ID(s): 2106825
- PAR ID: 10535593
- Publisher / Repository: NeurIPS Foundation
- Date Published:
- Journal Name: Advances in neural information processing systems
- ISSN: 1049-5258
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Deep learning models have demonstrated significant advantages over traditional algorithms in image processing tasks like object detection. However, a large amount of data is needed to train such deep networks, which limits their application to tasks such as biometric recognition, where many training samples are required for each class (i.e., each individual). Researchers developing such systems rely on real biometric data, which raises privacy concerns and is restricted by the availability of extensive, varied datasets. This paper proposes a generative adversarial network (GAN)-based solution to produce training data (palm images) for improved biometric (palmprint-based) recognition systems. We investigate the performance of the most recent StyleGAN models in generating a thorough contactless palm image dataset for application in biometric research. Trained on the publicly available H-PolyU and IIDT palmprint databases, the StyleGAN models generated a total of 4839 images. SIFT (Scale-Invariant Feature Transform) was used to assess uniqueness by matching features across scales and orientations, yielding a similarity score of 16.12% for the most recent StyleGAN3-based model. For the regions of interest (ROIs) in both the palm and finger, the average similarity score was 17.85%. The proposed model achieved a Fréchet Inception Distance (FID) of 16.1, demonstrating strong performance. These results demonstrate that StyleGAN is effective in producing unique synthetic biometric images.
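For concreteness, a SIFT-based similarity score between a synthetic and a real palm image might be computed as below; the abstract does not specify the exact scoring rule, so the good-match ratio here is an illustrative assumption.

```python
# Hedged sketch of a SIFT similarity score via Lowe's ratio test (OpenCV).
import cv2

def sift_similarity(path_a: str, path_b: str) -> float:
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)
    # 2-nearest-neighbour matches filtered by Lowe's ratio test.
    matches = cv2.BFMatcher().knnMatch(des_a, des_b, k=2)
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    # Similarity as the share of keypoints with a confident match.
    return 100.0 * len(good) / max(min(len(kp_a), len(kp_b)), 1)

print(sift_similarity("synthetic_palm.png", "real_palm.png"))  # e.g. ~16%
```

A low score under such a rule indicates the synthetic palms do not simply replicate training images, which is the uniqueness property the paper reports.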
- We develop discrete $W^2_p$-norm error estimates for the Oliker–Prussner method applied to the Monge–Ampère equation. This is obtained by extending discrete Alexandroff estimates and showing that the contact set of a nodal function contains information on its second-order differences. In addition, we show that the size of the complement of the contact set is controlled by the consistency of the method. Combining both observations, we show that the error $\| u - u_h \|_{W^2_{f,p}(\mathcal{N}^I_h)}$ converges with order $O(h^{1/p})$ if $p > d$ and with order $O\big(h^{1/d} (\ln \frac{1}{h})^{1/d}\big)$ if $p \leq d$, where $\|\cdot\|_{W^2_{f,p}(\mathcal{N}^I_h)}$ is a weighted $W^2_p$-type norm and the constant $C>0$ in the estimate depends on $\|u\|_{C^{3,1}(\bar\Omega)}$, the dimension $d$, and the exponent $p$. Numerical examples are given in two space dimensions and confirm that the estimate is sharp in several cases.
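Written out, the two regimes combine into a single estimate; the display below is a restatement of the claim above, with $C$ the constant just described.

```latex
\[
  \| u - u_h \|_{W^2_{f,p}(\mathcal{N}^I_h)} \;\le\;
  \begin{cases}
    C\, h^{1/p}, & p > d, \\[4pt]
    C\, h^{1/d} \bigl( \ln \tfrac{1}{h} \bigr)^{1/d}, & p \le d.
  \end{cases}
\]
```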
- Despite remarkable recent progress on both unconditional and conditional image synthesis, it remains a long-standing problem to learn generative models capable of synthesizing realistic and sharp images from a reconfigurable spatial layout (i.e., bounding boxes + class labels in an image lattice) and style (i.e., structural and appearance variations encoded by latent vectors), especially at high resolution. By reconfigurable, we mean that a model can preserve the intrinsic one-to-many mapping from a given layout to multiple plausible images with different styles, and is adaptive with respect to perturbations of a layout and style latent code. In this paper, we present a layout- and style-based architecture for generative adversarial networks (termed LostGANs) that can be trained end-to-end to generate images from reconfigurable layout and style. Inspired by the vanilla StyleGAN, the proposed LostGAN consists of two new components: (i) learning fine-grained mask maps in a weakly-supervised manner to bridge the gap between layouts and images, and (ii) learning object instance-specific layout-aware feature normalization (ISLA-Norm) in the generator to realize multi-object style generation. In experiments, the proposed method is tested on the COCO-Stuff and Visual Genome datasets, obtaining state-of-the-art performance. The code and pretrained models are available at https://github.com/iVMCL/LostGANs.
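The ISLA-Norm component lends itself to a short sketch: batch-normalize features without affine parameters, then modulate them with per-instance gain and bias spread through the learned mask maps. The shapes, the embedding-based projection, and the random soft masks below are illustrative assumptions, not the released implementation (see the linked repository for that).

```python
# Minimal sketch of instance-specific layout-aware normalization (ISLA-Norm).
import torch
import torch.nn as nn

class ISLANorm(nn.Module):
    def __init__(self, num_features: int, num_classes: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        self.gamma = nn.Embedding(num_classes, num_features)  # per-class gain
        self.beta = nn.Embedding(num_classes, num_features)   # per-class bias

    def forward(self, x, labels, masks):
        # x: (B, C, H, W) features; labels: (B, K) class ids for K instances;
        # masks: (B, K, H, W) soft instance masks learned elsewhere in LostGAN.
        h = self.bn(x)
        # Weight each instance's gain/bias by its mask, then sum over instances.
        g = torch.einsum("bkc,bkhw->bchw", self.gamma(labels), masks)
        b = torch.einsum("bkc,bkhw->bchw", self.beta(labels), masks)
        return h * (1 + g) + b

x = torch.randn(2, 64, 32, 32)
labels = torch.randint(0, 10, (2, 3))            # 3 instances per image
masks = torch.softmax(torch.randn(2, 3, 32, 32), dim=1)
print(ISLANorm(64, 10)(x, labels, masks).shape)  # torch.Size([2, 64, 32, 32])
```

Broadcasting the modulation through masks is what lets each object instance carry its own style while sharing one generator.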
- Linear representation learning is widely studied due to its conceptual simplicity and empirical utility in tasks such as compression, classification, and feature extraction. Given a set of points $[\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] = \mathbf{X} \in \mathbb{R}^{d \times n}$ and a vector $\mathbf{y} \in \mathbb{R}^d$, the goal is to find coefficients $\mathbf{w} \in \mathbb{R}^n$ so that $\mathbf{X}\mathbf{w} \approx \mathbf{y}$, subject to some desired structure on $\mathbf{w}$. In this work we seek a $\mathbf{w}$ that forms a local reconstruction of $\mathbf{y}$ by solving a regularized least squares regression problem, using as the regularization term a locality function that promotes the use of columns of $\mathbf{X}$ that are close to $\mathbf{y}$. We prove that, for all levels of regularization and under the mild condition that the columns of $\mathbf{X}$ have a unique Delaunay triangulation, the number of non-zero entries in the optimal coefficients is upper bounded by $d+1$, thereby providing local sparse solutions when $d \ll n$. Under the same condition we also show that for any $\mathbf{y}$ contained in the convex hull of $\mathbf{X}$ there exists a regime of the regularization parameter in which the optimal coefficients are supported on the vertices of the Delaunay simplex containing $\mathbf{y}$. This interprets the sparsity as structure obtained implicitly from the Delaunay triangulation of $\mathbf{X}$. We demonstrate that our locality-regularized problem can be solved in time comparable to other methods that identify the containing Delaunay simplex.
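A minimal numerical sketch of one plausible instantiation: least squares over the probability simplex with a locality cost linear in $\mathbf{w}$, weighted by the squared distances $\|\mathbf{x}_i - \mathbf{y}\|^2$. The simplex constraint and the exact form of the locality function are assumptions for illustration; the paper's formulation may differ.

```python
# Locality-regularized least squares reconstruction of y from columns of X.
# The simplex constraint (w >= 0, sum w = 1) and the locality term
# sum_i ||x_i - y||^2 w_i are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

def local_reconstruction(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    d, n = X.shape
    c = np.sum((X - y[:, None]) ** 2, axis=0)      # locality costs ||x_i - y||^2
    def objective(w):
        r = X @ w - y
        return r @ r + lam * (c @ w)
    res = minimize(
        objective,
        np.full(n, 1.0 / n),                        # start at the barycentre
        method="SLSQP",
        bounds=[(0.0, None)] * n,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return res.x

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 40))                    # d=2, n=40 points
y = X[:, :3] @ np.array([0.5, 0.3, 0.2])            # y inside a known triangle
w = local_reconstruction(X, y, lam=0.1)
print(np.linalg.norm(X @ w - y))                    # small residual
print((w > 1e-4).sum())                             # few active coefficients
```

Under the paper's conditions one would expect the active set to shrink toward the $d+1$ vertices of the Delaunay simplex containing $\mathbf{y}$ as the regularization enters the right regime.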