<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Sparse Feature Representation Learning for Deep Face Gender Transfer</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>10/01/2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10357947</idno>
					<idno type="doi">10.1109/ICCVW54120.2021.00454</idno>
					<title level='j'>Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Xudong Liu</author><author>Ruizhe Wang</author><author>Hao Peng</author><author>Minglei Yin</author><author>Chih-Fan Chen</author><author>Xin Li</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Figure 1: Real or fake? We have applied the proposed face gender transfer technique to both male and female celebrities. Each row contains five pairs of source (real) and target (fake) images. (Answer: top - female to male transfer, the left image of each pair is real; bottom - male to female transfer, the right image of each pair is real).]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Human faces arguably represent the most important class of stimuli in social interaction. Any normal adult can quickly make approximate judgments about the gender, age, and race of a person even if the face is unfamiliar <ref type="bibr">[29]</ref>. People with certain pathological conditions (e.g., autism) might have difficulty extracting face-related information <ref type="bibr">[32]</ref>. Face perception is a problem of fundamental importance not only to computer vision but also to psychology and neuroscience. Computational studies of face images have attracted increasing attention in recent years, especially due to rapid advances in deep learning, from facial landmark detection <ref type="bibr">[56]</ref> and face recognition (e.g., DeepFace <ref type="bibr">[37]</ref>) to face super-resolution <ref type="bibr">[3]</ref> and face synthesis (e.g., StyleGAN <ref type="bibr">[17]</ref>). Most recently, there has been a flurry of work on novel face-related applications such as beautification <ref type="bibr">[23]</ref> and makeup editing <ref type="bibr">[4]</ref>, facial gesture synthesis <ref type="bibr">[6]</ref>, face aging studies <ref type="bibr">[48,</ref><ref type="bibr">45]</ref> and face gender classification (e.g., <ref type="bibr">[21,</ref><ref type="bibr">30]</ref>).</p><p>Among them, face-related synthesis is particularly interesting thanks to advanced GANs (Generative Adversarial Networks) <ref type="bibr">[10]</ref> and has a wide range of applications in graphics, human-computer interaction and social computing. Novel network architectures such as Cycle-consistent Generative Adversarial Networks (CycleGAN) <ref type="bibr">[57]</ref> and Multimodal Unsupervised Image-to-Image Translation (MUNIT) <ref type="bibr">[14]</ref> have been widely studied for style transfer or translation. 
However, none of the existing unsupervised learning approaches is capable of delivering satisfactory synthesis results for face gender transfer. Separating gender (style) from identity (content) in face images turns out to be nontrivial and has been under-explored in the open literature (the content-style separation problem has only recently been studied, for textual images, in <ref type="bibr">[55]</ref>). In this paper, we propose to cast face gender transfer as a special class of style transfer problems with the additional constraint of identity preservation. With the help of the proposed gender synthesis framework, we can address the problem of gender bias present in many facial image datasets.</p><p>Inspired by the Learned Perceptual Image Patch Similarity (LPIPS) <ref type="bibr">[54]</ref>, we propose to tackle the problem of face gender transfer in the space of deep feature representations rather than face images <ref type="bibr">[4]</ref> or latent representations <ref type="bibr">[14]</ref>. More specifically, we have adopted lightCNN <ref type="bibr">[46]</ref> as the pre-trained network to transform any face image into a 256-dimensional (256D) deep feature representation. Based on the observation that perceptual similarity is an emergent property shared across deep visual representations, we introduce a novel probabilistic gender mask to softly separate the gender from the identity information in the feature space. Accordingly, we have designed a whole class of new loss functions that jointly achieve the objectives of gender transfer and identity preservation. 
It is worth mentioning that unlike face makeup editing <ref type="bibr">[4]</ref>, we aim to learn a pair of symmetric nonlinear mapping functions between the spaces of male and female faces.</p><p>Our main contributions are summarized as follows:</p><p>&#8226; We have developed a novel approach to face gender transfer in a lightCNN pre-trained 256-dimensional deep feature space that preserves the identity information; we have also designed a class of new loss functions based on a probabilistic mask in (0, 1)^256 separating the gender from the identity information. Gender mask learning has led to the discovery of a collection of sparse features (&#8776; 20 out of 256) uniquely responsible for face gender perception;</p><p>&#8226; Our experimental results have demonstrated superiority in both visual quality and representation interpretability over other competing methods including CycleGAN <ref type="bibr">[57]</ref>, MUNIT <ref type="bibr">[14]</ref>, DRIT <ref type="bibr">[20]</ref> and StarGAN <ref type="bibr">[5]</ref>.</p><p>&#8226; We have also empirically verified the effectiveness of deep gender feature representation learning by demonstrating a high correlation between the learned sparse features and the gender information. Our results corroborate a hypothesis about the independence between face recognizability and gender classifiability <ref type="bibr">[29]</ref> in the psychology literature.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Works</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Face Image Synthesis and Image-to-Image Translation</head><p>The capability of deep generative models has dramatically improved thanks to the invention of generative adversarial networks (GAN) <ref type="bibr">[10]</ref>. By concatenating a generator with a discriminator and training them to play a zero-sum game, GAN has achieved impressive results in various image synthesis tasks <ref type="bibr">[7,</ref><ref type="bibr">57,</ref><ref type="bibr">16,</ref><ref type="bibr">17,</ref><ref type="bibr">2,</ref><ref type="bibr">6,</ref><ref type="bibr">18]</ref>. In particular, virtual generation of face images has been studied for photo-sketch synthesis <ref type="bibr">[44]</ref>, face image alignment <ref type="bibr">[46]</ref>, face aging studies <ref type="bibr">[1]</ref> and facial gesture transfer <ref type="bibr">[6]</ref>. GAN and its variants (e.g., conditional GAN <ref type="bibr">[26]</ref>, contextual GAN <ref type="bibr">[22]</ref>, progressive GAN <ref type="bibr">[16]</ref>, StyleGAN <ref type="bibr">[17,</ref><ref type="bibr">18]</ref>) have been among the most popular and successful approaches to face image synthesis. For a recent survey on face image synthesis, please refer to <ref type="bibr">[42]</ref> and its references.</p><p>Among various synthesis tasks, the class of unpaired image-to-image translation is particularly interesting and has many practical applications. Examples of source and target domains include face photos and sketches <ref type="bibr">[31]</ref>, faces of different ages (e.g., <ref type="bibr">[1]</ref>), faces with and without makeup <ref type="bibr">[4]</ref>, and faces with varying expressions <ref type="bibr">[34]</ref>. 
Various GAN-based architectures have been adapted and extended for face image-to-image translation -e.g., conditional GAN <ref type="bibr">[15]</ref>, StarGAN <ref type="bibr">[5]</ref>, DualGAN <ref type="bibr">[50]</ref>, StackGAN <ref type="bibr">[51]</ref>, pairedCycleGAN <ref type="bibr">[4]</ref>, pyramid GAN for age progression <ref type="bibr">[49]</ref>, and ExprGAN for expression editing <ref type="bibr">[8]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Face Gender Recognition and Perception</head><p>Computational modeling of face perception has always been at the intersection of basic (e.g., to explain how we perceive human faces) and applied (e.g., to improve the performance of face recognition) sciences. Early works based on principal component analysis (PCA) have studied both face recognizability <ref type="bibr">[39]</ref> and gender classifiability <ref type="bibr">[29]</ref>. Recent models developed based on deep neural networks have shown a unified solution to both problems of face recognition and gender classification (e.g., <ref type="bibr">[30]</ref>).</p><p>However, there still remains a significant gap between the computational modeling (computer vision community) and the biological modeling (neuroscience and psychology) of face perception. Taking face gender as an example, both computers and humans can effortlessly recognize the gender from a face image, but their underlying computational models or sensory processing mechanisms might be different. Recent works on face image reconstruction from fMRI data <ref type="bibr">[41]</ref> and face space representation in deep CNNs <ref type="bibr">[28]</ref> can be viewed as early attempts to bridge the gap between these communities. In this work, we also attempt to shed light on both the problems of face gender transfer and perception.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Deep Face Gender Transfer</head><p>We formulate the problem of face gender transfer as generating a face image of the opposite gender without changing the identity. It is conceptually similar to unsupervised learning for image-to-image translation <ref type="bibr">[15]</ref> and style transfer <ref type="bibr">[57]</ref>. However, the constraint of preserving the face identity (content) while transferring the gender (style) distinguishes this work from other existing works in the literature. Note that one might argue that gender is part of identity (e.g., soft biometrics); we opt for a narrow-sense definition of identity here (i.e., twins would be treated as the same class due to their almost identical visual appearance). Additionally, a secondary objective of this study is to obtain an interpretable computational model of face gender perception.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Network Architecture Overview</head><p>Let us introduce some notation first. We will denote the two domains by M (Male) and F (Female) respectively and the training samples by</p><p>where f_i &#8712; F. As shown in Fig. <ref type="figure">2</ref>, our model consists of two encoder-decoders serving as cross-domain generators and two discriminators, one for each domain: </p><p>or cross-domain translation</p><p>where we have used F(M), M(F) as shorthand for the gender transfers M &#8594; F, F &#8594; M and &#349;_f, &#349;_m to denote the random perturbation of swapped style codes (for gender transfer) as shown in Fig. <ref type="figure">2</ref>.</p><p>The discriminators D_m, D_f are introduced to ensure that the translated images F(M), M(F) satisfy the gender constraint (i.e., F(M), M(F) do appear like real F/M samples). Similar to GAN <ref type="bibr">[10]</ref>, we have used the following adversarial losses:</p><p>and the corresponding reconstruction losses are defined by</p><p>where ||&#183;||_1 denotes the L_1 norm.</p><p>In MUNIT <ref type="bibr">[14]</ref>, one of the key insights is latent reconstruction (as shown by the rightmost four boxes in Fig. <ref type="figure">2</ref>), i.e., we should be able to reconstruct the latent (style or content) code sampled from the latent distribution after decoding and encoding. Following <ref type="bibr">[14]</ref>, the latent reconstruction losses associated with F(M) are given by</p><p>and similarly we can define L^{c_f}_{REC}, L^{&#349;_m}_{REC} associated with M(F). However, our empirical studies have shown that such latent reconstruction is not powerful enough for the challenging task of face gender transfer. 
Inspired by the success of the Learned Perceptual Image Patch Similarity (LPIPS) <ref type="bibr">[54]</ref>, we propose to design novel loss functions in the space of feature representations (FR) instead of the latent representations adopted by MUNIT <ref type="bibr">[14]</ref>.</p></div>
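The adversarial and L1 reconstruction losses referenced above (whose equations did not survive extraction) follow the standard GAN formulation; a minimal NumPy sketch under that assumption, with all function and array names illustrative rather than taken from the paper:

```python
import numpy as np

def adversarial_loss_d(d_real, d_fake, eps=1e-8):
    """Discriminator side of the GAN loss: push scores for real
    images toward 1 and scores for translated images toward 0."""
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def adversarial_loss_g(d_fake, eps=1e-8):
    """Generator side: translated images should fool the discriminator."""
    return -np.mean(np.log(d_fake + eps))

def l1_reconstruction_loss(x, x_rec):
    """Image reconstruction loss ||x - x_rec||_1 (mean absolute error)."""
    return np.mean(np.abs(x - x_rec))
```

In the full model these terms would be instantiated once per domain (D_m and D_f), as described for F(M) and M(F) above.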
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Feature Representation and Gender Mask</head><p>In this work, we have adopted the recently developed lightCNN <ref type="bibr">[46]</ref> as the pre-trained network to extract a 256-dimensional feature representation from a given image (as shown in Fig. <ref type="figure">3</ref>). A key motivation behind the adoption of lightCNN lies in its capability of learning a compact embedding from the original face space (M, F) &#8712; R^{H&#215;W} to x &#8712; R^{256}, even in the presence of massive noisy labels. Such a compact representation of large-scale face data enables us to enforce more powerful constraints in the feature (instead of image or latent) space. Note that in MUNIT <ref type="bibr">[14]</ref>, the latent reconstruction loss is intrinsically coupled with the encoder-decoder pairs, which limits its role as a regularizer. By contrast, using a pre-trained network as a tool for nonlinear dimensionality reduction greatly facilitates the task of network regularization.</p><p>The other important observation is that no content/style encoder is known to extract the identity/gender information from a face image. The architecture of the content/style encoder in MUNIT <ref type="bibr">[14]</ref> is simply too ad hoc for the task of face gender transfer. Therefore, we propose to learn a probabilistic gender mask w, 0 &#8804; w_i &#8804; 1 (i = 1, 2, ..., 256), to dynamically filter the gender information in our 256D feature representation. More specifically, w_i = 1 - p_i, where p_i denotes the probability that the i-th dimension is gender-relevant (a smaller w_i implies a stronger influence on gender). Note that the update of the mask can be conveniently implemented by the ReLU operator during backpropagation.</p><p>The definition of the gender mask w allows us to simultaneously achieve the objectives of gender transfer and identity preservation, as shown in Fig. <ref type="figure">3</ref>. 
By dividing the 256D feature representation into gender-relevant (w_i &#8594; 0) and identity-relevant (w_i &#8594; 1) components, we can handle the two objectives separately by designing a pair of loss functions in the masked feature space: one ensures that the transferred face image has the opposite gender (i.e., is maximally separated from the original), while the other ensures that the transferred face image remains visually similar to the original (i.e., minimizes the perceptual distortion or maximizes the perceptual similarity <ref type="bibr">[54]</ref>). Based on the above motivations, we proceed with the design of novel loss functions next. With the pre-trained network for feature extraction, the perceptual loss in the image space can be redefined in the feature space as follows:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Novel Loss Functions</head><p>and similarly, we can redefine the similarity between the original and transferred faces in the feature space by:</p><p>3.3.2 Classification Loss Function: Separating Male from Female.</p><p>The objective is to ensure the success of gender transfer, i.e., among the 256D feature representation, the dimensions relevant to gender will be maximally separated after the transfer. Toward this objective, we have adopted a three-layer perceptron for the task of gender classification (due to its simplicity). As mentioned above, gender-relevant feature information is distilled when w_i &#8594; 0. Therefore, it is natural to work with 1 - w when feeding the three-layer perceptron. More specifically, the classification loss function (CLF) for gender transfer is defined as follows:</p><p>where &#8857; denotes element-wise multiplication and the Binary Cross Entropy (BCE) function is defined by</p><p>Figure <ref type="figure">3</ref>: The two pillars of our approach: feature representation and gender mask. A 256D deep feature representation is extracted from a given face image by lightCNN; the gender mask plays the role of killing two birds (gender transfer L^{mf/fm}_{CLF} and identity preservation L^{x,mf/fm}_{mask}) with one stone (w).</p></div>
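The gating-plus-perceptron step above can be sketched in NumPy; the hidden-layer sizes and the weight dictionary `params` are hypothetical choices, not the paper's:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y, eps=1e-8):
    """Binary cross entropy between predicted probability p and label y."""
    return -np.mean(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))

def gender_clf_loss(x, w, params, y):
    """Classification loss on mask-gated features: (1 - w) keeps the
    gender-relevant dimensions (w_i -> 0) and suppresses the rest,
    before a three-layer perceptron predicts the gender label y."""
    h = (1.0 - w) * x                      # element-wise gating
    h = np.maximum(0.0, h @ params["W1"])  # layer 1 + ReLU
    h = np.maximum(0.0, h @ params["W2"])  # layer 2 + ReLU
    p = sigmoid(h @ params["W3"])          # layer 3 -> gender probability
    return bce(p, y)
```

In training this loss would be evaluated on the transferred features so that gender-relevant dimensions move across the decision boundary.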
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.3">Feature Representation Similarity: Preserving the Identity.</head><p>Now the dual objective is to preserve the identity (perceptual similarity) as much as possible regardless of the gender transfer. Based on similar reasoning as before, we conclude that w_i &#8594; 1 indicates the gender-irrelevant entries in the feature representation. As shown in Fig. <ref type="figure">3</ref>, we simply work with w instead of 1 - w for this dual task. More specifically, the perceptual similarity loss function with masked entries is defined in a similar manner to Eqs. ( <ref type="formula">6</ref>) and <ref type="bibr">(7)</ref> as follows</p><p>where &#8857; again denotes element-wise multiplication. Note that Eq. ( <ref type="formula">10</ref>) would be trivially optimized by w = [0, 0, 0, ...] (a pathological case). This is an unfortunate consequence of ignoring important a priori knowledge about the sparse distribution of gender in the feature space, i.e., the L_1 norm of w cannot be too small because we know in advance that the number of feature dimensions associated with gender is much smaller than 256 (our experiments will show later that the dimensionality of the gender subspace is around 20 &#8810; 256).</p><p>To avoid this pitfall, we propose to introduce the following regularization/prior term:</p><p>Joint optimization of Eqs. ( <ref type="formula">10</ref>) and ( <ref type="formula">11</ref>) leads to sparse Bayesian learning of w, which is also coupled with the CLF defined above.</p><p>Putting things together, we have the total loss function:</p><p>where &#955;_mask, &#955;_clf, &#955;_rec are the weights balancing the importance of the different loss terms.</p></div>
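The exact form of the prior in Eq. (11) did not survive extraction; the sketch below assumes a plain L1 sparsity penalty on 1 - w, which matches the stated intent (only a few gender-relevant dimensions, ruling out w = 0), and the λ values are illustrative:

```python
import numpy as np

def mask_prior_loss(w):
    """Assumed sparsity prior on the gender mask: penalize the soft
    count of gender-relevant dimensions sum_i (1 - w_i), so that only
    a small subset of the 256 entries can stay near 0."""
    return np.sum(1.0 - w)

def total_loss(l_adv, l_rec, l_clf, l_mask,
               lam_rec=10.0, lam_clf=1.0, lam_mask=0.1):
    """Weighted sum of the adversarial, reconstruction, classification
    and mask (similarity + prior) terms, mirroring the total objective."""
    return l_adv + lam_rec * l_rec + lam_clf * l_clf + lam_mask * l_mask
```

The CLF term pulls some w_i toward 0 while this prior pulls entries toward 1; their balance is what yields the reported sparse (~20-dimensional) gender subspace.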
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Datasets and Implementation Details</head><p>CelebA <ref type="bibr">[24]</ref> is widely used in various face-related research due to its large number of facial attributes. There are 202,599 images (totaling 10,177 identities), each of which carries 40 face attribute labels including gender information. Different from the original protocol <ref type="bibr">[24]</ref>, we opt to adopt the validation set as our training set and use the original testing set for evaluation in our experiments.</p><p>As shown in Fig. <ref type="figure">2</ref>, the content encoder consists of two stride-2 convolutions and one residual block <ref type="bibr">[11]</ref>, and all convolutional layers are followed by an Instance Normalization (IN) <ref type="bibr">[40]</ref> module; the style encoder E_s contains several strided convolutional layers followed by a global average pooling layer and a fully connected layer. In the generator, Adaptive Instance Normalization <ref type="bibr">[13]</ref> is employed with residual blocks. In addition, a VGG <ref type="bibr">[33]</ref> feature is extracted to maintain perceptual invariance (similar to <ref type="bibr">[14]</ref>). Least-Squares GAN (LSGAN) <ref type="bibr">[25]</ref> and multi-scale discriminator <ref type="bibr">[43]</ref> techniques are used for discriminator training. The popular Adam algorithm <ref type="bibr">[19]</ref> is used as the optimization method with a learning rate of 0.001 and a batch size of 2.</p></div>
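The encoder description above can be sketched in PyTorch as follows; the channel counts, kernel sizes and style-code dimension are our own illustrative choices, not numbers reported in the paper:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block with Instance Normalization, as in the content encoder."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch))
    def forward(self, x):
        return x + self.body(x)

class ContentEncoder(nn.Module):
    """Two stride-2 convolutions followed by one residual block;
    every convolution is followed by Instance Normalization."""
    def __init__(self, in_ch=3, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, ch, 4, stride=2, padding=1),
            nn.InstanceNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1),
            nn.InstanceNorm2d(ch * 2), nn.ReLU(),
            ResBlock(ch * 2))
    def forward(self, x):
        return self.net(x)

class StyleEncoder(nn.Module):
    """Strided convolutions, global average pooling, and a fully
    connected layer producing the style code."""
    def __init__(self, in_ch=3, ch=64, style_dim=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1), nn.ReLU())
        self.fc = nn.Linear(ch * 2, style_dim)
    def forward(self, x):
        h = self.conv(x).mean(dim=(2, 3))  # global average pooling
        return self.fc(h)
```

A 64×64 input yields a 16×16 content map (two stride-2 halvings) and a compact style vector, consistent with the MUNIT-style split described above.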
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Face Gender Transfer Evaluation</head><p>In order to evaluate the proposed method of face gender transfer, we have conducted the following two experiments: 1) a user study on Amazon Mechanical Turk providing subjective quality measurements and 2) calculating the Fr&#233;chet Inception Distance (FID) <ref type="bibr">[12]</ref> as an objective image quality metric. We have compared our approach against three existing methods: CycleGAN <ref type="bibr">[57]</ref>: A cycle consistency loss is proposed to enforce image style transfer between a source domain and a target domain, which lays a solid framework for image-to-image translation using unpaired training data. MUNIT <ref type="bibr">[14]</ref>: A framework for multimodal unsupervised image-to-image translation. It assumes that the image representation can be decomposed into a content code that is domain-invariant and a style code that captures domain-specific properties. By combining a content code with a random style code, MUNIT can generate a variety of outputs from a single input. DRIT <ref type="bibr">[20]</ref>: A network architecture that decomposes the image representation into two subspaces: a domain-invariant content subspace capturing shared information across domains and a domain-specific attribute subspace. By swapping domain-specific representations, the DRIT model is capable of generating diverse outputs and implementing flexible image style transfer in an unsupervised manner using a cross-cycle consistency loss.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.1">User Study</head><p>In our user study, 100 originally-male and 100 originally-female identities were used. We have conducted two sets of surveys focusing on Translation and Similarity respectively. In the Translation surveys, participants were presented with two images of the same identity and asked to choose the one which "looks more like a male/female". One of the two images was generated by our method and the other was generated by one of the three competing methods (CycleGAN, MUNIT, and DRIT). In the Similarity surveys, participants were also presented with two images of the same identity, but were asked to rate "from 0 (extremely different) to 10 (extremely similar), how similar are the two faces". One of the two images was the original and the other was a generated image. The order of presentation and the left/right location of images were fully randomized. In total, there were 600 pairs of images in the Translation surveys and another 600 pairs of images in the Similarity surveys. 25 responses were collected for each pair of images. The results show that our method outperforms the three existing methods in terms of the preference ratio calculated from the Translation surveys. As to the preservation of identity, our method is better than DRIT and MUNIT, but not as good as CycleGAN. These experimental findings seem to suggest that the cycle-consistency loss is beneficial to the task of identity preservation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.2">Fr&#233;chet Inception Distance (FID)</head><p>FID <ref type="bibr">[12]</ref> has been widely used for measuring the perceptual quality of synthetic images such as <ref type="bibr">[2]</ref>. The FID metric is calculated over features extracted from an intermediate layer of the Inception network <ref type="bibr">[36]</ref>. We have conducted an evaluation with FID between the original images and the resulting images after gender transfer. The feature data are modeled by a multivariate Gaussian distribution with mean &#181; and covariance &#931;. The FID value between the real images x and the synthetic images y is given by FID(x, y) = ||&#181;_x - &#181;_y||&#178; + Tr(&#931;_x + &#931;_y - 2(&#931;_x&#931;_y)^(1/2)), where Tr(A) denotes the trace of a square matrix A. Lower FID values imply better image quality. Our approach has achieved the best performance in terms of FID as shown in Table <ref type="table">3</ref>.</p></div>
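The FID formula above can be sketched in NumPy; this version computes Tr((Σ_xΣ_y)^(1/2)) via the eigenvalues of the product rather than a full matrix square root:

```python
import numpy as np

def fid(feat_x, feat_y):
    """Fréchet Inception Distance between two sets of Inception
    features (rows = samples), using the closed-form Gaussian formula
    ||mu_x - mu_y||^2 + Tr(Sig_x + Sig_y - 2 (Sig_x Sig_y)^(1/2))."""
    mu_x, mu_y = feat_x.mean(axis=0), feat_y.mean(axis=0)
    sig_x = np.cov(feat_x, rowvar=False)
    sig_y = np.cov(feat_y, rowvar=False)
    # Tr((Sig_x Sig_y)^(1/2)) from the eigenvalues of the product,
    # which are real and non-negative for covariance matrices
    eigvals = np.linalg.eigvals(sig_x @ sig_y)
    tr_sqrt = np.sum(np.sqrt(np.abs(eigvals.real)))
    diff = mu_x - mu_y
    return diff @ diff + np.trace(sig_x) + np.trace(sig_y) - 2.0 * tr_sqrt
```

Identical feature sets give FID ≈ 0, and a pure mean shift contributes exactly its squared norm, matching the formula term by term.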
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Gender Classifier Evaluation</head><p>To better quantitatively evaluate the gender transfer performance, we have designed an experiment to fool a classifier using translated images. First, we train a gender classifier on real face images, and then measure how often the translated images fool it into predicting the target gender.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Method</head><p>CycleGAN MUNIT DRIT Ours</p><p>fooling rate 72.40% 65.55% 30.08% 78.67%</p><p>The fooling rates are summarized in Table <ref type="table">4</ref>. It can be observed that our approach achieves the highest fooling rate, which justifies its superiority over the other competing approaches.</p></div>
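The fooling rate above is simply the fraction of translated images that the gender classifier assigns to the intended target gender; a minimal sketch (the label arrays are hypothetical):

```python
import numpy as np

def fooling_rate(predicted, target):
    """Fraction of translated images classified as the intended
    (opposite) gender, i.e. the rate at which the pre-trained gender
    classifier is fooled by the translation."""
    predicted = np.asarray(predicted)
    target = np.asarray(target)
    return float(np.mean(predicted == target))
```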
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Gender Feature Representation Correlation Study</head><p>As mentioned before, our architecture learns an adaptive mask to separate gender-relevant information from the identity representation of face images. Our objective is not only to show superiority in terms of gender style transfer but also to provide an explainable solution (so-called interpretable machine learning <ref type="bibr">[27]</ref> or explainable AI) for learning gender-related representations from face images.</p><p>To gain deeper insight into the learned gender-related representations, we have designed the following experiment with a linear SVM classifier to demonstrate that the deep facial features selected by the probabilistic gender mask w in Sec. 3.2 have a strong correlation with the actual gender attribute. In our experiment, we have compared two different schemes of feature selection: one based on the learned probabilistic gender mask and the other on random sampling (i.e., randomly selecting from the 256-dimensional feature). As can be seen from Figure <ref type="figure">5</ref>, when the number of selected features is within the range of <ref type="bibr">[5,</ref><ref type="bibr">75]</ref>, the classification accuracy of the learned gender mask is much higher than that of randomly selected features. This experimental result provides strong supporting evidence of the high correlation between the learned deep facial features and the gender facial attribute. On the other hand, the interpretability of convolutional neural networks (CNNs) <ref type="bibr">[52]</ref> has only recently received attention from the computer vision community. We argue that making the learned representation interpretable is important not only for breaking the bottlenecks of deep learning <ref type="bibr">[53]</ref> but also for facilitating communication between the computer vision and cognitive science communities. 
Our result supports a well-known hypothesis in psychology -the independence between face recognizability and gender classifiability <ref type="bibr">[29]</ref>.</p></div>
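The mask-vs-random feature-selection comparison can be reproduced in miniature on synthetic data; a nearest-centroid classifier stands in for the linear SVM here, and every number (20 signal dimensions, the separation strength) is an illustrative assumption rather than a value from the paper:

```python
import numpy as np

def select_by_mask(x, w, k):
    """Keep the k dimensions with the smallest mask values, i.e. the
    ones the learned gender mask marks as most gender-relevant."""
    idx = np.argsort(w)[:k]
    return x[:, idx]

def nearest_centroid_acc(train_x, train_y, test_x, test_y):
    """Accuracy of a nearest-centroid gender classifier (a simple
    stand-in for the linear SVM used in the correlation study)."""
    c0 = train_x[train_y == 0].mean(axis=0)
    c1 = train_x[train_y == 1].mean(axis=0)
    d0 = np.linalg.norm(test_x - c0, axis=1)
    d1 = np.linalg.norm(test_x - c1, axis=1)
    pred = (d1 < d0).astype(int)
    return float(np.mean(pred == test_y))
```

Selecting dimensions flagged by the mask concentrates the gender signal, so accuracy on the selected subset is much higher than on a signal-free subset of the same size, mirroring Figure 5.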
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5.">Limitations and Discussions</head><p>Though our model is able to capture the key characteristics of gender information and achieves appealing results in face gender transfer, some failure cases do exist. From our experiments, eyeglasses, large pose variation, and extreme age groups (refer to Fig. <ref type="figure">6</ref>) are typical cases where our generator cannot perform a quality transfer. The failure examples can be classified into two categories. The first is image-quality based, such as occlusion (including eyeglasses and hair) and large pose variation; these challenges remain problematic in the computer vision community for both image generation and recognition tasks, and addressing them is our next step. The second is age-related (e.g., subjects who are very old or very young); we argue this is because they exhibit fewer gender-related features compared to typical adults. Furthermore, we also observe that our model is not robust enough to generate some fine details of the face, e.g., eyebrows and face symmetry, as shown in Fig. <ref type="figure">7</ref>. Generating such realistic details remains the most challenging task for current GAN models, including state-of-the-art generation models <ref type="bibr">[17,</ref><ref type="bibr">18]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head><p>In this paper, we have presented a novel GAN-based face gender translation architecture with sparse representation learning. Our model not only generates high-quality face synthesis for gender transfer but also learns a compact gender-related representation in the deep facial feature space. It is a first attempt at the problem of gender representation interpretation with a GAN-based model. We believe the proposed method can serve as a practical solution to address the gender bias issue commonly present in many public facial image datasets for various face recognition tasks <ref type="bibr">[38,</ref><ref type="bibr">35,</ref><ref type="bibr">47,</ref><ref type="bibr">9]</ref>.</p></div>
		</text>
</TEI>
