Title: A Deep Face Identification Network Enhanced by Facial Attributes Prediction
In this paper, we propose a new deep framework that predicts facial attributes and leverages them as a soft modality to improve face identification performance. Our model is an end-to-end framework consisting of a convolutional neural network (CNN) whose output is fanned out into two separate branches; the first branch predicts facial attributes while the second branch identifies face images. In contrast to existing multi-task methods, which only use a shared CNN feature space to train these two tasks jointly, we fuse the predicted attributes with the features from the face modality in order to improve face identification performance. Experimental results show that our model benefits both face identification and facial attribute prediction, especially for identity-related facial attributes such as gender. We tested our model on two standard datasets annotated with identities and facial attributes. Experimental results indicate that the proposed model outperforms most existing face identification and attribute prediction methods.
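The abstract describes a shared CNN whose output feeds an attribute-prediction branch and an identification branch, with the predicted attributes fused back into the face features before identification. The following PyTorch-style sketch illustrates that two-branch layout under stated assumptions: the backbone layers, the dimensions, and the use of concatenation as the fusion operator are illustrative choices, not details taken from the paper.

```python
# Minimal sketch (assumptions: layer sizes, backbone, and concatenation as the
# fusion operator are illustrative; they are not taken from the paper).
import torch
import torch.nn as nn

class AttributeEnhancedFaceID(nn.Module):
    def __init__(self, num_attrs=40, num_identities=10000, feat_dim=512):
        super().__init__()
        # Shared CNN backbone producing one feature vector per face image.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim), nn.ReLU(),
        )
        # Branch 1: facial attribute prediction (multi-label logits).
        self.attr_head = nn.Linear(feat_dim, num_attrs)
        # Branch 2: identification, fed with the face features fused with
        # the predicted attributes (concatenation in this sketch).
        self.id_head = nn.Linear(feat_dim + num_attrs, num_identities)

    def forward(self, x):
        feat = self.backbone(x)
        attr_logits = self.attr_head(feat)
        fused = torch.cat([feat, torch.sigmoid(attr_logits)], dim=1)
        id_logits = self.id_head(fused)
        return attr_logits, id_logits
```

Training such a model would typically combine a multi-label attribute loss (e.g., binary cross-entropy) with an identification loss (e.g., softmax cross-entropy); the paper's actual losses and fusion scheme are not reproduced here.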
Award ID(s):
1650474
PAR ID:
10091246
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Page Range / eLocation ID:
666 to 6667
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. With the benefits of fast query speed and low storage cost, hashing-based image retrieval approaches have garnered considerable attention from the research community. In this paper, we propose a novel Error-Corrected Deep Cross Modal Hashing (CMH-ECC) method which uses a bitmap specifying the presence of certain facial attributes as an input query to retrieve relevant face images from the database. In this architecture, we generate compact hash codes using an end-to-end deep learning module, which effectively captures the inherent relationships between the face and attribute modalities. We also integrate our deep learning module with forward error correction codes to further reduce the distance between different modalities of the same subject. Specifically, the properties of deep hashing and forward error correction codes are exploited to design a cross modal hashing framework with high retrieval performance. Experimental results using two standard datasets with facial attribute and image modalities indicate that our CMH-ECC face image retrieval model outperforms most current attribute-based face image retrieval approaches. (An illustrative sketch of this attribute-to-image hashing setup appears after this list.)
  2. Facial attribute recognition is conventionally computed from a single image. In practice, each subject may have multiple face images. Taking eye size as an example, it should not change across images of the same subject, yet it may be estimated differently in each of them, which can negatively impact face recognition. Thus, computing these attributes per subject rather than per image is an important problem. To address it, we deploy deep training for facial attribute prediction and explore the inconsistency among the attributes computed from individual images. We then develop two approaches to resolve this inconsistency. Experimental results show that the proposed methods can handle facial attribute estimation on either multiple still images or video frames, and can correct incorrectly annotated labels. The experiments are conducted on two large public databases with annotations of facial attributes.
  3. Face sketch-photo synthesis is a critical application in law enforcement and the digital entertainment industry. Despite significant improvements in sketch-to-photo synthesis techniques, existing methods still have serious practical limitations, such as the need for paired data in the training phase or the lack of control over the facial attributes of the synthesized image. In this work, we present a new framework, a conditional version of CycleGAN conditioned on facial attributes. The proposed network enforces facial attributes, such as skin and hair color, on the synthesized photo and does not need a set of aligned face-sketch pairs during training. We evaluate the proposed network by training on two real and synthetic sketch datasets. The hand-sketch images of the FERET dataset and the color face images from the WVU Multi-modal dataset are used as unpaired input to the proposed conditional CycleGAN, with skin color as the controlled face attribute. For further attribute-guided evaluation, a synthetic sketch dataset is created from the CelebA dataset and used to evaluate the performance of the network by enforcing several desired facial attributes on the synthesized faces. (A minimal sketch of attribute-conditioned generation appears after this list.)
  4. In this paper, we present a new multi-branch neural network that simultaneously performs soft biometric (SB) prediction as an auxiliary modality and face recognition (FR) as the main task. Our proposed network, named AAFace, utilizes SB attributes to enhance the discriminative ability of the FR representation. To achieve this goal, we propose an attribute-aware attentional integration (AAI) module that performs weighted integration of FR and SB feature maps. Our proposed AAI module is not only fully context-aware but also capable of learning complex relationships between input features by means of sequential multi-scale channel and spatial sub-modules. Experimental results verify the superiority of our proposed network compared with state-of-the-art (SoTA) SB prediction and FR methods. (An illustrative sketch of such attribute-aware integration appears after this list.)
  5. We introduce caption-guided face recognition (CGFR) as a new framework to improve the performance of commercial-off-the-shelf (COTS) face recognition (FR) systems. In contrast to combining soft biometrics (e.g., facial marks, gender, and age) with face images, in this work we use facial descriptions provided by face examiners as auxiliary information. However, due to the heterogeneity of the modalities, improving performance by directly fusing the textual and facial features is very challenging, as the two lie in different embedding spaces. In this paper, we propose a contextual feature aggregation module (CFAM) that addresses this issue by effectively exploiting the fine-grained word-region interaction and the global image-caption association. Specifically, CFAM adopts a self-attention and a cross-attention scheme to improve the intra-modality and inter-modality relationships between the image and textual features, respectively. Additionally, we design a textual feature refinement module (TFRM) that refines the textual features of the pre-trained BERT encoder by updating the contextual embeddings. This module enhances the discriminative power of the textual features with a cross-modal projection loss and realigns the word and caption embeddings with the visual features by incorporating a visual-semantic alignment loss. We implemented the proposed CGFR framework on two face recognition models (ArcFace and AdaFace) and evaluated its performance on the Multi-Modal CelebA-HQ dataset. Our framework significantly improves the performance of ArcFace in both the 1:1 verification and 1:N identification protocols. (An illustrative sketch of the cross-attention fusion idea appears after this list.)
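For item 1 above (CMH-ECC), the core retrieval idea is that an attribute bitmap and the database face images are both mapped to compact hash codes and compared by Hamming distance. The sketch below illustrates only the query-side encoder and the ranking step; the code length, network shapes, and the omission of the forward-error-correction decoding stage are assumptions made for brevity.

```python
# Minimal sketch of attribute-to-image hashing retrieval (item 1); shapes and
# code length are illustrative assumptions, not the CMH-ECC architecture.
import torch
import torch.nn as nn

CODE_BITS = 64  # assumed hash-code length

class AttributeHashEncoder(nn.Module):
    """Maps a binary attribute bitmap (query modality) to a relaxed hash code."""
    def __init__(self, num_attrs=40):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(num_attrs, 256), nn.ReLU(),
                                 nn.Linear(256, CODE_BITS), nn.Tanh())

    def forward(self, a):
        return self.net(a)  # values in [-1, 1]; sign() yields the binary code

def hamming_rank(query_code, db_codes):
    """Rank database face images by Hamming distance to the query code."""
    q = torch.sign(query_code)      # (CODE_BITS,)
    db = torch.sign(db_codes)       # (N, CODE_BITS) precomputed image codes
    dist = (q != db).sum(dim=1)     # Hamming distance per database item
    return torch.argsort(dist)      # indices of the nearest face images first
```

In the full method, a corresponding image encoder is trained jointly with the attribute encoder, and error-correcting codes are applied to pull codes of the same subject closer across modalities; neither is reproduced here.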
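For item 3 above (the attribute-conditioned CycleGAN), one common way to condition a generator on facial attributes is to tile the attribute vector and concatenate it to the input as extra channels. The sketch below shows that conditioning pattern; it is an assumption for illustration, not necessarily how the paper injects attributes, and the tiny generator stands in for a full CycleGAN generator trained with cycle-consistency and adversarial losses.

```python
# Minimal sketch of attribute conditioning for a sketch-to-photo generator
# (item 3): the attribute vector is tiled and concatenated to the input as
# extra channels. This is an assumed conditioning scheme, not the paper's.
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, num_attrs=3):  # e.g., skin/hair color flags (assumed)
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1 + num_attrs, 64, 7, padding=3), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 7, padding=3), nn.Tanh(),  # RGB photo in [-1, 1]
        )

    def forward(self, sketch, attrs):
        # sketch: (B, 1, H, W) grayscale sketch; attrs: (B, num_attrs) in {0, 1}
        b, _, h, w = sketch.shape
        attr_maps = attrs.view(b, -1, 1, 1).expand(b, attrs.shape[1], h, w)
        return self.net(torch.cat([sketch, attr_maps], dim=1))
```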
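For item 4 above (the AAI module in AAFace), the abstract describes weighted integration of FR and SB feature maps through sequential channel and spatial sub-modules. The sketch below uses a CBAM-style channel gate followed by a spatial gate as a stand-in; the exact gating, scales, and fusion of the published module are not reproduced here.

```python
# Minimal sketch of attribute-aware integration of face-recognition (FR) and
# soft-biometric (SB) feature maps (item 4). The CBAM-style attention used
# here is an assumption, not the published AAI module.
import torch
import torch.nn as nn

class AttributeAwareIntegration(nn.Module):
    def __init__(self, channels=256, reduction=16):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(2 * channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, fr_feat, sb_feat):
        # fr_feat, sb_feat: (B, C, H, W) feature maps from the two branches.
        fused = torch.cat([fr_feat, sb_feat], dim=1)
        ch = self.channel_gate(fused).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        out = fr_feat * ch                                         # channel reweighting
        sp = self.spatial_gate(torch.cat([out.mean(1, keepdim=True),
                                          out.max(1, keepdim=True).values], dim=1))
        return out * sp                                            # spatial reweighting
```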
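For item 5 above (CGFR/CFAM), the abstract describes self-attention within each modality and cross-attention between word and region features. The sketch below shows that interaction pattern with nn.MultiheadAttention; the dimensions, the pooling, and the reduction to a single cross-attention call are simplifying assumptions rather than the published CFAM.

```python
# Minimal sketch of fusing caption and face features via attention (item 5):
# region features are refined with self-attention, then attend to the caption's
# word embeddings via cross-attention. Dimensions and pooling are assumptions.
import torch
import torch.nn as nn

class CrossModalAggregation(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, region_feats, word_feats):
        # region_feats: (B, R, dim) image-region features from the FR backbone
        # word_feats:   (B, W, dim) contextual word embeddings of the caption
        r, _ = self.self_attn(region_feats, region_feats, region_feats)  # intra-modality
        fused, _ = self.cross_attn(r, word_feats, word_feats)            # word-region interaction
        return fused.mean(dim=1)  # pooled caption-aware face descriptor
```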