Title: Pose Attention-Guided Profile-to-Frontal Face Recognition
In recent years, face recognition systems have achieved exceptional success thanks to promising advances in deep learning architectures. However, they still fail to achieve the expected accuracy when matching profile images against a gallery of frontal images. Current approaches either perform pose normalization (i.e., frontalization) or disentangle pose information for face recognition. We instead propose a new approach that utilizes pose as auxiliary information via an attention mechanism. In this paper, we hypothesize that pose-attended information obtained through an attention mechanism can guide contextual and distinctive feature extraction from profile faces, which in turn benefits representation learning in an embedded domain. To achieve this, we first design a unified coupled profile-to-frontal face recognition network. It learns a mapping from faces to a compact embedding subspace via a class-specific contrastive loss. Second, we develop a novel pose attention block (PAB) to guide pose-agnostic feature extraction from profile faces. More specifically, PAB is designed to explicitly help the network focus on important features along both the channel and spatial dimensions while learning discriminative yet pose-invariant features in an embedding subspace. To validate the effectiveness of the proposed method, we conduct experiments on both controlled and in-the-wild benchmarks, including Multi-PIE, CFP, and IJB-C, and show superiority over the state of the art.
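The abstract does not spell out the PAB's internal layer design. As one hedged illustration, the PyTorch sketch below implements a channel-then-spatial attention block of the kind described; the reduction ratio, pooling choices, and 7x7 spatial convolution follow a common CBAM-style pattern and are assumptions, not the paper's implementation.

import torch
import torch.nn as nn

class PoseAttentionBlock(nn.Module):
    """Illustrative channel-then-spatial attention block (assumed design)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, excite per-channel weights.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: a 7x7 conv over pooled channel statistics.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel dimension: weight each feature map by pooled statistics.
        avg = x.mean(dim=(2, 3))
        mx = x.amax(dim=(2, 3))
        ch = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        x = x * ch.view(b, c, 1, 1)
        # Spatial dimension: weight each location by a learned saliency map.
        avg_sp = x.mean(dim=1, keepdim=True)
        max_sp = x.amax(dim=1, keepdim=True)
        sp = torch.sigmoid(self.spatial_conv(torch.cat([avg_sp, max_sp], dim=1)))
        return x * sp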
Award ID(s):
1650474
PAR ID:
10401301
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE Int. Joint Conference on Biometrics (IJCB'22)
Page Range / eLocation ID:
1 to 10
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
1. In this paper, we seek to draw connections between frontal and profile face images in an abstract embedding space. We exploit this connection using a coupled-encoder network that projects frontal and profile face images into a common latent embedding space. The proposed model enforces similarity of representations in the embedding space by maximizing the mutual information between two views of the face. The proposed coupled encoder benefits from three contributions for matching faces with extreme pose disparities. First, we leverage pose-aware contrastive learning to maximize the mutual information between frontal and profile representations of identities. Second, a memory buffer, which consists of latent representations accumulated over past iterations, is integrated into the model so that it can refer to many more instances than the minibatch size allows. Third, a novel pose-aware adversarial domain adaptation method forces the model to learn an asymmetric mapping from profile to frontal representations. In our framework, the coupled encoder learns to enlarge the margin between the distributions of genuine and impostor faces, which results in high mutual information between different views of the same identity. The effectiveness of the proposed model is investigated through extensive experiments, evaluations, and ablation studies on four benchmark datasets, and through comparisons with compelling state-of-the-art algorithms.
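A hedged sketch of what a contrastive objective with a memory buffer can look like in PyTorch: the InfoNCE form, the temperature, and the queue interface below are illustrative assumptions, not the paper's exact pose-aware loss.

import torch
import torch.nn.functional as F

def contrastive_loss_with_queue(frontal, profile, queue, temperature=0.07):
    """InfoNCE-style loss: the profile embedding of an identity is the positive
    for its frontal embedding; queued past embeddings serve as extra negatives.
    frontal, profile: (B, D) embeddings of the same identities, row-aligned.
    queue: (K, D) embeddings accumulated over past iterations (K >> B)."""
    q = F.normalize(frontal, dim=1)
    k = F.normalize(profile, dim=1)
    n = F.normalize(queue, dim=1)
    pos = (q * k).sum(dim=1, keepdim=True)      # (B, 1) genuine-pair logits
    neg = q @ n.t()                             # (B, K) impostor logits
    logits = torch.cat([pos, neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    # The genuine pair (column 0) must out-score every queued negative.
    return F.cross_entropy(logits, labels)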
2. In this paper, we present a deep coupled learning framework to address the problem of matching polarimetric thermal face photos against a gallery of visible faces. Polarization-state information of thermal faces provides the missing textural and geometric details of thermal face imagery that exist in the visible spectrum. We propose a coupled deep neural network architecture that leverages relatively large visible and thermal datasets to overcome the problem of overfitting, and we then train it on a polarimetric thermal face dataset, the first of its kind. The proposed architecture is able to make full use of the polarimetric thermal information to train a deep model, in contrast to conventional shallow thermal-to-visible face recognition methods. The proposed coupled deep neural network also finds global discriminative features in a nonlinear embedding space that relate the polarimetric thermal faces to their corresponding visible faces. The results show the superiority of our method over state-of-the-art thermal-to-visible face recognition algorithms.
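To make the coupled architecture concrete, here is a rough PyTorch sketch of two modality-specific encoders projecting into a shared embedding space, with a simple contrastive coupling loss. The tiny backbones, the four-channel polarimetric input, the embedding size, and the margin are illustrative placeholders, not the paper's network.

import torch
import torch.nn as nn
import torch.nn.functional as F

def _encoder(in_ch: int, dim: int) -> nn.Sequential:
    """Stand-in backbone; the real model would use a much deeper CNN."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
    )

class CoupledEncoder(nn.Module):
    """Visible and polarimetric-thermal encoders sharing one embedding space."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.visible = _encoder(3, dim)   # RGB input
        self.thermal = _encoder(4, dim)   # assumed 4-channel polarimetric stack

    def forward(self, vis: torch.Tensor, thm: torch.Tensor):
        # L2-normalize so cosine similarity is comparable across modalities.
        return (F.normalize(self.visible(vis), dim=1),
                F.normalize(self.thermal(thm), dim=1))

def coupling_loss(zv: torch.Tensor, zt: torch.Tensor, margin: float = 0.5):
    """Pull genuine cross-modal pairs (same row) together; push impostor pairs
    (different rows) below a cosine-similarity margin."""
    sim = zv @ zt.t()                                   # (B, B) similarities
    pos = (1.0 - sim.diag()).mean()                     # genuine pairs -> sim 1
    mask = ~torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    neg = (sim[mask] - (1.0 - margin)).clamp(min=0).mean()
    return pos + neg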
3. Although face recognition (FR) has achieved great success in recent years, it is still challenging to accurately recognize faces in low-quality images due to obscured facial details. Nevertheless, it is often feasible to make predictions about specific soft biometric (SB) attributes, such as gender, age, and baldness, even when dealing with low-quality images. In this paper, we propose a novel multi-branch neural network that leverages SB attribute information to boost the performance of FR. To this end, we propose a cross-attribute-guided transformer fusion (CATF) module that effectively captures the long-range dependencies and relationships between FR and SB feature representations. The synergy created by the reciprocal flow of information in the dual cross-attention operations of the proposed CATF module enhances the performance of FR. Furthermore, we introduce a novel self-attention distillation framework that effectively highlights crucial facial regions, such as landmarks, by aligning low-quality images with their high-quality counterparts in the feature space. The proposed self-attention distillation regularizes our network to learn a unified quality-invariant feature representation in unconstrained environments. We conduct extensive experiments on various real-world FR benchmarks of varying quality. Experimental results demonstrate the superiority of our FR method compared to state-of-the-art FR studies.
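As a rough illustration of dual cross-attention between two feature streams, the PyTorch sketch below lets FR tokens attend to SB tokens and vice versa before fusing the two refined streams. The dimensions, head count, and mean-pool fusion are assumptions, not the published CATF design.

import torch
import torch.nn as nn

class DualCrossAttentionFusion(nn.Module):
    """Reciprocal cross-attention between FR and SB token streams (assumed)."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.fr_to_sb = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.sb_to_fr = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, fr_tokens: torch.Tensor, sb_tokens: torch.Tensor):
        # Each stream queries the other, capturing long-range cross-feature
        # dependencies in both directions.
        fr_ref, _ = self.fr_to_sb(fr_tokens, sb_tokens, sb_tokens)
        sb_ref, _ = self.sb_to_fr(sb_tokens, fr_tokens, fr_tokens)
        # Mean-pool each refined stream and fuse into one recognition feature.
        fused = torch.cat([fr_ref.mean(dim=1), sb_ref.mean(dim=1)], dim=1)
        return self.fuse(fused)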
4. We introduce caption-guided face recognition (CGFR) as a new framework to improve the performance of commercial-off-the-shelf (COTS) face recognition (FR) systems. In contrast to combining soft biometrics (e.g., facial marks, gender, and age) with face images, in this work we use facial descriptions provided by face examiners as auxiliary information. However, due to the heterogeneity of the modalities, improving performance by directly fusing textual and facial features is very challenging, as the two lie in different embedding spaces. In this paper, we propose a contextual feature aggregation module (CFAM) that addresses this issue by effectively exploiting fine-grained word-region interactions and the global image-caption association. Specifically, CFAM adopts a self-attention and a cross-attention scheme to improve the intra-modality and inter-modality relationships between the image and textual features. Additionally, we design a textual feature refinement module (TFRM) that refines the textual features of the pre-trained BERT encoder by updating the contextual embeddings. This module enhances the discriminative power of the textual features with a cross-modal projection loss and realigns the word and caption embeddings with the visual features by incorporating a visual-semantic alignment loss. We implemented the proposed CGFR framework on two face recognition models (ArcFace and AdaFace) and evaluated its performance on the Multimodal CelebA-HQ dataset. Our framework improves the TPR@FPR=1e-4 of ArcFace from 16.75% to 66.83% in the 1:1 verification protocol.
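One common reading of a cross-modal projection loss is sketched below: normalized text and image embeddings of matching identities are pulled together by matching a softmax over in-batch image candidates to a label-derived target distribution. The KL formulation, temperature, and label interface are illustrative assumptions, not the paper's exact objective.

import torch
import torch.nn.functional as F

def cross_modal_projection_loss(text_emb, image_emb, labels, temperature=0.07):
    """Align text embeddings with image embeddings of the same identity.
    text_emb, image_emb: (B, D); labels: (B,) integer identity labels."""
    t = F.normalize(text_emb, dim=1)
    v = F.normalize(image_emb, dim=1)
    logits = t @ v.t() / temperature                 # (B, B) text->image scores
    # Target: uniform probability over all in-batch images of the same identity.
    same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    target = same / same.sum(dim=1, keepdim=True)
    return F.kl_div(F.log_softmax(logits, dim=1), target, reduction="batchmean")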
5. In this paper, we propose a convolutional neural network (CNN) based, scenario-dependent and sensor (mobile device) adaptable hierarchical classification framework. Our proposed framework is designed to automatically categorize face data captured under various challenging conditions before the FR algorithms (pre-processing, feature extraction, and matching) are applied. First, a unique multi-sensor database is collected (using Samsung S4 Zoom, Nokia 1020, iPhone 5S, and Samsung S5 phones) containing face images captured indoors and outdoors, with yaw angles from -90° to +90°, and at two different distances, i.e., 1 and 10 meters. To cope with pose variations, face detection and pose estimation algorithms are used to classify the facial images into a frontal or a non-frontal class. Next, our proposed framework performs tri-level hierarchical classification as follows: at Level 1, face images are classified by phone type; at Level 2, face images are further classified into indoor and outdoor images; and finally, at Level 3, face images are classified into close-distance (1 m) and far, low-quality (10 m) categories. Experimental results show that classification accuracy is scenario dependent, ranging from 95% to more than 98% for Level 2 and from 90% to more than 99% for Level 3 classification. A set of experiments indicates that grouping the data before face matching yields a significantly improved rank-1 identification rate compared to the original (all-vs-all) biometric system.
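The tri-level grouping step can be summarized in a short routing sketch; the classifier and matcher interfaces below are hypothetical placeholders, assumed for illustration only.

def route_and_match(face_image, probe_embedding, classifiers, matchers):
    """Group a probe by scenario before matching, as described above.
    classifiers: dict with 'phone', 'environment', and 'distance' callables,
                 each mapping an image to a discrete label.
    matchers: dict keyed by (phone, environment, distance), each a callable
              that searches only that scenario's gallery."""
    phone = classifiers["phone"](face_image)              # Level 1: device type
    environment = classifiers["environment"](face_image)  # Level 2: indoor/outdoor
    distance = classifiers["distance"](face_image)        # Level 3: 1 m vs. 10 m
    # Scenario-specific matching replaces the all-vs-all search, which is what
    # improved the rank-1 identification rate in the reported experiments.
    return matchers[(phone, environment, distance)](probe_embedding)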