Title: Learning Deep Features for Hierarchical Classification of Mobile Phone Face Datasets in Heterogeneous Environments
In this paper, we propose a convolutional neural network (CNN) based, scenario-dependent and sensor (mobile device) adaptable hierarchical classification framework. Our proposed framework is designed to automatically categorize face data captured under various challenging conditions before the face recognition (FR) algorithms (pre-processing, feature extraction, and matching) are applied. First, a unique multi-sensor database (using Samsung S4 Zoom, Nokia 1020, iPhone 5S, and Samsung S5 phones) is collected, containing face images captured indoors and outdoors, with yaw angles from -90° to +90°, and at two different distances, i.e., 1 and 10 meters. To cope with pose variations, face detection and pose estimation algorithms are used to classify the facial images into a frontal or a non-frontal class. Next, our proposed framework performs tri-level hierarchical classification as follows: at Level 1, face images are classified by phone type; at Level 2, they are further classified into indoor and outdoor images; and finally, at Level 3, they are classified into close-distance (1 m) and far, low-quality (10 m) categories. Experimental results show that classification accuracy is scenario dependent, ranging from 95% to more than 98% for Level 2 and from 90% to more than 99% for Level 3 classification. A set of experiments indicates that grouping the data in this way before face matching yields a significantly improved rank-1 identification rate compared to the original (all vs. all) biometric system.
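As a rough illustration of the tri-level cascade described above, the following minimal PyTorch sketch routes a face image through phone-type, indoor/outdoor, and distance classifiers in sequence. The CNN architecture, input size, and per-level classifier layout are illustrative assumptions, not the networks used in the paper.

```python
# Minimal sketch of a tri-level hierarchical classification cascade.
# Architecture and class counts are illustrative assumptions.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Illustrative CNN classifier used at each level of the hierarchy."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Level 1: phone type (4 devices); Level 2: indoor/outdoor per phone;
# Level 3: close (1 m) vs. far (10 m) per phone/environment pair.
level1 = SmallCNN(num_classes=4)
level2 = {phone: SmallCNN(num_classes=2) for phone in range(4)}
level3 = {(p, env): SmallCNN(num_classes=2) for p in range(4) for env in range(2)}

def classify(face: torch.Tensor):
    """Route one face image (1 x 3 x H x W) down the three-level cascade."""
    phone = level1(face).argmax(1).item()
    env = level2[phone](face).argmax(1).item()
    dist = level3[(phone, env)](face).argmax(1).item()
    return phone, env, dist

example = torch.randn(1, 3, 64, 64)
print(classify(example))  # (phone index, 0=indoor/1=outdoor, 0=close/1=far)
```

After grouping, matching would be run only within the predicted scenario bucket rather than all vs. all.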
Award ID(s):
1650474 1066197
PAR ID:
10053529
Author(s) / Creator(s):
Date Published:
Journal Name:
12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017)
Page Range / eLocation ID:
186 to 193
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we seek to draw connections between frontal and profile face images in an abstract embedding space. We exploit this connection using a coupled-encoder network to project frontal/profile face images into a common latent embedding space. The proposed model forces the similarity of representations in the embedding space by maximizing the mutual information between two views of the face. The proposed coupled-encoder benefits from three contributions for matching faces with extreme pose disparities. First, we leverage our pose-aware contrastive learning to maximize the mutual information between frontal and profile representations of identities. Second, a memory buffer, which consists of latent representations accumulated over past iterations, is integrated into the model so it can refer to far more instances than the minibatch size allows. Third, a novel pose-aware adversarial domain adaptation method forces the model to learn an asymmetric mapping from profile to frontal representation. In our framework, the coupled-encoder learns to enlarge the margin between the distributions of genuine and impostor faces, which results in high mutual information between different views of the same identity. The effectiveness of the proposed model is investigated through extensive experiments, evaluations, and ablation studies on four benchmark datasets, and comparisons with compelling state-of-the-art algorithms.
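The pose-aware contrastive objective with a memory buffer can be illustrated with a MoCo-style InfoNCE loss, sketched below in PyTorch. The temperature, buffer size, and exact loss form are assumptions, not the paper's precise formulation.

```python
# Sketch: InfoNCE-style contrastive loss between frontal and profile
# embeddings, with a memory queue of past representations as negatives.
import torch
import torch.nn.functional as F

def pose_contrastive_loss(frontal, profile, queue, temperature=0.07):
    """frontal, profile: (B, D) L2-normalized embeddings of the same identities.
    queue: (K, D) normalized embeddings accumulated over past iterations."""
    pos = (frontal * profile).sum(dim=1, keepdim=True)       # (B, 1) positives
    neg = frontal @ queue.t()                                # (B, K) negatives
    logits = torch.cat([pos, neg], dim=1) / temperature
    labels = torch.zeros(frontal.size(0), dtype=torch.long)  # positive at index 0
    return F.cross_entropy(logits, labels)

B, D, K = 8, 128, 1024
f = F.normalize(torch.randn(B, D), dim=1)
p = F.normalize(torch.randn(B, D), dim=1)
q = F.normalize(torch.randn(K, D), dim=1)
loss = pose_contrastive_loss(f, p, q)
```

Minimizing this loss pushes the two views of each identity together while pushing them away from the queued negatives, which is one standard way to lower-bound the mutual information between views.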
  2. Performing a direct match between images from different spectra (i.e., passive infrared and visible) is challenging because each spectrum contains different information pertaining to the subject’s face. In this work, we investigate the benefits and limitations of using synthesized visible face images from thermal ones and vice versa in cross-spectral face recognition systems. For this purpose, we propose utilizing canonical correlation analysis (CCA) and manifold learning dimensionality reduction (LLE). There are four primary contributions of this work. First, we formulate the cross-spectral heterogeneous face matching problem (visible to passive IR) using an image synthesis framework. Second, a new processed database composed of two datasets, each consisting of separate controlled frontal face subsets (VIS-MWIR and VIS-LWIR), is generated from the original, raw face datasets collected in three different bands (visible, MWIR, and LWIR). This multi-band database is constructed using three different methods for preprocessing face images before feature extraction methods are applied. These are: (1) face detection, (2) CSU’s geometric normalization, and (3) our recommended geometric normalization method. Third, a post-synthesis image denoising methodology is applied, which helps alleviate the different noise patterns present in synthesized images and improves FR accuracy over the baseline (i.e., the accuracy before image synthesis and denoising are applied) in practical heterogeneous FR scenarios. Finally, an extensive experimental study is performed to demonstrate the feasibility and benefits of cross-spectral matching when using our image synthesis and denoising approach. Our results are also compared to a baseline commercial matcher and various academic matchers provided by CSU’s Face Identification Evaluation System.
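The CCA step can be sketched with scikit-learn: paired visible and thermal feature vectors are projected into a shared correlated subspace where cross-spectral matching reduces to a simple similarity comparison. The feature dimensions and random data below are placeholders, and the LLE-based synthesis path is omitted.

```python
# Sketch: CCA projection of paired visible/thermal features into a
# shared subspace for cross-spectral matching. Data is synthetic.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
vis_feats = rng.normal(size=(200, 256))   # visible-band face features
ir_feats = rng.normal(size=(200, 256))    # paired thermal-band features

cca = CCA(n_components=32)
cca.fit(vis_feats, ir_feats)
vis_c, ir_c = cca.transform(vis_feats, ir_feats)

def cosine(a, b):
    """Cosine similarity between two projected feature vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Match a thermal probe against one visible gallery entry in the
# shared subspace.
score = cosine(ir_c[0], vis_c[0])
print(score)
```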
  3. In recent years, face recognition systems have achieved exceptional success due to promising advances in deep learning architectures. However, they still fail to achieve the expected accuracy when matching profile images against a gallery of frontal images. Current approaches either perform pose normalization (i.e., frontalization) or disentangle pose information for face recognition. We instead propose a new approach to utilize pose as auxiliary information via an attention mechanism. In this paper, we hypothesize that pose-attended information using an attention mechanism can guide contextual and distinctive feature extraction from profile faces, which further benefits better representation learning in an embedded domain. To achieve this, first, we design a unified coupled profile-to-frontal face recognition network. It learns the mapping from faces to a compact embedding subspace via a class-specific contrastive loss. Second, we develop a novel pose attention block (PAB) to specifically guide pose-agnostic feature extraction from profile faces. To be more specific, PAB is designed to explicitly help the network focus on important features along both “channel” and “spatial” dimensions while learning discriminative yet pose-invariant features in an embedding subspace. To validate the effectiveness of our proposed method, we conduct experiments on both controlled and in-the-wild benchmarks including Multi-PIE, CFP, and IJB-C, and show superiority over the state-of-the-art.
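A channel-and-spatial attention block in the spirit of the PAB can be sketched as follows. This is a CBAM-style stand-in written for illustration, not the paper's exact architecture.

```python
# Sketch: attention along channel and spatial dimensions of a feature map.
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):  # x: (B, C, H, W)
        # Channel attention from the global average-pooled descriptor.
        w = self.channel_mlp(x.mean(dim=(2, 3)))              # (B, C)
        x = x * torch.sigmoid(w)[:, :, None, None]
        # Spatial attention from channel-wise mean and max maps.
        s = torch.cat([x.mean(1, keepdim=True),
                       x.max(1, keepdim=True).values], dim=1)  # (B, 2, H, W)
        return x * torch.sigmoid(self.spatial_conv(s))

feat = torch.randn(2, 64, 28, 28)
out = ChannelSpatialAttention(64)(feat)   # same shape, attention-reweighted
```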
  4. Plot-level photography is an attractive time-saving alternative to field measurements for vegetation monitoring. However, widespread adoption of this technique relies on efficient workflows for post-processing images and the accuracy of the resulting products. Here, we estimated relative vegetation cover using both traditional field sampling methods (point frame) and semi-automated classification of photographs (plot-level photography) across thirty 1 m² plots near Utqiaġvik, Alaska, from 2012 to 2021. Geographic object-based image analysis (GEOBIA) was applied to generate objects based on the three spectral bands (red, green, and blue) of the images. Five machine learning algorithms were then applied to classify the objects into vegetation groups, and random forest performed best (60.5% overall accuracy). Objects were reliably classified into the following classes: bryophytes, forbs, graminoids, litter, shadows, and standing dead. Deciduous shrubs and lichens were not reliably classified. Multinomial regression models were used to gauge whether the cover estimates from plot-level photography could accurately predict the cover estimates from the point frame across space or time. Plot-level photography yielded useful estimates of vegetation cover for graminoids. However, the predictive performance varied both by vegetation class and by whether it was being used to predict cover in new locations or change over time in previously sampled plots. These results suggest that plot-level photography may maximize the efficient use of time, funding, and available technology to monitor vegetation cover in the Arctic, but the accuracy of current semi-automated image analysis is not sufficient to detect small changes in cover.
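The object-classification step can be sketched with scikit-learn's random forest applied to per-object spectral statistics. The synthetic features below stand in for the GEOBIA segment attributes; only the class list follows the abstract.

```python
# Sketch: random-forest classification of image objects into vegetation
# groups. Features here are synthetic placeholders for GEOBIA attributes.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

classes = ["bryophytes", "forbs", "graminoids",
           "litter", "shadows", "standing dead"]
rng = np.random.default_rng(1)
X = rng.normal(size=(600, 6))   # e.g., mean and std of R, G, B per object
y = rng.integers(0, len(classes), size=600)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("overall accuracy:", rf.score(X_te, y_te))
```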
  5. Timely detection of horse pain is important for equine welfare. Horses express pain through their facial and body behavior, but may hide signs of pain from unfamiliar human observers. In addition, collecting visual data with detailed annotation of horse behavior and pain state is both cumbersome and not scalable. Consequently, a pragmatic equine pain classification system would use video of the unobserved horse and weak labels. This paper proposes such a method for equine pain classification by using multi-view surveillance video footage of unobserved horses with induced orthopaedic pain, with temporally sparse video-level pain labels. To ensure that pain is learned from horse body language alone, we first train a self-supervised generative model to disentangle horse pose from its appearance and background before using the disentangled horse pose latent representation for pain classification. To make the best use of the pain labels, we develop a novel loss that formulates pain classification as a multi-instance learning problem. Our method achieves a pain classification accuracy of 60%, exceeding human expert performance. The learned latent horse pose representation is shown to be viewpoint covariant and disentangled from horse appearance. Qualitative analysis of pain-classified segments shows correspondence between the pain symptoms identified by our model and the equine pain scales used in veterinary practice.
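One common way to cast pain classification as multi-instance learning is sketched below: each video is a bag of clip scores with a single weak video-level label, and the max-scoring clip drives the loss. The max-pooled bag scoring is an assumption for illustration; the paper's exact loss may differ.

```python
# Sketch: multi-instance learning loss for weak video-level pain labels.
import torch
import torch.nn.functional as F

def mil_loss(clip_logits, video_label):
    """clip_logits: (num_clips,) pain scores for the clips of one video.
    video_label: 0/1 weak label for the whole video."""
    bag_logit = clip_logits.max()            # bag score = most pain-like clip
    target = torch.tensor(float(video_label))
    return F.binary_cross_entropy_with_logits(bag_logit, target)

clips = torch.randn(12)                      # scores for 12 clips of one video
loss = mil_loss(clips, video_label=1)
```

Under this formulation, a positive video only needs one clip to score high, which matches labels that are temporally sparse at the video level.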