
Title: Landmark Enforcement and Style Manipulation for Generative Morphing
Morphed images threaten Facial Recognition Systems (FRS) by presenting as multiple individuals, allowing an adversary to swap identities with another subject. Morph generation using generative adversarial networks (GANs) yields high-quality morphs free of the spatial artifacts caused by landmark-based methods, but standard GAN-based morphing suffers an apparent loss of identity. In this paper, we propose a novel StyleGAN morph generation technique that introduces a landmark enforcement method to resolve this issue. With this method, we enforce the landmarks of the morphed image to match the spatial average of the landmarks of the bona fide faces, so that the morph inherits the geometric identity of both contributing faces. We explore the latent space of our model using Principal Component Analysis (PCA) to accentuate the effect of both bona fide faces on the morphed latent representation and to address the identity loss issue associated with latent-domain averaging. Additionally, to improve high-frequency reconstruction in the morphs, we study the trainability of the noise input of the StyleGAN2 model.
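The landmark-enforcement idea described in the abstract can be sketched in a few lines: average the two bona fide latent codes in StyleGAN's latent space, and penalize the distance between the morph's detected landmarks and the spatial average of the bona fide landmarks. This is a minimal illustrative sketch, not the paper's implementation; the function names, the W+ latent shape of (18, 512), and the 68-point landmark convention are assumptions.

```python
import numpy as np

def average_latents(w_a: np.ndarray, w_b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend two StyleGAN W+ latent codes; alpha=0.5 gives an equal-contribution morph."""
    return alpha * w_a + (1.0 - alpha) * w_b

def average_landmarks(lm_a: np.ndarray, lm_b: np.ndarray) -> np.ndarray:
    """Target landmarks for the morph: the spatial average of both bona fide faces."""
    return 0.5 * (lm_a + lm_b)

def landmark_loss(pred_lm: np.ndarray, target_lm: np.ndarray) -> float:
    """Mean squared L2 distance between detected and target landmarks; a penalty of
    this form could steer optimization of the morph latent toward the averaged
    landmark geometry (the landmark-enforcement idea)."""
    return float(np.mean(np.sum((pred_lm - target_lm) ** 2, axis=-1)))
```

In a full pipeline, `landmark_loss` would be combined with perceptual and identity losses while optimizing the blended latent code.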
Journal Name:
2022 IEEE International Joint Conference on Biometrics (IJCB), Abu Dhabi, United Arab Emirates, 2022
Page Range / eLocation ID:
1 to 10
Sponsoring Org:
National Science Foundation
More Like this
1. By combining two or more face images of look-alikes, morphed face images can be generated to fool Facial Recognition Systems (FRS) into falsely accepting multiple people, leading to failures in security systems. Despite several attempts in the literature, finding pairs of bona fide faces from which to generate morphed images remains a challenging problem. In this paper, we morph identical twin pairs to generate extremely difficult morphs for FRS. We first explore three morph generation methods: GAN-based, landmark-based, and wavelet-based morphing. We leverage these methods to generate morphs from identical twin pairs that retain high similarity to both subjects while introducing minimal artifacts in the visual domain. To further increase the difficulty of recognizing morphed face images, we perform an ablation study in which adversarial perturbations are applied to the morphs so that they cannot be detected by trained morph classifiers. The generated identical-twin morph dataset is evaluated in terms of vulnerability analysis and presentation attack error rates.
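As an illustration of the landmark-based approach mentioned above: once both faces are warped to a shared (averaged) landmark geometry, the final step of such a pipeline is a pixel-wise cross-dissolve, and the adversarial-perturbation ablation can be sketched as a single FGSM-style sign-gradient step. Both functions are hypothetical sketches, not the paper's code; the warping step is assumed to have already happened.

```python
import numpy as np

def cross_dissolve(img_a: np.ndarray, img_b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Pixel-wise blend of two faces that have already been warped to the
    averaged landmark geometry (the final step of landmark-based morphing)."""
    blended = (1.0 - alpha) * img_a.astype(np.float64) + alpha * img_b.astype(np.float64)
    return blended.astype(np.uint8)

def fgsm_step(img: np.ndarray, grad: np.ndarray, eps: float = 2.0) -> np.ndarray:
    """One FGSM-style perturbation: move each pixel eps units along the sign of
    the morph-classifier's gradient, clipped back to valid pixel range."""
    perturbed = img.astype(np.float64) + eps * np.sign(grad)
    return np.clip(perturbed, 0, 255).astype(np.uint8)
```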
2. Morph detection is of paramount significance when the integrity of Automatic Face Recognition (AFR) systems is concerned. Considering the risks posed by morphing attacks, a robust automated morph detector is required that can distinguish authentic bona fide samples from altered morphed images. We leverage the wavelet sub-band decomposition of an input image, which yields its fine-grained spatial-frequency content. To enhance the detection of morphed images, our goal is to find the most discriminative information across frequency channels and the spatial domain. To this end, we propose an end-to-end attention-based deep morph detector that assimilates the most discriminative wavelet sub-bands of a given image, obtained via a group sparsity representation learning scheme. Specifically, our group sparsity-constrained Deep Neural Network (DNN) learns the most discriminative wavelet sub-bands (channels) of an input image, while the attention mechanism captures the most discriminative spatial regions for the downstream task of morph detection. Accordingly, we adopt three attention mechanisms to diversify our refined features: first, the Convolutional Block Attention Module (CBAM), which provides refined feature maps; second, compatibility scores between spatial locations and the output of our DNN, which highlight the most discriminative regions; and third, multi-headed self-attention augmented convolutions. We evaluate the efficiency of our proposed framework through extensive experiments using multiple morph datasets compiled from bona fide images in the FERET, FRLL, FRGC, and WVU Twin datasets. Most importantly, our proposed methodology reduces detection error rates compared with state-of-the-art results. Finally, to further assess our multi-attentional morph detector, we examine different combinations of attention mechanisms via a comprehensive ablation study.
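The wavelet sub-band decomposition this abstract builds on can be illustrated with a one-level 2D Haar transform, which splits an image into LL/LH/HL/HH sub-bands whose channels a detector can then weigh separately. This is a hand-rolled sketch (a real pipeline would more likely use a wavelet library); the `haar2d` name and the averaging normalization are assumptions.

```python
import numpy as np

def haar2d(x: np.ndarray):
    """One-level 2D Haar decomposition of an even-sized grayscale image into
    LL (approximation), LH, HL, and HH (detail) sub-bands, each half-resolution."""
    # Lowpass/highpass along rows: sum and difference of vertical pixel pairs.
    a = x[0::2, :] + x[1::2, :]
    d = x[0::2, :] - x[1::2, :]
    # Repeat along columns; /4 normalizes LL to the local 2x2 average.
    ll = (a[:, 0::2] + a[:, 1::2]) / 4.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 4.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 4.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 4.0
    return ll, lh, hl, hh
```

Morphing artifacts tend to concentrate in the high-frequency detail bands, which is why weighing sub-bands individually can help a detector.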
3. A morph is an image of an ambiguous subject generated by combining multiple individuals. The morphed image can be submitted to a facial recognition system and erroneously verified as each of the contributing bad actors. When submitted as a passport image, a morphed face poses a national security threat because the passport can then be shared between the individuals. As morphed images become easier to generate, it is vital that the research community expand available datasets in order to continuously improve current technology. Children are a challenging paradigm for facial recognition systems, and morphing children's faces takes advantage of this disparity. In this paper, we morph juvenile faces to create a unique, high-quality dataset that challenges FRS. To the best of our knowledge, this is the first study on the generation and evaluation of juvenile morphed faces. The generated juvenile morph dataset is evaluated in terms of vulnerability analysis and presentation attack error rates.
4. Many sign languages are bona fide natural languages with grammatical rules and lexicons, and hence can benefit from machine translation methods. Similarly, since sign language is a visual-spatial language, it can also benefit from computer vision methods for encoding it. With the advent of deep learning methods in recent years, significant advances have been made in natural language processing (specifically neural machine translation) and in computer vision (specifically image and video captioning). Researchers have therefore begun extending these learning methods to sign language understanding. Sign language interpretation is especially challenging because it involves a continuous visual-spatial modality where meaning is often derived from context. The focus of this article, therefore, is to examine various deep learning-based methods for encoding sign language as inputs, and to analyze the efficacy of several machine translation methods over three different sign language datasets. The goal is to determine which combinations are sufficiently robust for sign language translation without any gloss-based information. To understand the role of the different input features, we perform ablation studies over the model architectures (input features + neural translation models) for improved continuous sign language translation. These input features include body and finger joints, facial points, as well as vector representations/embeddings from convolutional neural networks. The machine translation models explored include several baseline sequence-to-sequence approaches, more complex networks using attention, reinforcement learning, and the transformer model. We implement the translation methods over multiple sign languages: German Sign Language (GSL), American Sign Language (ASL), and Chinese Sign Language (CSL).
From our analysis, the transformer model combined with input embeddings from ResNet50 or pose-based landmark features outperformed all the other sequence-to-sequence models by achieving higher BLEU2-BLEU4 scores when applied to the controlled and constrained GSL benchmark dataset. These combinations also showed significant promise on the other less controlled ASL and CSL datasets. 
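The BLEU-2 to BLEU-4 scores referenced above are built from clipped n-gram precisions. A simplified single-reference BLEU, with a brevity penalty but none of the smoothing that production toolkits apply, can be sketched as follows; `ngram_precision` and `bleu` are illustrative names, not the evaluation code used in the study.

```python
import math
from collections import Counter

def ngram_precision(ref: list, hyp: list, n: int) -> float:
    """Clipped n-gram precision: each hypothesis n-gram counts at most as many
    times as it appears in the reference."""
    ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    hyp_grams = [tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1)]
    if not hyp_grams:
        return 0.0
    hyp_counts = Counter(hyp_grams)
    clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    return clipped / len(hyp_grams)

def bleu(ref: list, hyp: list, max_n: int = 2) -> float:
    """Geometric mean of 1..max_n clipped precisions times a brevity penalty
    (an unsmoothed, single-reference BLEU sketch; max_n=2 gives BLEU-2)."""
    precisions = [ngram_precision(ref, hyp, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```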
5. Heatmap regression-based models have significantly advanced facial landmark detection. However, the lack of structural constraints often produces inaccurate heatmaps, resulting in poor landmark detection performance. While hierarchical structure modeling methods have been proposed to tackle this issue, they all rely heavily on manually designed tree structures, and the designed hierarchy can be completely corrupted by a missing or inaccurate landmark prediction. To the best of our knowledge, no prior work in the context of deep learning has investigated how to automatically model proper structures for facial landmarks by discovering their inherent relations. In this paper, we propose a novel Hierarchical Structured Landmark Ensemble (HSLE) model for robust facial landmark detection, using it as a structural constraint. Unlike existing approaches that design structures manually, our HSLE model is constructed automatically by discovering the most robust patterns, so it can robustly depict both local and holistic landmark structures simultaneously. HSLE can be readily plugged into any existing facial landmark detection baseline for further performance improvement. Extensive experimental results demonstrate that our approach outperforms the baseline by a large margin and achieves state-of-the-art performance.
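For context on where heatmap inaccuracies hurt: the standard decoding step reads each landmark's coordinates off the peak of its heatmap channel, so a single corrupted channel yields a badly placed landmark. That is the failure mode structural constraints aim to catch. A minimal decoding sketch follows; the (K, H, W) channel layout and `decode_heatmaps` name are assumptions.

```python
import numpy as np

def decode_heatmaps(heatmaps: np.ndarray) -> np.ndarray:
    """Decode (K, H, W) landmark heatmaps to (K, 2) integer (x, y) coordinates
    by taking each channel's peak response."""
    k, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(k, -1).argmax(axis=1)  # peak per channel
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=1)
```

Structural models operate on the decoded point set: implausible inter-landmark geometry flags (and can correct) peaks that landed on the wrong mode of a noisy heatmap.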