Morph images threaten Facial Recognition Systems (FRS) by presenting as multiple individuals, allowing an adversary to swap identities with another subject. Morph generation using generative adversarial networks (GANs) yields high-quality morphs free of the spatial artifacts caused by landmark-based methods, but standard GAN-based morphing methods suffer an apparent loss of identity. In this paper, we propose a novel StyleGAN morph generation technique that introduces a landmark enforcement method to resolve this issue. With this method, we enforce the landmarks of the morphed image to represent the spatial average of the landmarks of the bona fide faces, so that the morph images inherit the geometric identity of both bona fide faces. We explore the latent space of our model using Principal Component Analysis (PCA) to accentuate the effect of both bona fide faces on the morphed latent representation and to address the identity loss caused by latent-domain averaging. Additionally, to improve high-frequency reconstruction in the morphs, we study the trainability of the noise input of the StyleGAN2 model.
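The core idea of latent-domain averaging with landmark enforcement can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the latent shapes, function names, and the mean-squared-error form of the landmark penalty are all assumptions.

```python
import numpy as np

def latent_morph(w_a, w_b):
    """Naive latent-domain morph: element-wise average of two StyleGAN
    W+ latent codes (shapes and averaging weights are assumptions)."""
    return 0.5 * (w_a + w_b)

def landmark_enforcement_loss(morph_lm, lm_a, lm_b):
    """Penalty pushing the morph's detected landmarks toward the spatial
    average of the two bona fide faces' landmarks, per the stated goal.
    morph_lm, lm_a, lm_b: (num_landmarks, 2) arrays of (x, y) points."""
    target = 0.5 * (lm_a + lm_b)
    return float(np.mean((morph_lm - target) ** 2))
```

In an actual optimization loop, the loss would be backpropagated through a differentiable landmark detector to refine the averaged latent code.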
This content will become publicly available on April 1, 2024
Attention Augmented Face Morph Detection
Morph detection is of paramount significance when the integrity of Automatic Face Recognition (AFR) systems is concerned. Considering the risks incurred by morphing attacks, a robust automated morph detector is required that can distinguish authentic bona fide samples from altered morphed images. We leverage the wavelet sub-band decomposition of an input image, yielding its fine-grained spatial-frequency content. To enhance the detection of morphed images, our goal is to find the most discriminative information across frequency channels and the spatial domain. To this end, we propose an end-to-end attention-based deep morph detector that assimilates the most discriminative wavelet sub-bands of a given image, obtained via a group sparsity representation learning scheme. Specifically, our group sparsity-constrained Deep Neural Network (DNN) learns the most discriminative wavelet sub-bands (channels) of an input image, while the attention mechanism captures the most discriminative spatial regions for the downstream task of morph detection. We adopt three attention mechanisms to diversify our refined features. First, we employ the Convolutional Block Attention Module (CBAM), which provides refined feature maps. Second, compatibility scores between spatial locations and the output of our DNN highlight the most discriminative regions. Third, multi-headed self-attention augmented convolutions account for the final mechanism. We evaluate the efficiency of our proposed framework through extensive experiments on multiple morph datasets compiled from bona fide images in the FERET, FRLL, FRGC, and WVU Twin datasets. Most importantly, our proposed methodology reduces detection error rates compared with state-of-the-art results. Finally, to further assess our multi-attentional morph detection, we delve into different combinations of attention mechanisms via a comprehensive ablation study.
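The first of the three attention mechanisms, CBAM's channel-attention branch, reweights feature channels using pooled statistics passed through a shared MLP. The sketch below is a numpy illustration under assumed shapes and weight names; it is not the detector's actual implementation and omits CBAM's spatial-attention branch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(fmap, w1, w2):
    """CBAM-style channel attention on a (C, H, W) feature map.
    w1: (C//r, C) and w2: (C, C//r) are the shared-MLP weights for a
    reduction ratio r (hypothetical names and shapes)."""
    avg = fmap.mean(axis=(1, 2))  # (C,) global average pooling
    mx = fmap.max(axis=(1, 2))    # (C,) global max pooling
    # Shared MLP (ReLU hidden layer) applied to both pooled descriptors.
    scale = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0)
                    + w2 @ np.maximum(w1 @ mx, 0.0))
    return fmap * scale[:, None, None]  # per-channel reweighting
```

In the detector described above, such reweighting would be applied to the wavelet sub-band channels so that the most discriminative bands dominate the downstream classification.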
- Award ID(s):
- 1650474
- NSF-PAR ID:
- 10401288
- Date Published:
- Journal Name:
- IEEE Access
- ISSN:
- 2169-3536
- Page Range / eLocation ID:
- 1 to 1
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
By combining two or more face images of look-alikes, morphed face images are generated to fool Facial Recognition Systems (FRS) into falsely accepting multiple people, leading to failures in security systems. Despite several attempts in the literature, finding pairs of bona fide faces to generate the morphed images remains a challenging problem. In this paper, we morph identical twin pairs to generate extremely difficult morphs for FRS. We first explore three methods of morphed face generation: GAN-based, landmark-based, and a wavelet-based morphing approach. We leverage these methods to generate morphs from the identical twin pairs that retain high similarity to both subjects while resulting in minimal artifacts in the visual domain. To further increase the difficulty of recognizing morphed face images, we perform an ablation study applying adversarial perturbation to the morphs so that they cannot be detected by trained morph classifiers. The evaluation of the generated identical twin-morphed dataset is performed in terms of vulnerability analysis and presentation attack error rates.
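The adversarial-perturbation step mentioned above follows the general FGSM pattern: nudge the morph in the direction that lowers a detector's "morph" score. The sketch below uses a toy linear detector so the gradient is closed-form; the detector, the step size `eps`, and all names are illustrative assumptions, not the study's actual classifiers.

```python
import numpy as np

def fgsm_evade(x, w, b, eps=0.03):
    """FGSM-style perturbation of a morph x against a toy linear
    detector with score(x) = w.x + b (positive score = 'morph').
    For a linear score, the gradient w.r.t. x is simply w, so we step
    against its sign to lower the morph score."""
    grad = w
    return x - eps * np.sign(grad)

def score(x, w, b):
    """Toy linear morph-detector score."""
    return float(np.dot(w, x) + b)
```

Against a real deep morph classifier, `grad` would instead come from backpropagating the detector's loss to the input pixels, typically with a clamp to keep the perturbation visually imperceptible.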
-
Many sign languages are bona fide natural languages with grammatical rules and lexicons, and hence can benefit from machine translation methods. Similarly, since sign language is a visual-spatial language, it can also benefit from computer vision methods for encoding it. With the advent of deep learning methods in recent years, significant advances have been made in natural language processing (specifically neural machine translation) and in computer vision methods (specifically image and video captioning). Researchers have therefore begun extending these learning methods to sign language understanding. Sign language interpretation is especially challenging because it involves a continuous visual-spatial modality where meaning is often derived from context. The focus of this article, therefore, is to examine various deep learning-based methods for encoding sign language as inputs, and to analyze the efficacy of several machine translation methods over three different sign language datasets. The goal is to determine which combinations are sufficiently robust for sign language translation without any gloss-based information. To understand the role of the different input features, we perform ablation studies over the model architectures (input features + neural translation models) for improved continuous sign language translation. These input features include body and finger joints, facial points, as well as vector representations/embeddings from convolutional neural networks. The machine translation models explored include several baseline sequence-to-sequence approaches, as well as more complex networks using attention, reinforcement learning, and the transformer model. We implement the translation methods over multiple sign languages: German (GSL), American (ASL), and Chinese (CSL).
From our analysis, the transformer model combined with input embeddings from ResNet50 or pose-based landmark features outperformed all the other sequence-to-sequence models, achieving higher BLEU2-BLEU4 scores when applied to the controlled and constrained GSL benchmark dataset. These combinations also showed significant promise on the other, less controlled ASL and CSL datasets.
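The BLEU2-BLEU4 scores cited above are modified n-gram precision metrics. A simplified sentence-level BLEU-n, with uniform weights and a brevity penalty, can be sketched as follows; real evaluations typically use a corpus-level implementation with clipping across multiple references, so this is an illustrative stand-in only.

```python
import math
from collections import Counter

def bleu_n(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU with uniform 1..max_n weights and
    a brevity penalty; candidate/reference are token lists."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n])
                       for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n])
                      for i in range(len(reference) - n + 1))
        overlap = sum((cand & ref).values())  # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0.0:
        return 0.0  # geometric mean collapses if any precision is zero
    bp = 1.0 if len(candidate) > len(reference) else \
        math.exp(1.0 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

BLEU2 and BLEU4 correspond to `max_n=2` and `max_n=4`, respectively.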
-
The JPEG compatibility attack is a steganalysis method for detecting messages embedded in the spatial representation of images under the assumption that the cover is a decompressed JPEG. This paper focuses on improving the detection accuracy for the difficult case of high JPEG qualities and content-adaptive stego algorithms. Close attention is paid to the robustness of the detection with respect to the JPEG compressor and DCT coefficient quantizer. A likelihood ratio detector derived from a model of quantization errors of DCT coefficients in the recompressed image is used to explain the main mechanism responsible for detection and to understand the results of experiments. The most accurate detector is an SRNet trained on a two-channel input consisting of the image and its SQ error. The detection performance is contrasted with the state of the art on four content-adaptive stego methods over a wide range of payloads and quality factors.
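The quantization errors that drive the detector above are the sub-integer residuals left when DCT coefficients are divided by the quantization step. A minimal sketch of computing them for one 8×8 block is shown below; the orthonormal DCT construction is standard, but the function names and the scalar quantization step are illustrative assumptions rather than the paper's exact pipeline.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix, as used in JPEG's 8x8 transform."""
    k = np.arange(n)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1)
                                  * k[:, None] / (2 * n))
    m[0] /= np.sqrt(2.0)  # DC row normalization
    return m

def quantization_error(block, q):
    """Residual of DCT coefficients after quantization with step(s) q:
    the deviation from the nearest quantization lattice point, which a
    compatibility-style detector models statistically."""
    d = dct_matrix()
    coeffs = d @ block @ d.T          # 2-D DCT of the 8x8 block
    scaled = coeffs / q
    return scaled - np.round(scaled)  # values lie in [-0.5, 0.5]
```

For a true decompressed-then-untouched cover, these residuals cluster tightly near zero; spatial-domain embedding disturbs that structure, which is what the likelihood-ratio detector exploits.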
-
Cloud detection is an indispensable pre-processing step in remote sensing image analysis workflows. Most traditional rule-based and machine-learning-based algorithms utilize low-level features of the clouds and classify individual cloud pixels based on their spectral signatures. Cloud detection using such approaches can be challenging due to a multitude of factors, including harsh lighting conditions, the presence of thin clouds, the context of surrounding pixels, and complex spatial patterns. In recent studies, deep convolutional neural networks (CNNs) have shown outstanding results in the computer vision domain, as they better capture the texture, shape, and context of images. In this study, we propose a deep learning CNN approach to detect cloud pixels in medium-resolution satellite imagery. The proposed CNN accounts for both low-level features, such as color and texture information, and high-level features extracted from successive convolutions of the input image. We prepared a cloud-pixel dataset of approximately 7273 randomly sampled 320 × 320-pixel image patches taken from a total of 121 Landsat-8 (30 m) and Sentinel-2 (20 m) image scenes. These satellite images come with cloud masks. From the available data channels, only the blue, green, red, and NIR bands are fed into the model. The CNN model was trained on 5300 image patches and validated on 1973 independent image patches. As the final output of our model, we extract a binary mask of cloud and non-cloud pixels. The results are benchmarked against established cloud detection methods using standard accuracy metrics.
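The patch-sampling step described above (random fixed-size windows cut from large multi-band scenes, paired with their cloud masks) can be sketched as follows. Function names, the bands-last layout, and the default seed are assumptions for illustration, not the study's actual data pipeline.

```python
import numpy as np

def sample_patches(scene, mask, size=320, n=10, rng=None):
    """Randomly sample n (patch, mask_patch) pairs of size x size pixels
    from a multi-band scene (H, W, bands) and its cloud mask (H, W)."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = scene.shape[:2]
    pairs = []
    for _ in range(n):
        y = rng.integers(0, h - size + 1)  # top-left row of the window
        x = rng.integers(0, w - size + 1)  # top-left column
        pairs.append((scene[y:y + size, x:x + size],
                      mask[y:y + size, x:x + size]))
    return pairs
```

With the four selected bands (blue, green, red, NIR), each sampled patch would have shape (320, 320, 4) and serve as one training example against its binary mask.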