Annotating automatic target recognition (ATR) images is challenging; for example, labeled data may exist in the source domain but not in the target domain. It is therefore essential to construct an optimal target-domain classifier using the labeled information of the source-domain images. For this purpose, we propose a transductive transfer learning (TTL) network consisting of an unpaired domain translation network, a pretrained source-domain classifier, and a gradually constructed target-domain classifier. We delve into the unpaired domain translation network, which simultaneously optimizes cycle-consistency and modulated noise contrastive estimation (MoNCE) losses. Furthermore, the proposed hybrid CUT module integrated into the TTL network generates synthetic negative patches by noisy feature mixup, and all negative patches contribute modulated weights to the NCE loss according to their similarity to the query. In addition, this hybrid CUT network performs query selection by entropy-based attention to specify domain-variant and domain-invariant regions. Extensive analysis shows that the proposed transductive network can successfully annotate civilian vehicles, military vehicles, and ship targets in three benchmark ATR datasets. We further demonstrate the importance of each component of the TTL network through extensive ablation studies on the DSIAC dataset.
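The core idea of modulating the NCE loss by negative-patch similarity can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: the weighting scheme (softmax over query-negative similarities, rescaled to mean one) and all names are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def modulated_nce_loss(query, positive, negatives, tau=0.07):
    """Patchwise NCE loss in which each negative's contribution is
    modulated by its similarity to the query (illustrative sketch).

    query:     (D,)  feature of the query patch
    positive:  (D,)  feature of the corresponding patch
    negatives: (N, D) features of negative patches
    """
    q = F.normalize(query, dim=-1)
    pos = F.normalize(positive, dim=-1)
    neg = F.normalize(negatives, dim=-1)

    pos_logit = (q @ pos) / tau        # scalar similarity logit
    neg_sims = neg @ q                 # (N,) query-negative similarities

    # Modulation: harder (more query-similar) negatives get larger weight;
    # rescale so the weights average to one across the N negatives.
    w = F.softmax(neg_sims / tau, dim=0) * neg_sims.numel()
    neg_logits = neg_sims / tau + torch.log(w + 1e-8)

    logits = torch.cat([pos_logit.unsqueeze(0), neg_logits]).unsqueeze(0)
    target = torch.zeros(1, dtype=torch.long)  # positive is class 0
    return F.cross_entropy(logits, target)
```

With uniform weights (w = 1) this reduces to the standard patchwise NCE objective, so the modulation term is the only departure from the vanilla contrastive loss.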
-
We introduce caption-guided face recognition (CGFR) as a new framework to improve the performance of commercial off-the-shelf (COTS) face recognition (FR) systems. In contrast to combining soft biometrics (e.g., facial marks, gender, and age) with face images, in this work we use facial descriptions provided by face examiners as auxiliary information. However, because of the heterogeneity of the modalities, improving performance by directly fusing textual and facial features is very challenging, as the two lie in different embedding spaces. In this paper, we propose a contextual feature aggregation module (CFAM) that addresses this issue by effectively exploiting fine-grained word-region interactions and the global image-caption association. Specifically, CFAM adopts a self-attention and a cross-attention scheme to improve the intra-modality and inter-modality relationships between the image and textual features, respectively. Additionally, we design a textual feature refinement module (TFRM) that refines the textual features of the pre-trained BERT encoder by updating the contextual embeddings. This module enhances the discriminative power of the textual features with a cross-modal projection loss and realigns the word and caption embeddings with the visual features through a visual-semantic alignment loss. We implemented the proposed CGFR framework on two face recognition models (ArcFace and AdaFace) and evaluated its performance on the Multi-Modal CelebA-HQ dataset. Our framework significantly improves the performance of ArcFace under both 1:1 verification and 1:N identification protocols.
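The self-attention plus cross-attention scheme described above can be sketched as a single fusion block. This is a hypothetical skeleton, not the published CFAM: the class name, dimensions, and residual/LayerNorm arrangement are assumptions chosen to illustrate intra-modality self-attention over image regions followed by inter-modality cross-attention onto word features.

```python
import torch
import torch.nn as nn

class CrossModalFusionBlock(nn.Module):
    """Illustrative fusion block: self-attention refines image-region
    features (intra-modality), then cross-attention lets regions attend
    to word-level text features (inter-modality). Names are hypothetical."""

    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, img_regions, word_feats):
        # img_regions: (B, R, dim) region features; word_feats: (B, W, dim)
        x, _ = self.self_attn(img_regions, img_regions, img_regions)
        x = self.norm1(img_regions + x)          # intra-modality refinement
        y, _ = self.cross_attn(x, word_feats, word_feats)
        return self.norm2(x + y)                 # inter-modality aggregation
```

A caption-aware image embedding for matching could then be obtained by pooling the block's output over the region axis.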
-
Interoperability between contact-based and contactless fingerprint images is a key factor in the success of contactless fingerprinting devices, which have recently seen increasing demand for biometric authentication. However, because of perspective distortion and the absence of elastic deformation in contactless fingerphotos, directly matching contactless fingerprint probe images against legacy contact-based gallery images yields low accuracy. In this paper, to improve interoperability, we propose a coupled deep learning framework consisting of two conditional generative adversarial networks. Generative modeling is employed to find a projection that maximizes the pairwise correlation between the two domains in a common latent embedding subspace. Extensive experiments on three challenging datasets demonstrate significant performance improvements over state-of-the-art methods and two top-performing commercial off-the-shelf SDKs, Verifinger 12.0 and Innovatrics. We also achieve a further performance gain by combining multiple fingers of the same subject using a score-fusion model.
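The coupling objective — projecting both domains into a shared latent subspace and maximizing the correlation of genuine pairs — can be sketched as below. This is an assumed simplification (cosine-similarity coupling on paired embeddings), not the paper's exact loss, and the function names are hypothetical.

```python
import torch
import torch.nn.functional as F

def coupling_loss(contactless_emb, contact_emb):
    """Illustrative coupling objective: pull the two domains' embeddings
    of the same finger together in a common latent subspace by
    maximizing their cosine similarity (hypothetical sketch).

    contactless_emb, contact_emb: (B, D) paired latent embeddings.
    """
    a = F.normalize(contactless_emb, dim=-1)
    b = F.normalize(contact_emb, dim=-1)
    # 1 - cosine similarity per genuine pair, averaged over the batch;
    # zero when the paired embeddings coincide.
    return (1.0 - (a * b).sum(dim=-1)).mean()
```

In a coupled-GAN setup, a term of this kind would be minimized jointly with the adversarial losses of the two generators so that matching can be performed directly in the shared subspace.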