This content will become publicly available on February 26, 2026

Title: MENTOR: Human Perception-Guided Pretraining for Increased Generalization
Incorporating human perception into the training of convolutional neural networks (CNNs) has boosted the generalization capabilities of such models in open-set recognition tasks. One active research question is where (in the model architecture or training pipeline) and how to efficiently incorporate always-limited human perceptual data into model training strategies. In this paper, we introduce MENTOR (huMan pErceptioN-guided preTraining fOr increased geneRalization), which addresses this question through two unique rounds of training CNNs tasked with open-set anomaly detection. First, we train an autoencoder to learn human saliency maps given an input image, without any class labels. The autoencoder is thus tasked with discovering domain-specific salient features that mimic human perception. Second, we remove the decoder part, add a classification layer on top of the encoder, and train this new model conventionally, now using class labels. We show that MENTOR successfully raises generalization performance across three different CNN backbones in a variety of anomaly detection tasks (demonstrated for detection of unknown iris presentation attacks, synthetically generated faces, and anomalies in chest X-ray images) compared to traditional pretraining methods (e.g., sourcing the weights from ImageNet), as well as to state-of-the-art methods that incorporate human perception guidance into training. In addition, we demonstrate that MENTOR can be flexibly applied to existing human perception-guided methods, subsequently increasing their generalization with no architectural modifications.
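The two training rounds described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy linear encoder and decoder in place of a CNN autoencoder, a mean-squared-error saliency loss, a single logistic output as the classification layer, and hypothetical 4-pixel "images". All function names, data, and hyperparameters below are illustrative assumptions.

```python
import math
import random

# Hypothetical toy data: 4-pixel "images", human saliency maps, class labels.
IMAGES = [[1.0, 0.2, 0.1, 0.0], [0.9, 0.1, 0.3, 0.0],
          [0.0, 0.2, 0.1, 1.0], [0.0, 0.3, 0.2, 0.9]]
SALIENCY = [[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]]
LABELS = [1, 1, 0, 0]


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def init_matrix(rows, cols, rng):
    """Small random weights drawn uniformly from [-0.5, 0.5]."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def saliency_mse(enc, dec, images, saliency_maps):
    """Mean squared error between decoded outputs and saliency maps."""
    total, count = 0.0, 0
    for x, s in zip(images, saliency_maps):
        pred = matvec(dec, matvec(enc, x))
        total += sum((p - t) ** 2 for p, t in zip(pred, s))
        count += len(s)
    return total / count


def pretrain_on_saliency(images, saliency_maps, hidden=2, epochs=200,
                         lr=0.05, seed=0):
    """Round 1: train encoder + decoder to predict saliency maps (no labels)."""
    rng = random.Random(seed)
    dim = len(images[0])
    enc, dec = init_matrix(hidden, dim, rng), init_matrix(dim, hidden, rng)
    for _ in range(epochs):
        for x, s in zip(images, saliency_maps):
            h = matvec(enc, x)
            err = [p - t for p, t in zip(matvec(dec, h), s)]
            # Gradient w.r.t. the code h, computed before the decoder changes.
            grad_h = [sum(err[i] * dec[i][j] for i in range(dim))
                      for j in range(hidden)]
            for i in range(dim):
                for j in range(hidden):
                    dec[i][j] -= lr * err[i] * h[j]
            for j in range(hidden):
                for k in range(dim):
                    enc[j][k] -= lr * grad_h[j] * x[k]
    return enc, dec


def predict_proba(enc, head, bias, x):
    """Probability of class 1 from the encoder plus logistic head."""
    z = sum(w * hj for w, hj in zip(head, matvec(enc, x))) + bias
    return 1.0 / (1.0 + math.exp(-z))


def finetune_classifier(enc, images, labels, epochs=200, lr=0.1):
    """Round 2: discard the decoder, add a classification head, use labels."""
    enc = [row[:] for row in enc]          # start from the pretrained encoder
    head, bias = [0.0] * len(enc), 0.0     # new, untrained classification layer
    for _ in range(epochs):
        for x, y in zip(images, labels):
            h = matvec(enc, x)
            g = predict_proba(enc, head, bias, x) - y   # dLoss/dlogit for BCE
            grad_h = [g * w for w in head]
            head = [w - lr * g * hj for w, hj in zip(head, h)]
            bias -= lr * g
            for j in range(len(enc)):
                for k in range(len(x)):
                    enc[j][k] -= lr * grad_h[j] * x[k]
    return enc, head, bias


if __name__ == "__main__":
    encoder, decoder = pretrain_on_saliency(IMAGES, SALIENCY)
    encoder, head, bias = finetune_classifier(encoder, IMAGES, LABELS)
    print([round(predict_proba(encoder, head, bias, x), 2) for x in IMAGES])
```

The point of the sketch is the hand-off between rounds: the decoder returned by `pretrain_on_saliency` is simply not passed on, and only the encoder weights seed the classifier in `finetune_classifier`.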
Award ID(s):
2237880
PAR ID:
10585037
Author(s) / Creator(s):
;
Publisher / Repository:
IEEE
Date Published:
ISBN:
979-8-3315-1083-1
Page Range / eLocation ID:
7470 to 7479
Format(s):
Medium: X
Location:
Tucson, AZ, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Due to the scarcity of reliable anomaly labels, recent anomaly detection methods leveraging noisy auto-generated labels either select clean samples or refurbish noisy labels. However, both approaches struggle due to the unique properties of anomalies. Sample selection often fails to separate sufficiently many clean anomaly samples from noisy ones, while label refurbishment erroneously refurbishes marginal clean samples. To overcome these limitations, we design Unity, the first learning from noisy labels (LNL) approach for anomaly detection that elegantly leverages the merits of both sample selection and label refurbishment to iteratively prepare a diverse clean sample set for network training. Unity uses a pair of deep anomaly networks to collaboratively select samples with clean labels based on prediction agreement, followed by a disagreement resolution mechanism to capture marginal samples with clean labels. Thereafter, Unity utilizes unique properties of anomalies to design an anomaly-centric contrastive learning strategy that accurately refurbishes the remaining noisy labels. The resulting set, composed of selected and refurbished clean samples, will be used to train the anomaly networks in the next training round. Our experimental study on 10 real-world benchmark datasets demonstrates that Unity consistently outperforms state-of-the-art LNL techniques by up to 0.31 in F-1 Score (0.52 → 0.83).
  2. Incorporating human-perceptual intelligence into model training has been shown to increase the generalization capability of models in several difficult biometric tasks, such as presentation attack detection (PAD) and detection of synthetic samples. After the initial collection phase, human visual saliency (e.g., eye-tracking data, or handwritten annotations) can be integrated into model training through attention mechanisms, augmented training samples, or through human perception-related components of loss functions. Despite their successes, a vital, but seemingly neglected, aspect of any saliency-based training is the level of salience granularity (e.g., bounding boxes, single saliency maps, or saliency aggregated from multiple subjects) necessary to find a balance between reaping the full benefits of human saliency and the cost of its collection. In this paper, we explore several different levels of salience granularity and demonstrate that increased generalization capabilities of PAD and synthetic face detection can be achieved by using simple yet effective saliency post-processing techniques across several different CNNs.
  3. Rubin, Stuart; Chen, Shu-Ching (Ed.)
    In this work, we use an unsupervised method for generating binary class labels in a novel context to create class labels for Medicare fraud detection. We examine how class imbalance influences the quality of these new labels and how it affects supervised classification. We use four different Medicare Part D fraud detection datasets, with the largest containing over 5 million instances. The other three datasets are sampled from the original dataset. Using Random Under-Sampling (RUS), we subsample from the majority class of the original data to produce three datasets with varying levels of class imbalance. To evaluate the newly created labels, we train a supervised classifier, measure its classification performance, and compare it to an unsupervised anomaly detection method as a baseline. Our empirical findings indicate that the generated class labels are of high enough quality to enable effective supervised classifier training for fraud detection. Additionally, supervised classification with the new labels consistently outperforms the baseline across all test scenarios. Furthermore, we observe an inverse relationship between class imbalance in the dataset and classifier performance, with AUPRC scores improving as the training dataset becomes more balanced. This work not only validates the efficacy of the synthesized class labels in labeling Medicare fraud but also shows their robustness across different degrees of class imbalance.
  4. Many applications of machine learning require a model to make accurate predictions on test examples that are distributionally different from training ones, while task-specific labels are scarce during training. An effective approach to this challenge is to pre-train a model on related tasks where data is abundant, and then fine-tune it on a downstream task of interest. While pre-training has been effective in many language and vision domains, it remains an open question how to effectively use pre-training on graph datasets. In this paper, we develop a new strategy and self-supervised methods for pre-training Graph Neural Networks (GNNs). The key to the success of our strategy is to pre-train an expressive GNN at the level of individual nodes as well as entire graphs so that the GNN can learn useful local and global representations simultaneously. We systematically study pre-training on multiple graph classification datasets. We find that naïve strategies, which pre-train GNNs at the level of either entire graphs or individual nodes, give limited improvement and can even lead to negative transfer on many downstream tasks. In contrast, our strategy avoids negative transfer and improves generalization significantly across downstream tasks, leading up to 9.4% absolute improvements in ROC-AUC over non-pre-trained models and achieving state-of-the-art performance for molecular property prediction and protein function prediction. 
  5. Transformers, although first designed for sequence processing, can also handle unordered sets like point cloud data. Additionally, contrastive pretraining has emerged as a successful technique in image processing but remains unexplored for point cloud data. We develop and integrate a new point cloud pretraining technique inspired by the Simple Framework for Contrastive Learning (SimCLR) into the Set Transformer (ST) and Point Cloud Transformer (PCT) architectures and explore model performance using a novel 3D body scan dataset and the canonical datasets ShapeNet and ModelNet. For the 3D body scan dataset, this integration boosts initial training performance and maintains overall higher performance for classification tasks, and demonstrates better stability/convergence for regression tasks in comparison to non-pretrained (Naïve) counterparts. Furthermore, experiments examining strong generalization (relative performance on previously unseen classes) show improvement for pretrained models compared to Naïve models. Consistent benefits across tasks and datasets are observed based on additional experiments performed on the ShapeNet core dataset. Overall, we show how contrastive pretraining for point cloud data is a viable strategy for improving the performance of Transformers on downstream tasks and accelerating the training process.