Published research highlights the presence of demographic bias in automated facial attribute classification. Most proposed bias mitigation techniques rely on supervised learning, which requires a large amount of labeled training data to generalize and scale. However, labeled data is limited, requires laborious annotation, poses privacy risks, and can perpetuate human bias. In contrast, self-supervised learning (SSL) capitalizes on freely available unlabeled data, making trained models more scalable and generalizable. However, these label-free SSL models may also introduce biases by sampling false negative pairs, especially in low-data regimes (< 200K images) under limited compute. Further, SSL-based models may suffer performance degradation because the quality of unlabeled data sourced from the web is not assured. This paper proposes a fully self-supervised pipeline for demographically fair facial attribute classifiers. Leveraging completely unlabeled data pseudolabeled via pre-trained encoders, diverse data curation techniques, and meta-learning-based weighted contrastive learning, our method significantly outperforms existing SSL approaches proposed for downstream image classification tasks. Extensive evaluations on the FairFace and CelebA datasets demonstrate the efficacy of our pipeline in achieving fair performance relative to existing baselines, setting a new benchmark for SSL in the fairness of facial attribute classification.
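The abstract does not say how the meta-learned weights enter the contrastive objective. One plausible reading, assumed here and not taken from the paper, is that each weight scales one negative term of a standard InfoNCE loss. Under that assumption, down-weighting a suspected false negative would work as below. The function names and the temperature of 0.1 are illustrative. The weights are passed in directly instead of being meta-learned.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def weighted_info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    weights: Sequence[float],
    temperature: float = 0.1,
) -> float:
    """InfoNCE loss for one anchor with a weight in [0, 1] per negative.

    Assumption (hypothetical scheme): a weight of 1 keeps a negative at full
    strength, and a weight near 0 suppresses a pair suspected to be a false
    negative, e.g. two images of the same person.
    """
    if len(negatives) != len(weights):
        raise ValueError("need exactly one weight per negative")
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(
        w * math.exp(cosine(anchor, n) / temperature)
        for n, w in zip(negatives, weights)
    )
    return -math.log(pos / (pos + neg))
```

With all weights set to 1 this reduces to ordinary InfoNCE. Zeroing the weight of a negative that is nearly identical to the anchor removes the push-away force that would otherwise split semantically similar faces.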
Adaptive Graph Guided Embedding for Multi-label Annotation
Multi-label annotation is challenging because a large amount of well-labeled training data is required to achieve good performance. However, such data is expensive to provide, while unlabeled data is widely available. To this end, we propose a novel Adaptive Graph Guided Embedding (AG2E) approach for semi-supervised multi-label annotation, which combines limited labeled data with large-scale unlabeled data to improve learning performance. Specifically, a multi-label propagation scheme and an effective embedding are learned jointly to find a latent space in which unlabeled instances can be reliably assigned multiple labels. Furthermore, a locality structure regularizer is designed to preserve the intrinsic structure of the data and enhance multi-label annotation. We evaluate our model in both conventional multi-label learning and zero-shot learning scenarios. Experimental results demonstrate that our approach outperforms the compared state-of-the-art methods.
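AG2E learns its graph adaptively and jointly with an embedding, and the sketch below does not reproduce that. It shows only the classical fixed-graph label-spreading update, F <- alpha * S F + (1 - alpha) * Y, which is a common basis for multi-label propagation. The names, alpha = 0.8, and the iteration count are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalize_affinity(w: Matrix) -> Matrix:
    """Symmetric normalization S = D^{-1/2} W D^{-1/2} of an affinity matrix."""
    deg = [sum(row) for row in w]
    n = len(w)
    return [
        [
            w[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def propagate_labels(w: Matrix, y: Matrix, alpha: float = 0.8, iters: int = 100) -> Matrix:
    """Iterate F <- alpha * S F + (1 - alpha) * Y on a fixed graph.

    y has one row per instance and one column per label. Rows of unlabeled
    instances are all zeros. A labeled row may have several active columns,
    which is what makes the propagation multi-label.
    """
    s = normalize_affinity(w)
    n, c = len(y), len(y[0])
    f = [row[:] for row in y]
    for _ in range(iters):
        f = [
            [
                alpha * sum(s[i][k] * f[k][j] for k in range(n)) + (1 - alpha) * y[i][j]
                for j in range(c)
            ]
            for i in range(n)
        ]
    return f
```

For example, take three instances. Instance 0 is labeled {A, B}, instance 1 is labeled {C}, and unlabeled instance 2 is strongly linked to 0 and weakly linked to 1. Instance 2 then ends up scoring A and B well above C.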
- Award ID(s): 1651902
- PAR ID: 10065421
- Date Published:
- Journal Name: IJCAI
- Page Range / eLocation ID: 2798 to 2804
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
We propose VideoSSL, a semi-supervised learning approach for video classification using convolutional neural networks (CNNs). As in other computer vision tasks, existing supervised video classification methods demand a large amount of labeled data to attain good performance. However, annotating a large dataset is expensive and time-consuming. To reduce the dependence on a large annotated dataset, our semi-supervised method trains from a small number of labeled examples and exploits two regulatory signals from unlabeled data. The first signal is the pseudo-labels of unlabeled examples, computed from the confidences of the CNN being trained. The second is the normalized probabilities predicted by an image classifier CNN, which capture information about the appearance of the objects of interest in the video. We show that, under the supervision of these guiding signals from unlabeled examples, a video classification CNN can achieve strong performance using a small fraction of annotated examples on three publicly available datasets: UCF101, HMDB51, and Kinetics.
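This is a minimal sketch of the first signal only. It assumes a fixed confidence threshold (0.9 here, an illustrative value, not from the paper). Pseudo-labels are taken from the softmax confidences of the CNN being trained, and only confident clips contribute a cross-entropy term. The second, image-classifier signal is not modeled. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_pseudo_labels(
    batch_logits: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Return (index, class) pairs for unlabeled clips predicted confidently.

    Clips whose top softmax probability is below `threshold` are skipped,
    so they contribute no pseudo-label loss in this step.
    """
    selected = []
    for i, logits in enumerate(batch_logits):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            selected.append((i, best))
    return selected


def pseudo_label_loss(
    batch_logits: Sequence[Sequence[float]], threshold: float = 0.9
) -> float:
    """Mean cross-entropy against the selected pseudo-labels (0.0 if none)."""
    pairs = select_pseudo_labels(batch_logits, threshold)
    if not pairs:
        return 0.0
    return sum(-math.log(softmax(batch_logits[i])[c]) for i, c in pairs) / len(pairs)
```

Raising the threshold trades coverage for label quality. With logits `[[5, 0, 0], [0.1, 0, 0], [0, 0, 6]]`, only the first and third clips pass the 0.9 cut.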
Physiological and behavioral data collected from wearable or mobile sensors have been used to estimate self-reported stress levels. Because stress annotation usually relies on self-reports during the study, the limited amount of labeled data can be an obstacle to developing accurate and generalizable stress prediction models. On the other hand, the sensors can continuously capture signals without annotations. This work investigates leveraging unlabeled wearable sensor data for stress detection in the wild. We propose a two-stage semi-supervised learning framework that leverages wearable sensor data to help with stress detection. The proposed framework consists of an auto-encoder pre-training method for learning information from unlabeled data and a consistency regularization approach to enhance the robustness of the model. In addition, we propose a novel active sampling method for selecting unlabeled samples to avoid introducing redundant information to the model. We validate these methods using two datasets with physiological signals and stress labels collected in the wild, as well as four human activity recognition (HAR) datasets to evaluate the generality of the proposed method. Our approach demonstrated competitive results for stress detection, improving stress classification performance by approximately 7% to 10% on the stress detection datasets compared to baseline supervised learning models. Furthermore, our ablation study on the HAR tasks supported the effectiveness of our methods. Our approach showed performance comparable to state-of-the-art semi-supervised learning methods for both stress detection and HAR tasks.
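The abstract does not describe how its active sampling method scores unlabeled windows. The sketch below is only a stand-in that illustrates the stated goal of not feeding the model redundant samples. It uses generic greedy k-center selection over feature vectors (for example, auto-encoder embeddings), a different and well-known technique, not the paper's method. The function name and parameters are hypothetical.

```python
import math
from typing import List, Sequence


def k_center_greedy(
    features: Sequence[Sequence[float]], budget: int, seed_index: int = 0
) -> List[int]:
    """Greedily pick up to `budget` indices, each far from all earlier picks.

    Selection stops early once every remaining sample is an exact duplicate
    of a chosen one, since such samples would add no new information.
    """
    if not features or budget <= 0:
        return []
    chosen = [seed_index]
    # Distance from every sample to its nearest already-chosen sample.
    min_dist = [math.dist(f, features[seed_index]) for f in features]
    while len(chosen) < min(budget, len(features)):
        nxt = max(range(len(features)), key=min_dist.__getitem__)
        if min_dist[nxt] == 0.0:
            break
        chosen.append(nxt)
        min_dist = [min(d, math.dist(f, features[nxt])) for d, f in zip(min_dist, features)]
    return chosen
```

Because each pick maximizes the distance to the current selection, near-duplicate sensor windows (such as consecutive windows of the same resting period) are rarely chosen together.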
The scarcity of labeled data has traditionally been the primary obstacle to building scalable supervised deep learning models that retain adequate performance in the presence of heterogeneous sample distributions. Domain adaptation tries to address this issue by adapting features learned from a smaller set of labeled samples to incoming unlabeled samples. Traditional domain adaptation approaches normally consider only a single source of labeled samples, but in real-world use cases labeled samples can originate from multiple sources, which motivates multi-source domain adaptation (MSDA). Several MSDA approaches have recently been investigated for wearable sensor-based human activity recognition (HAR), but their performance improvement over single-source counterparts has remained marginal. To close this performance gap, we explore multiple avenues for aligning the conditional distributions in addition to the usual alignment of the marginal ones. In our investigation, we extend an existing multi-source domain adaptation approach to semi-supervised settings. We assume the availability of partially labeled target domain data and further explore the use of pseudo-labels, with the goal of achieving performance similar to that obtained with labeled target data. In our experiments on three publicly available datasets, we find that limited labeled target domain data and pseudo-labeled data boost performance over the unsupervised approach by 10-35% and 2-6%, respectively, in various domain adaptation scenarios.
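The abstract does not name the divergence used for alignment. This minimal sketch assumes a linear-kernel maximum mean discrepancy (MMD), which reduces to the squared distance between feature means. Marginal alignment compares whole domains. Conditional alignment compares them class by class, taking target classes from pseudo-labels. Averaging over sources and the weight `lam` are further assumptions, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Features = Sequence[Sequence[float]]


def mean_vector(rows: Features) -> List[float]:
    """Column-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def linear_mmd(xs: Features, ys: Features) -> float:
    """Squared distance between feature means (MMD with a linear kernel)."""
    mx, my = mean_vector(xs), mean_vector(ys)
    return sum((a - b) ** 2 for a, b in zip(mx, my))


def conditional_mmd(
    src: Features, src_labels: Sequence[int], tgt: Features, tgt_pseudo: Sequence[int]
) -> float:
    """Average per-class linear MMD; classes missing from either side are skipped."""
    total, count = 0.0, 0
    for c in set(src_labels) & set(tgt_pseudo):
        s = [x for x, y in zip(src, src_labels) if y == c]
        t = [x for x, y in zip(tgt, tgt_pseudo) if y == c]
        total += linear_mmd(s, t)
        count += 1
    return total / count if count else 0.0


def msda_alignment_loss(
    sources: Sequence[Tuple[Features, Sequence[int]]],
    tgt: Features,
    tgt_pseudo: Sequence[int],
    lam: float = 1.0,
) -> float:
    """Marginal plus lam * conditional alignment, averaged over labeled sources."""
    losses = [
        linear_mmd(x, tgt) + lam * conditional_mmd(x, y, tgt, tgt_pseudo)
        for x, y in sources
    ]
    return sum(losses) / len(losses)
```

The conditional term is only as good as the pseudo-labels feeding it. Wrong target labels pull classes toward the wrong source clusters, which is one reason labeled target data helps more than pseudo-labeled data.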
Timely and accurate bearing fault detection plays an important role in various industries. Data-driven deep learning methods have recently become a prevailing approach for bearing fault detection. Despite the success of deep learning, fault diagnosis performance hinges on the amount of labeled data, which is often expensive to acquire in practice. Unlabeled data, on the other hand, are inexpensive. To fully utilize a large amount of unlabeled data together with limited labeled data to enhance fault detection performance, in this research we develop a semi-supervised learning method built upon the autoencoder. In this method, a joint loss is established to account for the effects of both the labeled and unlabeled data, and this loss is subsequently used to direct backpropagation training. Systematic case studies are carried out using the Case Western Reserve University (CWRU) rolling bearing dataset, in which the effectiveness of the new method is verified by comparing it with other benchmark models.
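The abstract states that a joint loss covers both labeled and unlabeled data but does not give its form. A common construction, assumed here, adds two terms. The first is autoencoder reconstruction error on every sample. The second is a classification cross-entropy on the labeled subset, weighted by an illustrative hyperparameter `lam`. The function names are hypothetical. The model outputs are passed in as plain lists so that only the loss arithmetic is shown.

```python
import math
from typing import Optional, Sequence


def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared reconstruction error for one sample."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def joint_loss(
    inputs: Sequence[Sequence[float]],
    reconstructions: Sequence[Sequence[float]],
    class_probs: Sequence[Sequence[float]],
    labels: Sequence[Optional[int]],
    lam: float = 1.0,
) -> float:
    """Reconstruction loss on all samples plus lam * cross-entropy on labeled ones.

    labels[i] is None for unlabeled samples. They still contribute to the
    reconstruction term, which is how unlabeled data shapes the encoder.
    """
    rec = sum(mse(x, r) for x, r in zip(inputs, reconstructions)) / len(inputs)
    labeled = [(p, y) for p, y in zip(class_probs, labels) if y is not None]
    cls = sum(-math.log(p[y]) for p, y in labeled) / len(labeled) if labeled else 0.0
    return rec + lam * cls
```

In training, the gradient of this single scalar would be backpropagated through both the decoder and the classification head. A batch with no labeled samples then reduces to plain autoencoder training.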

