skip to main content


Title: Adaptive Graph Guided Embedding for Multi-label Annotation

Multi-label annotation is challenging since a large amount of well-labeled training data are required to achieve promising performance. However, providing such data is expensive while unlabeled data are widely available. To this end, we propose a novel Adaptive Graph Guided Embedding (AG2E) approach for multi-label annotation in a semi-supervised fashion, which utilizes limited labeled data associating with large-scale unlabeled data to facilitate learning performance. Specifically, a multi-label propagation scheme and an effective embedding are jointly learned to seek a latent space where unlabeled instances tend to be well assigned multiple labels. Furthermore, a locality structure regularizer is designed to preserve the intrinsic structure and enhance the multi-label annotation. We evaluate our model in both conventional multi-label learning and zero-shot learning scenario. Experimental results demonstrate that our approach outperforms other compared state-of-the-art methods.

 
more » « less
Award ID(s):
1651902
NSF-PAR ID:
10065421
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IJCAI
Page Range / eLocation ID:
2798 to 2804
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Physiological and behavioral data collected from wearable or mobile sensors have been used to estimate self-reported stress levels. Since stress annotation usually relies on self-reports during the study, a limited amount of labeled data can be an obstacle to developing accurate and generalized stress-predicting models. On the other hand, the sensors can continuously capture signals without annotations. This work investigates leveraging unlabeled wearable sensor data for stress detection in the wild. We propose a two-stage semi-supervised learning framework that leverages wearable sensor data to help with stress detection. The proposed structure consists of an auto-encoder pre-training method for learning information from unlabeled data and the consistency regularization approach to enhance the robustness of the model. Besides, we propose a novel active sampling method for selecting unlabeled samples to avoid introducing redundant information to the model. We validate these methods using two datasets with physiological signals and stress labels collected in the wild, as well as four human activity recognition (HAR) datasets to evaluate the generality of the proposed method. Our approach demonstrated competitive results for stress detection, improving stress classification performance by approximately 7% to 10% on the stress detection datasets compared to the baseline supervised learning models. Furthermore, the ablation study we conducted for the HAR tasks supported the effectiveness of our methods. Our approach showed comparable performance to state-of-the-art semi-supervised learning methods for both stress detection and HAR tasks. 
    more » « less
  2. The scarcity of labeled data has traditionally been the primary hindrance in building scalable supervised deep learning models that can retain adequate performance in the presence of various heterogeneities in sample distributions. Domain adaptation tries to address this issue by adapting features learned from a smaller set of labeled samples to that of the incoming unlabeled samples. The traditional domain adaptation approaches normally consider only a single source of labeled samples, but in real world use cases, labeled samples can originate from multiple-sources – providing motivation for multi-source domain adaptation (MSDA). Several MSDA approaches have been investigated for wearable sensor-based human activity recognition (HAR) in recent times, but their performance improvement compared to single source counterpart remained marginal. To remedy this performance gap that, we explore multiple avenues to align the conditional distributions in addition to the usual alignment of marginal ones. In our investigation, we extend an existing multi-source domain adaptation approach under semi-supervised settings. We assume the availability of partially labeled target domain data and further explore the pseudo labeling usage with a goal to achieve a performance similar to the former. In our experiments on three publicly available datasets, we find that a limited labeled target domain data and pseudo label data boost the performance over the unsupervised approach by 10-35% and 2-6%, respectively, in various domain adaptation scenarios. 
    more » « less
  3. null (Ed.)
    We propose a semi-supervised learning approach for video classification, VideoSSL, using convolutional neural networks (CNN). Like other computer vision tasks, existing supervised video classification methods demand a large amount of labeled data to attain good performance. However, annotation of a large dataset is expensive and time consuming. To minimize the dependence on a large annotated dataset, our proposed semi-supervised method trains from a small number of labeled examples and exploits two regulatory signals from unlabeled data. The first signal is the pseudo-labels of unlabeled examples computed from the confidences of the CNN being trained. The other is the normalized probabilities, as predicted by an image classifier CNN, that captures the information about appearances of the interesting objects in the video. We show that, under the supervision of these guiding signals from unlabeled examples, a video classification CNN can achieve impressive performances utilizing a small fraction of annotated examples on three publicly available datasets: UCF101, HMDB51, and Kinetics. 
    more » « less
  4. In this work, we show that for a nontrivial hypothesis class C, we can estimate the distance of a target function f to C (estimate the error rate of the best h∈C) using substantially fewer labeled examples than would be needed to actually {\em learn} a good h∈C. Specifically, we show that for the class C of unions of d intervals on the line, in the active learning setting in which we have access to a pool of unlabeled examples drawn from an arbitrary underlying distribution D, we can estimate the error rate of the best h∈C to an additive error ϵ with a number of label requests that is {\em independent of d} and depends only on ϵ. In particular, we make O((1/ϵ^6)log(1/ϵ)) label queries to an unlabeled pool of size O((d/ϵ^2)log(1/ϵ)). This task of estimating the distance of an unknown f to a given class C is called {\em tolerant testing} or {\em distance estimation} in the testing literature, usually studied in a membership query model and with respect to the uniform distribution. Our work extends that of Balcan et al. (2012) who solved the {\em non}-tolerant testing problem for this class (distinguishing the zero-error case from the case that the best hypothesis in the class has error greater than ϵ). We also consider the related problem of estimating the performance of a given learning algorithm A in this setting. That is, given a large pool of unlabeled examples drawn from distribution D, can we, from only a few label queries, estimate how well A would perform if the entire dataset were labeled and given as training data to A? We focus on k-Nearest Neighbor style algorithms, and also show how our results can be applied to the problem of hyperparameter tuning (selecting the best value of k for the given learning problem). 
    more » « less
  5. Abstract Background

    Natural language processing (NLP) tasks in the health domain often deal with limited amount of labeled data due to high annotation costs and naturally rare observations. To compensate for the lack of training data, health NLP researchers often have to leverage knowledge and resources external to a task at hand. Recently, pretrained large-scale language models such as the Bidirectional Encoder Representations from Transformers (BERT) have been proven to be a powerful way of learning rich linguistic knowledge from massive unlabeled text and transferring that knowledge to downstream tasks. However, previous downstream tasks often used training data at such a large scale that is unlikely to obtain in the health domain. In this work, we aim to study whether BERT can still benefit downstream tasks when training data are relatively small in the context of health NLP.

    Method

    We conducted a learning curve analysis to study the behavior of BERT and baseline models as training data size increases. We observed the classification performance of these models on two disease diagnosis data sets, where some diseases are naturally rare and have very limited observations (fewer than 2 out of 10,000). The baselines included commonly used text classification models such as sparse and dense bag-of-words models, long short-term memory networks, and their variants that leveraged external knowledge. To obtain learning curves, we incremented the amount of training examples per disease from small to large, and measured the classification performance in macro-averaged$$F_{1}$$F1score.

    Results

    On the task of classifying all diseases, the learning curves of BERT were consistently above all baselines, significantly outperforming them across the spectrum of training data sizes. But under extreme situations where only one or two training documents per disease were available, BERT was outperformed by linear classifiers with carefully engineered bag-of-words features.

    Conclusion

    As long as the amount of training documents is not extremely few, fine-tuning a pretrained BERT model is a highly effective approach to health NLP tasks like disease classification. However, in extreme cases where each class has only one or two training documents and no more will be available, simple linear models using bag-of-words features shall be considered.

     
    more » « less