skip to main content


Title: Active Deep Learning for Activity Recognition with Context Aware Annotator Selection
Machine learning models are bounded by the credibility of ground truth data used for both training and testing. Regardless of the problem domain, this ground truth annotation is objectively manual and tedious as it needs considerable amount of human intervention. With the advent of Active Learning with multiple annotators, the burden can be somewhat mitigated by actively acquiring labels of most informative data instances. However, multiple annotators with varying degrees of expertise poses new set of challenges in terms of quality of the label received and availability of the annotator. Due to limited amount of ground truth information addressing the variabilities of Activity of Daily Living (ADLs), activity recognition models using wearable and mobile devices are still not robust enough for real-world deployment. In this paper, we propose an active learning combined deep model which updates its network parameters based on the optimization of a joint loss function. We then propose a novel annotator selection model by exploiting the relationships among the users while considering their heterogeneity with respect to their expertise, physical and spatial context. Our proposed model leverages model-free deep reinforcement learning in a partially observable environment setting to capture the actionreward interaction among multiple annotators. Our experiments in real-world settings exhibit that our active deep model converges to optimal accuracy with fewer labeled instances and achieves 8% improvement in accuracy in fewer iterations.  more » « less
Award ID(s):
1750936
NSF-PAR ID:
10144109
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Anchorage, Alaska, August 2019
Page Range / eLocation ID:
1862 to 1870
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Precise and eloquent label information is fundamental for interpreting the underlying data distributions distinctively and training of supervised and semi-supervised learning models adequately. But obtaining large amount of labeled data demands substantial manual effort. This obligation can be mitigated by acquiring labels of most informative data instances using Active Learning. However labels received from humans are not always reliable and poses the risk of introducing noisy class labels which will degrade the efficacy of a model instead of its improvement. In this paper, we address the problem of annotating sensor data instances of various Activities of Daily Living (ADLs) in smart home context. We exploit the interactions between the users and annotators in terms of relationships spanning across spatial and temporal space which accounts for an activity as well. We propose a novel annotator selection model SocialAnnotator which exploits the interactions between the users and annotators and rank the annotators based on their level of correspondence. We also introduce a novel approach to measure this correspondence distance using the spatial and temporal information of interactions, type of the relationships and activities. We validate our proposed SocialAnnotator framework in smart environments achieving ≈ 84% statistical confidence in data annotation 
    more » « less
  2. null (Ed.)
    Crowdsourcing provides a practical way to obtain large amounts of labeled data at a low cost. However, the annotation quality of annotators varies considerably, which imposes new challenges in learning a high-quality model from the crowdsourced annotations. In this work, we provide a new perspective to decompose annotation noise into common noise and individual noise and differentiate the source of confusion based on instance difficulty and annotator expertise on a per-instance-annotator basis. We realize this new crowdsourcing model by an end-to-end learning solution with two types of noise adaptation layers: one is shared across annotators to capture their commonly shared confusions, and the other one is pertaining to each annotator to realize individual confusion. To recognize the source of noise in each annotation, we use an auxiliary network to choose from the two noise adaptation layers with respect to both instances and annotators. Extensive experiments on both synthesized and real-world benchmarks demonstrate the effectiveness of our proposed common noise adaptation solution. 
    more » « less
  3. Using noisy crowdsourced labels from multiple annotators, a deep learning-based end-to-end (E2E) system aims to learn the label correction mechanism and the neural classifier simultaneously. To this end, many E2E systems concatenate the neural classifier with multiple annotator-specific label confusion layers and co-train the two parts in a parameter-coupled manner. The formulated coupled cross-entropy minimization (CCEM)-type criteria are intuitive and work well in practice. Nonetheless, theoretical understanding of the CCEM criterion has been limited. The contribution of this work is twofold: First, performance guarantees of the CCEM criterion are presented. Our analysis reveals for the first time that the CCEM can indeed correctly identify the annotators' confusion characteristics and the desired ``ground-truth'' neural classifier under realistic conditions, e.g., when only incomplete annotator labeling and finite samples are available. Second, based on the insights learned from our analysis, two regularized variants of the CCEM are proposed. The regularization terms provably enhance the identifiability of the target model parameters in various more challenging cases. A series of synthetic and real data experiments are presented to showcase the effectiveness of our approach. 
    more » « less
  4. Crowdsourcing has rapidly become a computing paradigm in machine learning and artificial intelligence. In crowdsourcing, multiple labels are collected from crowd-workers on an instance usually through the Internet. These labels are then aggregated as a single label to match the ground truth of the instance. Due to its open nature, human workers in crowdsourcing usually come with various levels of knowledge and socio-economic backgrounds. Effectively handling such human factors has been a focus in the study and applications of crowdsourcing. For example, Bi et al studied the impacts of worker's dedication, expertise, judgment, and task difficulty (Bi et al 2014). Qiu et al offered methods for selecting workers based on behavior prediction (Qiu et al 2016). Barbosa and Chen suggested rehumanizing crowdsourcing to deal with human biases (Barbosa 2019). Checco et al studied adversarial attacks on crowdsourcing for quality control (Checco et al 2020). There are many more related works available in literature. In contrast to commonly used binary-valued labels, interval-valued labels (IVLs) have been introduced very recently (Hu et al 2021). Applying statistical and probabilistic properties of interval-valued datasets, Spurling et al quantitatively defined worker's reliability in four measures: correctness, confidence, stability, and predictability (Spurling et al 2021). Calculating these measures, except correctness, does not require the ground truth of each instance but only worker’s IVLs. Applying these quantified reliability measures, people have significantly improved the overall quality of crowdsourcing (Spurling et al 2022). However, in real world applications, the reliability of a worker may vary from time to time rather than a constant. It is necessary to monitor worker’s reliability dynamically. Because a worker j labels instances sequentially, we treat j’s IVLs as an interval-valued time series in our approach. Assuming j’s reliability relies on the IVLs within a time window only, we calculate j’s reliability measures with the IVLs in the current time window. Moving the time window forward with our proposed practical strategies, we can monitor j’s reliability dynamically. Furthermore, the four reliability measures derived from IVLs are time varying too. With regression analysis, we can separate each reliability measure as an explainable trend and possible errors. To validate our approaches, we use four real world benchmark datasets in our computational experiments. Here are the main findings. The reliability weighted interval majority voting (WIMV) and weighted preferred matching probability (WPMP) schemes consistently overperform the base schemes in terms of much higher accuracy, precision, recall, and F1-score. Note: the base schemes are majority voting (MV), interval majority voting (IMV), and preferred matching probability (PMP). Through monitoring worker’s reliability, our computational experiments have successfully identified possible attackers. By removing identified attackers, we have ensured the quality. We have also examined the impact of window size selection. It is necessary to monitor worker’s reliability dynamically, and our computational results evident the potential success of our approaches.This work is partially supported by the US National Science Foundation through the grant award NSF/OIA-1946391.

     
    more » « less
  5. Activity Recognition (AR) models perform well with a large number of available training instances. However, in the presence of sensor heterogeneity, sensing biasness and variability of human behaviors and activities and unseen activity classes pose key challenges to adopting and scaling these pre-trained activity recognition models in the new environment. These challenging unseen activities recognition problems are addressed by applying transfer learning techniques that leverage a limited number of annotated samples and utilize the inherent structural patterns among activities within and across the source and target domains. This work proposes a novel AR framework that uses the pre-trained deep autoencoder model and generates features from source and target activity samples. Furthermore, this AR frame-work establishes correlations among activities between the source and target domain by exploiting intra- and inter-class knowledge transfer to mitigate the number of labeled samples and recognize unseen activities in the target domain. We validated the efficacy and effectiveness of our AR framework with three real-world data traces (Daily and Sports, Opportunistic, and Wisdm) that contain 41 users and 26 activities in total. Our AR framework achieves performance gains ≈ 5-6% with 111, 18, and 70 activity samples (20 % annotated samples) for Das, Opp, and Wisdm datasets. In addition, our proposed AR framework requires 56, 8, and 35 fewer activity samples (10% fewer annotated examples) for Das, Opp, and Wisdm, respectively, compared to the state-of-the-art Untran model. 
    more » « less