skip to main content

Title: SocialAnnotator: Annotator Selection by Exploiting Social Relationships in Activity Recognition
Precise and eloquent label information is fundamental for interpreting the underlying data distributions distinctively and training of supervised and semi-supervised learning models adequately. But obtaining large amount of labeled data demands substantial manual effort. This obligation can be mitigated by acquiring labels of most informative data instances using Active Learning. However labels received from humans are not always reliable and poses the risk of introducing noisy class labels which will degrade the efficacy of a model instead of its improvement. In this paper, we address the problem of annotating sensor data instances of various Activities of Daily Living (ADLs) in smart home context. We exploit the interactions between the users and annotators in terms of relationships spanning across spatial and temporal space which accounts for an activity as well. We propose a novel annotator selection model SocialAnnotator which exploits the interactions between the users and annotators and rank the annotators based on their level of correspondence. We also introduce a novel approach to measure this correspondence distance using the spatial and temporal information of interactions, type of the relationships and activities. We validate our proposed SocialAnnotator framework in smart environments achieving ≈ 84% statistical confidence in data annotation
Award ID(s):
Publication Date:
Journal Name:
Proceedings of the IEEE AAAI 2018 Fall Symposium, Oct 2018
Sponsoring Org:
National Science Foundation
More Like this
  1. Machine learning models are bounded by the credibility of ground truth data used for both training and testing. Regardless of the problem domain, this ground truth annotation is objectively manual and tedious as it needs considerable amount of human intervention. With the advent of Active Learning with multiple annotators, the burden can be somewhat mitigated by actively acquiring labels of most informative data instances. However, multiple annotators with varying degrees of expertise poses new set of challenges in terms of quality of the label received and availability of the annotator. Due to limited amount of ground truth information addressing the variabilities of Activity of Daily Living (ADLs), activity recognition models using wearable and mobile devices are still not robust enough for real-world deployment. In this paper, we propose an active learning combined deep model which updates its network parameters based on the optimization of a joint loss function. We then propose a novel annotator selection model by exploiting the relationships among the users while considering their heterogeneity with respect to their expertise, physical and spatial context. Our proposed model leverages model-free deep reinforcement learning in a partially observable environment setting to capture the actionreward interaction among multiple annotators. Our experimentsmore »in real-world settings exhibit that our active deep model converges to optimal accuracy with fewer labeled instances and achieves 8% improvement in accuracy in fewer iterations.« less
  2. This work introduces Wearable deep learning (WearableDL) that is a unifying conceptual architecture inspired by the human nervous system, offering the convergence of deep learning (DL), Internet-of-things (IoT), and wearable technologies (WT) as follows: (1) the brain, the core of the central nervous system, represents deep learning for cloud computing and big data processing. (2) The spinal cord (a part of CNS connected to the brain) represents Internet-of-things for fog computing and big data flow/transfer. (3) Peripheral sensory and motor nerves (components of the peripheral nervous system (PNS)) represent wearable technologies as edge devices for big data collection. In recent times, wearable IoT devices have enabled the streaming of big data from smart wearables (e.g., smartphones, smartwatches, smart clothings, and personalized gadgets) to the cloud servers. Now, the ultimate challenges are (1) how to analyze the collected wearable big data without any background information and also without any labels representing the underlying activity; and (2) how to recognize the spatial/temporal patterns in this unstructured big data for helping end-users in decision making process, e.g., medical diagnosis, rehabilitation efficiency, and/or sports performance. Deep learning (DL) has recently gained popularity due to its ability to (1) scale to the big data sizemore »(scalability); (2) learn the feature engineering by itself (no manual feature extraction or hand-crafted features) in an end-to-end fashion; and (3) offer accuracy or precision in learning raw unlabeled/labeled (unsupervised/supervised) data. In order to understand the current state-of-the-art, we systematically reviewed over 100 similar and recently published scientific works on the development of DL approaches for wearable and person-centered technologies. The review supports and strengthens the proposed bioinspired architecture of WearableDL. This article eventually develops an outlook and provides insightful suggestions for WearableDL and its application in the field of big data analytics.« less
  3. In the era of big data, data-driven based classification has become an essential method in smart manufacturing to guide production and optimize inspection. The industrial data obtained in practice is usually time-series data collected by soft sensors, which are highly nonlinear, nonstationary, imbalanced, and noisy. Most existing soft-sensing machine learning models focus on capturing either intra-series temporal dependencies or pre-defined inter-series correlations, while ignoring the correlation between labels as each instance is associated with multiple labels simultaneously. In this paper, we propose a novel graph based soft-sensing neural network (GraSSNet) for multivariate time-series classification of noisy and highly-imbalanced soft-sensing data. The proposed GraSSNet is able to 1) capture the inter-series and intra-series dependencies jointly in the spectral domain; 2) exploit the label correlations by superimposing label graph that built from statistical co-occurrence information; 3) learn features with attention mechanism from both textual and numerical domain; and 4) leverage unlabeled data and mitigate data imbalance by semi-supervised learning. Comparative studies with other commonly used classifiers are carried out on Seagate soft sensing data, and the experimental results validate the competitive performance of our proposed method.
  4. Self-supervised learning of graph neural networks (GNN) is in great need because of the widespread label scarcity issue in real-world graph/network data. Graph contrastive learning (GCL), by training GNNs to maximize the correspondence between the representations of the same graph in its different augmented forms, may yield robust and transferable GNNs even without using labels. However, GNNs trained by traditional GCL often risk capturing redundant graph features and thus may be brittle and provide sub-par performance in downstream tasks. Here, we propose a novel principle, termed adversarial-GCL (\textit{AD-GCL}), which enables GNNs to avoid capturing redundant information during the training by optimizing adversarial graph augmentation strategies used in GCL. We pair AD-GCL with theoretical explanations and design a practical instantiation based on trainable edge-dropping graph augmentation. We experimentally validate AD-GCL by comparing with the state-of-the-art GCL methods and achieve performance gains of up-to~14\% in unsupervised, ~6\% in transfer and~3\% in semi-supervised learning settings overall with 18 different benchmark datasets for the tasks of molecule property regression and classification, and social network classification.
  5. Producing high-quality labeled data is a challenge in any supervised learning problem, where in many cases, human involvement is necessary to ensure the label quality. However, human annotations are not flawless, especially in the case of a challenging problem. In nontrivial problems, the high disagreement among annotators results in noisy labels, which affect the performance of any machine learning model. In this work, we consider three noise reduction strategies to improve the label quality in the Article-Comment Alignment Problem, where the main task is to classify article-comment pairs according to their relevancy level. The first considered labeling disagreement reduction strategy utilizes annotators' background knowledge during the label aggregation step. The second strategy utilizes user disagreement during the training process. In the third and final strategy, we ask annotators to perform corrections and relabel the examples with noisy labels. We deploy these strategies and compare them to a resampling strategy for addressing the class imbalance, another common supervised learning challenge. These alternatives were evaluated on ACAP, a multiclass text pairs classification problem with highly imbalanced data, where one of the classes represents at most 15% of the dataset's entire population. Our results provide evidence that considered strategies can reduce disagreement betweenmore »annotators. However, data quality improvement is insufficient to enhance classification accuracy in the article-comment alignment problem, which exhibits a high-class imbalance. The model performance is enhanced for the same problem by addressing the imbalance issue with a weight loss-based class distribution resampling. We show that allowing the model to pay more attention to the minority class during the training process with the presence of noisy examples improves the test accuracy by 3%.« less