skip to main content

Title: Personalized Activity Recognition using Partially Available Target Data
Recent years have witnessed a growing body of research on autonomous activity recognition models for use in deployment of mobile systems in new settings such as when a wearable system is adopted by a new user. Current research, however, lacks comprehensive frameworks for transfer learning. Specifically, it lacks the ability to deal with partially available data in new settings. To address these limitations, we propose {\it OptiMapper}, a novel uninformed cross-subject transfer learning framework for activity recognition. OptiMapper is a combinatorial optimization framework that extracts abstract knowledge across subjects and utilizes this knowledge for developing a personalized and accurate activity recognition model in new subjects. To this end, a novel community-detection-based clustering of unlabeled data is proposed that uses the target user data to construct a network of unannotated sensor observations. The clusters of these target observations are then mapped onto the source clusters using a complete bipartite graph model. In the next step, the mapped labels are conditionally fused with the prediction of a base learner to create a personalized and labeled training dataset for the target user. We present two instantiations of OptiMapper. The first instantiation, which is applicable for transfer learning across domains with identical activity labels, more » performs a one-to-one bipartite mapping between clusters of the source and target users. The second instantiation performs optimal many-to-one mapping between the source clusters and those of the target. The many-to-one mapping allows us to find an optimal mapping even when the target dataset does not contain sufficient instances of all activity classes. We show that this type of cross-domain mapping can be formulated as a transportation problem and solved optimally. We evaluate our transfer learning techniques on several activity recognition datasets. Our results show that the proposed community detection approach can achieve, on average, 69%$ utilization of the datasets for clustering with an overall clustering accuracy of 87.5%. Our results also suggest that the proposed transfer learning algorithms can achieve up to 22.5% improvement in the activity recognition accuracy, compared to the state-of-the-art techniques. The experimental results also demonstrate high and sustained performance even in presence of partial data. « less
; ; ;
Award ID(s):
1750679 1932346
Publication Date:
Journal Name:
IEEE Transactions on Mobile Computing
Page Range or eLocation-ID:
1 to 1
Sponsoring Org:
National Science Foundation
More Like this
  1. Human activity recognition (HAR) from wearable sensors data has become ubiquitous due to the widespread proliferation of IoT and wearable devices. However, recognizing human activity in heterogeneous environments, for example, with sensors of different models and make, across different persons and their on-body sensor placements introduces wide range discrepancies in the data distributions, and therefore, leads to an increased error margin. Transductive transfer learning techniques such as domain adaptation have been quite successful in mitigating the domain discrepancies between the source and target domain distributions without the costly target domain data annotations. However, little exploration has been done when multiple distinct source domains are present, and the optimum mapping to the target domain from each source is not apparent. In this paper, we propose a deep Multi-Source Adversarial Domain Adaptation (MSADA) framework that opportunistically helps select the most relevant feature representations from multiple source domains and establish such mappings to the target domain by learning the perplexity scores. We showcase that the learned mappings can actually reflect our prior knowledge on the semantic relationships between the domains, indicating that MSADA can be employed as a powerful tool for exploratory activity data analysis. We empirically demonstrate that our proposed multi-source domainmore »adaptation approach achieves 2% improvement with OPPORTUNITY dataset (cross-person heterogeneity, 4 ADLs), whereas 13% improvement on DSADS dataset (cross-position heterogeneity, 10 ADLs and sports activities).« less
  2. The increased ubiquitousness of small smart devices, such as cell- phones, tablets, smart watches and laptops, has led to unique user data, which can be locally processed. The sensors (e.g., microphones and webcam) and improved hardware of the new devices have al- lowed running deep learning models that 20 years ago would have been exclusive to high-end expensive machines. In spite of this progress, state-of-the-art algorithms for facial expression recognition (FER) rely on architectures that cannot be implemented on these devices due to computational and memory constraints. Alternatives involving cloud-based solutions impose privacy barriers that prevent their adoption or user acceptance in wide range of applications. This paper proposes a lightweight model that can run in real-time for image facial expression recognition (IFER) and video facial expression recognition (VFER). The approach relies on a personalization mechanism locally implemented for each subject by fine-tuning a central VFER model with unlabeled videos from a target subject. We train the IFER model to generate pseudo labels and we select the videos with the highest confident predictions to be used for adaptation. The adaptation is performed by implementing a federated learning strategy where the weights of the local model are averaged and used bymore »the central VFER model. We demonstrate that this approach can improve not only the performance on the edge device providing personalized models to the users, but also the central VFER model. We implement a federated learning strategy where the weights of the local models are averaged and used by the central VFER. Within corpus and cross-corpus evaluations on two emotional databases demonstrate that edge models adapted with our personalization strategy achieve up to 13.1% gains in F1-scores. Furthermore, the federated learning implementation improves the mean micro F1-score of the central VFER model by up to 3.4%. The proposed lightweight solution is ideal for interactive user interfaces that preserve the data of the users.« less
  3. Automatic speech emotion recognition provides computers with critical context to enable user understanding. While methods trained and tested within the same dataset have been shown successful, they often fail when applied to unseen datasets. To address this, recent work has focused on adversarial methods to find more generalized representations of emotional speech. However, many of these methods have issues converging, and only involve datasets collected in laboratory conditions. In this paper, we introduce Adversarial Discriminative Domain Generalization (ADDoG), which follows an easier to train “meet in the middle“ approach. The model iteratively moves representations learned for each dataset closer to one another, improving cross-dataset generalization. We also introduce Multiclass ADDoG, or MADDoG, which is able to extend the proposed method to more than two datasets, simultaneously. Our results show consistent convergence for the introduced methods, with significantly improved results when not using labels from the target dataset. We also show how, in most cases, ADDoG and MADDoG can be used to improve upon baseline state-of-the-art methods when target dataset labels are added and in-the-wild data are considered. Even though our experiments focus on cross-corpus speech emotion, these methods could be used to remove unwanted factors of variation in other settings.
  4. We address the problem of human action classification in drone videos. Due to the high cost of capturing and labeling large-scale drone videos with diverse actions, we present unsupervised and semi-supervised domain adaptation approaches that leverage both the existing fully annotated action recognition datasets and unannotated (or only a few annotated) videos from drones. To study the emerging problem of drone-based action recognition, we create a new dataset, NEC-DRONE, containing 5,250 videos to evaluate the task. We tackle both problem settings with 1) same and 2) different action label sets for the source (e.g., Kinectics dataset) and target domains (drone videos). We present a combination of video and instance-based adaptation methods, paired with either a classifier or an embedding-based framework to transfer the knowledge from source to target. Our results show that the proposed adaptation approach substantially improves the performance on these challenging and practical tasks. We further demonstrate the applicability of our method for learning cross-view action recognition on the Charades-Ego dataset. We provide qualitative analysis to understand the behaviors of our approaches.
  5. Human activity recognition (HAR) from wearable sensor data has recently gained widespread adoption in a number of fields. However, recognizing complex human activities, postural and rhythmic body movements (e.g., dance, sports) is challenging due to the lack of domain-specific labeling information, the perpetual variability in human movement kinematics profiles due to age, sex, dexterity and the level of professional training. In this paper, we propose a deep activity recognition model to work with limited labeled data, both for simple and complex human activities. To mitigate the intra- and inter-user spatio-temporal variability of movements, we posit novel data augmentation and domain normalization techniques. We depict a semi-supervised technique that learns noise and transformation invariant feature representation from sparsely labeled data to accommodate intra-personal and inter-user variations of human movement kinematics. We also postulate a transfer learning approach to learn domain invariant feature representations by minimizing the feature distribution distance between the source and target domains. We showcase the improved performance of our proposed framework, AugToAct, using a public HAR dataset. We also design our own data collection, annotation and experimental setup on complex dance activity recognition steps and kinematics movements where we achieved higher performance metrics with limited label data comparedmore »to simple activity recognition tasks.« less