
Title: LASO: Exploiting Locomotive and Acoustic Signatures over the Edge to Annotate IMU Data for Human Activity Recognition
Annotated IMU sensor data from smart devices and wearables are essential for developing supervised models for fine-grained human activity recognition, but generating sufficient annotated data for diverse human activities in different environments is challenging. Existing approaches primarily use human-in-the-loop techniques, including active learning; however, these are tedious, costly, and time-consuming. In this paper, we leverage the acoustic data available from the microphones embedded in data collection devices and propose LASO, a multimodal approach for automated data annotation from acoustic and locomotive information. LASO runs on the edge device itself: only the annotated IMU data is collected, and the acoustic data is discarded on the device, preserving the user's audio privacy. In the absence of any pre-existing labeling information, such auto-annotation is challenging because the IMU data must be sessionized into activities of different time scales in a completely unsupervised manner. We use a change-point detection technique while synchronizing the locomotive information from the IMU data with the acoustic data, and then use pre-trained audio-based activity recognition models to label the IMU data while handling acoustic noise. LASO efficiently annotates IMU data without any explicit human intervention, achieving mean accuracies of 0.93 (± 0.04) and 0.78 (± 0.05) on two real-life datasets from workshop and kitchen environments, respectively.
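The abstract does not say which change-point method LASO uses. As a minimal sketch of unsupervised sessionization, assuming a simple mean-shift model, the code below computes the resultant acceleration of tri-axial IMU samples and splits that signal with binary segmentation under a squared-error cost. The function names and the `penalty` and `min_size` parameters are illustrative assumptions, not part of LASO.

```python
import math
from typing import List, Optional, Sequence, Tuple


def magnitude(samples: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Resultant acceleration |a| = sqrt(ax^2 + ay^2 + az^2) for each tri-axial sample."""
    return [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in samples]


def _sse(prefix: List[float], prefix_sq: List[float], i: int, j: int) -> float:
    """Sum of squared deviations from the mean of x[i:j], computed from prefix sums."""
    n = j - i
    s = prefix[j] - prefix[i]
    sq = prefix_sq[j] - prefix_sq[i]
    return sq - s * s / n


def binary_segmentation(x: Sequence[float], penalty: float, min_size: int = 2) -> List[int]:
    """Return sorted change-point indices that split x into segments with different means."""
    prefix, prefix_sq = [0.0], [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    change_points: List[int] = []
    stack = [(0, len(x))]
    while stack:
        i, j = stack.pop()
        if j - i < 2 * min_size:
            continue  # too short to hold two segments of min_size
        whole = _sse(prefix, prefix_sq, i, j)
        best_gain, best_k = 0.0, None  # type: float, Optional[int]
        for k in range(i + min_size, j - min_size + 1):
            gain = whole - _sse(prefix, prefix_sq, i, k) - _sse(prefix, prefix_sq, k, j)
            if gain > best_gain:
                best_gain, best_k = gain, k
        # Split only if the cost reduction exceeds the penalty (guards against splitting on noise).
        if best_k is not None and best_gain > penalty:
            change_points.append(best_k)
            stack.append((i, best_k))
            stack.append((best_k, j))
    return sorted(change_points)
```

A larger `penalty` yields fewer, longer sessions; in practice it would have to be tuned to the time scale of the activities being annotated.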
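The labeling step can be sketched under similarly hedged assumptions. The code below assumes (hypothetically) that a pre-trained audio classifier emits one `(index, label, confidence)` prediction per audio window, already aligned to the IMU timeline, and that IMU segments arrive as `(start, end)` index pairs from a change-point step. The noise handling is a simple stand-in, not LASO's method: low-confidence windows and windows tagged with background classes are dropped before a majority vote, and a segment without a strict majority stays unannotated.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# Hypothetical format: (window start index on the IMU timeline, audio label, confidence).
AudioPrediction = Tuple[int, str, float]


def label_segments(
    segments: Sequence[Tuple[int, int]],
    audio_preds: Sequence[AudioPrediction],
    min_conf: float = 0.6,
    min_agreement: float = 0.5,
    noise_labels: frozenset = frozenset({"background", "silence"}),
) -> List[Optional[str]]:
    """Assign one activity label per IMU segment by voting over the audio windows inside it."""
    labels: List[Optional[str]] = []
    for start, end in segments:
        votes = Counter(
            label
            for t, label, conf in audio_preds
            if start <= t < end and conf >= min_conf and label not in noise_labels
        )
        if not votes:
            labels.append(None)  # no trustworthy acoustic evidence: leave the segment unannotated
            continue
        best, count = votes.most_common(1)[0]
        # The winner must hold strictly more than min_agreement of the surviving votes.
        labels.append(best if count / sum(votes.values()) > min_agreement else None)
    return labels
```

Leaving uncertain segments as `None` trades coverage for label precision, which matters when the annotated IMU data will be used to train a supervised model.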
Journal Name: ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction
Sponsoring Org: National Science Foundation
More Like this
  1. Recently, significant effort has been devoted to device-free human activity recognition techniques that use information collected by existing indoor wireless infrastructure, without requiring the monitored subject to carry a dedicated device. Most existing work, however, focuses on analyzing the signal received by a single device. In practice, multiple devices usually "observe" the same subject. Each device can be regarded as an information source that provides a unique "view" of the observed subject. Intuitively, combining the complementary information carried by the multiple views can improve activity recognition accuracy. To this end, we propose DeepMV, a unified multi-view deep learning framework that learns informative representations of heterogeneous device-free data. DeepMV combines the information from different views, weighted by the quality of their data, and extracts commonalities shared across different environments to improve recognition performance. To evaluate DeepMV, we set up a testbed using commercial WiFi and acoustic devices. Experimental results show that DeepMV effectively recognizes activities and outperforms state-of-the-art human activity recognition methods.
  2. The success and impact of activity recognition algorithms depend largely on the availability of labeled training samples and on the adaptability of activity recognition models across domains. In a new environment, pre-trained activity recognition models face challenges from sensing bias, device heterogeneity, and inherent variability in human behaviors and activities. An activity recognition (AR) system built in one environment does not scale well to another environment if it has to learn new activities and annotated activity samples are scarce. Building a new activity recognition model and training it with many annotated samples often helps overcome this problem; however, collecting annotated samples is costly, and learning an activity model in the wild is computationally expensive. In this work, we propose UnTran, an activity recognition framework that uses the source domain's pre-trained, autoencoder-enabled activity model and transfers two layers of this network to generate a common feature space for both source- and target-domain activities. We propose a hybrid AR framework that fuses the decisions of a model trained in the source domain and two activity models (raw and deep-feature based activity models) in the target domain, reducing the demand for annotated activity samples when recognizing unseen activities. We evaluated our framework on three real-world data traces comprising 41 users and 26 activities in total. UnTran achieves ≈ 75% F1 score in recognizing unseen new activities using only 10% labeled activity data in the target domain, and ≈ 98% F1 score in recognizing seen activities with only 2-3% labeled activity samples.
  3. Human activity recognition (HAR) from wearable sensor data has recently gained widespread adoption in a number of fields. However, recognizing complex human activities and postural and rhythmic body movements (e.g., dance, sports) is challenging due to the lack of domain-specific labeling information and the perpetual variability in human movement kinematics profiles due to age, sex, dexterity, and level of professional training. In this paper, we propose a deep activity recognition model that works with limited labeled data for both simple and complex human activities. To mitigate the intra- and inter-user spatio-temporal variability of movements, we propose novel data augmentation and domain normalization techniques. We present a semi-supervised technique that learns noise- and transformation-invariant feature representations from sparsely labeled data to accommodate intra-personal and inter-user variations in human movement kinematics. We also propose a transfer learning approach that learns domain-invariant feature representations by minimizing the feature distribution distance between the source and target domains. We demonstrate the improved performance of our proposed framework, AugToAct, on a public HAR dataset. We also designed our own data collection, annotation, and experimental setup for recognizing complex dance steps and kinematic movements, where we achieved higher performance with limited labeled data than on simple activity recognition tasks.
  4. This paper presents a portable inertial measurement unit (IMU)-based motion sensing system and proposes an adaptive gait phase detection approach for monitoring non-steady-state walking and multiple activities (walking, running, stair ascent, stair descent, squat). The algorithm aims to overcome a limitation of existing gait detection methods, which rely on time-domain thresholding for steady-state motion and cannot detect gait across different activities or across different gait patterns of the same activity. The portable sensing suit is composed of three IMU sensors (wearable sensors for gait phase detection) and two footswitches (used only for ground-truth measurement and not needed for gait detection by the proposed algorithm). The acceleration, angular velocity, Euler angles, resultant acceleration, and resultant angular velocity from the three IMUs are used as input training data, and the footswitch data are used as training labels (single support, double support, swing phase). Three methods, 1) Logistic Regression (LR), 2) Random Forest Classifier (RF), and 3) Artificial Neural Network (NN), are used to build the gait phase detection models. The results show that the proposed gait phase detection with the Random Forest Classifier achieves 98.94% accuracy in walking, 98.45% in running, 99.15% in stair ascent, 99.00% in stair descent, and 99.63% in squatting. This demonstrates that the sensing suit can not only detect gait status in any transient state but also generalize to multiple activities; it can therefore be applied to real-time monitoring of human gait and control of assistive devices.
  5. While inferring human activities from sensors embedded in mobile devices using machine learning algorithms has been studied, current research relies primarily on sensor data collected in controlled settings, often with healthy individuals. There is currently a research gap in how to design activity recognition models based on sensor data collected from chronically ill individuals in free-living environments. In this paper, we focus on a situation where free-living activity data are collected continuously, the activity vocabulary (i.e., class labels) is not known a priori, and sensor data are annotated by end users through an active learning process. By analyzing sensor data collected in a clinical study involving patients with cardiovascular disease, we demonstrate significant challenges that arise when inferring physical activities in uncontrolled environments. In particular, we observe that activity labels that are syntactically distinct can refer to semantically identical behaviors, resulting in a sparse label space. To construct a meaningful label space, we propose LabelMerger, a framework for restructuring the label space created through active learning in uncontrolled environments in preparation for training activity recognition models. LabelMerger combines the semantic meaning of activity labels with physical attributes of the activities (i.e., domain knowledge) to generate a flexible and meaningful representation of the labels. Specifically, our approach merges labels using both word embedding techniques from natural language processing and activity intensity from physical activity research. We show that the new representation of the sensor data obtained by LabelMerger yields more accurate activity recognition models than the original label space.
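In DeepMV (item 1), the quality-based weighting of views is learned inside a deep network. The sketch below only illustrates the underlying idea as a simple late fusion, assuming each view already yields a class-probability vector and a non-negative quality score (both hypothetical inputs). The same weighted-average pattern is also one simple way to picture the decision fusion that UnTran (item 2) describes across its source- and target-domain models.

```python
from typing import List, Sequence


def fuse_views(view_probs: Sequence[Sequence[float]], quality: Sequence[float]) -> List[float]:
    """Quality-weighted average of per-view class-probability vectors (a simple late fusion)."""
    if not view_probs or len(view_probs) != len(quality):
        raise ValueError("need one quality score per view")
    if any(q < 0 for q in quality) or sum(quality) <= 0:
        raise ValueError("quality scores must be non-negative with a positive sum")
    total = sum(quality)
    fused = [0.0] * len(view_probs[0])
    for probs, q in zip(view_probs, quality):
        weight = q / total  # normalized so the weights sum to 1
        for c, p in enumerate(probs):
            fused[c] += weight * p
    return fused


# Hypothetical example: a higher-quality WiFi view and a lower-quality acoustic view.
fused = fuse_views([[0.8, 0.2], [0.3, 0.7]], quality=[3.0, 1.0])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Because the weights are normalized, the fused vector is still a probability distribution, so the predicted class is simply its argmax.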
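The AugToAct abstract (item 3) does not name the feature distribution distance it minimizes. Maximum mean discrepancy (MMD) is one common choice for this purpose, so the sketch below, an assumption rather than the paper's method, computes the biased empirical estimate of squared MMD between source and target feature sets with a Gaussian (RBF) kernel; `gamma` is an illustrative bandwidth parameter.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float) -> float:
    """Gaussian kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(source: Sequence[Vector], target: Sequence[Vector], gamma: float = 1.0) -> float:
    """Biased empirical estimate of squared MMD between two feature sets."""

    def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector]) -> float:
        return sum(rbf_kernel(x, y, gamma) for x in xs for y in ys) / (len(xs) * len(ys))

    # MMD^2 = E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)]
    return mean_kernel(source, source) + mean_kernel(target, target) - 2.0 * mean_kernel(source, target)
```

In a transfer-learning setup, a term like this would typically be added to the training loss so that the encoder is pushed toward domain-invariant features; that training loop is omitted here.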
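The LabelMerger abstract (item 5) does not give the exact merging rule. The sketch below is a hypothetical simplification: two labels are merged when the cosine similarity of their word embeddings reaches a threshold and they share the same intensity category (for example, a MET-based bin), and merges propagate transitively through a union-find structure. The embeddings, intensity categories, and threshold are all toy assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def merge_labels(
    embeddings: Dict[str, Sequence[float]],
    intensity: Dict[str, str],
    threshold: float = 0.9,
) -> List[List[str]]:
    """Group labels that are semantically close AND share an intensity category."""
    labels = sorted(embeddings)
    parent = {label: label for label in labels}

    def find(label: str) -> str:
        # Union-find root lookup with path halving.
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            if intensity[a] == intensity[b] and cosine(embeddings[a], embeddings[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for label in labels:
        groups.setdefault(find(label), []).append(label)
    return sorted(groups.values())
```

The intensity check is what keeps semantically similar but physically different labels apart: with toy embeddings, "walking" and "jogging" can be close in embedding space yet stay separate because their intensity categories differ.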