skip to main content

This content will become publicly available on December 17, 2023

Title: HAR-CTGAN: A Mobile Sensor Data Generation Tool for Human Activity Recognition
Human activity recognition (HAR) is the process of using mobile sensor data to determine the physical activities performed by individuals. HAR is the backbone of many mobile healthcare applications, such as passive health monitoring systems, early diagnosing systems, and fall detection systems. Effective HAR models rely on deep learning architectures and big data in order to accurately classify activities. Unfortunately, HAR datasets are expensive to collect, are often mislabeled, and have large class imbalances. State-of-the-art approaches to address these challenges utilize Generative Adversarial Networks (GANs) for generating additional synthetic data along with their labels. Problematically, these HAR GANs only synthesize continuous features — features that are represented by real numbers — recorded from gyroscopes, accelerometers, and other sensors that produce continuous data. This is limiting since mobile sensor data commonly has discrete features that provide additional context such as device location and the time-of-day, which have been shown to substantially improve HAR classification. Hence, we studied Conditional Tabular Generative Adversarial Networks (CTGANs) for data generation to synthesize mobile sensor data containing both continuous and discrete features, a task never been done by state-of-the-art approaches. We show HAR-CTGANs generate data with greater realism resulting in allowing better downstream performance in HAR more » models, and when state-of-the-art models were modified with HAR-CTGAN characteristics, downstream performance also improves. « less
; ; ; ;
Award ID(s):
Publication Date:
Journal Name:
2022 IEEE International Conference on Big Data (Big Data)
Page Range or eLocation-ID:
5233 to 5242
Sponsoring Org:
National Science Foundation
More Like this
  1. Raynal, Ann M. ; Ranney, Kenneth I. (Ed.)
    Most research in technologies for the Deaf community have focused on translation using either video or wearable devices. Sensor-augmented gloves have been reported to yield higher gesture recognition rates than camera-based systems; however, they cannot capture information expressed through head and body movement. Gloves are also intrusive and inhibit users in their pursuit of normal daily life, while cameras can raise concerns over privacy and are ineffective in the dark. In contrast, RF sensors are non-contact, non-invasive and do not reveal private information even if hacked. Although RF sensors are unable to measure facial expressions or hand shapes, which would be required for complete translation, this paper aims to exploit near real-time ASL recognition using RF sensors for the design of smart Deaf spaces. In this way, we hope to enable the Deaf community to benefit from advances in technologies that could generate tangible improvements in their quality of life. More specifically, this paper investigates near real-time implementation of machine learning and deep learning architectures for the purpose of sequential ASL signing recognition. We utilize a 60 GHz RF sensor which transmits a frequency modulation continuous wave (FMWC waveform). RF sensors can acquire a unique source of information that ismore »inaccessible to optical or wearable devices: namely, a visual representation of the kinematic patterns of motion via the micro-Doppler signature. Micro-Doppler refers to frequency modulations that appear about the central Doppler shift, which are caused by rotational or vibrational motions that deviate from principle translational motion. In prior work, we showed that fractal complexity computed from RF data could be used to discriminate signing from daily activities and that RF data could reveal linguistic properties, such as coarticulation. We have also shown that machine learning can be used to discriminate with 99% accuracy the signing of native Deaf ASL users from that of copysigning (or imitation signing) by hearing individuals. Therefore, imitation signing data is not effective for directly training deep models. But, adversarial learning can be used to transform imitation signing to resemble native signing, or, alternatively, physics-aware generative models can be used to synthesize ASL micro-Doppler signatures for training deep neural networks. With such approaches, we have achieved over 90% recognition accuracy of 20 ASL signs. In natural environments, however, near real-time implementations of classification algorithms are required, as well as an ability to process data streams in a continuous and sequential fashion. In this work, we focus on extensions of our prior work towards this aim, and compare the efficacy of various approaches for embedding deep neural networks (DNNs) on platforms such as a Raspberry Pi or Jetson board. We examine methods for optimizing the size and computational complexity of DNNs for embedded micro-Doppler analysis, methods for network compression, and their resulting sequential ASL recognition performance.« less
  2. Recent advances in machine learning and deep neural networks have led to the realization of many important applications in the area of personalized medicine. Whether it is detecting activities of daily living or analyzing images for cancerous cells, machine learning algorithms have become the dominant choice for such emerging applications. In particular, the state-of-the-art algorithms used for human activity recognition (HAR) using wearable inertial sensors utilize machine learning algorithms to detect health events and to make predictions from sensor data. Currently, however, there remains a gap in research on whether or not and how activity recognition algorithms may become the subject of adversarial attacks. In this paper, we take the first strides on (1) investigating methods of generating adversarial example in the context of HAR systems; (2) studying the vulnerability of activity recognition models to adversarial examples in feature and signal domain; and (3) investigating the effects of adversarial training on HAR systems. We introduce Adar, a novel computational framework for optimization-driven creation of adversarial examples in sensor-based activity recognition systems. Through extensive analysis based on real sensor data collected with human subjects, we found that simple evasion attacks are able to decrease the accuracy of a deep neural networkmore »from 95.1% to 3.4% and from 93.1% to 16.8% in the case of a convolutional neural network. With adversarial training, the robustness of the deep neural network increased on the adversarial examples by 49.1% in the worst case while the accuracy on clean samples decreased by 13.2%.« less
  3. Disentangled generative models map a latent code vector to a target space, while enforcing that a subset of the learned latent codes are interpretable and associated with distinct properties of the target distribution. Recent advances have been dominated by Variational AutoEncoder (VAE)-based methods, while training disentangled generative adversarial networks (GANs) remains challenging. In this work, we show that the dominant challenges facing disentangled GANs can be mitigated through the use of self-supervision. We make two main contributions: first, we design a novel approach for training disentangled GANs with self-supervision. We propose contrastive regularizer, which is inspired by a natural notion of disentanglement: latent traversal. This achieves higher disentanglement scores than state-of-the-art VAE- and GAN-based approaches. Second, we propose an unsupervised model selection scheme called ModelCentrality, which uses generated synthetic samples to compute the medoid (multi-dimensional generalization of median) of a collection of models. The current common practice of hyper-parameter tuning requires using ground-truths samples, each labelled with known perfect disentangled latent codes. As real datasets are not equipped with such labels, we propose an unsupervised model selection scheme and show that it finds a model close to the best one, for both VAEs and GANs. Combining contrastive regularization with ModelCentrality,more »we improve upon the state-of-the-art disentanglement scores significantly, without accessing the supervised data.« less
  4. Significant resources have been spent in collecting and storing large and heterogeneous radar datasets during expensive Arctic and Antarctic fieldwork. The vast majority of data available is unlabeled, and the labeling process is both time-consuming and expensive. One possible alternative to the labeling process is the use of synthetically generated data with artificial intelligence. Instead of labeling real images, we can generate synthetic data based on arbitrary labels. In this way, training data can be quickly augmented with additional images. In this research, we evaluated the performance of synthetically generated radar images based on modified cycle-consistent adversarial networks. We conducted several experiments to test the quality of the generated radar imagery. We also tested the quality of a state-of-the-art contour detection algorithm on synthetic data and different combinations of real and synthetic data. Our experiments show that synthetic radar images generated by generative adversarial network (GAN) can be used in combination with real images for data augmentation and training of deep neural networks. However, the synthetic images generated by GANs cannot be used solely for training a neural network (training on synthetic and testing on real) as they cannot simulate all of the radar characteristics such as noise or Dopplermore »effects. To the best of our knowledge, this is the first work in creating radar sounder imagery based on generative adversarial network.« less
  5. Abstract Objective

    Electronic medical records (EMRs) can support medical research and discovery, but privacy risks limit the sharing of such data on a wide scale. Various approaches have been developed to mitigate risk, including record simulation via generative adversarial networks (GANs). While showing promise in certain application domains, GANs lack a principled approach for EMR data that induces subpar simulation. In this article, we improve EMR simulation through a novel pipeline that (1) enhances the learning model, (2) incorporates evaluation criteria for data utility that informs learning, and (3) refines the training process.

    Materials and Methods

    We propose a new electronic health record generator using a GAN with a Wasserstein divergence and layer normalization techniques. We designed 2 utility measures to characterize similarity in the structural properties of real and simulated EMRs in the original and latent space, respectively. We applied a filtering strategy to enhance GAN training for low-prevalence clinical concepts. We evaluated the new and existing GANs with utility and privacy measures (membership and disclosure attacks) using billing codes from over 1 million EMRs at Vanderbilt University Medical Center.


    The proposed model outperformed the state-of-the-art approaches with significant improvement in retaining the nature of real records, including prediction performance andmore »structural properties, without sacrificing privacy. Additionally, the filtering strategy achieved higher utility when the EMR training dataset was small.


    These findings illustrate that EMR simulation through GANs can be substantially improved through more appropriate training, modeling, and evaluation criteria.

    « less