
Title: HAR-CTGAN: A Mobile Sensor Data Generation Tool for Human Activity Recognition
Human activity recognition (HAR) is the process of using mobile sensor data to determine the physical activities performed by individuals. HAR is the backbone of many mobile healthcare applications, such as passive health monitoring, early diagnosis, and fall detection systems. Effective HAR models rely on deep learning architectures and large datasets to classify activities accurately. Unfortunately, HAR datasets are expensive to collect, are often mislabeled, and have large class imbalances. State-of-the-art approaches to these challenges use Generative Adversarial Networks (GANs) to generate additional labeled synthetic data. Problematically, these HAR GANs synthesize only continuous features (features represented by real numbers) recorded from gyroscopes, accelerometers, and other sensors that produce continuous data. This is limiting, since mobile sensor data commonly include discrete features that provide additional context, such as device location and time of day, which have been shown to substantially improve HAR classification. Hence, we studied Conditional Tabular Generative Adversarial Networks (CTGANs) for synthesizing mobile sensor data containing both continuous and discrete features, a task not addressed by state-of-the-art approaches. We show that HAR-CTGANs generate more realistic data, which yields better downstream performance in HAR models, and that when state-of-the-art models are modified with HAR-CTGAN characteristics, their downstream performance also improves.
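As a rough illustration of the setup the abstract describes (a tabular dataset with mixed continuous and discrete columns), recent versions of the open-source ctgan package can be fit on such a table. This is a minimal sketch, not the paper's HAR-CTGAN implementation; the file name, column names, and hyperparameters below are hypothetical.

```python
# Minimal sketch: fitting a CTGAN on mixed continuous/discrete sensor data
# using the open-source `ctgan` package. Column names, file path, and
# hyperparameters are illustrative, not taken from the HAR-CTGAN paper.
import pandas as pd
from ctgan import CTGAN

# Hypothetical HAR table: continuous sensor readings plus discrete context.
real_data = pd.read_csv("har_windows.csv")  # placeholder path
discrete_columns = ["device_location", "time_of_day", "activity"]

model = CTGAN(epochs=300)
model.fit(real_data, discrete_columns)

# Generate labeled synthetic rows to augment the training set.
synthetic_data = model.sample(10_000)
print(synthetic_data.head())
```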
Award ID(s):
1852498
PAR ID:
10399258
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
2022 IEEE International Conference on Big Data (Big Data)
Page Range / eLocation ID:
5233 to 5242
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The rapid advancement of Quantum Machine Learning (QML) has introduced new possibilities and challenges in the field of cybersecurity. Generative Adversarial Networks (GANs) are promising tools in Machine Learning (ML) and QML for generating realistic synthetic data from existing (real) datasets, which aids in the analysis of, detection of, and protection against adversarial attacks. Quantum Generative Adversarial Networks (QGANs) are well suited to generating both numerical and image data with high-dimensional features by exploiting quantum superposition. However, effectively loading datasets onto quantum computers faces significant obstacles due to losses and inherent noise that degrade performance. In this work, we study the impact of various losses during the training of QGANs and GANs on several state-of-the-art cybersecurity datasets. This paper presents a comparative analysis of the stability of loss functions for real datasets as well as GAN-generated synthetic datasets. We conclude that QGANs demonstrate superior stability and maintain consistently lower generator loss values than traditional approaches such as GANs; our experimental results indicate that loss-function stability is more pronounced for QGANs than for GANs.
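Since the comparison above centers on how the generator loss evolves during training, the sketch below shows how that quantity is typically recorded for a small classical GAN. The architecture, hyperparameters, and data loader are illustrative assumptions, and the quantum (QGAN) counterpart is not shown.

```python
# Minimal sketch (illustrative only): training a small classical GAN on
# tabular feature vectors and recording the generator loss per epoch,
# the quantity whose stability the abstract compares against QGANs.
import torch
import torch.nn as nn

def train_gan(real_batches, feat_dim, noise_dim=16, epochs=50, lr=2e-4):
    G = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
    D = nn.Sequential(nn.Linear(feat_dim, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))
    opt_g = torch.optim.Adam(G.parameters(), lr=lr)
    opt_d = torch.optim.Adam(D.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()
    gen_loss_history = []

    for epoch in range(epochs):
        for real in real_batches:  # iterable of (batch, feat_dim) tensors
            b = real.size(0)
            # Discriminator step: separate real from generated samples.
            fake = G(torch.randn(b, noise_dim)).detach()
            d_loss = bce(D(real), torch.ones(b, 1)) + bce(D(fake), torch.zeros(b, 1))
            opt_d.zero_grad(); d_loss.backward(); opt_d.step()

            # Generator step: try to fool the discriminator.
            g_loss = bce(D(G(torch.randn(b, noise_dim))), torch.ones(b, 1))
            opt_g.zero_grad(); g_loss.backward(); opt_g.step()
        gen_loss_history.append(g_loss.item())  # track stability over epochs
    return gen_loss_history
```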
  2. Generative Adversarial Networks (GANs) have recently attracted considerable attention in the AI community due to their ability to generate high-quality data with significant statistical resemblance to real data. Fundamentally, a GAN is a game between two neural networks trained in an adversarial manner to reach a zero-sum Nash equilibrium profile. Despite the improvements accomplished in GANs in the last few years, several issues remain to be solved. This paper reviews the literature on the game-theoretic aspects of GANs and addresses how game-theory models can address specific challenges of generative models and improve GAN performance. We first present some preliminaries, including the basic GAN model and some game-theory background. We then present a taxonomy that classifies state-of-the-art solutions into three main categories: modified game models, modified architectures, and modified learning methods. The classification is based on modifications made to the basic GAN model by the game-theoretic approaches proposed in the literature. We then explore the objectives of each category and discuss recent works in each class. Finally, we discuss the remaining challenges in this field and present future research directions.
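For reference, the zero-sum game mentioned above is usually written as the standard GAN minimax objective, in which the generator G and discriminator D optimize a single value function (p_data is the data distribution and p_z the noise prior):

```latex
\min_{G}\,\max_{D}\; V(D,G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```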
  3. Recent advances in machine learning and deep neural networks have led to the realization of many important applications in the area of personalized medicine. Whether it is detecting activities of daily living or analyzing images for cancerous cells, machine learning algorithms have become the dominant choice for such emerging applications. In particular, state-of-the-art algorithms for human activity recognition (HAR) using wearable inertial sensors apply machine learning to detect health events and make predictions from sensor data. Currently, however, there remains a gap in research on whether and how activity recognition algorithms may become the subject of adversarial attacks. In this paper, we take the first strides toward (1) investigating methods of generating adversarial examples in the context of HAR systems; (2) studying the vulnerability of activity recognition models to adversarial examples in the feature and signal domains; and (3) investigating the effects of adversarial training on HAR systems. We introduce Adar, a novel computational framework for optimization-driven creation of adversarial examples in sensor-based activity recognition systems. Through extensive analysis based on real sensor data collected from human subjects, we found that simple evasion attacks can decrease accuracy from 95.1% to 3.4% for a deep neural network and from 93.1% to 16.8% for a convolutional neural network. With adversarial training, the robustness of the deep neural network on adversarial examples increased by 49.1% in the worst case, while accuracy on clean samples decreased by 13.2%.
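As a deliberately simple illustration of the kind of evasion attack discussed above, the sketch below perturbs a sensor window with the fast gradient sign method. It is not the Adar framework itself; the model, label tensor, and epsilon value are assumptions.

```python
# Illustrative sketch of a gradient-based evasion attack (FGSM) on a
# sensor-window activity classifier. Generic example only, not Adar.
import torch
import torch.nn.functional as F

def fgsm_evasion(model, sensor_window, true_label, epsilon=0.05):
    """Perturb a batch of sensor windows so the classifier misclassifies them."""
    x = sensor_window.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), true_label)
    loss.backward()
    # Step in the direction that increases the classification loss.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.detach()
```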
  4. Generative Adversarial Networks (GANs) have shown stupendous power in generating realistic images, to the extent that human eyes cannot recognize them as synthesized. State-of-the-art GAN models can generate realistic, high-quality images, which promises unprecedented opportunities for generating design concepts. Yet the preliminary experiments reported in this paper shed light on a fundamental limitation of GANs for generative design: a lack of novelty and diversity in generated samples. This article conducts a generative design study on a large-scale sneaker dataset based on StyleGAN, a state-of-the-art GAN architecture, to advance understanding of how well these generative models produce novel and diverse samples (i.e., sneaker images). The findings reveal that although StyleGAN can generate samples with quality and realism, the generated and style-mixed samples highly resemble the training dataset (i.e., existing sneakers). This article aims to provide future research directions and insights for the engineering design community to further realize the untapped potential of GANs for generative design.
  5. Security monitoring is crucial for maintaining a strong IT infrastructure by protecting against emerging threats, identifying vulnerabilities, and detecting potential points of failure. It involves deploying advanced tools to continuously monitor networks, systems, and configurations. However, organizations face challenges in adopting modern techniques like Machine Learning (ML) due to the privacy and security risks associated with sharing internal data. Compliance with regulations like GDPR further complicates data sharing. To promote external knowledge sharing, organizations need a secure and privacy-preserving method for sharing data. Privacy-preserving data generation involves creating new data that maintains privacy while preserving the key characteristics and properties of the original data, so that it remains useful for building downstream models of attacks. Generative models, such as Generative Adversarial Networks (GANs), have been proposed as a solution for privacy-preserving synthetic data generation. However, standard GANs are limited in their ability to generate realistic system data. System data have inherent constraints: for example, the lists of legitimate IP addresses and port numbers are limited, and protocols dictate valid sequences of network events. Standard generative models do not account for such constraints and do not use domain knowledge in their generation process. Additionally, they are limited to the attribute values present in the training data, which poses a major privacy risk because sensitive discrete attribute values are repeated by GANs. To address these limitations, we propose a novel model for Knowledge Infused Privacy Preserving Data Generation. A privacy-preserving GAN is trained on system data to generate synthetic datasets that can replace the original data in downstream tasks while protecting sensitive information. Knowledge from domain-specific knowledge graphs is used to guide the data generation process, check the validity of generated values, and enrich the dataset by diversifying attribute values. We demonstrate this model by synthesizing network data captured with the packet capture tool Wireshark, and we establish that the synthetic dataset satisfies the constraints of the network-specific data and can replace the original dataset in downstream tasks.
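The validity checking described above can be pictured as a post-generation filter driven by domain constraints. The sketch below is only illustrative: the constraint sets, column names, and repair strategy are assumptions, and the knowledge-graph lookup is reduced to in-memory sets.

```python
# Illustrative sketch: enforcing domain constraints on GAN-generated network
# records. Constraint sets and column names are assumptions standing in for
# values that would come from a domain-specific knowledge graph.
import random
import pandas as pd

VALID_PORTS = {22, 53, 80, 443, 8080}      # hypothetical allow-list
VALID_SUBNET_PREFIX = "10.0."              # hypothetical internal range

def is_valid(row) -> bool:
    return (row["dst_port"] in VALID_PORTS
            and str(row["src_ip"]).startswith(VALID_SUBNET_PREFIX))

def repair(row):
    # Replace invalid values with valid ones so the record stays usable,
    # while diversifying attribute values beyond those in the training data.
    if row["dst_port"] not in VALID_PORTS:
        row["dst_port"] = random.choice(sorted(VALID_PORTS))
    if not str(row["src_ip"]).startswith(VALID_SUBNET_PREFIX):
        row["src_ip"] = VALID_SUBNET_PREFIX + f"{random.randint(0, 255)}.{random.randint(1, 254)}"
    return row

def enforce_constraints(synthetic: pd.DataFrame) -> pd.DataFrame:
    return synthetic.apply(lambda r: r if is_valid(r) else repair(r), axis=1)
```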