Title: Privacy-Preserving Data-Driven Learning Models for Emerging Communication Networks: A Comprehensive Survey
With the proliferation of Beyond 5G (B5G) communication systems and heterogeneous networks, mobile broadband users are generating massive volumes of data that undergo fast processing and computing to obtain actionable insights. While analyzing this huge amount of data typically involves machine and deep learning-based data-driven Artificial Intelligence (AI) models, a key challenge arises in providing privacy assurances for user-generated data. Although data-driven techniques have been widely utilized for network traffic analysis and other network management tasks, researchers have also identified that applying AI techniques may lead to severe privacy concerns. Therefore, the concept of privacy-preserving data-driven learning models has recently emerged as a hot area of research to facilitate model training on large-scale datasets while guaranteeing privacy along with the security of the data. In this paper, we first demonstrate the research gap in this domain, followed by a tutorial-oriented review of data-driven models that can potentially be mapped to privacy-preserving techniques. Then, we provide preliminaries of a number of privacy-preserving techniques (e.g., differential privacy, functional encryption, homomorphic encryption, secure multi-party computation, and federated learning) that can potentially be adopted for emerging communication networks. These preliminaries enable us to showcase the subset of data-driven privacy-preserving models that are gaining traction in emerging communication network systems. We provide a number of relevant networking use cases, ranging from the B5G core and Radio Access Networks (RANs) to semantic communications, that adopt privacy-preserving data-driven models. Based on the lessons learned from the pertinent use cases, we also identify several open research challenges and hint toward possible solutions.
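To make the first of these techniques concrete, below is a minimal sketch of the Laplace mechanism for differential privacy, one of the preliminaries the survey covers. The function name, the example query, and all constants are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the Laplace mechanism for differential privacy.
# The function name, query, and constants are illustrative, not the paper's.
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release true_value with epsilon-differential privacy.

    sensitivity: the maximum change in the query output when one
    record is added to or removed from the dataset.
    """
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Hypothetical example: privately release a user count on a network slice.
user_count = 1042  # aggregate computed on raw data (illustrative)
noisy_count = laplace_mechanism(user_count, sensitivity=1.0, epsilon=0.5)
print(f"DP count release: {noisy_count:.1f}")
```

Smaller epsilon values add more noise and correspond to a stronger privacy guarantee, which is the core utility-privacy trade-off the survey discusses.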
Award ID(s):
2210252
PAR ID:
10596630
Publisher / Repository:
IEEE
Date Published:
Journal Name:
IEEE Communications Surveys & Tutorials
ISSN:
2373-745X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In graph machine learning, data collection, sharing, and analysis often involve multiple parties, each of which may require varying levels of data security and privacy. To this end, preserving privacy is of great importance in protecting sensitive information. In the era of big data, the relationships among data entities have become unprecedentedly complex, and more applications utilize advanced data structures (i.e., graphs) that can support network structures and relevant attribute information. To date, many graph-based AI models have been proposed (e.g., graph neural networks) for various domain tasks, like computer vision and natural language processing. In this paper, we focus on reviewing privacy-preserving techniques of graph machine learning. We systematically review related works from the data aspect to the computational aspect. We first review methods for generating privacy-preserving graph data. Then we describe methods for transmitting privacy-preserved information (e.g., graph model parameters) to realize optimization-based computation when data sharing among multiple parties is risky or impossible. In addition to discussing relevant theoretical methodology and software tools, we also discuss current challenges and highlight several possible future research opportunities for privacy-preserving graph machine learning. Finally, we envision a unified and comprehensive secure graph machine learning system.
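As a concrete illustration of the parameter-transmission methods this survey reviews, below is a minimal sketch of privatizing a locally computed graph-model gradient (DP-SGD-style clipping plus Gaussian noise) before sharing it with other parties. The function, shapes, and constants are illustrative assumptions, not from the paper.

```python
# Hedged sketch: perturbing shared graph-model parameters before
# transmission, so raw graph data cannot be reconstructed from them.
# All names and constants are illustrative.
import numpy as np

def privatize_update(grad: np.ndarray, clip_norm: float, noise_mult: float) -> np.ndarray:
    """Clip a local gradient to a bounded L2 norm, then add Gaussian
    noise calibrated to that bound (DP-SGD style)."""
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    noise = np.random.normal(0.0, noise_mult * clip_norm, size=grad.shape)
    return clipped + noise

# Each party shares only the privatized update with the coordinator.
local_grad = np.random.randn(128)  # stand-in for a GNN layer gradient
shared = privatize_update(local_grad, clip_norm=1.0, noise_mult=1.1)
```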
  2. Emerging Distributed AI systems are revolutionizing big data computing and data processing capabilities with growing economic and societal impact. However, recent studies have identified new attack surfaces and risks caused by security, privacy, and fairness issues in AI systems. In this paper, we review representative techniques, algorithms, and theoretical foundations for trustworthy distributed AI through robustness guarantee, privacy protection, and fairness awareness in distributed learning. We first provide a brief overview of alternative architectures for distributed learning, discuss inherent vulnerabilities for security, privacy, and fairness of AI algorithms in distributed learning, and analyze why these problems are present in distributed learning regardless of specific architectures. Then we provide a unique taxonomy of countermeasures for trustworthy distributed AI, covering (1) robustness to evasion attacks and irregular queries at inference, and robustness to poisoning attacks, Byzantine attacks, and irregular data distribution during training; (2) privacy protection during distributed learning and model inference at deployment; and (3) AI fairness and governance with respect to both data and models. We conclude with a discussion on open challenges and future research directions toward trustworthy distributed AI, such as the need for trustworthy AI policy guidelines, the AI responsibility-utility co-design, and incentives and compliance. 
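To illustrate one countermeasure class in the taxonomy above, here is a minimal sketch of Byzantine-robust aggregation for distributed learning, replacing the plain mean of client updates with a coordinate-wise median so a minority of poisoned updates cannot dominate. All names and shapes are illustrative assumptions, not the paper's.

```python
# Hedged sketch of Byzantine-robust aggregation: a coordinate-wise
# median tolerates a minority of arbitrarily corrupted client updates,
# whereas a plain mean can be shifted arbitrarily by a single attacker.
import numpy as np

def robust_aggregate(updates: list) -> np.ndarray:
    """Coordinate-wise median over client update vectors."""
    return np.median(np.stack(updates), axis=0)

honest = [np.random.normal(0.0, 0.1, 10) for _ in range(8)]
poisoned = [np.full(10, 100.0) for _ in range(2)]  # adversarial updates
agg = robust_aggregate(honest + poisoned)          # stays near the honest values
```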
  3. Security monitoring is crucial for maintaining a strong IT infrastructure by protecting against emerging threats, identifying vulnerabilities, and detecting potential points of failure. It involves deploying advanced tools to continuously monitor networks, systems, and configurations. However, organizations face challenges in adopting modern techniques like Machine Learning (ML) due to privacy and security risks associated with sharing internal data. Compliance with regulations like GDPR further complicates data sharing. To promote external knowledge sharing, a secure and privacy-preserving method for organizations to share data is necessary. Privacy-preserving data generation involves creating new data that maintains privacy while preserving key characteristics and properties of the original data, so that it remains useful for building downstream models of attacks. Generative models, such as Generative Adversarial Networks (GANs), have been proposed as a solution for privacy-preserving synthetic data generation. However, standard GANs are limited in their ability to generate realistic system data. System data have inherent constraints: e.g., the lists of legitimate IP addresses and port numbers are limited, and protocols dictate a valid sequence of network events. Standard generative models do not account for such constraints and do not utilize domain knowledge in their generation process. Additionally, they are limited to the attribute values present in the training data. This poses a major privacy risk, as sensitive discrete attribute values are repeated by GANs. To address these limitations, we propose a novel model for Knowledge Infused Privacy Preserving Data Generation. A privacy-preserving GAN is trained on system data to generate synthetic datasets that can replace original data for downstream tasks while protecting sensitive data. Knowledge from domain-specific knowledge graphs is used to guide the data generation process, check the validity of generated values, and enrich the dataset by diversifying the values of attributes. We specifically demonstrate this model by synthesizing network data captured by the network capture tool Wireshark. We establish that the synthetic dataset holds up to the constraints of the network-specific datasets and can replace the original dataset in downstream tasks.
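The knowledge-guided validity check described above can be illustrated with a minimal sketch: generated network records are kept only if their fields satisfy domain constraints, such as membership in a set of legitimate IP addresses and a valid port range. The constraint sets, record format, and field names below are illustrative assumptions, not the paper's actual knowledge graph.

```python
# Hedged sketch of a knowledge-guided validity check for synthetic
# network records. Constraint sets and field names are illustrative.
VALID_IPS = {"10.0.0.5", "10.0.0.7", "192.168.1.20"}  # e.g., from a knowledge graph
VALID_PORTS = set(range(1, 65536))                    # legitimate port range

def is_valid(record: dict) -> bool:
    """Keep a synthetic record only if it satisfies domain constraints."""
    return record["src_ip"] in VALID_IPS and record["dst_port"] in VALID_PORTS

synthetic = [
    {"src_ip": "10.0.0.5", "dst_port": 443},   # satisfies both constraints
    {"src_ip": "256.1.1.1", "dst_port": 80},   # illegitimate IP, rejected
]
cleaned = [r for r in synthetic if is_valid(r)]
```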
  4. The increasing use of high-dimensional imaging in medical AI raises significant privacy and security concerns. This paper presents a Bootstrap Your Own Latent (BYOL)-based self-supervised learning (SSL) framework for secure image processing, ensuring compliance with HIPAA and privacy-preserving machine learning (PPML) techniques. Our method integrates federated learning, homomorphic encryption, and differential privacy to enhance security while reducing dependence on labeled data. Experimental results on the MNIST and NIH Chest X-ray datasets demonstrate classification accuracies of 97.5% and 99.99% (40% before fine-tuning), with improved clustering performance using K-Means (Silhouette Score: 0.5247). These findings validate BYOL's capability for robust, privacy-preserving image processing while emphasizing the need for fine-tuning to optimize classification performance.
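For context on the method above, here is a minimal sketch of the standard BYOL regression objective, in which the online network's prediction is aligned with the (stop-gradient) target network's projection via a normalized L2 loss, equivalent to 2 - 2 x cosine similarity. Tensor names and dimensions are illustrative assumptions; the paper's federated and encrypted training loop is not shown.

```python
# Hedged sketch of the standard BYOL loss: normalized L2 distance
# between the online prediction and the target projection, which
# reduces to 2 - 2 * cosine_similarity. Names are illustrative.
import numpy as np

def byol_loss(online_pred: np.ndarray, target_proj: np.ndarray) -> float:
    """BYOL regression loss; 0 when the vectors are perfectly aligned."""
    p = online_pred / np.linalg.norm(online_pred)
    z = target_proj / np.linalg.norm(target_proj)  # treated as stop-gradient
    return float(2.0 - 2.0 * np.dot(p, z))

p = np.random.randn(256)  # online predictor output for one augmented view
z = np.random.randn(256)  # EMA target projection for the other view
print(byol_loss(p, z))    # ranges from 0 (aligned) to 4 (opposite)
```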
  5. Integrated sensing and communication (ISAC) is considered an emerging technology for 6th-generation (6G) wireless and mobile networks. It is expected to enable a wide variety of vertical applications, ranging from unmanned aerial vehicle (UAV) detection for critical infrastructure protection to physiological sensing for mobile healthcare. Despite its significant socioeconomic benefits, ISAC technology also raises unique challenges in system security and user privacy. Being aware of the security and privacy challenges, understanding the trade-off between security and communication performance, and exploring potential countermeasures in practical systems are critical to a wide adoption of this technology in various application scenarios. This talk will discuss various security and privacy threats in emerging ISAC systems with a focus on communication-centric ISAC systems, that is, those using the cellular or WiFi infrastructure for sensing. We will then examine potential mechanisms to secure ISAC systems and protect user privacy at the physical and data layers under different sensing modes. At the wireless physical (PHY) layer, an ISAC system is subject to both passive and active attacks, such as unauthorized passive sensing, unauthorized active sensing, signal spoofing, and jamming. Potential countermeasures include wireless channel/radio frequency (RF) environment obfuscation, waveform randomization, anti-jamming communication, and spectrum/RF monitoring. At the data layer, user privacy could be compromised during data collection, sharing, storage, and usage. For sensing systems powered by artificial intelligence (AI), user privacy could also be compromised during the model training and inference stages. An attacker could falsify the sensing data to achieve a malicious goal. Potential countermeasures include the application of privacy enhancing technologies (PETs), such as data anonymization, differential privacy, homomorphic encryption, trusted execution, and data synthesis.
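As one possible reading of the waveform-randomization countermeasure mentioned above, here is a minimal sketch in which pilot symbols receive pseudo-random phase rotations: an unauthorized passive sensor observes a scrambled channel estimate, while a legitimate receiver that shares the seed can undo the rotation. This is an illustrative assumption about how such a scheme might look, not the talk's actual design.

```python
# Hedged sketch of PHY-layer waveform randomization via pilot-phase
# rotation. All parameters and the scheme itself are illustrative.
import numpy as np

def randomize_pilots(pilots: np.ndarray, seed: int) -> np.ndarray:
    """Apply seeded pseudo-random phase rotations to pilot symbols."""
    rng = np.random.default_rng(seed)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=pilots.shape)
    return pilots * np.exp(1j * phases)

pilots = np.ones(64, dtype=complex)     # nominal pilot symbols
tx = randomize_pilots(pilots, seed=42)  # eavesdropper sees random phases
# A legitimate receiver with the same seed removes the rotation:
rx = tx * np.conj(randomize_pilots(np.ones(64, dtype=complex), seed=42))
# rx recovers the original pilots (up to floating-point rounding)
```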