A surprising phenomenon in modern machine learning is the ability of a highly overparameterized model to generalize well (small error on the test data) even when it is trained to memorize the training data (zero error on the training data). This has led to an arms race towards increasingly overparameterized models (c.f., deep learning). In this paper, we study an underexplored hidden cost of overparameterization: the fact that overparameterized models may be more vulnerable to privacy attacks, in particular the membership inference attack that predicts the (potentially sensitive) examples used to train a model. We significantly extend the relatively few empirical results on this problem by theoretically proving for an overparameterized linear regression model in the Gaussian data setting that membership inference vulnerability increases with the number of parameters. Moreover, a range of empirical studies indicates that more complex, nonlinear models exhibit the same behavior. Finally, we extend our analysis towards ridge-regularized linear regression and show in the Gaussian data setting that increased regularization also increases membership inference vulnerability in the overparameterized regime.
more »
« less
A Blessing of Dimensionality in Membership Inference through Regularization
Is overparameterization a privacy liability? In this work, we study the effect that the number of parameters has on a classifier's vulnerability to membership inference attacks. We first demonstrate how the number of parameters of a model can induce a privacy--utility trade-off: increasing the number of parameters generally improves generalization performance at the expense of lower privacy. However, remarkably, we then show that if coupled with proper regularization, increasing the number of parameters of a model can actually simultaneously increase both its privacy and performance, thereby eliminating the privacy--utility trade-off. Theoretically, we demonstrate this curious phenomenon for logistic regression with ridge regularization in a bi-level feature ensemble setting. Pursuant to our theoretical exploration, we develop a novel leave-one-out analysis tool to precisely characterize the vulnerability of a linear classifier to the optimal membership inference attack. We empirically exhibit this "blessing of dimensionality" for neural networks on a variety of tasks using early stopping as the regularizer.
more »
« less
- PAR ID:
- 10466348
- Date Published:
- Journal Name:
- Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Graph Neural Networks (GNNs) have been widely used in various graph-based applications. Recent studies have shown that GNNs are vulnerable to link-level membership inference attacks (LMIA) which can infer whether a given link was included in the training graph of a GNN model. While most of the studies focus on the privacy vulnerability of the links in the entire graph, none have inspected the privacy risk of specific subgroups of links (e.g., links between LGBT users). In this paper, we present the first study of disparity in subgroup vulnerability (DSV) of GNNs against LMIA. First, with extensive empirical evaluation, we demonstrate the existence of non-negligible DSV under various settings of GNN models and input graphs. Second, by both statistical and causal analysis, we identify the difference between three specific graph structural properties of subgroups as one of the underlying reasons for DSV. Among the three properties, the difference between subgroup density has the largest causal effect on DSV. Third, inspired by the causal analysis, we design a new defense mechanism named FairDefense to mitigate DSV while providing protection against LMIA. At a high level, at each iteration of target model training, FairDefense randomizes the membership of edges in the training graph with a given probability, aiming to reduce the gap between the density of different subgroups for DSV mitigation. Our empirical results demonstrate that FairDefense outperforms the existing defense methods in the trade-off between defense and target model accuracy. More importantly, it offers better DSV mitigation.more » « less
-
While machine learning (ML) models are becoming mainstream, including in critical application domains, concerns have been raised about the increasing risk of sensitive data leakage. Various privacy attacks, such as membership inference attacks (MIAs), have been developed to extract data from trained ML models, posing significant risks to data confidentiality. While the predominant work in the ML community considers traditional Artificial Neural Networks (ANNs) as the default neural model, neuromorphic architectures, such as Spiking Neural Networks (SNNs), have recently emerged as an attractive alternative mainly due to their significantly low power consumption. These architectures process information through discrete events, i.e., spikes, to mimic the functioning of biological neurons in the brain. While the privacy issues have been extensively investigated in the context of traditional ANNs, they remain largely unexplored in neuromorphic architectures, and little work has been dedicated to investigating their privacy-preserving properties. In this paper, we investigate the question of whether SNNs have inherent privacy-preserving advantages. Specifically, we investigate SNNs’ privacy properties through the lens of MIAs across diverse datasets, in comparison with ANNs. We explore the impact of different learning algorithms (surrogate gradient and evolutionary learning), programming frameworks (snnTorch, TENNLab, and LAVA), and various parameters on the resilience of SNNs against MIA. Our experiments reveal that SNNs demonstrate consistently superior privacy preservation compared to ANNs, with evolutionary algorithms further enhancing their resilience. For example, on the CIFAR-10 dataset, SNNs achieve an AUC as low as 0.59 compared to 0.82 for ANNs, and on CIFAR-100, SNNs maintain a low AUC of 0.58, whereas ANNs reach 0.88. Furthermore, we investigate the privacy-utility trade-off through Differentially Private Stochastic Gradient Descent (DPSGD), observing that SNNs incur a notably lower accuracy drop than ANNs under equivalent privacy constraints.more » « less
-
Gradient leakage attacks are dominating privacy threats in federated learning, despite the default privacy that training data resides locally at the clients. Differential privacy has been the de facto standard for privacy protection and is deployed in federated learning to mitigate privacy risks. However, much existing literature points out that differential privacy fails to defend against gradient leakage. The paper presents ModelCloak, a principled approach based on differential privacy noise, aiming for safe-sharing client local model updates. The paper is organized into three major components. First, we introduce the gradient leakage robustness trade-off, in search of the best balance between accuracy and leakage prevention. The trade-off relation is developed based on the behavior of gradient leakage attacks throughout the federated training process. Second, we demonstrate that a proper amount of differential privacy noise can offer the best accuracy performance within the privacy requirement under a fixed differential privacy noise setting. Third, we propose dynamic differential privacy noise and show that the privacy-utility trade-off can be further optimized with dynamic model perturbation, ensuring privacy protection, competitive accuracy, and leakage attack prevention simultaneously.more » « less
-
Graph Neural Networks (GNNs) have been widely applied to various applications across different domains. However, recent studies have shown that GNNs are susceptible to the membership inference attacks (MIAs) which aim to infer if some particular data samples were included in the model’s training data. While most previous MIAs have focused on inferring the membership of individual nodes and edges within the training graph, we introduce a novel form of membership inference attack called the Structure Membership Inference Attack (SMIA) which aims to determine whether a given set of nodes corresponds to a particular target structure, such as a clique or a multi-hop path, within the original training graph. To address this issue, we present novel black-box SMIA attacks that leverage the prediction outputs generated by the target GNN model for inference. Our approach involves training a three-label classifier, which, in combination with shadow training, aids in enabling the inference attack. Our extensive experimental evaluation of three representative GNN models and three real-world graph datasets demonstrates that our proposed attacks consistently outperform three baseline methods, including the one that employs the conventional link membership inference attacks to infer the subgraph structure. Additionally, we design a defense mechanism that introduces perturbations to the node embeddings thus influencing the corresponding prediction outputs by the target model. Our defense selectively perturbs dimensions within the node embeddings that have the least impact on the model's accuracy. Our empirical results demonstrate that the defense effectiveness of our approach is comparable with two established defense techniques that employ differential privacy. Moreover, our method achieves a better trade-off between defense strength and the accuracy of the target model compared to the two existing defense methods.more » « less
An official website of the United States government

