skip to main content


Title: Membership Inference Attacks and Defenses in Classification Models
We study the membership inference (MI) attack against classifiers, where the attacker's goal is to determine whether a data instance was used for training the classifier. Through systematic cataloging of existing MI attacks and extensive experimental evaluations of them, we find that a model's vulnerability to MI attacks is tightly related to the generalization gap -- the difference between training accuracy and test accuracy. We then propose a defense against MI attacks that aims to close the gap by intentionally reduces the training accuracy. More specifically, the training process attempts to match the training and validation accuracies, by means of a new {\em set regularizer} using the Maximum Mean Discrepancy between the softmax output empirical distributions of the training and validation sets. Our experimental results show that combining this approach with another simple defense (mix-up training) significantly improves state-of-the-art defense against MI attacks, with minimal impact on testing accuracy.  more » « less
Award ID(s):
1943364
PAR ID:
10323553
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the Eleventh ACM Conference on Data and Application Security and Privacy
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Adversarial training is a popular defense strategy against attack threat models with bounded Lp norms. However, it often degrades the model performance on normal images and the defense does not generalize well to novel attacks. Given the success of deep generative models such as GANs and VAEs in characterizing the underlying manifold of images, we investigate whether or not the aforementioned problems can be remedied by exploiting the underlying manifold information. To this end, we construct an "On-Manifold ImageNet" (OM-ImageNet) dataset by projecting the ImageNet samples onto the manifold learned by StyleGSN. For this dataset, the underlying manifold information is exact. Using OM-ImageNet, we first show that adversarial training in the latent space of images improves both standard accuracy and robustness to on-manifold attacks. However, since no out-of-manifold perturbations are realized, the defense can be broken by Lp adversarial attacks. We further propose Dual Manifold Adversarial Training (DMAT) where adversarial perturbations in both latent and image spaces are used in robustifying the model. Our DMAT improves performance on normal images, and achieves comparable robustness to the standard adversarial training against Lp attacks. In addition, we observe that models defended by DMAT achieve improved robustness against novel attacks which manipulate images by global color shifts or various types of image filtering. Interestingly, similar improvements are also achieved when the defended models are tested on out-of-manifold natural images. These results demonstrate the potential benefits of using manifold information in enhancing robustness of deep learning models against various types of novel adversarial attacks. 
    more » « less
  2. Graph Neural Networks (GNNs) have been widely applied to various applications across different domains. However, recent studies have shown that GNNs are susceptible to the membership inference attacks (MIAs) which aim to infer if some particular data samples were included in the model’s training data. While most previous MIAs have focused on inferring the membership of individual nodes and edges within the training graph, we introduce a novel form of membership inference attack called the Structure Membership Inference Attack (SMIA) which aims to determine whether a given set of nodes corresponds to a particular target structure, such as a clique or a multi-hop path, within the original training graph. To address this issue, we present novel black-box SMIA attacks that leverage the prediction outputs generated by the target GNN model for inference. Our approach involves training a three-label classifier, which, in combination with shadow training, aids in enabling the inference attack. Our extensive experimental evaluation of three representative GNN models and three real-world graph datasets demonstrates that our proposed attacks consistently outperform three baseline methods, including the one that employs the conventional link membership inference attacks to infer the subgraph structure. Additionally, we design a defense mechanism that introduces perturbations to the node embeddings thus influencing the corresponding prediction outputs by the target model. Our defense selectively perturbs dimensions within the node embeddings that have the least impact on the model's accuracy. Our empirical results demonstrate that the defense effectiveness of our approach is comparable with two established defense techniques that employ differential privacy. Moreover, our method achieves a better trade-off between defense strength and the accuracy of the target model compared to the two existing defense methods.

     
    more » « less
  3. null (Ed.)
    Federated learning (FL) allows a set of agents to collaboratively train a model without sharing their potentially sensitive data. This makes FL suitable for privacy-preserving applications. At the same time, FL is susceptible to adversarial attacks due to decentralized and unvetted data. One important line of attacks against FL is the backdoor attacks. In a backdoor attack, an adversary tries to embed a backdoor functionality to the model during training that can later be activated to cause a desired misclassification. To prevent backdoor attacks, we propose a lightweight defense that requires minimal change to the FL protocol. At a high level, our defense is based on carefully adjusting the aggregation server's learning rate, per dimension and per round, based on the sign information of agents' updates. We first conjecture the necessary steps to carry a successful backdoor attack in FL setting, and then, explicitly formulate the defense based on our conjecture. Through experiments, we provide empirical evidence that supports our conjecture, and we test our defense against backdoor attacks under different settings. We observe that either backdoor is completely eliminated, or its accuracy is significantly reduced. Overall, our experiments suggest that our defense significantly outperforms some of the recently proposed defenses in the literature. We achieve this by having minimal influence over the accuracy of the trained models. In addition, we also provide convergence rate analysis for our proposed scheme. 
    more » « less
  4. null (Ed.)
    Patch adversarial attacks on images, in which the attacker can distort pixels within a region of bounded size, are an important threat model since they provide a quantitative model for physical adversarial attacks. In this paper, we introduce a certifiable defense against patch attacks that guarantees for a given image and patch attack size, no patch adversarial examples exist. Our method is related to the broad class of randomized smoothing robustness schemes which provide high-confidence probabilistic robustness certificates. By exploiting the fact that patch attacks are more constrained than general sparse attacks, we derive meaningfully large robustness certificates against them. Additionally, in contrast to smoothing-based defenses against L_p and sparse attacks, our defense method against patch attacks is de-randomized, yielding improved, deterministic certificates. Compared to the existing patch certification method proposed by Chiang et al. (2020), which relies on interval bound propagation, our method can be trained significantly faster, achieves high clean and certified robust accuracy on CIFAR-10, and provides certificates at ImageNet scale. For example, for a 5-by-5 patch attack on CIFAR-10, our method achieves up to around 57.6% certified accuracy (with a classifier with around 83.8% clean accuracy), compared to at most 30.3% certified accuracy for the existing method (with a classifier with around 47.8% clean accuracy). Our results effectively establish a new state-of-the-art of certifiable defense against patch attacks on CIFAR-10 and ImageNet. 
    more » « less
  5. Graph Neural Networks (GNNs) have been widely used in various graph-based applications. Recent studies have shown that GNNs are vulnerable to link-level membership inference attacks (LMIA) which can infer whether a given link was included in the training graph of a GNN model. While most of the studies focus on the privacy vulnerability of the links in the entire graph, none have inspected the privacy risk of specific subgroups of links (e.g., links between LGBT users). In this paper, we present the first study of disparity in subgroup vulnerability (DSV) of GNNs against LMIA. First, with extensive empirical evaluation, we demonstrate the existence of non-negligible DSV under various settings of GNN models and input graphs. Second, by both statistical and causal analysis, we identify the difference between three specific graph structural properties of subgroups as one of the underlying reasons for DSV. Among the three properties, the difference between subgroup density has the largest causal effect on DSV. Third, inspired by the causal analysis, we design a new defense mechanism named FairDefense to mitigate DSV while providing protection against LMIA. At a high level, at each iteration of target model training, FairDefense randomizes the membership of edges in the training graph with a given probability, aiming to reduce the gap between the density of different subgroups for DSV mitigation. Our empirical results demonstrate that FairDefense outperforms the existing defense methods in the trade-off between defense and target model accuracy. More importantly, it offers better DSV mitigation.

     
    more » « less