NSF PAR Search | NSF Public Access Repository

Alleviating the Fear of Losing Alignment in LLM Fine-tuning

https://doi.org/10.1109/SP61157.2025.00171

Yang, Kang; Tao, Guanhong; Chen, Xun; Xu, Jun (May 2025, IEEE)

Large language models (LLMs) have demonstrated revolutionary capabilities in understanding complex contexts and performing a wide range of tasks. However, LLMs can also answer questions that are unethical or harmful, raising concerns about their applications. A training strategy called alignment can help regulate LLMs' responses to such questions. Yet, alignment can be unexpectedly compromised when fine-tuning an LLM for downstream tasks. This paper focuses on recovering the alignment lost during fine-tuning. We observe that there are two distinct directions inherent in an aligned LLM: the aligned direction and the harmful direction. An LLM is inclined to answer questions in the aligned direction while refusing queries in the harmful direction. Therefore, we propose to recover the harmful direction of the fine-tuned model that has been compromised. Specifically, we restore a small subset of the fine-tuned model's weight parameters from the original aligned model using gradient descent. We also introduce a rollback mechanism to avoid aggressive recovery and maintain downstream task performance. Our evaluation on 125 fine-tuned LLMs demonstrates that our method can reduce their harmful rate (the percentage of harmful questions answered) from 33.25% to 1.74%, with little loss in task performance. In contrast, existing methods either reduce the harmful rate only to a limited extent or significantly impair normal functionality. Our code is available at https://github.com/kangyangWHU/LLMAlignment
Free, publicly-accessible full text available May 12, 2026
Interaction-level Membership Inference Attack against Recommender Systems with Long-tailed Distribution

https://doi.org/10.1145/3627673.3679804

Zhong, Da; Wang, Xiuling; Xu, Zhichao; Xu, Jun; Wang, Wendy Hui (October 2024, ACM)

Full Text Available
Revisiting Black-box Ownership Verification for Graph Neural Networks

https://doi.org/10.1109/SP54263.2024.00232

Zhou, Ruikai; Yang, Kang; Wang, Xiuling; Wang, Wendy Hui; Xu, Jun (May 2024, IEEE)

Graph Neural Networks (GNNs) have emerged as powerful tools for processing graph-structured data, enabling applications in various domains. Yet, GNNs are vulnerable to model extraction attacks, posing risks to intellectual property. Model ownership verification is considered an effective way to mitigate model extraction attacks. However, through a series of empirical studies, we found that the existing GNN ownership verification methods either mandate unrealistic conditions or present unsatisfactory accuracy under the most practical setting: the black-box setting, where the verifier only requires access to the final output (e.g., posterior probability) of the target model and the suspect model. Motivated by these findings, we propose a new, black-box GNN ownership verification method that uses local independent models and shadow surrogate models to train a classifier for performing ownership verification. Our method boosts the verification accuracy by exploiting two insights: (1) We consider the overall behaviors of the target model for decision-making, better utilizing its holistic fingerprinting; (2) We enrich the fingerprinting of the target model by masking a subset of features of its training data, injecting extra information to facilitate ownership verification. To assess the effectiveness of our proposed method, we perform an extensive series of evaluations with 5 popular datasets, 5 mainstream GNN architectures, and 16 different settings. Our method achieves nearly perfect accuracy with a marginal impact on the target model in all cases, significantly outperforming the existing methods and improving on their practicality. We also demonstrate that our method remains robust against adversarial attempts to evade the verification.
Full Text Available
Disparate Vulnerability in Link Inference Attacks against Graph Neural Networks

https://doi.org/10.56553/popets-2023-0103

Zhong, Da; Yu, Ruotong; Wu, Kun; Wang, Xiuling; Xu, Jun; Wang, Wendy Hui (October 2023, Proceedings on Privacy Enhancing Technologies)

Graph Neural Networks (GNNs) have been widely used in various graph-based applications. Recent studies have shown that GNNs are vulnerable to link-level membership inference attacks (LMIA), which can infer whether a given link was included in the training graph of a GNN model. While most of these studies focus on the privacy vulnerability of the links in the entire graph, none have inspected the privacy risk of specific subgroups of links (e.g., links between LGBT users). In this paper, we present the first study of disparity in subgroup vulnerability (DSV) of GNNs against LMIA. First, with extensive empirical evaluation, we demonstrate the existence of non-negligible DSV under various settings of GNN models and input graphs. Second, through both statistical and causal analysis, we identify the differences in three specific graph structural properties of subgroups as one of the underlying reasons for DSV. Among the three properties, the difference in subgroup density has the largest causal effect on DSV. Third, inspired by the causal analysis, we design a new defense mechanism named FairDefense to mitigate DSV while providing protection against LMIA. At a high level, at each iteration of target model training, FairDefense randomizes the membership of edges in the training graph with a given probability, aiming to reduce the gap between the densities of different subgroups for DSV mitigation. Our empirical results demonstrate that FairDefense outperforms the existing defense methods in the trade-off between defense and target model accuracy. More importantly, it offers better DSV mitigation.
Full Text Available
Disparate Vulnerability in Link Inference Attacks against Graph Neural Networks

Zhong, Da; Yu, Ruotong (July 2023, Proceedings on Privacy Enhancing Technologies)

Full Text Available
Understanding Disparate Effects of Membership Inference Attacks and their Countermeasures

https://doi.org/10.1145/3488932.3501279

Zhong, Da; Sun, Haipei; Xu, Jun; Gong, Neil; Wang, Wendy Hui (May 2022, Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security)

Full Text Available
