-
Large language models (LLMs) have demonstrated revolutionary capabilities in understanding complex contexts and performing a wide range of tasks. However, LLMs can also answer questions that are unethical or harmful, raising concerns about their applications. To regulate LLMs' responses to such questions, a training strategy called alignment can help. Yet, alignment can be unexpectedly compromised when fine-tuning an LLM for downstream tasks. This paper focuses on recovering the alignment lost during fine-tuning. We observe that there are two distinct directions inherent in an aligned LLM: the aligned direction and the harmful direction. An LLM is inclined to answer questions in the aligned direction while refusing queries in the harmful direction. Therefore, we propose to recover the harmful direction of the fine-tuned model that has been compromised. Specifically, we restore a small subset of the fine-tuned model's weight parameters from the original aligned model using gradient descent. We also introduce a rollback mechanism to avoid aggressive recovery and maintain downstream task performance. Our evaluation on 125 fine-tuned LLMs demonstrates that our method can reduce their harmful rate (the percentage of harmful questions answered) from 33.25% to 1.74% without significantly sacrificing task performance. In contrast, existing methods either reduce the harmful rate only to a limited extent or significantly impact normal functionality. Our code is available at https://github.com/kangyangWHU/LLMAlignment.
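Below is a minimal sketch of the recover-and-rollback idea described above, assuming a PyTorch setting. Selecting parameters by gradient magnitude, the harmful_loss_fn and task_metric_fn callables, and the thresholds are illustrative assumptions and simplifications, not the paper's implementation.

```python
# Hypothetical sketch: restore a small fraction of fine-tuned weights from the
# original aligned model, rolling back changes that hurt the downstream task.
import torch

def recover_alignment(finetuned, aligned, harmful_loss_fn, task_metric_fn,
                      restore_ratio=0.01, max_metric_drop=0.02):
    """Both models are assumed to share one architecture; all callables are assumed."""
    baseline = task_metric_fn(finetuned)            # task performance before recovery

    # Use gradients of a loss over harmful prompts to locate candidate weights
    # (a simplified stand-in for the gradient-based selection in the abstract).
    finetuned.zero_grad()
    harmful_loss_fn(finetuned).backward()

    for (name, p_ft), p_al in zip(finetuned.named_parameters(), aligned.parameters()):
        if p_ft.grad is None:
            continue
        k = max(1, int(restore_ratio * p_ft.numel()))
        idx = torch.topk(p_ft.grad.abs().flatten(), k).indices

        with torch.no_grad():
            flat = p_ft.data.view(-1)
            backup = flat[idx].clone()
            flat[idx] = p_al.data.view(-1)[idx]     # restore from the aligned model

        # Rollback mechanism: undo this tensor's restoration if the task metric drops.
        if baseline - task_metric_fn(finetuned) > max_metric_drop:
            with torch.no_grad():
                flat[idx] = backup
    return finetuned
```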
-
This paper focuses on fuzzing document software, or more precisely, software that processes document files (e.g., HTML, PDF, and DOCX). Document software typically requires highly structured inputs, which general-purpose fuzzing cannot handle well. We propose two techniques to facilitate fuzzing of document software. First, we design an intermediate document representation (DIR) for document files. DIR describes a document file in an abstract way that is independent of the underlying format. Reusing common SDKs, a DIR document can be lowered into a desired format without a deep understanding of that format. Second, we propose multi-level mutations that operate directly on a DIR document, which can explore the search space more thoroughly than existing single-level mutations. Combining these two techniques, we can reuse the same DIR-based generations and mutations to fuzz any document format, without separately handling the target format or re-engineering the generation/mutation components. To assess the utility of DIR-based fuzzing, we applied it to 6 PDF and 6 HTML applications in 48-hour campaigns. On the PDF applications, it demonstrated superior performance, outpacing general mutation-based fuzzing (AFL++), ML-based PDF fuzzing (Learn&Fuzz), and structure-aware mutation-based fuzzing (NAUTILUS) by 33.87%, 127.74%, and 25.17% in code coverage, respectively. For HTML, it exceeded AFL++ and generation-based methods (FreeDom and Domato) by 28.8% and 14.02%.
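The sketch below illustrates how a DIR-style pipeline could be organized: a format-agnostic document tree, mutations applied at several levels, and one lowering backend (HTML) as an example. The node fields, mutation operators, and rendering are assumptions for illustration, not the paper's actual design; a PDF backend would instead reuse a PDF SDK.

```python
# Illustrative DIR sketch: an abstract document tree, multi-level mutation,
# and a lowering function for one concrete format (HTML).
import copy
import random
from dataclasses import dataclass, field

@dataclass
class DIRNode:
    kind: str                                # abstract element kind
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    text: str = ""

def flatten(node: DIRNode) -> list:
    out = [node]
    for child in node.children:
        out.extend(flatten(child))
    return out

def mutate(root: DIRNode) -> DIRNode:
    """Multi-level mutation: pick a level, then mutate a random node at that level."""
    root = copy.deepcopy(root)
    target = random.choice(flatten(root))
    level = random.choice(["structure", "node", "attribute"])
    if level == "structure" and target.children:     # duplicate a whole subtree
        target.children.append(copy.deepcopy(random.choice(target.children)))
    elif level == "node":                            # swap the element kind
        target.kind = random.choice(["div", "table", "span", "p"])
    else:                                            # perturb an attribute value
        target.attrs["data-fuzz"] = str(random.randint(0, 2**16))
    return root

def lower_to_html(node: DIRNode) -> str:
    """Lowering backend for one target format; other formats get their own backend."""
    attrs = "".join(f' {k}="{v}"' for k, v in node.attrs.items())
    inner = node.text + "".join(lower_to_html(c) for c in node.children)
    return f"<{node.kind}{attrs}>{inner}</{node.kind}>"

seed = DIRNode("div", children=[DIRNode("p", text="hello")])
print(lower_to_html(mutate(seed)))                   # one fuzzing input per call
```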
-
Graph Neural Networks (GNNs) have emerged as powerful tools for processing graph-structured data, enabling applications in various domains. Yet, GNNs are vulnerable to model extraction attacks, posing risks to intellectual property. To mitigate model extraction attacks, model ownership verification is considered an effective method. However, through a series of empirical studies, we found that the existing GNN ownership verification methods either mandate unrealistic conditions or present unsatisfactory accuracy under the most practical setting: the black-box setting, where the verifier only requires access to the final output (e.g., posterior probability) of the target model and the suspect model. Inspired by these studies, we propose a new, black-box GNN ownership verification method that involves local independent models and shadow surrogate models to train a classifier for performing ownership verification. Our method boosts verification accuracy by exploiting two insights: (1) we consider the overall behavior of the target model for decision-making, better utilizing its holistic fingerprint; (2) we enrich the fingerprint of the target model by masking a subset of features of its training data, injecting extra information to facilitate ownership verification. To assess the effectiveness of our proposed method, we perform an intensive series of evaluations with 5 popular datasets, 5 mainstream GNN architectures, and 16 different settings. Our method achieves nearly perfect accuracy with a marginal impact on the target model in all cases, significantly outperforming the existing methods while offering greater practicality. We also demonstrate that our method maintains robustness against adversarial attempts to evade verification.
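A schematic sketch of this kind of verification pipeline is shown below: posterior fingerprints collected over a fixed probe set from shadow surrogates and locally trained independent models are used to fit a binary classifier, which is then queried with the suspect's posteriors in a black-box manner. The training callables, the choice of a random-forest classifier, and all names are assumptions, not the paper's exact method; feature masking is assumed to happen inside the surrogate-training callable.

```python
# Hypothetical sketch: a classifier over posterior fingerprints separates
# surrogates of the target model from independently trained models.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def build_verifier(train_shadow_surrogate, train_independent, probe_posteriors,
                   n_models=16):
    """train_* return trained GNNs; probe_posteriors(model) returns the model's
    posterior matrix over a fixed probe set (its black-box fingerprint)."""
    X, y = [], []
    for _ in range(n_models):
        X.append(probe_posteriors(train_shadow_surrogate()).ravel()); y.append(1)
        X.append(probe_posteriors(train_independent()).ravel());      y.append(0)
    return RandomForestClassifier(n_estimators=200).fit(np.stack(X), np.array(y))

def verify_ownership(verifier, suspect_model, probe_posteriors) -> bool:
    """Only the suspect's final outputs on the probe set are needed (black-box)."""
    fingerprint = probe_posteriors(suspect_model).ravel().reshape(1, -1)
    return bool(verifier.predict(fingerprint)[0])
```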