NSF PAR Search | NSF Public Access Repository

Cleangen: Mitigating backdoor attacks for generation tasks in large language models

Li, Y; Xu, Z; Jiang, F; Niu, L; Sahabandu, D; Ramasubramanian, B; Poovendran, R (November 2024, Conference on Empirical Methods in Natural Language Processing (EMNLP))

The remarkable performance of large language models (LLMs) in generation tasks has enabled practitioners to leverage publicly available models to power custom applications, such as chatbots and virtual assistants. However, the data used to train or fine-tune these LLMs is often undisclosed, allowing an attacker to compromise the data and inject backdoors into the models. In this paper, we develop a novel inference-time defense, named CLEANGEN, to mitigate backdoor attacks for generation tasks in LLMs. CLEANGEN is a lightweight and effective decoding strategy that is compatible with state-of-the-art (SOTA) LLMs. Our insight behind CLEANGEN is that, compared to other LLMs, backdoored LLMs assign significantly higher probabilities to tokens representing the attacker-desired content. These discrepancies in token probabilities enable CLEANGEN to identify suspicious tokens favored by the attacker and replace them with tokens generated by another LLM that is not compromised by the same attacker, thereby avoiding generation of attacker-desired content. We evaluate CLEANGEN against five SOTA backdoor attacks. Our results show that CLEANGEN achieves lower attack success rates (ASR) compared to five SOTA baseline defenses for all five backdoor attacks. Moreover, LLMs deploying CLEANGEN maintain helpfulness in their responses when serving benign user queries with minimal added computational overhead.
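The probability-comparison idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `guarded_decode`, the `ratio_threshold` parameter, the greedy token choice, and the toy next-token tables are all assumptions made for illustration. The sketch flags a token as suspicious when the target model assigns it a much higher probability than a reference model does, and substitutes the reference model's preferred token.

```python
from typing import Callable, Dict, List

Dist = Dict[str, float]                      # token -> probability
NextTokenFn = Callable[[List[str]], Dist]    # context -> next-token distribution


def guarded_decode(target: NextTokenFn, reference: NextTokenFn,
                   prompt: List[str], max_new_tokens: int = 8,
                   ratio_threshold: float = 4.0, eos: str = "<eos>") -> List[str]:
    """Greedy decoding that swaps out tokens the target model favors
    far more strongly than the reference model (hypothetical rule)."""
    context = list(prompt)
    output: List[str] = []
    for _ in range(max_new_tokens):
        p_target = target(context)
        p_ref = reference(context)
        token = max(p_target, key=p_target.get)
        # Probability ratio between the two models; floor avoids division by zero.
        ratio = p_target[token] / max(p_ref.get(token, 0.0), 1e-9)
        if ratio > ratio_threshold:
            # Suspicious token: fall back to the reference model's choice.
            token = max(p_ref, key=p_ref.get)
        if token == eos:
            break
        output.append(token)
        context.append(token)
    return output


# Toy stand-ins for the two models (purely illustrative data).
def toy_target(context: List[str]) -> Dist:
    if context[-1] == "visit":
        return {"evil.example": 0.90, "our": 0.10}
    if context[-1] == "our":
        return {"docs": 0.90, "<eos>": 0.10}
    return {"<eos>": 1.0}


def toy_reference(context: List[str]) -> Dist:
    if context[-1] == "visit":
        return {"our": 0.85, "the": 0.14, "evil.example": 0.01}
    if context[-1] == "our":
        return {"docs": 0.80, "<eos>": 0.20}
    return {"<eos>": 1.0}


print(guarded_decode(toy_target, toy_reference, ["please", "visit"]))
```

In this toy setup the token `evil.example` is replaced because the two models disagree sharply about it, while `docs` is kept because both models rate it similarly. How the threshold should be chosen, and how often the reference model is consulted, are design questions this sketch does not address.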
Full Text Available
ACE: A model poisoning attack on contribution evaluation methods in federated learning

Xu, Z; Jiang, F; Niu, L; Jia, J; Li, Bo; Poovendran, Radha (August 2024, 33rd USENIX Security Symposium (USENIX Security 24))

Full Text Available
SafeDecoding: Defending against jailbreak attacks via safety-aware decoding

Xu, Z; Jiang, F; Niu, L; Jia, J; Li, Bo; Poovendran, Radha (August 2024, Annual Meeting of the Association for Computational Linguistics (ACL))

Full Text Available
ArtPrompt: ASCII art-based jailbreak attacks against aligned LLMs

Jiang, F; Xu, Z; Niu, L; Xiang, Z; Li, Bo; Poovendran, Radha (August 2024, Annual Meeting of the Association for Computational Linguistics (ACL))

Full Text Available
ArtPrompt: ASCII art-based jailbreak attacks against aligned LLMs

Jiang, F; Xu, Z; Niu, L; Xiang, Z; Li, Bo; Poovendran, Radha (August 2024, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15157–15173)

Safety is critical to the usage of large language models (LLMs). Multiple techniques such as data filtering and supervised fine-tuning have been developed to strengthen LLM safety. However, currently known techniques presume that corpora used for safety alignment of LLMs are solely interpreted by semantics. This assumption, however, does not hold in real-world applications, which leads to severe vulnerabilities in LLMs. For example, users of forums often use ASCII art, a form of text-based art, to convey image information. In this paper, we propose a novel ASCII art-based jailbreak attack and introduce a comprehensive benchmark, Vision-in-Text Challenge (VITC), to evaluate the capabilities of LLMs in recognizing prompts that cannot be solely interpreted by semantics. We show that five SOTA LLMs (GPT-3.5, GPT-4, Gemini, Claude, and Llama2) struggle to recognize prompts provided in the form of ASCII art. Based on this observation, we develop the jailbreak attack ArtPrompt, which leverages the poor performance of LLMs in recognizing ASCII art to bypass safety measures and elicit undesired behaviors from LLMs. ArtPrompt only requires black-box access to the victim LLMs, making it a practical attack. We evaluate ArtPrompt on five SOTA LLMs, and show that ArtPrompt can effectively and efficiently induce undesired behaviors from all five LLMs. Our code is available at https://github.com/uw-nsl/ArtPrompt.
Full Text Available
Poster: Brave: Byzantine-resilient and privacy-preserving peer-to-peer federated learning

Xu, Z; Jiang, F; Niu, L; Jia, J; Li, Bo; Poovendran, Radha (July 2024, Proceedings of the 19th ACM Asia Conference on Computer and Communications Security, pp. 1934–1936)

Federated learning (FL) enables multiple participants to train a global machine learning model without sharing their private training data. Peer-to-peer (P2P) FL advances existing centralized FL paradigms by eliminating the server that aggregates local models from participants and then updates the global model. However, P2P FL is vulnerable to (i) honest-but-curious participants whose objective is to infer private training data of other participants, and (ii) Byzantine participants who can transmit arbitrarily manipulated local models to corrupt the learning process. P2P FL schemes that simultaneously guarantee Byzantine resilience and preserve privacy have been less studied. In this paper, we develop Brave, a protocol that ensures the Byzantine Resilience And priVacy-prEserving property for P2P FL in the presence of both types of adversaries. We show that Brave preserves privacy by establishing that no honest-but-curious adversary can infer other participants’ private data by observing their models. We further prove that Brave is Byzantine-resilient, which guarantees that all benign participants converge to an identical model that deviates from a global model trained without Byzantine adversaries by a bounded distance. We evaluate Brave against three state-of-the-art adversaries in a P2P FL setting for image classification tasks on the benchmark datasets CIFAR10 and MNIST. Our results show that global models learned with Brave in the presence of adversaries achieve classification accuracy comparable to that of global models trained in the absence of any adversary.
Full Text Available
SafeDecoding: Defending against jailbreak attacks via safety-aware decoding

Xu, Z; Jiang, F; Niu, L; Jia, J; Li, Bo; Poovendran, Radha (May 2024, ICLR Workshop on Secure and Trustworthy Large Language Models (ICLR SeT-LLM))

As large language models (LLMs) become increasingly integrated into real-world applications such as code generation and chatbot assistance, extensive efforts have been made to align LLM behavior with human values, including safety. Jailbreak attacks, which aim to provoke unintended and unsafe behaviors from LLMs, remain a significant LLM safety threat. In this paper, we aim to defend LLMs against jailbreak attacks by introducing SafeDecoding, a safety-aware decoding strategy for LLMs to generate helpful and harmless responses to user queries. Our insight in developing SafeDecoding is based on the observation that, even though probabilities of tokens representing harmful content outweigh those representing harmless responses, safety disclaimers still appear among the top tokens after sorting tokens by probability in descending order. This allows us to mitigate jailbreak attacks by identifying safety disclaimers and amplifying their token probabilities, while simultaneously attenuating the probabilities of token sequences that are aligned with the objectives of jailbreak attacks. We perform extensive experiments on five LLMs using six state-of-the-art jailbreak attacks and four benchmark datasets. Our results show that SafeDecoding significantly reduces the attack success rate and harmfulness of jailbreak attacks without compromising the helpfulness of responses to benign user queries, while outperforming six defense methods. Our code is publicly available at: https://github.com/uw-nsl/SafeDecoding
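The amplify-and-attenuate step described in this abstract can be sketched in Python under stated assumptions. The abstract does not say how safety disclaimers are identified, so this sketch assumes a second next-token distribution from a safety-tuned model is available; that assumption, the function names `top_k` and `reweight`, the parameters `k` and `alpha`, and the toy distributions are all hypothetical. The official repository linked above is the reference for the actual method.

```python
from typing import Dict, List

Dist = Dict[str, float]  # token -> probability


def top_k(dist: Dist, k: int) -> List[str]:
    """Return the k most probable tokens, highest first."""
    return sorted(dist, key=dist.get, reverse=True)[:k]


def reweight(p_base: Dist, p_safe: Dist, k: int = 3, alpha: float = 2.0) -> Dist:
    """Shift probability mass toward tokens the safety-tuned distribution
    prefers and away from tokens it disfavors, then renormalize."""
    safe_top = set(top_k(p_safe, k))
    # Candidate tokens: those ranked highly by both distributions.
    candidates = [t for t in top_k(p_base, k) if t in safe_top]
    if not candidates:
        candidates = top_k(p_safe, 1)
    scores: Dict[str, float] = {}
    for token in candidates:
        base = p_base.get(token, 0.0)
        safe = p_safe.get(token, 0.0)
        # alpha > 1 amplifies tokens with safe > base and attenuates the rest.
        scores[token] = max(base + alpha * (safe - base), 0.0)
    total = sum(scores.values())
    if total == 0.0:
        return {t: 1.0 / len(candidates) for t in candidates}
    return {t: s / total for t, s in scores.items()}


# Toy first-token distributions for a harmful query (illustrative data only).
p_base = {"Sure": 0.6, "I": 0.3, "Here": 0.1}    # original model
p_safe = {"I": 0.7, "Sure": 0.2, "Sorry": 0.1}   # hypothetical safety-tuned model

adjusted = reweight(p_base, p_safe)
print(adjusted)
```

In the toy data, the disclaimer-starting token `I` already sits among the top tokens of the original distribution, matching the observation in the abstract; the reweighting raises its probability and lowers that of the compliant opener `Sure`.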
Full Text Available
Geometric mechanics of ordered and disordered kirigami

https://doi.org/10.1098/rspa.2022.0822

Chaudhary, G.; Niu, L.; Han, Q.; Lewicka, M.; Mahadevan, L. (June 2023, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences)

The presence of incomplete cuts in a thin planar sheet can dramatically alter its mechanical and geometrical response to loading, as the cuts allow the sheet to deform strongly in the third dimension, most beautifully demonstrated in kirigami art-forms. We use numerical experiments to characterize the geometric mechanics of kirigamized sheets as a function of the number, size and orientation of cuts. We show that the geometry of mechanically loaded sheets can be approximated as a composition of simple developable units: flats, cylinders, cones and compressed Elasticae. This geometric construction yields scaling laws for the mechanical response of the sheet in both the weak and strongly deformed limit. In the ultimately stretched limit, this further leads to a theorem on the nature and form of geodesics in an arbitrary kirigami pattern, consistent with observations and simulations. Finally, we show that by varying the shape and size of the geodesic in a kirigamized sheet, we can control the deployment trajectory of the sheet, and thence its functional properties as an exemplar of a tunable structure that can serve as a robotic gripper, a soft light window or the basis for a physically unclonable device. Overall our study of disordered kirigami sets the stage for controlling the shape and shielding the stresses in thin sheets using cuts.
Full Text Available
Abstraction-Free Control Synthesis to Satisfy Temporal Logic Constraints under Sensor Faults and Attacks

Niu, L.; Li, Z.; Clark, A. (December 2022, IEEE Conference on Decision and Control)

Full Text Available
A Compositional Approach to Safety-Critical Resilient Control for Systems with Coupled Dynamics

Al Maruf, A.; Niu, L.; Clark, A.; Mertoguno, J.S.; Poovendran, R. (December 2022, IEEE Conference on Decision and Control)

Full Text Available
