NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).


Neural Network Verification with Branch-and-Bound for General Nonlinearities

Shi, Zhouxing; Jin, Qirui; Kolter, Zico; Jana, Suman; Hsieh, Cho-Jui; Zhang, Huan (May 2025, 31st International Conference on Tools and Algorithms for the Construction and Analysis of Systems)

Free, publicly accessible full text available May 3, 2026
Defending LLMs against Jailbreaking Attacks via Backtranslation

Wang, Yihan; Shi, Zhouxing; Bai, Andrew; Hsieh, Cho-Jui (August 2024, The 62nd Annual Meeting of the Association for Computational Linguistics (ACL-Findings))

Full Text Available
Red Teaming Language Model Detectors with Language Models

Shi, Zhouxing; Wang, Yihan; Yin, Fan; Chen, Xiangning; Chang, Kai-Wei; Hsieh, Cho-Jui (October 2024, Transactions of the Association for Computational Linguistics (TACL))

Full Text Available
Lyapunov-stable Neural Control for State and Output Feedback: A Novel Formulation

Yang, Lujie; Dai, Hongkai; Shi, Zhouxing; Hsieh, Cho-Jui; Tedrake, Russ; Zhang, Huan (July 2024, International Conference on Machine Learning)

Full Text Available
Defending LLMs against Jailbreaking Attacks via Backtranslation

https://doi.org/10.18653/v1/2024.findings-acl.948

Wang, Yihan; Shi, Zhouxing; Bai, Andrew; Hsieh, Cho-Jui (January 2024, Association for Computational Linguistics)

Full Text Available
Red Teaming Language Model Detectors with Language Models

https://doi.org/10.1162/tacl_a_00639

Shi, Zhouxing; Wang, Yihan; Yin, Fan; Chen, Xiangning; Chang, Kai-Wei; Hsieh, Cho-Jui (January 2024, Transactions of the Association for Computational Linguistics)

The prevalence and strong capabilities of large language models (LLMs) pose significant safety and ethical risks if exploited by malicious users. To prevent the potentially deceptive use of LLMs, recent work has proposed algorithms to detect LLM-generated text and protect LLMs. In this paper, we investigate the robustness and reliability of these LLM detectors under adversarial attacks. We study two types of attack strategies: 1) replacing certain words in an LLM's output with context-appropriate synonyms; 2) automatically searching for an instructional prompt that alters the writing style of the generation. In both strategies, we leverage an auxiliary LLM to generate the word replacements or the instructional prompt. Unlike previous work, we consider a challenging setting in which the auxiliary LLM can also be protected by a detector. Experiments show that our attacks, while producing plausible generations, effectively compromise the performance of every detector in the study, underscoring the urgent need to improve the robustness of LLM-generated text detection systems. Code is available at https://github.com/shizhouxing/LLM-Detector-Robustness
Full Text Available
Effective Robustness against Natural Distribution Shifts for Models with Different Training Data

Shi, Zhouxing; Carlini, Nicholas; Balashankar, Ananth; Schmidt, Ludwig; Hsieh, Cho-Jui; Beutel, Alex; Qin, Yao (December 2023, Advances in Neural Information Processing Systems)

Full Text Available
Towards Robustness Certification Against Universal Perturbations

Zeng, Yi; Shi, Zhouxing; Jin, Ming; Kang, Feiyang; Lyu, Lingjuan; Hsieh, Cho-Jui; Jia, Ruoxi (May 2023, International Conference on Learning Representations)

Full Text Available
On the Convergence of Certified Robust Training with Interval Bound Propagation

Wang, Yihan; Shi, Zhouxing; Gu, Quanquan; Hsieh, Cho-Jui (January 2022, International Conference on Learning Representations (ICLR))

Full Text Available
On the Sensitivity and Stability of Model Interpretations in NLP

https://doi.org/10.18653/v1/2022.acl-long.188

Yin, Fan; Shi, Zhouxing; Hsieh, Cho-Jui; Chang, Kai-Wei (January 2022, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))

Recent years have witnessed the emergence of a variety of post-hoc interpretations that aim to uncover how natural language processing (NLP) models make predictions. Despite the surge of new interpretation methods, it remains an open problem how to define and quantitatively measure the faithfulness of interpretations, i.e., the extent to which interpretations reflect a model's reasoning process. We propose two new criteria, sensitivity and stability, that provide notions of faithfulness complementary to the existing removal-based criteria. Our results show that conclusions about how faithful interpretations are can vary substantially depending on which notion is used. Motivated by the desiderata of sensitivity and stability, we introduce a new class of interpretation methods that adopt techniques from adversarial robustness. Empirical results show that our proposed methods are effective under the new criteria and overcome the limitations of gradient-based methods on removal-based criteria. Beyond text classification, we also apply interpretation methods and metrics to dependency parsing. Our results shed light on understanding the diverse set of interpretations.
Full Text Available
