NSF PAR Search | NSF Public Access Repository

GuardAgent: Safeguard LLM agents via knowledge-enabled reasoning

Xiang, Zhen; Zheng, Linzhi; Li, Yanjie; Hong, Junyuan; Li, Qinbin; Xie, Han; Zhang, Jiawei; Xiong, Zidi; Xie, Chulin; Bastian, Nathaniel D (July 2025, ICML 2025 Workshop on Computer Use Agents)

The rapid advancement of large language model (LLM) agents has raised new concerns regarding their safety and security, which cannot be addressed by traditional textual-harm-focused LLM guardrails. We propose GuardAgent, the first guardrail agent to protect other agents by checking whether an agent's actions satisfy safety guard requests. Specifically, GuardAgent first analyzes the safety guard requests to generate a task plan, and then converts this plan into guardrail code for execution. In both steps, an LLM is utilized as the reasoning component, supplemented by in-context demonstrations retrieved from a memory module storing information from previous tasks. GuardAgent can understand different safety guard requests and provide reliable code-based guardrails with high flexibility and low operational overhead. In addition, we propose two novel benchmarks: the EICU-AC benchmark to assess access control for healthcare agents and the Mind2Web-SC benchmark to evaluate safety regulations for web agents. We show that GuardAgent effectively moderates violating actions for two types of agents on these two benchmarks, with guardrail accuracies of over 98% and 83%, respectively.
Full Text Available
DeepOSets: Non-Autoregressive In-Context Learning of Supervised Learning Operators

Chiu, Shao-Ting; Hong, Junyuan; Braga-Neto, Ulisses (December 2024, NeurIPS 2024)

Full Text Available
A-CONECT: Designing AI-based Conversational Chatbot for Early Dementia Intervention

Hong, Junyuan; Zheng, Wenqing; Meng, Han; Liang, Siqi; Chen, Anqing; Dodge, Hiroko H; Zhou, Jiayu; Wang, Zhangyang (March 2025, ICLR 2024 Workshop on Large Language Model (LLM) Agents)

Full Text Available
On the Generalization Ability of Unsupervised Pretraining

Deng, Yuyang; Hong, Junyuan; Zhou, Jiayu; Mahdavi, Mehrdad (March 2024, International Conference on Artificial Intelligence and Statistics)

Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization. However, a rigorous understanding of how the representation function learned on an unlabeled dataset affects the generalization of the fine-tuned model is lacking. Existing theoretical research does not adequately account for the heterogeneity of the distributions and tasks in the pre-training and fine-tuning stages. To bridge this gap, this paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase, ultimately affecting the generalization capabilities of the fine-tuned model on downstream tasks. We apply our theoretical framework to analyze the generalization bounds of two distinct scenarios: Context Encoder pre-training with deep neural networks and Masked Autoencoder pre-training with deep transformers, each followed by fine-tuning on a binary classification task. Finally, inspired by our findings, we propose a novel regularization method during pre-training to further enhance the generalization of the fine-tuned model. Overall, our results contribute to a better understanding of the unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.
Full Text Available
DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer

Hong, Junyuan; Wang, Jiachen; Zhang, Chenhui; Li, Zhangheng; Li, Bo; Wang, Zhangyang (May 2024, International Conference on Learning Representations (ICLR) 2024)

Full Text Available
GuideLLM: Exploring LLM-Guided Conversation with Applications in Autobiography Interviewing

https://doi.org/10.18653/v1/2025.naacl-long.287

Duan, Jinhao; Zhao, Xinyu; Zhang, Zhuoxuan; Ko, Eunhye Grace; Boddy, Lily; Wang, Chenan; Li, Tianhao; Rasgon, Alexander; Hong, Junyuan; Lee, Min Kyung; et al (January 2025, Association for Computational Linguistics)

Full Text Available
Safe and Robust Watermark Injection with a Single OoD Image

Yu, Shuyang; Hong, Junyuan; Zhang, Haobo; Wang, Haotao; Wang, Zhangyang; Zhou, Jiayu (January 2024, 2024 International Conference on Learning Representations)

Full Text Available
LLM-PBE: Assessing Data Privacy in Large Language Models

https://doi.org/10.14778/3681954.3681994

Li, Qinbin; Hong, Junyuan; Xie, Chulin; Tan, Jeffrey; Xin, Rachel; Hou, Junyi; Yin, Xavier; Wang, Zhun; Hendrycks, Dan; Wang, Zhangyang; et al (July 2024, Proceedings of the VLDB Endowment)

Large Language Models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis. Their profound capabilities in processing and interpreting complex language data, however, bring to light pressing concerns regarding data privacy, especially the risk of unintentional training data leakage. Despite the critical nature of this issue, no existing literature offers a comprehensive assessment of data privacy risks in LLMs. Addressing this gap, our paper introduces LLM-PBE, a toolkit crafted specifically for the systematic evaluation of data privacy risks in LLMs. LLM-PBE is designed to analyze privacy across the entire lifecycle of LLMs, incorporating diverse attack and defense strategies, and handling various data types and metrics. Through detailed experimentation with multiple LLMs, LLM-PBE facilitates an in-depth exploration of data privacy concerns, shedding light on influential factors such as model size, data characteristics, and evolving temporal dimensions. This study not only enriches the understanding of privacy issues in LLMs but also serves as a vital resource for future research in the field. Aimed at enhancing the breadth of knowledge in this area, the findings, resources, and our full technical report are made available at https://llm-pbe.github.io/, providing an open platform for academic and practical advancements in LLM privacy assessment.
Full Text Available
Understanding Deep Gradient Leakage via Inversion Influence Functions

Zhang, Haobo; Hong, Junyuan; Deng, Yuyang; Mahdavi, Mehrdad; Zhou, Jiayu (September 2023, 2023 Conference on Neural Information Processing Systems)

Full Text Available
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

Hong, Junyuan; Duan, Jinhao; Zhang, Chenhui; Li, Zhangheng; Xie, Chulin; Lieberman, Kelsey; Diffenderfer, James; Bartoldson, Brian; Jaiswal, Ajay; Xu, Kaidi; et al (July 2024, International Conference on Machine Learning (ICML 2024))

Full Text Available
