GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis | NSF Public Access Repository


Citation Details

GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis

Award ID(s): 1937786, 2131859, 2125977, 1937787

PAR ID: 10522919

Author(s) / Creator(s): Xie, Yueqi; Fang, Minghong; Pi, Renjie; Gong, Neil

Publisher / Repository: Annual Meeting of the Association for Computational Linguistics

Date Published: 2024-07-12

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Proceeding:
The DOI is not currently available.