The prevalence and strong capability of large language models (LLMs) present significant safety and ethical risks if exploited by malicious users. To prevent the potentially deceptive usage of LLMs, recent work has proposed algorithms to detect LLM-generated text and protect LLMs. In this paper, we investigate the robustness and reliability of these LLM detectors under adversarial attacks. We study two types of attack strategies: 1) replacing certain words in an LLM’s output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation. In both strategies, we leverage an auxiliary LLM to generate the word replacements or the instructional prompt. Unlike previous work, we consider a challenging setting where the auxiliary LLM can also be protected by a detector. Experiments reveal that our attacks effectively compromise the performance of all detectors in the study with plausible generations, underscoring the urgent need to improve the robustness of LLM-generated text detection systems. Code is available at https://github.com/shizhouxing/LLM-Detector-Robustness
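To make the first attack strategy concrete, the sketch below shows one way a synonym-replacement attack could be structured. The selection heuristic and prompt wording are illustrative, not the paper's actual procedure, and `query_llm` is a hypothetical stand-in for the auxiliary-LLM call (the stub here simply echoes the original word so the sketch runs offline).

```python
# Illustrative synonym-replacement attack: ask an auxiliary LLM for a
# context-aware synonym of selected words in the generated text.
import random
import re

def query_llm(prompt: str) -> str:
    """Stand-in for the auxiliary LLM. This stub extracts the quoted word
    from the prompt and returns it unchanged so the sketch runs offline;
    replace it with a real API call in practice."""
    match = re.search(r"'(\w+)'", prompt)
    return match.group(1) if match else ""

def synonym_attack(text: str, fraction: float = 0.15, seed: int = 0) -> str:
    """Replace a fraction of longer words with LLM-proposed synonyms,
    perturbing the statistics an LLM-text detector relies on."""
    rng = random.Random(seed)
    tokens = re.findall(r"\w+|\W", text)  # words and single separator chars
    word_idx = [i for i, t in enumerate(tokens) if t.isalpha() and len(t) > 3]
    if not word_idx:
        return text
    for i in rng.sample(word_idx, max(1, int(fraction * len(word_idx)))):
        prompt = (
            f"In the passage below, suggest one synonym for '{tokens[i]}' "
            "that preserves meaning and fluency. Reply with one word only.\n\n"
            + text
        )
        reply = query_llm(prompt).strip()
        candidate = reply.split()[0] if reply else tokens[i]
        if candidate.isalpha():
            tokens[i] = candidate
    return "".join(tokens)

print(synonym_attack("Large language models generate remarkably fluent text."))
```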
This content will become publicly available on February 12, 2026
Evaluating GPT for use in K-12 Block Based CS Instruction Using a Transpiler and Prompt Engineering
Though the increased availability of Large Language Models (LLMs) presents significant potential for change in the way students learn to program, the text-based nature of the available tools currently precludes block-based languages from much of that innovation. In an attempt to remedy this, we identify the strengths and weaknesses of using a transpiler to leverage the existing learning in commercially available LLMs and Scratch, a visual block-based programming language. Using only prompt engineering, we evaluate an LLM’s performance on two common classroom tasks in a Scratch curriculum. We evaluate the LLM’s ability to: 1) Create project solutions that compile and satisfy project requirements and 2) Analyze student projects’ completion of project requirements using natural language. In both cases, we find results indicating that prompt engineering alone is insufficient to reliably produce high-quality results. For projects of medium complexity, the LLM-generated solutions consistently failed to follow correct syntax or, in the few instances with correct syntax, to produce correct solutions. When used for autograding, we found a correlation between scores assigned by the official Scratch Encore autograder and those generated by the LLM; nevertheless, the discrepancies between the ‘real’ scores and the scores assigned by the LLM remained too great for the tool to be reliable in a classroom setting.
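As a concrete illustration of the setup the abstract describes, the sketch below prompts an LLM for a program in a text-based block notation that a transpiler can then convert into a Scratch project. The prompt wording, the `call_gpt` stub, and the `transpile_to_scratch` hook are all hypothetical stand-ins, not the paper's actual tooling.

```python
# Illustrative pipeline: prompt an LLM for text-form Scratch, then hand
# the output to a (hypothetical) transpiler that emits real blocks.
PROMPT_TEMPLATE = """You are writing a Scratch 3 program.
Express the solution in a text-based block notation, one block per line,
using indentation to show nesting (e.g. inside loops and event handlers).
Project requirements:
{requirements}
Output only the program text, with no commentary."""

def call_gpt(prompt: str) -> str:
    """Stand-in for an LLM API call; returns a canned one-block program
    so the sketch runs offline."""
    return 'when green flag clicked\n    say "Hello!" for 2 seconds'

def transpile_to_scratch(program_text: str) -> bool:
    """Stand-in for the transpiler: report whether the text parses.
    A real implementation would emit a .sb3 project."""
    return all(line.strip() for line in program_text.splitlines())

requirements = "- When the green flag is clicked, the sprite greets the user."
program = call_gpt(PROMPT_TEMPLATE.format(requirements=requirements))
print("compiles:", transpile_to_scratch(program))
```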
- Award ID(s):
- 2201313
- PAR ID:
- 10585802
- Publisher / Repository:
- ACM
- Date Published:
- ISBN:
- 9798400705311
- Page Range / eLocation ID:
- 388 to 394
- Subject(s) / Keyword(s):
- K-12 Block Based Programming Large Language Models Generative AI
- Format(s):
- Medium: X
- Location:
- Pittsburgh PA USA
- Sponsoring Org:
- National Science Foundation
More Like this
-
The rise of e-commerce and social networking platforms has led to an increase in the disclosure of personal health information within user-generated content. This study investigates the application of large language models (LLMs) to detect and sanitize sensitive health data shared by users across platforms such as Amazon, patient.info, and Facebook. We propose a methodology that leverages LLMs to evaluate both the sensitivity of disclosed information and the platform-specific semantics of the content. Through prompt engineering, our method identifies sensitive information and rephrases it to minimize disclosure while preserving content similarity. ChatGPT serves as the LLM in this study due to its versatility. Empirical results suggest that ChatGPT can reliably assign sensitivity scores to user-generated text and generate sanitized versions that effectively preserve the original meaning.
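A minimal sketch of the two-step method this abstract outlines: one prompt scores the sensitivity of a post given its platform, and a second rephrases it to minimize disclosure. The prompt wording and the `chat` stub are assumptions; the study's actual prompts are not reproduced here.

```python
# Illustrative two-step sanitization: score sensitivity, then rephrase.
def chat(prompt: str) -> str:
    """Stand-in for a ChatGPT API call; returns canned replies so the
    sketch runs offline."""
    if "scale of 1-5" in prompt:
        return "3"
    return "I dealt with a minor health issue last year."

def sensitivity_score(post: str, platform: str) -> int:
    """Ask the LLM to rate health-information sensitivity on a 1-5 scale,
    taking the platform's context into account."""
    reply = chat(
        f"On a scale of 1-5, how sensitive is the personal health "
        f"information in this {platform} post? Reply with one digit.\n\n{post}"
    )
    return int(reply.strip()[0])

def sanitize(post: str) -> str:
    """Ask the LLM to rewrite the post, minimizing disclosure while
    preserving the original meaning."""
    return chat(
        "Rewrite this post so it reveals as little personal health "
        f"information as possible while keeping its meaning:\n\n{post}"
    )

post = "I was diagnosed with type 2 diabetes in March."
if sensitivity_score(post, "Facebook") >= 3:
    print(sanitize(post))
```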
-
Large Language Models (LLMs) have achieved remarkable success in natural language tasks, yet understanding their reasoning processes remains a significant challenge. We address this by introducing XplainLLM, a dataset accompanying an explanation framework designed to enhance LLM transparency and reliability. Our dataset comprises 24,204 instances where each instance interprets the LLM’s reasoning behavior using knowledge graphs (KGs) and graph attention networks (GAT), and includes explanations of LLMs such as the decoder-only Llama-3 and the encoder-only RoBERTa. XplainLLM also features a framework for generating grounded explanations and the debugger-scores for multidimensional quality analysis. Our explanations include why-choose and why-not-choose components, reason-elements, and debugger-scores that collectively illuminate the LLM’s reasoning behavior. Our evaluations demonstrate XplainLLM’s potential to reduce hallucinations and improve grounded explanation generation in LLMs. XplainLLM is a resource for researchers and practitioners to build trust and verify the reliability of LLM outputs. Our code and dataset are publicly available.
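The abstract names the components each explanation carries; the snippet below sketches what a single instance might look like. The field names and values are illustrative guesses assembled from the component names above, not the dataset's actual schema.

```python
# Hypothetical shape of one XplainLLM instance, assembled from the
# components named in the abstract; the real schema may differ.
instance = {
    "question": "What do you fill with ink to write?",
    "model": "Llama-3",                     # decoder-only example
    "prediction": "pen",
    "why_choose": "A pen holds ink and is used for writing.",
    "why_not_choose": "A pencil uses graphite, not ink.",
    "reason_elements": ["ink", "writing instrument"],  # KG-derived concepts
    "debugger_score": 0.87,                 # multidimensional quality score
}
for field, value in instance.items():
    print(f"{field}: {value}")
```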
-
Automating hardware design could eliminate a significant amount of human error from the engineering process. Verilog is a popular hardware description language for modeling and designing digital systems, so generating Verilog code is a critical first step. Emerging large language models (LLMs) are able to write high-quality code in other programming languages. In this paper, we characterize the ability of LLMs to generate useful Verilog. For this, we fine-tune pre-trained LLMs on Verilog datasets collected from GitHub and Verilog textbooks. We construct an evaluation framework comprising test benches for functional analysis and a flow to test the syntax of Verilog code generated in response to problems of varying difficulty. Our findings show that across our problem scenarios, fine-tuning yields LLMs that are more capable of producing syntactically correct code (25.9% overall). Further, when analyzing functional correctness, a fine-tuned open-source CodeGen LLM can outperform the state-of-the-art commercial Codex LLM (6.5% overall). We release our training/evaluation scripts and LLM checkpoints as open-source contributions.
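The syntax-testing flow is the most mechanical part of the pipeline the abstract describes; below is a minimal sketch of such a check using Icarus Verilog (`iverilog`) as the compiler. The paper's actual flow and tooling are not specified here, so treat this as one plausible stand-in.

```python
# Check whether LLM-generated Verilog at least parses, using Icarus
# Verilog's null target to elaborate without producing output.
import pathlib
import subprocess
import tempfile

def verilog_syntax_ok(code: str) -> bool:
    """Return True if `iverilog` accepts the code (requires Icarus
    Verilog on PATH)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "gen.v"
        src.write_text(code)
        result = subprocess.run(
            ["iverilog", "-t", "null", str(src)], capture_output=True
        )
        return result.returncode == 0

completions = [
    "module mux2(input a, b, sel, output y); assign y = sel ? b : a; endmodule",
    "module broken(input a  assign y = a; endmodule",  # deliberately malformed
]
ok = sum(verilog_syntax_ok(c) for c in completions)
print(f"syntactically correct: {ok}/{len(completions)}")
```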
-
Large language models (LLMs) are perceived to offer promising potential for automating security tasks, such as those found in security operation centers (SOCs). As a first step towards evaluating this perceived potential, we investigate the use of LLMs in software pentesting, where the main task is to automatically identify software security vulnerabilities in source code. We hypothesize that an LLM-based AI agent can be improved over time for a specific security task as human operators interact with it. Such improvement can be made, as a first step, by engineering the prompts fed to the LLM based on the responses it produces, so that they include relevant contexts and structures and the model provides more accurate results. Such engineering efforts become sustainable if the prompts that are engineered to produce better results on current tasks also produce better results on future unknown tasks. To examine this hypothesis, we utilize the OWASP Benchmark Project 1.2, which contains 2,740 hand-crafted source code test cases covering various types of vulnerabilities. We divide the test cases into training and testing data, engineer the prompts based only on the training data, and evaluate the final system on the testing data. We compare the AI agent’s performance on the testing data against the performance of the agent without prompt engineering. We also compare the AI agent’s results against those from SonarQube, a widely used static code analyzer for security testing. We built and tested multiple versions of the AI agent using different off-the-shelf LLMs: Google’s Gemini-pro, as well as OpenAI’s GPT-3.5-Turbo and GPT-4-Turbo (with both chat completion and assistant APIs). The results show that using LLMs is a viable approach to building an AI agent for software pentesting that can improve through repeated use and prompt engineering.
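To make the evaluation protocol concrete, the sketch below mirrors the train/test split the abstract describes: prompts are refined against training cases only, then both the baseline and engineered prompts are scored on held-out cases. The prompt texts, the toy cases, and the `ask_llm` stub are all hypothetical.

```python
# Illustrative protocol: engineer the prompt on training cases only,
# then compare baseline vs. engineered prompts on held-out cases.
import random

BASELINE_PROMPT = (
    "Does this code contain a security vulnerability? "
    "Answer yes or no.\n\n{code}"
)
ENGINEERED_PROMPT = (
    "You are a security auditor. Consider tainted data flow from user "
    "input to sinks such as SQL queries, file paths, and commands. "
    "Does this code contain a vulnerability? Answer yes or no.\n\n{code}"
)

def ask_llm(prompt: str) -> str:
    """Stand-in for an LLM call; answers randomly so the sketch runs."""
    return random.choice(["yes", "no"])

def accuracy(prompt_template: str, cases: list[tuple[str, str]]) -> float:
    """Fraction of cases where the LLM's yes/no matches the label."""
    hits = sum(
        ask_llm(prompt_template.format(code=code)).strip().lower() == label
        for code, label in cases
    )
    return hits / len(cases)

# (code, ground-truth label) pairs; the study itself uses OWASP Benchmark 1.2
cases = [('String q = "SELECT * FROM t WHERE id=" + userInput;', "yes"),
         ("int x = 1 + 1;", "no")] * 10
random.shuffle(cases)
train, test = cases[:10], cases[10:]
# ... iterate on ENGINEERED_PROMPT against `train` only ...
print("baseline:  ", accuracy(BASELINE_PROMPT, test))
print("engineered:", accuracy(ENGINEERED_PROMPT, test))
```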