Title: Do Users Write More Insecure Code with AI Assistants?
Abstract—We conduct the first large-scale user study examining how users interact with an AI code assistant to solve a variety of security-related tasks across different programming languages. Overall, we find that participants who had access to an AI assistant based on OpenAI’s codex-davinci-002 model wrote significantly less secure code than those without access. Additionally, participants with access to an AI assistant were more likely to believe they wrote secure code than those without access to the AI assistant. Furthermore, we find that participants who trusted the AI less and engaged more with the language and format of their prompts (e.g., re-phrasing, adjusting temperature) provided code with fewer security vulnerabilities. Finally, in order to better inform the design of future AI-based code assistants, we provide an in-depth analysis of participants’ language and interaction behavior, as well as release our user interface as an instrument to conduct similar studies in the future.
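As a rough illustration of the interaction the abstract describes (a participant re-phrasing a prompt and adjusting the model's temperature), the sketch below shows a query to a Codex-style completion endpoint. It is not the study's instrument: the endpoint, model identifier, and parameter values are assumptions for illustration only, and participants in the study worked through a purpose-built UI rather than raw API calls.

import os
import requests

# Minimal sketch (not the study's UI): send a participant-style prompt to a
# Codex-style completion endpoint. The model identifier and endpoint are
# assumptions; temperature is the knob the abstract says participants adjusted.
API_URL = "https://api.openai.com/v1/completions"

def query_assistant(prompt: str, temperature: float = 0.6, max_tokens: int = 256) -> str:
    """Return the completion text for one prompt at the chosen temperature."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "code-davinci-002",  # assumed identifier for the codex-davinci-002 model
            "prompt": prompt,
            "temperature": temperature,   # lower values give more constrained output
            "max_tokens": max_tokens,
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["text"]

# Re-phrasing the prompt or lowering the temperature, as the more engaged
# participants did, changes the completion the assistant returns.
print(query_assistant("Write a Python function that encrypts a string with AES-GCM.",
                      temperature=0.2))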
Award ID(s):
2343611
PAR ID:
10472129
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
ACM CCS 2023 arXiv:2211.03622
Date Published:
Subject(s) / Keyword(s):
Cryptography and Security (cs.CR)
Format(s):
Medium: X
Location:
ACM CCS 2023 arXiv:2211.03622
Sponsoring Org:
National Science Foundation
More Like this
  1. The rapid adoption of generative AI in software development has impacted the industry, yet its effects on developers with visual impairments remain largely unexplored. To address this gap, we used an Activity Theory framework to examine how developers with visual impairments interact with AI coding assistants. For this purpose, we conducted a study where developers who are visually impaired completed a series of programming tasks using a generative AI coding assistant. We uncovered that, while participants found the AI assistant beneficial and reported significant advantages, they also highlighted accessibility challenges. Specifically, the AI coding assistant often exacerbated existing accessibility barriers and introduced new challenges. For example, it overwhelmed users with an excessive number of suggestions, leading developers who are visually impaired to express a desire for “AI timeouts.” Additionally, the generative AI coding assistant made it more difficult for developers to switch contexts between the AI-generated content and their own code. Despite these challenges, participants were optimistic about the potential of AI coding assistants to transform the coding experience for developers with visual impairments. Our findings emphasize the need to apply activity-centered design principles to generative AI assistants, ensuring they better align with user behaviors and address specific accessibility needs. This approach can enable the assistants to provide more intuitive, inclusive, and effective experiences, while also contributing to the broader goal of enhancing accessibility in software development.
  2. Large language models (LLMs) are perceived to offer promising potential for automating security tasks, such as those found in security operation centers (SOCs). As a first step towards evaluating this perceived potential, we investigate the use of LLMs in software pentesting, where the main task is to automatically identify software security vulnerabilities in source code. We hypothesize that an LLM-based AI agent can be improved over time for a specific security task as human operators interact with it. Such improvement can be made, as a first step, by engineering the prompts fed to the LLM based on the responses it produces, so that they include relevant contexts and structures and the model provides more accurate results. Such engineering efforts become sustainable if the prompts engineered to produce better results on current tasks also produce better results on future, unknown tasks. To examine this hypothesis, we utilize the OWASP Benchmark Project 1.2, which contains 2,740 hand-crafted source code test cases covering various types of vulnerabilities. We divide the test cases into training and testing data, engineer the prompts based on the training data only, and evaluate the final system on the testing data. We compare the AI agent’s performance on the testing data against the performance of the agent without the prompt engineering. We also compare the AI agent’s results against those from SonarQube, a widely used static code analyzer for security testing. We built and tested multiple versions of the AI agent using different off-the-shelf LLMs: Google’s Gemini-pro, as well as OpenAI’s GPT-3.5-Turbo and GPT-4-Turbo (with both chat completion and assistant APIs). The results show that using LLMs is a viable approach to building an AI agent for software pentesting that can improve through repeated use and prompt engineering. (A minimal sketch of the train/test prompt-engineering loop described here appears after this list.)
  3. Artificial Intelligence, intelligence demonstrated by machines, has emerged as one of the most convenient and personable applications in everyday life. Specifically, AI powers digital personal assistants to answer user questions and automate everyday tasks. AI assistants listen continuously so they can answer the user, even when not in use. Why is this a problem? For a hacker, this makes any digital assistant a potential listening device, a major security and privacy issue. While some companies are handling this situation well, others are falling behind as their AI components are slowly dying in the consumer market. Which digital assistant is best and most secure, you may ask? This paper first details how each AI assistant works from a technical perspective. Then, based on survey results, it details how AI assistants rank in terms of overall security and performance.
  4. Explanations have increasingly been incorporated into intelligent systems to offer insights into the underlying AI models. In this paper, we investigate the impact of AI-generated visual explanations on users’ decision-making processes during an image matching task. Our work examines how these explanations affect correctness, timing, and confidence and explores the role of AI literacy in user behavior. We conducted a mixed-methods user study with 54 participants who were tasked to identify hotels from images using a specialized intelligent system. Participants were randomly assigned to use the system with or without visual explanation capabilities. Results showed that visual explanations did not affect the accuracy of the decision or the confidence of the user in image matching tasks. Participants with high AI literacy outperformed those with lower literacy, but engaged less with explanations. Distinct matching strategies emerged between high-AI and low-AI participants, with high-AI participants systematically examining high-ranked images and using the explanation for verification purposes, while low-AI participants followed more exhaustive approaches.
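Following item 2 above, here is a minimal sketch, not taken from that paper, of the train/test prompt-engineering loop it describes. Every name is a hypothetical stand-in (the real OWASP Benchmark ships as Java test cases with an expected-results file, and call_llm would wrap whichever off-the-shelf model is used); the sketch only shows how prompts tuned on training cases are scored on held-out cases.

import random
from typing import Callable, List, TypedDict

class Case(TypedDict):
    code: str             # source text of one benchmark test case (hypothetical loader)
    is_vulnerable: bool    # expected label from the benchmark's results file

def build_prompt(source_code: str, engineered_context: str) -> str:
    """Wrap one test case in the engineered context plus a fixed question."""
    return (
        f"{engineered_context}\n\n"
        "Does the following code contain a security vulnerability? "
        "Answer 'vulnerable' or 'safe'.\n\n" + source_code
    )

def evaluate(cases: List[Case], engineered_context: str,
             call_llm: Callable[[str], str]) -> float:
    """Fraction of cases whose verdict matches the benchmark's expected label."""
    correct = 0
    for case in cases:
        answer = call_llm(build_prompt(case["code"], engineered_context)).lower()
        predicted_vulnerable = "vulnerable" in answer and "not vulnerable" not in answer
        correct += predicted_vulnerable == case["is_vulnerable"]
    return correct / len(cases)

def split_cases(cases: List[Case], train_fraction: float = 0.8):
    """Shuffle and split: prompts are engineered on the train portion only."""
    shuffled = list(cases)
    random.Random(0).shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

The sustainability question raised in that abstract then reduces to whether an engineered_context tuned against the training split still improves evaluate(...) on the held-out split, compared with an empty context.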