

Title: Grounded Copilot: How Programmers Interact with Code-Generating Models

Powered by recent advances in code-generating models, AI assistants like GitHub Copilot promise to change the face of programming forever. But what is this new face of programming? We present the first grounded theory analysis of how programmers interact with Copilot, based on observing 20 participants—with a range of prior experience using the assistant—as they solve diverse programming tasks across four languages. Our main finding is that interactions with programming assistants are bimodal: in acceleration mode, the programmer knows what to do next and uses Copilot to get there faster; in exploration mode, the programmer is unsure how to proceed and uses Copilot to explore their options. Based on our theory, we provide recommendations for improving the usability of future AI programming assistants.

 
Award ID(s):
2107397
NSF-PAR ID:
10467112
Author(s) / Creator(s):
; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
Proceedings of the ACM on Programming Languages
Volume:
7
Issue:
OOPSLA1
ISSN:
2475-1421
Page Range / eLocation ID:
85 to 111
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent advances in Large Language Models (LLMs) have made automatic code generation possible for real-world programming tasks in general-purpose programming languages such as Python. However, there are few human studies on the usability of these tools and how they fit into the programming workflow. In this work, we conducted a within-subjects user study with 24 participants to understand how programmers use and perceive Copilot, an LLM-based code generation tool. We found that, while Copilot did not necessarily improve task completion time or success rate, most participants preferred to use Copilot in daily programming tasks, since Copilot often provided a useful starting point and saved the effort of searching online. However, participants did face difficulties in understanding, editing, and debugging code snippets generated by Copilot, which significantly hindered their task-solving effectiveness. Finally, we highlight several promising directions for improving the design of Copilot based on our observations and participants' feedback.
  2.
    Student perceptions of the complete online transition of two CS courses in response to the COVID-19 pandemic. Due to the COVID-19 pandemic, universities across the globe switched from traditional Face-to-Face (F2F) course delivery to fully online delivery. Our university declared during our Spring break that students would not return to campus, and that all courses must be delivered fully online starting two weeks later. This was challenging for both students and instructors. In this evidence-based practice paper, we present results of end-of-semester student surveys from two Spring 2020 CS courses: a programming-intensive CS2 course, and a senior theory course in Formal Languages and Automata (FLA). Students indicated which course components they perceived as most beneficial to their learning, before and then after the online transition, and their preferences for each regarding online vs. F2F delivery. By comparing student reactions across courses, we gain insights into which components are easily adapted to online delivery, and which require further innovation. COVID was unfortunate, but it gave a rare opportunity to compare students' reflections on F2F instruction with online instructional materials for half a semester vs. entirely online delivery of the same course during the second half. The circumstances are unique, but we were able to acquire insights for future instruction. Some course components were perceived as more useful either before or after the transition, and preferences were not the same in the two courses, possibly due to differences in the courses. Students in both courses found prerecorded asynchronous lectures significantly less useful than in-person lectures. For CS2, online office hours were significantly less useful than in-person office hours, but we found no significant difference in FLA. CS2 students felt less supported by their instructor after the online transition, but no significant difference was indicated by FLA students.
FLA students found unproctored online exams offered through Canvas more stressful than in-person proctored exams, but the opposite was indicated by CS2 students. CS2 students indicated that visual materials from an eTextbook were more useful to them after going online than before, but FLA students indicated no significant difference. Overall, students in FLA significantly preferred the traditional F2F version of the course, while no significant difference was detected for CS2 students. We did not find significant effects from gender on the preference of one mode over the other. A serendipitous outcome was learning that some changes forced by circumstance should be considered for long term adoption. Offering online lab sessions and online exams where the questions are primarily multiple choice are possible candidates. However, we found that students need to feel the presence of their instructor to feel properly supported. To determine what course components need further improvement before transitioning to fully online mode, we computed a logistic regression model. The dependent variable is the student's preference for F2F or fully online. The independent variables are the course components before and after the online transition. For both courses, in-person lectures were a significant factor negatively affecting students' preferences of the fully online mode. Similarly, for CS2, in-person labs and in-person office hours were significant factors pushing students’ preferences toward F2F mode. 
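The logistic regression methodology described above can be sketched without any statistics library. A minimal illustration, on invented data: the two binary predictors (rating in-person lectures and in-person office hours as useful) are hypothetical stand-ins for the survey items, and the outcome is 1 when a student prefers fully online delivery, 0 for F2F. With this setup, a negative fitted coefficient on a predictor means that component pushes preferences toward F2F, as the study reports for in-person lectures.

```python
import math

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit a logistic regression by plain gradient descent (no libraries)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted P(prefers online)
            err = p - yi                      # gradient of the log-loss
            b -= lr * err
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
    return w, b

# Hypothetical survey rows:
#   [found in-person lectures useful, found in-person office hours useful]
X = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
y = [0, 0, 0, 1, 1, 1, 0, 1]   # 1 = prefers fully online, 0 = prefers F2F

w, b = fit_logistic(X, y)
# In this toy data, valuing in-person lectures perfectly predicts an F2F
# preference, so w[0] comes out negative.
print(w, b)
```

This is only a sketch of the modeling idea; the paper's actual model, predictors, and coefficient estimates are not reproduced here.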
  3. Artificial Intelligence (AI) enhanced systems are widely adopted in post-secondary education; however, tools and activities have only recently become accessible for teaching AI and machine learning (ML) concepts to K-12 students. Research on K-12 AI education has largely covered student attitudes toward AI careers, AI ethics, and student use of various existing AI agents such as voice assistants, and most of it has focused on high school and middle school. There is no consensus on which AI and machine learning concepts are grade-appropriate for elementary-aged students or how elementary students explore and make sense of AI and ML tools. AI is a rapidly evolving technology, and as future decision-makers, children will need to be AI literate [1]. In this paper, we present elementary students' sense-making of simple machine learning concepts. Through this project, we hope to generate a new model for introducing AI concepts to elementary students into school curricula and provide tangible, trainable representations of ML for students to explore in the physical world. In our first year, our focus has been on simpler machine learning algorithms. Our desire is to empower students not only to use AI tools but also to understand how they operate. We believe that appropriate activities can help late elementary-aged students develop foundational AI knowledge, namely (1) how a robot senses the world, and (2) how a robot represents data for making decisions. Educational robotics programs have been repeatedly shown to result in positive learning impacts and increased interest [2]. In this pilot study, we leveraged the LEGO® Education SPIKE™ Prime for introducing ML concepts to upper elementary students. Through pilot testing in three one-week summer programs, we iteratively developed a limited display interface for supervised learning using the nearest neighbor algorithm. We collected videos to perform a qualitative evaluation. Based on analysis of student behavior during the robotics activities, we found that some students show interest in exploring pre-trained ML models and training new models while building personally relevant robotic creations and developing solutions to engineering tasks. While students were interested in using the ML tools for complex tasks, they seemed to prefer block programming or manual motor controls where they felt it was practical.
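The nearest neighbor algorithm named above is simple enough to sketch in a few lines. The sensor readings and gesture labels below are hypothetical stand-ins for what a robot's color and distance sensors might report; the study's actual display interface and training data are not reproduced here.

```python
import math

def nearest_neighbor(training, query):
    """Return the label of the training example closest to the query (1-NN)."""
    best_label, best_dist = None, float("inf")
    for features, label in training:
        d = math.dist(features, query)  # Euclidean distance
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label

# Hypothetical examples: (reflectivity %, distance cm) -> situation label
training = [
    ((90, 5), "bright-near"),
    ((85, 40), "bright-far"),
    ((15, 6), "dark-near"),
    ((10, 45), "dark-far"),
]

# A new reading is classified by whichever stored example it most resembles.
print(nearest_neighbor(training, (88, 8)))  # -> bright-near
```

The appeal for young learners is that the "model" is just the stored examples, so training and prediction are both directly inspectable.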
  4. Abstract—We conduct the first large-scale user study examining how users interact with an AI code assistant to solve a variety of security-related tasks across different programming languages. Overall, we find that participants who had access to an AI assistant based on OpenAI's codex-davinci-002 model wrote significantly less secure code than those without access. Additionally, participants with access to an AI assistant were more likely to believe they wrote secure code than those without access to the AI assistant. Furthermore, we find that participants who trusted the AI less and engaged more with the language and format of their prompts (e.g., rephrasing, adjusting temperature) produced code with fewer security vulnerabilities. Finally, in order to better inform the design of future AI-based code assistants, we provide an in-depth analysis of participants' language and interaction behavior, as well as release our user interface as an instrument to conduct similar studies in the future.
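One of the interaction behaviors mentioned above, adjusting the sampling temperature, can be illustrated with a minimal sketch. The logits below are invented for illustration; the point is only the mechanism: dividing the model's raw scores by a low temperature sharpens the next-token distribution toward near-deterministic output, while a high temperature flattens it and makes sampling more exploratory.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into sampling probabilities at a given temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                               # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                          # hypothetical token scores

low = softmax_with_temperature(logits, 0.2)       # sharply peaked
high = softmax_with_temperature(logits, 2.0)      # much flatter

# At T=0.2 the top token dominates; at T=2.0 probability mass spreads out.
print(round(low[0], 3), round(high[0], 3))
```

This is a sketch of the general mechanism, not the study's interface; how the participants actually exposed and tuned temperature is described in the paper itself.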