Title: Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models
Recent advances in Large Language Models (LLMs) have made automatic code generation possible for real-world programming tasks in general-purpose programming languages such as Python. However, there are few human studies on the usability of these tools and how they fit into the programming workflow. In this work, we conducted a within-subjects user study with 24 participants to understand how programmers use and perceive Copilot, an LLM-based code generation tool. We found that, while Copilot did not necessarily improve task completion time or success rate, most participants preferred to use Copilot in daily programming tasks, since it often provided a useful starting point and saved the effort of searching online. However, participants did face difficulties in understanding, editing, and debugging code snippets generated by Copilot, which significantly hindered their task-solving effectiveness. Finally, we highlight several promising directions for improving the design of Copilot based on our observations and participants’ feedback.
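The kind of plausible-looking but subtly wrong output that participants struggled to validate is easiest to see in a concrete case. The Python sketch below is a hypothetical illustration invented for this summary (it is not an artifact from the study): a generated-looking snippet with a subtle bug, followed by the correction a participant would have to make.

```python
# Hypothetical illustration of a plausible-looking generated snippet with a
# subtle bug (invented for illustration; not taken from the study).

def median(values):
    """Buggy version: looks right and passes a quick glance."""
    ordered = sorted(values)
    # Bug: for even-length input this returns the upper-middle element
    # instead of averaging the two middle elements.
    return ordered[len(ordered) // 2]

def median_fixed(values):
    """Corrected version a reviewer would need to write."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 2, 3, 4]))        # 3   -- wrong for even-length input
print(median_fixed([1, 2, 3, 4]))  # 2.5 -- correct
```

Spotting this class of defect requires reading the snippet as carefully as hand-written code, which matches the debugging difficulty the abstract reports.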
Award ID(s):
2107391 2123965
NSF-PAR ID:
10366304
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
CHI Conference on Human Factors in Computing Systems Extended Abstracts (CHI ’22 Extended Abstracts)
Page Range / eLocation ID:
1 to 7
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Powered by recent advances in code-generating models, AI assistants like GitHub Copilot promise to change the face of programming forever. But what is this new face of programming? We present the first grounded theory analysis of how programmers interact with Copilot, based on observing 20 participants (with a range of prior experience using the assistant) as they solve diverse programming tasks across four languages. Our main finding is that interactions with programming assistants are bimodal: in acceleration mode, the programmer knows what to do next and uses Copilot to get there faster; in exploration mode, the programmer is unsure how to proceed and uses Copilot to explore their options. Based on our theory, we provide recommendations for improving the usability of future AI programming assistants.
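A hypothetical sketch of how the two modes might look in an editor (the prompts and suggestions here are invented for illustration, not examples from the paper):

```python
# Acceleration mode: the programmer knows the next step and writes a precise
# signature so the assistant can fill in routine code that is easy to verify.
def celsius_to_fahrenheit(celsius: float) -> float:
    return celsius * 9 / 5 + 32  # a one-glance check confirms the formula

# Exploration mode: the programmer is unsure how to proceed and writes a
# vague prompt, then browses several suggestions for ideas, e.g.:
#   "# parse the log file and summarize errors by hour"
# Suggestion 1: read line by line, matching each entry with a regex
# Suggestion 2: load the whole file with pandas and group by timestamp
# Here suggestions are compared and adapted rather than accepted verbatim.
```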

     
  2.
    Neuroimaging and transcranial direct current stimulation (tDCS) research has revealed that generating novel ideas is associated with both reductions and increases in prefrontal cortex (PFC) activity, and engagement of posterior occipital cortex, among other regions. However, there is substantial variability in the robustness of these tDCS‐induced effects due to heterogeneous sample sizes, different creativity measures, and methodological diversity in the application of tDCS across laboratories. To address these shortcomings, we used twelve different montages within a standardized tDCS protocol to investigate how altering activity in frontotemporal and occipital cortex impacts creative thinking. Across four experiments, 246 participants generated either the common or an uncommon use for 60 object pictures while undergoing tDCS. Participants also completed a control short-term memory task. We applied active tDCS for 20 min at 1.5 mA through two 5 cm × 5 cm electrodes over left or right ventrolateral prefrontal (areas F7, F8) or occipital (areas O1, O2) cortex, concurrent bilateral stimulation of these regions across polarities, or sham stimulation. Cathodal stimulation of the left, but not right, ventrolateral PFC improved fluency in creative idea generation but had no effect on originality, as approximated by measures of semantic distance. No effects were obtained for the control tasks. Concurrent bilateral stimulation of the ventrolateral PFC, regardless of polarity direction, and excitatory stimulation of occipital cortex did not alter task performance. Highlighting the importance of cross-experimental methodological consistency, these results extend our past findings and contribute to our understanding of the role of left PFC in creative thinking.
  3. A large portion of the cost of any software lies in the time spent by developers in understanding a program’s source code before any changes can be undertaken. Measuring program comprehension is not a trivial task; in fact, different studies use self-reported and various psycho-physiological measures as proxies. In this research, we propose a methodology using functional Near Infrared Spectroscopy (fNIRS) and eye tracking devices as an objective measure of program comprehension that allows researchers to conduct studies in environments close to real-world settings, at the identifier level of granularity. We validate our methodology and apply it to study the impact of lexical, structural, and readability issues on developers’ cognitive load during bug localization tasks. Our study involves 25 undergraduate and graduate students and 21 metrics. Results show that the existence of lexical inconsistencies in the source code significantly increases the cognitive load experienced by participants, not only on identifiers involved in the inconsistencies but also throughout the entire code snippet. We did not find statistical evidence that structural inconsistencies increase the average cognitive load that participants experience; however, both types of inconsistencies result in lower performance in terms of time and success rate. Finally, we observe that self-reported task difficulty, cognitive load, and fixation duration do not correlate and appear to be measuring different aspects of task difficulty.
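What the paper calls a lexical inconsistency is easiest to see in code. The snippet below is a minimal hypothetical example (not drawn from the study’s materials) of an identifier whose name contradicts the behavior of its code:

```python
# Hypothetical illustration of a lexical inconsistency: the identifier
# promises one behavior while the code does another, the kind of mismatch
# the study links to higher cognitive load during bug localization.

def get_active_users(users):
    # Inconsistent: despite the "get" name, this also mutates the input
    # list by removing inactive entries instead of just returning a view.
    for u in list(users):
        if not u.get("active"):
            users.remove(u)
    return users

# A consistent version: the name matches the side-effect-free behavior.
def filter_active_users(users):
    return [u for u in users if u.get("active")]
```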
  4. Though conditionals are an integral component of programming, providing an easy means of creating conditionals remains a challenge for programming-by-demonstration (PBD) systems for task automation. We hypothesize that a promising method for implementing conditionals in such systems is to incorporate the use of verbal instructions. Verbal instructions supplied concurrently with demonstrations have been shown to improve the generalizability of PBD. However, the challenge of supporting conditional creation using this multimodal approach has not been addressed. In this extended abstract, we present our study on understanding how end users describe conditionals in natural language for mobile app tasks. We conducted a formative study with 56 participants, asking them to verbally describe conditionals in different settings for 9 sample tasks and to invent conditional tasks. Participant responses were analyzed using open coding and revealed that, in the context of mobile apps, end users often omit desired else statements when explaining conditionals, sometimes use ambiguous concepts in expressing conditionals, and often desire to implement complex conditionals. Based on these findings, we discuss the implications for designing a multimodal PBD interface to support the creation of conditionals.
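The omitted-else finding is concrete enough to illustrate with code. The sketch below is a hypothetical translation (invented here, not from the study) of a verbal instruction into the conditional a PBD system would have to construct:

```python
# Hypothetical example: a user says "If I get a text from my boss, reply
# that I'm driving." The else branch is never stated.

def handle_text(sender, reply):
    if sender == "boss":
        reply("I'm driving right now.")
    else:
        # The user omitted this branch; the PBD system must decide whether
        # the implicit else means "do nothing" or should trigger a
        # clarifying question back to the user.
        pass

handle_text("boss", print)    # prints the auto-reply
handle_text("friend", print)  # silently does nothing -- assumed default
```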
  5. Background and context. “Explain in Plain English” (EiPE) questions ask students to explain the high-level purpose of code, requiring them to understand the macrostructure of the program’s intent. A lot is known about techniques that experts use to comprehend code, but less is known about how we should teach novices to develop this capability. Objective. Identify techniques that can be taught to students to assist them in developing their ability to comprehend code, and contribute to the body of knowledge of how novices develop their code comprehension skills. Method. Motivated by previous research on how experts comprehend code, we developed interventions that could be taught to novices: identifying beacons, identifying the roles of variables, tracing, and abstract tracing. We conducted think-aloud interviews of introductory programming students solving EiPE questions, varying which interventions each student was taught. Some participants were interviewed multiple times throughout the semester to observe any changes in behavior over time. Findings. Identifying beacons and the names of variable roles were rarely helpful, as they did not encourage students to integrate their understanding of that piece in relation to other lines of code. However, prompting students to explain each variable’s purpose helped them focus on useful subsets of the code, which helped manage cognitive load. Tracing was helpful when students incorrectly recognized common programming patterns or made mistakes comprehending syntax (text-surface). Prompting students to pick inputs that potentially contradicted their current understanding of the code proved a simple way to help them select inputs to trace effectively. Abstract tracing helped students see high-level, functional relationships between variables. In addition, we observed students spontaneously sketching algorithmic visualizations that similarly helped them see relationships between variables. Implications. Because students can get stuck at many points in the process of code comprehension, there seems to be no silver-bullet technique that helps in every circumstance. Instead, effective instruction for code comprehension will likely involve teaching a collection of techniques. In addition to these techniques, meta-knowledge about when to apply each technique will need to be learned, but that is left for future research. At present, we recommend teaching a bottom-up, concrete-to-abstract approach.
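To make the interventions concrete, here is a hypothetical EiPE-style snippet (invented for this summary, not from the study) annotated with variable roles, a concrete trace, and an abstract-tracing question:

```python
# Hypothetical EiPE-style snippet. Plain-English purpose:
# "count how many values are above the average".

def mystery(xs):
    total = 0              # variable role: gatherer (accumulates a sum)
    for x in xs:
        total += x
    avg = total / len(xs)  # variable role: one-time calculation
    count = 0              # variable role: counter
    for x in xs:
        if x > avg:
            count += 1
    return count

# Concrete tracing: run one input and follow each step.
print(mystery([1, 2, 3, 10]))  # avg = 4.0; only 10 > 4.0, so prints 1

# Abstract tracing asks a higher-level question without line-by-line steps,
# e.g. "can the result ever exceed len(xs)?" (no: count increments at most
# once per element), which exposes the relationship between the variables.
```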