NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

HILDE: Intentional Code Generation via Human-in-the-Loop Decoding

Gonzalez, Emmanuel Anaya; Rothkopf, Raven; Lerner, Sorin; Polikarpova, Nadia (October 2025, Proceedings)

While AI programming tools hold the promise of increasing programmers’ capabilities and productivity to a remarkable degree, they often exclude users from essential decision-making processes, causing many to effectively “turn off their brains” and over-rely on solutions provided by these systems. These behaviors can have severe consequences in critical domains, like software security. We propose Human-in-the-Loop Decoding, a novel interaction technique that allows users to observe and directly influence LLM decisions during code generation, to align the model’s output with their personal requirements. We implement this technique in HILDE, a code completion assistant that highlights critical decisions made by the LLM and provides local alternatives for the user to explore. In a within-subjects study (N=18) on security-related tasks, we found that HILDE led participants to generate significantly fewer vulnerabilities and better align code generation with their goals compared to a traditional code completion assistant.
Free, publicly accessible full text available October 7, 2026
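The abstract does not say how HILDE decides which LLM decisions are "critical." A minimal sketch of one possible proxy, assumed here and not taken from the paper, flags a position when the entropy of the model's next-token distribution exceeds a threshold and offers the other top candidates as local alternatives. `TokenStep`, `flag_decisions`, the 1-bit threshold, and the toy probabilities are all hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class TokenStep:
    chosen: str                   # token the model actually emitted
    candidates: dict[str, float]  # candidate token -> probability at this position

def entropy_bits(probs):
    # Shannon entropy in bits; zero-probability entries contribute nothing.
    return -sum(p * math.log2(p) for p in probs if p > 0)

def flag_decisions(steps, threshold_bits=1.0, k=3):
    """Return (position, chosen token, up to k alternatives) for every step
    whose next-token distribution is uncertain enough to show the user.
    The entropy threshold is a hypothetical heuristic, not HILDE's rule."""
    flagged = []
    for i, step in enumerate(steps):
        if entropy_bits(step.candidates.values()) >= threshold_bits:
            ranked = sorted(step.candidates.items(), key=lambda kv: kv[1], reverse=True)
            alternatives = [tok for tok, _ in ranked if tok != step.chosen][:k]
            flagged.append((i, step.chosen, alternatives))
    return flagged

# Toy distributions for completing `hashlib.md5(...)`; the numbers are made up.
steps = [
    TokenStep("hashlib", {"hashlib": 0.98, "bcrypt": 0.02}),
    TokenStep("md5", {"md5": 0.40, "sha256": 0.35, "sha1": 0.25}),
]
print(flag_decisions(steps))
# → [(1, 'md5', ['sha256', 'sha1'])]
```

The near-certain module choice is left alone, while the hash function, a security-relevant choice on which the toy model is split, is surfaced with its runners-up. A real assistant would also have to regenerate the rest of the completion after the user picks an alternative, which this sketch omits.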
HILDE: Intentional Code Generation via Human-in-the-Loop Decoding

Gonzalez, Emmanuel Anaya; Rothkopf, Raven; Lerner, Sorin; Polikarpova, Nadia (October 2025, 2025 IEEE Symposium on Visual Languages and Human-Centric Computing)

Free, publicly accessible full text available October 7, 2026
The Command Line GUIde: Graphical Interfaces from Man Pages via AI

Kasibatla, Saketh Ram; Hiremath, Kiran Medleri; Rothkopf, Raven; Lerner, Sorin; Xia, Haijun; Hempel, Brian (October 2025, Proceedings)

Although born in the era of teletypes, the command-line shell survived the graphical interface revolution of the 1980s and lives on in modern desktop operating systems. The command line provides access to powerful functionality not otherwise exposed on the computer, but requires users to recall textual syntax and carefully scour documentation. In contrast, graphical interfaces let users organically discover and invoke possible actions through widgets and menus. To better expose the power of the command line, we demonstrate a mechanism for automatically creating graphical interfaces for command-line tools by translating their documentation (in the form of man pages) into interface specifications via AI. Using these specifications, our user-facing system, called GUIDE, presents the command options to the user graphically. We evaluate the generated interfaces on a corpus of commands to measure how thoroughly GUIDE’s interfaces support users’ real-world command-line tasks.
Free, publicly accessible full text available October 7, 2026
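The abstract does not describe GUIDE's specification format. As an illustration only, the sketch below assumes a spec is a list of options, each shown either as a checkbox (a bare flag) or as a text field (a flag that takes a value), and shows how widget state could be turned back into a command line. `Option`, `CommandSpec`, and `build_command` are hypothetical names; the grep flags are real, but the spec for them is written by hand rather than generated from the man page.

```python
import shlex
from dataclasses import dataclass

@dataclass
class Option:
    flag: str                  # e.g. "-i"
    label: str                 # text shown next to the widget
    takes_value: bool = False  # False: checkbox; True: text field

@dataclass
class CommandSpec:
    name: str
    options: list[Option]

def build_command(spec, selections, operands=()):
    """Assemble a shell-safe command line from widget state.

    selections maps a flag to True (checked box) or to the text typed
    into that option's field; flags that are absent are left off.
    """
    argv = [spec.name]
    for opt in spec.options:          # keep the spec's option order
        value = selections.get(opt.flag)
        if value is None or value is False:
            continue
        if opt.takes_value:
            argv += [opt.flag, str(value)]
        else:
            argv.append(opt.flag)
    argv += list(operands)
    return shlex.join(argv)           # quotes arguments containing spaces, etc.

# Hand-written stand-in for a spec that might be extracted from grep's man page.
grep = CommandSpec("grep", [
    Option("-i", "Ignore case"),
    Option("-n", "Show line numbers"),
    Option("-m", "Stop after N matching lines", takes_value=True),
])
print(build_command(grep, {"-i": True, "-m": 5}, ["TODO", "notes file.txt"]))
# → grep -i -m 5 TODO 'notes file.txt'
```

Routing the final string through `shlex.join` keeps a file name containing a space as a single argument, a detail the user would otherwise have to get right by hand at the prompt.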
The Command Line GUIde: Graphical Interfaces from Man Pages via AI

Kasibatla, Saketh Ram; Hiremath, Kiran Medleri; Rothkopf, Raven; Lerner, Sorin; Xia, Haijun; Hempel, Brian (October 2025, 2025 IEEE Symposium on Visual Languages and Human-Centric Computing)

Free, publicly accessible full text available October 7, 2026
How Scientists Use Jupyter Notebooks: Goals, Quality Attributes, and Opportunities

Huang, Ruanqianqian Lisa; Ravi, Savitha; He, Michael; Tian, Boyu; Lerner, Sorin; Coblenz, Michael (April 2025, 2025 IEEE/ACM 47th International Conference on Software Engineering)

Computational notebooks are intended to prioritize the needs of scientists, but little is known about how scientists interact with notebooks, what requirements drive scientists’ software development processes, or what tactics scientists use to meet their requirements. We conducted an observational study of 20 scientists using Jupyter notebooks for their day-to-day tasks, finding that scientists prioritize different quality attributes depending on their goals. A qualitative analysis of their usage shows (1) a collection of goals scientists pursue with Jupyter notebooks, (2) a set of quality attributes that scientists value when they write software, and (3) tactics that scientists leverage to promote quality. In addition, we identify ways scientists incorporate AI tools into their notebook work. From our observations, we derive design recommendations for improving computational notebooks and future programming systems for scientists. Key opportunities pertain to helping scientists create and manage state, dependencies, and abstractions in their software, enabling more effective reuse of clearly defined components.
Free, publicly accessible full text available April 27, 2026
How Scientists Use Jupyter Notebooks: Goals, Quality Attributes, and Opportunities

https://doi.org/10.1109/ICSE55347.2025.00232

Huang, Ruanqianqian Lisa; Ravi, Savitha; He, Michael; Tian, Boyu; Lerner, Sorin; Coblenz, Michael (April 2025, IEEE)

Free, publicly accessible full text available April 26, 2026
Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving

An, Chenyang; Chen, Zhibo; Ye, Qihao; First, Emily; Peng, Letian; Zhang, Jiayun; Wang, Zihan; Lerner, Sorin; Shang, Jingbo (August 2024, The 62nd Annual Meeting of the Association for Computational Linguistics)

Recent advances in automated theorem proving have shown the effectiveness of leveraging a (large) language model that generates tactics (i.e., proof steps) to search through proof states. Current models are trained solely on successful proof paths, which creates a discrepancy at inference: the model must sample and try various tactics at each proof state until it succeeds, yet its training never incorporates learning from failed attempts. Intuitively, a tactic that leads to a failed search path should indicate that similar tactics deserve less attention in subsequent trials. In this paper, we demonstrate the benefit of training models that additionally learn from failed search paths. Given the lack of such trial-and-error data in existing open-source theorem-proving datasets, we curate a dataset of intuitionistic propositional logic theorems and formalize it in Lean, so that we can reliably check the correctness of proofs. We compare our model trained on relatively short trial-and-error information (TRIALMASTER) with models trained only on the correct paths and find that the former solves more unseen theorems with fewer search trials.
Full Text Available
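The abstract does not give TrialMaster's data format. One plausible way to turn a proof search that includes failed attempts into a single training sequence is a depth-first linearization with an explicit backtrack token, sketched below. The `<backtrack>` token, the `Attempt` class, and the Lean tactics in the example are assumptions for illustration, not the paper's encoding.

```python
from dataclasses import dataclass, field

# Hypothetical token marking that the preceding branch failed.
BACKTRACK = "<backtrack>"

@dataclass
class Attempt:
    tactic: str                      # tactic tried at this proof state
    closes_goal: bool = False        # True only where the proof is finished
    children: list["Attempt"] = field(default_factory=list)  # follow-up attempts

def _walk(node, out):
    """Depth-first walk that keeps failed branches. Returns True once a
    branch closes the goal, and emits BACKTRACK after each dead end."""
    out.append(node.tactic)
    if node.closes_goal:
        return True
    for child in node.children:
        if _walk(child, out):
            return True
    out.append(BACKTRACK)
    return False

def trial_and_error_sequence(goal, attempts):
    """Linearize a proof search (goal plus top-level attempts, in the order
    they were tried) into one training sequence, failures included."""
    out = [goal]
    for attempt in attempts:
        if _walk(attempt, out):
            break
    return out

search = [
    Attempt("assumption"),                      # fails: no hypotheses yet
    Attempt("intro h", children=[
        Attempt("exact h"),                     # fails: h has type p ∧ q
        Attempt("exact ⟨h.2, h.1⟩", closes_goal=True),
    ]),
]
print(trial_and_error_sequence("p ∧ q → q ∧ p", search))
# → ['p ∧ q → q ∧ p', 'assumption', '<backtrack>', 'intro h', 'exact h', '<backtrack>', 'exact ⟨h.2, h.1⟩']
```

A model trained only on correct paths would see just the goal, `intro h`, and the final `exact`; keeping the dead ends gives it examples of which tactics did not work at each state.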
Validating AI-Generated Code with Live Programming

https://doi.org/10.1145/3613904.3642495

Ferdowsi, Kasra; Huang, Ruanqianqian Lisa; James, Michael B; Polikarpova, Nadia; Lerner, Sorin (May 2024, ACM)

Full Text Available
Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification

https://doi.org/10.1109/ICSE55347.2025.00161

Thompson, Kyle; Saavedra, Nuno; Carrott, Pedro; Fisher, Kevin; Sanchez-Stern, Alex; Brun, Yuriy; Ferreira, João F; Lerner, Sorin; First, Emily (April 2025, IEEE)

Free, publicly accessible full text available April 26, 2026
Investigating the Impact of Using a Live Programming Environment in a CS1 Course

Huang, Ruanqianqian; Ferdowsifard, Kasra; Selvaraj, Ana; Soosai Raj, Adalbert Gerald; Lerner, Sorin (March 2022, In Proceedings of the 53rd ACM Technical Symposium on Computer Science Education)

Novice programmers often struggle with code understanding and debugging. Live Programming environments visualize the runtime values of a program each time it is modified to provide immediate feedback, which helps with tracing program execution. This paper presents the use of a Live Programming tool in a CS1 course to better understand the impact of Live Programming on novices’ learning metrics and their perceptions of the tool. We conducted a within-subjects study at a large public university in a CS1 course in Python (N=237) where students completed tasks in a lab setting, in some cases with a Live Programming environment and in some cases without. Through post-lab surveys and open-ended feedback, we measured how well students understood the material and how they perceived the programming environment. To understand the impact of Live Programming, we compared the data collected from students who used Live Programming with the data from students who did not. We found that while learning outcomes were the same regardless of whether Live Programming was used, students who used the Live Programming tool completed some code-tracing tasks faster. Furthermore, students liked the Live Programming environment more and rated it as more helpful for their learning.
Full Text Available
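Live Programming environments like the one studied above need the runtime values produced by each line. A minimal Python sketch of collecting them with the standard `sys.settrace` hook follows; `trace_values` and `total` are hypothetical names, and this is not the tool used in the study. It records a snapshot of the function's local variables after each executed line, which an editor could then display beside that line.

```python
import sys

def trace_values(func, *args):
    """Run func(*args) and record, for each line of func that executes,
    a snapshot of func's local variables right after that line runs.

    Hypothetical helper for illustration; not part of any Live
    Programming tool's API. Recursive calls share one `prev` marker,
    so this sketch is only meant for non-recursive functions.
    """
    code = func.__code__
    records = []   # list of (line offset from the def line, locals snapshot)
    prev = None    # line number of the most recent line event in func

    def local_tracer(frame, event, arg):
        nonlocal prev
        if event in ("line", "return"):
            if prev is not None:
                # The previous line has finished; snapshot the state it left behind.
                records.append((prev - code.co_firstlineno, dict(frame.f_locals)))
            prev = frame.f_lineno if event == "line" else None
        return local_tracer

    def global_tracer(frame, event, arg):
        # Trace only frames running func's own code, not its callees.
        return local_tracer if frame.f_code is code else None

    old_tracer = sys.gettrace()
    sys.settrace(global_tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(old_tracer)
    return result, records


def total(xs):
    s = 0
    for x in xs:
        s += x
    return s


result, records = trace_values(total, [1, 2])
for offset, values in records:
    print(offset, values)
```

Tracing is attached only to frames running the target function, so library code it calls is not recorded; a full environment would also rerun this on every edit, as the abstract describes, and render the snapshots in the editor.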
