NSF PAR Search | NSF Public Access Repository

QEDCartographer: Automating Formal Verification Using Reward-Free Reinforcement Learning

Sanchez-Stern, Alex; Varghese, Abhishek; Kaufman, Zhanna; Zhang, Dylan; Ringer, Talia; Brun, Yuriy (April 2025, IEEE)

Formal verification is a promising method for producing reliable software, but the difficulty of manually writing verification proofs severely limits its utility in practice. Recent methods have automated some proof synthesis by guiding a search through the proof space using a theorem prover. Unfortunately, the theorem prover provides only the crudest estimate of progress, resulting in effectively undirected search. To address this problem, we create QEDCartographer, an automated proof-synthesis tool that combines supervised and reinforcement learning to more effectively explore the proof space. QEDCartographer incorporates the proofs' branching structure, enabling reward-free search and overcoming the sparse reward problem inherent to formal verification. We evaluate QEDCartographer using the CoqGym benchmark of 68.5K theorems from 124 open-source Coq projects. QEDCartographer fully automatically proves 21.4% of the test-set theorems. Previous search-based proof-synthesis tools Tok, Tac, ASTactic, Passport, and Proverbot9001, which rely only on supervised learning, prove 9.6%, 9.8%, 10.9%, 12.5%, and 19.8%, respectively. Diva, which combines 62 tools, proves 19.2%. Compared to the most effective prior tool, Proverbot9001, QEDCartographer produces 26% shorter proofs 27% faster, on average over the theorems both tools prove. Together, QEDCartographer and non-learning-based CoqHammer prove 31.8% of the theorems, while CoqHammer alone proves 26.6%. Our work demonstrates that reinforcement learning is a fruitful research direction for improving proof-synthesis tools' search mechanisms.
Free, publicly-accessible full text available April 28, 2026
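
The QEDCartographer abstract above describes searching a proof space under the guidance of a progress estimate. As a rough illustration only, the following Python sketch shows a generic best-first search driven by such an estimate. It is not QEDCartographer's algorithm: the names `best_first_search`, `toy_expand`, and the integer "proof states" are hypothetical stand-ins, and the estimate here is a hand-written function rather than a learned one.

```python
import heapq
from typing import Callable, Hashable, Iterable, List, Optional, Tuple


def best_first_search(
    start: Hashable,
    is_goal: Callable[[Hashable], bool],
    expand: Callable[[Hashable], Iterable[Tuple[str, Hashable]]],
    estimate: Callable[[Hashable], float],
    max_nodes: int = 10_000,
) -> Optional[List[str]]:
    """Return a list of step names leading from start to a goal state, or None.

    estimate(state) is an assumed stand-in for a progress estimate:
    lower values are explored first.
    """
    counter = 0  # tie-breaker so the heap never compares states directly
    frontier = [(estimate(start), counter, start, [])]
    seen = {start}
    explored = 0
    while frontier and explored < max_nodes:
        _, _, state, path = heapq.heappop(frontier)
        explored += 1
        if is_goal(state):
            return path
        for step, nxt in expand(state):
            if nxt not in seen:
                seen.add(nxt)
                counter += 1
                heapq.heappush(frontier, (estimate(nxt), counter, nxt, path + [step]))
    return None


def toy_expand(n: int) -> Iterable[Tuple[str, int]]:
    """Hypothetical toy 'tactics' on an integer state; 0 means nothing is left to prove."""
    if n > 0 and n % 2 == 0:
        yield "halve", n // 2
    if n > 0:
        yield "decrement", n - 1


# Toy usage: the state itself serves as the (hand-written) estimate.
toy_path = best_first_search(10, lambda n: n == 0, toy_expand, lambda n: n)
```

The quality of the estimate determines how directed the search is: with a constant estimate, this loop degenerates into exploring states in insertion order.
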
Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification

https://doi.org/10.1109/ICSE55347.2025.00161

Thompson, Kyle; Saavedra, Nuno; Carrott, Pedro; Fisher, Kevin; Sanchez-Stern, Alex; Brun, Yuriy; Ferreira, João F; Lerner, Sorin; First, Emily (April 2025, IEEE)

Free, publicly-accessible full text available April 26, 2026
Passport: Improving Automated Formal Verification Using Identifiers

https://doi.org/10.1145/3593374

Sanchez-Stern, Alex; First, Emily; Zhou, Timothy; Kaufman, Zhanna; Brun, Yuriy; Ringer, Talia (June 2023, ACM Transactions on Programming Languages and Systems)

Formally verifying system properties is one of the most effective ways of improving system quality, but its high manual effort requirements often render it prohibitively expensive. Tools that automate formal verification by learning from proof corpora to synthesize proofs have just begun to show their promise. These tools are effective because of the richness of the data the proof corpora contain. This richness comes from the stylistic conventions followed by communities of proof developers, together with the powerful logical systems beneath proof assistants. However, this richness remains underexploited, with most work thus far focusing on architecture rather than on how to make the most of the proof data. This article systematically explores how to most effectively exploit one aspect of that proof data: identifiers. We develop the Passport approach, a method for enriching the predictive Coq model used by an existing proof-synthesis tool with three new encoding mechanisms for identifiers: category vocabulary indexing, subword sequence modeling, and path elaboration. We evaluate our approach’s enrichment effect on three existing base tools: ASTactic, Tac, and Tok. In head-to-head comparisons, Passport automatically proves 29% more theorems than the best-performing of these base tools. Combining the three tools enhanced by the Passport approach automatically proves 38% more theorems than combining the three base tools. Finally, together, these base tools and their enhanced versions prove 45% more theorems than the combined base tools. Overall, our findings suggest that modeling identifiers can play a significant role in improving proof synthesis, leading to higher-quality software.
Full Text Available
PRoofster: Automated Formal Verification

https://doi.org/10.1109/ICSE-Companion58688.2023.00018

Agrawal, Arpan; First, Emily; Kaufman, Zhanna; Reichel, Tom; Zhang, Shizhuo; Zhou, Timothy; Sanchez-Stern, Alex; Ringer, Talia; Brun, Yuriy (May 2023, Proceedings of the Demonstrations Track at the 45th International Conference on Software Engineering (ICSE))

Full Text Available
Scooter & Sidecar: A Domain-Specific Approach to Writing Secure Database Migrations

https://doi.org/10.1145/3453483.3454072

Renner, John; Sanchez-Stern, Alex; Brown, Fraser; Lerner, Sorin; Stefan, Deian (June 2021, Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation)
Web applications often handle large amounts of sensitive user data. Modern secure web frameworks protect this data by (1) using declarative languages to specify security policies alongside database schemas and (2) automatically enforcing these policies at runtime. Unfortunately, these frameworks do not handle the very common situation in which the schemas or the policies need to evolve over time—and updates to schemas and policies need to be performed in a carefully coordinated way. Mistakes during schema or policy migrations can unintentionally leak sensitive data or introduce privilege escalation bugs. In this work, we present a domain-specific language (Scooter) for expressing schema and policy migrations, and an associated SMT-based verifier (Sidecar) which ensures that migrations are secure as the application evolves. We describe the design of Scooter and Sidecar and show that our framework can be used to express realistic schemas, policies, and migrations, without giving up on runtime or verification performance.
Full Text Available
Generating correctness proofs with neural networks

https://doi.org/10.1145/3394450.3397466

Sanchez-Stern, Alex; Alhessi, Yousef; Saul, Lawrence; Lerner, Sorin (June 2020, 4th ACM SIGPLAN International Workshop on Machine Learning and Programming Languages)
Full Text Available
Data-driven lemma synthesis for interactive proofs

https://doi.org/10.1145/3563306

Sivaraman, Aishwarya; Sanchez-Stern, Alex; Chen, Bretton; Lerner, Sorin; Millstein, Todd (October 2022, Proceedings of the ACM on Programming Languages)

Interactive proofs of theorems often require auxiliary helper lemmas to prove the desired theorem. Existing approaches for automatically synthesizing helper lemmas fall into two broad categories. Some approaches are goal-directed, producing lemmas specifically to help a user make progress from a given proof state, but they have limited expressiveness in terms of the lemmas that can be produced. Other approaches are highly expressive, able to generate arbitrary lemmas from a given grammar, but they are completely undirected and hence not amenable to interactive usage. In this paper, we develop an approach to lemma synthesis that is both goal-directed and expressive. The key novelty is a technique for reducing lemma synthesis to a data-driven program synthesis problem, whereby examples for synthesis are generated from the current proof state. We also describe a technique to systematically introduce new variables for lemma synthesis, as well as techniques for filtering and ranking candidate lemmas for presentation to the user. We implement these ideas in a tool called lfind, which can be run as a Coq tactic. In an evaluation on four benchmark suites, lfind produces useful lemmas in 68% of the cases where a human prover used a lemma to make progress. In these cases lfind synthesizes a lemma that either enables a fully automated proof of the original goal or that matches the human-provided lemma.
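
The abstract above describes reducing lemma synthesis to example-driven program synthesis. As a loose illustration only, the following Python sketch evaluates a term on sample inputs and keeps the candidate right-hand sides that agree with it on every example. It is not lfind's implementation: `synthesize_rhs`, the candidate table, and the list-reversal example are hypothetical, and a fixed candidate table stands in for enumeration from a grammar.

```python
from typing import Callable, Dict, List

ListFn = Callable[[List[int]], List[int]]


def synthesize_rhs(
    target: ListFn,
    candidates: Dict[str, ListFn],
    examples: List[List[int]],
) -> List[str]:
    """Return names of candidates that match target on every example input."""
    # Input/output pairs play the role of examples derived from a proof state.
    io_pairs = [(xs, target(xs)) for xs in examples]
    return [
        name
        for name, fn in candidates.items()
        if all(fn(xs) == out for xs, out in io_pairs)
    ]


# Hypothetical term of interest: rev (rev xs)
def rev_rev(xs: List[int]) -> List[int]:
    return list(reversed(list(reversed(xs))))


# Hypothetical candidate right-hand sides, keyed by a printable form.
toy_candidates: Dict[str, ListFn] = {
    "xs": lambda xs: list(xs),
    "rev xs": lambda xs: xs[::-1],
    "xs ++ xs": lambda xs: xs + xs,
    "[]": lambda xs: [],
}

toy_examples = [[], [1], [1, 2], [3, 1, 2]]
survivors = synthesize_rhs(rev_rev, toy_candidates, toy_examples)
```

A surviving candidate such as `xs` only suggests the lemma `rev (rev xs) = xs`; agreement on examples is not a proof, so any candidate would still need to be checked by the proof assistant.
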