NSF PAR Search | NSF Public Access Repository

Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving

An, Chenyang; Chen, Zhibo; Ye, Qihao; First, Emily; Peng, Letian; Zhang, Jiayun; Wang, Zihan; Lerner, Sorin; Shang, Jingbo (August 2024, The 62nd Annual Meeting of the Association for Computational Linguistics)

Recent advances in Automated Theorem Proving have shown the effectiveness of leveraging a (large) language model that generates tactics (i.e. proof steps) to search through proof states. The current model, while trained solely on successful proof paths, faces a discrepancy at the inference stage, as it must sample and try various tactics at each proof state until finding success, unlike its training which does not incorporate learning from failed attempts. Intuitively, a tactic that leads to a failed search path would indicate that similar tactics should receive less attention during the following trials. In this paper, we demonstrate the benefit of training models that additionally learn from failed search paths. Facing the lack of such trial-and-error data in existing open-source theorem-proving datasets, we curate a dataset on intuitionistic propositional logic theorems and formalize it in Lean, such that we can reliably check the correctness of proofs. We compare our model trained on relatively short trial-and-error information (TRIALMASTER) with models trained only on the correct paths and discover that the former solves more unseen theorems with lower trial searches.
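To make the notions of tactic, proof state, and failed search path concrete, here is a minimal sketch in Lean 4 syntax. The theorem and its name `or_swap_example` are illustrative assumptions and are not taken from the paper's dataset; the comments mark where a prover could try a tactic that leads to a dead end and then backtrack.

```lean
-- Illustrative intuitionistic propositional theorem (hypothetical example).
theorem or_swap_example (p q : Prop) : p ∨ q → q ∨ p := by
  intro h                 -- proof state: h : p ∨ q ⊢ q ∨ p
  cases h with
  | inl hp =>
    -- Proof state: hp : p ⊢ q ∨ p.
    -- A failed trial here would be `apply Or.inl`, which leaves the goal `q`
    -- with only `hp : p` available, so the search must backtrack.
    apply Or.inr          -- successful choice: the goal becomes `p`
    exact hp
  | inr hq =>
    -- Proof state: hq : q ⊢ q ∨ p.
    apply Or.inl          -- the goal becomes `q`
    exact hq
```

A model trained only on the final script never sees the discarded `apply Or.inl` attempt in the first branch; the abstract's proposal is to include that kind of failed step in the training data.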
Full Text Available
Regex+: Synthesizing Regular Expressions from Positive Examples

Pertseva, Elizaveta; Barbone, Mark; Rudek, Joey; Polikarpova, Nadia (June 2022, 11th Workshop on Synthesis)

Regular expressions are a popular target for programming by example (PBE) systems, which seek to learn regexes from user-provided examples. Synthesizing from only positive examples remains an unsolved challenge, as the unrestricted search space makes it difficult to avoid over- and under-generalizing. Prior work has approached this in two ways: search-based techniques which require extra input, such as user feedback and/or a natural language description, and neural techniques. The former puts an extra burden on the user, while the latter requires large representative training data sets which are almost nonexistent for this domain. To tackle this challenge we present Regex+, a search-based synthesizer that infers regexes from just a few positive examples. Regex+ avoids over/under-generalization by using minimum description length (MDL) learning, adapted to version space algebras in order to efficiently search for an optimal regex according to a compositional MDL ranking function. Our evaluation shows that Regex+ more than triples the accuracy of existing neural and search-based regex synthesizers on benchmarks with only positive examples.
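The MDL idea in this abstract can be illustrated with a toy ranking function. The sketch below is not the Regex+ algorithm: it uses no version space algebra, it scores a hand-written candidate list, and the cost model (a 95-character alphabet, three character classes, fixed-length patterns) and the names `description_length` and `best_regex` are all assumptions made for illustration.

```python
import math
import re

ALPHABET_SIZE = 95  # assumption: printable ASCII
# Assumed character classes and how many characters each one matches.
CLASS_SIZES = {r"\d": 10, r"[a-z]": 26, r".": ALPHABET_SIZE}
# Bits to say which kind of atom comes next (three classes or a literal).
ATOM_KIND_BITS = math.log2(len(CLASS_SIZES) + 1)


def to_pattern(atoms):
    """Join atoms into a regex string, escaping literal characters."""
    return "".join(a if a in CLASS_SIZES else re.escape(a) for a in atoms)


def description_length(atoms, examples):
    """Model bits (the regex) plus data bits (the examples given the regex)."""
    model_bits = 0.0
    data_bits_per_example = 0.0
    for atom in atoms:
        model_bits += ATOM_KIND_BITS
        if atom in CLASS_SIZES:
            # A class leaves log2(size) bits of choice in every example.
            data_bits_per_example += math.log2(CLASS_SIZES[atom])
        else:
            # A literal costs bits in the model but none in the data.
            model_bits += math.log2(ALPHABET_SIZE)
    return model_bits + data_bits_per_example * len(examples)


def best_regex(candidates, examples):
    """Return the consistent candidate with the smallest description length."""
    consistent = [
        atoms for atoms in candidates
        if all(re.fullmatch(to_pattern(atoms), ex) for ex in examples)
    ]
    best = min(consistent, key=lambda atoms: description_length(atoms, examples))
    return to_pattern(best)


examples = ["a1", "b2", "c3"]
candidates = [
    [".", "."],          # over-general: matches any two characters
    ["[a-z]", r"\d"],    # a lowercase letter followed by a digit
    ["a", r"\d"],        # under-general: rejects "b2" and "c3"
]
print(best_regex(candidates, examples))
```

Under this assumed cost model the over-general `..` pays heavily in data bits for every example, while the under-general candidate is filtered out because it does not match all examples, which is the trade-off the abstract attributes to MDL ranking.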
Full Text Available
Just-in-time learning for bottom-up enumerative synthesis

https://doi.org/10.1145/3428295

Barke, Shraddha; Peleg, Hila; Polikarpova, Nadia (November 2020, Proceedings of the ACM on Programming Languages)
Full Text Available
Generating correctness proofs with neural networks

https://doi.org/10.1145/3394450.3397466

Sanchez-Stern, Alex; Alhessi, Yousef; Saul, Lawrence; Lerner, Sorin (June 2020, 4th ACM SIGPLAN International Workshop on Machine Learning and Programming Languages)
Full Text Available
