skip to main content


Title: Program synthesis using conflict-driven learning
We propose a new conflict-driven program synthesis technique that is capable of learning from past mistakes. Given a spurious program that violates the desired specification, our synthesis algorithm identifies the root cause of the conflict and learns new lemmas that can prevent similar mistakes in the future. Specifically, we introduce the notion of equivalence modulo conflict and show how this idea can be used to learn useful lemmas that allow the synthesizer to prune large parts of the search space. We have implemented a general purpose CDCL-style program synthesizer called Neo and evaluate it in two different application domains, namely data wrangling in R and functional programming over lists. Our experiments demonstrate the substantial benefits of conflict driven learning and show that Neo outperforms two state-of-the-art synthesis tools, Morpheus and DeepCoder, that target these respective domains  more » « less
Award ID(s):
1762363
NSF-PAR ID:
10095674
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
ACM SIGPLAN Conference on Programming Language Design and Implementation
Page Range / eLocation ID:
420 to 435
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Interactive proofs of theorems often require auxiliary helper lemmas to prove the desired theorem. Existing approaches for automatically synthesizing helper lemmas fall into two broad categories. Some approaches are goal-directed, producing lemmas specifically to help a user make progress from a given proof state, but they have limited expressiveness in terms of the lemmas that can be produced. Other approaches are highly expressive, able to generate arbitrary lemmas from a given grammar, but they are completely undirected and hence not amenable to interactive usage. In this paper, we develop an approach to lemma synthesis that is both goal-directed and expressive. The key novelty is a technique for reducing lemma synthesis to a data-driven program synthesis problem, whereby examples for synthesis are generated from the current proof state. We also describe a technique to systematically introduce new variables for lemma synthesis, as well as techniques for filtering and ranking candidate lemmas for presentation to the user. We implement these ideas in a tool called lfind, which can be run as a Coq tactic. In an evaluation on four benchmark suites, lfind produces useful lemmas in 68% of the cases where a human prover used a lemma to make progress. In these cases lfind synthesizes a lemma that either enables a fully automated proof of the original goal or that matches the human-provided lemma. 
    more » « less
  2. Interactive proofs of theorems often require auxiliary helper lemmas to prove the desired theorem. Existing approaches for automatically synthesizing helper lemmas fall into two broad categories. Some approaches are goal-directed, producing lemmas specifically to help a user make progress from a given proof state, but they have limited expressiveness in terms of the lemmas that can be produced. Other approaches are highly expressive, able to generate arbitrary lemmas from a given grammar, but they are completely undirected and hence not amenable to interactive usage. In this paper, we develop an approach to lemma synthesis that is both goal-directed and expressive. The key novelty is a technique for reducing lemma synthesis to a data-driven program synthesis problem, whereby examples for synthesis are generated from the current proof state. We also describe a technique to systematically introduce new variables for lemma synthesis, as well as techniques for filtering and ranking candidate lemmas for presentation to the user. We implement these ideas in a tool called lfind, which can be run as a Coq tactic. In an evaluation on four benchmark suites, lfind produces useful lemmas in 68% of the cases where a human prover used a lemma to make progress. In these cases lfind synthesizes a lemma that either enables a fully automated proof of the original goal or that matches the human-provided lemma. 
    more » « less
  3. While input-output examples are a natural form of specification for program synthesis engines, they can be imprecise for domains such as table transformations. In this paper, we investigate how extracting readily-available information about the user intent behind these input-output examples helps speed up synthesis and reduce overfitting. We present Gauss, a synthesis algorithm for table transformations that accepts partial input-output examples, along with user intent graphs. Gauss includes a novel conflict-resolution reasoning algorithm over graphs that enables it to learn from mistakes made during the search and use that knowledge to explore the space of programs even faster. It also ensures the final program is consistent with the user intent specification, reducing overfitting. We implement Gauss for the domain of table transformations (supporting Pandas and R), and compare it to three state-of-the-art synthesizers accepting only input-output examples. We find that it is able to reduce the search space by 56×, 73× and 664× on average, resulting in 7×, 26× and 7× speedups in synthesis times on average, respectively. 
    more » « less
  4. Many problem domains, including program synthesis and rewrite-based optimization, require searching astronomically large spaces of programs. Existing approaches often rely on building specialized data structures—version-space algebras, finite tree automata, or e-graphs—to compactly represent such spaces. At their core, all these data structures exploit independence of subterms; as a result, they cannot efficiently represent more complex program spaces, where the choices of subterms are entangled. We introduce equality-constrained tree automata (ECTAs), a new data structure, designed to compactly represent large spaces of programs with entangled subterms. We present efficient algorithms for extracting programs from ECTAs, implemented in a performant Haskell library, ecta. Using the ecta library, we construct Hectare, a type-driven program synthesizer for Haskell. Hectare significantly outperforms a state-of-the-art synthesizer Hoogle+—providing an average speedup of 8×—despite its implementation being an order of magnitude smaller. 
    more » « less
  5. We present a new domain-agnostic synthesis technique for generating programs from input-output examples. Our method, called metric program synthesis, relaxes the well-known observational equivalence idea (used widely in bottom-up enumerative synthesis) into a weaker notion of observational similarity, with the goal of reducing the search space that the synthesizer needs to explore. Our method clusters programs into equivalence classes based on a distance metric and constructs a version space that compactly represents ""approximately correct"" programs. Then, given a ""close enough"" program sampled from this version space, our approach uses a distance-guided repair algorithm to find a program that exactly matches the given input-output examples. We have implemented our proposed metric program synthesis technique in a tool called SyMetric and evaluate it in three different domains considered in prior work. Our evaluation shows that SyMetric outperforms other domain-agnostic synthesizers that use observational equivalence and that it achieves results competitive with domain-specific synthesizers that are either designed for or trained on those domains. 
    more » « less