skip to main content


Title: Metric Program Synthesis
We present a new domain-agnostic synthesis technique for generating programs from input-output examples. Our method, called metric program synthesis, relaxes the well-known observational equivalence idea (used widely in bottom-up enumerative synthesis) into a weaker notion of observational similarity, with the goal of reducing the search space that the synthesizer needs to explore. Our method clusters programs into equivalence classes based on a distance metric and constructs a version space that compactly represents ""approximately correct"" programs. Then, given a ""close enough"" program sampled from this version space, our approach uses a distance-guided repair algorithm to find a program that exactly matches the given input-output examples. We have implemented our proposed metric program synthesis technique in a tool called SyMetric and evaluate it in three different domains considered in prior work. Our evaluation shows that SyMetric outperforms other domain-agnostic synthesizers that use observational equivalence and that it achieves results competitive with domain-specific synthesizers that are either designed for or trained on those domains.  more » « less
Award ID(s):
1918839
NSF-PAR ID:
10404351
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
POPL 2022
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present a new general-purpose synthesis technique for generating programs from input-output examples. Our method, called metric program synthesis, relaxes the observational equivalence idea (used widely in bottom-up enumerative synthesis) into a weaker notion of observational similarity, with the goal of reducing the search space that the synthesizer needs to explore. Our method clusters programs into equivalence classes based on an expert-provided distance metric and constructs a version space that compactly represents “approximately correct” programs. Then, given a “close enough” program sampled from this version space, our approach uses a distance-guided repair algorithm to find a program that exactly matches the given input-output examples. We have implemented our proposed metric program synthesis technique in a tool called SyMetric and evaluate it in three different domains considered in prior work. Our evaluation shows that SyMetric outperforms other domain-agnostic synthesizers that use observational equivalence and that it achieves results competitive with domain-specific synthesizers that are either designed for or trained on those domains.

     
    more » « less
  2. While input-output examples are a natural form of specification for program synthesis engines, they can be imprecise for domains such as table transformations. In this paper, we investigate how extracting readily-available information about the user intent behind these input-output examples helps speed up synthesis and reduce overfitting. We present Gauss, a synthesis algorithm for table transformations that accepts partial input-output examples, along with user intent graphs. Gauss includes a novel conflict-resolution reasoning algorithm over graphs that enables it to learn from mistakes made during the search and use that knowledge to explore the space of programs even faster. It also ensures the final program is consistent with the user intent specification, reducing overfitting. We implement Gauss for the domain of table transformations (supporting Pandas and R), and compare it to three state-of-the-art synthesizers accepting only input-output examples. We find that it is able to reduce the search space by 56×, 73× and 664× on average, resulting in 7×, 26× and 7× speedups in synthesis times on average, respectively. 
    more » « less
  3. We propose a novel trace-guided approach to tackle the challenges of ambiguity and generalization in synthesis of recursive functional programs from input-output examples. Our approach augments the search space of programs with recursion traces consisting of recursive subcalls of the programs. Our method is based on a new version space algebra (VSA) for succinct representation and efficient manipulation of pairs of recursion traces and programs that are consistent with each other. We have implemented this approach in a tool called SyRup and evaluated it on benchmarks from prior work. Our evaluation demonstrates that SyRup not only requires fewer examples to achieve a certain success rate than existing synthesizers, but is also less sensitive to the quality of the examples.

     
    more » « less
  4. Securing applications on untrusted platforms can involve protection against legitimate end-users who act in the role of malicious reverse engineers and hackers. Such adversaries have access to the full execution environment of programs, whether the program comes in the form of software or hardware. In this paper, we consider the nature of obfuscating algorithms that perform iterative, step-wise transformation of programs into more complex forms that are intended to increase the complexity (time, resources) for malicious reverse engineers. We consider simple Boolean logic programs as the domain of interest and examine a specific transformation technique known as iterative sub-circuit selection and replacement (ISR), which represents a practical, syntactic approach for obfuscation. Specifically, we focus on improving the security of ISR by maximizing the flexibility and potential security of the replacement step of the algorithm which can be formulated in the following question: given a selection of Boolean logic gates (i.e., a sub-circuit), how can we produce a semantically equivalent (polymorphic) version of the sub-circuit such that the distribution of potential replacements represents a random, uniform distribution from the set of all possible replacements. This practical question is related to the theoretic study of indistinguishability obfuscation, where a transformer for a class of circuits guarantees that given any two semantically equivalent circuits from the class, the distribution of variants from their obfuscation are computationally indistinguishable. Ideally, polymorphic circuits that follow a random, uniform distribution provide stronger protection against malicious analyzers that target identification of distinct patterns as a basis for deobfuscation and simplification. In this paper, we introduce a novel approach for polymorphic circuit replacement called random Boolean logic expansion (RBLE), which applies Boolean logic laws (of reduction) in reverse. We compare this approach against another proposed method of polymorphic replacement that relies on static circuit libraries. As a contribution, we show the strengths and weaknesses of each approach, examine initial results from empirical studies to estimate the uniformity of polymorphic distributions, and provide the argument for how such algorithms can be readily applied in software contexts. RBLE provides a unique method to generate polymorphic variants of arbitrary input, output, and gate size. We report initial findings for studying variants produced by this method and, from empirical evaluation, show that RBLE has promise for generating distributions of unique, uniform circuits when size is unconstrained, but for targeted size distributions, the approach requires some adjustment in order to reach potential circuit variants. 
    more » « less
  5. While computer-aided design is a major part of many modern manufacturing pipelines, the design files typically generated describe raw geometry. Lost in this representation is the procedure by which these designs were generated. In this paper, we present a method for reverse-engineering the process by which 3D models may have been generated, in the language of constructive solid geometry (CSG). Observing that CSG is a formal grammar, we formulate this inverse CSG problem as a program synthesis problem. Our solution is an algorithm that couples geometric processing with state-of-the-art program synthesis techniques. In this scheme, geometric processing is used to convert the mixed discrete and continuous domain of CSG trees to a pure discrete domain where modern program synthesizers excel. We demonstrate the efficiency and scalability of our algorithm on several different examples, including those with over 100 primitive parts. We show that our algorithm is able to find simple programs which are close to the ground truth, and demonstrate our method's applicability in mesh re-editing. Finally, we compare our method to prior state-of-the-art. We demonstrate that our algorithm dominates previous methods in terms of resulting CSG compactness and runtime, and can handle far more complex input meshes than any previous method. 
    more » « less