A key challenge in program synthesis is synthesizing programs that use libraries, which most real-world software does. The current state of the art is to model libraries with mock library implementations that perform the same function in a simpler way. However, mocks may still be large and complex, and must include many implementation details, both of which could limit synthesis performance. To address this problem, we introduce JLibSketch, a Java program synthesis tool that allows library behavior to be described with algebraic specifications, which are rewrite rules for sequences of method calls, e.g., encryption followed by decryption (with the same key) is the identity. JLibSketch implements rewrite rules by compiling JLibSketch problems into problems for the Sketch program synthesis tool. More specifically, after compilation, library calls are represented by abstract data types (ADTs), and rewrite rules manipulate those ADTs. We formalize compilation and prove it sound and complete if the rewrite rules are ordered and non-unifiable. We evaluated JLibSketch by using it to synthesize nine programs that use libraries from three domains: data structures, cryptography, and file systems. We found that algebraic specifications are, on average, about half the size of mocks. We also found that algebraic specifications perform better than mocks on seven of the nine programs, sometimes significantly so, and perform equally well on the last two programs. Thus, we believe that JLibSketch takes an important step toward synthesis of programs that use libraries.
more »
« less
Bootstrapping Library-Based Synthesis
Constraint-based program synthesis techniques have been widely used in numerous settings. However, synthesizing programs that use libraries remains a major challenge. To handle complex or black-box libraries, the state of the art is to provide carefully crafted mocks or models to the synthesizer, requiring extra manual work. We address this challenge by proposing TOSHOKAN, a new synthesis framework as an alternative approach in which library-using programs can be generated without any user-provided artifacts at the cost of moderate performance overhead. The framework extends the classic counterexample-guided synthesis framework with a bootstrapping, log-based library model. The model collects input-output samples from running failed candidate programs on witness inputs. We prove that the framework is sound when a sound, bounded verifier is available, and also complete if the underlying synthesizer and verifier promise to produce minimal outputs. We implement and incorporate the framework to JSKETCH, a Java sketching tool. Experiments show that TOSHOKAN can successfully synthesize programs that use a variety of libraries, ranging from mathematical functions to data structures. Comparing to state-of-the-art synthesis algorithms which use mocks or models, TOSHOKAN reduces up to 159 lines of code of required manual inputs, at the cost of less than 40s of performance overheads.
more »
« less
- PAR ID:
- 10403344
- Editor(s):
- Singh, Gagandeep; Urban, Caterina
- Date Published:
- Journal Name:
- Lecture notes in computer science
- Volume:
- 13790
- ISSN:
- 0302-9743
- Page Range / eLocation ID:
- 272 - 298
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Many problem domains, including program synthesis and rewrite-based optimization, require searching astronomically large spaces of programs. Existing approaches often rely on building specialized data structures—version-space algebras, finite tree automata, or e-graphs—to compactly represent such spaces. At their core, all these data structures exploit independence of subterms; as a result, they cannot efficiently represent more complex program spaces, where the choices of subterms are entangled. We introduceequality-constrained tree automata(ECTAs), a new data structure, designed to compactly represent large spaces of programs with entangled subterms. We present efficient algorithms for extracting programs from ECTAs, implemented in a performant Haskell library, ecta. Using the ecta library, we construct Hectare, a type-driven program synthesizer for Haskell. Hectare significantly outperforms a state-of-the-art synthesizer Hoogle+—providing an average speedup of 8×—despite its implementation being an order of magnitude smaller.more » « less
-
While reactive synthesis and syntax-guided synthesis (SyGuS) have seen enormous progress in recent years, combining the two approaches has remained a challenge. In this work, we present the synthesis of reactive programs from Temporal Stream Logic modulo theories (TSL-MT), a framework that unites the two approaches to synthesize a single program. In our approach, reactive synthesis and SyGuS collaborate in the synthesis process, and generate executable code that implements both reactive and data-level properties. We present a tool, temos, that combines state-of-the-art methods in reactive synthesis and SyGuS to synthesize programs from TSL-MT specifications. We demonstrate the applicability of our approach over a set of benchmarks, and present a deep case study on synthesizing a music keyboard synthesizer.more » « less
-
We present an enumerative program synthesis framework calledcomponent-based refactoringthat can refactor “direct” style code that does not use library components into equivalent “combinator” style code that does use library components. This framework introduces a sound but incomplete technique to check the equivalence of direct code and combinator code calledequivalence by canonicalizationthat does not rely on input-output examples or logical specifications. Moreover, our approach can repurpose existing compiler optimizations, leveraging decades of research from the programming languages community. We instantiated our new synthesis framework in two contexts: (i) higher-order functional combinators such asmapandfilterin the staticallytyped functional programming language Elm and (ii) high-performance numerical computing combinators provided by the NumPy library for Python. We implemented both instantiations in a tool calledCobblerand evaluated it on thousands of real programs to test the performance of the component-based refactoring framework in terms of execution time and output quality. Our work offers evidence that synthesis-backed refactoring can apply across a range of domains without specification beyond the input program.more » « less
-
While deep learning (DL) has permeated, and become an integral component of many critical software systems, today software engineering research hasn't explored how to separately test data and models that are integral for DL approaches to work effectively. The main challenge in independently testing these components arises from the tight dependency between data and models. This research explores this gap, introducing our methodology of mock deep testing for unit testing of DL applications. To enable unit testing, we introduce a design paradigm that decomposes the workflow into distinct, manageable components, minimizes sequential dependencies, and modularizes key stages of the DL, including data preparation and model design. For unit testing these components, we propose modeling their dependencies using mocks. In the context of DL, mocks refer to mock data and mock model that mimic the behavior of the original data and model, respectively. This modular approach facilitates independent development and testing of the components, ensuring comprehensive quality assurance throughout the development process. We have developed KUnit, a framework for enabling mock deep testing for the Keras library, a popular library for developing DL applications. We empirically evaluated KUnit to determine the effectiveness of mocks in independently testing data and models. Our assessment of 50 DL programs obtained from Stack Overflow and GitHub shows that mocks effectively identified 10 issues in the data preparation stage and 53 issues in the model design stage. We also conducted a user study with 36 participants using KUnit to perceive the effectiveness of our approach. Participants using KUnit successfully resolved 25 issues in the data preparation stage and 38 issues in the model design stage. Our findings highlight that mock objects provide a lightweight emulation of the dependencies for unit testing, facilitating early bug detection. Lastly, to evaluate the usability of KUnit, we conducted a post-study survey. The results reveal that KUnit is helpful to DL application developers, enabling them to independently test each component (data and model) and resolve issues effectively in different stages.more » « less
An official website of the United States government

