Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
The goal of active learning for program synthesis is to synthesize the desired program by asking targeted questions that minimize user interaction. While prior work has explored active learning in the purely symbolic setting, such techniques are inadequate for the increasingly popular paradigm of neurosymbolic program synthesis, where the synthesized program incorporates neural components. When applied to the neurosymbolic setting, such techniques can -- and, in practice, do -- return an unintended program due to mispredictions of neural components. This paper proposes a new active learning technique that can handle the unique challenges posed by neural network mispredictions. Our approach is based upon a new evaluation strategy called constrained conformal evaluation (CCE), which accounts for neural mispredictions while taking into account user-provided feedback. Our proposed method iteratively makes CCE more precise until all remaining programs are guaranteed to be observationally equivalent. We have implemented this method in a tool called SmartLabel and experimentally evaluated it on three neurosymbolic domains. Our results demonstrate that SmartLabel identifies the ground truth program for 98% of the benchmarks, requiring under 5 rounds of user interaction on average. In contrast, prior techniques for active learning are only able to converge to the ground truth program for at most 65% of the benchmarks.more » « less
-
Many data processing systems allow SQL queries that call user-defined functions (UDFs) written in conventional programming languages. While such SQL extensions provide convenience and flexibility to users, queries involving UDFs are not as efficient as their pure SQL counterparts that invoke SQL’s highly-optimized built-in functions. Motivated by this problem, we propose a new technique for translating SQL queries with UDFs to pure SQL expressions. Unlike prior work in this space, our method is not based on syntactic rewrite rules and can handle a much more general class of UDFs. At a high-level, our method is based on counterexample-guided inductive synthesis (CEGIS) but employs a novel compositional strategy that decomposes the synthesis task into simpler sub-problems. However, because there is no universal decomposition strategy that works for all UDFs, we propose a novel lazy inductive synthesis approach that generates a sequence of decompositions that correspond to increasingly harder inductive synthesis problems. Because most realistic UDF-to-SQL translation tasks are amenable to a fine-grained decomposition strategy, our lazy inductive synthesis method scales significantly better than traditional CEGIS. We have implemented our proposed technique in a tool called CLIS for optimizing Spark SQL programs containing Scala UDFs. To evaluate CLIS, we manually study 100 randomly selected UDFs and find that 63 of them can be expressed in pure SQL. Our evaluation on these 63 UDFs shows that CLIS can automatically synthesize equivalent SQL expressions in 92% of the cases and that it can solve 2.4× more benchmarks compared to a baseline that does not use our compositional approach. We also show that CLIS yields an average speed-up of 3.5× for individual UDFs and 1.3× to 3.1× in terms of end-to-end application performance.more » « less
An official website of the United States government

Full Text Available