skip to main content

This content will become publicly available on October 20, 2022

Title: UDF to SQL translation through compositional lazy inductive synthesis
Many data processing systems allow SQL queries that call user-defined functions (UDFs) written in conventional programming languages. While such SQL extensions provide convenience and flexibility to users, queries involving UDFs are not as efficient as their pure SQL counterparts that invoke SQL’s highly-optimized built-in functions. Motivated by this problem, we propose a new technique for translating SQL queries with UDFs to pure SQL expressions. Unlike prior work in this space, our method is not based on syntactic rewrite rules and can handle a much more general class of UDFs. At a high-level, our method is based on counterexample-guided inductive synthesis (CEGIS) but employs a novel compositional strategy that decomposes the synthesis task into simpler sub-problems. However, because there is no universal decomposition strategy that works for all UDFs, we propose a novel lazy inductive synthesis approach that generates a sequence of decompositions that correspond to increasingly harder inductive synthesis problems. Because most realistic UDF-to-SQL translation tasks are amenable to a fine-grained decomposition strategy, our lazy inductive synthesis method scales significantly better than traditional CEGIS. We have implemented our proposed technique in a tool called CLIS for optimizing Spark SQL programs containing Scala UDFs. To evaluate CLIS, we manually study 100 more » randomly selected UDFs and find that 63 of them can be expressed in pure SQL. Our evaluation on these 63 UDFs shows that CLIS can automatically synthesize equivalent SQL expressions in 92% of the cases and that it can solve 2.4× more benchmarks compared to a baseline that does not use our compositional approach. We also show that CLIS yields an average speed-up of 3.5× for individual UDFs and 1.3× to 3.1× in terms of end-to-end application performance. « less
Authors:
; ; ;
Award ID(s):
1918889
Publication Date:
NSF-PAR ID:
10320423
Journal Name:
Proceedings of the ACM on Programming Languages
Volume:
5
Issue:
OOPSLA
ISSN:
2475-1421
Sponsoring Org:
National Science Foundation
More Like this
  1. Syntax-guided synthesis (SyGuS) aims to find a program satisfying semantic specification as well as user-provided structural hypotheses. There are two main synthesis approaches: enumerative synthesis, which repeatedly enumerates possible candidate programs and checks their correctness, and deductive synthesis, which leverages a symbolic procedure to construct implementations from specifications. Neither approach is strictly better than the other: automated deductive synthesis is usually very efficient but only works for special grammars or applications; enumerative synthesis is very generally applicable but limited in scalability. In this paper, we propose a cooperative synthesis technique for SyGuS problems with the conditional linear integer arithmetic (CLIA)more »background theory, as a novel integration of the two approaches, combining the best of the two worlds. The technique exploits several novel divide-and-conquer strategies to split a large synthesis problem to smaller subproblems. The subproblems are solved separately and their solutions are combined to form a final solution. The technique integrates two synthesis engines: a pure deductive component that can efficiently solve some problems, and a height-based enumeration algorithm that can handle arbitrary grammar. We implemented the cooperative synthesis technique, and evaluated it on a wide range of benchmarks. Experiments showed that our technique can solve many challenging synthesis problems not possible before, and tends to be more scalable than state-of-the-art synthesis algorithms.« less
  2. Enhancing battery energy storage capability and reducing the cost per average energy capacity is urgent to satisfy the increasing energy demand in modern society. The lithium-sulfur (Li-S) battery is especially attractive because of its high theoretical specific energy (around 2600 W h kg-1), low cost, and low toxicity.1 Despite these advantages, the practical utilization of lithium-sulfur (Li-S) batteries to date has been hindered by a series of obstacles, including low active material loading, shuttle effects, and sluggish sulfur conversion kinetics.2 The traditional 2D planer thick electrode is considered as a general approach to enhance the mass loading of the Li-Smore »battery.3 However, the longer diffusion length of lithium ions, which resulted in high tortuosity in the compact stacking thick electrode, decreases the penetration ability of the electrolyte into the entire cathode.4 Although an effort to induce catalysts in the cathode was made to promote sulfur conversion kinetic conditions, catalysts based on transition metals suffered from the low electronic conductivity, and some elements (i.e.: Co, Mn) may even absorb and restrict polysulfides for further reaction. 5 To mitigate the issues listed above, herein we propose a novel sulfur cathode design strategy enabled by additive manufacturing and oxidative chemical vapor deposition (oCVD). 6,7 Specifically, the cathode is designed to have a hierarchal hollow structure via a stereolithography technique to increase sulfur usage. Microchannels are constructed on the tailored sulfur cathode to further fortify the wettability of the electrolyte. The as-printed cathode is then sintered at 700 °C in an N2 atmosphere in order to generate a carbon skeleton (i.e.: carbonization of resin) with intrinsic carbon defects. The intrinsic carbon defects are expected to create favorable sulfur conversion conditions with sufficient electronic conductivity. In this study, the oCVD technique is leveraged to produce a conformal coating layer to eliminate shuttle effects. Identified by scanning electron microscopy and energy-dispersive X-ray spectroscopy mapping characterizations, the oCVD PEDOT is not only covered on the surface of the cathode but also on the inner surface of the microchannels. High-resolution x-ray photoelectron spectroscopy analyses (C 1s and S 2p orbitals) between pristine and modified samples demonstrate that a high concentration of the defects has been produced on the sulfur matrix after sintering and posttreatment. In-operando XRD diffractograms show that the Li2S is generated in the oCVD PEDOT-coated sample during the charge and discharge process even with a high current density, confirming an eminent sulfur conversion kinetic condition. In addition, ICP-OES results of lithium metal anode at different states of charge (SoC) verify that the shuttle effects are excellently restricted by oCVD PEDOT. Overall, the high mass loading (> 5 mg cm-2) with an elevated sulfur utilization ratio, accelerated reaction kinetics and stabilized electrochemical process have been achieved on the sulfur cathode by implementing this innovative cathode design strategy. The results of this study demonstrate significant promises of employing pure sulfur powder with high electrochemical performance and suggest a pathway to the higher energy and power density battery. References: 1 Chen, Y. Adv Mater 33, e2003666. 2 Bhargav, A. Joule 4, 285-291. 3 Liu, S. Nano Energy 63, 103894. 4 Chu, T. Carbon Energy 3. 5 Li, Y. Matter 4, 1142-1188. 6 John P. Lock. Macromolecules 39, 4 (2006). 7 Zekoll, S. Energy & Environmental Science 11, 185-201.« less
  3. Edge computing is an emerging paradigm whose goal is to boost with cloud resources available at the edge the computational capability of otherwise weak devices. This paradigm is mostly attractive to reduce user perceived latency. A central mechanism in edge computing is cyber-foraging, i.e., the search and delegation to capable edge cloud processes of tasks too complex, time consuming or resource intensive to be running on user devices or low-latency demanding to be running remotely, as a form of edge function. An edge function is any network or device-specific process that may be run on an edge process instead. Despitemore »the recent interest for this technology from industry and academia, cyber-foraging techniques and protocols have yet to be standardized. In this paper, we leverage decomposition theory to propose an architecture providing insights in the design and implementation of protocols for cyber-foraging of multiple edge functions. In contrast with several existing solutions, we argue that the (distributed) cyber-foraging orchestration should be policy-based and not an ad-hoc solution, i.e., either a pure edge cloud burden or a device decision. To this end, via simulations, we show how our approach can be used by edge computing providers and application programmers to compare and evaluate different alternative cyber-foraging solutions. Our decomposition-based approach has general applicability to other network utility maximization problems, even outside the edge computing domain.« less
  4. Relational database management systems (RDBMS) have limited iterative processing support. Recursive queries were added to ANSI SQL, however, their semantics do not allow aggregation functions, which disqualifies their use for several applications, such as PageRank and shortest path computations. Recently, another SQL extension, iterative Common Table Expressions (CTEs), is proposed to enable users to perform general iterative computations on RDBMSs.In this work 1 , we demonstrate how iterative CTEs can be efficiently incorporated into a production RDBMS without major intrusion to the system. We have prototyped our approach on Futurewei's MPPDB, a shared nothing relational parallel database engine. The implementationmore »is based on a functional rewrite that translates iterative CTEs to other existing SQL operators. Thus, query plans of iterative CTEs can be optimized and executed by the engine with minimal modification to the code base. We have also applied several optimizations specifically for iterative CTEs to i) minimize data movement, ii) reuse results that remain constant and iii) push down predicates to avoid unnecessary data processing. We verified our implementation through extensive experimental evaluation using real world datasets and queries. The results show the feasibility of the rewrite approach and the effectiveness of the optimizations, which improve performance by an order of magnitude in some cases.« less
  5. SUMMARY We introduce a new finite-element (FE) based computational framework to solve forward and inverse elastic deformation problems for earthquake faulting via the adjoint method. Based on two advanced computational libraries, FEniCS and hIPPYlib for the forward and inverse problems, respectively, this framework is flexible, transparent and easily extensible. We represent a fault discontinuity through a mixed FE elasticity formulation, which approximates the stress with higher order accuracy and exposes the prescribed slip explicitly in the variational form without using conventional split node and decomposition discrete approaches. This also allows the first order optimality condition, that is the vanishing ofmore »the gradient, to be expressed in continuous form, which leads to consistent discretizations of all field variables, including the slip. We show comparisons with the standard, pure displacement formulation and a model containing an in-plane mode II crack, whose slip is prescribed via the split node technique. We demonstrate the potential of this new computational framework by performing a linear coseismic slip inversion through adjoint-based optimization methods, without requiring computation of elastic Green’s functions. Specifically, we consider a penalized least squares formulation, which in a Bayesian setting—under the assumption of Gaussian noise and prior—reflects the negative log of the posterior distribution. The comparison of the inversion results with a standard, linear inverse theory approach based on Okada’s solutions shows analogous results. Preliminary uncertainties are estimated via eigenvalue analysis of the Hessian of the penalized least squares objective function. Our implementation is fully open-source and Jupyter notebooks to reproduce our results are provided. The extension to a fully Bayesian framework for detailed uncertainty quantification and non-linear inversions, including for heterogeneous media earthquake problems, will be analysed in a forthcoming paper.« less