In this paper, we present a new DBMS performance benchmark that cansimulateuser exploration with any specified dashboard design made of standard visualization and interaction components. The distinguishing feature of our SImulation-BAsed (or SIMBA) benchmark is its ability tomodel user analysis goalsas a set of SQL queries to be generated through a valid sequence of user interactions, as well asmeasure the completion of analysis goalsby testing for equivalence between the user's previous queries and their goal queries. In this way, the SIMBA benchmark can simulate how an analyst opportunistically searches for interesting insights at the beginning of an exploration session and eventually hones in on specific goals towards the end. To demonstrate the versatility of the SIMBA benchmark, we use it to test the performance of four DBMSs with six different dashboard specifications and compare our results with IDEBench. Our results show how goal-driven simulation can reveal gaps in DBMS performance missed by existing benchmarking methods and across a range of data exploration scenarios.
more »
« less
This content will become publicly available on June 10, 2026
Polygon: Symbolic Reasoning for SQL using Conflict-Driven Under-Approximation Search
We present a novel symbolic reasoning engine for SQL which can efficiently generate an inputIfornqueriesP1, ⋯,Pn, such that their outputs onIsatisfy a given property (expressed in SMT). This is useful in different contexts, such as disproving equivalence of two SQL queries and disambiguating a set of queries. Our first idea is to reason about an under-approximation of eachPi— that is, a subset ofPi’s input-output behaviors. While it makes our approach both semantics-aware and lightweight, this idea alone is incomplete (as a fixed under-approximation might miss some behaviors of interest). Therefore, our second idea is to perform search over an expressive family of under-approximations (which collectively cover all program behaviors of interest), thereby making our approach complete. We have implemented these ideas in a tool, Polygon, and evaluated it on over 30,000 benchmarks across two tasks (namely, SQL equivalence refutation and query disambiguation). Our evaluation results show that Polygon significantly outperforms all prior techniques.
more »
« less
- PAR ID:
- 10630290
- Publisher / Repository:
- Association for Computing Machinery
- Date Published:
- Journal Name:
- Proceedings of the ACM on Programming Languages
- Volume:
- 9
- Issue:
- PLDI
- ISSN:
- 2475-1421
- Page Range / eLocation ID:
- 1315 to 1340
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We propose a new synthesis algorithm that canefficientlysearch programs withlocalvariables (e.g., those introduced by lambdas). Prior bottom-up synthesis algorithms are not able to evaluate programs withfree local variables, and therefore cannot effectively reduce the search space of such programs (e.g., using standard observational equivalence reduction techniques), making synthesis slow. Our algorithm can reduce the space of programs with local variables. The key idea, dubbedlifted interpretation, is to lift up the program interpretation process, from evaluating one program at a time to simultaneously evaluating all programs from a grammar. Lifted interpretation provides a mechanism to systematically enumerate all binding contexts for local variables, thereby enabling us to evaluate and reduce the space of programs with local variables. Our ideas are instantiated in the domain of web automation. The resulting tool,Arborist, can automate a significantly broader range of challenging tasks more efficiently than state-of-the-art techniques includingWebRobotand Helena.more » « less
-
We present an enumerative program synthesis framework calledcomponent-based refactoringthat can refactor “direct” style code that does not use library components into equivalent “combinator” style code that does use library components. This framework introduces a sound but incomplete technique to check the equivalence of direct code and combinator code calledequivalence by canonicalizationthat does not rely on input-output examples or logical specifications. Moreover, our approach can repurpose existing compiler optimizations, leveraging decades of research from the programming languages community. We instantiated our new synthesis framework in two contexts: (i) higher-order functional combinators such asmapandfilterin the staticallytyped functional programming language Elm and (ii) high-performance numerical computing combinators provided by the NumPy library for Python. We implemented both instantiations in a tool calledCobblerand evaluated it on thousands of real programs to test the performance of the component-based refactoring framework in terms of execution time and output quality. Our work offers evidence that synthesis-backed refactoring can apply across a range of domains without specification beyond the input program.more » « less
-
This Letter presents a novel, to the best of our knowledge, method to calibrate multi-focus microscopic structured-light three-dimensional (3D) imaging systems with an electrically adjustable camera focal length. We first leverage the conventional method to calibrate the system with a reference focal lengthf0. Then we calibrate the system with other discrete focal lengthsfiby determining virtual features on a reconstructed white plane usingf0. Finally, we fit the polynomial function model using the discrete calibration results forfi. Experimental results demonstrate that our proposed method can calibrate the system consistently and accurately.more » « less
-
Lightweight syntactic analysis tools like Semgrep and Comby leverage the tree structure of code, making them more expressive than string and regex search. Unlike traditional language frameworks (e.g., ESLint) that analyze codebases via explicit syntax tree manipulations, these tools use query languages that closely resemble the source language. However, state-of-the-art matching techniques for these tools require queries to be complete and parsable snippets, which makes in-progress query specifications useless. We propose a new search architecture that relies only on tokenizing (not parsing) a query. We introduce a novel language and matching algorithm to support tree-aware wildcards on this architecture by building on tree automata. We also presentstsearch, a syntactic search tool leveraging our approach. In contrast to past work, our approach supports syntactic searcheven for previously unparsable queries.We show empirically that stsea rch can support all tokenizable queries, while still providing results comparable to Semgrep for existing queries. Our work offers evidence that lightweight syntactic code search can accept in-progress specifications, potentially improving support for interactive settings. CCS Concepts: •Software and its engineering→Formal language definitions;Software maintenance tools;•Information systems→Query representation;•Theory of computation→ Tree languages.more » « less
An official website of the United States government
