In this paper, we present a new DBMS performance benchmark that cansimulateuser exploration with any specified dashboard design made of standard visualization and interaction components. The distinguishing feature of our SImulation-BAsed (or SIMBA) benchmark is its ability tomodel user analysis goalsas a set of SQL queries to be generated through a valid sequence of user interactions, as well asmeasure the completion of analysis goalsby testing for equivalence between the user's previous queries and their goal queries. In this way, the SIMBA benchmark can simulate how an analyst opportunistically searches for interesting insights at the beginning of an exploration session and eventually hones in on specific goals towards the end. To demonstrate the versatility of the SIMBA benchmark, we use it to test the performance of four DBMSs with six different dashboard specifications and compare our results with IDEBench. Our results show how goal-driven simulation can reveal gaps in DBMS performance missed by existing benchmarking methods and across a range of data exploration scenarios.
more »
« less
Polygon: Symbolic Reasoning for SQL using Conflict-Driven Under-Approximation Search
We present a novel symbolic reasoning engine for SQL which can efficiently generate an inputIfornqueriesP1, ⋯,Pn, such that their outputs onIsatisfy a given property (expressed in SMT). This is useful in different contexts, such as disproving equivalence of two SQL queries and disambiguating a set of queries. Our first idea is to reason about an under-approximation of eachPi— that is, a subset ofPi’s input-output behaviors. While it makes our approach both semantics-aware and lightweight, this idea alone is incomplete (as a fixed under-approximation might miss some behaviors of interest). Therefore, our second idea is to perform search over an expressive family of under-approximations (which collectively cover all program behaviors of interest), thereby making our approach complete. We have implemented these ideas in a tool, Polygon, and evaluated it on over 30,000 benchmarks across two tasks (namely, SQL equivalence refutation and query disambiguation). Our evaluation results show that Polygon significantly outperforms all prior techniques.
more »
« less
- PAR ID:
- 10630290
- Publisher / Repository:
- Association for Computing Machinery
- Date Published:
- Journal Name:
- Proceedings of the ACM on Programming Languages
- Volume:
- 9
- Issue:
- PLDI
- ISSN:
- 2475-1421
- Page Range / eLocation ID:
- 1315 to 1340
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
This Letter presents a novel, to the best of our knowledge, method to calibrate multi-focus microscopic structured-light three-dimensional (3D) imaging systems with an electrically adjustable camera focal length. We first leverage the conventional method to calibrate the system with a reference focal lengthf0. Then we calibrate the system with other discrete focal lengthsfiby determining virtual features on a reconstructed white plane usingf0. Finally, we fit the polynomial function model using the discrete calibration results forfi. Experimental results demonstrate that our proposed method can calibrate the system consistently and accurately.more » « less
-
Nondeterminism introduced by race conditions and message reorderings makes parallel and distributed programming hard. Nevertheless, promising approaches such as LVars and CRDTs address this problem by introducing a partial order structure on shared state that describes how the state evolves over time.Monotoneprograms that respect the order are deterministic. Datalog-inspired languages incorporate this idea of monotonicity in a first-class way but they are not general-purpose. We would like parallel and distributed languages to be as natural to use as any functional language, without sacrificing expressivity, and with a formal basis of study as appealing as the lambda calculus. This paper presents λ∨, a core language for deterministic parallelism that embodies the ideas above. In λ∨, values may increase over time according to astreaming orderand all computations are monotone with respect to that order. The streaming order coincides with the approximation order found in Scott semantics and so unifies the foundations of functional programming with the foundations of deterministic distributed computation. The resulting lambda calculus has a computationally adequate model rooted in domain theory. It integrates the compositionality and power of abstraction characteristic of functional programming with the declarative nature of Datalog.more » « less
-
Understanding variations in the routes by which wild animals gain and lose water is challenging, and common methods require longitudinal sampling, which can be prohibitive. However, a new approach usesΔ′17OBW(Δ′17O of animal body water), calculated from measurements ofδ′17O andδ′18O in a single sample, as a natural tracer of water flux.Δ′17OBWis promising, but its relationship to organismal variables such as metabolic rate and water intake have not been validated. Here, we continuously measured oxygen influxes and effluxes of captive deer mice (Peromyscus maniculatus), and manipulated their water intake and metabolic rate. We used these oxygen flux data to predictΔ′17OBWfor the mice and compared these model predictions withΔ′17OBWmeasured in blood plasma samples. As expected,Δ′17OBWpositively correlated with drinking water intake and negatively correlated with metabolic rate. All predictedΔ′17OBW(based on measured oxygen fluxes) values differed from measuredΔ′17OBWvalues by <30 per meg (mean absolute difference: 11 ± 9 per meg), suggesting high accuracy for this modelling approach because studies currently report a range of 300 per meg forΔ′17OBWamong mammals, birds and fish.more » « less
-
This paper is about semantic regular expressions (SemREs). This is a concept that was recently proposed by Smore (Chen et al. 2023) in which classical regular expressions are extended with a primitive to query external oracles such as databases and large language models (LLMs). SemREs can be used to identify lines of text containing references to semantic concepts such as cities, celebrities, political entities, etc. The focus in their paper was on automatically synthesizing semantic regular expressions from positive and negative examples. In this paper, we study themembership testing problem. First, we present a two-pass NFA-based algorithm to determine whether a stringwmatches a SemRErinO(|r|2|w|2+ |r| |w|3) time, assuming the oracle responds to each query in unit time. In common situations, where oracle queries are not nested, we show that this procedure runs inO(|r|2|w|2) time. Experiments with a prototype implementation of this algorithm validate our theoretical analysis, and show that the procedure massively outperforms a dynamic programming-based baseline, and incurs a ≈ 2 × overhead over the time needed for interaction with the oracle. Second, we establish connections between SemRE membership testing and the triangle finding problem from graph theory, which suggest that developing algorithms which are simultaneously practical and asymptotically faster might be challenging. Furthermore, algorithms for classical regular expressions primarily aim to optimize their time and memory consumption. In contrast, an important consideration in our setting is to minimize the cost of invoking the oracle. We demonstrate an Ω(|w|2) lower bound on the number of oracle queries necessary to make this determination.more » « less
An official website of the United States government

