We present a novel symbolic reasoning engine for SQL which can efficiently generate an input I for n queries P1, ⋯, Pn, such that their outputs on I satisfy a given property (expressed in SMT). This is useful in different contexts, such as disproving equivalence of two SQL queries and disambiguating a set of queries. Our first idea is to reason about an under-approximation of each Pi, that is, a subset of Pi's input-output behaviors. While this makes our approach both semantics-aware and lightweight, this idea alone is incomplete, as a fixed under-approximation might miss some behaviors of interest. Therefore, our second idea is to perform search over an expressive family of under-approximations (which collectively cover all program behaviors of interest), thereby making our approach complete. We have implemented these ideas in a tool, Polygon, and evaluated it on over 30,000 benchmarks across two tasks (namely, SQL equivalence refutation and query disambiguation). Our evaluation results show that Polygon significantly outperforms all prior techniques.
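The high-level idea can be sketched in a few lines: model queries as functions over small candidate inputs, and search over a family of bounded input spaces (each a finite under-approximation of the queries' behaviors) until one input distinguishes the queries. This is a simplified illustration, not Polygon's actual symbolic encoding; the single-column tables, toy queries, and enumeration strategy are assumptions for the sketch.

```python
# Sketch of the search idea (not Polygon's actual algorithm): enumerate
# increasingly expressive under-approximations of the input space --
# here, single-column tables of growing size over a small value domain --
# until we find an input on which two queries disagree.
from itertools import product

def q1(table):  # SELECT x FROM t WHERE x > 0
    return [x for x in table if x > 0]

def q2(table):  # SELECT x FROM t WHERE x >= 0
    return [x for x in table if x >= 0]

def refute_equivalence(qa, qb, domain=range(-2, 3), max_rows=3):
    # Each (size, domain) pair is one under-approximation: a finite
    # subset of the queries' input-output behaviors.
    for size in range(1, max_rows + 1):
        for rows in product(domain, repeat=size):
            if qa(list(rows)) != qb(list(rows)):
                return list(rows)  # counterexample input I
    return None  # equivalence not refuted within this family

witness = refute_equivalence(q1, q2)
print("counterexample table:", witness)  # the row 0 distinguishes q1 and q2
```

Searching a whole family of under-approximations rather than a single fixed one is what makes the approach complete: if any input of interest distinguishes the queries, some member of the family contains it.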
DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
Analyzing unstructured data has been a persistent challenge in data processing. Recent proposals offer declarative frameworks for LLM-powered processing of unstructured data, but they typically execute user-specified operations as-is in a single LLM call, focusing on cost rather than accuracy. This is problematic for complex tasks, where even well-prompted LLMs can miss relevant information. For instance, reliably extracting all instances of a specific clause from legal documents often requires decomposing the task, the data, or both. We present DocETL, a system that optimizes complex document processing pipelines while accounting for LLM shortcomings. DocETL offers a declarative interface for users to define such pipelines and uses an agent-based approach to automatically optimize them, leveraging novel agent-based rewrites (which we call rewrite directives), as well as an optimization and evaluation framework. We introduce (i) logical rewriting of pipelines, tailored for LLM-based tasks, (ii) an agent-guided plan evaluation mechanism, and (iii) an optimization algorithm that efficiently finds promising plans, considering the latencies of LLM execution. Across four real-world document processing tasks, DocETL improves accuracy by 21–80% over strong baselines. DocETL is open-source at docetl.org and, as of March 2025, has over 1.7k GitHub stars across diverse domains.
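One data-decomposition rewrite of the kind described can be sketched as a split/map/reduce plan: instead of asking a single LLM call to extract every clause from a long document, split the document into chunks, extract per chunk, and merge. This is a hypothetical illustration in the spirit of a rewrite directive, not DocETL's API; the chunking scheme and the keyword-match stand-in for the LLM call are assumptions.

```python
# Sketch of a task-decomposition rewrite (hypothetical, not DocETL's
# actual directives): a single extraction op is rewritten into a
# split / map / reduce plan over document chunks.
def split(document, chunk_size=500):
    # Fixed-size chunks; a real rewrite would overlap chunks so that
    # clauses straddling a boundary are not missed.
    return [document[i:i + chunk_size]
            for i in range(0, len(document), chunk_size)]

def extract_clauses(chunk):
    # Stand-in for a per-chunk LLM extraction call.
    return [line for line in chunk.splitlines() if "indemnify" in line]

def reduce_results(per_chunk):
    merged = []
    for clauses in per_chunk:
        merged.extend(clauses)
    return merged

def rewritten_pipeline(document):
    # original single-call operation -> decomposed plan
    return reduce_results([extract_clauses(c) for c in split(document)])
```

The point of the rewrite is that each LLM call now sees a small, focused input, which is exactly where single-call pipelines tend to drop instances.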
- Award ID(s):
- 2129008
- PAR ID:
- 10675167
- Publisher / Repository:
- VLDB
- Date Published:
- Journal Name:
- Proceedings of the VLDB Endowment
- Volume:
- 18
- Issue:
- 9
- ISSN:
- 2150-8097
- Page Range / eLocation ID:
- 3035 to 3048
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
We consider the problem of automatic parallelism in high-performance, tensor-based systems. Our focus is on intra-operator parallelism for inference tasks on a single GPU server or CPU cluster, where each operator is automatically broken up so that it runs on multiple devices. We assert that tensor-based systems should offer a programming abstraction based on an extended Einstein summation notation, which is a fully declarative, mathematical specification for tensor computations. We show that any computation specified in the Einstein summation notation can be re-written into an equivalent tensor-relational computation that facilitates intra-operator parallelism, and this re-write generalizes existing notions of tensor parallelism such as data parallel and model parallel. We consider the algorithmic problem of optimally computing a tensor-relational decomposition of a graph of operations specified in our extended Einstein summation notation.
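The data-parallel case of such a decomposition is easy to see on the einsum C[i,j] = Σ_k A[i,k]·B[k,j]: sharding A by rows across devices and concatenating the partial results reproduces the full computation. The sketch below is a toy pure-Python illustration of this equivalence, not the paper's tensor-relational optimizer; the two-device row sharding is an assumption.

```python
# Sketch: the einsum C[i,j] = sum_k A[i,k] * B[k,j], decomposed for
# intra-operator (data-parallel) execution. A tensor-relational
# optimizer would choose among such decompositions automatically.
def einsum_ik_kj(A, B):
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k))
             for j in range(m)] for i in range(n)]

def data_parallel(A, B, devices=2):
    # Shard A by rows across "devices"; B is replicated.
    shard = (len(A) + devices - 1) // devices
    shards = [A[d * shard:(d + 1) * shard] for d in range(devices)]
    partial = [einsum_ik_kj(s, B) for s in shards if s]  # per-device work
    return [row for part in partial for row in part]     # gather rows

A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8], [9, 10]]
assert data_parallel(A, B) == einsum_ik_kj(A, B)
```

Model parallelism corresponds to sharding along a different index of the same einsum (columns of B, or the contracted index k with a final sum), which is why a single relational re-write generalizes both notions.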
-
Large language models (LLMs) often incorporate multiple text chunks in their inputs to provide the necessary contexts. To speed up the prefill of long LLM inputs, one can pre-compute the KV cache of a text and re-use the KV cache when the text is reused as the prefix of another LLM input. However, the reused text chunks are not always the input prefix, which makes precomputed KV caches not directly usable, since they ignore the text's cross-attention with the preceding texts. Thus, the benefits of reusing KV caches remain largely unrealized. This paper tackles just one challenge: when an LLM input contains multiple text chunks, how can we quickly combine their precomputed KV caches in order to achieve the same generation quality as the expensive full prefill (i.e., without reusing KV caches)? This challenge naturally arises in retrieval-augmented generation (RAG), where the input is supplemented with multiple retrieved texts as the context. We present CacheBlend, a scheme that reuses the pre-computed KV caches, regardless of whether they form the prefix or not, and selectively recomputes the KV values of a small subset of tokens to partially update each reused KV cache. In the meantime, the small extra delay for recomputing some tokens can be pipelined with the retrieval of KV caches within the same job, allowing CacheBlend to store KV caches on slower devices with more storage capacity while retrieving them without increasing the inference delay. By comparing CacheBlend with the state-of-the-art KV cache reusing schemes on six open-source LLMs of various sizes and five popular benchmark datasets of different tasks, we show that CacheBlend reduces time-to-first-token (TTFT) by 2.2–3.3x and increases the inference throughput by 2.8–5x relative to full KV recompute, without compromising generation quality. The code is available at https://github.com/LMCache/LMCache.
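The selective-recompute step can be pictured with a toy numeric model: given reused per-chunk KV values and the values a full prefill would produce, correct only the small fraction of tokens that deviate most. This is purely illustrative; in reality the full-prefill values are exactly what one avoids computing, and CacheBlend identifies which tokens to recompute without an oracle, so the ranking below is an assumption of the sketch.

```python
# Toy sketch of selective KV recompute (illustrative, not CacheBlend's
# actual algorithm). Each token has one scalar standing in for its KV
# entry: `precomputed` was cached as if the chunk were a prefix, while
# `full_prefill` reflects cross-attention with preceding chunks.
def selective_recompute(precomputed, full_prefill, r=0.2):
    n = len(precomputed)
    budget = max(1, int(n * r))  # recompute at most an r-fraction
    # Rank tokens by how far the reused value is from the true one.
    # (Oracle ranking for illustration; the real system estimates this.)
    deviations = sorted(range(n),
                        key=lambda i: abs(precomputed[i] - full_prefill[i]),
                        reverse=True)
    blended = list(precomputed)
    for i in deviations[:budget]:
        blended[i] = full_prefill[i]  # splice in the recomputed value
    return blended

reused = [1.0, 2.0, 3.0, 4.0, 5.0]
true_kv = [1.0, 2.1, 9.0, 4.0, 5.0]  # token 2 badly misses cross-attention
print(selective_recompute(reused, true_kv))
```

Because only a small token subset is touched, the recompute delay is small enough to overlap with fetching the next chunk's cache from storage, which is what lets the caches live on slower, larger devices.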
-
Transfer learning is an effective technique for tuning a deep learning model when training data or computational resources are limited. Instead of training a new model from scratch, the parameters of an existing base model are adjusted for the new task. The accuracy of such a fine-tuned model depends on the suitability of the base model chosen. Model search automates the selection of such a base model by evaluating the suitability of candidate models for a specific task. This entails inference with each candidate model on task-specific data. With thousands of models available through model stores, the computational cost of model search is a major bottleneck for efficient transfer learning. In this work, we present Alsatian, a novel model search system. Based on the observation that many candidate models overlap to a significant extent and following a careful bottleneck analysis, we propose optimization techniques that are applicable to many model search frameworks. These optimizations include: (i) splitting models into individual blocks that can be shared across models, (ii) caching of intermediate inference results and model blocks, and (iii) selecting a beneficial search order for models to maximize sharing of cached results. In our evaluation on state-of-the-art deep learning models from computer vision and natural language processing, we show that Alsatian outperforms baselines by up to 14x.
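The interplay of the three optimizations can be sketched with models as sequences of block IDs: cache the intermediate result after every block prefix, and visit models in an order that puts long shared prefixes next to each other. This is a hypothetical simplification (blocks as opaque IDs, sorting as the search order), not Alsatian's actual system design.

```python
# Sketch of prefix sharing in model search (hypothetical simplification):
# a model is a list of block IDs; inference results after any shared
# block prefix are cached and reused across candidate models.
def search_with_cache(models, run_block, data):
    cache = {}    # block-prefix tuple -> intermediate activation
    executed = 0  # block executions actually performed
    for model in sorted(models):  # sorted order groups shared prefixes
        # Find the longest already-cached prefix of this model.
        start, x = 0, data
        for depth in range(len(model), 0, -1):
            key = tuple(model[:depth])
            if key in cache:
                start, x = depth, cache[key]
                break
        # Run only the remaining blocks, caching each new prefix.
        for depth in range(start, len(model)):
            x = run_block(model[depth], x)
            executed += 1
            cache[tuple(model[:depth + 1])] = x
        # x is now this candidate's output on the task data
    return executed

models = [["a", "b", "c"], ["a", "b", "d"], ["a", "e"]]
run_block = lambda block, x: x + block  # stand-in for a layer forward pass
print(search_with_cache(models, run_block, ""))  # 5 executions, not 8
```

Without sharing, the three candidates would cost 3 + 3 + 2 = 8 block executions; with prefix caching, the second model reuses ("a", "b") and the third reuses ("a",), so only 5 blocks run.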
-
Probabilistic programming languages (PPLs) are an expressive means for creating and reasoning about probabilistic models. Unfortunately, hybrid probabilistic programs that involve both continuous and discrete structures are not well supported by today's PPLs. In this paper we develop a new approximate inference algorithm for hybrid probabilistic programs that first discretizes the continuous distributions and then performs discrete inference on the resulting program. The key novelty is a form of discretization that we call bit blasting, which uses a binary representation of numbers such that a domain of discretized points can be succinctly represented as a discrete probabilistic program over polynomially many Boolean random variables. Surprisingly, we prove that many common continuous distributions can be bit blasted in a manner that incurs no loss of accuracy over an explicit discretization and supports efficient probabilistic inference. We have built a probabilistic programming system for hybrid programs called HyBit, which employs bit blasting followed by discrete probabilistic inference. We empirically demonstrate the benefits of our approach over existing sampling-based and symbolic inference approaches.
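The core observation behind bit blasting can be checked on the simplest case: a uniform distribution discretized over 2^k points is induced exactly by k independent fair Boolean random variables read as the bits of an index. The sketch below verifies this equality by brute-force enumeration; it is a correctness check of the claim for uniforms only, not HyBit's succinct symbolic representation (which avoids the exponential enumeration done here), and the bit-to-value encoding is an assumption of the sketch.

```python
# Sketch of the bit-blasting idea for a uniform distribution: k fair
# Boolean random variables, interpreted as the binary digits of an
# index, induce exactly the uniform distribution over 2^k points.
from fractions import Fraction
from itertools import product

def bit_blasted_uniform(k):
    # Each bit b_i is an independent Bernoulli(1/2) random variable;
    # the encoded value is sum_i b_i * 2^i.
    dist = {}
    for bits in product([0, 1], repeat=k):
        value = sum(b << i for i, b in enumerate(bits))
        dist[value] = dist.get(value, Fraction(0)) + Fraction(1, 2) ** k
    return dist

def explicit_uniform(k):
    # Explicit discretization: one probability per point.
    return {v: Fraction(1, 2 ** k) for v in range(2 ** k)}

assert bit_blasted_uniform(3) == explicit_uniform(3)
```

The payoff is representational: the explicit discretization stores 2^k probabilities, while the bit-blasted program needs only k Boolean variables, which discrete inference engines can handle symbolically.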

