NSF PAR Search | NSF Public Access Repository


Efficient Algorithms for Cardinality Estimation and Conjunctive Query Evaluation With Simple Degree Constraints

https://doi.org/10.1145/3725233

Im, Sungjin; Moseley, Benjamin; Ngo, Hung; Pruhs, Kirk (June 2025, Proceedings of the ACM on Management of Data)

Cardinality estimation and conjunctive query evaluation are two of the most fundamental problems in database query processing. Recent work proposed, studied, and implemented a robust and practical information-theoretic cardinality estimation framework. In this framework, the estimator is the cardinality upper bound of a conjunctive query subject to "degree constraints", which model a rich set of input data statistics. For general degree constraints, computing this bound is computationally hard. Researchers have naturally sought efficiently computable relaxed upper bounds that are as tight as possible. The polymatroid bound is the tightest among those relaxed upper bounds. While it is an open question whether the polymatroid bound can be computed in polynomial time in general, it is known to be computable in polynomial time for some classes of degree constraints. Our focus is on a common class of degree constraints called simple degree constraints. Researchers had not previously determined how to compute the polymatroid bound in polynomial time for this class of constraints. Our first main result is a polynomial-time algorithm to compute the polymatroid bound given simple degree constraints. Our second main result is a polynomial-time algorithm to compute a "proof sequence" establishing this bound. This proof sequence can then be incorporated in the PANDA framework to give a faster algorithm to evaluate a conjunctive query. In addition, we show computational limitations to extending our results to broader classes of degree constraints. Finally, our technique leads naturally to a new relaxed upper bound called the flow bound, which is computationally tractable.
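
To make the idea of a cardinality upper bound concrete, the sketch below covers only the simplest special case, not the algorithm of the paper: the triangle query with relation sizes as the only statistics, where the bound reduces to the well-known AGM bound. The function names and the toy instance are illustrative assumptions.

```python
import math
from itertools import product

def agm_triangle_bound(r_size, s_size, t_size):
    """AGM bound for Q(a,b,c) :- R(a,b), S(b,c), T(a,c), using relation sizes only.

    The fractional edge cover polytope of the triangle has four vertices:
    (1/2,1/2,1/2), (1,1,0), (1,0,1), (0,1,1). The bound is the minimum of
    |R|^x * |S|^y * |T|^z over those vertices.
    """
    return min(
        math.sqrt(r_size * s_size * t_size),
        r_size * s_size,
        r_size * t_size,
        s_size * t_size,
    )

def triangle_count(R, S, T):
    """Brute-force output size of the triangle query (for checking the bound)."""
    return sum(1 for (a, b) in R for (b2, c) in S if b == b2 and (a, c) in T)

# Toy instance (hypothetical): every pair over a 3-value domain in each relation.
dom = range(3)
R = set(product(dom, dom))
S = set(product(dom, dom))
T = set(product(dom, dom))

bound = agm_triangle_bound(len(R), len(S), len(T))
actual = triangle_count(R, S, T)
print(f"actual output size = {actual}, upper bound = {bound:.1f}")
```

On this instance the bound is tight. Degree constraints generalize the size statistics used here, which is where the polymatroid bound discussed in the abstract comes in.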
Full Text Available
Online Scheduling via Gradient Descent for Weighted Flow Time Minimization

https://doi.org/10.1137/1.9781611978322.128

Chen, Qingyun; Im, Sungjin; Petety, Aditya (January 2025, Society for Industrial and Applied Mathematics (SODA))

Full Text Available
Binary Search with Distributional Predictions

Dinitz, Michael; Im, Sungjin; Lavastida, Thomas; Moseley, Benjamin; Niaparast, Aidin; Vassilvitskii, Sergei (December 2024, Open Review (NeurIPS))

Full Text Available
Strategic Facility Location via Predictions

Chen, Qingyun; Im, Sungjin; Gravin, Nick (December 2024, WINE 2024: Conference on Web and Internet Economics)

Full Text Available
Polynomial Time Convergence of the Iterative Evaluation of Datalog^∘ Programs

https://doi.org/10.1145/3695839

Im, Sungjin; Moseley, Benjamin; Ngo, Hung Q; Pruhs, Kirk (November 2024, Proceedings of the ACM on Management of Data)

Datalog^∘ is an extension of Datalog that allows for aggregation and recursion over an arbitrary commutative semiring. Like Datalog, Datalog^∘ programs can be evaluated via the natural iterative algorithm until a fixed point is reached. However, unlike Datalog, the natural iterative evaluation of some Datalog^∘ programs over some semirings may not converge. It is known that the commutative semirings for which the iterative evaluation of Datalog^∘ programs is guaranteed to converge are exactly those semirings that are stable. Previously, the best known upper bound on the number of iterations until convergence over p-stable semirings was ∑ᵢ₌₁ⁿ (p+2)ⁱ = Θ(pⁿ) steps, where n is (essentially) the output size. We establish that, in fact, the natural iterative evaluation of a Datalog^∘ program over a p-stable semiring converges within a polynomial number of iterations. In particular, our upper bound is O(σ p n² (n² lg λ + lg σ)), where σ is the number of elements in the semiring present in either the input databases or the Datalog^∘ program, and λ is the maximum number of terms in any product in the Datalog^∘ program.
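
The abstract does not define stability. The sketch below assumes the usual definition from the prior work it builds on: an element u is p-stable if 1 ⊕ u ⊕ ... ⊕ u^p equals 1 ⊕ u ⊕ ... ⊕ u^(p+1). The helper name `is_p_stable` and the two example semirings are illustrative choices, not taken from the paper.

```python
def is_p_stable(u, p, add, mul, one):
    """Check whether u^(p) == u^(p+1), where u^(k) = 1 + u + u^2 + ... + u^k.

    `add`, `mul`, and `one` describe the semiring (assumed definition, see above).
    """
    total, power = one, one
    for _ in range(p):
        power = mul(power, u)
        total = add(total, power)
    next_total = add(total, mul(power, u))
    return total == next_total

# Tropical semiring over non-negative numbers: "plus" is min, "times" is +, one is 0.
tropical_ok = is_p_stable(5, 0, add=min, mul=lambda a, b: a + b, one=0)

# Natural numbers with ordinary + and *: the partial sums 1 + u + u^2 + ...
# keep growing for u = 1, so u is not p-stable for the p tried here.
naturals_ok = is_p_stable(1, 3, add=lambda a, b: a + b, mul=lambda a, b: a * b, one=1)

print(tropical_ok, naturals_ok)
```

Under this definition, every non-negative element of the tropical semiring is 0-stable, which is consistent with shortest-path style recursion reaching a fixed point.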
Full Text Available
Online Load and Graph Balancing for Random Order Inputs

https://doi.org/10.1145/3626183.3659983

Im, Sungjin; Kumar, Ravi; Li, Shi; Petety, Aditya; Purohit, Manish (June 2024, ACM)

Full Text Available
Data Exchange Markets via Utility Balancing

https://doi.org/10.1145/3589334.3645364

Bhaskara, Aditya; Gollapudi, Sreenivas; Im, Sungjin; Kollias, Kostas; Munagala, Kamesh; Sankar, Govind S (May 2024, ACM)

Full Text Available
Controlling Tail Risk in Online Ski-Rental

https://doi.org/10.1137/1.9781611977912.147

Dinitz, Michael; Im, Sungjin; Lavastida, Thomas; Moseley, Benjamin; Vassilvitskii, Sergei (January 2024, ACM-SIAM Symposium on Discrete Algorithms)

Full Text Available
On the Convergence Rate of Linear Datalog^∘ over Stable Semirings

https://doi.org/10.4230/LIPIcs.ICDT.2024.11

Im, Sungjin; Moseley, Benjamin; Ngo, Hung; Pruhs, Kirk (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Cormode, Graham; Shekelyan, Michael (Ed.)
Datalog^∘ is an extension of Datalog, where instead of a program being a collection of unions of conjunctive queries over the standard Boolean semiring, a program may now be a collection of sum-product queries over an arbitrary commutative partially ordered pre-semiring. Datalog^∘ is more powerful than Datalog in that its additional algebraic structure allows for supporting recursion with aggregation. At the same time, Datalog^∘ retains the syntactic and semantic simplicity of Datalog: Datalog^∘ has declarative least fixpoint semantics. The least fixpoint can be found via the naïve evaluation algorithm that repeatedly applies the immediate consequence operator until no further change is possible. It was shown in [Mahmoud Abo Khamis et al., 2022] that, when the underlying semiring is p-stable, the naïve evaluation of any Datalog^∘ program over the semiring converges in a finite number of steps. However, the upper bounds on the rate of convergence were exponential in the number n of ground IDB atoms. This paper establishes polynomial upper bounds on the convergence rate of the naïve algorithm on linear Datalog^∘ programs, which are quite common in practice. In particular, the main result of this paper is that the convergence rate of linear Datalog^∘ programs under any p-stable semiring is O(pn³). Furthermore, we show a matching lower bound by constructing a p-stable semiring and a linear Datalog^∘ program that requires Ω(pn³) iterations for the naïve iteration algorithm to converge. Next, we study the convergence rate in terms of the number of elements in the semiring for linear Datalog^∘ programs. When L is the number of elements, the convergence rate is bounded by O(pn log L). This significantly improves the convergence rate for small L. We show a nearly matching lower bound as well.
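
The naïve evaluation loop can be sketched on a familiar linear program: single-source shortest paths over the tropical (min, +) semiring, written as D(y) :- [y = s] ⊕ ⊕ₓ D(x) ⊗ E(x, y). This is a minimal illustration under assumed names (`naive_eval_shortest_paths`, the toy graph), not code from the paper.

```python
import math

def naive_eval_shortest_paths(nodes, edges, source):
    """Naive evaluation of D(y) :- [y = source] (+) (+)_x D(x) (x) E(x, y)
    over the tropical semiring, where (+) is min and (x) is ordinary addition.

    Returns the fixed point and the number of applications of the
    immediate consequence operator, including the final confirming one.
    """
    dist = {v: math.inf for v in nodes}  # semiring zero for every ground atom
    iterations = 0
    while True:
        new = {}
        for y in nodes:
            best = 0.0 if y == source else math.inf
            for (x, y2), w in edges.items():
                if y2 == y:
                    best = min(best, dist[x] + w)
            new[y] = best
        iterations += 1
        if new == dist:  # no further change: least fixpoint reached
            return dist, iterations
        dist = new

# Toy graph (hypothetical): a -> b (1), b -> c (2), a -> c (5).
nodes = ["a", "b", "c"]
edges = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 5}
dist, iterations = naive_eval_shortest_paths(nodes, edges, "a")
print(dist, iterations)
```

The program is linear because each rule body mentions the recursive predicate D at most once, which is the class the O(pn³) bound above addresses.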
Full Text Available
Non-clairvoyant Scheduling with Predictions

https://doi.org/10.1145/3593969

Im, Sungjin; Kumar, Ravi; Qaem, Mahshid Montazer; Purohit, Manish (December 2023, ACM Transactions on Parallel Computing)

In the single-machine non-clairvoyant scheduling problem, the goal is to minimize the total completion time of jobs whose processing times are unknown a priori. We revisit this well-studied problem and consider the question of how to effectively use (possibly erroneous) predictions of the processing times. We study this question from ground zero by first asking what constitutes a good prediction; we then propose a new measure to gauge prediction quality and design scheduling algorithms with strong guarantees under this measure. Our approach to derive a prediction error measure based on natural desiderata could find applications for other online problems.
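
As a rough illustration of the setting, and not the algorithm or error measure proposed in the paper, the sketch below compares three baselines on one machine: the clairvoyant optimum (shortest job first on true times), shortest-predicted-job-first, and idealized round-robin, which needs no predictions. The function names and the job data are made up for the example.

```python
def total_completion_time_in_order(true_times, order):
    """Run jobs to completion one at a time in the given order."""
    clock, total = 0.0, 0.0
    for j in order:
        clock += true_times[j]
        total += clock
    return total

def round_robin_total(true_times):
    """Idealized round-robin: all unfinished jobs share the machine equally."""
    remaining = sorted(true_times)
    n = len(remaining)
    clock, total, done = 0.0, 0.0, 0.0
    for i, p in enumerate(remaining):
        # n - i jobs are still alive while the i-th shortest job finishes.
        clock += (p - done) * (n - i)
        done = p
        total += clock
    return total

true_times = [1, 2, 4]          # unknown to a non-clairvoyant scheduler
predicted = [1.5, 1.0, 5.0]     # hypothetical, partly wrong predictions

by_truth = sorted(range(3), key=lambda j: true_times[j])
by_prediction = sorted(range(3), key=lambda j: predicted[j])

opt = total_completion_time_in_order(true_times, by_truth)
pred = total_completion_time_in_order(true_times, by_prediction)
rr = round_robin_total(true_times)
print(opt, pred, rr)
```

Here the mispredicted order costs slightly more than the optimum, while round-robin pays more but stays within a factor of two of it without using any predictions.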
Full Text Available
