Recursive queries have been traditionally studied in the framework of datalog, a language that restricts recursion to monotone queries over sets, which is guaranteed to converge in polynomial time in the size of the input. But modern big data systems require recursive computations beyond the Boolean space. In this article, we study the convergence of datalog when it is interpreted over an arbitrary semiring. We consider an ordered semiring, define the semantics of a datalog program as a least fixpoint in this semiring, and study the number of steps required to reach that fixpoint, if ever. We identify algebraic properties of the semiring that correspond to certain convergence properties of datalog programs. Finally, we describe a class of ordered semirings on which one can use the semi-naïve evaluation algorithm on any datalog program.
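As a rough illustration of what "datalog interpreted over a semiring" can mean, the sketch below runs naive (not semi-naïve) fixpoint iteration for the transitive-closure program T(x,y) :- E(x,y) and T(x,y) :- T(x,z), E(z,y), reading disjunction as the semiring's addition and conjunction as its multiplication. The names `Semiring`, `naive_fixpoint`, and the three-node example graph are assumptions made for this illustration and are not taken from the article.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """Hypothetical container for a semiring (zero, one, add, mul)."""
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]


# Boolean semiring: classic datalog (reachability).
BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
# Tropical semiring (min, +): the same program computes shortest paths.
TROPICAL = Semiring(math.inf, 0.0, min, lambda a, b: a + b)


def naive_fixpoint(edges, nodes, sr, max_steps=1000):
    """Iterate T(x,y) = E(x,y) + sum_z T(x,z) * E(z,y), starting from T = zero.

    Returns (T, steps), where steps is the number of rule applications
    needed to reach the fixpoint, or (None, max_steps) if no fixpoint
    was reached within max_steps (the iteration need not converge in
    every semiring).
    """
    t = {(x, y): sr.zero for x in nodes for y in nodes}
    for step in range(1, max_steps + 1):
        new_t = {}
        for x in nodes:
            for y in nodes:
                val = edges.get((x, y), sr.zero)
                for z in nodes:
                    ezy = edges.get((z, y), sr.zero)
                    val = sr.add(val, sr.mul(t[(x, z)], ezy))
                new_t[(x, y)] = val
        if new_t == t:
            # The step-th application changed nothing, so the fixpoint
            # was already reached after step - 1 applications.
            return t, step - 1
        t = new_t
    return None, max_steps


nodes = ["a", "b", "c"]
weighted = {("a", "b"): 1.0, ("b", "c"): 2.0, ("a", "c"): 5.0}
unweighted = {edge: True for edge in weighted}

dist, tropical_steps = naive_fixpoint(weighted, nodes, TROPICAL)
reach, boolean_steps = naive_fixpoint(unweighted, nodes, BOOLEAN)
```

On this small graph the same rules need a different number of iterations depending on the semiring, which is the kind of quantity the abstract refers to as the number of steps required to reach the fixpoint. The `max_steps` guard is there because, in some semirings, the iteration may never stabilize.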
This content will become publicly available on January 1, 2025
 NSFPAR ID:
 10519093
 Editor(s):
 Cormode, Graham; Shekelyan, Michael
 Publisher / Repository:
 Schloss Dagstuhl – Leibniz-Zentrum für Informatik
 Date Published:
 Volume:
 290
 ISSN:
 1868-8969
 ISBN:
 978-3-95977-312-6
 Page Range / eLocation ID:
 290290
 Subject(s) / Keyword(s):
 Datalog; convergence rate; semiring; Theory of computation → Database query languages (principles)
 Format(s):
 Medium: X Size: 20 pages; 843415 bytes Other: application/pdf
 Size(s):
 20 pages 843415 bytes
 Right(s):
 Creative Commons Attribution 4.0 International license; info:eu-repo/semantics/openAccess
 Sponsoring Org:
 National Science Foundation
More Like this


We study the classic set cover problem from the perspective of sublinear algorithms. Given access to a collection of m sets over n elements in the query model, we show that sublinear algorithms derived from existing techniques have almost tight query complexities. On one hand, we first show an adaptation of the streaming algorithm presented in [17] to the sublinear query model that returns an α-approximate cover using Õ(m(n/k)^{1/(α–1)} + nk) queries to the input, where k denotes the value of a minimum set cover. We then complement this upper bound by proving that for lower values of k, the required number of queries is , even for estimating the optimal cover size. Moreover, we prove that even checking whether a given collection of sets covers all the elements would require Ω(nk) queries. These two lower bounds provide strong evidence that the upper bound is almost tight for certain values of the parameter k. On the other hand, we show that this bound is not optimal for larger values of the parameter k, as there exists a (1 + ε)-approximation algorithm with Õ(mn/(kε^2)) queries. We show that this bound is essentially tight for sufficiently small constant ε, by establishing a lower bound of query complexity. Our lower-bound results follow by carefully designing two distributions of instances that are hard to distinguish. In particular, our first lower bound involves a probabilistic construction of a certain set system with a minimum set cover of size αk, with the key property that a small number of “almost uniformly distributed” modifications can reduce the minimum set cover size down to k. Thus, these modifications are not detectable unless a large number of queries are asked. We believe that our probabilistic construction technique might find applications to lower bounds for other combinatorial optimization problems.

Datalog is a declarative programming language that has gained popularity in various domains due to its simplicity, expressiveness, and efficiency. But pure Datalog is limited to monotone queries, and cannot be used in most practical applications. For that reason, newer systems are relaxing the language by allowing non-monotone queries to be freely combined with recursion. But by departing from the elegant fixpoint semantics of pure Datalog, these systems often result in inefficient query execution; for example, they perform redundant computations or use redundant storage. In this paper, we propose Temporel, a system that allows recursion to be freely combined with non-monotone operators. Temporel optimizes the program by compiling it into a novel intermediate representation that we call TempoDL. Our experimental results show that our system outperforms a state-of-the-art Datalog engine as well as a vectorized and a compiled in-memory database system for a wide range of applications from machine learning to graph processing.

The theory of integral quadratic constraints (IQCs) allows the certification of exponential convergence of interconnected systems containing nonlinear or uncertain elements. In this work, we adapt the IQC theory to study first-order methods for smooth and strongly monotone games and show how to design tailored quadratic constraints to get tight upper bounds of convergence rates. Using this framework, we recover the existing bound for the gradient method (GD), derive sharper bounds for the proximal point method (PPM) and optimistic gradient method (OG), and provide for the first time a global convergence rate for the negative momentum method (NM) with an iteration complexity O(κ^{1.5}), which matches its known lower bound. In addition, for time-varying systems, we prove that the gradient method with optimal step size achieves the fastest provable worst-case convergence rate with quadratic Lyapunov functions. Finally, we further extend our analysis to stochastic games and study the impact of multiplicative noise on different algorithms. We show that it is impossible for an algorithm with one step of memory to achieve acceleration if it only queries the gradient once per batch (in contrast with the stochastic strongly convex optimization setting, where such acceleration has been demonstrated). However, we exhibit an algorithm which achieves acceleration with two gradient queries per batch.

Ta-Shma, Amnon (Ed.) In a recent work, Gryaznov, Pudlák and Talebanfard (CCC '22) introduced a linear variant of read-once branching programs, with motivations from circuit and proof complexity. Such a read-once linear branching program is a branching program where each node is allowed to make 𝔽₂-linear queries, and is read-once in the sense that the queries on each path are linearly independent. As their main result, they constructed an explicit function with average-case complexity 2^{n/3 - o(n)} against a slightly restricted model, which they call strongly read-once linear branching programs. The main tool in their lower bound result is a new type of extractor, called directional affine extractors, that they introduced. Our main result is an explicit function with 2^{n - o(n)} average-case complexity against the strongly read-once linear branching program model, which is almost optimal. This result is based on a new connection from this problem to sumset extractors, which is a randomness extractor model introduced by Chattopadhyay and Li (STOC '16) as a generalization of many other well-studied models including two-source extractors, affine extractors and small-space extractors. With this new connection, our lower bound naturally follows from a recent construction of sumset extractors by Chattopadhyay and Liao (STOC '22). In addition, we show that directional affine extractors imply sumset extractors in a restricted setting. We observe that such restricted sumset sources are enough to derive lower bounds, and obtain an arguably more modular proof of the lower bound by Gryaznov, Pudlák and Talebanfard. We also initiate a study of pseudorandomness against linear branching programs. Our main result here is a hitting set generator construction against regular linear branching programs with constant width. We derive this result based on a connection to Kakeya sets over finite fields.