skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Convergence of datalog over (Pre-) Semirings
Recursive queries have been traditionally studied in the framework of datalog, a language that restricts recursion to monotone queries over sets, which is guaranteed to converge in polynomial time in the size of the input. But modern big data systems require recursive computations beyond the Boolean space. In this article, we study the convergence of datalog when it is interpreted over an arbitrary semiring. We consider an ordered semiring, define the semantics of a datalog program as a least fixpoint in this semiring, and study the number of steps required to reach that fixpoint, if ever. We identify algebraic properties of the semiring that correspond to certain convergence properties of datalog programs. Finally, we describe a class of ordered semirings on which one can use the semi-naïve evaluation algorithm on any datalog program.  more » « less
Award ID(s):
2314527
PAR ID:
10518782
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
Journal of the ACM
Volume:
71
Issue:
2
ISSN:
0004-5411
Page Range / eLocation ID:
1 to 55
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Cormode, Graham; Shekelyan, Michael (Ed.)
    Datalog^∘ is an extension of Datalog, where instead of a program being a collection of union of conjunctive queries over the standard Boolean semiring, a program may now be a collection of sum-product queries over an arbitrary commutative partially ordered pre-semiring. Datalog^∘ is more powerful than Datalog in that its additional algebraic structure alows for supporting recursion with aggregation. At the same time, Datalog^∘ retains the syntactic and semantic simplicity of Datalog: Datalog^∘ has declarative least fixpoint semantics. The least fixpoint can be found via the naïve evaluation algorithm that repeatedly applies the immediate consequence operator until no further change is possible. It was shown in [Mahmoud Abo Khamis et al., 2022] that, when the underlying semiring is p-stable, then the naïve evaluation of any Datalog^∘ program over the semiring converges in a finite number of steps. However, the upper bounds on the rate of convergence were exponential in the number n of ground IDB atoms. This paper establishes polynomial upper bounds on the convergence rate of the naïve algorithm on linear Datalog^∘ programs, which is quite common in practice. In particular, the main result of this paper is that the convergence rate of linear Datalog^∘ programs under any p-stable semiring is O(pn³). Furthermore, we show a matching lower bound by constructing a p-stable semiring and a linear Datalog^∘ program that requires Ω(pn³) iterations for the naïve iteration algorithm to converge. Next, we study the convergence rate in terms of the number of elements in the semiring for linear Datalog^∘ programs. When L is the number of elements, the convergence rate is bounded by O(pn log L). This significantly improves the convergence rate for small L. We show a nearly matching lower bound as well. 
    more » « less
  2. Datalogois an extension of Datalog that allows for aggregation and recursion over an arbitrary commutative semiring. Like Datalog, Datalogo programs can be evaluated via the natural iterative algorithm until a fixed point is reached. However unlike Datalog, the natural iterative evaluation of some Datalogo programs over some semirings may not converge. It is known that the commutative semirings for which the iterative evaluation of Datalogo programs is guaranteed to converge are exactly those semirings that are stable. Previously, the best known upper bound on the number of iterations until convergence over p-stable semirings is ∑i=1 ^n (p+2)i= Θ(pn) steps, where n is (essentially) the output size. We establish that, in fact, the natural iterative evaluation of a Datalogo program over a p-stable semiring converges within a polynomial number of iterations. In particular our upper bound is O(σ p n2( n2lg Λ + lg σ)) where σ is the number of elements in the semiring present in either the input databases or the Datalogo program, and λ is the maximum number of terms in any product in the Datalogo program. 
    more » « less
  3. Datalog is a declarative programming language that has gained popularity in various domains due to its simplicity, expressiveness, and efficiency. But pure Datalog is limited to monotone queries, and cannot be used in most practical applications. For that reason, newer systems are relaxing the language by allowing non-monotone queries to be freely combined with recursion. But by departing from the elegant fixpoint semantics of pure datalog, these systems often result in inefficient query execution, for example they perform redundant computations, or use redundant storage. In this paper, we propose Temporel, a system that allows recursion to be freely combined with non-monotone operators. Temporel optimizes the program by compiling it into a novel intermediate representation that we call TempoDL. Our experimental results show that our system outperforms a state-of-the-art Datalog engine as well as a vectorized and a compiled in-memory database system for a wide range of applications from machine learning to graph processing. 
    more » « less
  4. null (Ed.)
    This article reviews the development of a family of Datalog-like languages with procedural, forward chaining semantics, providing an alternative to the classical declarative, model-theoretic semantics. These languages also provide a unified formalism that can express important classes of queries including fixpoint, while, and all computable queries. They can also incorporate in a natural fashion updates and nondeterminism, features widely used in data-driven reactive systems such as active databases, production systems, and data-driven workflows. 
    more » « less
  5. Modern Datalog engines (e.g., LogicBlox, Soufflé, ddlog) enable their users to write declarative queries which com- pute recursive deductions over extensional facts, leaving high-performance operationalization (query planning, semi- naïve evaluation, and parallelization) to the engine. Such engines form the backbone of modern high-throughput ap- plications in static analysis, network monitoring, and social- media mining. In this paper, we present a methodology for implementing a modern in-memory Datalog engine on data center GPUs, allowing us to achieve significant (up to 45×) gains compared to Soufflé (a modern CPU-based en- gine) on context-sensitive points-to analysis of PostgreSQL. We present GPUlog, a Datalog engine backend that imple- ments iterated relational algebra kernels over a novel range- indexed data structure we call the hash-indexed sorted ar- ray (HISA). HISA combines the algorithmic benefits of in- cremental range-indexed relations with the raw computa- tion throughput of operations over dense data structures. Our experiments show that GPUlog is significantly faster than CPU-based Datalog engines while achieving a favorable memory footprint compared to contemporary GPU-based joins. 
    more » « less