skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Automating Incremental and Asynchronous Evaluation for Recursive Aggregate Data Processing
In database and large-scale data analytics, recursive aggregate processing plays an important role, which is generally implemented under a framework of incremental compuping and executed synchronously and/or asynchronously. We identify three barriers in existing recursive aggregate data processing. First, the processing scope is largely limited to monotonic programs. Second, checking on conditions for monotonicity and correctness for async processing is sophisticated and manually done. Third, execution engines may be suboptimal due to separation of sync and async execution.In this paper, we lay an analytical foundation for conditions to check if a recursive aggregate program that is mono-tonic or even non-monotonic can be executed incrementally and asynchronously with its correct result. We design and implement a condition verification tool that can automatically check if a given program satisfies the conditions. We further propose a unified sync-async engine to execute these pro-grams for high performance. To integrate all these effective methods together, we have developed a distributed Datalog system, called PowerLog. Our evaluation shows that PowerLog can outperform three representative Datalog systems on both monotonic and non-monotonic recursive programs.  more » « less
Award ID(s):
2005884
PAR ID:
10227060
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
Proceedings of ACM SIGMOD Conference on Management of Data
Page Range / eLocation ID:
2439-2454
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In database and large-scale data analytics, recursive aggregate processing plays an important role, which is generally implemented under a framework of incremental computing and executed synchronously and/or asynchronously. We identify three barriers in existing recursive aggregate data processing. First, the processing scope is largely limited to monotonic programs. Second, checking on conditions for monotonicity and correctness for async processing is sophisticated and manually done. Third, execution engines may be suboptimal due to separation of sync and async execution. In this paper, we lay an analytical foundation for conditions to check if a recursive aggregate program that is monotonic or even non-monotonic can be executed incrementally and asynchronously with its correct result. We design and implement a condition verification tool that can automatically check if a given program satisfies the conditions. We further propose a unified sync-async engine to execute these programs for high performance. To integrate all these effective methods together, we have developed a distributed Datalog system, called PowerLog. Our evaluation shows that PowerLog can outperform three representative Datalog systems on both monotonic and non-monotonic recursive programs. 
    more » « less
  2. Recursive queries have been traditionally studied in the framework of datalog, a language that restricts recursion to monotone queries over sets, which is guaranteed to converge in polynomial time in the size of the input. But modern big data systems require recursive computations beyond the Boolean space. In this article, we study the convergence of datalog when it is interpreted over an arbitrary semiring. We consider an ordered semiring, define the semantics of a datalog program as a least fixpoint in this semiring, and study the number of steps required to reach that fixpoint, if ever. We identify algebraic properties of the semiring that correspond to certain convergence properties of datalog programs. Finally, we describe a class of ordered semirings on which one can use the semi-naïve evaluation algorithm on any datalog program. 
    more » « less
  3. In general, the performance of parallel graph processing is determined by three pairs of critical parameters, namely synchronous or asynchronous execution mode (Sync or Async), Push or Pull communication mechanism (Push or Pull), and Data-driven or Topology-driven traversing scheme (DD or TD), which increases the complexity and sophistication of programming and system implementation of GPU. Existing graph-processing frameworks mainly use a single combination in the entire execution for a given application, but we have observed their variable and suboptimal performance. In this paper, we present SEP-Graph, a highly efficient software framework for graph-processing on GPU. The hybrid execution mode is automatically switched among three pairs of parameters, with an objective to achieve the shortest execution time in each iteration. We also apply a set of optimizations to SEP-Graph, considering the characteristics of graph algorithms and underlying GPU architectures. We show the effectiveness of SEP-Graph based on our intensive and comparative performance evaluation on NVIDIA 1080, P100, and V100 GPUs. Compared with existing and representative GPU graph-processing framework Groute and Gunrock, SEP-Graph can reduce execution time up to 45.8 times and 39.4 times. 
    more » « less
  4. Cormode, Graham; Shekelyan, Michael (Ed.)
    Datalog^∘ is an extension of Datalog, where instead of a program being a collection of union of conjunctive queries over the standard Boolean semiring, a program may now be a collection of sum-product queries over an arbitrary commutative partially ordered pre-semiring. Datalog^∘ is more powerful than Datalog in that its additional algebraic structure alows for supporting recursion with aggregation. At the same time, Datalog^∘ retains the syntactic and semantic simplicity of Datalog: Datalog^∘ has declarative least fixpoint semantics. The least fixpoint can be found via the naïve evaluation algorithm that repeatedly applies the immediate consequence operator until no further change is possible. It was shown in [Mahmoud Abo Khamis et al., 2022] that, when the underlying semiring is p-stable, then the naïve evaluation of any Datalog^∘ program over the semiring converges in a finite number of steps. However, the upper bounds on the rate of convergence were exponential in the number n of ground IDB atoms. This paper establishes polynomial upper bounds on the convergence rate of the naïve algorithm on linear Datalog^∘ programs, which is quite common in practice. In particular, the main result of this paper is that the convergence rate of linear Datalog^∘ programs under any p-stable semiring is O(pn³). Furthermore, we show a matching lower bound by constructing a p-stable semiring and a linear Datalog^∘ program that requires Ω(pn³) iterations for the naïve iteration algorithm to converge. Next, we study the convergence rate in terms of the number of elements in the semiring for linear Datalog^∘ programs. When L is the number of elements, the convergence rate is bounded by O(pn log L). This significantly improves the convergence rate for small L. We show a nearly matching lower bound as well. 
    more » « less
  5. Abstract A hyperproperty relates executions of a program and is used to formalize security objectives such as confidentiality, non-interference, privacy, and anonymity. Formally, a hyperproperty is a collection of allowable sets of executions. A program violates a hyperproperty if the set of its executions is not in the collection specified by the hyperproperty. The logicHyperCTL*has been proposed in the literature to formally specify and verify hyperproperties. The problem of checking whether a finite-state program satisfies aHyperCTL*formula is known to be decidable. However, the problem turns out to be undecidable for procedural (recursive) programs. Surprisingly, we show that decidability can be restored if we consider restricted classes of hyperproperties, namely those that relate only those executions of a program which have the same call-stack access pattern. We call such hyperproperties,stack-aware hyperproperties.Our decision procedure can be used as a proof method for establishing security objectives such as noninference for recursive programs, and also for refuting security objectives such as observational determinism. Further, if the call stack size is observable to the attacker, the decision procedure provides exact verification. 
    more » « less