

Title: Automating Incremental and Asynchronous Evaluation for Recursive Aggregate Data Processing
In database and large-scale data analytics, recursive aggregate processing plays an important role, which is generally implemented under a framework of incremental computing and executed synchronously and/or asynchronously. We identify three barriers in existing recursive aggregate data processing. First, the processing scope is largely limited to monotonic programs. Second, checking the conditions for monotonicity and for the correctness of async processing is sophisticated and done manually. Third, execution engines may be suboptimal due to the separation of sync and async execution. In this paper, we lay an analytical foundation for conditions to check whether a recursive aggregate program, monotonic or even non-monotonic, can be executed incrementally and asynchronously with its correct result. We design and implement a condition verification tool that can automatically check if a given program satisfies the conditions. We further propose a unified sync-async engine to execute these programs for high performance. To integrate all these effective methods together, we have developed a distributed Datalog system, called PowerLog. Our evaluation shows that PowerLog can outperform three representative Datalog systems on both monotonic and non-monotonic recursive programs.
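The class of programs the abstract refers to can be illustrated with a classic recursive aggregate: single-source shortest paths, where a min aggregate is applied over a recursive rule. The sketch below is not PowerLog code; it is a minimal Python illustration, with made-up relation names edge and dist, of how such a program can be evaluated incrementally in a semi-naive style, propagating only improved (delta) facts.

```python
# Minimal sketch (not PowerLog syntax): incremental evaluation of a recursive
# min aggregate, informally the Datalog program
#   dist(s, 0).
#   dist(y, min<d + w>) :- dist(x, d), edge(x, y, w).
# Only improved ("delta") facts are propagated, so evaluation is incremental,
# and because min only ever improves values, the processing order is flexible.

from collections import defaultdict

def shortest_paths(edges, source):
    """edges: iterable of (x, y, w); returns dict node -> shortest distance."""
    graph = defaultdict(list)
    for x, y, w in edges:
        graph[x].append((y, w))

    dist = {source: 0}      # current aggregate value per node
    worklist = [source]     # nodes whose value just improved (the delta)

    while worklist:
        x = worklist.pop()  # the order here is deliberately unspecified
        for y, w in graph[x]:
            candidate = dist[x] + w
            if candidate < dist.get(y, float("inf")):
                dist[y] = candidate   # apply the min aggregate incrementally
                worklist.append(y)    # re-derive only from the changed fact
    return dist

if __name__ == "__main__":
    edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5)]
    print(shortest_paths(edges, "a"))   # {'a': 0, 'b': 1, 'c': 3}
```

The worklist order is left unspecified on purpose: for a monotonic aggregate such as min, any schedule reaches the same fixpoint, which is the kind of property the paper's conditions generalize and its verification tool checks automatically.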
Award ID(s):
2005884
NSF-PAR ID:
10227060
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
Proceedings of ACM SIGMOD Conference on Management of Data
Page Range / eLocation ID:
2439-2454
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In general, the performance of parallel graph processing is determined by three pairs of critical parameters, namely synchronous or asynchronous execution mode (Sync or Async), Push or Pull communication mechanism (Push or Pull), and Data-driven or Topology-driven traversing scheme (DD or TD), which increases the complexity and sophistication of programming and system implementation on GPUs. Existing graph-processing frameworks mainly use a single combination for the entire execution of a given application, but we have observed their variable and suboptimal performance. In this paper, we present SEP-Graph, a highly efficient software framework for graph processing on GPUs. The hybrid execution mode is automatically switched among the three pairs of parameters, with the objective of achieving the shortest execution time in each iteration. We also apply a set of optimizations to SEP-Graph, considering the characteristics of graph algorithms and underlying GPU architectures. We show the effectiveness of SEP-Graph based on our intensive and comparative performance evaluation on NVIDIA 1080, P100, and V100 GPUs. Compared with the existing and representative GPU graph-processing frameworks Groute and Gunrock, SEP-Graph can reduce execution time by up to 45.8 times and 39.4 times, respectively.
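A hedged illustration of the per-iteration switching idea in the abstract above: the driver below is not SEP-Graph code, and the threshold FRONTIER_RATIO is invented for illustration; it only shows the shape of choosing a Push or Pull step each iteration from frontier density, the kind of decision SEP-Graph automates on the GPU.

```python
# Illustrative sketch only (not SEP-Graph's implementation): a driver that
# picks a Push or Pull BFS step per iteration based on frontier density.
# FRONTIER_RATIO is a hypothetical threshold; the graph is undirected, so the
# adjacency list also serves as the in-neighbor list needed by Pull.

FRONTIER_RATIO = 0.05

def bfs_hybrid(adj, source):
    n = len(adj)
    level = [-1] * n
    level[source] = 0
    frontier = {source}
    depth = 0
    while frontier:
        dense = len(frontier) / n > FRONTIER_RATIO
        nxt = set()
        if dense:
            # Pull: each unvisited vertex scans its neighbors for a frontier parent.
            for v in range(n):
                if level[v] == -1 and any(u in frontier for u in adj[v]):
                    level[v] = depth + 1
                    nxt.add(v)
        else:
            # Push: frontier vertices push updates to their unvisited neighbors.
            for u in frontier:
                for v in adj[u]:
                    if level[v] == -1:
                        level[v] = depth + 1
                        nxt.add(v)
        frontier = nxt
        depth += 1
    return level

if __name__ == "__main__":
    adj = [[1, 2], [0, 3], [0, 3], [1, 2]]
    print(bfs_hybrid(adj, 0))  # [0, 1, 1, 2]
```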
  2. Concurrent programs are notoriously hard to write correctly, as scheduling nondeterminism introduces subtle errors that are both hard to detect and to reproduce. The most common concurrency errors are (data) races, which occur when memory-conflicting actions are executed concurrently. Consequently, considerable effort has been made towards developing efficient techniques for race detection. The most common approach is dynamic race prediction: given an observed, race-free trace σ of a concurrent program, the task is to decide whether events of σ can be correctly reordered to a trace σ* that witnesses a race hidden in σ. In this work we introduce the notion of sync(hronization)-preserving races. A sync-preserving race occurs in σ when there is a witness σ* in which synchronization operations (e.g., acquisition and release of locks) appear in the same order as in σ. This is a broad definition that strictly subsumes the famous notion of happens-before races. Our main results are as follows. First, we develop a sound and complete algorithm for predicting sync-preserving races. For moderate values of parameters like the number of threads, the algorithm runs in Õ(N) time and space, where N is the length of the trace σ. Second, we show that the problem has an Ω(N / log² N) space lower bound, and thus our algorithm is essentially time and space optimal. Third, we show that predicting races with even just a single reversal of two sync operations is NP-complete and even W[1]-hard when parameterized by the number of threads. Thus, sync-preservation characterizes exactly the tractability boundary of race prediction, and our algorithm is nearly optimal for the tractable side. Our experiments show that our algorithm is fast in practice, while sync-preservation characterizes races often missed by state-of-the-art methods.
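As an informal illustration of the definition in the abstract above (not the paper's prediction algorithm), a candidate witness is sync-preserving exactly when the synchronization operations it contains appear in the same relative order as in the observed trace. The Python check below uses a hypothetical (thread, op, target) event encoding and checks only that projection condition, not whether the witness is a correct reordering (program order, lock semantics, reads-from).

```python
# Minimal sketch of the *definition* of sync-preservation, not the paper's
# algorithm: the sync events present in the witness must occur in the same
# relative order as in the observed trace.

SYNC_OPS = {"acq", "rel"}

def sync_projection(trace):
    return [e for e in trace if e[1] in SYNC_OPS]

def is_subsequence(short, long):
    it = iter(long)
    return all(e in it for e in short)  # 'in' consumes the iterator up to each match

def is_sync_preserving(observed, witness):
    return is_subsequence(sync_projection(witness), sync_projection(observed))

# A write-write race on x that is hidden behind lock-induced happens-before
# order (an HB detector would miss it, since the lock orders t1 before t2):
observed = [
    ("t1", "w", "x"), ("t1", "acq", "l"), ("t1", "rel", "l"),
    ("t2", "acq", "l"), ("t2", "rel", "l"), ("t2", "w", "x"),
]
# A correct reordering that runs t2's critical section first and makes the two
# writes to x adjacent, without reversing any pair of sync operations:
witness = [
    ("t2", "acq", "l"), ("t2", "rel", "l"),
    ("t2", "w", "x"), ("t1", "w", "x"),
]
print(is_sync_preserving(observed, witness))  # True
```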
  3. Memory safety invariants extracted from a program can help defend against and detect both software and hardware memory violations. For instance, by allowing only specific instructions to access certain memory locations, a system can detect out-of-bound or illegal pointer dereferences that lead to correctness and security issues. In this paper, we propose CPU abstractions to specify and check program invariants, providing a defense mechanism against both software and hardware memory violations at runtime. The abstractions ensure that the invariants are satisfied at every memory access. We present a fast invariant address translation and retrieval scheme using a specialized cache. It stores and checks invariants related to global, stack, and heap objects. The invariant checks can be performed synchronously or asynchronously: synchronous checking is used for highly security-critical programs, while others are protected by asynchronous checking. A fast exception is proposed to report any violation as soon as possible in order to close the gap for transient attacks. Our evaluation shows that the design can detect both software and hardware, spatial and temporal memory violations, incurring a 53% overhead when checking synchronously, or 15% when checking asynchronously.
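The hardware mechanism summarized above can be approximated in software to make the idea concrete. The sketch below is a hypothetical, purely illustrative Python model, not the paper's CPU abstraction or its cache design: it records which code sites are allowed to touch which address ranges and checks each access either immediately (synchronous) or from a queued batch (asynchronous).

```python
# Purely illustrative software model of the idea (not the paper's hardware
# design): invariants map address ranges to the code sites allowed to access
# them; violations are reported synchronously (per access) or asynchronously
# (batched), trading detection latency for runtime overhead.

class InvariantChecker:
    def __init__(self, asynchronous=False):
        self.invariants = []       # list of (start, end, allowed_sites)
        self.asynchronous = asynchronous
        self.pending = []          # queued accesses for asynchronous checking

    def register(self, start, size, allowed_sites):
        self.invariants.append((start, start + size, set(allowed_sites)))

    def access(self, addr, site):
        if self.asynchronous:
            self.pending.append((addr, site))   # checked later, lower overhead
        else:
            self._check(addr, site)             # checked immediately

    def drain(self):
        for addr, site in self.pending:
            self._check(addr, site)
        self.pending.clear()

    def _check(self, addr, site):
        for start, end, allowed in self.invariants:
            if start <= addr < end and site not in allowed:
                raise RuntimeError(f"site {site} violated invariant at {hex(addr)}")

checker = InvariantChecker(asynchronous=False)
checker.register(0x1000, 0x100, allowed_sites={"init_buffer", "read_buffer"})
checker.access(0x1010, "read_buffer")      # allowed
try:
    checker.access(0x1020, "parse_input")  # out-of-policy access is flagged
except RuntimeError as err:
    print(err)
```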
  4. Writing concurrent programs is notoriously hard due to scheduling non-determinism. The most common concurrency bugs are data races, which are accesses to a shared resource that can be executed concurrently. Dynamic data-race prediction is the most standard technique for detecting data races: given an observed, data-race-free trace t, the task is to determine whether t can be reordered to a trace t* that exposes a data race. Although the problem has received significant practical attention for over three decades, its complexity has remained elusive. In this work, we address this lacuna, identifying sources of intractability and conditions under which the problem is efficiently solvable. Given a trace t of size n over k threads, our main results are as follows. First, we establish a general O(k · n^(2·(k-1))) upper bound, as well as an O(n^k) upper bound when certain parameters of t are constant. In addition, we show that the problem is NP-hard and even W[1]-hard parameterized by k, and thus unlikely to be fixed-parameter tractable. Second, we study the problem over acyclic communication topologies, such as server-client hierarchies. We establish an O(k² · d · n² · log n) upper bound, where d is the number of shared variables accessed in t. In addition, we show that even for traces with k = 2 threads, the problem has no O(n^(2-ε)) algorithm under the Orthogonal Vectors conjecture. Since any trace with 2 threads defines an acyclic topology, our upper bound for this case is optimal, up to polynomial improvements, for moderate values of k and d. Finally, motivated by existing heuristics, we study a distance-bounded version of the problem, where the task is to expose a data race by a witness trace that is similar to t. We develop an algorithm that works in O(n) time when certain parameters of t are constant.
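To make the prediction problem in the abstract above concrete, the brute-force Python sketch below enumerates the legal interleavings of a tiny trace and reports whether some interleaving puts two conflicting accesses next to each other. It is exponential and for illustration only, whereas the paper is about when such search can be avoided; the (thread, op, target) encoding is hypothetical, and reads-from consistency is ignored because the example trace contains only writes.

```python
# Exponential brute force, for illustration only: enumerate interleavings of a
# tiny trace that respect each thread's program order and lock semantics, and
# look for one in which two conflicting accesses become adjacent.

ACCESS_OPS = {"r", "w"}

def conflicting(e1, e2):
    # Different threads, same variable, at least one write.
    return (e1[0] != e2[0] and e1[2] == e2[2]
            and e1[1] in ACCESS_OPS and e2[1] in ACCESS_OPS
            and "w" in (e1[1], e2[1]))

def interleavings(pending, held, prefix, out):
    """pending: dict thread -> remaining events in program order."""
    if all(not rest for rest in pending.values()):
        out.append(list(prefix))
        return
    for t in list(pending):
        rest = pending[t]
        if not rest:
            continue
        e = rest[0]
        op, target = e[1], e[2]
        if op == "acq" and held.get(target) is not None:
            continue  # the lock is currently held: not a legal next step
        new_held = dict(held)
        if op == "acq":
            new_held[target] = t
        elif op == "rel":
            new_held[target] = None
        pending[t] = rest[1:]
        interleavings(pending, new_held, prefix + [e], out)
        pending[t] = rest  # backtrack

def has_predictable_race(trace):
    pending = {}
    for e in trace:
        pending.setdefault(e[0], []).append(e)
    reorderings = []
    interleavings(pending, {}, [], reorderings)
    return any(conflicting(w[i], w[i + 1])
               for w in reorderings for i in range(len(w) - 1))

trace = [
    ("t1", "w", "y"), ("t1", "acq", "m"), ("t1", "rel", "m"),
    ("t2", "acq", "m"), ("t2", "rel", "m"), ("t2", "w", "y"),
]
print(has_predictable_race(trace))  # True: some reordering exposes the race on y
```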