skip to main content


Search for: All records

Award ID contains: 1813173

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. null (Ed.)
    Concolic testing combines concrete execution with symbolic execution along the executed path to automatically generate new test inputs that exercise program paths and deliver high code coverage during testing. The GKLEE tool uses this approach to expose data races in CUDA programs written for execution of GPGPUs. In programs employing concurrent dynamic data structures, automatic generation of data structures with appropriate shapes that cause threads to follow selected, possibly divergent, paths is a challenge. Moreover, a single non-conflicting data structure must be generated for multiple threads, that is, a single shape must be found that simultaneously causes all threads to follow their respective chosen paths. When an execution exposes a bug (e.g., a data race), the generated data structure shape helps the programmer understand the cause of the bug. Because GKLEE does not permit pointers that construct dynamic data structures to be made symbolic, it cannot automatically generate data structures of different shapes and must rely on the user to write code that constructs them to exercise desired paths. We have developed DSGEN for automatically generating non-conflicting dynamic data structures with different shapes and integrated it with GKLEE to uncover and facilitate understanding of data races in programs that employ complex concurrent dynamic data structures. In comparison to GKLEE, DSGEN increases the number of races detected from 10 to 25 by automatically generating a total of 1,897 shapes in implementations of four complex concurrent dynamic data structures -- B-Tree, Hash-Array Mapped Trie, RRB-Tree, and Skip List. 
    more » « less
  2. null (Ed.)
    While much of the research on graph analytics over large power-law graphs has focused on developing algorithms for evaluating a single global graph query, in practice we may be faced with a stream of queries. We observe that, due to their global nature, vertex specific graph queries present an opportunity for sharing work across queries. To take advantage of this opportunity, we have developed the VRGQ framework that accelerates the evaluation of a stream of queries via coarsegrained value reuse. In particular, the results of queries for a small set of source vertices are reused to speedup all future queries. We present a two step algorithm that in its first step initializes the query result based upon value reuse and then in the second step iteratively evaluates the query to convergence. The reused results for a small number of queries are held in a reuse table. Our experiments with best reuse configurations on four power law graphs and thousands of graph queries of five kinds yielded average speedups of 143×, 13.2×, 6.89×, 1.43×, and 1.18×. 
    more » « less
  3. null (Ed.)
    For compute-intensive iterative queries over a streaming graph, it is critical to evaluate the queries continuously and incrementally for best efficiency. However, the existing incremental graph processing requires a priori knowledge of the query (e.g., the source vertex of a vertex-specific query); otherwise, it has to fall back to the expensive full evaluation that starts from scratch. To alleviate this restriction, this work presents a principled solution to generalizing the incremental graph processing, such that queries, without their a priori knowledge, can also be evaluated incrementally. The solution centers around the concept of graph triangle inequalities, an idea inspired by the classical triangle inequality principle in the Euclidean space. Interestingly, similar principles can also be derived for many vertex-specific graph problems. These principles can help establish rigorous constraints between the evaluation of one graph query and the results of another, thus enabling reusing the latter to accelerate the former. Based on this finding, a novel streaming graph system, called Tripoline, is built which enables incremental evaluation of queries without their a priori knowledge. Built on top of a state-of-the-art shared-memory streaming graph engine (Aspen), Tripoline natively supports high-throughput low-cost graph updates. A systematic evaluation with a set of eight vertex-specific graph problems and four real-world large graphs confirms both the effectiveness of the proposed techniques and the efficiency of Tripoline. 
    more » « less
  4. null (Ed.)
    Simultaneous evaluating a batch of iterative graph queries on a distributed system enables amortization of high communication and computation costs across multiple queries. As demonstrated by our prior work on MultiLyra [BigData'19], batched graph query processing can deliver significant speedups and scale up to batch sizes of hundreds of queries.In this paper, we greatly expand the applicable scenarios for batching by developing BEAD, a system that supports Batching in the presence of Evolving Analytics Demands. First, BEAD allows the graph data set to evolve (grow) over time, more vertices (e.g., users) and edges (e.g., interactions) are added. In addition, as the graph data set evolves, BEAD also allows the user to add more queries of interests to the query batch to accommodate new user demands. The key to the superior efficiency offered by BEAD lies in a series of incremental evaluation techniques that leverage the results of prior request to "fast-foward" the evaluation of the current request.We performed experiments comparing batching in BEAD with batching in MultiLyra for multiple input graphs and algorithms. Experiments demonstrate that BEAD's batched evaluation of 256 queries, following graph changes that add up to 100K edges to a billion edge Twitter graph and also query changes of up to 32 new queries, outperforms MultiLyra's batched evaluation by factors of up to 26.16 × and 5.66 × respectively. 
    more » « less
  5. null (Ed.)
    Graph processing frameworks are typically designed to optimize the evaluation of a single graph query. However, in practice, we often need to respond to multiple graph queries, either from different users or from a single user performing a complex analytics task. Therefore in this paper we develop SimGQ, a system that optimizes simultaneous evaluation of a group of vertex queries that originate at different source vertices (e.g., multiple shortest path queries originating at different source vertices) and delivers substantial speedups over a conventional framework that evaluates and responds to queries one by one. The performance benefits are achieved via batching and sharing. Batching fully utilizes system resources to evaluate a batch of queries and amortizes runtime overheads incurred due to fetching vertices and edge lists, synchronizing threads, and maintaining computation frontiers. Sharing dynamically identifies shared queries that substantially represent subcomputations in the evaluation of different queries in a batch, evaluates the shared queries, and then uses their results to accelerate the evaluation of all queries in the batch. With four input power-law graphs and four graph algorithms SimGQ achieves speedups of up to 45.67 × with batch sizes of up to 512 queries over the baseline implementation that evaluates the queries one by one using the state of the art Ligra system. Moreover, both batching and sharing contribute substantially to the speedups. 
    more » « less
  6. null (Ed.)
    Graph processing workloads are memory intensive with irregular access patterns and large memory footprint resulting in low data locality. Their popular software implementations typically employ either Push or Pull style propagation of changes through the graph over multiple iterations that follow the Bulk Synchronous Model. The performance of these algorithms on traditional computing systems is limited by random reads/writes of vertex values, synchronization overheads, and additional overheads for tracking active sets of vertices or edges across iterations. In this paper, we present GraphPulse, a hardware framework for asynchronous graph processing with event-driven scheduling that overcomes the performance limitations of software frameworks. Event-driven computation model enables a parallel dataflow-style execution where atomic updates and active sets tracking are inherent to the model; thus, scheduling complexity is reduced and scalability is enhanced. The dataflow nature of the architecture also reduces random reads of vertex values by carrying the values in the events themselves. We capitalize on the update properties commonly present in graph algorithms to coalesce in-flight events and substantially reduce the event storage requirement and the processing overheads incurred. GraphPulse event-model naturally supports asynchronous graph processing, enabling substantially faster convergence by exploiting available parallelism, reducing work, and eliminating synchronization at iteration boundaries. The framework provides easy to use programming interface for faster development of hardware graph accelerators. A single GraphPulse accelerator achieves up to 74x speedup (28x on average) over Ligra, a state of the art software framework, running on a 12 core CPU. It also achieves an average of 6.2x speedup over Graphicionado, a state of the art graph processing accelerator. 
    more » « less