We describe a paradigm for designing parallel algorithms via approximation, and illustrate it on the bedgecover problem.
A bedgecover of minimum weight in a graph
is a subset $C$ of its edges such that
at least a specified number $b(v)$ of edges in $C$ is incident on each vertex $v$, and the sum of the edge weights in $C$ is minimum.
The Greedy algorithm and a variant, the LSE algorithm, provide $3/2$approximation guarantees in the worstcase for this problem, but these algorithms have limited parallelism. Hence we design two new $2$approximation algorithms with greater concurrency. The MCE algorithm reduces the computation of a bedgecover to that of finding a b'matching, by exploiting the relationship between these subgraphs in an approximation context.
The LSENW is derived from the LSEalgorithm using static edge weights rather than dynamically computing effective edge weights. This relaxation gives LSE a worse approximation guarantee but makes it more amenable to parallelization. We prove that both the MCE and LSENW algorithms compute the same bedgecover with at most twice the weight of the minimum weight edge cover. In practice,
the $2$approximation and $3/2$approximation algorithms compute edge covers of weight within $10\%$ the optimal. We implement three of the approximation algorithms, MCE, LSE, and LSENW on shared memory multicore machines, including an Intel Xeon and an IBM Power8 machine with 8 TB memory. The MCE algorithm is the fastest of these by an order of magnitude or more. It computes an edge cover in a graph with billions of edges in $20$ seconds using two hundred threads on the IBM Power8. We also show that the parallel depth and work can be bounded for the Suitor and bSuitor algorithms when edge weights are random.
more »
« less
SharedMemory Parallel Maximal Clique Enumeration
We present sharedmemory parallel methods for Maximal Clique Enumeration (MCE) from a graph. MCE is a fundamental and wellstudied graph analytics task, and is a widely used primitive for identifying dense structures in a graph. Due to its computationally intensive nature, parallel methods are imperative for dealing with large graphs. However, surprisingly, there do not yet exist scalable and parallel methods for MCE on a sharedmemory parallel machine. In this work, we present efficient sharedmemory parallel algorithms for MCE, with the following properties: (1) the parallel algorithms are provably workefficient relative to a stateoftheart sequential algorithm (2) the algorithms have a provably small parallel depth, showing that they can scale to a large number of processors, and (3) our implementations on a multicore machine shows a good speedup and scaling behavior with increasing number of cores, and are substantially faster than prior sharedmemory parallel algorithms for MCE.
more »
« less
 Award ID(s):
 1725702
 NSFPAR ID:
 10073156
 Date Published:
 Journal Name:
 arXiv.org
 ISSN:
 23318422
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


Motivated by the significantly higher cost of writing than reading in emerging memory technologies, we consider parallel algorithm design under such asymmetric readwrite costs, with the goal of reducing the number of writes while preserving workefficiency and low span. We present a nestedparallel model of computation that combines (i) small pertask stackallocated memories with symmetric readwrite costs and (ii) an unbounded heapallocated shared memory with asymmetric readwrite costs, and show how the costs in the model map efficiently onto a more concrete machine model under a workstealing scheduler. We use the new model to design reduced write, workefficient, low span parallel algorithms for a number of fundamental problems such as reduce, list contraction, tree contraction, breadthfirst search, ordered filter, and planar convex hull. For the latter two problems, our algorithms are outputsensitive in that the work and number of writes decrease with the output size. We also present a reduced write, low span minimum spanning tree algorithm that is nearly workefficient (off by the inverse Ackermann function). Our algorithms reveal several interesting techniques for significantly reducing shared memory writes in parallel algorithms without asymptotically increasing the number of shared memory reads.more » « less

null (Ed.)There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest publiclyavailable realworld graph (the Hyperlink Web graph with over 3.5 billion vertices and 128 billion edges) can fit in the memory of a single commodity multicore server. Nevertheless, most experimental work in the literature report results on much smaller graphs, and the ones for the Hyperlink graph use distributed or external memory. Therefore, it is natural to ask whether we can efficiently solve a broad class of graph problems on this graph in memory. This paper shows that theoreticallyefficient parallel graph algorithms can scale to the largest publiclyavailable graphs using a single machine with a terabyte of RAM, processing them in minutes. We give implementations of theoreticallyefficient parallel algorithms for 20 important graph problems. We also present the interfaces, optimizations, and graph processing techniques that we used in our implementations, which were crucial in enabling us to process these large graphs quickly. We show that the running times of our implementations outperform existing stateoftheart implementations on the largest realworld graphs. For many of the problems that we consider, this is the first time they have been solved on graphs at this scale. We have made the implementations developed in this work publiclyavailable as the Graph Based Benchmark Suite (GBBS).more » « less

We present shared memory parallel algorithms for maximal biclique enumeration (MBE), the task of enumerating all complete dense subgraphs (maximal bicliques) from a bipartite graph, which is widely used in the analysis of social, biological, and transactional networks. Since MBE is computationally expensive, it is necessary to use parallel computing to scale to large graphs. Our parallel algorithm ParMBE efficiently uses the power of multiple cores that share memory. From a theoretical view, ParMBE is workefficient with respect to a stateoftheart sequential algorithm. Our experimental evaluation shows that ParMBE scales well up to 64 cores, and is significantly faster than current parallel algorithms. Since ParMBE was yielding a superlinear speedup compared to the sequential algorithm on which it was based (MineLMBC), we develop an improved sequential algorithm FMBE, through "sequentializing" ParMBE.more » « less

Computing the singlesource shortest path (SSSP) is one of the fundamental graph algorithms, and is used in many applications. Here, we focus on computing SSSP on large dynamic graphs, i.e. graphs whose structure evolves with time. We posit that instead of recomputing the SSSP for each set of changes on the dynamic graphs, it is more efficient to update the results based only on the region of change. To this end, we present a novel twostep sharedmemory algorithm for updating SSSP on weighted largescale graphs. The key idea of our algorithm is to identify changes, such as vertex/edge addition and deletion, that affect the shortest path computations and update only the parts of the graphs affected by the change. We provide the proof of correctness of our proposed algorithm. Our experiments on real and synthetic networks demonstrate that our algorithm is as much as 4X faster compared to computing SSSP with Galois, a stateoftheart parallel graph analysis software for shared memory architectures. We also demonstrate how increasing the asynchrony can lead to even faster updates. To the best of our knowledge, this is one of the first practical parallel algorithms for updating networks on sharedmemory systems, that is also scalable to large networks.more » « less