We consider the maximum vertexweighted matching problem (MVM), in which nonnegative weights are assigned to the vertices of a graph, and the weight of a matching is the sum of the weights of the matched vertices. Although exact algorithms for MVM are faster than exact algorithms for the maximum edgeweighted matching problem, there are graphs on which these exact algorithms could take hundreds of hours. For a natural number k, we design a k/(k + 1)approximation algorithm for MVM on nonbipartite graphs that updates the matching along certain short paths in the graph: either augmenting paths of length at most 2k + 1 or weightincreasing paths of length at most 2k. The choice of k = 2 leads to a 2/3approximation algorithm that computes nearly optimal weights fast. This algorithm could be initialized with a 2/3approximate maximum cardinality matching to reduce its runtime in practice. A 1/2approximation algorithm may be obtained using k = 1, which is faster than the 2/3approximation algorithm but it computes lower weights. The 2/3approximation algorithm has time complexity O(Δ2m) while the time complexity of the 1/2approximation algorithm is O(Δm), where m is the number of edges and Δ is the maximum degree of a vertex.more »
Adaptive Anonymization of Data Using bEdge Cover
We explore the problem of sharing data that pertains to individuals with anonymity guarantees, where each user requires a desired level of privacy.
We propose the first sharedmemory as well as distributed memory parallel algorithms for the kanonimity problem that achieves this goal, and produces high quality anonymized datasets.
The new algorithm is based on an optimization procedure that iteratively computes weights on the edges of a dissimilarity matrix, and at each iteration computes a minimum weighted bedgecover in the graph. We describe how a 2approximation algorithm for computing the bedgecover can be used to solve the adaptive anonymity problem in parallel.
We are able to solve adaptive anonymity problems with hundreds of thousands of instances and hundreds of features on a supercomputer in under five minutes. Our algorithm scales up to 8000 cores on a distributed memory supercomputer, while also providing good speedups on shared memory multiprocessors. On smaller problems where an a Belief Propagation algorithm is feasible, our algorithm is two orders of magnitude faster.
 Award ID(s):
 1637534
 Publication Date:
 NSFPAR ID:
 10110193
 Journal Name:
 Proceedings of ACM/IEEE Supercomputing Conference (SC18)
 Sponsoring Org:
 National Science Foundation
More Like this


We present an optimized FloydWarshall (FloydWarshall) algorithm that computes the Allpairs shortest path (APSP) for GPU accelerated clusters. The FloydWarshall algorithm due to its structural similarities to matrixmultiplication is well suited for highly parallel GPU architectures. To achieve high parallel efficiency, we address two key algorithmic challenges: reducing high communication overhead and addressing limited GPU memory. To reduce high communication costs, we redesign the parallel (a) to expose more parallelism, (b) aggressively overlap communication and computation with pipelined and asynchronous scheduling of operations, and (c) tailored MPIcollective. To cope with limited GPU memory, we employ an offload model, where the data resides on the host and is transferred to GPU ondemand. The proposed optimizations are supported with detailed performance models for tuning. Our optimized parallel FloydWarshall implementation is up to 5x faster than a strong baseline and achieves 8.1 PetaFLOPS/sec on 256~nodes of the Summit supercomputer at Oak Ridge National Laboratory. This performance represents 70% of the theoretical peak and 80% parallel efficiency. The offload algorithm can handle 2.5x larger graphs with a 20% increase in overall running time.

We describe a paradigm for designing parallel algorithms via approximation, and illustrate it on the bedgecover problem. A bedgecover of minimum weight in a graph is a subset $C$ of its edges such that at least a specified number $b(v)$ of edges in $C$ is incident on each vertex $v$, and the sum of the edge weights in $C$ is minimum. The Greedy algorithm and a variant, the LSE algorithm, provide $3/2$approximation guarantees in the worstcase for this problem, but these algorithms have limited parallelism. Hence we design two new $2$approximation algorithms with greater concurrency. The MCE algorithm reduces the computation of a bedgecover to that of finding a b'matching, by exploiting the relationship between these subgraphs in an approximation context. The LSENW is derived from the LSEalgorithm using static edge weights rather than dynamically computing effective edge weights. This relaxation gives LSE a worse approximation guarantee but makes it more amenable to parallelization. We prove that both the MCE and LSENW algorithms compute the same bedgecover with at most twice the weight of the minimum weight edge cover. In practice, the $2$approximation and $3/2$approximation algorithms compute edge covers of weight within $10\%$ the optimal. We implement three of themore »

A bmatching is a subset of edges M such that at most b(v) edges in M are incident on each vertex v, where b(v) is specified. We present a distributedmemory parallel algorithm, \bsuitor, that computes a bmatching with more than half the maximum weight in a graph with weights on the edges. The approximation algorithm is designed to have high concurrency and low time complexity. We organize the implementation of the algorithm in terms of asynchronous supersteps that combine computation and communication, and balance the computational work and frequency of communication to obtain high performance. Since the performance of the bsuitor algorithm is strongly influenced by communication, we present several strategies to reduce the communication volume. We implement the algorithm using a hybrid strategy where internode communication uses MPI and intranode computation is done with OpenMP threads. We demonstrate strong and weak scaling of bsuitor up to 16,000 processors on two supercomputers at NERSC. We compute a bmatching in a graph with 2 billion edges in under 4 seconds using 16,000 processors.

In the unitcost comparison model, a black box takes an input two items and outputs the result of the comparison. Problems like sorting and searching have been studied in this model, and it has been general ized to include the concept of priced information, where different pairs of items (say database records) have different comparison costs. These comparison costs can be arbitrary (in which case no algorithm can be close to optimal (Charikar et al. STOC 2000)), structured (for exam ple, the comparison cost may depend on the length of the databases (Gupta et al. FOCS 2001)), or stochastic (Angelov et al. LATIN 2008). Motivated by the database setting where the cost depends on the sizes of the items, we consider the problems of sorting and batched predecessor where two nonuniform sets of items A and B are given as input. (1) In the RAM setting, we consider the scenario where both sets have n keys each. The cost to compare two items in A is a, to compare an item of A to an item of B is b, and to compare two items in B is c. We give upper and lower bounds for the case a ≤more »