

Title: Adaptive Anonymization of Data Using b-Edge Cover
We explore the problem of sharing data that pertains to individuals with anonymity guarantees, where each user requires a desired level of privacy. We propose the first shared-memory and distributed-memory parallel algorithms for the k-anonymity problem that achieve this goal and produce high-quality anonymized datasets. The new algorithm is based on an optimization procedure that iteratively computes weights on the edges of a dissimilarity matrix, and at each iteration computes a minimum-weight b-edge cover in the graph. We describe how a 2-approximation algorithm for computing the b-edge cover can be used to solve the adaptive anonymity problem in parallel. We are able to solve adaptive anonymity problems with hundreds of thousands of instances and hundreds of features on a supercomputer in under five minutes. Our algorithm scales up to 8000 cores on a distributed-memory supercomputer, while also providing good speedups on shared-memory multiprocessors. On smaller problems where a Belief Propagation algorithm is feasible, our algorithm is two orders of magnitude faster.
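To make the core subroutine concrete, here is a minimal sketch of a greedy minimum-weight b-edge cover, the building block the abstract describes. This is an illustrative static-weight greedy (closer in spirit to the LSE-NW relaxation discussed in related work below), not the authors' parallel 2-approximation algorithm; the function name and data layout are assumptions for this sketch.

```python
def greedy_b_edge_cover(edges, b):
    """Illustrative greedy b-edge cover.

    edges: list of (weight, u, v) tuples.
    b: dict mapping each vertex to the number of cover edges
       that must be incident on it.

    Scans edges in increasing weight order and keeps an edge
    whenever either endpoint still needs coverage.
    """
    need = dict(b)          # remaining coverage requirement per vertex
    cover = []
    for w, u, v in sorted(edges):
        if need.get(u, 0) > 0 or need.get(v, 0) > 0:
            cover.append((w, u, v))
            need[u] = need.get(u, 0) - 1
            need[v] = need.get(v, 0) - 1
    return cover
```

In the adaptive-anonymity setting, the edge weights come from a dissimilarity matrix over user records and are recomputed each iteration, with the cover recomputed on the reweighted graph.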
Award ID(s):
1637534
NSF-PAR ID:
10110193
Journal Name:
Proceedings of ACM/IEEE Supercomputing Conference (SC18)
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider the maximum vertex-weighted matching problem (MVM), in which non-negative weights are assigned to the vertices of a graph, and the weight of a matching is the sum of the weights of the matched vertices. Although exact algorithms for MVM are faster than exact algorithms for the maximum edge-weighted matching problem, there are graphs on which these exact algorithms could take hundreds of hours. For a natural number k, we design a k/(k+1)-approximation algorithm for MVM on non-bipartite graphs that updates the matching along certain short paths in the graph: either augmenting paths of length at most 2k+1 or weight-increasing paths of length at most 2k. The choice of k = 2 leads to a 2/3-approximation algorithm that computes nearly optimal weights fast. This algorithm could be initialized with a 2/3-approximate maximum cardinality matching to reduce its runtime in practice. A 1/2-approximation algorithm may be obtained using k = 1; it is faster than the 2/3-approximation algorithm but computes lower weights. The 2/3-approximation algorithm has time complexity O(Δ²m), while the time complexity of the 1/2-approximation algorithm is O(Δm), where m is the number of edges and Δ is the maximum degree of a vertex. Results from our serial implementations show that on average the 1/2-approximation algorithm runs faster than the Suitor algorithm (currently the fastest 1/2-approximation algorithm), while the 2/3-approximation algorithm runs as fast as the Suitor algorithm but obtains higher weights for the matching. One advantage of the proposed algorithms is that they are well suited for parallel implementation, since they can process the vertices to be matched in any order. The 1/2- and 2/3-approximation algorithms have been implemented on a shared-memory parallel computer, and both approximation algorithms obtain good speedups, while the former algorithm runs faster on average than the parallel Suitor algorithm.
Care is needed to design the parallel algorithm to avoid cyclic waits, and we prove that it is live-lock free. 
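A toy illustration of the MVM objective: match heavy vertices first, pairing each with its heaviest still-unmatched neighbor. This simple heuristic is an assumption-laden sketch for intuition only; it is not the paper's path-based k/(k+1)-approximation or the Suitor algorithm.

```python
def greedy_mvm(adj, weight):
    """Toy heuristic for maximum vertex-weighted matching.

    adj: dict mapping each vertex to a list of neighbors.
    weight: dict mapping each vertex to its non-negative weight.

    Visits vertices in decreasing weight order and matches each
    unmatched vertex to its heaviest unmatched neighbor.
    """
    matched = {}
    for u in sorted(adj, key=lambda v: -weight[v]):
        if u in matched:
            continue
        candidates = [v for v in adj[u] if v not in matched]
        if candidates:
            v = max(candidates, key=lambda x: weight[x])
            matched[u] = v
            matched[v] = u
    return matched
```

The paper's algorithms improve on such a heuristic by also repairing the matching along short augmenting or weight-increasing paths, which is what lifts the guarantee to k/(k+1).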
  2. We present an optimized Floyd-Warshall algorithm that computes all-pairs shortest paths (APSP) for GPU-accelerated clusters. The Floyd-Warshall algorithm, due to its structural similarity to matrix multiplication, is well suited for highly parallel GPU architectures. To achieve high parallel efficiency, we address two key algorithmic challenges: reducing high communication overhead and addressing limited GPU memory. To reduce communication costs, we redesign the parallel algorithm to (a) expose more parallelism, (b) aggressively overlap communication and computation with pipelined and asynchronous scheduling of operations, and (c) use tailored MPI collectives. To cope with limited GPU memory, we employ an offload model, where the data resides on the host and is transferred to the GPU on demand. The proposed optimizations are supported with detailed performance models for tuning. Our optimized parallel Floyd-Warshall implementation is up to 5x faster than a strong baseline and achieves 8.1 PetaFLOPS on 256 nodes of the Summit supercomputer at Oak Ridge National Laboratory. This performance represents 70% of the theoretical peak and 80% parallel efficiency. The offload algorithm can handle 2.5x larger graphs with a 20% increase in overall running time.
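For reference, the sequential kernel that the abstract's GPU implementation parallelizes is the classic triple loop below; the blocked, communication-overlapping distributed version in the paper restructures this same recurrence into matrix-multiplication-like tiles.

```python
def floyd_warshall(dist):
    """Sequential all-pairs shortest paths, updated in place.

    dist: n x n list of lists; dist[i][j] is the edge weight from
    i to j, float('inf') if absent, and 0 on the diagonal.

    Recurrence: dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]).
    """
    n = len(dist)
    for k in range(n):
        for i in range(n):
            dik = dist[i][k]          # hoisted out of the inner loop
            for j in range(n):
                if dik + dist[k][j] < dist[i][j]:
                    dist[i][j] = dik + dist[k][j]
    return dist
```

The structural similarity to matrix multiplication is visible in the loop nest: for a fixed k, the update is a (min, +) rank-one style pass over the distance matrix, which is what makes tiling for GPUs effective.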
  3. We describe a paradigm for designing parallel algorithms via approximation, and illustrate it on the b-edge cover problem. A b-edge cover of minimum weight in a graph is a subset C of its edges such that at least a specified number b(v) of edges in C is incident on each vertex v, and the sum of the edge weights in C is minimum. The Greedy algorithm and a variant, the LSE algorithm, provide 3/2-approximation guarantees in the worst case for this problem, but these algorithms have limited parallelism. Hence we design two new 2-approximation algorithms with greater concurrency. The MCE algorithm reduces the computation of a b-edge cover to that of finding a b'-matching, by exploiting the relationship between these subgraphs in an approximation context. The LSE-NW algorithm is derived from the LSE algorithm by using static edge weights rather than dynamically computed effective edge weights. This relaxation gives LSE a worse approximation guarantee but makes it more amenable to parallelization. We prove that both the MCE and LSE-NW algorithms compute the same b-edge cover, with at most twice the weight of the minimum-weight edge cover. In practice, the 2-approximation and 3/2-approximation algorithms compute edge covers of weight within 10% of the optimal. We implement three of the approximation algorithms, MCE, LSE, and LSE-NW, on shared-memory multi-core machines, including an Intel Xeon and an IBM Power8 machine with 8 TB memory. The MCE algorithm is the fastest of these by an order of magnitude or more. It computes an edge cover in a graph with billions of edges in 20 seconds using two hundred threads on the IBM Power8. We also show that the parallel depth and work can be bounded for the Suitor and b-Suitor algorithms when edge weights are random.
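The MCE reduction rests on a complementation identity: if b'(v) = deg(v) − b(v), then the complement of a b'-matching is a b-edge cover, since each vertex v is then covered by at least deg(v) − b'(v) = b(v) edges. A minimal sketch of that idea, using a simple greedy b'-matching in place of the paper's parallel b-Suitor implementation (the function name and edge representation are assumptions):

```python
def b_edge_cover_via_matching(edges, b):
    """Sketch of the MCE reduction: b-edge cover = complement of a
    b'-matching with b'(v) = deg(v) - b(v).

    edges: list of (weight, u, v); assumes b(v) <= deg(v) so that
    the cover requirement is feasible.
    """
    deg = {}
    for w, u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    # residual capacity b'(v): how many incident edges may be dropped
    cap = {v: deg[v] - b.get(v, 0) for v in deg}
    # greedy b'-matching on decreasing weight: keep heavy edges OUT
    # of the cover, so the cover that remains is light
    dropped = set()
    for w, u, v in sorted(edges, reverse=True):
        if cap.get(u, 0) > 0 and cap.get(v, 0) > 0:
            dropped.add((u, v))
            cap[u] -= 1
            cap[v] -= 1
    return [(w, u, v) for w, u, v in edges if (u, v) not in dropped]
```

Because every vertex loses at most b'(v) incident edges, the remaining edges necessarily satisfy the coverage constraint, which is exactly the relationship between the two subgraphs the abstract refers to.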
  4. Computational fluid dynamics (CFD) is increasingly used to study blood flows in patient-specific arteries for understanding certain cardiovascular diseases. The techniques work quite well for relatively simple problems but need improvements as the problems become harder: (a) the geometry becomes complex (e.g., from a few branches to a full pulmonary artery), (b) the model becomes more complex (e.g., from fluid-only to coupled fluid-structure interaction), (c) both the fluid and wall models become highly nonlinear, and (d) the computer on which we run the simulation is a supercomputer with tens of thousands of processor cores. To push the limit of CFD on all four fronts, in this paper we develop and study a highly parallel algorithm for solving a monolithically coupled fluid-structure system for modeling the interaction of the blood flow and the arterial wall. As a case study, we consider a patient-specific, full-size pulmonary artery obtained from computed tomography (CT) images, with an artificially added layer of wall with a fixed thickness. The fluid is modeled with a system of incompressible Navier-Stokes equations, and the wall is modeled by a geometrically nonlinear elasticity equation. As far as we know, this is the first time the unsteady blood flow in a full pulmonary artery is simulated without assuming a rigid wall. The proposed numerical algorithm and software scale well beyond 10 000 processor cores on a supercomputer for solving the fluid-structure interaction problem discretized with a stabilized finite element method in space and an implicit scheme in time, involving hundreds of millions of unknowns.
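For context, the fluid model named above is the standard incompressible Navier-Stokes system; written generically (the paper's exact stabilized formulation and boundary conditions may differ):

```latex
\rho \left( \frac{\partial \mathbf{u}}{\partial t}
      + \mathbf{u} \cdot \nabla \mathbf{u} \right)
  = -\nabla p + \mu \, \Delta \mathbf{u} + \mathbf{f},
\qquad
\nabla \cdot \mathbf{u} = 0,
```

where u is the velocity, p the pressure, ρ the density, μ the dynamic viscosity, and f a body force; the incompressibility constraint is what couples velocity and pressure and makes the monolithic solve demanding at scale.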