Adaptive Anonymization of Data Using b-Edge Cover
We explore the problem of sharing data that pertains to individuals with anonymity guarantees, where each user requires a desired level of privacy. We propose the first shared-memory as well as distributed memory parallel algorithms for the k-anonimity problem that achieves this goal, and produces high quality anonymized datasets. The new algorithm is based on an optimization procedure that iteratively computes weights on the edges of a dissimilarity matrix, and at each iteration computes a minimum weighted b-edgecover in the graph. We describe how a 2-approximation algorithm for computing the b-edgecover can be used to solve the adaptive anonymity problem in parallel. We are able to solve adaptive anonymity problems with hundreds of thousands of instances and hundreds of features on a supercomputer in under five minutes. Our algorithm scales up to 8000 cores on a distributed memory supercomputer, while also providing good speedups on shared memory multiprocessors. On smaller problems where an a Belief Propagation algorithm is feasible, our algorithm is two orders of magnitude faster.
Authors:
Award ID(s):
Publication Date:
NSF-PAR ID:
10110193
Journal Name:
Proceedings of ACM/IEEE Supercomputing Conference (SC18)
3. We describe a paradigm for designing parallel algorithms via approximation, and illustrate it on the b-edgecover problem. A b-edgecover of minimum weight in a graph is a subset $C$ of its edges such that at least a specified number $b(v)$ of edges in $C$ is incident on each vertex $v$, and the sum of the edge weights in $C$ is minimum. The Greedy algorithm and a variant, the LSE algorithm, provide $3/2$-approximation guarantees in the worst-case for this problem, but these algorithms have limited parallelism. Hence we design two new $2$-approximation algorithms with greater concurrency. The MCE algorithm reduces the computation of a b-edgecover to that of finding a b'-matching, by exploiting the relationship between these subgraphs in an approximation context. The LSE-NW is derived from the LSEalgorithm using static edge weights rather than dynamically computing effective edge weights. This relaxation gives LSE a worse approximation guarantee but makes it more amenable to parallelization. We prove that both the MCE and LSE-NW algorithms compute the same b-edgecover with at most twice the weight of the minimum weight edge cover. In practice, the $2$-approximation and $3/2$-approximation algorithms compute edge covers of weight within $10\%$ the optimal. We implement three of themore »