Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Motivated by performance optimization of large-scale graph processing systems that distribute the graph across multiple machines, we consider the balanced graph partitioning problem. Compared to most of the previous work, we study the multi-dimensional variant in which balance according to multiple weight functions is required. As we demonstrate by experimental evaluation, such multi-dimensional balance is essential for achieving performance improvements for typical distributed graph processing workloads. We propose a new scalable technique for the multidimensional balanced graph partitioning problem. It is based on applying randomized projected gradient descent to a non-convex continuous relaxation of the objective. We show how to implement the new algorithm efficiently in both theory and practice utilizing various approaches for the projection step. Experiments with large-scale graphs containing up to hundreds of billions of edges indicate that our algorithm has superior performance compared to the state of the art.more » « less
-
We present first massively parallel (MPC) algorithms and hardness of approximation results for computing Single-Linkage Clustering of $$n$$ input $$d$$-dimensional vectors under Hamming, $$\ell_1, \ell_2$$ and $$\ell_\infty$$ distances. All our algorithms run in $$O(\log n)$$ rounds of MPC for any fixed $$d$$ and achieve $$(1+\epsilon)$$-approximation for all distances (except Hamming for which we show an exact algorithm). We also show constant-factor inapproximability results for $$o(\log n)$$-round algorithms under standard MPC hardness assumptions (for sufficiently large dimension depending on the distance used). Efficiency of implementation of our algorithms in Apache Spark is demonstrated through experiments on the largest available vector datasets from the UCI machine learning repository exhibiting speedups of several orders of magnitude.more » « less
An official website of the United States government

Full Text Available