Massively Parallel Algorithms and Hardness for Single-Linkage Clustering under ℓp-Distances

Yaroslavtsev, Grigory; Vadapalli, Adithya


We present the first massively parallel (MPC) algorithms and hardness of approximation results for computing Single-Linkage Clustering of $$n$$ input $$d$$-dimensional vectors under Hamming, $$\ell_1$$, $$\ell_2$$ and $$\ell_\infty$$ distances. All our algorithms run in $$O(\log n)$$ rounds of MPC for any fixed $$d$$ and achieve a $$(1+\epsilon)$$-approximation for all distances except Hamming, for which we give an exact algorithm. We also show constant-factor inapproximability results for $$o(\log n)$$-round algorithms under standard MPC hardness assumptions (for sufficiently large dimension, depending on the distance used). We demonstrate the efficiency of our Apache Spark implementation through experiments on the largest available vector datasets from the UCI Machine Learning Repository, which exhibit speedups of several orders of magnitude.
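For readers unfamiliar with the problem, the following is a minimal sequential sketch of single-linkage clustering under an $$\ell_p$$ distance. It is not the MPC algorithm or the Spark implementation described in the abstract. It relies only on the standard fact that single-linkage clustering into $$k$$ clusters corresponds to running Kruskal's minimum spanning tree algorithm and stopping once $$k$$ components remain. The names `lp_distance` and `single_linkage` and the parameter `k` are illustrative assumptions, not taken from the paper.

```python
import math
from itertools import combinations


def lp_distance(u, v, p):
    """l_p distance between two equal-length vectors; p may be math.inf."""
    if p == math.inf:
        return max(abs(a - b) for a, b in zip(u, v))
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def single_linkage(points, k, p=2):
    """Illustrative sequential single-linkage clustering into k clusters.

    Runs Kruskal's algorithm on the complete distance graph (quadratic in
    the number of points) and stops when k components remain.
    Returns one cluster label per input point.
    """
    n = len(points)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairwise edges, sorted by increasing distance.
    edges = sorted(
        (lp_distance(points[i], points[j], p), i, j)
        for i, j in combinations(range(n), 2)
    )

    clusters = n
    for _, i, j in edges:
        if clusters <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # merge the two closest clusters
            clusters -= 1

    # Relabel union-find roots as 0, 1, 2, ... in order of first appearance.
    label_of = {}
    return [label_of.setdefault(find(i), len(label_of)) for i in range(n)]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(single_linkage(pts, k=2, p=2))
```

On binary vectors, calling the sketch with `p=1` gives the Hamming distance. The quadratic number of pairwise distances is what makes this approach impractical at scale, which is the setting the paper's parallel algorithms address.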

Award ID(s): 1657477

PAR ID: 10088159

Author(s) / Creator(s): Yaroslavtsev, Grigory; Vadapalli, Adithya

Date Published: 2018-07-13

Journal Name: 35th International Conference on Machine Learning (ICML'18)

Page Range / eLocation ID: 5596-5605

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Conference Paper:
The DOI is not currently available.
