

Title: Efficient Distributed Algorithms for the K-Nearest Neighbors Problem
K-nearest neighbors is a basic problem in machine learning with numerous applications. In this problem, given a (training) set of n labeled data points and a query point q, we want to assign a label to q based on the labels of the K points nearest to the query. We study this problem in the k-machine model, a model for distributed large-scale data. In this model, the n points are distributed (in a balanced fashion) among the k machines, and the goal is to answer a query posed to one of the machines using a small number of communication rounds. Our main result is a randomized algorithm in the k-machine model that runs in O(log K) communication rounds with high success probability (regardless of the number of machines k and the number of points n). The message complexity of the algorithm is small, taking only O(k log K) messages. Our bounds are essentially the best possible for comparison-based algorithms. We also implemented our algorithm and show that it performs well in practice.
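To make the problem setup concrete, below is a minimal, hypothetical sketch (in Python) of the naive baseline for distributed KNN classification: the points are partitioned across k "machines" (here, plain lists), each machine reports its K locally nearest points to the query, and the querying machine merges the candidates and takes a majority vote. This is only the correctness baseline; it is not the paper's O(log K)-round algorithm, and all names and the toy data are illustrative.

```python
# Hypothetical sketch of the naive distributed-KNN baseline, NOT the paper's algorithm.
import heapq
import random
from collections import Counter

def local_knn(points, q, K):
    """Return the K (distance, label) pairs closest to q among this machine's points."""
    return heapq.nsmallest(K, ((abs(x - q), label) for x, label in points))

def distributed_knn_label(machines, q, K):
    """Merge each machine's local candidates and label q by majority vote over the global top K."""
    candidates = []
    for points in machines:          # one message of K candidates from each machine
        candidates.extend(local_knn(points, q, K))
    top_k = heapq.nsmallest(K, candidates)
    return Counter(label for _, label in top_k).most_common(1)[0][0]

if __name__ == "__main__":
    random.seed(0)
    k, n, K = 4, 1000, 7             # toy sizes, chosen only for illustration
    data = [(random.uniform(0, 1), "A" if random.random() < 0.5 else "B") for _ in range(n)]
    machines = [data[i::k] for i in range(k)]   # balanced partition across k machines
    print(distributed_knn_label(machines, q=0.42, K=K))
```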
Award ID(s):
1633720
NSF-PAR ID:
10197167
Author(s) / Creator(s):
Date Published:
Journal Name:
SPAA '20: Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures
Page Range / eLocation ID:
527 to 529
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Given a collection of vectors, the approximate K-nearest-neighbor graph (KGraph for short) connects every vector to its approximate K-nearest-neighbors (KNN for short). KGraph plays an important role in high dimensional data visualization, semantic search, manifold learning, and machine learning. The vectors are typically vector representations of real-world objects (e.g., images and documents), which often come with a few structured attributes, such as timestamps and locations. In this paper, we study the all-range approximate K-nearest-neighbor graph (ARKGraph) problem. Specifically, given a collection of vectors, each associated with a numerical search key (e.g., a timestamp), we aim to build an index that takes a search key range as the query and returns the KGraph of vectors whose search keys are within the query range. ARKGraph can facilitate interactive high dimensional data visualization, data mining, etc. A key challenge of this problem is the huge index size. This is because, given n vectors, a brute-force index stores a KGraph for every search key range, which results in O(Kn³) index size as there are O(n²) search key ranges and each KGraph takes O(Kn) space. We observe that the KNN of a vector in nearby ranges are often the same, which can be grouped together to save space. Based on this observation, we propose a series of novel techniques that reduce the index size significantly to just O(Kn log n) in the average case. Furthermore, we develop an efficient indexing algorithm that constructs the optimized ARKGraph index directly without exhaustively calculating the distance between every pair of vectors. To process a query, for each vector in the query range, we only need O(log log n + K log K) to restore its KNN in the query range from the optimized ARKGraph index. We conducted extensive experiments on real-world datasets. Experimental results show that our optimized ARKGraph index achieved a small index size, low query latency, and good scalability. Specifically, our approach was 1000x faster than the baseline method that builds a KGraph for all the vectors in the query range on-the-fly.
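As a point of reference, here is a minimal, hypothetical sketch of the on-the-fly baseline mentioned in the abstract: filter the vectors whose search keys fall in the query range, then build their exact KGraph by brute force. Function and variable names are illustrative; this is the slow baseline, not the optimized ARKGraph index.

```python
# Hypothetical on-the-fly baseline: brute-force KGraph over the vectors in the key range.
import numpy as np

def range_kgraph_bruteforce(vectors, keys, lo, hi, K):
    """Return {vector_index: [indices of its K nearest neighbors]} restricted to keys in [lo, hi]."""
    idx = [i for i, key in enumerate(keys) if lo <= key <= hi]
    sub = np.asarray([vectors[i] for i in idx])
    kgraph = {}
    for a, va in zip(idx, sub):
        dists = np.linalg.norm(sub - va, axis=1)   # distances to all vectors in the range
        order = np.argsort(dists)[1:K + 1]         # skip the vector itself (distance 0)
        kgraph[a] = [idx[j] for j in order]
    return kgraph

# Toy usage: 2-D vectors with integer timestamps as search keys.
rng = np.random.default_rng(0)
vecs = rng.random((200, 2))
ts = rng.integers(0, 1000, size=200)
print(range_kgraph_bruteforce(vecs, ts, lo=100, hi=400, K=5))
```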
  2. Motivated by the increasing need to understand the distributed algorithmic foundations of large-scale graph computations, we study some fundamental graph problems in a message-passing model for distributed computing where k ≥ 2 machines jointly perform computations on graphs with n nodes (typically, n ≫ k). The input graph is assumed to be initially randomly partitioned among the k machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds of the computation. Our main contribution is the General Lower Bound Theorem, a theorem that can be used to show non-trivial lower bounds on the round complexity of distributed large-scale data computations. This result is established via an information-theoretic approach that relates the round complexity to the minimal amount of information required by machines to solve the problem. Our approach is generic, and this theorem can be used in a "cookbook" fashion to show distributed lower bounds for several problems, including non-graph problems. We present two applications by showing (almost) tight lower bounds on the round complexity of two fundamental graph problems, namely, PageRank computation and triangle enumeration. These applications show that our approach can yield lower bounds for problems where the application of communication complexity techniques seems not obvious or gives weak bounds, including and especially under a stochastic partition of the input. We then present distributed algorithms for PageRank and triangle enumeration with a round complexity that (almost) matches the respective lower bounds; these algorithms exhibit a round complexity that scales superlinearly in k, improving significantly over previous results [Klauck et al., SODA 2015]. Specifically, we show the following results: PageRank: We show a lower bound of Ω̃(n/k²) rounds and present a distributed algorithm that computes an approximation of the PageRank of all the nodes of a graph in Õ(n/k²) rounds. Triangle enumeration: We show that there exist graphs with m edges where any distributed algorithm requires Ω̃(m/k^{5/3}) rounds. This result also implies the first non-trivial lower bound of Ω̃(n^{1/3}) rounds for the congested clique model, which is tight up to logarithmic factors. We then present a distributed algorithm that enumerates all the triangles of a graph in Õ(m/k^{5/3} + n/k^{4/3}) rounds.
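As a rough, hypothetical back-of-the-envelope illustration of why bounds scaling as n/k² (rather than n/k) matter, the snippet below evaluates the two expressions for a few machine counts; the constants and polylog factors hidden by the Õ(·)/Ω̃(·) notation are ignored, and the numbers are made up for illustration.

```python
# Hypothetical comparison of n/k vs. n/k^2 round bounds (constants and polylogs ignored).
def rounds_linear(n, k):       # older n/k-type bounds
    return n / k

def rounds_superlinear(n, k):  # the paper's n/k^2-type bounds
    return n / k**2

n = 10**9
for k in (10, 100, 200, 1000):
    print(f"k={k:5d}  n/k={rounds_linear(n, k):.1e}  n/k^2={rounds_superlinear(n, k):.1e}")
```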
  3. We study several fundamental problems in the k-machine model, a message-passing model for large-scale distributed computations where k ≥ 2 machines jointly perform computations on a large input of size N (typically, N ≫ k). The input is initially partitioned (randomly or in a balanced fashion) among the k machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds of the computation. Our main result is a general technique for designing efficient deterministic distributed algorithms in the k-machine model using PRAM algorithms. Our technique works by efficiently simulating PRAM algorithms in the k-machine model in a deterministic way. This simulation allows us to arrive at new algorithms in the k-machine model for some problems for which no efficient k-machine algorithms were known before, and also to improve on existing results in the k-machine model for some problems. While our simulation allows us to obtain k-machine algorithms for any problem with a known PRAM algorithm, we mainly focus on graph problems. For an input graph on n vertices and m edges, we obtain Õ(m/k²)-round algorithms for various graph problems such as r-connectivity for r = 1, 2, 3, 4, minimum spanning tree (MST), maximal independent set (MIS), (Δ + 1)-coloring, maximal matching, ear decomposition, and spanners, under the assumption that the edges of the input graph are partitioned (randomly, or in an arbitrary, but balanced, fashion) among the k machines. For problems such as connectivity and MST, the above bound is (essentially) the best possible (up to logarithmic factors). Our simulation technique allows us to obtain the first known efficient deterministic algorithms in the k-machine model for other problems with known deterministic PRAM algorithms.
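To convey the generic idea behind such a simulation, here is a minimal, hypothetical sketch: shared PRAM memory cells are assigned to machines by their address, and a single PRAM step is emulated by exchanging read requests, replies, and writes among the machines. The careful message routing and deterministic load balancing that make the actual simulation run in Õ(m/k²) rounds are omitted entirely; all names are illustrative.

```python
# Hypothetical sketch of simulating one PRAM step on k machines (routing/load balancing omitted).
K_MACHINES = 4

def owner(addr):
    """Which machine stores a given shared-memory address (simple modular assignment)."""
    return addr % K_MACHINES

class KMachineMemory:
    def __init__(self):
        # Each machine holds a shard of the shared PRAM memory.
        self.shards = [dict() for _ in range(K_MACHINES)]

    def pram_step(self, reads, writes):
        """Simulate one PRAM step: serve all reads, then apply all writes."""
        # "Round 1": route read requests to the owning machines and collect replies.
        values = {addr: self.shards[owner(addr)].get(addr, 0) for addr in reads}
        # "Round 2": route write requests to the owning machines and apply them.
        for addr, val in writes.items():
            self.shards[owner(addr)][addr] = val
        return values

mem = KMachineMemory()
mem.pram_step(reads=[], writes={0: 7, 5: 9})
print(mem.pram_step(reads=[0, 5], writes={}))   # {0: 7, 5: 9}
```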
  4. Over the last two decades, frameworks for distributed-memory parallel computation, such as MapReduce, Hadoop, Spark and Dryad, have gained significant popularity with the growing prevalence of large network datasets. The Massively Parallel Computation (MPC) model is the de-facto standard for studying graph algorithms in these frameworks theoretically. Subgraph counting is one such fundamental problem in analyzing massive graphs, with the main algorithmic challenges centering on designing methods which are both scalable and accurate. Given a graph G = (V, E) with n vertices, m edges and T triangles, our first result is an algorithm that outputs a (1+ε)-approximation to T, with asymptotically optimal round and total space complexity provided any S ≥ max{√m, n²/m} space per machine and assuming T = Ω(√(m/n)). Our result gives a quadratic improvement on the bound on T over previous works. We also provide a simple extension of our result to counting any subgraph of size k for constant k ≥ 1. Our second result is an O_δ(log log n)-round algorithm for exactly counting the number of triangles, whose total space usage is parametrized by the arboricity α of the input graph. We extend this result to exactly counting k-cliques for any constant k. Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most 5 can be implemented in the MPC model in Õ_δ(√(log n)) rounds, O(n^δ) space per machine and O(mα³) total space. In addition to our theoretical results, we simulate our triangle counting algorithms on real-world graphs obtained from the Stanford Network Analysis Project (SNAP) database. Our results show that both our approximate and exact counting algorithms exhibit improvements in terms of round complexity and approximation ratio, respectively, compared to two previous widely used algorithms for these problems.
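For orientation, here is a minimal, hypothetical single-machine baseline for exact triangle counting: orient each edge from its lower-degree endpoint to its higher-degree endpoint and count common out-neighbors. This degree-ordering trick is the classical way to make running time sensitive to sparsity/arboricity; it is not the paper's MPC algorithm, and the toy graph is made up.

```python
# Hypothetical single-machine baseline: exact triangle counting via degree-ordered orientation.
from collections import defaultdict

def count_triangles(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    rank = {v: (len(adj[v]), v) for v in adj}                         # degree order, ties broken by id
    fwd = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}  # edges oriented "upward"
    return sum(len(fwd[u] & fwd[v]) for u in fwd for v in fwd[u])     # common out-neighbors close triangles

# Toy usage: a 4-cycle plus one chord contains exactly 2 triangles.
print(count_triangles([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]))  # 2
```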
  5. Motivated by an attempt to understand the formation and development of (human) language, we introduce a "distributed compression" problem. In our problem a sequence of pairs of players from a set of K players are chosen and tasked to communicate messages drawn from an unknown distribution Q. Arguably languages are created and evolve to compress frequently occurring messages, and we focus on this aspect. The only knowledge that players have about the distribution Q is from previously drawn samples, but these samples differ from player to player. The only common knowledge between the players is restricted to a common prior distribution P and some constant number of bits of information (such as a learning algorithm). Letting T_ε denote the number of iterations it would take for a typical player to obtain an ε-approximation to Q in total variation distance, we ask whether T_ε iterations suffice to compress the messages down roughly to their entropy, and give a partial positive answer. We show that a natural uniform algorithm can compress the communication down to an average cost per message of O(H(Q) + log(D(P || Q) + O(1))) in Õ(T_ε) iterations while allowing for O(ε)-error, where D(· || ·) denotes the KL-divergence between distributions. For large divergences this compares favorably with the static algorithm that ignores all samples and compresses down to H(Q) + D(P || Q) bits, while not requiring the T_ε · K iterations it would take players to develop optimal but separate compressions for each pair of players. Along the way we introduce a "data-structural" view of the task of communicating with a natural language and show that our natural algorithm can also be implemented by an efficient data structure, whose storage is comparable to the storage requirements of Q and whose query complexity is comparable to the lengths of the messages to be compressed. Our results give a plausible mathematical analogy to the mechanisms by which human languages get created and evolve, and in particular highlight the possibility of coordination towards a joint task (agreeing on a language) while engaging in distributed learning.
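To make the headline comparison concrete, the hypothetical snippet below evaluates the two per-message costs from the abstract for a toy prior P and true distribution Q: roughly H(Q) + D(P || Q) bits for the static scheme versus roughly H(Q) + log D(P || Q) bits for the adaptive one (constants hidden by the O(·) are ignored). The distributions are made up solely to make the divergence noticeably large.

```python
# Hypothetical numerical illustration of the two costs in the abstract (constants ignored):
#   static:   ~ H(Q) + D(P || Q)      bits/message
#   adaptive: ~ H(Q) + log D(P || Q)  bits/message
import math

def entropy(q):
    return -sum(p * math.log2(p) for p in q if p > 0)

def kl_divergence(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

Q = [0.6, 0.3, 0.09, 0.009, 0.001]           # toy "true" message distribution
P = [0.001, 0.001, 0.001, 0.001, 0.996]      # toy common prior, badly mismatched with Q

H = entropy(Q)
D = kl_divergence(P, Q)                      # D(P || Q), following the abstract's notation
print(f"H(Q) = {H:.2f} bits, D(P || Q) = {D:.2f} bits")
print(f"static   ~ H(Q) + D      = {H + D:.2f} bits/message")
print(f"adaptive ~ H(Q) + log D  = {H + math.log2(D):.2f} bits/message")
```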