A Simple Algorithm for kNN Sampling in General Metrics

Gardner, Kirk P.; Sheehy, Donald R.

Finding the kth nearest neighbor to a query point is a ubiquitous operation in many types of metric computations, especially those in unsupervised machine learning. In many such cases, the distance to k sample points is used as an estimate of the local density of the sample. In this paper, we give an algorithm that takes a finite metric (P,d) and an integer k and produces a subset S ⊆ P with the property that for any q ∈ P, the distance to the second nearest point of S to q is a constant factor approximation to the distance to the kth nearest point of P to q. Thus, the sample S may be used in lieu of P. In addition to being much smaller than P, the distance queries on S only require finding the second nearest neighbor instead of the kth nearest neighbor. This is a significant improvement, especially because theoretical guarantees on kth nearest neighbor methods often require k to grow as a function of the input size n.
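The guarantee stated in the abstract can be checked empirically for any candidate sample. The sketch below is not the paper's algorithm, which this record does not describe. It only measures, for every q in P, the ratio between the distance to the second nearest point of S and the distance to the kth nearest point of P. The function names, the brute-force search, and the example sample S (every other point on a line) are illustrative assumptions. The sketch also assumes that q counts as its own nearest point at distance 0, a convention the abstract does not fix.

```python
def kth_nearest_distance(q, points, k, dist):
    """Distance from q to its k-th nearest point in `points` (k is 1-indexed).

    Assumption: if q itself is in `points`, it counts as the nearest
    point at distance 0. Brute force: sorts all distances from q.
    """
    distances = sorted(dist(q, p) for p in points)
    if k < 1 or k > len(distances):
        raise ValueError("k must be between 1 and the number of points")
    return distances[k - 1]


def approximation_ratios(P, S, k, dist):
    """For each q in P, return d_2(q, S) / d_k(q, P).

    A candidate sample S satisfies a constant-factor guarantee on this
    input when all returned ratios lie within a fixed interval.
    """
    ratios = []
    for q in P:
        d_k = kth_nearest_distance(q, P, k, dist)
        d_2 = kth_nearest_distance(q, S, 2, dist)
        if d_k == 0:
            # Happens for k = 1 or duplicate points; the ratio is undefined.
            raise ValueError("k-th nearest distance is zero; use k >= 2 and distinct points")
        ratios.append(d_2 / d_k)
    return ratios


# Hypothetical example: ten points on a line, with an arbitrary hand-picked sample.
P = list(range(10))
S = P[::2]  # every other point; NOT produced by the paper's algorithm
ratios = approximation_ratios(P, S, k=3, dist=lambda a, b: abs(a - b))
print(min(ratios), max(ratios))
```

Any metric can be passed as `dist`, since the functions only ever call it on pairs of points. Each query costs O(n log n) because of the sort, so this is suitable for checking small inputs rather than for production use.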

Award ID(s): 2017980

NSF-PAR ID: 10211947

Author(s) / Creator(s): Gardner, Kirk P.; Sheehy, Donald R.

Editor(s): Keil, Mark; Mondal, Debajyoti

Date Published: 2020-01-01

Journal Name: Proceedings of the 32nd Canadian Conference on Computational Geometry

Page Range / eLocation ID: 345-351

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: The DOI is not currently available.
