Keil, Mark; Mondal, Debajyoti
                            (Ed.)
                        
                    
            
                            Finding the kth nearest neighbor to a query point is a ubiquitous operation in many types of metric computations, especially those in unsupervised machine learning. In many such cases, the distance to k sample points is used as an estimate of the local density of the sample. In this paper, we give an algorithm that takes a finite metric (P,d) and an integer k and produces a subset S ⊆ P with the property that for any q ∈ P, the distance to the second nearest point of S to q is a constant factor approximation to the distance to the kth nearest point of P to q. Thus, the sample S may be used in lieu of P. In addition to being much smaller than P, the distance queries on S only require finding the second nearest neighbor instead of the kth nearest neighbor. This is a significant improvement, especially because theoretical guarantees on kth nearest neighbor methods often require k to grow as a function of the input size n. 
                        more » 
                        « less   
                     An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                    