skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Distributed adaptive nearest neighbor classifier: algorithm and theory
When data is of an extraordinarily large size or physically stored in different locations, the distributed nearest neighbor (NN) classifier is an attractive tool for classification. We propose a novel distributed adaptive NN classifier for which the number of nearest neighbors is a tuning parameter stochastically chosen by a data-driven criterion. An early stopping rule is proposed when searching for the optimal tuning parameter, which not only speeds up the computation but also improves the finite sample performance of the proposed algorithm. Convergence rate of excess risk of the distributed adaptive NN classifier is investigated under various sub-sample size compositions. In particular, we show that when the sub-sample sizes are sufficiently large, the proposed classifier achieves the nearly optimal convergence rate. Effectiveness of the proposed approach is demonstrated through simulation studies as well as an empirical application to a real-world dataset.  more » « less
Award ID(s):
2005779
PAR ID:
10450329
Author(s) / Creator(s):
Date Published:
Journal Name:
Statisctics and computing
ISSN:
2197-1706
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We introduce a variant of the k-nearest neighbor classifier in which k is chosen adaptively for each query, rather than being supplied as a parameter. The choice of k depends on properties of each neighborhood, and therefore may significantly vary between different points. For example, the algorithm will use larger k for predicting the labels of points in noisy regions. We provide theory and experiments that demonstrate that the algorithm performs comparably to, and sometimes better than, k-NN with an optimal choice of k. In particular, we bound the convergence rate of our classifier in terms of a lo- cal quantity we call the “advantage”, giving results that are both more general and more accurate than the smoothness-based bounds of earlier nearest neighbor work. Our analysis uses a variant of the uniform convergence theorem of Vapnik- Chervonenkis that is for empirical estimates of conditional probabilities and may be of independent interest. 
    more » « less
  2. In this paper, we study the application of quasi-Newton methods for solving empirical risk minimization (ERM) problems defined over a large dataset. Traditional deterministic and stochastic quasi-Newton methods can be executed to solve such problems; however, it is known that their global convergence rate may not be better than first-order methods, and their local superlinear convergence only appears towards the end of the learning process. In this paper, we use an adaptive sample size scheme that exploits the superlinear convergence of quasi-Newton methods globally and throughout the entire learning process. The main idea of the proposed adaptive sample size algorithms is to start with a small subset of data points and solve their corresponding ERM problem within its statistical accuracy, and then enlarge the sample size geometrically and use the optimal solution of the problem corresponding to the smaller set as an initial point for solving the subsequent ERM problem with more samples. We show that if the initial sample size is sufficiently large and we use quasi-Newton methods to solve each subproblem, the subproblems can be solved superlinearly fast (after at most three iterations), as we guarantee that the iterates always stay within a neighborhood that quasi-Newton methods converge superlinearly. Numerical experiments on various datasets confirm our theoretical results and demonstrate the computational advantages of our method. 
    more » « less
  3. We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path under an arbitrary policy of the system is available. We consider the Nearest Neighbor Q-Learning (NNQL) algorithm to learn the optimal Q function using nearest neighbor regression method. As the main contribution, we provide tight finite sample analysis of the convergence rate. In particular, for MDPs with a d-dimensional state space and the discounted factor in (0, 1), given an arbitrary sample path with “covering time” L, we establish that the algorithm is guaranteed to output an "-accurate estimate of the optimal Q-function nearly optimal sample complexity. 
    more » « less
  4. Application-layer transfer configurations play a crucial role in achieving desirable performance in high-speed networks. However, finding the optimal configuration for a given transfer task is a difficult problem as it depends on various factors including dataset characteristics, network settings, and background traffic. The state-of-the-art transfer tuning solutions rely on real-time sample transfers to evaluate various configurations and estimate the optimal one. However, existing approaches to run sample transfers incur high delay and measurement errors, thus significantly limit the efficiency of the transfer tuning algorithms. In this paper, we introduce adaptive feed forward deep neural network (DNN) to minimize the error rate of sample transfers without increasing their execution time. We ran 115K file transfers in four different high-speed networks and used their logs to train an adaptive DNN that can quickly and accurately predict the throughput of sample transfers by analyzing instantaneous throughput values. The results gathered in various networks with rich set of transfer configurations indicate that the proposed model reduces error rate by up to 50% compared to the state-of-the-art solutions while keeping the execution time low. We also show that one can further reduce delay or error rate by tuning hyperparameters of the model to meet specific needs of user or application. Finally, transfer learning analysis reveals that the model developed in one network would yield accurate results in other networks with similar transfer convergence characteristics, alleviating the needs to run an extensive data collection and model derivation efforts for each network. 
    more » « less
  5. The weighted nearest neighbors (WNN) estimator has been popularly used as a flexible and easy-to-implement nonparametric tool for mean regression estimation. The bagging technique is an elegant way to form WNN estimators with weights automatically generated to the nearest neighbors (Steele, 2009; Biau et al., 2010); we name the resulting estimator as the distributional nearest neighbors (DNN) for easy reference. Yet, there is a lack of distributional results for such estimator, limiting its application to statistical inference. Moreover, when the mean regression function has higher-order smoothness, DNN does not achieve the optimal nonparametric convergence rate, mainly because of the bias issue. In this work, we provide an in-depth technical analysis of the DNN, based on which we suggest a bias reduction approach for the DNN estimator by linearly combining two DNN estimators with different subsampling scales, resulting in the novel two-scale DNN (TDNN) estimator. The two-scale DNN estimator has an equivalent representation of WNN with weights admitting explicit forms and some being negative. We prove that, thanks to the use of negative weights, the two-scale DNN estimator enjoys the optimal nonparametric rate of convergence in estimating the regression function under the fourth order smoothness condition. We further go beyond estimation and establish that the DNN and two-scale DNN are both asymptotically normal as the subsampling scales and sample size diverge to infinity. For the practical implementation, we also provide variance estimators and a distribution estimator using the jackknife and bootstrap techniques for the two-scale DNN. These estimators can be exploited for constructing valid confidence intervals for nonparametric inference of the regression function. The theoretical results and appealing nite-sample performance of the suggested two-scale DNN method are illustrated with several simulation examples and a real data application. 
    more » « less