skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Award ID contains: 1813160

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Dasgupta, S.; Haghtalab, N. (Ed.)
    We consider a lifelong learning scenario in which a learner faces a neverending and arbitrary stream of facts and has to decide which ones to retain in its limited memory. We introduce a mathematical model based on the online learning framework, in which the learner measures itself against a collection of experts that are also memory-constrained and that reflect different policies for what to remember. Interspersed with the stream of facts are occasional questions, and on each of these the learner incurs a loss if it has not remembered the corresponding fact. Its goal is to do almost as well as the best expert in hindsight, while using roughly the same amount of memory. We identify difficulties with using the multiplicative weights update algorithm in this memory-constrained scenario, and design an alternative scheme whose regret guarantees are close to the best possible. 
    more » « less
  2. We prove asymptotic convergence for a general class of k-means algorithms performed over streaming data from a distribution— the centers asymptotically converge to the set of stationary points of the k-means objective function. To do so, we show that online k-means over a distribution can be interpreted as stochastic gradient descent with a stochastic learning rate schedule. Then, we prove convergence by extending techniques used in optimization literature to handle settings where center-specific learning rates may depend on the past trajectory of the centers. 
    more » « less
  3. Within machine learning, active learning studies the gains in performance made possible by adaptively selecting data points to label. In this work, we show through upper and lower bounds, that for a simple benign setting of well-specified logistic regression on a uniform distribution over a sphere, the expected excess error of both active learning and random sampling have the same inverse proportional dependence on the number of samples. Importantly, due to the nature of lower bounds, any more general setting does not allow a better dependence on the number of samples. Additionally, we show a variant of uncertainty sampling can achieve a faster rate of convergence than random sampling by a factor of the Bayes error, a recent empirical observation made by other work. Qualitatively, this work is pessimistic with respect to the asymptotic dependence on the number of samples, but optimistic with respect to finding performance gains in the constants. 
    more » « less
  4. Recent work introduced the model of learning from discriminative feature feedback, in which a human annotator not only provides labels of instances, but also identifies discriminative features that highlight important differences between pairs of instances. It was shown that such feedback can be conducive to learning, and makes it possible to efficiently learn some concept classes that would otherwise be in- tractable. However, these results all relied upon perfect annotator feedback. In this pa- per, we introduce a more realistic, robust ver- sion of the framework, in which the annotator is allowed to make mistakes. We show how such errors can be handled algorithmically, in both an adversarial and a stochastic setting. In particular, we derive regret bounds in both settings that, as in the case of a perfect an- notator, are independent of the number of features. We show that this result cannot be obtained by a naive reduction from the robust setting to the non-robust setting. 
    more » « less
  5. Detecting overfitting in generative models is an important challenge in machine learning. In this work, we formalize a form of overfitting that we call data-copying – where the gener- ative model memorizes and outputs training samples or small variations thereof. We pro- vide a three sample non-parametric test for detecting data-copying that uses the training set, a separate sample from the target dis- tribution, and a generated sample from the model, and study the performance of our test on several canonical models and datasets. 
    more » « less
  6. We consider the problem of embedding a relation, represented as a directed graph, into Euclidean space. For three types of embeddings motivated by the recent literature on knowledge graphs, we obtain characterizations of which relations they are able to capture, as well as bounds on the minimal dimensionality and precision needed. 
    more » « less
  7. We introduce a variant of the k-nearest neighbor classifier in which k is chosen adaptively for each query, rather than being supplied as a parameter. The choice of k depends on properties of each neighborhood, and therefore may significantly vary between different points. For example, the algorithm will use larger k for predicting the labels of points in noisy regions. We provide theory and experiments that demonstrate that the algorithm performs comparably to, and sometimes better than, k-NN with an optimal choice of k. In particular, we bound the convergence rate of our classifier in terms of a lo- cal quantity we call the “advantage”, giving results that are both more general and more accurate than the smoothness-based bounds of earlier nearest neighbor work. Our analysis uses a variant of the uniform convergence theorem of Vapnik- Chervonenkis that is for empirical estimates of conditional probabilities and may be of independent interest. 
    more » « less
  8. null (Ed.)
    One widely-studied model of teaching calls for a teacher to provide the minimal set of labeled examples that uniquely specifies a target concept. The assumption is that the teacher knows the learner’s hypothesis class, which is often not true of real-life teaching scenarios. We consider the problem of teaching a learner whose representation and hypothesis class are unknown—that is, the learner is a black box. We show that a teacher who does not interact with the learner can do no better than providing random examples. We then prove, however, that with interaction, a teacher can efficiently find a set of teaching examples that is a provably good approximation to the optimal set. As an illustration, we show how this scheme can be used to shrink training sets for any family of classifiers: that is, to find an approximately-minimal subset of training instances that yields the same classifier as the entire set. 
    more » « less