Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to nonfederal websites. Their policies may differ from this site.

Recently there has been significant activity in developing algorithms with provable guarantees for topic modeling. In this work we consider a broad generalization of the traditional topic modeling framework, where we no longer assume that words are drawn i.i.d. and instead view a topic as a complex distribution over sequences of paragraphs. Since one could not hope to even represent such a distribution in general (even if paragraphs are given using some natural feature representation), we aim instead to directly learn a predictor that given a new document, accurately predicts its topic mixture, without learning the distributions explicitly. We present several natural conditions under which one can do this from unlabeled data only, and give efficient algorithms to do so, also discussing issues such as noise tolerance and sample complexity. More generally, our model can be viewed as a generalization of the multiview or cotraining setting in machine learning.

In this work we propose a model where the value of a buyer for some product (like a slice of pizza) is a combination of their personal desire for the product (how hungry they are for pizza) and the quality of the product (how good the pizza is). Sellers in this setting have a twodimensional optimization problem of determining both the quality level at which to make their product (how expensive ingredients to use) and the price at which to sell it. We analyze optimal seller strategies as well as analogs of Walrasian equilibria in this setting. A key question we are interested in is: to what extent will the price of a good be a reliable indicator of the good’s quality? One result we show is that indeed in this model, price will be a surprisingly robust signal for quality under optimal seller behavior. In particular, while the specific quality and price that a seller should choose will depend highly on the specific distribution of buyers, for optimal sellers, price and quality will be linearly related, independent of that distribution. We also show that for the case of multiple buyers and sellers, an analog of Walrasian equilibrium exists inmore »

In this work, we show that for a nontrivial hypothesis class C, we can estimate the distance of a target function f to C (estimate the error rate of the best h∈C) using substantially fewer labeled examples than would be needed to actually {\em learn} a good h∈C. Specifically, we show that for the class C of unions of d intervals on the line, in the active learning setting in which we have access to a pool of unlabeled examples drawn from an arbitrary underlying distribution D, we can estimate the error rate of the best h∈C to an additive error ϵ with a number of label requests that is {\em independent of d} and depends only on ϵ. In particular, we make O((1/ϵ^6)log(1/ϵ)) label queries to an unlabeled pool of size O((d/ϵ^2)log(1/ϵ)). This task of estimating the distance of an unknown f to a given class C is called {\em tolerant testing} or {\em distance estimation} in the testing literature, usually studied in a membership query model and with respect to the uniform distribution. Our work extends that of Balcan et al. (2012) who solved the {\em non}tolerant testing problem for this class (distinguishing the zeroerror case from themore »

We consider a collaborative PAC learning model, in which k players attempt to learn the same underlying concept. We ask how much more information is required to learn an accurate classifier for all players simultaneously. We refer to the ratio between the sample complexity of collaborative PAC learning and its noncollaborative (singleplayer) counterpart as the overhead. We design learning algorithms with O(ln(k)) and O(ln2 (k)) overhead in the personalized and centralized variants our model. This gives an exponential improvement upon the naïve algorithm that does not share information among players. We complement our upper bounds with an Ω(ln(k)) overhead lower bound, showing that our results are tight up to a logarithmic factor.

An important longterm goal in machine learning systems is to build learning agents that, like humans, can learn many tasks over their lifetime, and moreover use information from these tasks to improve their ability to do so efficiently. In this work, our goal is to provide new theoretical insights into the potential of this paradigm. In particular, we propose a lifelong learning framework that adheres to a novel notion of resource efficiency that is critical in many realworld domains where feature evaluations are costly. That is, our learner aims to reuse information from previously learned related tasks to learn future tasks in a featureefficient manner. Furthermore, we consider novel combinatorial ways in which learning tasks can relate. Specifically, we design lifelong learning algorithms for two structurally different and widely used families of target functions: decision trees/lists and monomials/polynomials. We also provide strong featureefficiency guarantees for these algorithms; in fact, we show that in order to learn future targets, we need only slightly more feature evaluations per training example than what is needed to predict on an arbitrary example using those targets. We also provide algorithms with guarantees in an agnostic model where not all the targets are related to eachmore »

We revisit the problem of designing optimal, individually rational matching mechanisms (in a general sense, allowing for cycles in directed graphs), where each player — who is associated with a subset of vertices — matches as many of his own vertices when he opts into the matching mechanism as when he opts out. We offer a new perspective on this problem by considering an arbitrary graph, but assuming that vertices are associated with players at random. Our main result asserts that, under certain conditions, any fixed optimal matching is likely to be individually rational up to lowerorder terms. We also show that a simple and practical mechanism is (fully) individually rational, and likely to be optimal up to lowerorder terms. We discuss the implications of our results for market design in general, and kidney exchange in particular.

We develop the first polynomialtime algorithm for cotraining of homogeneous linear separators under \em weak dependence, a relaxation of the condition of independence given the label. Our algorithm learns from purely unlabeled data, except for a single labeled example to break symmetry of the two classes, and works for any data distribution having an inversepolynomial margin and with center of mass at the origin.

In recent years crowdsourcing has become the method of choice for gathering labeled training data for learning algorithms. Standard approaches to crowdsourcing view the process of acquiring labeled data separately from the process of learning a classifier from the gathered data. This can give rise to computational and statistical challenges. For example, in most cases there are no known computationally efficient learning algorithms that are robust to the high level of noise that exists in crowdsourced data, and efforts to eliminate noise through voting often require a large number of queries per example. In this paper, we show how by interleaving the process of labeling and learning, we can attain computational efficiency with much less overhead in the labeling cost. In particular, we consider the realizable setting where there exists a true target function in F and consider a pool of labelers. When a noticeable fraction of the labelers are perfect, and the rest behave arbitrarily, we show that any F that can be efficiently learned in the traditional realizable PAC model can be learned in a computationally efficient manner by querying the crowd, despite high amounts of noise in the responses. Moreover, we show that this can be donemore »

For a set P of n points in the unit ball b ⊆ R d , consider the problem of finding a small subset T ⊆ P such that its convexhull εapproximates the convexhull of the original set. Specifically, the Hausdorff distance between the convex hull of T and the convex hull of P should be at most ε. We present an efficient algorithm to compute such an ε ′ approximation of size kalg, where ε ′ is a function of ε, and kalg is a function of the minimum size kopt of such an εapproximation. Surprisingly, there is no dependence on the dimension d in either of the bounds. Furthermore, every point of P can be ε approximated by a convexcombination of points of T that is O(1/ε2 )sparse. Our result can be viewed as a method for sparse, convex autoencoding: approximately representing the data in a compact way using sparse combinations of a small subset T of the original data. The new algorithm can be kernelized, and it preserves sparsity in the original input.