skip to main content

Title: Communication-Aware Collaborative Learning
Algorithms for noiseless collaborative PAC learning have been analyzed and optimized in recent years with respect to sample complexity. In this paper, we study collaborative PAC learning with the goal of reducing communication cost at essentially no penalty to the sample complexity. We develop communication efficient collaborative PAC learning algorithms using distributed boosting. We then consider the communication cost of collaborative learning in the presence of classification noise. As an intermediate step, we show how collaborative PAC learning algorithms can be adapted to handle classification noise. With this insight, we develop communication efficient algorithms for collaborative PAC learning robust to classification noise.
Authors:
; ;
Award ID(s):
1815011
Publication Date:
NSF-PAR ID:
10283178
Journal Name:
Proceedings of the AAAI Conference on Artificial Intelligence
Volume:
35
Issue:
8
Page Range or eLocation-ID:
6786-6793
ISSN:
2374-3468
Sponsoring Org:
National Science Foundation
More Like this
  1. We introduce a simple framework for designing private boosting algorithms. We give natural conditions under which these algorithms are differentially private, efficient, and noise-tolerant PAC learners. To demonstrate our framework, we use it to construct noise-tolerant and private PAC learners for large-margin halfspaces whose sample complexity does not depend on the dimension. We give two sample complexity bounds for our large-margin halfspace learner. One bound is based only on differential privacy, and uses this guarantee as an asset for ensuring generalization. This first bound illustrates a general methodology for obtaining PAC learners from privacy, which may be of independent interest.more »The second bound uses standard techniques from the theory of large-margin classification (the fat-shattering dimension) to match the best known sample complexity for differentially private learning of large-margin halfspaces, while additionally tolerating random label noise.« less
  2. Several well-studied models of access to data samples, including statistical queries, local differential privacy and low-communication algorithms rely on queries that provide information about a function of a single sample. (For example, a statistical query (SQ) gives an estimate of $\E_{x\sim D}[q(x)]$ for any choice of the query function $q:X\rightarrow \R$, where $D$ is an unknown data distribution.) Yet some data analysis algorithms rely on properties of functions that depend on multiple samples. Such algorithms would be naturally implemented using $k$-wise queries each of which is specified by a function $q:X^k\rightarrow \R$. Hence it is natural to ask whether algorithmsmore »using $k$-wise queries can solve learning problems more efficiently and by how much. Blum, Kalai, Wasserman~\cite{blum2003noise} showed that for any weak PAC learning problem over a fixed distribution, the complexity of learning with $k$-wise SQs is smaller than the (unary) SQ complexity by a factor of at most $2^k$. We show that for more general problems over distributions the picture is substantially richer. For every $k$, the complexity of distribution-independent PAC learning with $k$-wise queries can be exponentially larger than learning with $(k+1)$-wise queries. We then give two approaches for simulating a $k$-wise query using unary queries. The first approach exploits the structure of the problem that needs to be solved. It generalizes and strengthens (exponentially) the results of Blum \etal \cite{blum2003noise}. It allows us to derive strong lower bounds for learning DNF formulas and stochastic constraint satisfaction problems that hold against algorithms using $k$-wise queries. The second approach exploits the $k$-party communication complexity of the $k$-wise query function.« less
  3. We consider a collaborative PAC learning model, in which k players attempt to learn the same underlying concept. We ask how much more information is required to learn an accurate classifier for all players simultaneously. We refer to the ratio between the sample complexity of collaborative PAC learning and its non-collaborative (single-player) counterpart as the overhead. We design learning algorithms with O(ln(k)) and O(ln2 (k)) overhead in the personalized and centralized variants our model. This gives an exponential improvement upon the naïve algorithm that does not share information among players. We complement our upper bounds with an Ω(ln(k)) overhead lowermore »bound, showing that our results are tight up to a logarithmic factor.« less
  4. In recent years crowdsourcing has become the method of choice for gathering labeled training data for learning algorithms. Standard approaches to crowdsourcing view the process of acquiring labeled data separately from the process of learning a classifier from the gathered data. This can give rise to computational and statistical challenges. For example, in most cases there are no known computationally efficient learning algorithms that are robust to the high level of noise that exists in crowdsourced data, and efforts to eliminate noise through voting often require a large number of queries per example. In this paper, we show how bymore »interleaving the process of labeling and learning, we can attain computational efficiency with much less overhead in the labeling cost. In particular, we consider the realizable setting where there exists a true target function in F and consider a pool of labelers. When a noticeable fraction of the labelers are perfect, and the rest behave arbitrarily, we show that any F that can be efficiently learned in the traditional realizable PAC model can be learned in a computationally efficient manner by querying the crowd, despite high amounts of noise in the responses. Moreover, we show that this can be done while each labeler only labels a constant number of examples and the number of labels requested per example, on average, is a constant. When no perfect labelers exist, a related task is to find a set of the labelers which are good but not perfect. We show that we can identify all good labelers, when at least the majority of labelers are good.« less
  5. We consider the PAC learnability of the functions at the nodes of a discrete networked dynamical system, assuming that the underlying network is known. We provide tight bounds on the sample complexity of learning threshold functions. We establish a computational intractability result for efficient PAC learning of such functions. We develop efficient consistent learners when the number of negative examples is small. Using synthetic and real-world networks, we experimentally study how the network structure and sample complexity influence the quality of inference.