- NSF-PAR ID:
- 10057475
- Date Published:
- Journal Name:
- NIPS 2017
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Motivated by an attempt to understand the formation and development of (human) language, we introduce a "distributed compression" problem. In our problem a sequence of pairs of players from a set of K players are chosen and tasked to communicate messages drawn from an unknown distribution Q. Arguably languages are created and evolve to compress frequently occurring messages, and we focus on this aspect. The only knowledge that players have about the distribution Q is from previously drawn samples, but these samples differ from player to player. The only common knowledge between the players is restricted to a common prior distribution P and some constant number of bits of information (such as a learning algorithm). Letting T_eps denote the number of iterations it would take for a typical player to obtain an eps-approximation to Q in total variation distance, we ask whether T_eps iterations suffice to compress the messages down roughly to their entropy and give a partial positive answer. We show that a natural uniform algorithm can compress the communication down to an average cost per message of O(H(Q) + log (D(P || Q) + O(1)) in $\tilde{O}(T_eps)$ iterations while allowing for O(eps)-error, where D(. || .) denotes the KL-divergence between distributions. For large divergences this compares favorably with the static algorithm that ignores all samples and compresses down to H(Q) + D(P || Q) bits, while not requiring (T_eps . K) iterations that it would take players to develop optimal but separate compressions for each pair of players. Along the way we introduce a "data-structural" view of the task of communicating with a natural language and show that our natural algorithm can also be implemented by an efficient data structure, whose storage is comparable to the storage requirements of Q and whose query complexity is comparable to the lengths of the message to be compressed. Our results give a plausible mathematical analogy to the mechanisms by which human languages get created and evolve, and in particular highlights the possibility of coordination towards a joint task (agreeing on a language) while engaging in distributed learning.more » « less
-
With the dramatic growth of data in both amount and scale, distributed machine learning has become an important tool for the massive data to finish the tasks as prediction, classification, etc. However, due to the practical physical constraints and the potential privacy leakage of data, it is infeasible to aggregate raw data from all data owners or the learning purpose. To tackle this problem, the distributed privacy-preserving learning approaches are introduced to learn over all distributed data without exposing the real information. However, existing approaches have limits on the complicated distributed system. On the one hand, traditional privacy-preserving learning approaches rely on heavy cryptographic primitives on training data, in which the learning speed is dramatically slowed down due to the computation overheads. On the other hand, the complicated system architecture becomes a barrier in the practical distributed system. In this paper, we propose an efficient privacy-preserving machine learning scheme for hierarchical distributed systems. We modify and improve the collaborative learning algorithm. The proposed scheme not only reduces the overhead for the learning process but also provides the comprehensive protection for each layer of the hierarchical distributed system. In addition, based on the analysis of the collaborative convergency in different learning groups, we also propose an asynchronous strategy to further improve the learning efficiency of hierarchical distributed system. At the last, extensive experiments on real-world data are implemented to evaluate the privacy, efficacy, and efficiency of our proposed schemes.more » « less
-
We consider the problem of accurately recovering a matrix B of size M by M, which represents a probability distribution over M^2 outcomes, given access to an observed matrix of "counts" generated by taking independent samples from the distribution B. How can structural properties of the underlying matrix B be leveraged to yield computationally efficient and information theoretically optimal reconstruction algorithms? When can accurate reconstruction be accomplished in the sparse data regime? This basic problem lies at the core of a number of questions that are currently being considered by different communities, including building recommendation systems and collaborative filtering in the sparse data regime, community detection in sparse random graphs, learning structured models such as topic models or hidden Markov models, and the efforts from the natural language processing community to compute "word embeddings". Many aspects of this problem---both in terms of learning and property testing/estimation and on both the algorithmic and information theoretic sides---remain open. Our results apply to the setting where B has a low rank structure. For this setting, we propose an efficient (and practically viable) algorithm that accurately recovers the underlying M by M matrix using O(M) samples} (where we assume the rank is a constant). This linear sample complexity is optimal, up to constant factors, in an extremely strong sense: even testing basic properties of the underlying matrix (such as whether it has rank 1 or 2) requires Omega(M) samples. Additionally, we provide an even stronger lower bound showing that distinguishing whether a sequence of observations were drawn from the uniform distribution over M observations versus being generated by a well-conditioned Hidden Markov Model with two hidden states requires Omega(M) observations, while our positive results for recovering B immediately imply that Omega(M) observations suffice to learn such an HMM. This lower bound precludes sublinear-sample hypothesis tests for basic properties, such as identity or uniformity, as well as sublinear sample estimators for quantities such as the entropy rate of HMMs.more » « less
-
We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is, algorithms that change its exploration policy as infrequently as possible during regret minimization. This is motivated by the difficulty of running fully adaptive algorithms in real-world applications (such as medical domains), and we propose to quantify adaptivity using the notion of local switching cost. Our main contribution, Q-Learning with UCB2 exploration, is a model-free algorithm for H-step episodic MDP that achieves sublinear regret whose local switching cost in K episodes is O(H3SA log K), and we provide a lower bound of Ω(HSA) on the local switching cost for any no-regret algorithm. Our algorithm can be naturally adapted to the concurrent setting [13], which yields nontrivial results that improve upon prior work in certain aspects.more » « less
-
Abstract Natural language helps express mathematical thinking and contexts. Conventional mathematical notation (CMN) best suits expressions and equations. Each is essential; each also has limitations, especially for learners. Our research studies how programming can be a advantageous third language that can also help restore mathematical connections that are hidden by topic‐centred curricula. Restoring opportunities for surprise and delight reclaims mathematics' creative nature. Studies of children's use of language in mathematics and their programming behaviours guide our iterative design/redesign of mathematical microworlds in which students, ages 7–11, use programming in their regular school lessons
as a language for learning mathematics . Though driven by mathematics, not coding, the microworlds develop the programming over time so that it continues to support children's developing mathematical ideas. This paper briefly describes microworlds EDC has tested with well over 400 7‐to‐8‐year‐olds in school, and others tested (or about to be tested) with over 200 8‐to‐11‐year‐olds. Our challenge was to satisfy schools' topical orientation and fit easily within regular classroom study but use and foreshadow other mathematical learning to remove the siloes. The design/redesign research and evaluation is exploratory, without formal methodology. We are also more formally studying effects on children's learning. That ongoing study is not reported here.Practitioner notes What is already known
Active learning—
doing —supports learning.Collaborative learning—doing
together —supports learning.Classroom discourse—focused, relevant
discussion , not just listening—supports learning.Clear articulation of one's thinking, even just to oneself, helps develop that thinking.
What this paper adds
The common languages we use for classroom mathematics—natural language for conveying the meaning and context of mathematical situations and for explaining our reasoning; and the formal (written) language of conventional mathematical notation, the symbols we use in mathematical expressions and equations—are both essential but each presents hurdles that necessitate the other. Yet, even together, they are insufficient especially for young learners.
Programming, appropriately designed and used, can be the third language that both reduces barriers and provides the missing expressive and creative capabilities children need.
Appropriate design for use in regular mathematics classrooms requires making key mathematical content obvious, strong and the ‘driver’ of the activities, and requires reducing tech ‘overhead’ to near zero.
Continued usefulness across the grades requires developing children's sophistication and knowledge with the language; the powerful ways that children rapidly acquire facility with (natural) language provides guidance for ways they can learn a formal language as well.
Implications for policy and/or practice
Mathematics teaching can take advantage of the ways children learn through experimentation and attention to the results, and of the ways children use their language brain even for mathematics.
In particular, programming—in microworlds driven by the mathematical content, designed to minimise distraction and overhead, open to exploration and discovery
en route to focused aims, and in which childrenself ‐evaluate—can allow clear articulation of thought, experimentation with immediate feedback.As it aids the mathematics, it also builds computational thinking and satisfies schools' increasing concerns to broaden access to ideas of computer science.