Title: Teaching a black-box learner
One widely-studied model of teaching calls for a teacher to provide the minimal set of labeled examples that uniquely specifies a target concept. The assumption is that the teacher knows the learner’s hypothesis class, which is often not true of real-life teaching scenarios. We consider the problem of teaching a learner whose representation and hypothesis class are unknown—that is, the learner is a black box. We show that a teacher who does not interact with the learner can do no better than providing random examples. We then prove, however, that with interaction, a teacher can efficiently find a set of teaching examples that is a provably good approximation to the optimal set. As an illustration, we show how this scheme can be used to shrink training sets for any family of classifiers: that is, to find an approximately-minimal subset of training instances that yields the same classifier as the entire set.
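The training-set-shrinking application can be illustrated with a short sketch. What follows is not the paper's algorithm: the greedy rule, the helper names (train, disagreement, shrink_training_set), and the toy threshold learner are assumptions made for illustration, and the sketch claims none of the paper's approximation guarantees.

```python
# A minimal sketch of interactive training-set shrinking, assuming a
# deterministic black-box learner exposed as a function train(data) that
# returns a classifier. The greedy rule below is illustrative only.

def disagreement(h, g, points):
    """Number of points on which classifiers h and g disagree."""
    return sum(1 for x in points if h(x) != g(x))

def shrink_training_set(train, data):
    """Greedily grow a subset of `data` (a list of (x, y) pairs) until the
    black box trained on the subset matches the black box trained on all
    of `data` across every training point."""
    points = [x for x, _ in data]
    target = train(data)                  # classifier from the full set
    subset, remaining = [], list(data)
    while disagreement(train(subset), target, points) > 0:
        # One round of interaction: probe the black box with each candidate
        # and keep the example that most reduces disagreement.
        best = min(remaining, key=lambda ex:
                   disagreement(train(subset + [ex]), target, points))
        subset.append(best)
        remaining.remove(best)
    return subset

# Toy black box: a 1-D learner that thresholds at its smallest positive.
def threshold_learner(data):
    pos = [x for x, y in data if y == 1]
    t = min(pos) if pos else float("inf")
    return lambda x: 1 if x >= t else 0

data = [(i, int(i >= 5)) for i in range(10)]
print(shrink_training_set(threshold_learner, data))   # [(5, 1)]
```

Because the learner is a black box, the only feedback the procedure uses is the classifier returned by each retraining call; nothing about the learner's representation or hypothesis class is assumed.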
Award ID(s):
1813160
NSF-PAR ID:
10274231
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Volume:
97
ISSN:
2640-3498
Page Range / eLocation ID:
1547-1555
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We call a learner super-teachable if a teacher can trim down an iid training set while making the learner learn even better. We provide sharp super-teaching guarantees for two learners: the maximum likelihood estimator for the mean of a Gaussian, and the large-margin classifier in 1D. For general learners, we provide a mixed-integer nonlinear programming-based algorithm to find a super-teaching set. Empirical experiments show that our algorithm is able to find good super-teaching sets for both regression and classification problems. (A toy sketch of the Gaussian-mean case follows below.)
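As a concrete stand-in for the mixed-integer nonlinear program in item 1, the brute-force search below handles the Gaussian-mean case: the teacher, who knows the true mean, keeps the subset whose sample mean lands closest to it. The variable names and tiny problem size are invented for illustration, not the authors' implementation.

```python
# Toy super-teaching for the Gaussian-mean MLE: the teacher knows the true
# mean `theta` and trims the iid sample to the subset whose sample mean is
# closest to it. Brute force replaces the paper's MINLP; illustration only.
import itertools
import random

def mle(sample):
    return sum(sample) / len(sample)      # MLE of a Gaussian mean

theta = 0.0                               # true mean, known to the teacher
random.seed(1)
data = [random.gauss(theta, 1.0) for _ in range(10)]

best = min((s for r in range(1, len(data) + 1)
            for s in itertools.combinations(data, r)),
           key=lambda s: abs(mle(s) - theta))

print(abs(mle(data) - theta))   # learner's error on the full iid sample
print(abs(mle(best) - theta))   # never larger, typically much smaller
```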
  2. We study the problem of active learning with the added twist that the learner is assisted by a helpful teacher. We consider the following natural interaction protocol: at each round, the learner proposes a query asking for the label of an instance x_q, and the teacher provides the requested label y_q along with explanatory information to guide the learning process. In this paper, we view this information in the form of an additional contrastive example (x_c, y_c), where x_c is picked from a set constrained by x_q (e.g., dissimilar instances with the same label). Our focus is to design a teaching algorithm that can provide an informative sequence of contrastive examples to the learner to speed up the learning process. We show that this leads to a challenging sequence-optimization problem in which the algorithm's choices at a given round depend on the history of interactions. We investigate an efficient teaching algorithm that adaptively picks these contrastive examples. We derive strong performance guarantees for our algorithm based on two problem-dependent parameters, and further show that for specific types of active learners (e.g., a generalized binary search learner), the proposed teaching algorithm exhibits strong approximation guarantees. Finally, we illustrate our bounds and demonstrate the effectiveness of our teaching framework via two numerical case studies. (A schematic of the interaction protocol follows below.)
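The query-plus-contrastive-example protocol of item 2 can be written out in a few lines. The teacher's selection rule here (the most dissimilar instance with the same label, over 1-D numeric instances) is only one possible instantiation of the constrained set the abstract mentions, and the learner interface is an assumption of this sketch, not the paper's algorithm.

```python
# Schematic of the interaction protocol in item 2: per round, the learner
# queries x_q; the teacher returns the label y_q plus one contrastive
# example (x_c, y_c) drawn from a pool constrained by x_q. Instances are
# 1-D numbers here, and the selection rule is an illustrative choice.

def teach(learner, label_of, pool, rounds):
    """`learner` must expose query() -> x_q and update(examples)."""
    for _ in range(rounds):
        x_q = learner.query()
        y_q = label_of(x_q)
        # Constrained contrastive set: dissimilar instances, same label.
        candidates = [x for x in pool if label_of(x) == y_q and x != x_q]
        examples = [(x_q, y_q)]
        if candidates:
            x_c = max(candidates, key=lambda x: abs(x - x_q))
            examples.append((x_c, y_q))
        learner.update(examples)
    return learner
```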
  3. A private learner is trained on a sample of labeled points and generates a hypothesis that can be used for predicting the labels of newly sampled points while protecting the privacy of the training set [Kasiviswanathan et al., FOCS 2008]. Past research uncovered that private learners may need to exhibit significantly higher sample complexity than non-private learners, as is the case for learning one-dimensional threshold functions [Bun et al., FOCS 2015; Alon et al., STOC 2019]. We explore prediction as an alternative to learning. A predictor answers a stream of classification queries instead of outputting a hypothesis. Earlier work considered a private prediction model with a single classification query [Dwork and Feldman, COLT 2018]. We observe that when answering a stream of queries, a predictor must modify the hypothesis it uses over time, and in a manner that cannot rely solely on the training set. We introduce private everlasting prediction, which takes into account the privacy of both the training set and the (adaptively chosen) queries made to the predictor. We then present a generic construction of private everlasting predictors in the PAC model. The sample complexity of the initial training sample in our construction is quadratic (up to polylog factors) in the VC dimension of the concept class. Our construction allows prediction for all concept classes with finite VC dimension, and in particular threshold functions over infinite domains, for which (traditional) private learning is known to be impossible. (A schematic predictor interface follows below.)
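To make the prediction-versus-learning distinction in item 3 concrete, here is a schematic of a predictor that answers queries rather than releasing a hypothesis. The subsample-and-aggregate vote with a randomized answer is a generic privacy heuristic chosen for illustration only: it is not the paper's construction, claims no formal differential-privacy guarantee, and omits the key "everlasting" ingredient of refreshing the hypotheses as queries arrive.

```python
# Schematic of answering a stream of classification queries instead of
# releasing a hypothesis (item 3). Illustration only: no formal privacy
# guarantee is claimed, and the hypothesis-refreshing step that makes
# prediction "everlasting" in the paper is omitted.
import random

class StreamPredictor:
    def __init__(self, train, data, k=5, flip_prob=0.1):
        random.shuffle(data)
        chunk = len(data) // k
        # k hypotheses trained on disjoint chunks of the training set.
        self.hyps = [train(data[i * chunk:(i + 1) * chunk]) for i in range(k)]
        self.flip_prob = flip_prob

    def predict(self, x):
        votes = sum(h(x) for h in self.hyps)       # each h returns 0 or 1
        label = int(votes * 2 > len(self.hyps))    # majority vote
        # Randomize so no single training point fixes the answer.
        return 1 - label if random.random() < self.flip_prob else label
```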
  4. In the 21st century, it is of utmost importance for the educator and learner to be mindful of the evolution and application of the factors that govern the mental state. Many studies have revealed that the success of a professional depends strongly on their emotion-management skills: managing themselves and their responsibilities in a demanding environment. Emotionally intelligent professionals are also able to handle challenging situations involving other people. These days, many industries, research establishments, and universities that hire graduate students conduct specialized training to enhance their soft skills, mainly interpersonal skills, so that their employees can perform at their highest potential. One can maximize the gain from soft skills by being well aware of the state of human psychology as developed in the form of emotional intelligence and positive intelligence. Over the last two decades, the concept of emotional intelligence has been popularized by professional personality-coaching groups, and such trainings are heavily attended by marketing professionals and organizational leaders seeking to enhance their capability in the workplace. Emotional intelligence is mainly about being aware of one's mental state and maintaining control of one's actions during various mental states, such as anger, happiness, sadness, and remorse. Aspiring graduate students in science and technology generally lack formal training in understanding human behavior and the traits that can adversely impact their ability to perform and innovate at the highest level. This paper focuses on training graduate students in the popular and practical science of transactional analysis and on assessing their competence in utilizing this knowledge to decipher their own and other people's behavior. Transactional analysis was taught to students via the Student Presentation-Based Effective Teaching (SPET) methodology. Under this approach, graduate students enrolled in the MECH 500 class were given a set of questions to answer through self-reading of the recommended textbook, I'm OK - You're OK by Thomas Harris. Each student answered the assignment questions individually and then worked in a group to prepare a presentation for in-class discussion. Three group discussions were conducted to present different views on the four types of transactions and the underlying human traits. Before the transactional-analysis training, students were also trained in positive-intelligence psychology tools with a similar objective. After the discussion, students were surveyed about the depth of their understanding, and they also reflected on the utility of transactional analysis relative to positive intelligence. More than 75% of students reported gaining high competency in understanding, defining, and utilizing transactional analysis. This study presents insights for positively impacting graduate students' mindsets as they pursue an unpredictable course of research that can at times become very challenging.
  5. Belkin, M.; Kpotufe, S. (Eds.)
    We study the problem of robust learning under clean-label data-poisoning attacks, where the attacker injects an arbitrary set of correctly labeled examples into the training set to fool the algorithm into making mistakes on specific test instances. The learning goal is to minimize the attackable rate (the probability mass of attackable test instances), which is more difficult than optimal PAC learning. As we show, any robust algorithm with a diminishing attackable rate can achieve the optimal dependence on ε in its PAC sample complexity, i.e., O(1/ε). On the other hand, the attackable rate can be large even for some optimal PAC learners, e.g., SVM for linear classifiers. Furthermore, we show that the class of linear hypotheses is not robustly learnable when the data distribution has zero margin, and is robustly learnable in the case of positive margin but requires sample complexity exponential in the dimension. For a general hypothesis class with bounded VC dimension, if the attacker is limited to adding at most t > 0 poison examples, the optimal robust-learning sample complexity grows almost linearly with t. (A toy 1-D illustration of a clean-label attack follows below.)
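Finally, the fragility described in item 5 can be seen in one dimension. The data and the targeted test point below are invented, and the midpoint-threshold rule stands in for a generic large-margin classifier.

```python
# A toy clean-label attack on a 1-D large-margin classifier (item 5). All
# numbers are invented for illustration.

def large_margin_1d(data):
    """Threshold halfway across the margin between the two classes."""
    neg = max(x for x, y in data if y == 0)
    pos = min(x for x, y in data if y == 1)
    t = (neg + pos) / 2
    return lambda x: int(x >= t)

clean = [(0.0, 0), (1.0, 0), (3.0, 1), (4.0, 1)]
print(large_margin_1d(clean)(1.8))     # 0: threshold sits at 2.0

# Inject one CORRECTLY labeled positive near the boundary: the margin
# midpoint moves to 1.55 and the targeted test point 1.8 flips class.
poisoned = clean + [(2.1, 1)]
print(large_margin_1d(poisoned)(1.8))  # 1: misclassified after poisoning
```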