NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Curriculum Design for Machine Learners in Sequential Decision Tasks

https://doi.org/10.1109/TETCI.2018.2829980

Peng, Bei; MacGlashan, James; Loftin, Robert; Littman, Michael L.; Roberts, David L.; Taylor, Matthew E. (August 2018, IEEE Transactions on Emerging Topics in Computational Intelligence)

Full Text Available
Effectively Learning from Pedagogical Demonstrations

Ho, Mark K; Littman, Michael L.; Cushman, Fiery; Austerweil, Joseph L. (January 2018, Proceedings of the Annual Conference of the Cognitive Science Society)

When observing others’ behavior, people use Theory of Mind to infer unobservable beliefs, desires, and intentions. And when showing what activity one is doing, people will modify their behavior in order to facilitate more accurate interpretation and learning by an observer. Here, we present a novel model of how demonstrators act and observers interpret demonstrations corresponding to different levels of recursive social reasoning (i.e. a cognitive hierarchy) grounded in Theory of Mind. Our model can explain how demonstrators show others how to perform a task and makes predictions about how sophisticated observers can reason about communicative intentions. Additionally, we report an experiment that tests (1) how well an observer can learn from demonstrations that were produced with the intent to communicate, and (2) how an observer’s interpretation of demonstrations influences their judgments.
more » « less
Full Text Available
Social is special: A normative framework for teaching with and learning from evaluative feedback

https://doi.org/10.1016/j.cognition.2017.03.006

Ho, Mark K.; MacGlashan, James; Littman, Michael L.; Cushman, Fiery (October 2017, Cognition)

Full Text Available
Interactive Learning from Policy-Dependent Human Feedback

MacGlashan, James; K Ho, Mark; Loftin, Robert; Peng, Bei; Wang, Guan; Roberts, David L.; Taylor, Matthew E.; Littman, Michael L. (July 2017, ICML)

This paper investigates the problem of interactively learning behaviors communicated by a human teacher using positive and negative feedback. Much previous work on this problem has made the assumption that people provide feedback for decisions that is dependent on the behavior they are teaching and is independent from the learner’s current policy. We present empirical results that show this assumption to be false—whether human trainers give a positive or negative feedback for a decision is influenced by the learner’s current policy. Based on this insight, we introduce Convergent Actor-Critic by Humans (COACH), an algorithm for learning from policy-dependent feedback that converges to a local optimum. Finally, we demonstrate that COACH can successfully learn multiple behaviors on a physical robot.
more » « less
Full Text Available
Environment-Independent Task Specifications via GLTL

Littman, Michael L.; Topcu, Ufuk; Fu, Jie; Isbell, Charles; Wen, Min; MacGlashan, James (April 2017, arXiv.org)

We propose a new task-specification language for Markov decision processes that is designed to be an improvement over reward functions by being environment independent. The language is a variant of Linear Temporal Logic (LTL) that is extended to probabilistic specifications in a way that permits approximations to be learned in finite time. We provide several small environments that demonstrate the advantages of our geometric LTL (GLTL) language and illustrate how it can be used to specify standard reinforcement-learning tasks straightforwardly.
more » « less
Full Text Available
Curriculum Design for Machine Learners in Sequential Decision Tasks

Peng, Bei; MacGlashan, James; Loftin, Robert; Littman, Michael L.; Roberts, David L.; Taylor, Matthew E. (April 2017, AAMAS)

Existing machine-learning work has shown that algorithms can bene t from curricula---learning fi rst on simple examples before moving to more difficult examples. While most existing work on curriculum learning focuses on developing automatic methods to iteratively select training examples with increasing difficulty tailored to the current ability of the learner, relatively little attention has been paid to the ways in which humans design curricula. We argue that a better understanding of the human-designed curricula could give us insights into the development of new machine-learning algorithms and interfaces that can better accommodate machine- or human-created curricula. Our work addresses this emerging and vital area empirically, taking an important step to characterize the nature of human-designed curricula relative to the space of possible curricula and the performance benefits that may (or may not) occur.
more » « less
Full Text Available
Initial Progress Toward Development of a Voice-Based Computer-Delivered Motivational Intervention for Heavy Drinking College Students: An Experimental Study

https://doi.org/10.2196/mental.7571

Kahler, Christopher W; Lechner, William J; MacGlashan, James; Wray, Tyler B; Littman, Michael L (January 2017, JMIR Mental Health)

Full Text Available
Teaching by Intervention: Working Backwards, Undoing Mistakes, or Correcting Mistakes?

Ho, M. K.l; Austerweil, J. L. (January 2017, Proceedings of the Cognitive Science Conference)

When teaching, people often intentionally intervene on a learner while it is acting. For instance, a dog owner might move the dog so it eats out of the right bowl, or a coach might intervene while a tennis player is practicing to teach a skill. How do people teach by intervention? And how do these strategies interact with learning mechanisms? Here, we examine one global and two local strategies: working backwards from the end-goal of a task (backwards chaining), placing a learner in a previous state when an incorrect action was taken (undoing), or placing a learner in the state they would be in if they had taken the correct action (correcting). Depending on how the learner interprets an intervention, different teaching strategies result in better learning. We also examine how people teach by intervention in an interactive experiment and find a bias for using local strategies like undoing.
more » « less
Full Text Available
Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning

https://doi.org/10.1007/s10458-015-9283-7

Loftin, Robert; Peng, Bei; MacGlashan, James; Littman, Michael L.; Taylor, Matthew E.; Huang, Jeff; Roberts, David L. (January 2016, Autonomous Agents and Multi-Agent Systems)

Full Text Available
A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans

Peng, Bei; MacGlashan, James; Loftin, Robert; Littman, Michael L.; Roberts, David L.; Taylor, Matthew E. (January 2016, AAMAS)

As robots become pervasive in human environments, it is important to enable users to effectively convey new skills without programming. Most existing work on Interactive Reinforcement Learning focuses on interpreting and incorporating non-expert human feedback to speed up learning; we aim to design a better representation of the learning agent that is able to elicit more natural and effective communication between the human trainer and the learner, while treating human feedback as discrete communication that depends probabilistically on the trainer’s target policy. This work entails a user study where participants train a virtual agent to accomplish tasks by giving reward and/or punishment in a variety of simulated environments. We present results from 60 participants to show how a learner can ground natural language commands and adapt its action execution speed to learn more efficiently from human trainers. The agent’s action execution speed can be successfully modulated to encourage more explicit feedback from a human trainer in areas of the state space where there is high uncertainty. Our results show that our novel adaptive speed agent dominates different fixed speed agents on several measures of performance. Additionally, we investigate the impact of instructions on user performance and user preference in training conditions.
more » « less
Full Text Available

« Prev Next »

Search for: All records