

Title: Toward Stable Asymptotic Learning with Simulated Learners
Simulations of human learning have shown potential for supporting ITS authoring and testing, in addition to other use cases. To date, simulated learner technologies have often failed to reliably reach perfect performance, even after considerable training. In this work we identify an impediment to producing perfect asymptotic learning performance in simulated learners and introduce a significant improvement to the Apprentice Learner Framework that addresses it.
Award ID(s):
1824257
NSF-PAR ID:
10277329
Author(s) / Creator(s):
Editor(s):
Roll I., McNamara D.
Date Published:
Journal Name:
Artificial Intelligence in Education. AIED 2021. Lecture Notes in Computer Science
Volume:
12749
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. As robots become pervasive in human environments, it is important to enable users to effectively convey new skills without programming. Most existing work on Interactive Reinforcement Learning focuses on interpreting and incorporating non-expert human feedback to speed up learning; we aim to design a better representation of the learning agent that is able to elicit more natural and effective communication between the human trainer and the learner, while treating human feedback as discrete communication that depends probabilistically on the trainer’s target policy. This work entails a user study where participants train a virtual agent to accomplish tasks by giving reward and/or punishment in a variety of simulated environments. We present results from 60 participants to show how a learner can ground natural language commands and adapt its action execution speed to learn more efficiently from human trainers. The agent’s action execution speed can be successfully modulated to encourage more explicit feedback from a human trainer in areas of the state space where there is high uncertainty. Our results show that our novel adaptive speed agent dominates different fixed speed agents on several measures of performance. Additionally, we investigate the impact of instructions on user performance and user preference in training conditions. 
  2. This paper presents a practical approach for identifying unknown mechanical parameters, such as mass and friction models of manipulated rigid objects or actuated robotic links, in a succinct manner that aims to improve the performance of policy search algorithms. Key features of this approach are the use of off-the-shelf physics engines and the adaptation of a black-box Bayesian optimization framework for this purpose. The physics engine is used to reproduce in simulation experiments that are performed on a real robot, and the mechanical parameters of the simulated system are automatically fine-tuned so that the simulated trajectories match the real ones. The optimized model is then used for learning a policy in simulation, before safely deploying it on the real robot. Given the well-known limitations of physics engines in modeling real-world objects, it is generally not possible to find a mechanical model that reproduces the real trajectories exactly in simulation. Moreover, there are many scenarios where a near-optimal policy can be found without perfect knowledge of the system. Therefore, searching for a perfect model may not be worth the computational effort in practice. The proposed approach therefore aims to identify a model that is good enough to approximate the value of a locally optimal policy with a certain confidence, instead of spending all the computational resources on searching for the most accurate model. Empirical evaluations, performed in simulation and on a real robotic manipulation task, show that model identification via physics engines can significantly boost the performance of policy search algorithms that are popular in robotics, such as TRPO, PoWER and PILCO, with no additional real-world data.
  3. Educational data mining research has demonstrated that the large volume of learning data collected by modern e-learning systems could be used to recognize student behavior patterns and group students into cohorts with similar behavior. However, few attempts have been made to connect and compare behavioral patterns with known dimensions of individual differences. To what extent is learner behavior defined by known individual differences? Which of them could be a better predictor of learner engagement and performance? Could we use behavior patterns to build a data-driven model of individual differences that could be more useful for predicting critical outcomes of the learning process than traditional models? Our paper attempts to answer these questions using a large volume of learner data collected in an online practice system. We apply a sequential pattern mining approach to build individual models of learner practice behavior and reveal latent student subgroups that exhibit considerably different practice behavior. Using these models, we explored the connections between learner behavior and both the incoming and outgoing parameters of the learning process. Among incoming parameters, we examined traditionally collected individual differences such as self-esteem, gender, and knowledge monitoring skills. We also attempted to bridge the gap between cluster-based behavior pattern models and traditional scale-based models of individual differences by quantifying learner behavior on a latent data-driven scale. Our research shows that this data-driven model of individual differences performs significantly better than traditional models of individual differences in predicting important parameters of the learning process, such as performance and engagement.
  4. Simulated learners represent computational theories of human learning that can be used to evaluate educational technologies, provide practice opportunities for teachers, and advance our theoretical understanding of human learning. A key challenge in working with simulated learners is evaluating the accuracy of the simulation compared to the behavior of real human students. One way this evaluation is done is by comparing the error-rate learning curves from a population of human learners and a corresponding set of simulated learners. In this paper, we argue that this approach misses an opportunity to more accurately capture nuances in learning by treating all errors as the same. We present a simulated learner system, the Apprentice Learner (AL) Architecture, and use this more nuanced evaluation to demonstrate ways in which it does and does not explain and accurately predict student learning in terms of the reduction of different kinds of errors over time as it learns, as human students do, from an Intelligent Tutoring System (ITS).
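The Apprentice Learner abstract above contrasts aggregate error-rate learning curves with curves that separate different kinds of errors. A minimal Python sketch of that bookkeeping follows, assuming each attempt is logged as a (learner, opportunity, outcome) tuple. The error-type labels "slip" and "misconception", the helper names `error_rate_curves` and `curve_gap`, and the toy data are hypothetical illustrations, not the AL Architecture's API or the paper's data. A curve that stays at zero over the final opportunities is what perfect asymptotic performance would look like under this bookkeeping.

```python
from collections import defaultdict


def error_rate_curves(attempts, error_types):
    """Per-opportunity error rate for each error type.

    `attempts` holds (learner_id, opportunity, outcome) tuples, with 1-based
    opportunities and outcome either "correct" or one of `error_types`.
    Returns {error_type: [rate at opportunity 1, 2, ...]}; None marks an
    opportunity with no recorded attempts.
    """
    totals = defaultdict(int)                       # opportunity -> attempts
    errors = defaultdict(lambda: defaultdict(int))  # type -> opportunity -> errors
    for _learner, opportunity, outcome in attempts:
        totals[opportunity] += 1
        if outcome in error_types:
            errors[outcome][opportunity] += 1
    last = max(totals)
    return {
        t: [errors[t][o] / totals[o] if totals.get(o) else None
            for o in range(1, last + 1)]
        for t in error_types
    }


def curve_gap(human, simulated):
    """Mean absolute difference over opportunities where both curves have data."""
    pairs = [(h, s) for h, s in zip(human, simulated)
             if h is not None and s is not None]
    return sum(abs(h - s) for h, s in pairs) / len(pairs) if pairs else float("nan")


# Hypothetical error categories and toy logs, for illustration only.
TYPES = ("slip", "misconception")
human = [("h1", 1, "slip"), ("h2", 1, "misconception"),
         ("h1", 2, "correct"), ("h2", 2, "misconception"),
         ("h1", 3, "correct"), ("h2", 3, "correct")]
simulated = [("s1", 1, "misconception"), ("s2", 1, "misconception"),
             ("s1", 2, "correct"), ("s2", 2, "correct"),
             ("s1", 3, "correct"), ("s2", 3, "correct")]

human_curves = error_rate_curves(human, TYPES)
sim_curves = error_rate_curves(simulated, TYPES)
for t in TYPES:
    gap = curve_gap(human_curves[t], sim_curves[t])
    print(t, human_curves[t], sim_curves[t], round(gap, 3))
```

In the toy data, both populations err on every first attempt, so an aggregate error-rate curve would show no difference at opportunity 1. The split curves show that the simulated learners produce only "misconception" errors there, while the humans split evenly between the two types.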
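The model-identification abstract above tunes a simulator's mechanical parameters until simulated trajectories match measured ones. The sketch below illustrates only that matching loop, under strong simplifications: a hand-written mass-spring-damper integrated with semi-implicit Euler stands in for an off-the-shelf physics engine, a plain grid search stands in for the Bayesian optimization the authors adapt, and the "real" trajectory is synthetic. All function names and parameter values are hypothetical.

```python
import itertools


def simulate(mass, damping, stiffness=10.0, x0=1.0, dt=0.01, steps=300):
    """Roll out an unforced mass-spring-damper with semi-implicit Euler; return positions."""
    x, v = x0, 0.0
    positions = []
    for _ in range(steps):
        a = (-stiffness * x - damping * v) / mass  # Newton's second law
        v += a * dt  # update velocity first (semi-implicit Euler)
        x += v * dt  # then position, using the new velocity
        positions.append(x)
    return positions


def trajectory_error(sim, real):
    """Mean squared position error between two equal-length trajectories."""
    return sum((s - r) ** 2 for s, r in zip(sim, real)) / len(real)


def identify(real, masses, dampings):
    """Grid search for the (mass, damping) pair whose rollout best matches `real`."""
    return min(
        itertools.product(masses, dampings),
        key=lambda md: trajectory_error(simulate(md[0], md[1]), real),
    )


# Synthetic stand-in for trajectories recorded on a real robot (hypothetical values).
real_trajectory = simulate(mass=2.0, damping=0.5)
best = identify(real_trajectory, masses=[0.5, 1.0, 1.5, 2.0, 2.5],
                dampings=[0.0, 0.25, 0.5, 0.75, 1.0])
print("identified (mass, damping):", best)
```

Unlike the approach described above, this search returns the most accurate candidate on the grid rather than stopping at a model that is good enough for the policy being learned.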