Title: Adaptive Metareasoning for Bounded Rational Agents
In computational approaches to bounded rationality, metareasoning enables intelligent agents to optimize their own decision-making process in order to produce effective action in a timely manner. While there have been substantial efforts to develop effective meta-level control for anytime algorithms, existing techniques rely on extensive offline work and impose several restrictive assumptions that diminish their effectiveness and limit their practical utility in the real world. To eliminate these assumptions, adaptive metareasoning enables intelligent agents to adapt to each individual instance of the problem at hand without significant offline preprocessing. Building on our recent work, we first introduce a model-free approach to meta-level control based on reinforcement learning. We then present a meta-level control technique that uses temporal difference learning. Finally, we show empirically that our approach is effective on a common benchmark in meta-level control.
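To make the idea concrete, below is a minimal sketch of model-free meta-level control via temporal difference learning. It assumes a discretized meta-level state (current solution quality, elapsed time), a binary CONTINUE/STOP action set, and a hypothetical time-dependent utility U(q, t) = ALPHA_U * q - BETA_U * t; the utility weights, learning parameters, and the anytime_step/discretize interfaces are illustrative stand-ins, not the paper's implementation.

```python
import random
from collections import defaultdict

CONTINUE, STOP = 0, 1
LR, GAMMA, EPSILON = 0.1, 1.0, 0.1     # TD learning rate, discount, exploration
ALPHA_U, BETA_U = 100.0, 1.0           # hypothetical utility weights

Q = defaultdict(float)                 # Q[(state, action)] -> estimated value

def utility(quality, time):
    # Time-dependent utility: reward for quality minus the cost of time.
    return ALPHA_U * quality - BETA_U * time

def choose_action(state):
    if random.random() < EPSILON:
        return random.choice((CONTINUE, STOP))
    return max((CONTINUE, STOP), key=lambda a: Q[(state, a)])

def run_episode(anytime_step, discretize, max_steps=100):
    """One monitoring episode: anytime_step(t) returns solution quality after
    step t; discretize(q, t) maps the observation to a discrete state."""
    quality, time = 0.0, 0
    for _ in range(max_steps):
        state = discretize(quality, time)
        if choose_action(state) == STOP:
            # Stopping is terminal: the reward is the utility of acting now.
            Q[(state, STOP)] += LR * (utility(quality, time) - Q[(state, STOP)])
            return
        quality = anytime_step(time)
        time += 1
        next_state = discretize(quality, time)
        # TD(0) update for CONTINUE: zero immediate reward, bootstrap ahead.
        best_next = max(Q[(next_state, CONTINUE)], Q[(next_state, STOP)])
        Q[(state, CONTINUE)] += LR * (GAMMA * best_next - Q[(state, CONTINUE)])
```

After training over many problem instances, the greedy policy with respect to Q implements the learned stopping rule, with no offline model of the algorithm's performance profile required.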
Award ID(s):
1813490
NSF-PAR ID:
10097685
Journal Name:
IJCAI-ECAI Workshop on Architectures and Evaluation for Generality, Autonomy and Progress in AI (AEGAP)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Anytime algorithms enable intelligent systems to trade computation time for solution quality. To exploit this crucial ability in real-time decision-making, the system must decide when to interrupt the anytime algorithm and act on the current solution. Existing meta-level control techniques, however, address this problem by relying on significant offline work that diminishes their practical utility and accuracy. We formally introduce an online performance prediction framework that enables meta-level control to adapt to each instance of a problem without any preprocessing. Using this framework, we then present a meta-level control technique and two stopping conditions. Finally, we show that our approach outperforms existing techniques that require substantial offline work. The result is efficient nonmyopic meta-level control that reduces the overhead and increases the benefits of using anytime algorithms in intelligent systems.
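As an illustration of online performance prediction, here is a minimal sketch of a stopping condition that fits a performance profile to the qualities observed so far and stops when the predicted one-step gain in time-dependent utility is non-positive. The saturating curve model q(t) = a * (1 - exp(-b * t)), the linear time cost, and the myopic one-step lookahead are illustrative assumptions; the paper's actual technique is nonmyopic and does not presume this particular curve.

```python
import numpy as np
from scipy.optimize import curve_fit

def perf_curve(t, a, b):
    # Hypothetical saturating performance profile q(t) = a * (1 - exp(-b * t)).
    return a * (1.0 - np.exp(-b * t))

def should_stop(times, qualities, time_cost=0.01):
    """Myopic stopping rule: stop when the predicted utility gain of one more
    step of computation is non-positive, under U(q, t) = q - time_cost * t."""
    if len(times) < 3:
        return False  # too little history to fit the curve yet
    (a, b), _ = curve_fit(perf_curve, np.asarray(times, float),
                          np.asarray(qualities, float),
                          p0=(1.0, 0.1), maxfev=2000)
    t_now, q_now = times[-1], qualities[-1]
    q_next = perf_curve(t_now + 1, a, b)        # predicted next quality
    return (q_next - q_now) - time_cost <= 0.0  # projected change in utility
```

Because the curve is refit online from the observed history, the rule adapts to each problem instance rather than relying on a precomputed performance profile.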
  2. With the increase in the frequency of natural disasters such as hurricanes that disrupt supply from the grid, there is a greater need for resiliency in the electric supply. Rooftop solar photovoltaic (PV) panels along with batteries can provide resiliency to a house during a blackout caused by a natural disaster. Our previous work showed that intelligence can reduce the size of a PV+battery system for the same level of post-blackout service compared to a conventional system that does not employ intelligent control. The intelligent controller proposed is based on model predictive control (MPC), which has two main challenges. One, it requires simple yet accurate models, as it involves real-time optimization. Two, the discrete actuation of residential loads (on/off) makes the underlying optimization problem a mixed-integer program (MIP), which is challenging to solve. An attractive alternative to MPC is reinforcement learning (RL), as the real-time control computation is both model-free and simple. These advantages come with trade-offs: RL requires computationally expensive offline learning, and its performance is sensitive to various design choices. In this work, we propose an RL-based controller. We compare its performance with the MPC controller proposed in our prior work and with a non-intelligent baseline controller. The RL controller is found to provide resiliency performance, by commanding critical loads and batteries, similar to that of MPC, with a significant reduction in computational effort.
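To illustrate the contrast in real-time cost, below is a minimal sketch of a tabular Q-learning controller for post-blackout operation: at each step it decides whether to serve a critical load and whether to charge the battery from PV. The state and reward design, load size, and battery model are hypothetical simplifications, not the paper's controller; the point is that the control step is a table lookup rather than a mixed-integer optimization.

```python
import random
from collections import defaultdict

ACTIONS = [(serve, charge) for serve in (0, 1) for charge in (0, 1)]
LOAD_KW, LR, GAMMA, EPS = 1.0, 0.1, 0.99, 0.1

Q = defaultdict(float)  # Q[(state, action)] -> value estimate

def act(state):
    # Epsilon-greedy over the four discrete (serve load?, charge?) actions.
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def step_env(battery_kwh, pv_kw, action, capacity=10.0):
    # Toy battery/load dynamics: charge from available PV, then optionally
    # discharge to serve a 1 kW critical load for one hour.
    serve, charge = action
    if charge:
        battery_kwh = min(capacity, battery_kwh + pv_kw)
    served = 0
    if serve and battery_kwh >= LOAD_KW:
        battery_kwh -= LOAD_KW
        served = 1
    reward = 1.0 if served else -0.01  # reward keeping the critical load on
    return battery_kwh, reward

def train_step(battery_kwh, pv_kw, discretize):
    # One Q-learning update; the control decision is a table lookup, not a MIP solve.
    state = discretize(battery_kwh, pv_kw)
    action = act(state)
    battery_kwh, reward = step_env(battery_kwh, pv_kw, action)
    next_state = discretize(battery_kwh, pv_kw)
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += LR * (reward + GAMMA * best_next - Q[(state, action)])
    return battery_kwh
```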
  3. Pedagogical planners can provide adaptive support to students in narrative-centered learning environments by dynamically scaffolding student learning and tailoring problem scenarios. Reinforcement learning (RL) is frequently used for pedagogical planning in narrative-centered learning environments. However, RL-based pedagogical planning raises significant challenges due to the scarcity of data for training RL policies. Most prior work has relied on limited-size datasets and offline RL techniques for policy learning. Unfortunately, offline RL techniques do not support on-demand exploration and evaluation, which can adversely impact the quality of induced policies. To address the limitations of data scarcity and offline RL, we propose INSIGHT, an online RL framework for training data-driven pedagogical policies that optimize student learning in narrative-centered learning environments. The INSIGHT framework consists of three components: a narrative-centered learning environment simulator, a simulated student agent, and an RL-based pedagogical planner agent, which uses a reward metric associated with effective student learning processes. The framework enables the generation of synthetic data for on-demand exploration and evaluation of RL-based pedagogical planning. We have implemented INSIGHT with OpenAI Gym for a narrative-centered learning environment testbed with rule-based simulated student agents and a deep Q-learning-based pedagogical planner. Our results show that online deep RL algorithms can induce near-optimal pedagogical policies in the INSIGHT framework, while offline deep RL algorithms only find suboptimal policies even with large amounts of data.
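As a concrete illustration of the planner component, here is a minimal sketch of a deep Q-learning agent that observes a vector summarizing the simulated student's state (a float tensor of length STATE_DIM) and selects a scaffolding action. The state and action dimensions, network architecture, and reward signal are hypothetical stand-ins; INSIGHT's actual environment interface, simulated students, and network are richer.

```python
import random
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA, EPS = 16, 4, 0.99, 0.1

# Q-network: maps a student-state vector to a value per scaffolding action.
q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                      nn.Linear(64, N_ACTIONS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def select_action(state):
    # Epsilon-greedy exploration over pedagogical actions.
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(state).argmax())

def dqn_update(state, action, reward, next_state, done):
    # One-step TD update; a full implementation would add experience replay
    # and a target network to stabilize deep Q-learning.
    q_sa = q_net(state)[action]
    with torch.no_grad():
        max_next = q_net(next_state).max()
    target = reward + GAMMA * max_next * (1.0 - float(done))
    loss = (q_sa - target) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because the simulator can generate transitions on demand, this online update loop can explore freely, which is precisely what offline RL from a fixed dataset cannot do.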
  4. A core capability of intelligent systems is the ability to quickly learn new tasks by drawing on prior experience. Gradient-based (or optimization-based) meta-learning has recently emerged as an effective approach for few-shot learning. In this formulation, meta-parameters are learned in the outer loop, while task-specific models are learned in the inner loop using only a small amount of data from the current task. A key challenge in scaling these approaches is the need to differentiate through the inner-loop learning process, which can impose considerable computational and memory burdens. By drawing upon implicit differentiation, we develop the implicit MAML algorithm, which depends only on the solution to the inner-level optimization and not the path taken by the inner-loop optimizer. This effectively decouples the meta-gradient computation from the choice of inner-loop optimizer. As a result, our approach is agnostic to the choice of inner-loop optimizer and can gracefully handle many gradient steps without vanishing gradients or memory constraints. Theoretically, we prove that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that required to compute a single inner-loop gradient, and with no overall increase in the total computational cost. Experimentally, we show that these benefits of implicit MAML translate into empirical gains on few-shot image recognition benchmarks.
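The key identity behind implicit MAML is that, with a proximally regularized inner problem phi* = argmin_phi L_train(phi) + (lam/2) * ||phi - theta||^2, the implicit function theorem gives the meta-gradient as (I + H/lam)^{-1} grad L_test(phi*), where H is the Hessian of L_train at phi*; it depends only on phi*, not on the optimization path. The sketch below, in JAX, approximates this by solving the linear system with conjugate gradient using only Hessian-vector products, in the spirit of the paper's approach; the function names and solver parameters are illustrative placeholders supplied by the caller.

```python
import jax
import jax.numpy as jnp

def implicit_meta_gradient(train_loss, test_loss, phi_star, lam, cg_iters=20):
    """Approximate the implicit MAML meta-gradient (I + H/lam)^{-1} g at the
    inner solution phi_star, where g = grad test_loss and H = Hessian of
    train_loss. Assumes the regularized inner problem is (locally) convex so
    the system matrix is symmetric positive definite."""
    g = jax.grad(test_loss)(phi_star)
    # Hessian-vector product via forward-over-reverse differentiation.
    hvp = lambda v: jax.jvp(jax.grad(train_loss), (phi_star,), (v,))[1]
    matvec = lambda v: v + hvp(v) / lam  # (I + H/lam) v
    # Plain conjugate gradient; never materializes the Hessian.
    x, r = jnp.zeros_like(g), g
    p, rs = r, jnp.vdot(r, r)
    for _ in range(cg_iters):
        Ap = matvec(p)
        alpha = rs / jnp.vdot(p, Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = jnp.vdot(r, r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x  # approximate meta-gradient with respect to theta
```

Memory scales with a few vectors the size of phi_star rather than with the number of inner-loop steps, which is the decoupling the abstract describes.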
  5. This paper proposes an intelligent multi-agent approach to a real-time strategy game, StarCraft, based on the deep deterministic policy gradient (DDPG) technique. An actor network and a critic network are established to estimate the optimal control actions and the corresponding value functions, respectively. A special reward function is designed based on the agents' own condition and the enemies' information to help the agents make intelligent decisions in the game. Furthermore, to accelerate the learning process, transfer learning techniques are integrated into training. Specifically, the agents are initially trained on a simple task to learn basic combat concepts, such as detour movement, evasion, and joint attack. This experience is then transferred to the target task, a more complex and difficult scenario. Experiments show that the proposed algorithm with transfer learning achieves better performance.
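To ground the two ingredients, the actor-critic network pair and the weight transfer between tasks, here is a minimal sketch in PyTorch. The state/action dimensions, layer sizes, soft-update rate, and checkpoint path are illustrative assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, TAU = 32, 4, 0.005

def make_actor():
    # Deterministic policy: maps a state to a continuous control action.
    return nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                         nn.Linear(128, ACTION_DIM), nn.Tanh())

def make_critic():
    # Q-function: scores a concatenated (state, action) pair.
    return nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.ReLU(),
                         nn.Linear(128, 1))

def soft_update(target_net, source_net, tau=TAU):
    # Polyak averaging of target-network weights, standard in DDPG.
    for t, s in zip(target_net.parameters(), source_net.parameters()):
        t.data.mul_(1.0 - tau).add_(tau * s.data)

# Phase 1: train the actor/critic on the simple source task, then checkpoint.
actor, critic = make_actor(), make_critic()
# ... DDPG training on the simple combat task goes here ...
torch.save({"actor": actor.state_dict(), "critic": critic.state_dict()},
           "source_task.pt")

# Phase 2 (transfer): initialize the target-task networks from the checkpoint
# and continue DDPG training on the complex scenario.
ckpt = torch.load("source_task.pt")
task_actor, task_critic = make_actor(), make_critic()
task_actor.load_state_dict(ckpt["actor"])
task_critic.load_state_dict(ckpt["critic"])
```

Initializing the harder scenario from the source-task weights is what lets the agents start with basic combat behavior already learned, accelerating convergence on the target task.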