Anytime algorithms enable intelligent systems to trade computation time with solution quality. To exploit this crucial ability in real-time decision-making, the system must decide when to interrupt the anytime algorithm and act on the current solution. Existing meta-level control techniques, however, address this problem by relying on significant offline work that diminishes their practical utility and accuracy. We formally introduce an online performance prediction framework that enables meta-level control to adapt to each instance of a problem without any preprocessing. Using this framework, we then present a meta-level control technique and two stopping conditions. Finally, we show that our approach outperforms existing techniques that require substantial offline work. The result is efficient nonmyopic meta-level control that reduces the overhead and increases the benefits of using anytime algorithms in intelligent systems.
more » « less- Award ID(s):
- 1813490
- PAR ID:
- 10097681
- Date Published:
- Journal Name:
- Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
- Page Range / eLocation ID:
- 1499 to 1505
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
In computational approaches to bounded rationality, metareasoning enables intelligent agents to optimize their own decision-making process in order to produce effective action in a timely manner. While there have been substantial efforts to develop effective meta-level control for anytime algorithms, existing techniques rely on extensive offline work, imposing several critical assumptions that diminish their effectiveness and limit their practical utility in the real world. In order to eliminate these assumptions, adaptive metareasoning enables intelligent agents to adapt to each individual instance of the problem at hand without the need for significant offline preprocessing. Building on our recent work, we first introduce a model-free approach to meta-level control based on reinforcement learning. We then present a meta-level control technique that uses temporal difference learning. Finally, we show empirically that our approach is effective on a common benchmark in meta-level control.more » « less
-
Vehicle routing problems (VRPs) can be divided into two major categories: offline VRPs, which consider a given set of trip requests to be served, and online VRPs, which consider requests as they arrive in real-time. Based on discussions with public transit agencies, we identify a real-world problem that is not addressed by existing formulations: booking trips with flexible pickup windows (e.g., 3 hours) in advance (e.g., the day before) and confirming tight pickup windows (e.g., 30 minutes) at the time of booking. Such a service model is often required in paratransit service settings, where passengers typically book trips for the next day over the phone. To address this gap between offline and online problems, we introduce a novel formulation, the offline vehicle routing problem with online bookings. This problem is very challenging computationally since it faces the complexity of considering large sets of requests—similar to offline VRPs—but must abide by strict constraints on running time—similar to online VRPs. To solve this problem, we propose a novel computational approach, which combines an anytime algorithm with a learning-based policy for real-time decisions. Based on a paratransit dataset obtained from our partner transit agency, we demonstrate that our novel formulation and computational approach lead to significantly better outcomes in this service setting than existing algorithms.more » « less
-
Anytime heuristic search algorithms try to find a (potentially suboptimal) solution as quickly as possible and then work to find better and better solutions until an optimal solution is obtained or time is exhausted. The most widely-known anytime search algorithms are based on best-first search. In this paper, we propose a new algorithm, rectangle search, that is instead based on beam search, a variant of breadth-first search. It repeatedly explores alternatives at all depth levels and is thus best-suited to problems featuring deep local minima. Experiments using a variety of popular search benchmarks suggest that rectangle search is competitive with fixed-width beam search and often performs better than the previous best anytime search algorithms.more » « less
-
We propose a novel belief space planning technique for continuous dynamics by viewing the belief system as a hybrid dynamical system with time-driven switching. Our approach is based on the perturbation theory of differential equations and extends sequential action control to stochastic dynamics. The resulting algorithm, which we name SACBP, does not require discretization of spaces or time and synthesizes control signals in near real-time. SACBP is an anytime algorithm that can handle general parametric Bayesian filters under certain assumptions. We demonstrate the effectiveness of our approach in an active sensing scenario and a model-based Bayesian reinforcement learning problem. In these challenging problems, we show that the algorithm significantly outperforms other existing solution techniques including approximate dynamic programming and local trajectory optimization.
-
The predictive monitoring problem asks whether a deployed system is likely to fail over the next T seconds under some environmental conditions. This problem is of the utmost importance for cyber-physical systems, and has inspired real-time architectures capable of adapting to such failures upon forewarning. In this paper, we present a linear model-predictive scheme for the real-time monitoring of linear systems governed by time-triggered controllers and time-varying disturbances. The scheme uses a combination of offline (advance) and online computations to decide if a given plant model has entered a state from which no matter what control is applied, the disturbance has a strategy to drive the system to an unsafe region. Our approach is independent of the control strategy used: this allows us to deal with plants that are controlled using model-predictive control techniques or even opaque machine-learning based control algorithms that are hard to reason with using existing reachable set estimation algorithms. Our online computation reuses the symbolic reachable sets computed offline. The real-time monitor instantiates the reachable set with a concrete state estimate, and repeatedly performs emptiness checks with respect to a safety property. We classify the various alarms raised by our approach in terms of what they imply about the system as a whole. We implement our real-time monitoring approach over numerous linear system benchmarks and show that the computation can be performed rapidly in practice. Furthermore, we also examine the alarms reported by our approach and show how some of the alarms can be used to improve the controller.more » « less