Title: Surveillance Evasion Through Bayesian Reinforcement Learning
We consider a task of surveillance-evading path-planning in a continuous setting. An Evader strives to escape from a 2D domain while minimizing the risk of detection (and immediate capture). The probability of detection is path-dependent and determined by the spatially inhomogeneous surveillance intensity, which is fixed but a priori unknown and gradually learned in the multi-episodic setting. We introduce a Bayesian reinforcement learning algorithm that relies on Gaussian Process regression (to model the surveillance intensity function based on information from prior episodes), numerical methods for Hamilton-Jacobi PDEs (to plan the best continuous trajectories based on the current model), and Confidence Bounds (to balance exploration and exploitation). We use numerical experiments and regret metrics to highlight the significant advantages of our approach compared to traditional graph-based reinforcement learning algorithms.
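The sketch below is a minimal illustration (not the authors' code) of the Gaussian Process regression and confidence-bound ingredients described in the abstract: observations of the unknown surveillance intensity collected in earlier episodes are fit with a GP, and an optimistic (lower-confidence-bound) intensity field is formed, which a continuous path planner (e.g., a Hamilton-Jacobi/Eikonal solver, not shown here) could use as its running cost. All function names, kernel choices, and constants are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.2, signal_var=1.0):
    """Squared-exponential kernel between two sets of 2D points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior(X_obs, y_obs, X_query, noise_var=1e-3):
    """Standard GP regression: posterior mean and standard deviation at query points."""
    K = rbf_kernel(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    K_s = rbf_kernel(X_query, X_obs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = np.clip(rbf_kernel(X_query, X_query).diagonal() - (v ** 2).sum(0), 0.0, None)
    return mean, np.sqrt(var)

# Hypothetical intensity field, used only to simulate observations from earlier episodes.
rng = np.random.default_rng(0)
true_intensity = lambda X: 1.0 + 3.0 * np.exp(-((X - 0.5) ** 2).sum(-1) / 0.05)
X_obs = rng.uniform(0.0, 1.0, size=(40, 2))          # points visited in prior episodes
y_obs = true_intensity(X_obs) + 0.05 * rng.standard_normal(40)

# Evaluate the model on a grid covering the 2D domain.
g = np.linspace(0.0, 1.0, 50)
X_grid = np.stack(np.meshgrid(g, g), -1).reshape(-1, 2)
mean, std = gp_posterior(X_obs, y_obs, X_grid)

# Optimistic (lower-confidence-bound) intensity: uncertain regions look safer,
# which encourages exploration; this field would feed the PDE-based planner.
beta = 2.0  # confidence-bound multiplier (illustrative)
optimistic_intensity = np.clip(mean - beta * std, 1e-6, None)
print("optimistic intensity range:", optimistic_intensity.min(), optimistic_intensity.max())
```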
Award ID(s):
2111522 1645643
PAR ID:
10482593
Author(s) / Creator(s):
; ;
Editor(s):
Ruiz, Francisco; Dy, Jennifer; van de Meent, Jan-Willem
Publisher / Repository:
PMLR
Date Published:
Journal Name:
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
Volume:
206
Page Range / eLocation ID:
8448--8462
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper is devoted to the detection of contingencies in modern power systems. Because the systems we consider fall under the framework of cyber-physical systems, it is necessary to take into consideration the information-processing aspect and the communication networks. A consequence is that noise and random disturbances are unavoidable. The detection problem then becomes one known as quickest detection. In contrast to running the detection problem in a discrete-time setting, which leads to a sequence of detection problems, this work focuses on the problem in a continuous-time setup. We treat stochastic differential equation models. One of the distinct features is that the systems are hybrid, involving both continuous states and discrete events that coexist and interact. The discrete event process is modeled by a continuous-time Markov chain representing random environments that are not represented by a continuous sample path. The quickest detection problem can then be written as an optimal stopping problem. This paper is devoted to finding numerical solutions to the underlying problem. We use a Markov chain approximation method to construct the numerical algorithms. Numerical examples are used to demonstrate the performance.
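As a rough illustration of casting quickest detection as an optimal stopping problem and solving it on a discretized state space (in the spirit of Markov chain approximation methods), the sketch below uses a simple discrete-time Shiryaev-style model rather than the paper's continuous-time regime-switching diffusions; the posterior probability of a change is the state, and the cost structure and all constants are assumptions.

```python
import numpy as np

rho, mu, c = 0.05, 1.0, 0.1           # geometric change-point rate, post-change mean, delay cost
p_grid = np.linspace(0.0, 1.0, 201)   # discretized posterior state
y_grid = np.linspace(-5.0, 6.0, 301)  # observation quadrature grid
dy = y_grid[1] - y_grid[0]

f0 = np.exp(-0.5 * y_grid ** 2) / np.sqrt(2 * np.pi)          # pre-change density N(0, 1)
f1 = np.exp(-0.5 * (y_grid - mu) ** 2) / np.sqrt(2 * np.pi)   # post-change density N(mu, 1)

V = 1.0 - p_grid  # initialize with the stopping cost (false-alarm probability)
for _ in range(500):
    V_new = np.empty_like(V)
    for i, p in enumerate(p_grid):
        prior = p + (1.0 - p) * rho                    # prob. change occurred by next step
        mix = prior * f1 + (1.0 - prior) * f0          # predictive density of next observation
        p_next = prior * f1 / np.maximum(mix, 1e-300)  # Bayes update of the posterior
        cont = c * prior + np.sum(np.interp(p_next, p_grid, V) * mix) * dy
        V_new[i] = min(1.0 - p, cont)                  # stop vs continue
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

# The (approximate) optimal policy stops the first time the posterior enters
# the region where stopping is at least as cheap as continuing.
threshold = p_grid[np.argmax(1.0 - p_grid <= V + 1e-12)]
print("approximate detection threshold on the posterior:", round(float(threshold), 3))
```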
  2. We propose a model-based lifelong reinforcement-learning approach that estimates a hierarchical Bayesian posterior distilling the common structure shared across different tasks. The learned posterior, combined with a sample-based Bayesian exploration procedure, increases the sample efficiency of learning across a family of related tasks. We first derive an analysis of the relationship between the sample complexity and the initialization quality of the posterior in the finite MDP setting. We then scale the approach to continuous-state domains by introducing a Variational Bayesian Lifelong Reinforcement Learning algorithm that can be combined with recent model-based deep RL methods and that exhibits backward transfer. Experimental results on several challenging domains show that our algorithms achieve both better forward and backward transfer performance than state-of-the-art lifelong RL methods.
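A minimal sketch of the sample-based Bayesian exploration ingredient, shown for a single small finite MDP via posterior sampling; the hierarchical, cross-task part of the paper's posterior is omitted, and the toy environment and all constants below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
S, A, H, episodes = 6, 2, 15, 200        # states, actions, horizon, episodes (illustrative)

# Ground-truth MDP, used only to simulate experience.
P_true = rng.dirichlet(np.ones(S), size=(S, A))  # P_true[s, a] is a distribution over s'
R_true = rng.uniform(0.0, 1.0, size=(S, A))

# Posterior statistics: Dirichlet counts for transitions, Gaussian-mean stats for rewards.
trans_counts = np.ones((S, A, S))        # Dirichlet(1, ..., 1) prior
rew_sum = np.zeros((S, A))
rew_n = np.zeros((S, A))

def solve_finite_horizon(P, R, H):
    """Backward induction; returns the greedy policy for each step."""
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = R + P @ V                    # Q[s, a] = R[s, a] + sum_s' P[s, a, s'] V[s']
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy

for ep in range(episodes):
    # Sample one plausible MDP from the current posterior and plan in it.
    P_sample = np.apply_along_axis(rng.dirichlet, 2, trans_counts)
    R_sample = rew_sum / (rew_n + 1.0) + rng.standard_normal((S, A)) / np.sqrt(rew_n + 1.0)
    policy = solve_finite_horizon(P_sample, R_sample, H)

    # Act in the real environment and update the posterior statistics.
    s = 0
    for h in range(H):
        a = policy[h, s]
        s_next = rng.choice(S, p=P_true[s, a])
        r = R_true[s, a] + 0.1 * rng.standard_normal()
        trans_counts[s, a, s_next] += 1
        rew_sum[s, a] += r
        rew_n[s, a] += 1
        s = s_next

print("visited-state coverage:", int((rew_n.sum(axis=1) > 0).sum()), "of", S, "states")
```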
  3. In Video Analytics Pipelines (VAP), Analytics Units (AUs) such as object detection and face recognition operating on remote servers rely heavily on surveillance cameras to capture high-quality video streams to achieve high accuracy. Modern network cameras offer an array of parameters that directly influence video quality. While a few of these parameters, e.g., exposure, focus and white balance, are automatically adjusted by the camera internally, the others are not. We denote such camera parameters as non-automated (NAUTO) parameters. In this work, we first show that in a typical surveillance camera deployment, environmental condition changes can have a significant adverse effect on the accuracy of insights from the AUs, but such adverse impact can potentially be mitigated by dynamically adjusting NAUTO camera parameters in response to changes in environmental conditions. Second, since most end-users lack the skill or understanding to appropriately configure these parameters and typically use a fixed parameter setting, we present CAMTUNER, to our knowledge the first framework that dynamically adapts NAUTO camera parameters to optimize the accuracy of AUs in a VAP in response to adverse changes in environmental conditions. CAMTUNER is based on SARSA reinforcement learning and incorporates two novel components: a lightweight analytics quality estimator and a virtual camera that drastically speed up offline RL training. Our controlled experiments and real-world VAP deployment show that compared to a VAP using the default camera setting, CAMTUNER enhances VAP accuracy by detecting 15.9% additional persons and 2.6%–4.2% additional cars (without any false positives) in a large enterprise parking lot. CAMTUNER opens up new avenues for elevating video analytics accuracy, transcending mere incremental enhancements achieved through refining deep-learning models.
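The sketch below shows a tabular SARSA loop of the kind described, with the analytics-quality estimator and virtual camera replaced by a toy one-parameter simulator; the states, actions, and constants are illustrative assumptions, not the CAMTUNER implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
n_levels = 11                    # discretized settings of one NAUTO parameter (e.g., brightness)
actions = np.array([-1, 0, 1])   # decrease / keep / increase the setting
alpha, gamma, eps = 0.2, 0.9, 0.1
Q = np.zeros((n_levels, len(actions)))

def analytics_quality(level, env_condition):
    """Stand-in for the analytics-quality estimator: quality peaks when the
    parameter setting matches the (slowly drifting) environmental condition."""
    return np.exp(-0.5 * (level - env_condition) ** 2)

def epsilon_greedy(state):
    if rng.random() < eps:
        return int(rng.integers(len(actions)))
    return int(Q[state].argmax())

state = n_levels // 2
action = epsilon_greedy(state)
for t in range(20000):
    env_condition = 5 + 3 * np.sin(2 * np.pi * t / 5000)   # slowly changing environment
    next_state = int(np.clip(state + actions[action], 0, n_levels - 1))
    reward = analytics_quality(next_state, env_condition)
    next_action = epsilon_greedy(next_state)
    # SARSA update: the on-policy TD target uses the action actually taken next.
    Q[state, action] += alpha * (reward + gamma * Q[next_state, next_action] - Q[state, action])
    state, action = next_state, next_action

print("greedy adjustment per parameter level:", Q.argmax(axis=1) - 1)   # -1 / 0 / +1
```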
  4. The goal of offline reinforcement learning is to learn a policy from a fixed dataset, without further interaction with the environment. This setting will be an increasingly important paradigm for real-world applications of reinforcement learning, such as robotics, in which data collection is slow and potentially dangerous. Existing off-policy algorithms have limited performance on static datasets due to extrapolation errors from out-of-distribution actions. This leads to the challenge of constraining the policy to select actions within the support of the dataset during training. We propose to simply learn the Policy in the Latent Action Space (PLAS) such that this requirement is naturally satisfied. We evaluate our method on continuous control benchmarks in simulation and on a deformable object manipulation task with a physical robot. We demonstrate that our method provides consistently competitive performance across various continuous control tasks and different types of datasets, outperforming existing offline reinforcement learning methods with explicit constraints.
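As a rough illustration of constraining a policy to a latent action space learned from the offline data, the sketch below replaces the paper's conditional VAE decoder with a simple PCA-style linear model of the dataset's actions and omits the RL training itself; all names and dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Offline dataset actions: high-dimensional but concentrated near a low-dimensional manifold.
latent_dim, action_dim, n = 2, 6, 5000
mixing = rng.standard_normal((latent_dim, action_dim))
dataset_actions = (np.tanh(rng.standard_normal((n, latent_dim))) @ mixing
                   + 0.01 * rng.standard_normal((n, action_dim)))

# "Decoder": mean + principal directions of the behavior data (stand-in for the CVAE decoder).
mean = dataset_actions.mean(axis=0)
_, _, Vt = np.linalg.svd(dataset_actions - mean, full_matrices=False)
components = Vt[:latent_dim]                                  # (latent_dim, action_dim)
scale = np.abs((dataset_actions - mean) @ components.T).max(axis=0)

def decode(z):
    """Map a bounded latent action z in [-1, 1]^latent_dim to an action that, by
    construction, stays within the span (and range) of the dataset's actions."""
    z = np.clip(z, -1.0, 1.0)
    return mean + (z * scale) @ components

# The latent policy (here: a random stand-in) only ever proposes bounded latent actions,
# so the decoded actions avoid the out-of-distribution extrapolation described above.
z = np.tanh(rng.standard_normal(latent_dim))
print("decoded action:", decode(z))
```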
  5. We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path of the system under an arbitrary policy is available. We consider the Nearest Neighbor Q-Learning (NNQL) algorithm, which learns the optimal Q function using a nearest neighbor regression method. As the main contribution, we provide a tight finite-sample analysis of the convergence rate. In particular, for MDPs with a d-dimensional state space and a discount factor in (0, 1), given an arbitrary sample path with "covering time" L, we establish that the algorithm is guaranteed to output an ε-accurate estimate of the optimal Q-function with nearly optimal sample complexity.
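The sketch below illustrates the nearest-neighbor Q-learning idea on a toy 1D continuous state space: the Q-function is stored only at a finite set of anchor states, and each transition from a single sample path under a random behavior policy updates the nearest anchor. The dynamics, anchor grid, and constants are assumptions, and plain nearest-neighbor lookup stands in for the paper's exact regression scheme.

```python
import numpy as np

rng = np.random.default_rng(4)
gamma, lr = 0.9, 0.2
n_anchors, n_actions = 21, 2
anchors = np.linspace(0.0, 1.0, n_anchors)     # covering set of the state space [0, 1]
Q = np.zeros((n_anchors, n_actions))

def step(s, a):
    """Toy dynamics: action 0 drifts left, action 1 drifts right; reward peaks near s = 0.8."""
    drift = -0.05 if a == 0 else 0.05
    s_next = float(np.clip(s + drift + 0.02 * rng.standard_normal(), 0.0, 1.0))
    reward = np.exp(-50.0 * (s_next - 0.8) ** 2)
    return s_next, reward

def nearest(s):
    return int(np.abs(anchors - s).argmin())

# A single sample path under an arbitrary (here: uniformly random) behavior policy.
s = 0.5
for t in range(50000):
    a = int(rng.integers(n_actions))
    s_next, r = step(s, a)
    i, j = nearest(s), nearest(s_next)
    # Q-learning target evaluated by nearest-neighbor lookup at the next state.
    Q[i, a] += lr * (r + gamma * Q[j].max() - Q[i, a])
    s = s_next

greedy = Q.argmax(axis=1)
print("greedy action near s=0.2 and s=0.9:", greedy[nearest(0.2)], greedy[nearest(0.9)])
```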