We investigate the problem of persistently monitoring a finite set of targets, whose internal states evolve with linear stochastic dynamics, using a finite set of mobile agents. We approach the problem from the infinite-horizon perspective, looking for periodic movement schedules for the agents. Under linear dynamics and standard assumptions on the noise distribution, the optimal estimator is a Kalman-Bucy filter. We show that when the agents are constrained to move only along a line and can observe at most one target at a time, the optimal movement policy is such that each agent is always either moving at maximum speed or dwelling at a fixed position. Periodic trajectories of this form admit a finite parameterization, and we show how to compute a stochastic gradient estimate of the performance with respect to the parameters that define the trajectory using Infinitesimal Perturbation Analysis (IPA). A gradient-descent scheme is used to compute locally optimal parameters. This approach allows us to handle a very long persistent monitoring horizon using a small number of parameters.
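As a toy illustration of the estimation-error limit cycle induced by a periodic visit schedule, the sketch below integrates the scalar Kalman-Bucy (Riccati) error-covariance ODE for a single hypothetical target that is observable only while the agent dwells nearby. All dynamics, noise, and schedule parameters here are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical scalar target: dx = a*x dt + sqrt(q) dW, observed as
# dz = c*x dt + sqrt(r) dV only while the agent dwells at the target.
a, q, c, r = 0.2, 1.0, 1.0, 0.5
dt, period = 1e-3, 4.0        # the agent's schedule repeats every 4 s (assumed)
dwell = 1.5                   # seconds per period the target is visible

def riccati_step(P, observed):
    """One Euler step of the Kalman-Bucy error-covariance ODE:
    Pdot = 2aP + q - observed * (cP)^2 / r."""
    Pdot = 2*a*P + q - (observed * (c*P)**2 / r)
    return P + dt * Pdot

P = 1.0
history = []
for k in range(int(20 * period / dt)):   # run 20 periods to pass transients
    t = (k * dt) % period
    P = riccati_step(P, observed=(t < dwell))
    history.append(P)

# After the transient, the covariance settles into a periodic limit cycle:
# it decays toward the Riccati fixed point while observed, then grows
# open-loop while the agent is away.
last = history[-int(period / dt):]       # one full period at steady state
print(round(min(last), 2), round(max(last), 2))
```

The quantity being traded off by the movement schedule is exactly this cycle: longer dwells shrink the minimum, but the covariance of the unattended targets grows in the meantime.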
Multi-agent infinite horizon persistent monitoring of targets with uncertain states in multi-dimensional environments
This paper investigates the problem of persistent monitoring, where a finite set of mobile agents persistently visits a finite set of targets in a multi-dimensional environment. The agents must estimate the targets’ internal states, and the goal is to minimize the mean squared estimation error over time. The internal states of the targets evolve with linear stochastic dynamics, and thus the optimal estimator is a Kalman-Bucy filter. We constrain the trajectories of the agents to be periodic and represented by a truncated Fourier series. Taking advantage of the periodic nature of this solution, we define the infinite-horizon version of the problem and exploit the property that the mean squared estimation error converges to a limit cycle. We present a technique to compute online the gradient of the steady-state mean squared estimation error of the targets’ states with respect to the parameters defining the trajectories, and use a gradient-descent scheme to obtain locally optimal movement schedules. This scheme allows us to address the infinite-horizon problem with only a small number of parameters to be optimized.
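The truncated-Fourier-series parameterization described above can be sketched as follows for a single agent moving in one dimension. The period, number of harmonics, and coefficient values are illustrative assumptions; the point is that a handful of coefficients pins down a trajectory over an infinite horizon, and periodicity x(0) = x(T) holds automatically for any parameter vector.

```python
import numpy as np

# Hypothetical periodic 1-D agent trajectory represented by a truncated
# Fourier series with K harmonics; theta holds the 2K+1 coefficients
# that a gradient-descent scheme would optimize.
T, K = 10.0, 3                      # period and number of harmonics (assumed)

def position(t, theta):
    """x(t) = a0 + sum_k a_k cos(2*pi*k*t/T) + b_k sin(2*pi*k*t/T)."""
    a0, a, b = theta[0], theta[1:K+1], theta[K+1:]
    w = 2 * np.pi * np.arange(1, K + 1) / T
    return a0 + a @ np.cos(w * t) + b @ np.sin(w * t)

theta = np.zeros(2 * K + 1)
theta[0], theta[1] = 1.0, 0.5       # start near x=1 with a small sweep (assumed)

# Periodicity is built into the parameterization: x(0) == x(T) for any theta.
print(np.isclose(position(0.0, theta), position(T, theta)))  # True
```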
- PAR ID: 10171438
- Date Published:
- Journal Name: Proc. IFAC World Congress
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
This article investigates a stochastic optimal control problem with linear Gaussian dynamics, quadratic performance measure, but non-Gaussian observations. The linear Gaussian dynamics characterizes a large number of interacting agents evolving under a centralized control and external disturbances. The aggregate state of the agents is only partially known to the centralized controller by means of the samples taken randomly in time and from anonymous randomly selected agents. Due to removal of the agent identity from the samples, the observation set has a non-Gaussian structure, and as a consequence, the optimal control law that minimizes a quadratic cost is essentially nonlinear and infinite-dimensional, for any finite number of agents. For infinitely many agents, however, this paper shows that the optimal control law is the solution to a reduced order, finite-dimensional linear quadratic Gaussian problem with Gaussian observations sampled only in time. For this problem, the separation principle holds and is used to develop an explicit optimal control law by combining a linear quadratic regulator with a separately designed finite-dimensional minimum mean square error state estimator. Conditions are presented under which this simple optimal control law can be adopted as a suboptimal control law for finitely many agents.
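The separation principle invoked above can be illustrated with a scalar discrete-time LQG sketch (all system and noise parameters below are assumed, not taken from the article): the regulator gain and the estimator gain come from two independent Riccati equations and are then combined by certainty equivalence.

```python
# Scalar LQG sketch of the separation principle (toy values assumed):
# dynamics x+ = a x + b u + w, measurement y = c x + v.
a, b, c = 0.95, 1.0, 1.0
q, r = 1.0, 1.0                      # LQR state / input weights
w_var, v_var = 0.1, 0.2              # process / measurement noise variances

# 1) LQR: iterate the scalar discrete-time Riccati equation to a fixed point.
P = 1.0
for _ in range(500):
    P = q + a*a*P - (a*b*P)**2 / (r + b*b*P)
K = a*b*P / (r + b*b*P)              # state-feedback gain, u = -K * xhat

# 2) Kalman filter: the steady-state estimator gain solves the dual equation.
S = 1.0
for _ in range(500):
    S = w_var + a*a*S - (a*c*S)**2 / (v_var + c*c*S)
L_gain = a*c*S / (v_var + c*c*S)

# 3) Certainty-equivalence controller:
#    xhat+ = a*xhat + b*u + L_gain*(y - c*xhat),  u = -K*xhat.
# The two gains were designed independently -- that is the separation.
print(round(K, 3), round(L_gain, 3))
```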
-
This paper studies the issue of data-driven optimal control design for traffic signals of oversaturated urban road networks. The signal control system based on the store-and-forward model is generally uncontrollable, so a controllable decomposition is needed. Instead of identifying the unknown parameters, such as saturation rates and turning ratios, a finite number of measured trajectories can be used to parametrize the system and directly construct a transformation matrix for the Kalman controllable decomposition through the fundamental lemma of J. C. Willems. On top of that, an infinite-horizon linear quadratic regulator (LQR) problem is formulated considering the constraints of green times for traffic signals. The problem can be solved through a two-phase data-driven learning process, where one phase solves an infinite-horizon unconstrained LQR problem and the other solves a finite-horizon constrained LQR problem. Simulation results show that the theoretical analysis is effective and that the proposed data-driven controller yields the desired performance for reducing traffic congestion.
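A minimal sketch of the data-driven parametrization via Willems' fundamental lemma, using a toy scalar system (all numbers assumed, not from the paper): a single input/state trajectory under a persistently exciting input is arranged into a stacked Hankel matrix, whose column space then contains every short trajectory of the system, with no model identification step.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = 0.9, 1.0                      # toy scalar system (assumed): x+ = A x + B u
Td, L = 30, 4                        # data length and trajectory window length

# Collect one input/state trajectory with a persistently exciting input.
u = rng.standard_normal(Td)
x = np.zeros(Td + 1)
for t in range(Td):
    x[t+1] = A * x[t] + B * u[t]

def hankel(sig, L):
    """Hankel matrix with L rows built from a scalar signal."""
    n = len(sig) - L + 1
    return np.array([sig[i:i+n] for i in range(L)])

# Fundamental lemma: every length-L input/state trajectory of the system is a
# linear combination of the columns of the stacked Hankel matrix [H(u); H(x)].
H = np.vstack([hankel(u, L), hankel(x[:Td], L)])

# Generate a fresh trajectory and check it lies in the column space of H.
u2 = rng.standard_normal(L)
x2 = np.zeros(L)
x2[0] = 0.3
for t in range(L - 1):
    x2[t+1] = A * x2[t] + B * u2[t]
w = np.concatenate([u2, x2])
g, *_ = np.linalg.lstsq(H, w, rcond=None)
print(np.allclose(H @ g, w))  # expect True when u is persistently exciting
```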
-
We consider the optimal multi-agent persistent monitoring problem defined by a team of cooperating agents visiting a set of nodes (targets) on a graph with the objective of minimizing a measure of overall node state uncertainty. The solution to this problem involves agent trajectories defined both by the sequence of nodes to be visited by each agent and the amount of time spent at each node. We propose a class of distributed threshold-based parametric controllers through which agent transitions from one node to the next are controlled by thresholds on the node uncertainty. The resulting behavior of the agent-target system is described by a hybrid dynamic system. This enables the use of Infinitesimal Perturbation Analysis (IPA) to determine on-line optimal threshold parameters through gradient descent and thus obtain optimal controllers within this family of threshold-based policies. Simulations are included to illustrate our results and compare them to optimal solutions derived through dynamic programming.
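A minimal sketch of such a threshold-based policy (growth rates, decay rate, thresholds, and initial uncertainties are all illustrative assumptions): the agent dwells at a node until that node's uncertainty drops below its threshold, then jumps to the most uncertain node. The accumulated total uncertainty is the cost that an IPA-driven gradient descent would tune the threshold parameters to reduce.

```python
import numpy as np

# Hypothetical 3-node instance with per-node uncertainty dynamics (assumed):
# uncertainty grows linearly when a node is unvisited and decays while visited.
grow = np.array([1.0, 0.8, 1.2])     # growth rates while unattended
shrink = 3.0                         # gross decay rate while the agent is present
theta = np.array([0.5, 0.5, 0.5])    # threshold parameters (the IPA tuning knobs)
R = np.array([4.0, 3.0, 5.0])        # initial node uncertainties
dt, current = 0.01, 2                # start at the most uncertain node

cost = 0.0
for _ in range(int(50 / dt)):        # simulate 50 time units
    R += grow * dt                   # all uncertainties accumulate
    R[current] = max(0.0, R[current] - shrink * dt)  # visited node decays (net)
    if R[current] <= theta[current]: # threshold event: switch nodes
        current = int(np.argmax(R))
    cost += R.sum() * dt             # running integral of total uncertainty

print(round(cost / 50, 2))           # average total uncertainty under this policy
```

The threshold events make the closed loop a hybrid system, which is what lets IPA propagate cost sensitivities through each event to the parameters theta.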
-
We propose a framework for developing wall models for large-eddy simulation that is able to capture pressure-gradient effects using multi-agent reinforcement learning. Within this framework, the distributed reinforcement learning agents receive off-wall environmental states, including pressure gradient and turbulence strain rate, ensuring adaptability to a wide range of flows characterized by pressure-gradient effects and separations. Based on these states, the agents determine an action to adjust the wall eddy viscosity and, consequently, the wall-shear stress. The model training is in situ with wall-modeled large-eddy simulation grid resolutions and does not rely on the instantaneous velocity fields from high-fidelity simulations. Throughout the training, the agents compute rewards from the relative error in the estimated wall-shear stress, which allows them to refine an optimal control policy that minimizes prediction errors. Employing this framework, wall models are trained for two distinct subgrid-scale models using low-Reynolds-number flow over periodic hills. These models are validated through simulations of flows over periodic hills at higher Reynolds numbers and flows over the Boeing Gaussian bump. The developed wall models successfully capture the acceleration and deceleration of wall-bounded turbulent flows under pressure gradients and outperform the equilibrium wall model in predicting skin friction.