Duality between estimation and optimal control is a problem of rich historical significance. The first duality principle appears in the seminal paper of Kalman and Bucy, where the problem of minimum variance estimation is shown to be dual to a linear quadratic (LQ) optimal control problem. Duality offers a constructive proof technique to derive the Kalman filter equation from the optimal control solution. This paper generalizes the classical duality result of Kalman-Bucy to the nonlinear filter: the state evolves as a continuous-time Markov process and the observation is a nonlinear function of the state corrupted by additive Gaussian noise. A dual process is introduced as a backward stochastic differential equation (BSDE). The process is used to transform the problem of minimum variance estimation into an optimal control problem. Its solution is obtained from an application of the maximum principle and is subsequently used to derive the equation of the nonlinear filter. The classical duality result of Kalman-Bucy is shown to be a special case.
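For context, the classical Kalman-Bucy duality referenced in this abstract admits a compact statement; the notation below is a standard textbook form and is an assumption made for illustration, not taken from the paper itself.

```latex
% Classical Kalman-Bucy duality, in a standard textbook form (notation assumed
% here for illustration). Linear-Gaussian estimation model:
\[
\mathrm{d}X_t = A X_t\,\mathrm{d}t + \mathrm{d}B_t, \qquad
\mathrm{d}Z_t = H X_t\,\mathrm{d}t + \mathrm{d}W_t .
\]
% Dual LQ control problem: for a fixed vector f, the control u_t drives a
% backward (co-state) ODE, and the optimal cost equals the minimum error
% variance for estimating f^\top X_T from the observation path Z:
\[
-\frac{\mathrm{d}y_t}{\mathrm{d}t} = A^{\top} y_t + H^{\top} u_t, \quad y_T = f, \qquad
J(u) = y_0^{\top}\Sigma_0\,y_0
 + \int_0^T \bigl( y_t^{\top} Q\,y_t + u_t^{\top} R\,u_t \bigr)\,\mathrm{d}t ,
\]
% where Q and R are the process- and measurement-noise covariances and
% \Sigma_0 is the prior covariance of X_0. Minimizing J over u yields the
% Kalman-Bucy gain.
```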
Duality-Based Stochastic Policy Optimization for Estimation with Unknown Noise Covariances
Duality of control and estimation allows mapping recent advances in data-guided control to the estimation setup. This paper formalizes and utilizes such a mapping to consider learning the optimal (steady-state) Kalman gain when the process and measurement noise statistics are unknown. Specifically, building on the duality between synthesizing optimal control and estimation gains, the filter design problem is formalized as direct policy learning. In this direction, the duality is used to extend existing theoretical guarantees of direct policy updates for the Linear Quadratic Regulator (LQR) to establish global convergence of the Gradient Descent (GD) algorithm for the estimation problem, while addressing subtle differences between the two synthesis problems. Subsequently, a Stochastic Gradient Descent (SGD) approach is adopted to learn the optimal Kalman gain without knowledge of the noise covariances. The results are illustrated via several numerical examples.
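As a rough, hypothetical sketch of the direct policy-learning viewpoint described above (not the paper's implementation; the system matrices, noise levels, and the two-point zeroth-order gradient estimator are assumptions made for illustration), one can treat the steady-state filter gain as the policy and descend the empirical one-step prediction error with SGD:

```python
# Hypothetical sketch (not the paper's code): learn a steady-state filter gain L
# for x_{t+1} = A x_t + w_t,  y_t = C x_t + v_t  directly from simulated
# measurements, with the noise covariances Qw, Rv hidden from the learner.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])      # assumed system matrices
C = np.array([[1.0, 0.0]])
Qw, Rv = 0.1 * np.eye(2), 0.05 * np.eye(1)  # "true" covariances, unknown to the learner

def rollout(T):
    """Simulate one trajectory of measurements from the true (unknown) system."""
    x, ys = np.zeros(2), []
    for _ in range(T):
        x = A @ x + rng.multivariate_normal(np.zeros(2), Qw)
        ys.append(C @ x + rng.multivariate_normal(np.zeros(1), Rv))
    return ys

def pred_error(L, ys):
    """Mean squared one-step prediction error of xhat' = A xhat + L (y - C xhat)."""
    xhat, err = np.zeros(2), 0.0
    for y in ys:
        e = y - C @ xhat
        err += float(e @ e)
        xhat = A @ xhat + L @ e
    return err / len(ys)

L = np.zeros((2, 1))                  # the "policy": a candidate filter gain
step, delta = 0.05, 1e-2
for k in range(300):                  # SGD with a two-point zeroth-order gradient
    ys = rollout(T=100)
    U = rng.standard_normal(L.shape)  # random perturbation direction
    g = (pred_error(L + delta * U, ys) - pred_error(L - delta * U, ys)) / (2 * delta) * U
    L -= step * g
print("learned gain L:\n", L)
```

The point of the sketch is only the shape of the method: the gain is the decision variable, and the noise statistics enter only through sampled data rather than through known covariance matrices.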
- Award ID(s): 2149470
- PAR ID: 10422993
- Date Published:
- Journal Name: Proceedings of the American Control Conference
- ISSN: 0743-1619
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Common reinforcement learning methods seek optimal controllers for unknown dynamical systems by searching in the "policy" space directly. A recent line of research, starting with [1], aims to provide theoretical guarantees for such direct policy-update methods by exploring their performance in classical control settings, such as the infinite-horizon linear quadratic regulator (LQR) problem. A key property these analyses rely on is that the LQR cost function satisfies the "gradient dominance" property with respect to the policy parameters. Gradient dominance helps guarantee that the optimal controller can be found by running gradient-based algorithms on the LQR cost. The gradient dominance property has so far been verified on a case-by-case basis for several control problems, including continuous/discrete-time LQR, LQR with a decentralized controller, and H2/H∞ robust control. In this paper, we make a connection between this line of work and classical convex parameterizations based on linear matrix inequalities (LMIs). Using this, we propose a unified framework for showing that gradient dominance indeed holds for a broad class of control problems, such as continuous- and discrete-time LQR, minimizing the L2 gain, and problems using system-level parameterization. Our unified framework provides insights into the landscape of the cost function as a function of the policy, and enables extending convergence results for policy gradient descent to a much larger class of problems.
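For reference, the gradient dominance property invoked above is a Polyak-Lojasiewicz-type inequality; the statement below is the generic standard form, with a constant mu and notation that are not specific to this paper.

```latex
% Gradient dominance (Polyak-Lojasiewicz-type) inequality for the LQR cost J(K)
% over the set of stabilizing feedback gains K, for some constant mu > 0:
\[
J(K) - J(K^{\star}) \;\le\; \frac{1}{\mu}\,\bigl\lVert \nabla J(K) \bigr\rVert_F^{2},
\]
% so gradient descent, K_{k+1} = K_k - \eta \nabla J(K_k), converges to the
% global minimizer K^{\star} at a linear rate even though J is non-convex in K.
```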
-
We investigate the problem of persistently monitoring a finite set of targets with internal states that evolve with linear stochastic dynamics using a finite set of mobile agents. We approach the problem from the infinite-horizon perspective, looking for periodic movement schedules for the agents. Under linear dynamics and some standard assumptions on the noise distribution, the optimal estimator is a Kalman-Bucy filter. It is shown that when the agents are constrained to move only over a line and they can see at most one target at a time, the optimal movement policy is such that the agent is always either moving with maximum speed or dwelling at a fixed position. Periodic trajectories of this form admit a finite parameterization, and we show how to compute a stochastic gradient estimate of the performance with respect to the parameters that define the trajectory using Infinitesimal Perturbation Analysis. A gradient-descent scheme is used to compute locally optimal parameters. This approach allows us to deal with a very long persistent monitoring horizon using a small number of parameters.
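The following toy sketch illustrates the move/dwell parameterization and the gradient-descent step described above for a single agent and two scalar targets. Everything here (positions, target dynamics, the sensing model, and the finite-difference gradient standing in for the Infinitesimal Perturbation Analysis estimator) is an assumption made for illustration, not the paper's algorithm.

```python
# Toy sketch: a periodic move/dwell trajectory on a line, parameterized by the
# dwell times at two target locations. The monitoring cost is the time-averaged
# estimation-error variance of scalar targets tracked with Kalman-Bucy-type
# covariance updates. A finite-difference gradient stands in for IPA.
import numpy as np

POS = np.array([0.0, 1.0])      # target positions on the line (assumed)
V, R_SENSE = 1.0, 0.2           # agent max speed and sensing radius (assumed)
A_T, Q_T, R_T = 0.1, 1.0, 0.5   # scalar target drift, process and sensing noise (assumed)
DT = 0.01

def avg_variance(dwell, cycles=10):
    """Average error variance over a periodic dwell/move schedule."""
    travel = abs(POS[1] - POS[0]) / V
    period = dwell.sum() + 2 * travel
    P = np.ones(2)               # error variances of the two targets
    total, steps, t = 0.0, 0, 0.0
    for _ in range(int(cycles * period / DT)):
        tau = t % period
        # schedule: dwell at target 0, travel, dwell at target 1, travel back
        if tau < dwell[0]:
            x = POS[0]
        elif tau < dwell[0] + travel:
            x = POS[0] + V * (tau - dwell[0])
        elif tau < dwell[0] + travel + dwell[1]:
            x = POS[1]
        else:
            x = POS[1] - V * (tau - dwell[0] - travel - dwell[1])
        for i in range(2):
            P[i] += DT * (2 * A_T * P[i] + Q_T)     # prediction (Riccati drift)
            if abs(x - POS[i]) <= R_SENSE:          # target visible: correction
                P[i] -= DT * P[i] ** 2 / R_T
        total += P.sum()
        steps += 1
        t += DT
    return total / steps

dwell = np.array([1.0, 1.0])     # trajectory parameters: dwell time at each target
eps, step = 0.05, 0.2
for k in range(30):              # gradient descent on the dwell times
    g = np.zeros(2)
    for i in range(2):
        e = np.zeros(2)
        e[i] = eps
        g[i] = (avg_variance(dwell + e) - avg_variance(dwell - e)) / (2 * eps)
    dwell = np.maximum(dwell - step * g, 0.0)
print("locally optimal dwell times:", dwell)
```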
-
In this paper, we study the robustness property of policy optimization (particularly the Gauss–Newton gradient descent algorithm, which is equivalent to policy iteration in reinforcement learning) subject to noise at each iteration. By invoking the concept of input-to-state stability and utilizing Lyapunov's direct method, it is shown that, if the noise is sufficiently small, the policy iteration algorithm converges to a small neighborhood of the optimal solution even in the presence of noise at each iteration. Explicit expressions for the upper bound on the noise and the size of the neighborhood to which the policies ultimately converge are provided. Based on Willems' fundamental lemma, a learning-based policy iteration algorithm is proposed. The persistent excitation condition can be readily guaranteed by checking the rank of the Hankel matrix related to an exploration signal. The robustness of the learning-based policy iteration to measurement noise and unknown system disturbances is theoretically demonstrated by the input-to-state stability of the policy iteration. Several numerical simulations are conducted to demonstrate the efficacy of the proposed method.
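The rank test mentioned above is simple to state concretely. The sketch below (generic notation; the signal length and excitation order are arbitrary choices, not taken from the paper) checks whether an exploration input is persistently exciting of a given order by forming its block-Hankel matrix, in the sense of Willems' fundamental lemma.

```python
# Minimal sketch: check persistent excitation of order L for an exploration
# input by testing the rank of its block-Hankel matrix.
import numpy as np

def block_hankel(u, L):
    """Stack length-L windows of the input sequence u (shape T x m) as columns."""
    T, m = u.shape
    cols = T - L + 1
    return np.column_stack([u[j:j + L].reshape(-1) for j in range(cols)])

rng = np.random.default_rng(1)
m, T, L = 1, 60, 8
u = rng.standard_normal((T, m))          # random exploration signal
H = block_hankel(u, L)
print("persistently exciting of order", L, ":", np.linalg.matrix_rank(H) == m * L)
```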
-
Distributed feedback design and complexity-constrained control are examples of problems posed within the domain of structured optimal feedback synthesis. The optimal feedback gain is typically a non-convex function of system primitives. However, in recent years, algorithms have been proposed to obtain locally optimal solutions. In applications to large-scale distributed control, the major obstacle is computational complexity. This paper addresses complexity through a combination of linear-algebraic techniques and computational methods adapted from both machine learning and reinforcement learning. It is shown that for general classes of optimal control problems, the objective function and its gradient can be computed from data. Transformations borrowed from the theory of reinforcement learning are adapted to obtain simulation-based algorithms for computing the structured optimal H2 feedback gain. Customized proximal algorithms based on gradient descent and incremental gradient are tested in computational experiments and their relative merits are discussed.
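As a generic illustration of a proximal gradient step for structured H2 feedback design (not the paper's customized algorithm; the problem data, sparsity weight, and step size below are made up), one can alternate a gradient step on the closed-loop H2 cost with a soft-thresholding step that promotes a sparse gain:

```python
# Generic sketch: proximal gradient for a sparsity-promoting H2 design
#   min_K  J(K) + gamma * ||K||_1 ,  J(K) = closed-loop H2 cost of A - B K.
# Problem data are arbitrary; this is not the paper's implementation.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov as lyap

rng = np.random.default_rng(2)
n, m = 4, 2
A = rng.standard_normal((n, n))
A -= (np.max(np.linalg.eigvals(A).real) + 0.5) * np.eye(n)  # shift to make A stable
B = rng.standard_normal((n, m))
Q, R, W = np.eye(n), np.eye(m), np.eye(n)   # state/input weights, disturbance covariance

def h2_cost_and_grad(K):
    Acl = A - B @ K
    if np.max(np.linalg.eigvals(Acl).real) >= 0:   # only stabilizing gains are admissible
        return np.inf, None
    X = lyap(Acl, -W)                              # Acl X + X Acl' + W = 0
    P = lyap(Acl.T, -(Q + K.T @ R @ K))            # Acl' P + P Acl + Q + K'RK = 0
    J = np.trace((Q + K.T @ R @ K) @ X)
    grad = 2 * (R @ K - B.T @ P) @ X               # standard H2 policy gradient
    return J, grad

def soft(K, tau):                                  # prox operator of tau * ||.||_1
    return np.sign(K) * np.maximum(np.abs(K) - tau, 0.0)

K, step, gamma = np.zeros((m, n)), 1e-2, 0.1
for _ in range(300):
    J, g = h2_cost_and_grad(K)
    K_next = soft(K - step * g, step * gamma)
    if h2_cost_and_grad(K_next)[0] < np.inf:       # crude safeguard: keep iterates stabilizing
        K = K_next
print("sparse feedback gain K:\n", np.round(K, 3))
```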