This content will become publicly available on June 6, 2025

Title: A Neural Network Approach for Stochastic Optimal Control
We present a neural network approach for approximating the value function of high-dimensional stochastic control problems. Our training process simultaneously updates our value function estimate and identifies the part of the state space likely to be visited by optimal trajectories. Our approach leverages insights from optimal control theory and the fundamental relation between semi-linear parabolic partial differential equations and forward-backward stochastic differential equations. To focus the sampling on relevant states during neural network training, we use the stochastic Pontryagin maximum principle (PMP) to obtain the optimal controls for the current value function estimate. By design, our approach coincides with the method of characteristics for the non-viscous Hamilton-Jacobi-Bellman equation arising in deterministic control problems. Our training loss consists of a weighted sum of the objective functional of the control problem and penalty terms that enforce the HJB equations along the sampled trajectories. Importantly, training is unsupervised in that it does not require solutions of the control problem. Our numerical experiments highlight our scheme’s ability to identify the relevant parts of the state space and produce meaningful value estimates. Using a two-dimensional model problem, we demonstrate the importance of the stochastic PMP to inform the sampling and compare to a finite element approach. With a nonlinear control-affine quadcopter example, we illustrate that our approach can handle complicated dynamics. For a 100-dimensional benchmark problem, we demonstrate that our approach improves accuracy and time-to-solution and, via a modification, we show the wider applicability of our scheme.
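The structure of the training loss described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a hypothetical one-dimensional problem (dynamics dz = u dt + sigma dW, running cost 0.5 u^2, terminal cost 0.5 z_T^2) whose value function is known in closed form, and it replaces the neural network with that closed form. All names (`phi`, `phi_derivs`, `simulate`) and parameter values are illustrative assumptions. It samples trajectories with the feedback control u = -dPhi/dx implied by the current value estimate, then accumulates the control objective and an HJB residual penalty along those trajectories.

```python
import numpy as np

# Hypothetical 1D toy problem (an assumption, not taken from the paper):
#   dz = u dt + SIGMA dW,  cost = E[ int_0^T 0.5 u^2 dt + 0.5 z_T^2 ].
T, SIGMA = 1.0, 0.5


def phi(t, x):
    """Closed-form value function of the toy problem."""
    return 0.5 * x**2 / (1.0 + T - t) + 0.5 * SIGMA**2 * np.log(1.0 + T - t)


def phi_derivs(t, x):
    """Return (d/dt, d/dx, d2/dx2) of phi, derived by hand."""
    p = 1.0 / (1.0 + T - t)
    phi_t = 0.5 * p**2 * x**2 - 0.5 * SIGMA**2 * p
    return phi_t, p * x, p * np.ones_like(x)


def simulate(derivs, x0=1.0, n_paths=20000, n_steps=200, seed=0):
    """Euler-Maruyama sampling under the feedback control of a value estimate.

    Returns the Monte Carlo control objective and the mean time-integrated
    absolute HJB residual along the sampled trajectories.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    z = np.full(n_paths, x0)
    running_cost = np.zeros(n_paths)
    hjb_penalty = np.zeros(n_paths)
    for k in range(n_steps):
        t = k * dt
        d_t, d_x, d_xx = derivs(t, z)
        u = -d_x  # minimizer of 0.5 u^2 + d_x * u
        # HJB residual for this toy problem; zero for the exact value function.
        residual = d_t - 0.5 * d_x**2 + 0.5 * SIGMA**2 * d_xx
        hjb_penalty += np.abs(residual) * dt
        running_cost += 0.5 * u**2 * dt
        z = z + u * dt + SIGMA * np.sqrt(dt) * rng.standard_normal(n_paths)
    objective = np.mean(running_cost + 0.5 * z**2)
    return objective, np.mean(hjb_penalty)


if __name__ == "__main__":
    obj, pen = simulate(phi_derivs)
    print("objective:", obj, "HJB penalty:", pen, "phi(0, 1):", phi(0.0, 1.0))
```

In a training loop, a weighted sum such as `objective + beta * hjb_penalty` (with `beta` a hypothetical weight) would be minimized over the parameters of the value function model. Here the exact value function gives an essentially zero penalty, while a deliberately wrong estimate gives a positive one.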
Award ID(s):
1751636 2038118
NSF-PAR ID:
10512669
Author(s) / Creator(s):
; ;
Publisher / Repository:
SIAM
Date Published:
Journal Name:
SIAM Journal on Scientific Computing
ISSN:
1064-8275
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. We propose a neural network approach that yields approximate solutions for high-dimensional optimal control problems and demonstrate its effectiveness using examples from multi-agent path finding. Our approach yields controls in a feedback form, where the policy function is given by a neural network (NN). Specifically, we fuse the Hamilton-Jacobi-Bellman (HJB) and Pontryagin Maximum Principle (PMP) approaches by parameterizing the value function with an NN. Our approach enables us to obtain approximately optimal controls in real-time without having to solve an optimization problem. Once the policy function is trained, generating a control at a given space-time location takes milliseconds; in contrast, efficient nonlinear programming methods typically perform the same task in seconds. We train the NN offline using the objective function of the control problem and penalty terms that enforce the HJB equations. Therefore, our training algorithm does not involve data generated by another algorithm. By training on a distribution of initial states, we ensure the controls' optimality on a large portion of the state-space. Our grid-free approach scales efficiently to dimensions where grids become impractical or infeasible. We apply our approach to several multi-agent collision-avoidance problems in up to 150 dimensions. Furthermore, we empirically observe that the number of parameters in our approach scales linearly with the dimension of the control problem, thereby mitigating the curse of dimensionality. 
  2. We propose a neural network approach for solving high-dimensional optimal control problems. In particular, we focus on multi-agent control problems with obstacle and collision avoidance. These problems immediately become high-dimensional, even for moderate phase-space dimensions per agent. Our approach fuses the Pontryagin Maximum Principle and Hamilton-Jacobi-Bellman (HJB) approaches and parameterizes the value function with a neural network. Our approach yields controls in a feedback form for quick calculation and robustness to moderate disturbances to the system. We train our model using the objective function and optimality conditions of the control problem. Therefore, our training algorithm neither involves a data generation phase nor solutions from another algorithm. Our model uses empirically effective HJB penalizers for efficient training. By training on a distribution of initial states, we ensure the controls' optimality is achieved on a large portion of the state-space. Our approach is grid-free and scales efficiently to dimensions where grids become impractical or infeasible. We demonstrate our approach's effectiveness on a 150-dimensional multi-agent problem with obstacles.
  3. We present a neural network approach for closed-loop deep brain stimulation (DBS). We cast the problem of finding an optimal neurostimulation strategy as a control problem. In this setting, control policies aim to optimize therapeutic outcomes by tailoring the parameters of a DBS system, typically via electrical stimulation, in real time based on the patient’s ongoing neuronal activity. We approximate the value function offline using a neural network to enable generating controls (stimuli) in real time via the feedback form. The neuronal activity is characterized by a nonlinear, stiff system of differential equations as dictated by the Hodgkin-Huxley model. Our training process leverages the relationship between Pontryagin’s maximum principle and Hamilton-Jacobi-Bellman equations to update the value function estimates simultaneously. Our numerical experiments illustrate the accuracy of our approach for out-of-distribution samples and the robustness to moderate shocks and disturbances in the system. 
  4. Many processes in nature such as conformational changes in biomolecules and clusters of interacting particles, genetic switches, mechanical or electromechanical oscillators with added noise, and many others are modeled using stochastic differential equations with small white noise. The study of rare transitions between metastable states in such systems is of great interest and importance. The direct simulation of rare transitions is difficult due to long waiting times. Transition path theory is a mathematical framework for the quantitative description of rare events. Its crucial component is the committor function, the solution to a boundary value problem for the backward Kolmogorov equation. The key fact exploited in this work is that the optimal controller constructed from the committor leads to the generation of transition trajectories exclusively. We prove this fact for a broad class of stochastic differential equations. Moreover, we demonstrate that the committor computed for a dimensionally reduced system and then lifted to the original phase space still allows us to construct an effective controller and estimate the transition rate with reasonable accuracy. Furthermore, we propose an all-the-way-through scheme for computing the committor via neural networks, sampling the transition trajectories, and estimating the transition rate without meshing the space. We apply the proposed methodology to four test problems: the overdamped Langevin dynamics with Mueller’s potential and the rugged Mueller potential in 10D, the noisy bistable Duffing oscillator, and Lennard-Jones-7 in 2D.
  5. Safe control designs for robotic systems remain challenging because of the difficulties of explicitly solving optimal control with nonlinear dynamics perturbed by stochastic noise. However, recent technological advances in computing devices enable online optimization or sampling-based methods to solve control problems. For example, Control Barrier Functions (CBFs) have been proposed to numerically solve convex optimization problems that ensure the control input stays in the safe set. Model Predictive Path Integral (MPPI) control uses forward sampling of stochastic differential equations to solve optimal control problems online. Both control algorithms are widely used for nonlinear systems because they avoid calculating the derivatives of the nonlinear dynamic functions. In this paper, we use Stochastic Control Barrier Function (SCBF) constraints to limit the sampled regions in the sampling-based algorithm, ensuring safety in a probabilistic sense and improving sample efficiency with a stochastic differential equation. We also show that our algorithm needs fewer samples than the original MPPI algorithm does by providing a sampling complexity analysis.