An intermittent, model-free optimal control algorithm that enables an autonomous vehicle to track a nonpredetermined trajectory at high speed is presented. The approach is bandwidth and energy efficient in that communication between actuators is limited to instances when it is needed rather than performing unnecessary periodic updates. We formulate the problem by properly augmenting the system and reference (trajectory) data and then designing a triggering mechanism for the controller to work with a sampled version of the augmented states at some triggering instants. In order to obtain a model-free solution, we leverage a Q-learning framework with a zero-order-hold actor network and a critic network to approximate the optimal intermittent controller and the optimal cost, respectively, resulting in appropriate tuning laws. Finally, we provide a numerical example of a ground vehicle driving autonomously at high speed on a race track.
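As a rough illustration of the structure described in this abstract, the following Python sketch pairs a quadratic-basis Q-function critic with a zero-order-hold actor that is refreshed only at triggering instants. It is a minimal sketch, not the paper's exact tuning laws: the dimensions, gains, trigger threshold, and discount factor are all illustrative assumptions.

```python
import numpy as np

# Illustrative dimensions: augmented state (plant + reference) and inputs.
n_aug, m = 6, 2
d = n_aug + m
W_critic = 0.01 * np.random.randn(d * (d + 1) // 2)  # quadratic-basis critic weights
W_actor = np.zeros((m, n_aug))                       # linear zero-order-hold actor

def phi(z):
    """Quadratic basis, so Q(x, u) ~ W_critic . phi([x; u])."""
    return np.outer(z, z)[np.triu_indices(d)]

def greedy_action(x):
    """Recover the quadratic kernel from the critic and minimize Q over u."""
    P = np.zeros((d, d))
    P[np.triu_indices(d)] = W_critic
    P = 0.5 * (P + P.T)                               # symmetrize the kernel
    Quu, Qux = P[n_aug:, n_aug:], P[n_aug:, :n_aug]
    return -np.linalg.solve(Quu + 1e-6 * np.eye(m), Qux @ x)

def control(x, x_held, u_held, delta=0.1):
    """Zero-order-hold control: a new input is transmitted only when triggered."""
    if np.linalg.norm(x - x_held) > delta:            # assumed triggering condition
        return x, W_actor @ x                         # triggering instant: resample
    return x_held, u_held                             # otherwise hold the last input

def learn(transitions, gamma=0.99, alpha_c=0.05, alpha_a=0.01):
    """Temporal-difference tuning from (x, u, cost, x_next, u_next) samples."""
    global W_critic, W_actor
    for x, u, cost, x_next, u_next in transitions:
        z, z_next = np.concatenate([x, u]), np.concatenate([x_next, u_next])
        td = cost + gamma * (W_critic @ phi(z_next)) - W_critic @ phi(z)
        W_critic += alpha_c * td * phi(z)             # critic tuning law
        err = W_actor @ x - greedy_action(x)          # actor vs. Q-greedy input
        W_actor -= alpha_a * np.outer(err, x)         # actor tuning law
```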
Dynamic intermittent Q-learning-based model-free suboptimal co-design of ℒ2-stabilization
                        
                    
    
Summary: This paper proposes an intermittent model-free learning algorithm for linear time-invariant systems, where the control policy and transmission decisions are co-designed simultaneously while also being subjected to worst-case disturbances. The control policy is designed by introducing an internal dynamical system to further reduce the transmission rate and provide bandwidth flexibility in cyber-physical systems. Moreover, a Q-learning algorithm with two actors and a single critic structure is developed to learn the optimal parameters of a Q-function. It is shown by using an impulsive system approach that the closed-loop system has an asymptotically stable equilibrium and that no Zeno behavior occurs. Furthermore, a qualitative performance analysis of the model-free dynamic intermittent framework is given, showing the degree of suboptimality with respect to the optimal continuously updated controller. Finally, a numerical simulation of an unknown system is carried out to highlight the efficacy of the proposed framework.
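The internal dynamical system mentioned in this summary plays the role of a dynamic triggering variable. The sketch below follows the standard dynamic event-triggering template rather than the paper's learned co-design; the constants lam, sigma, theta, and dt are illustrative assumptions.

```python
import numpy as np

class DynamicTrigger:
    """Dynamic event-trigger with an internal state eta that accumulates slack."""

    def __init__(self, lam=1.0, sigma=0.05, theta=1.0, dt=1e-3):
        self.lam, self.sigma, self.theta, self.dt = lam, sigma, theta, dt
        self.eta = 1.0                         # internal dynamic variable, eta(0) > 0

    def transmit(self, x, x_last_sent):
        e = x - x_last_sent                    # error accumulated since last transmission
        slack = self.sigma * (x @ x) - e @ e   # static-trigger margin
        # Internal dynamics eta' = -lam * eta + slack, integrated by forward Euler.
        self.eta += self.dt * (-self.lam * self.eta + slack)
        # Transmit only when the accumulated slack can no longer absorb the error.
        return self.eta + self.theta * slack < 0.0
```

Because eta stores past margin, transmissions occur no more often than under the corresponding static condition sigma*||x||^2 >= ||e||^2, which is the sense in which the internal system reduces the transmission rate.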
- Award ID(s): 1851588
- PAR ID: 10461456
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: International Journal of Robust and Nonlinear Control
- Volume: 29
- Issue: 9
- ISSN: 1049-8923
- Page Range / eLocation ID: p. 2673-2694
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- The robust 𝜙-regularized Markov Decision Process (RRMDP) framework focuses on designing control policies that are robust against parameter uncertainties due to mismatches between the simulator (nominal) model and real-world settings. This work makes two important contributions. First, we propose a model-free algorithm called Robust 𝜙-regularized fitted Q-iteration for learning an 𝜖-optimal robust policy that uses only the historical data collected by rolling out a behavior policy (satisfying a robust exploratory requirement) on the nominal model. To the best of our knowledge, we provide the first unified analysis for a class of 𝜙-divergences achieving robust optimal policies in high-dimensional systems with arbitrarily large state spaces and general function approximation. Second, we introduce the hybrid robust 𝜙-regularized reinforcement learning framework to learn an optimal robust policy using both historical data and online sampling. For this framework, we propose a model-free algorithm called Hybrid robust Total-variation-regularized Q-iteration. To the best of our knowledge, we provide the first improved out-of-data-distribution assumption in large-scale problems with arbitrarily large state spaces and general function approximation under the hybrid robust 𝜙-regularized reinforcement learning framework.
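As a concrete, simplified instance of the first contribution, the sketch below runs fitted Q-iteration on historical data with a regularized robust backup, specialized to the KL divergence, for which the soft worst case over next states has the closed form -rho * log E[exp(-V(s')/rho)]. The random-forest regressor, rho, and gamma are assumptions for illustration, not the paper's general 𝜙-divergence function class.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def robust_fqi(dataset, n_actions, iters=30, gamma=0.99, rho=1.0):
    """Robust regularized fitted Q-iteration (KL special case) on offline data.

    dataset: list of (s, a, r, s_next) collected by a behavior policy
    rolled out on the nominal model; s is a feature vector, a an int.
    """
    S = np.array([t[0] for t in dataset])
    A = np.array([t[1] for t in dataset])
    R = np.array([t[2] for t in dataset])
    S2 = np.array([t[3] for t in dataset])
    X = np.column_stack([S, A])                       # regression features for Q(s, a)
    Q = None
    for _ in range(iters):
        if Q is None:
            V = np.zeros(len(dataset))                # V_0 = 0
        else:
            qs = [Q.predict(np.column_stack([S2, np.full(len(S2), a)]))
                  for a in range(n_actions)]
            V = np.max(np.stack(qs, axis=1), axis=1)  # greedy value at next states
        # Two-stage robust target: regress w(s, a) ~ E[exp(-V(s')/rho) | s, a],
        # then the KL-regularized robust value is -rho * log w(s, a).
        w_fit = RandomForestRegressor(n_estimators=50).fit(X, np.exp(-V / rho))
        w = np.clip(w_fit.predict(X), 1e-12, None)
        y = R + gamma * (-rho * np.log(w))            # robust Bellman target
        Q = RandomForestRegressor(n_estimators=50).fit(X, y)
    return Q
```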
- This paper proposes a novel robust reinforcement learning framework for discrete-time linear systems with model mismatch that may arise from the sim-to-real gap. A key strategy is to invoke advanced techniques from control theory. Using the formulation of classical risk-sensitive linear quadratic Gaussian control, a dual-loop policy optimization algorithm is proposed to generate a robust optimal controller. The dual-loop policy optimization algorithm is shown to be globally and uniformly convergent, and robust against disturbances during the learning process. This robustness property is called small-disturbance input-to-state stability and guarantees that the proposed policy optimization algorithm converges to a small neighborhood of the optimal controller as long as the disturbance at each learning step is relatively small. In addition, when the system dynamics are unknown, a novel model-free off-policy policy optimization algorithm is proposed. Finally, numerical examples are provided to illustrate the proposed algorithm.
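A minimal model-based sketch of the dual-loop idea, stated on the discrete-time LQ zero-sum game that underlies risk-sensitive LQG control: the inner loop computes the worst-case disturbance gain for the current controller, and the outer loop performs a policy-improvement step. It assumes known matrices (A, B, D), an admissible attenuation level gamma2, and stabilizing iterates; the paper's model-free off-policy algorithm replaces these Lyapunov solves with data.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def dual_loop_pi(A, B, D, Q, R, gamma2, outer=30, inner=30):
    """Dual-loop policy iteration for min_K max_L of x'Qx + u'Ru - gamma2 * w'w."""
    n, m = B.shape
    q = D.shape[1]
    K = np.zeros((m, n))                       # assumed initial stabilizing gain
    L = np.zeros((q, n))                       # disturbance gain, w = L x
    for _ in range(outer):
        for _ in range(inner):                 # inner loop: worst-case disturbance
            Acl = A - B @ K + D @ L            # closed loop with u = -K x, w = L x
            Qk = Q + K.T @ R @ K - gamma2 * L.T @ L
            P = solve_discrete_lyapunov(Acl.T, Qk)        # evaluate the gain pair
            L = np.linalg.solve(gamma2 * np.eye(q) - D.T @ P @ D,
                                D.T @ P @ (A - B @ K))    # disturbance improvement
        K = np.linalg.solve(R + B.T @ P @ B,
                            B.T @ P @ (A + D @ L))        # controller improvement
    return K, P
```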
- We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path under an arbitrary policy of the system is available. We consider the Nearest Neighbor Q-Learning (NNQL) algorithm, which learns the optimal Q-function using a nearest neighbor regression method. As the main contribution, we provide a tight finite-sample analysis of the convergence rate. In particular, for MDPs with a d-dimensional state space and a discount factor in (0, 1), given an arbitrary sample path with "covering time" L, we establish that the algorithm is guaranteed to output an 𝜖-accurate estimate of the optimal Q-function with nearly optimal sample complexity.
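A minimal sketch of the nearest-neighbor Q-learning update on a continuous state space: Q-values are stored on a finite set of anchor states and every observed state is mapped to its nearest anchor, which stands in for the covering used in the paper's analysis. The anchors, step-size schedule, and discount factor are illustrative assumptions.

```python
import numpy as np

class NNQL:
    """Tabular Q-learning over nearest-neighbor anchors of a continuous space."""

    def __init__(self, anchors, n_actions, gamma=0.95):
        self.anchors = np.asarray(anchors)            # (k, d) anchor states
        self.Q = np.zeros((len(self.anchors), n_actions))
        self.N = np.zeros_like(self.Q)                # visit counts per (anchor, action)
        self.gamma = gamma

    def _nn(self, s):
        """Index of the anchor nearest to state s."""
        return int(np.argmin(np.linalg.norm(self.anchors - s, axis=1)))

    def update(self, s, a, r, s_next):
        i, j = self._nn(s), self._nn(s_next)
        self.N[i, a] += 1
        alpha = 1.0 / self.N[i, a]                    # decaying step size
        target = r + self.gamma * self.Q[j].max()     # nearest-neighbor bootstrap
        self.Q[i, a] += alpha * (target - self.Q[i, a])

    def act(self, s):
        """Greedy action at the anchor nearest to s."""
        return int(np.argmax(self.Q[self._nn(s)]))
```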
- Summary: Malaria is an infectious disease affecting a large population across the world, and interventions need to be efficiently applied to reduce the burden of malaria. We develop a framework to help policy-makers decide how to allocate limited resources in real time for malaria control. We formalize a policy for the resource allocation as a sequence of decisions, one per intervention, that map up-to-date disease-related information to a resource allocation. An optimal policy must control the spread of the disease while being interpretable and viewed as equitable to stakeholders. We construct an interpretable class of resource allocation policies that can accommodate allocation of resources residing in a continuous domain, and we combine a hierarchical Bayesian spatiotemporal model for disease transmission with a policy-search algorithm to estimate an optimal policy for resource allocation within the pre-specified class. The estimated optimal policy under the proposed framework improves the cumulative long-term outcome compared with naive approaches in both simulation experiments and an application to malaria interventions in the Democratic Republic of the Congo.
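A hedged sketch of the policy-search step: the interpretable class below splits a continuous budget across regions in proportion to a softmax of linear priority scores of their current disease features, and the score weights are chosen by simulating long-run outcomes. The function simulate_outcome stands in for posterior-predictive simulation from the fitted hierarchical transmission model and is assumed here, not shown.

```python
import numpy as np

def allocate(weights, features, budget):
    """Interpretable policy: budget shares via a softmax of linear scores."""
    scores = features @ weights                # one priority score per region
    shares = np.exp(scores - scores.max())     # numerically stable softmax
    return budget * shares / shares.sum()

def policy_search(features, budget, simulate_outcome, n_candidates=500, seed=0):
    """Random search over score weights within the interpretable class.

    simulate_outcome(policy) -> expected cumulative disease burden (assumed).
    """
    rng = np.random.default_rng(seed)
    best_w, best_val = None, np.inf
    for _ in range(n_candidates):
        w = rng.normal(size=features.shape[1])
        val = simulate_outcome(lambda f: allocate(w, f, budget))
        if val < best_val:                     # keep the lowest simulated burden
            best_w, best_val = w, val
    return best_w
```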