

Search for: All records

Award ID contains: 1935389


  1. The theory and application of mean field games have grown significantly since their origins less than two decades ago. This paper considers a special class in which the game is cooperative, and the cost includes a control penalty defined by Kullback-Leibler divergence, as commonly used in reinforcement learning and other fields. Its use as a control cost or regularizer is often preferred because it leads to an attractive solution. This paper considers a particular control paradigm called Kullback-Leibler Quadratic (KLQ) optimal control, and arrives at the following conclusions: (i) In application to distributed control of electric loads, a new modeling technique is introduced to obtain a simple Markov model for each load (the "agent" in mean field theory). (ii) It is argued that the optimality equations may be solved using Monte-Carlo techniques, namely a specialized version of stochastic gradient descent (SGD). (iii) The use of averaging minimizes the asymptotic covariance in the SGD algorithm; the form of the optimal covariance is identified for the first time. 
    Free, publicly-accessible full text available December 13, 2024
  2. Theory and application of stochastic approximation (SA) has grown within the control systems community since the earliest days of adaptive control. This paper takes a new look at the topic, motivated by recent results establishing remarkable performance of SA with (sufficiently small) constant step-size α > 0. If averaging is implemented to obtain the final parameter estimate, then the estimates are asymptotically unbiased with nearly optimal asymptotic covariance. These results have been obtained for random linear SA recursions with i.i.d. coefficients. This paper obtains very different conclusions in the more common case of geometrically ergodic Markovian disturbance: (i) The target bias is identified, even in the case of non-linear SA, and is in general non-zero. The remaining results are established for linear SA recursions: (ii) the bivariate parameter-disturbance process is geometrically ergodic in a topological sense; (iii) the representation for bias has a simpler form in this case, and cannot be expected to be zero if there is multiplicative noise; (iv) the asymptotic covariance of the averaged parameters is within O(α) of optimal. The error term is identified, and may be massive if mean dynamics are not well conditioned. The theory is illustrated with application to TD-learning. 
    Free, publicly-accessible full text available December 13, 2024
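The setting of this abstract (constant step-size SA, Markovian disturbance, multiplicative noise, averaging of the iterates) can be illustrated with a minimal sketch. Everything specific here is an assumption chosen for illustration and is not taken from the paper: the scalar recursion, the two-state Markov chain, the coefficient values, and the function name `averaged_sa`. The mean dynamics have root θ* = 1, yet the averaged estimate settles at a value that differs from θ* by an amount that scales with α.

```python
import random


def averaged_sa(alpha=0.05, n_steps=200_000, switch_prob=0.1, seed=0):
    """Scalar linear SA with constant step-size and averaging of the iterates.

    Hypothetical toy model: theta_{n+1} = theta_n + alpha * (b(Phi) - A(Phi) * theta_n),
    where Phi is a symmetric two-state Markov chain. The stationary means are
    E[A] = 1 and E[b] = 1, so the root of the mean dynamics is theta* = 1.
    """
    rng = random.Random(seed)
    A = (0.5, 1.5)   # multiplicative noise: A depends on the Markov state
    b = (0.0, 2.0)   # additive term, correlated with A through the state
    phi = 0          # current state of the Markov chain
    theta = 0.0
    running_sum = 0.0
    for _ in range(n_steps):
        # Markovian disturbance: switch state with small probability
        if rng.random() < switch_prob:
            phi = 1 - phi
        theta += alpha * (b[phi] - A[phi] * theta)
        running_sum += theta
    # Averaged estimate over the whole run
    return running_sum / n_steps


if __name__ == "__main__":
    for step in (0.05, 0.005):
        print(step, averaged_sa(alpha=step))
```

In this toy model the disturbance is persistent (the chain switches rarely), so the iterate is correlated with the multiplicative coefficient, and averaging removes variance but not the resulting offset. Shrinking `alpha` reduces the offset at the price of slower transients.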
  3. The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation. The algorithms and theory rest on a relaxation of a dual of Manne's celebrated linear programming characterization of optimal control. The first set of contributions concerns properties of the relaxation, described as a deterministic convex program: we identify conditions for a bounded solution, and a significant connection between the solution to the new convex program and the solution to standard Q-learning with linear function approximation. The second set of contributions concerns algorithm design and analysis: (i) A direct model-free method for approximating the convex program for Q-learning shares properties with its ideal. In particular, a bounded solution is ensured subject to a simple property of the basis functions; (ii) The proposed algorithms are convergent and new techniques are introduced to obtain the rate of convergence in a mean-square sense; (iii) The approach can be generalized to a range of performance criteria, and it is found that variance can be reduced by considering "relative" dynamic programming equations; (iv) The theory is illustrated with an application to a classical inventory control problem. 
  4. Editor-in-Chief: George Yin (Ed.)
    This paper presents approaches to mean-field control, motivated by distributed control of multi-agent systems. Control solutions are based on a convex optimization problem, whose domain is a convex set of probability mass functions (pmfs). The main contributions follow: (i) Kullback-Leibler-Quadratic (KLQ) optimal control is a special case, in which the objective function is composed of a control cost in the form of Kullback-Leibler divergence between a candidate pmf and the nominal, plus a quadratic cost on the sequence of marginals. Theory in this paper extends prior work on deterministic control systems, establishing that the optimal solution is an exponential tilting of the nominal pmf. Transform techniques are introduced to reduce complexity of the KLQ solution, motivated by the need to consider time horizons that are much longer than the inter-sampling times required for reliable control. (ii) Infinite-horizon KLQ leads to a control solution with attractive properties. It can be expressed either as state feedback, in which the state is the sequence of marginal pmfs, or as an open-loop solution that is more easily computed. (iii) Numerical experiments are surveyed in an application of distributed control of residential loads to provide grid services, similar to utility-scale battery storage. The results show that KLQ optimal control enables the aggregate power consumption of a collection of flexible loads to track a time-varying reference signal, while simultaneously ensuring each individual load satisfies its own quality of service constraints. 
    Free, publicly-accessible full text available October 31, 2024
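The exponential-tilting structure mentioned in this abstract can be seen in a one-step, static simplification, sketched below. This is an assumption-laden toy and not the paper's formulation, which concerns a sequence of marginals: here a single pmf p is chosen to minimize D(p || p0) + (κ/2)(Σ p(x) y(x) − r)², and the first-order optimality condition gives p*(x) ∝ p0(x) exp(−λ y(x)) with the scalar λ = κ(mean of y under p* − r). The names `tilt`, `mean`, and `solve_klq_static` are hypothetical.

```python
import math


def tilt(p0, y, lam):
    """Exponential tilting of the nominal pmf p0 by exp(-lam * y)."""
    w = [p * math.exp(-lam * yi) for p, yi in zip(p0, y)]
    z = sum(w)
    return [wi / z for wi in w]


def mean(p, y):
    """Mean of y under the pmf p."""
    return sum(pi * yi for pi, yi in zip(p, y))


def solve_klq_static(p0, y, r, kappa, n_iter=200):
    """Solve lam = kappa * (mean(tilt(lam)) - r) by bisection.

    The residual g(lam) = lam - kappa * (mean - r) is strictly increasing,
    because the tilted mean is non-increasing in lam, so the root is unique.
    """
    def g(lam):
        return lam - kappa * (mean(tilt(p0, y, lam), y) - r)

    # The tilted mean lies in [min(y), max(y)], which brackets the root.
    lo = kappa * (min(y) - r)
    hi = kappa * (max(y) - r)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    lam = 0.5 * (lo + hi)
    return lam, tilt(p0, y, lam)


if __name__ == "__main__":
    # Two-point example: nominal mean of y is 0.5, reference is r = 1.0
    lam, p = solve_klq_static(p0=[0.5, 0.5], y=[0.0, 1.0], r=1.0, kappa=10.0)
    print(lam, p, mean(p, [0.0, 1.0]))
```

The optimization over a pmf collapses to a search over one scalar, which is the appeal of the tilted form; a larger `kappa` pulls the mean closer to the reference at a higher divergence cost.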
  5. From the summary: The goal of this article is two-fold: survey the emerging theory of QSA (quasi-stochastic approximation) and its implications for design, and explain the intimate connection between QSA and ESC (extremum seeking control). The contributions go in two directions: ESC algorithm design can benefit by applying concepts from QSA theory, and the broader research community with interest in gradient-free optimization can benefit from the control theoretic approach inherent to ESC. 
    Free, publicly-accessible full text available October 1, 2024
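The link between deterministic probing and gradient-free optimization can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's algorithm: a scalar parameter, a sinusoidal probing signal ξ(t) = √2 sin(ωt), constant gain, and an Euler discretization of dθ/dt = −a (ξ(t)/ε) f(θ + ε ξ(t)). The function name `esc_minimize` and all parameter values are hypothetical. Averaged over one period of the probe, the right-hand side approximates −a f′(θ), so only evaluations of f are needed.

```python
import math


def esc_minimize(f, theta0, gain=0.2, eps=0.1, omega=50.0, dt=1e-3, t_end=100.0):
    """Gradient-free minimization of f using a sinusoidal probing signal.

    Euler discretization of d(theta)/dt = -gain * (xi(t) / eps) * f(theta + eps * xi(t)),
    with xi(t) = sqrt(2) * sin(omega * t), which has unit mean-square value.
    """
    theta = theta0
    n_steps = int(round(t_end / dt))
    for k in range(n_steps):
        t = k * dt
        xi = math.sqrt(2.0) * math.sin(omega * t)  # deterministic probe
        y = f(theta + eps * xi)                    # only function values are observed
        theta -= dt * gain * (xi / eps) * y        # correlate observation with probe
    return theta


if __name__ == "__main__":
    # Quadratic objective with minimizer at 2.0 (illustrative choice)
    print(esc_minimize(lambda th: (th - 2.0) ** 2, theta0=1.0))
```

The probe frequency is chosen much faster than the averaged dynamics so that the oscillatory terms average out; a smaller `eps` sharpens the gradient approximation but amplifies the oscillation driven by the term f(θ) ξ / ε.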
  6. Foundational and state-of-the-art anomaly-detection methods through power system state estimation are reviewed. Traditional components for bad data detection, such as chi-square testing, residual-based methods, and hypothesis testing, are discussed to explain the motivations for recent anomaly-detection methods given the increasing complexity of power grids, energy management systems, and cyber-threats. In particular, state estimation anomaly detection based on data-driven quickest-change detection and on artificial intelligence is discussed, and directions for research are suggested with particular emphasis on considerations of the future smart grid. 
    Free, publicly-accessible full text available September 1, 2024
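The chi-square and residual-based tests named in this abstract can be sketched on a deliberately small problem. The model below is an assumption made for illustration, not a power-system model from the review: one scalar state measured directly by several sensors with known noise standard deviations. The function name `wls_bad_data_test` is hypothetical, and the threshold is supplied by the caller (for example 9.488, the 95% quantile of a chi-square distribution with 4 degrees of freedom, which matches five measurements of one state).

```python
import math


def wls_bad_data_test(z, sigma, threshold):
    """Weighted least-squares estimate of a scalar state plus a chi-square test.

    z: measurements of the same scalar state (at least two are required).
    sigma: noise standard deviations, one per measurement.
    threshold: chi-square quantile with len(z) - 1 degrees of freedom.
    Returns (estimate, J, flagged, index of largest normalized residual).
    """
    weights = [1.0 / s ** 2 for s in sigma]
    total_w = sum(weights)
    x_hat = sum(w * zi for w, zi in zip(weights, z)) / total_w
    residuals = [zi - x_hat for zi in z]
    # Weighted sum of squared residuals, compared against the threshold
    J = sum(w * r * r for w, r in zip(weights, residuals))
    # For this scalar model the residual variance is sigma_i^2 - 1 / sum(weights)
    normalized = [
        abs(r) / math.sqrt(s ** 2 - 1.0 / total_w)
        for r, s in zip(residuals, sigma)
    ]
    suspect = max(range(len(z)), key=normalized.__getitem__)
    return x_hat, J, J > threshold, suspect


if __name__ == "__main__":
    sigma = [0.02] * 5
    clean = [1.02, 0.98, 1.01, 0.99, 1.00]
    corrupted = [1.02, 0.98, 1.50, 0.99, 1.00]  # third measurement is bad
    print(wls_bad_data_test(clean, sigma, threshold=9.488))
    print(wls_bad_data_test(corrupted, sigma, threshold=9.488))
```

The chi-square statistic only says that something is wrong; the largest normalized residual is what points at a specific measurement, and it is meaningful only when the test has flagged the data.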
  7. Andrea Serrani (Ed.)
    Over the past decade, there has been significant progress on the science of load control for the creation of virtual energy storage. This is an alternative to demand response, and it is termed demand dispatch. Distributed control is used to manage millions of flexible loads to modify the power consumption of the aggregation, which can be ramped up and down, just like discharging and charging a battery. A challenge with distributed control is heterogeneity of the population of loads, which complicates control at the aggregate level. It is shown in this article that additional control at each load in the population can result in a far simpler aggregate model. The local control is designed to flatten resonances and produce an approximately all-pass response. Analysis is based on mean-field control for the heterogeneous population; the mean-field model is only justified because of the additional local control introduced in this article. Theory and simulations indicate that the resulting aggregate has a nearly flat input-output response: the behavior of an ideal, multi-GW battery system. 
    Free, publicly-accessible full text available July 1, 2024