The theory and application of mean field games has grown significantly since its origins less than two decades ago. This paper considers a special class in which the game is cooperative, and the cost includes a control penalty defined by Kullback-Leibler divergence, as commonly used in reinforcement learning and other fields. Its use as a control cost or regularizer is often preferred because this leads to an attractive solution. This paper considers a particular control paradigm called Kullback-Leibler-Quadratic (KLQ) optimal control, and arrives at the following conclusions: 1. In application to distributed control of electric loads, a new modeling technique is introduced to obtain a simple Markov model for each load (the 'agent' in mean field theory). 2. It is argued that the optimality equations may be solved using Monte Carlo techniques: a specialized version of stochastic gradient descent (SGD). 3. The use of averaging minimizes the asymptotic covariance in the SGD algorithm; the form of the optimal covariance is identified for the first time.
Free, publicly accessible full text available December 13, 2024

Theory and application of stochastic approximation (SA) has grown within the control systems community since the earliest days of adaptive control. This paper takes a new look at the topic, motivated by recent results establishing remarkable performance of SA with (sufficiently small) constant step-size $\alpha > 0$. If averaging is implemented to obtain the final parameter estimate, then the estimates are asymptotically unbiased with nearly optimal asymptotic covariance. These results have been obtained for random linear SA recursions with i.i.d. coefficients. This paper obtains very different conclusions in the more common case of geometrically ergodic Markovian disturbance: (i) The target bias is identified, even in the case of nonlinear SA, and is in general nonzero. The remaining results are established for linear SA recursions: (ii) the bivariate parameter-disturbance process is geometrically ergodic in a topological sense; (iii) the representation for bias has a simpler form in this case, and cannot be expected to be zero if there is multiplicative noise; (iv) the asymptotic covariance of the averaged parameters is within $O(\alpha)$ of optimal. The error term is identified, and may be massive if mean dynamics are not well conditioned. The theory is illustrated with application to TD-learning.
Free, publicly accessible full text available December 13, 2024
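To make the setting of the abstract above concrete, here is a minimal sketch of a scalar linear SA recursion with constant step-size, Markovian disturbance, and averaging of the iterates. Everything specific in it is an assumption chosen for illustration and is not taken from the paper: the two-state Markov chain, the coefficient tables `a` and `b`, the step-size, the burn-in length, and the function name `averaged_linear_sa`. The mean dynamics have root `theta_star = 1.0`; because `a` depends on the Markov state (multiplicative noise), the averaged estimate should be expected to settle near, but not exactly at, that root.

```python
import random


def averaged_linear_sa(alpha=0.01, n_steps=100_000, burn_in=10_000, seed=0):
    """Scalar linear SA with constant step-size and averaging of the iterates.

    Recursion (illustrative): theta <- theta + alpha * (b[X] - a[X] * theta),
    where X is a two-state Markov chain. All numbers are hypothetical.
    """
    rng = random.Random(seed)
    a = (0.5, 1.5)   # multiplicative coefficient per Markov state (mean 1.0)
    b = (0.0, 2.0)   # additive coefficient per Markov state (mean 1.0)
    stay = 0.9       # probability of remaining in the current state
    x = 0
    theta = 0.0
    running_sum = 0.0
    for n in range(n_steps):
        # Advance the Markov disturbance (stationary distribution is uniform).
        if rng.random() > stay:
            x = 1 - x
        # Constant step-size SA update.
        theta += alpha * (b[x] - a[x] * theta)
        # Average the iterates after a burn-in period.
        if n >= burn_in:
            running_sum += theta
    return theta, running_sum / (n_steps - burn_in)


theta_star = 1.0  # root of the mean dynamics: mean(b) / mean(a)
last_iterate, averaged = averaged_linear_sa()
print(f"last iterate: {last_iterate:.4f}  averaged: {averaged:.4f}  target: {theta_star}")
```

Rerunning with a different `alpha` gives a rough feel for how the gap between the averaged estimate and `theta_star` changes with the step-size in this toy model; it is not a substitute for the bias formula identified in the paper.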

The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation. The algorithms and theory rest on a relaxation of a dual of Manne's celebrated linear programming characterization of optimal control. The first set of contributions concerns properties of the relaxation, described as a deterministic convex program: we identify conditions for a bounded solution, and a significant connection between the solution to the new convex program and the solution to standard Q-learning with linear function approximation. The second set of contributions concerns algorithm design and analysis: (i) A direct model-free method for approximating the convex program for Q-learning shares properties with its ideal. In particular, a bounded solution is ensured subject to a simple property of the basis functions; (ii) The proposed algorithms are convergent and new techniques are introduced to obtain the rate of convergence in a mean-square sense; (iii) The approach can be generalized to a range of performance criteria, and it is found that variance can be reduced by considering "relative" dynamic programming equations; (iv) The theory is illustrated with an application to a classical inventory control problem.

Editor-in-Chief: George Yin (Ed.)
This paper presents approaches to mean-field control, motivated by distributed control of multi-agent systems. Control solutions are based on a convex optimization problem, whose domain is a convex set of probability mass functions (pmfs). The main contributions follow: 1. Kullback-Leibler-Quadratic (KLQ) optimal control is a special case, in which the objective function is composed of a control cost in the form of Kullback-Leibler divergence between a candidate pmf and the nominal, plus a quadratic cost on the sequence of marginals. Theory in this paper extends prior work on deterministic control systems, establishing that the optimal solution is an exponential tilting of the nominal pmf. Transform techniques are introduced to reduce complexity of the KLQ solution, motivated by the need to consider time horizons that are much longer than the inter-sampling times required for reliable control. 2. Infinite-horizon KLQ leads to a state feedback control solution with attractive properties. It can be expressed either as state feedback, in which the state is the sequence of marginal pmfs, or as an open-loop solution that is more easily computed. 3. Numerical experiments are surveyed in an application of distributed control of residential loads to provide grid services, similar to utility-scale battery storage. The results show that KLQ optimal control enables the aggregate power consumption of a collection of flexible loads to track a time-varying reference signal, while simultaneously ensuring each individual load satisfies its own quality of service constraints.
Free, publicly accessible full text available October 31, 2024
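The phrase "exponential tilting of the nominal pmf" can be illustrated on a much simpler problem than the one treated in the paper. The sketch below assumes a linear cost rather than the quadratic cost on marginals used in KLQ: it minimizes KL(p || nominal) + sum_x p(x) c(x) over pmfs p on a finite set, for which the minimizer is known to be p*(x) proportional to nominal(x) exp(-c(x)), with optimal value -log sum_x nominal(x) exp(-c(x)). The function names, the four-point state space, and the cost vector are hypothetical.

```python
import math


def exponential_tilt(nominal, cost):
    """Return the pmf proportional to nominal[x] * exp(-cost[x])."""
    weights = [p * math.exp(-c) for p, c in zip(nominal, cost)]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]


def kl_plus_cost(p, nominal, cost):
    """Objective KL(p || nominal) + expected cost under p (linear cost only)."""
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, nominal) if pi > 0)
    expected_cost = sum(pi * ci for pi, ci in zip(p, cost))
    return kl + expected_cost


# Hypothetical example: uniform nominal pmf on four points, increasing cost.
nominal = [0.25, 0.25, 0.25, 0.25]
cost = [0.0, 1.0, 2.0, 3.0]

tilted = exponential_tilt(nominal, cost)
optimal_value = -math.log(sum(p * math.exp(-c) for p, c in zip(nominal, cost)))

print("tilted pmf:", [round(p, 4) for p in tilted])
print("objective at tilted pmf:", kl_plus_cost(tilted, nominal, cost))
print("objective at nominal pmf:", kl_plus_cost(nominal, nominal, cost))
```

The tilted pmf shifts mass toward low-cost points while staying close to the nominal in the KL sense. In the KLQ setting the exponent is not a fixed cost vector, so this should be read only as an illustration of the form of the solution.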

From the summary: The goal of this article is twofold: survey the emerging theory of QSA (quasi-stochastic approximation) and its implication to design, and explain the intimate connection between QSA and ESC (extremum seeking control). The contributions go in two directions: ESC algorithm design can benefit by applying concepts from QSA theory, and the broader research community with interest in gradient-free optimization can benefit from the control theoretic approach inherent to ESC.
Free, publicly accessible full text available October 1, 2024
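As a rough illustration of the gradient-free idea shared by QSA and ESC, the sketch below minimizes a scalar function using only function evaluations and a deterministic sinusoidal probing signal in place of random exploration. It is a simplified example under stated assumptions, not an algorithm taken from the article: the constant gain, the Euler discretization, the probing frequency, and the names `qsgd_minimize` and `objective` are all hypothetical. For a quadratic objective, the time average of `(2 / eps) * probe * f(theta + eps * probe)` over a probing period equals the derivative of `f` at `theta`, which is why the update behaves like gradient descent on average.

```python
import math


def qsgd_minimize(f, theta0, gain=0.1, eps=0.1, omega=50.0, dt=1e-3, horizon=50.0):
    """Gradient-free descent driven by a sinusoidal probing signal.

    Euler discretization of
        d/dt theta = -gain * (2 / eps) * sin(omega t) * f(theta + eps * sin(omega t)).
    All parameter values are illustrative.
    """
    theta = theta0
    steps = round(horizon / dt)
    for k in range(steps):
        probe = math.sin(omega * k * dt)
        # Only a function evaluation at the perturbed point is used.
        grad_estimate = (2.0 / eps) * probe * f(theta + eps * probe)
        theta -= dt * gain * grad_estimate
    return theta


def objective(theta):
    """Hypothetical objective with minimizer at theta = 2."""
    return (theta - 2.0) ** 2


estimate = qsgd_minimize(objective, theta0=0.0)
print(f"estimate of minimizer: {estimate:.4f}")
```

The choice of `omega` matters in this sketch: a slow probing signal relative to the gain leaves large oscillations in `theta`, which is one of the design questions where the QSA viewpoint is meant to help.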

Foundational and state-of-the-art anomaly-detection methods through power system state estimation are reviewed. Traditional components for bad data detection, such as chi-square testing, residual-based methods, and hypothesis testing, are discussed to explain the motivations for recent anomaly-detection methods given the increasing complexity of power grids, energy management systems, and cyber-threats. In particular, state estimation anomaly detection based on data-driven quickest-change detection and artificial intelligence is discussed, and directions for research are suggested with particular emphasis on considerations of the future smart grid.
Free, publicly accessible full text available September 1, 2024
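The chi-square test mentioned above can be sketched on a deliberately tiny example. The sketch assumes a single scalar state measured directly by three sensors with independent Gaussian noise, so the weighted least-squares estimate is a weighted mean and the sum of squared normalized residuals has 3 - 1 = 2 degrees of freedom when no bad data is present. With 2 degrees of freedom the chi-square quantile has the closed form -2 log(false alarm rate), which avoids any external library. The function names, sensor values, and noise levels are hypothetical, and a real power system state estimator involves a nonlinear measurement model that is not shown here.

```python
import math


def wls_scalar_estimate(measurements, sigmas):
    """Weighted least-squares estimate of a scalar state measured directly."""
    weights = [1.0 / s ** 2 for s in sigmas]
    return sum(w * z for w, z in zip(weights, measurements)) / sum(weights)


def chi_square_bad_data_test(measurements, sigmas, false_alarm=0.05):
    """Chi-square test on normalized residuals for three direct measurements.

    Returns (statistic, threshold, flagged). The closed-form threshold is valid
    only for 2 degrees of freedom, hence the check on the measurement count.
    """
    if len(measurements) != 3 or len(sigmas) != 3:
        raise ValueError("this sketch assumes exactly three measurements")
    x_hat = wls_scalar_estimate(measurements, sigmas)
    statistic = sum(((z - x_hat) / s) ** 2 for z, s in zip(measurements, sigmas))
    threshold = -2.0 * math.log(false_alarm)  # chi-square quantile, 2 dof
    return statistic, threshold, statistic > threshold


sigmas = [0.02, 0.02, 0.02]
clean = [1.02, 0.98, 1.01]      # hypothetical per-unit voltage readings
corrupted = [1.02, 0.98, 1.30]  # third sensor reports a gross error

for name, z in (("clean", clean), ("corrupted", corrupted)):
    stat, thr, flagged = chi_square_bad_data_test(z, sigmas)
    print(f"{name}: statistic={stat:.2f} threshold={thr:.2f} flagged={flagged}")
```

A test of this kind only reports that some measurement is inconsistent; identifying which one typically relies on the residual-based methods also named in the abstract.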

Andrea Serrani (Ed.)
Over the past decade, there has been significant progress on the science of load control for the creation of virtual energy storage. This is an alternative to demand response, and it is termed demand dispatch. Distributed control is used to manage millions of flexible loads to modify the power consumption of the aggregation, which can be ramped up and down, just like discharging and charging a battery. A challenge with distributed control is heterogeneity of the population of loads, which complicates control at the aggregate level. It is shown in this article that additional control at each load in the population can result in a far simpler aggregate model. The local control is designed to flatten resonances and produce approximately all-pass response. Analysis is based on mean-field control for the heterogeneous population; the mean-field model is only justified because of the additional local control introduced in this article. Theory and simulations indicate that the resulting input-output dynamics of the aggregate have a nearly flat input-output response: the behavior of an ideal, multi-GW battery system.
Free, publicly accessible full text available July 1, 2024