-
We establish the first finite-time logarithmic regret bounds for the self-tuning regulation problem. We introduce a modified version of the certainty-equivalence algorithm, which we call PIECE, that clips inputs in addition to utilizing probing inputs for exploration. We show that it has a C log T upper bound on the regret after T time steps for bounded noise, and C log³ T in the case of sub-Gaussian noise, unlike the LQ problem, where logarithmic regret has been shown to be impossible. The PIECE algorithm is also designed to address the critical challenge of poor initial transient performance of reinforcement learning algorithms for linear systems. Comparative simulation results illustrate the improved performance of PIECE.
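As a rough illustration of the two mechanisms named in the abstract, here is a minimal sketch of certainty-equivalence control with input clipping and scheduled probing on a scalar system; the plant, noise levels, clipping bound, and probing schedule are all illustrative assumptions, not the authors' PIECE implementation.

```python
import numpy as np

# Hypothetical scalar plant x_{t+1} = a*x_t + b*u_t + w_t with unknown (a, b).
rng = np.random.default_rng(0)
a_true, b_true = 0.9, 0.5
T, u_max = 10_000, 10.0            # horizon and input clipping bound (assumed)

a_hat, b_hat = 0.0, 1.0            # initial parameter estimates
G = np.eye(2)                      # regularized Gram matrix for least squares
h = np.zeros(2)                    # accumulated cross-correlations
x = 0.0
for t in range(1, T + 1):
    # Certainty-equivalence input for scalar regulation, clipped to a fixed
    # bound to keep the transient under control while estimates are poor.
    u = float(np.clip(-(a_hat / b_hat) * x, -u_max, u_max))
    # Probing: inject exploration noise on a sparse (here: dyadic) schedule.
    if (t & (t - 1)) == 0:         # t is a power of two (illustrative choice)
        u += rng.normal(scale=1.0)
    x_next = a_true * x + b_true * u + rng.normal(scale=0.1)
    # Recursive least-squares update of the estimates (a_hat, b_hat).
    z = np.array([x, u])
    G += np.outer(z, z)
    h += z * x_next
    a_hat, b_hat = np.linalg.solve(G, h)
    b_hat = b_hat if abs(b_hat) > 1e-3 else 1e-3   # guard against division by zero
    x = x_next
```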
-
We study Markov potential games under the infinite-horizon average reward criterion. Most previous studies have been for discounted rewards. We prove that algorithms based on both independent policy gradient and independent natural policy gradient converge globally to a Nash equilibrium for the average reward criterion. To set the stage for gradient-based methods, we first establish that the average reward is a smooth function of policies and provide sensitivity bounds for the differential value functions, under certain conditions on ergodicity and the second largest eigenvalue of the underlying Markov decision process (MDP). We prove that three algorithms, policy gradient, proximal-Q, and natural policy gradient (NPG), converge to an ϵ-Nash equilibrium with time complexity O(1/ϵ²), given a gradient/differential Q-function oracle. When policy gradients have to be estimated, we propose an algorithm with Õ(1/(min_{s,a} π(a|s) · δ)) sample complexity to achieve δ approximation error w.r.t. the ℓ₂ norm. Equipped with the estimator, we derive the first sample complexity analysis for a policy gradient ascent algorithm, featuring a sample complexity of Õ(1/ϵ⁵). Simulation studies are presented.
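A minimal stand-in for the differential Q-function oracle and the plain policy gradient ascent step, on a tiny made-up single-agent average-reward MDP (in the game setting, each player would run such an update independently). The MDP, step size, and softmax parameterization are assumptions for illustration, not the paper's algorithms.

```python
import numpy as np

# Hypothetical 2-state, 2-action ergodic MDP; P[s, a, s'] and r[s, a] are made up.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.3, 0.7], [0.6, 0.4]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
S, A = r.shape

def softmax(theta):
    e = np.exp(theta - theta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def avg_reward_and_grad(theta):
    """Exact average reward and its policy gradient (an oracle stand-in)."""
    pi = softmax(theta)
    P_pi = np.einsum('sap,sa->sp', P, pi)           # state transitions under pi
    r_pi = (pi * r).sum(axis=1)
    # Stationary distribution: left eigenvector of P_pi for eigenvalue 1.
    w, v = np.linalg.eig(P_pi.T)
    d = np.real(v[:, np.argmin(np.abs(w - 1))]); d /= d.sum()
    rho = d @ r_pi                                   # average reward
    # Differential value via the Poisson equation (I - P_pi) h = r_pi - rho.
    h = np.linalg.lstsq(np.eye(S) - P_pi, r_pi - rho, rcond=None)[0]
    Q = r + np.einsum('sap,p->sa', P, h) - rho       # differential Q-function
    adv = Q - (pi * Q).sum(axis=1, keepdims=True)
    grad = d[:, None] * pi * adv                     # softmax policy gradient
    return rho, grad

theta = np.zeros((S, A))
for _ in range(500):                                 # plain gradient ascent
    rho, grad = avg_reward_and_grad(theta)
    theta += 2.0 * grad
print(f"average reward after ascent: {rho:.3f}")
```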
-
Interactive decision making, encompassing bandits, contextual bandits, and reinforcement learning, has recently been of interest to theoretical studies of experimentation design and recommender system algorithm research. Recently, it has been shown that the well-known Graves-Lai constant being zero is a necessary and sufficient condition for achieving bounded (or constant) regret in interactive decision making. As this condition may be a strong requirement for many applications, the practical usefulness of pursuing bounded regret has been questioned. In this paper, we show that the condition of the Graves-Lai constant being zero is also necessary to achieve delay model robustness when reward delays are unknown (i.e., when feedback is anonymous). Here, model robustness is measured in terms of ϵ-robustness, one of the most widely used and one of the least adversarial robustness concepts in the robust statistics literature. In particular, we show that ϵ-robustness cannot be achieved by a consistent (i.e., uniformly sub-polynomial regret) algorithm, however small the nonzero ϵ value is, when the Graves-Lai constant is not zero. While this is a strongly negative result, we also provide a positive result for linear reward models (linear contextual bandits, reinforcement learning with linear MDPs): the Graves-Lai constant being zero is also sufficient for achieving bounded regret without any knowledge of delay models, i.e., the best of both the efficiency world and the delay-robustness world.
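For context, a standard statement of the Graves-Lai constant in the multi-armed bandit case (this is the textbook form from the literature, not necessarily the paper's notation or full generality):

```latex
% Graves-Lai constant for a bandit model \theta. Here \Delta_a(\theta) is the
% suboptimality gap of action a, and \Lambda(\theta) is the set of "confusing"
% alternative models that change the optimal action yet are indistinguishable
% from \theta when only \theta's optimal action is played.
\mathcal{C}(\theta) \;=\; \inf_{\eta \ge 0} \;\sum_{a} \eta(a)\,\Delta_a(\theta)
\quad \text{s.t.} \quad
\sum_{a} \eta(a)\,\mathrm{KL}\!\left(\theta_a, \lambda_a\right) \;\ge\; 1
\;\;\text{for all } \lambda \in \Lambda(\theta).
```

Bounded regret corresponds to the optimal allocation cost vanishing, i.e., 𝒞(θ) = 0, which is the condition the paper connects to delay robustness.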
-
In both power system transient stability and electromagnetic transient (EMT) simulations, up to 90% of the computational time is devoted to solving the network equations, i.e., a set of linear equations. Traditional approaches are based on sparse LU factorization, which is inherently sequential. In this paper, EMT simulation is considered, and an inverse-based network solution is proposed: a hierarchical method for computing and storing the approximate inverse of the conductance matrix. The proposed method can also efficiently update the inverse by modifying only local sub-matrices to reflect changes in the network, e.g., loss of a line. Experiments on a series of simplified 179-bus Western Interconnection models demonstrate the advantages of the proposed method.
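The paper's hierarchical approximate inverse is more involved, but the local-update idea can be illustrated with a rank-1 Sherman-Morrison correction for the loss of a single line (a sketch assuming a symmetric conductance matrix; the 3-bus values are made up, not a real test system):

```python
import numpy as np

def remove_line(G_inv, i, j, g):
    """Update a precomputed inverse of the conductance matrix after losing
    the line between buses i and j with conductance g (Sherman-Morrison).
    Losing the line is the rank-1 change G' = G - g * e e^T with e = e_i - e_j,
    so G'^{-1} follows from G^{-1} with one matrix-vector product, no refactorization.
    """
    e = np.zeros(G_inv.shape[0]); e[i], e[j] = 1.0, -1.0
    u = G_inv @ e
    denom = 1.0 - g * (e @ u)
    if abs(denom) < 1e-12:
        raise ValueError("update is singular: line removal islands the network")
    return G_inv + np.outer(u, u) * (g / denom)

# Tiny made-up 3-bus example (values illustrative only):
G = np.array([[ 2.0, -1.0, -0.5],
              [-1.0,  2.5, -1.0],
              [-0.5, -1.0,  2.0]])
G_inv = np.linalg.inv(G)
G_inv_new = remove_line(G_inv, 0, 1, 1.0)   # drop the 1.0-S line between buses 0 and 1
e = np.array([1.0, -1.0, 0.0])
print(np.allclose(G_inv_new, np.linalg.inv(G - np.outer(e, e))))  # True
```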
-
Jadbabaie, Ali; Lygeros, John; Pappas, George J.; Parrilo, Pablo; Recht, Benjamin; Scaramuzza, Davide; Tomlin, Claire J.; Zeilinger, Melanie N. (Eds.)