NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Finite Sample Identification of Partially Observed Bilinear Dynamical Systems

Sattar, Yahya; Jedra, Yassir; Fazel, Maryam; Dean, Sarah (May 2025, Proceedings of Machine Learning Research)
Ozay, N; Balzano, L; Panagou, D; Abate, A (Ed.)
We consider the problem of learning a realization of a partially observed bilinear dynamical system (BLDS) from noisy input-output data. Given a single trajectory of input-output samples, we provide an algorithm and a finite time analysis for learning the system’s Markov-like parameters, from which a balanced realization of the bilinear system can be obtained. The stability of BLDS depends on the sequence of inputs used to excite the system. Moreover, our identification algorithm regresses the outputs to highly correlated, nonlinear, and heavy-tailed covariates. These properties, unique to partially observed bilinear dynamical systems, pose significant challenges to the analysis of our algorithm for learning the unknown dynamics. We address these challenges and provide high probability error bounds on our identification algorithm under a uniform stability assumption. Our analysis provides insights into system theoretic quantities that affect learning accuracy and sample complexity. Lastly, we perform numerical experiments with synthetic data to reinforce these insights.
more » « less
Full Text Available
Learning Optimal Tax Design in Nonatomic Congestion Games

Cui, Qiwen; Fazel, Maryam; Du, Simon S (September 2024, Advances in Neural Information Processing Systems)

In multiplayer games, self-interested behavior among the players can harm the social welfare. Tax mechanisms are a common method to alleviate this issue and induce socially optimal behavior. In this work, we take the initial step of learning the optimal tax that can maximize social welfare with limited feedback in congestion games. We propose a new type of feedback named equilibrium feedback, where the tax designer can only observe the Nash equilibrium after deploying a tax plan. Existing algorithms are not applicable due to the exponentially large tax function space, nonexistence of the gradient, and nonconvexity of the objective. To tackle these challenges, we design a computationally efficient algorithm that leverages several novel components: (1) a piece-wise linear tax to approximate the optimal tax; (2) extra linear terms to guarantee a strongly convex potential function; (3) an efficient subroutine to find the exploratory tax that can provide critical information about the game. The algorithm can find an \eps-optimal tax with O(\beta F^2/eps^2) sample complexity, where \beta is the smoothness of the cost function and F is the number of facilities.
more » « less
Full Text Available
A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning

Jiang, Haozhe; Cui, Qiwen; Xiong, Zhihan; Fazel, Maryam; Du, Simon S (January 2024, Proceedings of the International Conference on Learning Representations)

We investigate learning the equilibria in non-stationary multi-agent systems and address the challenges that differentiate multi-agent learning from single-agent learning. Specifically, we focus on games with bandit feedback, where testing an equilibrium can result in substantial regret even when the gap to be tested is small, and the existence of multiple optimal solutions (equilibria) in stationary games poses extra challenges. To overcome these obstacles, we propose a versatile black-box approach applicable to a broad spectrum of problems, such as general-sum games, potential games, and Markov games, when equipped with appropriate learning and testing oracles for stationary environments. Our algorithms can achieve O(∆^1/4 T^3/4) regret when the degree of nonstationarity, as measured by total variation ∆, is known, and O(∆^1/5 T^4/5) regret when ∆ is unknown, where T is the number of rounds. Meanwhile, our algorithm inherits the favorable dependence on number of agents from the oracles. As a side contribution that may be independent of interest, we show how to test for various types of equilibria by a black-box reduction to single-agent learning, which includes Nash equilibria, correlated equilibria, and coarse correlated equilibria.
more » « less
Full Text Available
On Controller Reduction in Linear Quadratic Gaussian Control with Performance Bounds

Ren, Zhaolin; Zheng, Yang; Fazel, Maryam; Li, Na (June 2023, Proceedings of Machine Learning Research)

The problem of controller reduction has a rich history in control theory. Yet, many questions remain open. In particular, there exist very few results on the order reduction of general non-observer based controllers and the subsequent quantification of the closed-loop performance. Recent developments in model-free policy optimization for Linear Quadratic Gaussian (LQG) control have highlighted the importance of this question. In this paper, we first propose a new set of sufficient conditions ensuring that a perturbed controller remains internally stabilizing. Based on this result, we illustrate how to perform order reduction of general (non-observer based) output feedback controllers using balanced truncation and modal truncation. We also provide explicit bounds on the LQG performance of the reduced-order controller. Furthermore, for single-input-single-output (SISO) systems, we introduce a new controller reduction technique by truncating unstable modes. We illustrate our theoretical results with numerical simulations. Our results will serve as valuable tools to design direct policy search algorithms for control problems with partial observations.
more » « less
Full Text Available
Toward a Theoretical Foundation of Policy Optimization for Learning Control Policies

https://doi.org/10.1146/annurev-control-042920-020021

Hu, Bin; Zhang, Kaiqing; Li, Na; Mesbahi, Mehran; Fazel, Maryam; Başar, Tamer (May 2023, Annual Review of Control, Robotics, and Autonomous Systems)

Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning. This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis that has been popularized by successes of reinforcement learning. We take an interdisciplinary perspective in our exposition that connects control theory, reinforcement learning, and large-scale optimization. We review a number of recently developed theoretical results on the optimization landscape, global convergence, and sample complexityof gradient-based methods for various continuous control problems, such as the linear quadratic regulator (LQR), [Formula: see text] control, risk-sensitive control, linear quadratic Gaussian (LQG) control, and output feedback synthesis. In conjunction with these optimization results, we also discuss how direct policy optimization handles stability and robustness concerns in learning-based control, two main desiderata in control engineering. We conclude the survey by pointing out several challenges and opportunities at the intersection of learning and control.
more » « less
Full Text Available
Offline Congestion Games: How Feedback Type Affects Data Coverage Requirement

Jiang, Haozhe; Cui, Qiwen; Xiong, Zhihan; Fazel, Maryam; Du, Simon S. (February 2023, International Conference on Learning Representations)

This paper investigates when one can efficiently recover an approximate Nash Equilibrium (NE) in offline congestion games. The existing dataset coverage assumption in offline general-sum games inevitably incurs a dependency on the number of actions, which can be exponentially large in congestion games. We consider three different types of feedback with decreasing revealed information. Starting from the facility-level (a.k.a., semi-bandit) feedback, we propose a novel one-unit deviation coverage condition and show a pessimism-type algorithm that can recover an approximate NE. For the agent-level (a.k.a., bandit) feedback setting, interestingly, we show the one-unit deviation coverage condition is not sufficient. On the other hand, we convert the game to multi-agent linear bandits and show that with a generalized data coverage assumption in offline linear bandits, we can efficiently recover the approximate NE. Lastly, we consider a novel type of feedback, the game-level feedback where only the total reward from all agents is revealed. Again, we show the coverage assumption for the agent-level feedback setting is insufficient in the game-level feedback setting, and with a stronger version of the data coverage assumption for linear bandits, we can recover an approximate NE. Together, our results constitute the first study of offline congestion games and imply formal separations between different types of feedback.
more » « less
Full Text Available
Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies

Yuan, Rui; Du, Simon S.; Gower, Robert M.; Lazaric, Alessandro; Xiao, Lin (January 2023, International Conference on Learning Representations)

Full Text Available
Near-Optimal Randomized Exploration for Tabular Markov Decision Processes

Xiong, Zhihan; Shen, Ruoqi; Cui, Qiwen; Fazel, Maryam; Du, Simon S. (January 2022, Conference on Neural Information Processing Systems)

Full Text Available
Learning in Congestion Games with Bandit Feedback

Cui, Qiwen; Xiong, Zhihan; Fazel, Maryam; Du, Simon S. (January 2022, Conference on Neural Information Processing Systems)

Full Text Available

Search for: All records