NSF PAR Search | NSF Public Access Repository

Finite Sample Identification of Partially Observed Bilinear Dynamical Systems

Sattar, Yahya; Jedra, Yassir; Fazel, Maryam; Dean, Sarah (May 2025, Proceedings of Machine Learning Research)
Ozay, N; Balzano, L; Panagou, D; Abate, A (Ed.)
We consider the problem of learning a realization of a partially observed bilinear dynamical system (BLDS) from noisy input-output data. Given a single trajectory of input-output samples, we provide an algorithm and a finite time analysis for learning the system’s Markov-like parameters, from which a balanced realization of the bilinear system can be obtained. The stability of BLDS depends on the sequence of inputs used to excite the system. Moreover, our identification algorithm regresses the outputs to highly correlated, nonlinear, and heavy-tailed covariates. These properties, unique to partially observed bilinear dynamical systems, pose significant challenges to the analysis of our algorithm for learning the unknown dynamics. We address these challenges and provide high probability error bounds on our identification algorithm under a uniform stability assumption. Our analysis provides insights into system theoretic quantities that affect learning accuracy and sample complexity. Lastly, we perform numerical experiments with synthetic data to reinforce these insights.
Free, publicly-accessible full text available May 22, 2026
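The abstract above notes that the stability of a bilinear system depends on the inputs used to excite it. The following is a minimal sketch of that effect, not the paper's model or algorithm: it assumes a scalar, noise-free toy system x_{t+1} = (a + b*u_t) * x_t with output y_t = c * x_t, and the names `simulate_scalar_blds`, `a`, `b`, `c` are hypothetical choices made for illustration.

```python
def simulate_scalar_blds(a, b, c, inputs, x0=1.0):
    """Simulate the assumed toy system x_{t+1} = (a + b*u_t) * x_t, y_t = c * x_t."""
    x = x0
    outputs = []
    for u in inputs:
        outputs.append(c * x)      # record the output before the state update
        x = (a + b * u) * x        # the input multiplies the state (bilinear term)
    return outputs


# Hypothetical parameters chosen only for illustration.
a, b, c = 0.5, 1.0, 1.0
small_inputs = [0.2] * 50   # |a + b*u| = 0.7 < 1, so the state contracts
large_inputs = [0.8] * 50   # |a + b*u| = 1.3 > 1, so the state grows

y_small = simulate_scalar_blds(a, b, c, small_inputs)
y_large = simulate_scalar_blds(a, b, c, large_inputs)
```

With a constant input, the effective one-step gain is a + b*u, so the same system decays under the small input and blows up under the large one.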
Offline Multi-task Transfer RL with Representational Penalization

Bose, Avinandan; Du, Simon S; Fazel, Maryam (April 2025, Proceedings of Machine Learning Research)
Li, Y; Mandt, S; Agrawal, S; Khan, E (Ed.)
We study the problem of representational transfer in offline Reinforcement Learning (RL), where a learner has access to episodic data from a number of source tasks collected a priori, and aims to learn a shared representation to be used in finding a good policy for a target task. Unlike in online RL where the agent interacts with the environment while learning a policy, in the offline setting there cannot be such interactions in either the source tasks or the target task; thus multi-task offline RL can suffer from incomplete coverage. We propose an algorithm to compute pointwise uncertainty measures for the learnt representation in low-rank MDPs, and establish a data-dependent upper bound for the suboptimality of the learnt policy for the target task. Our algorithm leverages the collective exploration done by source tasks to mitigate poor coverage at some points by a few tasks, thus overcoming the limitation of needing uniformly good coverage for a meaningful transfer by existing offline algorithms. We complement our theoretical results with empirical evaluation on a rich-observation MDP which requires many samples for complete coverage. Our findings illustrate the benefits of penalizing and quantifying the uncertainty in the learnt representation.
Free, publicly-accessible full text available April 23, 2026
Keeping up with dynamic attackers: Certifying robustness to adaptive online data poisoning

Bose, Avinandan; Lessard, Laurent; Fazel, Maryam; Dvijotham, Krishnamurthy (April 2025, Proceedings of Machine Learning Research)

The rise of foundation models fine-tuned on human feedback from potentially untrusted users has increased the risk of adversarial data poisoning, necessitating the study of robustness of learning algorithms against such attacks. Existing research on provable certified robustness against data poisoning attacks primarily focuses on certifying robustness for static adversaries who modify a fraction of the dataset used to train the model before the training algorithm is applied. In practice, particularly when learning from human feedback in an online sense, adversaries can observe and react to the learning process and inject poisoned samples that optimize adversarial objectives better than when they are restricted to poisoning a static dataset once, before the learning algorithm is applied. Indeed, it has been shown in prior work that online dynamic adversaries can be significantly more powerful than static ones. We present a novel framework for computing certified bounds on the impact of dynamic poisoning, and use these certificates to design robust learning algorithms. We give an illustration of the framework for the mean estimation problem and binary classification problems and outline directions for extending this in further work.
Free, publicly-accessible full text available April 23, 2026
Function Design for Improved Competitive Ratio in Online Resource Allocation with Procurement Costs

https://doi.org/10.1287/ijoo.2021.0012

Ray, Mitas; Sadeghi, Omid; Ratliff, Lillian J; Fazel, Maryam (December 2024, INFORMS Journal on Optimization)

We study the problem of online resource allocation, where customers arrive sequentially, and the seller must irrevocably allocate resources to each incoming customer while also facing a prespecified procurement cost function over the total allocation. The objective is to maximize the reward obtained from fulfilling the customers’ requests minus the cumulative procurement cost. We analyze the competitive ratio of a primal-dual algorithm in this setting and develop an optimization framework for designing a surrogate function for the procurement cost to be used by the algorithm to improve the competitive ratio of the primal-dual algorithm. We use the optimal surrogate function for polynomial procurement cost functions to improve on previous bounds. For general procurement cost functions, our design method uses quasiconvex optimization to find optimal design parameters. We then implement the design techniques and show the improved performance of the algorithm in numerical examples. Finally, we extend the analysis by devising a posted pricing mechanism in which the algorithm does not require the customers’ preferences to be revealed. Funding: M. Fazel’s work was supported in part by the National Science Foundation [Awards 2023166, 2007036, and 1740551]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/ijoo.2021.0012.
Full Text Available
Iterative Linear Quadratic Optimization for Nonlinear Control: Differentiable Programming Algorithmic Templates

https://doi.org/10.5802/ojmo.32

Roulet, Vincent; Srinivasa, Siddhartha; Fazel, Maryam; Harchaoui, Zaid (November 2024, Open Journal of Mathematical Optimization)

Full Text Available
Learning Optimal Tax Design in Nonatomic Congestion Games

Cui, Qiwen; Fazel, Maryam; Du, Simon S (September 2024, Advances in Neural Information Processing Systems)

In multiplayer games, self-interested behavior among the players can harm the social welfare. Tax mechanisms are a common method to alleviate this issue and induce socially optimal behavior. In this work, we take the initial step of learning the optimal tax that can maximize social welfare with limited feedback in congestion games. We propose a new type of feedback named equilibrium feedback, where the tax designer can only observe the Nash equilibrium after deploying a tax plan. Existing algorithms are not applicable due to the exponentially large tax function space, nonexistence of the gradient, and nonconvexity of the objective. To tackle these challenges, we design a computationally efficient algorithm that leverages several novel components: (1) a piecewise linear tax to approximate the optimal tax; (2) extra linear terms to guarantee a strongly convex potential function; (3) an efficient subroutine to find the exploratory tax that can provide critical information about the game. The algorithm can find an ε-optimal tax with O(βF²/ε²) sample complexity, where β is the smoothness of the cost function and F is the number of facilities.
Full Text Available
Toward Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixture Models

Xu, Weihang; Fazel, Maryam; Du, Simon S (September 2024, Advances in Neural Information Processing Systems)

We study the gradient Expectation-Maximization (EM) algorithm for Gaussian Mixture Models (GMM) in the over-parameterized setting, where a general GMM with n > 1 components learns from data that are generated by a single ground truth Gaussian distribution. While results for the special case of 2-Gaussian mixtures are well-known, a general global convergence analysis for arbitrary n remains unresolved and faces several new technical barriers since the convergence becomes sub-linear and non-monotonic. To address these challenges, we construct a novel likelihood-based convergence analysis framework and rigorously prove that gradient EM converges globally with a sublinear rate O(1/√t). This is the first global convergence result for Gaussian mixtures with more than 2 components. The sublinear convergence rate is due to the algorithmic nature of learning over-parameterized GMM with gradient EM. We also identify a new emerging technical challenge for learning general over-parameterized GMM: the existence of bad local regions that can trap gradient EM for an exponential number of steps.
Full Text Available
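To make the over-parameterized setting in the abstract above concrete, here is a minimal sketch of gradient EM in one dimension. It is an illustration under stated assumptions, not the paper's exact setup or analysis: the mixture is assumed to have equal weights and unit variances with only the means learned, the data are drawn from a single standard Gaussian, and the function names, step size, and initial means are hypothetical.

```python
import math
import random


def responsibilities(x, mus):
    """Posterior weight of each equal-weight, unit-variance component for sample x."""
    logits = [-0.5 * (x - m) ** 2 for m in mus]
    top = max(logits)                       # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(data, mus):
    """Average log-likelihood of the data under the assumed mixture."""
    const = -0.5 * math.log(2 * math.pi) - math.log(len(mus))
    total = 0.0
    for x in data:
        logits = [-0.5 * (x - m) ** 2 for m in mus]
        top = max(logits)
        total += const + top + math.log(sum(math.exp(l - top) for l in logits))
    return total / len(data)


def gradient_em_step(data, mus, step_size):
    """One gradient EM step: move each mean along the gradient of the EM surrogate."""
    grads = [0.0] * len(mus)
    for x in data:
        psi = responsibilities(x, mus)
        for i, m in enumerate(mus):
            grads[i] += psi[i] * (x - m)
    return [m + step_size * g / len(data) for m, g in zip(mus, grads)]


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(500)]   # single ground-truth Gaussian
init_mus = [-2.0, 0.5, 3.0]                        # n = 3 components fit to it
mus = list(init_mus)
for _ in range(200):
    mus = gradient_em_step(data, mus, step_size=0.5)
```

Comparing `log_likelihood(data, init_mus)` with `log_likelihood(data, mus)` shows the fit improving as the three means drift toward the bulk of the data.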
Initializing services in interactive ML systems for diverse users

Bose, Avinandan; Curmei, Mihaela; Jiang, Daniel; Morgenstern, Jamie H; Dean, Sarah; Ratliff, Lillian; Fazel, Maryam (December 2024, 38th Conference on Neural Information Processing Systems (NeurIPS 2024))

Full Text Available
Initializing services in interactive ML systems for diverse users

Bose, Avinandan; Curmei, Mihaela; Jiang, Daniel; Morgenstern, Jamie; Dean, Sarah; Ratliff, Lillian; Fazel, Maryam (September 2024, Advances in Neural Information Processing Systems)
Globerson, A; Mackey, L; Belgrave, D; Fan, A; Paquet, U; Tomczak, J; Zhang, C (Ed.)
This paper investigates ML systems serving a group of users, with multiple models/services, each aimed at specializing to a sub-group of users. We consider settings where, upon deploying a set of services, users choose the one minimizing their personal losses and the learner iteratively learns by interacting with diverse users. Prior research shows that the outcomes of learning dynamics, which comprise both the services' adjustments and users' service selections, hinge significantly on the initial conditions. However, finding good initial conditions faces two main challenges: (i) Bandit feedback: typically, data on user preferences are not available before deploying services and observing user behavior; (ii) Suboptimal local solutions: the total loss landscape (i.e., the sum of loss functions across all users and services) is not convex, and gradient-based algorithms can get stuck in poor local minima. We address these challenges with a randomized algorithm to adaptively select a minimal set of users for data collection in order to initialize a set of services. Under mild assumptions on the loss functions, we prove that our initialization leads to a total loss within a factor of the globally optimal total loss with complete user preference data, and this factor scales logarithmically in the number of services. This result is a generalization of the well-known k-means++ guarantee to a broad problem class, which is also of independent interest. The theory is complemented by experiments on real as well as semi-synthetic datasets.
Full Text Available
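The abstract above describes its guarantee as a generalization of the k-means++ guarantee. For reference, the following is a minimal sketch of classical k-means++ seeding only, not the paper's initialization algorithm: the first center is drawn uniformly, and each later center is drawn with probability proportional to its squared distance from the nearest center chosen so far. The function names and the toy points are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seeding(points, k, rng):
    """Pick k initial centers from `points` using D^2 sampling (k-means++ seeding)."""
    centers = [rng.choice(points)]          # first center: uniform at random
    # d2[i] is the squared distance from points[i] to its nearest chosen center.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        # Already chosen points have weight 0 and so cannot be drawn again.
        new_center = rng.choices(points, weights=d2, k=1)[0]
        centers.append(new_center)
        d2 = [min(old, squared_distance(p, new_center))
              for old, p in zip(d2, points)]
    return centers


rng = random.Random(0)
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.0, 0.1)]
centers = kmeans_pp_seeding(points, k=3, rng=rng)
```

This sketch assumes k does not exceed the number of distinct points; otherwise all remaining weights become zero and `random.choices` raises an error.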
No-Regret Online Prediction with Strategic Experts

Sadeghi, Omid; Fazel, Maryam (February 2024, Advances in Neural Information Processing Systems)

We study a generalization of the online binary prediction with expert advice framework where at each round, the learner is allowed to pick m ≥ 1 experts from a pool of K experts and the overall utility is a modular or submodular function of the chosen experts. We focus on the setting in which experts act strategically and aim to maximize their influence on the algorithm’s predictions by potentially misreporting their beliefs about the events. Among others, this setting finds applications in forecasting competitions where the learner seeks not only to make predictions by aggregating different forecasters but also to rank them according to their relative performance. Our goal is to design algorithms that satisfy the following two requirements: 1) Incentive-compatible: Incentivize the experts to report their beliefs truthfully, and 2) No-regret: Achieve sublinear regret with respect to the true beliefs of the best-fixed set of m experts in hindsight. Prior works have studied this framework when m = 1 and provided incentive-compatible no-regret algorithms for the problem. We first show that a simple reduction of our problem to the m = 1 setting is neither efficient nor effective. Then, we provide algorithms that utilize the specific structure of the utility functions to achieve the two desired goals.
Full Text Available
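As background for the expert-advice framework in the abstract above, here is a minimal sketch of the classical Hedge (exponential weights) update for the non-strategic, single-expert case. It is not the paper's incentive-compatible algorithm and ignores strategic misreporting; the function name, the learning rate, and the toy losses are hypothetical.

```python
import math


def hedge(loss_rounds, learning_rate):
    """Run Hedge on per-round loss vectors (one loss in [0, 1] per expert).

    Returns the final distribution over experts and the learner's
    cumulative expected loss.
    """
    num_experts = len(loss_rounds[0])
    log_weights = [0.0] * num_experts       # log-space weights avoid underflow
    learner_loss = 0.0

    def normalize(log_w):
        top = max(log_w)
        weights = [math.exp(w - top) for w in log_w]
        total = sum(weights)
        return [w / total for w in weights]

    for losses in loss_rounds:
        probs = normalize(log_weights)
        # Expected loss when following an expert sampled from `probs`.
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        # Multiplicative update: experts with higher loss lose weight.
        log_weights = [w - learning_rate * l
                       for w, l in zip(log_weights, losses)]
    return normalize(log_weights), learner_loss


# Toy data: expert 0 is always right, experts 1 and 2 are always wrong.
rounds = [[0.0, 1.0, 1.0]] * 20
final_probs, learner_loss = hedge(rounds, learning_rate=0.5)
```

In this toy run nearly all of the probability mass ends up on expert 0, and the learner's cumulative loss stays far below the 20 incurred by either of the other experts.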
