NSF PAR Search | NSF Public Access Repository

Reinforcement Learning for Mean Field Games with Strategic Complementarities

Kiyeob Lee, Desik Rengarajan (January 2021, International Conference on Artificial Intelligence and Statistics (AISTATS))
Banerjee, Arindam and (Ed.)
Full Text Available
Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning

Zhou, Zhengqing and (January 2021, Proceedings of The 24th International Conference on Artificial Intelligence and Statistics)
Banerjee, Arindam and (Ed.)
While reinforcement learning has recently witnessed tremendous success in a wide range of domains, robustness, or the lack thereof, remains an important issue that is inadequately addressed. In this paper, we provide a distributionally robust formulation of offline policy learning in tabular RL that aims to learn a policy from historical data (collected by some other behavior policy) that is robust to the future environment arising as a perturbation of the training environment. We first develop a novel policy evaluation scheme that accurately estimates the robust value (i.e., how robust it is in a perturbed environment) of any given policy and establish its finite-sample estimation error. Building on this, we then develop a novel and minimax-optimal distributionally robust learning algorithm that achieves $$O_P\left(1/\sqrt{n}\right)$$ regret, meaning that with high probability, the policy learned from using $$n$$ training data points will be $$O\left(1/\sqrt{n}\right)$$ close to the optimal distributionally robust policy. Finally, our simulation results demonstrate the superiority of our distributionally robust approach compared to non-robust RL algorithms.
Full Text Available
Computation of the Sample Fréchet Mean for Sets of Large Graphs with Applications to Regression

Ferguson, Daniel; Meyer, Francois G. (June 2022, SIAM International Conference on Data Mining (SDM 2022))
Banerjee, Arindam; Zhou, Zhi-Hua (Ed.)
To characterize the location (mean, median) of a set of graphs, one needs a notion of centrality that is adapted to metric spaces, since graph sets are not Euclidean spaces. A standard approach is to consider the Fréchet mean. In this work, we equip a set of graphs with the pseudometric defined by the $$\ell_2$$ norm between the eigenvalues of their respective adjacency matrices. Unlike the edit distance, this pseudometric reveals structural changes at multiple scales, and is well adapted to studying various statistical problems for graph-valued data. We describe an algorithm to compute an approximation to the sample Fréchet mean of a set of undirected unweighted graphs with a fixed size using this pseudometric.
Full Text Available
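
The spectral pseudometric named in the Ferguson and Meyer abstract above can be illustrated with a minimal sketch. It assumes NumPy, compares the full sorted adjacency spectra of two graphs of equal size (the paper may instead use a truncated set of eigenvalues), and does not attempt the Fréchet mean computation itself. The function names are hypothetical.

```python
import numpy as np

def adjacency_spectrum(adj):
    """Eigenvalues of a symmetric adjacency matrix, sorted in descending order."""
    adj = np.asarray(adj, dtype=float)
    if adj.ndim != 2 or adj.shape[0] != adj.shape[1] or not np.allclose(adj, adj.T):
        raise ValueError("expected a square symmetric adjacency matrix")
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    return np.linalg.eigvalsh(adj)[::-1]

def spectral_distance(adj_a, adj_b):
    """l2 distance between the sorted adjacency spectra of two graphs."""
    spec_a = adjacency_spectrum(adj_a)
    spec_b = adjacency_spectrum(adj_b)
    if spec_a.shape != spec_b.shape:
        raise ValueError("graphs must have the same number of nodes")
    return float(np.linalg.norm(spec_a - spec_b))

# Path on three nodes (0 - 1 - 2).
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
# The same path with node 0 as the middle node (an isomorphic relabeling).
path_relabeled = [[0, 1, 1],
                  [1, 0, 0],
                  [1, 0, 0]]
# Triangle on three nodes.
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]

print(spectral_distance(path, triangle))
print(spectral_distance(path, path_relabeled))
```

The relabeled path has a different adjacency matrix but the same spectrum, so its distance to the original path is zero up to rounding. Distinct graphs at distance zero are what make this a pseudometric rather than a metric.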
Differentially Private Monotone Submodular Maximization Under Matroid and Knapsack Constraints

Sadeghi, Omid; Fazel, Maryam (April 2021, Proceedings of Machine Learning Research)
Banerjee, Arindam; Fukumizu, Kenji (Ed.)
Numerous tasks in machine learning and artificial intelligence have been modeled as submodular maximization problems. These problems usually involve sensitive data about individuals, and in addition to maximizing the utility, privacy concerns should be considered. In this paper, we study the general framework of non-negative monotone submodular maximization subject to matroid or knapsack constraints in both offline and online settings. For the offline setting, we propose a differentially private $$(1-\frac{\kappa}{e})$$-approximation algorithm, where $$\kappa\in[0,1]$$ is the total curvature of the submodular set function, which improves upon prior works in terms of approximation guarantee and query complexity under the same privacy budget. In the online setting, we propose the first differentially private algorithm, and we specify the conditions under which the regret bound scales as $$O(\sqrt{T})$$, i.e., privacy could be ensured while maintaining the same regret bound as the optimal regret guarantee in the non-private setting.
Full Text Available
Stochastic Bandits with Linear Constraints

Pacchiano, Aldo; Ghavamzadeh, Mohammad; Bartlett, Peter L.; Jiang, Heinrich (April 2021, Proceedings of The 24th International Conference on Artificial Intelligence and Statistics)
Banerjee, Arindam; Fukumizu, Kenji (Ed.)
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies whose expected cumulative reward over the course of multiple rounds is maximum, and each of which has an expected cost below a certain threshold. We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB), and prove a sublinear bound on its regret that is inversely proportional to the difference between the constraint threshold and the cost of a known feasible action. Our algorithm balances exploration and constraint satisfaction using a novel idea that scales the radii of the reward and cost confidence sets with different scaling factors. We further specialize our results to multi-armed bandits, propose a computationally efficient algorithm for this setting, and prove a regret bound that is better than simply casting multi-armed bandits as an instance of linear bandits and using the regret bound of OPLB. We also prove a lower bound for the problem studied in the paper and provide simulations to validate our theoretical results. Finally, we show how our algorithm and analysis can be extended to multiple constraints and to the case when the cost of the feasible action is unknown.
Full Text Available
Random Coordinate Underdamped Langevin Monte Carlo

Ding, Zhiyan; Li, Qin; Lu, Jianfeng; Wright, Stephen (April 2021, Proceedings of Machine Learning Research)
Banerjee, Arindam; Fukumizu, Kenji (Ed.)
Full Text Available
Federated Multi-armed Bandits with Personalization

Shi, Chengshuai; Shen, Cong; Yang, Jing (April 2021, Proceedings of The 24th International Conference on Artificial Intelligence and Statistics)
Banerjee, Arindam; Fukumizu, Kenji (Ed.)
Full Text Available
Principal Component Regression with Semirandom Observations via Matrix Completion

Bhaskara, A.; Ruwanpathirana, A.; Wijewardena, M. (January 2021, International Conference on Artificial Intelligence and Statistics (AISTATS))
Banerjee, Arindam; Fukumizu, Kenji (Ed.)
Principal Component Regression (PCR) is a popular method for prediction from data, and is one way to address the so-called multi-collinearity problem in regression. It was shown recently that algorithms for PCR such as hard singular value thresholding (HSVT) are also quite robust, in that they can handle data that has missing or noisy covariates. However, such spectral approaches require strong distributional assumptions on which entries are observed. Specifically, every covariate is assumed to be observed with probability (exactly) p, for some value of p. Our goal in this work is to weaken this requirement, and as a step towards this, we study a "semi-random" model. In this model, every covariate is revealed with probability p, and then an adversary comes in and reveals additional covariates. While the model seems intuitively easier, it is well known that algorithms such as HSVT perform poorly. Our approach is based on studying the closely related problem of Noisy Matrix Completion in a semi-random setting. By considering a new semidefinite programming relaxation, we develop new guarantees for matrix completion, which is our core technical contribution.
Full Text Available
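
As background for the Bhaskara et al. abstract above, here is a minimal sketch of the baseline it refers to: hard singular value thresholding (HSVT) followed by least-squares regression. It is a generic textbook-style version, not the semidefinite programming approach the paper proposes. It assumes NumPy, assumes unobserved entries are marked by a boolean mask, and estimates the observation probability p from the mask. The names `hsvt`, `threshold`, and the toy data are hypothetical.

```python
import numpy as np

def hsvt(observed, mask, threshold):
    """Denoise a partially observed covariate matrix by hard singular value thresholding."""
    observed = np.asarray(observed, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    p_hat = mask.mean()  # empirical fraction of observed entries (estimate of p)
    if p_hat == 0:
        raise ValueError("no entries observed")
    # Zero-fill the missing entries and rescale so the matrix is unbiased in expectation.
    filled = np.where(mask, observed, 0.0) / p_hat
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    s = np.where(s >= threshold, s, 0.0)  # keep only singular values above the threshold
    return (u * s) @ vt

# Toy data: a rank-1 covariate matrix plus a small deterministic perturbation.
clean = np.outer([1.0, 2.0, 3.0], [1.0, 2.0])
noise = 0.01 * np.array([[1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]])
mask = np.ones_like(clean, dtype=bool)  # fully observed, for simplicity
responses = clean @ np.array([1.0, 1.0])

denoised = hsvt(clean + noise, mask, threshold=1.0)
# Regress the responses on the denoised covariates (minimum-norm least squares).
coef, *_ = np.linalg.lstsq(denoised, responses, rcond=None)
print(denoised)
print(denoised @ coef)
```

With a random mask in place of the all-ones mask, the same function applies the 1/p rescaling the uniform-sampling assumption relies on. The abstract's point is that this rescaling is no longer appropriate once an adversary reveals additional entries.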
Efficient Interpolation of Density Estimators

Turner, Paxton; Liu, Jingbo; Rigollet, Philippe (January 2021, Proceedings of Machine Learning Research)
Banerjee, Arindam; Fukumizu, Kenji (Ed.)
Full Text Available
