Search for: All records

Editors contains: "Banerjee, Arindam"

« Prev Next »

Total Resources

13

Resource Type
Conference Paper

12

Conference Proceeding

0

Dataset

0

Journal Article

1

Workshop Report

0

Availability
Full Text / Resource Available

13

Citation Only

0

Save Results
Excel (limit 2000)
CSV (limit 5000)
XML (limit 5000)

Have feedback or suggestions for a way to improve these results?
!

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Computation of the Sample Frechet Mean for Sets of Large Graphs with Applications to Regression

Ferguson, Daniel ; Meyer, Francois G. ( June 2022 , SIAM International Conference on Data Mining (SDM 2022))
Banerjee, Arindam ; Zhou, Zhi-Hua (Ed.)
To characterize the location (mean, median) of a set of graphs, one needs a notion of centrality that is adapted to metric spaces, since graph sets are not Euclidean spaces. A standard approach is to consider the Fr ́echet mean. In this work, we equip a set of graph with the pseudometric defined by the l2 norm between the eigenvalues of their respective adjacency matrix. Unlike the edit distance, this pseudometric reveals structural changes at multiple scales, and is well adapted to studying various statistical problems for graph-valued data. We describe an algorithm to compute an approximation to the sample Fr ́echet mean of a set of undirected unweighted graphs with a fixed size using this pseudometric.
more » « less
Full Text Available
Finite-Sample Regret Bound for Distributionally Robust Offline Tabular Reinforcement Learning

Zhou, Zhengqing and ( January 2021 , Proceedings of The 24th International Conference on Artificial Intelligence and Statistics)
Banerjee, Arindam and (Ed.)
While reinforcement learning has witnessed tremendous success recently in a wide range of domains, robustness–or the lack thereof–remains an important issue that remains inadequately addressed. In this paper, we provide a distributionally robust formulation of offline learning policy in tabular RL that aims to learn a policy from historical data (collected by some other behavior policy) that is robust to the future environment arising as a perturbation of the training environment. We first develop a novel policy evaluation scheme that accurately estimates the robust value (i.e. how robust it is in a perturbed environment) of any given policy and establish its finite-sample estimation error. Building on this, we then develop a novel and minimax-optimal distributionally robust learning algorithm that achieves $O_P\left(1/\sqrt{n}\right)$ regret, meaning that with high probability, the policy learned from using $n$ training data points will be $O\left(1/\sqrt{n}\right)$ close to the optimal distributionally robust policy. Finally, our simulation results demonstrate the superiority of our distributionally robust approach compared to non-robust RL algorithms.
more » « less
Full Text Available
Reinforcement Learning for Mean Field Games with Strategic Complementarities

Kiyeob Lee, Desik Rengarajan ( January 2021 , International Conference on Artificial Intelligence and Statistics (AISTATS))
Banerjee, Arindam and (Ed.)
Full Text Available
Random Coordinate Underdamped Langevin Monte Carlo

Ding, Zhiyan ; Li, Qin ; Lu, Jianfeng ; Wright, Stephen ( April 2021 , Proceedings of Machine Learning Research)
Banerjee, Arindam ; Fukumizu, Kenji (Ed.)
Full Text Available
Random Coordinate Underdamped Langevin Monte Carlo

Ding, Zhiyan ; Li, Qin ; Lu, Jianfeng ; Wright, Stephen ( April 2021 , Proceedings of Machine Learning Research)
Banerjee, Arindam ; Fukumizu, Kenji (Ed.)
Full Text Available
Stochastic Bandits with Linear Constraints

Pacchiano, Aldo ; Ghavamzadeh, Mohammad ; Bartlett, Peter L. ; Jiang, Heinrich ( April 2021 , Proceedings of The 24th International Conference on Artificial Intelligence and Statistics)
Banerjee, Arindam ; Fukumizu, Kenji (Ed.)
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies, whose expected cumulative reward over the course of multiple rounds is maximum, and each one of them has an expected cost below a certain threshold. We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB), and prove a sublinear bound on its regret that is inversely proportional to the difference between the constraint threshold and the cost of a known feasible action. Our algorithm balances exploration and constraint satisfaction using a novel idea that scales the radii of the reward and cost confidence sets with different scaling factors. We further specialize our results to multi-armed bandits and propose a computationally efficient algorithm for this setting and prove a a regret bound that is better than simply casting multi-armed bandits as an instance of linear bandits and using the regret bound of OPLB. We also prove a lower-bound for the problem studied in the paper and provide simulations to validate our theoretical results. Finally, we show how our algorithm and analysis can be extended to multiple constraints and to the case when the cost of the feasible action is unknown.
more » « less
Full Text Available
Differentially Private Monotone Submodular Maximization Under Matroid and Knapsack Constraints

Sadeghi, Omid ; Fazel, Maryam ( April 2021 , Proceedings of Machine Learning Research)
Banerjee, Arindam ; Fukumizu, Kenji (Ed.)
Numerous tasks in machine learning and artificial intelligence have been modeled as submodular maximization problems. These problems usually involve sensitive data about individuals, and in addition to maximizing the utility, privacy concerns should be considered. In this paper, we study the general framework of non-negative monotone submodular maximization subject to matroid or knapsack constraints in both offline and online settings. For the offline setting, we propose a differentially private $(1-\frac{\kappa}{e})$-approximation algorithm, where $\kappa\in[0,1]$ is the total curvature of the submodular set function, which improves upon prior works in terms of approximation guarantee and query complexity under the same privacy budget. In the online setting, we propose the first differentially private algorithm, and we specify the conditions under which the regret bound scales as $Ø(\sqrt{T})$, i.e., privacy could be ensured while maintaining the same regret bound as the optimal regret guarantee in the non-private setting.
more » « less
Full Text Available
Federated Multi-armed Bandits with Personalization

Shi, Chengshuai ; Shen, Cong ; Yang, Jing ( April 2021 , Proceedings of The 24th International Conference on Artificial Intelligence and Statistics)
Banerjee, Arindam ; Fukumizu, Kenji (Ed.)
Full Text Available
Product Manifold Learning

Zhang, Sharon ; Moscovich, Amit ; Singer, Amit ( January 2021 , Proceedings of Machine Learning Research)
Banerjee, Arindam ; Fukumizu, Kenji (Ed.)
Full Text Available
An efficient algorithm for generalized linear bandit: Online stochastic gradient descent and thompson sampling

Ding, Qin ; Hsieh, Cho-Jui ; Sharpnack, James ( January 2021 , Proceedings of Machine Learning Research)
Banerjee, Arindam ; Fukumizu, Kenji (Ed.)
We consider the contextual bandit problem, where a player sequentially makes decisions based on past observations to maximize the cumulative reward. Although many algorithms have been proposed for contextual bandit, most of them rely on finding the maximum likelihood estimator at each iteration, which requires 𝑂(𝑡) time at the 𝑡-th iteration and are memory inefficient. A natural way to resolve this problem is to apply online stochastic gradient descent (SGD) so that the per-step time and memory complexity can be reduced to constant with respect to 𝑡, but a contextual bandit policy based on online SGD updates that balances exploration and exploitation has remained elusive. In this work, we show that online SGD can be applied to the generalized linear bandit problem. The proposed SGD-TS algorithm, which uses a single-step SGD update to exploit past information and uses Thompson Sampling for exploration, achieves 𝑂̃ (𝑇‾‾√) regret with the total time complexity that scales linearly in 𝑇 and 𝑑, where 𝑇 is the total number of rounds and 𝑑 is the number of features. Experimental results show that SGD-TS consistently outperforms existing algorithms on both synthetic and real datasets.
more » « less
Full Text Available

« Prev Next »