NSF PAR Search | NSF Public Access Repository

Multi-Armed Bandits With Costly Probes

https://doi.org/10.1109/TIT.2024.3506866

Elumar, Eray Can; Tekin, Cem; Yağan, Osman (January 2025, IEEE Transactions on Information Theory)

Free, publicly-accessible full text available January 1, 2026
Multi-Armed Bandits with Probing

https://doi.org/10.1109/ISIT57864.2024.10619238

Elumar, Eray Can; Tekin, Cem; Yağan, Osman (July 2024, IEEE)

Full Text Available
Efficient Reinforcement Learning for Routing Jobs in Heterogeneous Queueing Systems

Jali, Neharika; Qu, Guannan; Wang, Weina; Joshi, Gauri (May 2024, Proceedings of The 27th International Conference on Artificial Intelligence and Statistics)

We consider the problem of efficiently routing jobs that arrive into a central queue to a system of heterogeneous servers. Unlike in homogeneous systems, a threshold policy that routes jobs to the slow server(s) when the queue length exceeds a certain threshold is known to be optimal for the one-fast-one-slow two-server system. But an optimal policy for the multi-server system is unknown and non-trivial to find. While Reinforcement Learning (RL) has been recognized to have great potential for learning policies in such cases, our problem has an exponentially large state space size, rendering standard RL inefficient. In this work, we propose ACHQ, an efficient policy gradient-based algorithm with a low-dimensional soft threshold policy parameterization that leverages the underlying queueing structure. We provide stationary-point convergence guarantees for the general case and, despite the low-dimensional parameterization, prove that ACHQ converges to an approximate global optimum for the special case of two servers. Simulations demonstrate an improvement in expected response time of up to ∼30% over the greedy policy that routes to the fastest available server.
Full Text Available
The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond

Woo, Jiin; Joshi, Gauri; Chi, Yuejie (July 2023, International Conference on Machine Learning (ICML))

In this paper, we consider federated Q-learning, which aims to learn an optimal Q-function by periodically aggregating local Q-estimates trained on local data alone. Focusing on infinite-horizon tabular Markov decision processes, we provide sample complexity guarantees for both the synchronous and asynchronous variants of federated Q-learning. In both cases, our bounds exhibit a linear speedup with respect to the number of agents and sharper dependencies on other salient problem parameters. Moreover, existing approaches to federated Q-learning adopt an equally-weighted averaging of local Q-estimates, which can be highly sub-optimal in the asynchronous setting since the local trajectories can be highly heterogeneous due to different local behavior policies. Existing sample complexity scales inverse proportionally to the minimum entry of the stationary state-action occupancy distributions over all agents, requiring that every agent covers the entire state-action space. Instead, we propose a novel importance averaging algorithm, giving larger weights to more frequently visited state-action pairs. The improved sample complexity scales inverse proportionally to the minimum entry of the average stationary state-action occupancy distribution of all agents, thus only requiring the agents collectively cover the entire state-action space, unveiling the blessing of heterogeneity.
Full Text Available
Tackling heterogeneous traffic in multi-access systems via erasure coded servers

https://doi.org/10.1145/3492866.3549713

Choudhury, Tuhinangshu; Wang, Weina; Joshi, Gauri (October 2022, Proceedings of the Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing)

Full Text Available
Federated Reinforcement Learning: Linear Speedup Under Markovian Sampling

Khodadadian, Sajad; Sharma, Pranay; Joshi, Gauri; Maguluri, Siva Theja (July 2022, International Conference on Machine Learning (ICML))

Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling observations from the environment is usually split across multiple agents. However, transferring these observations from the agents to a central location can be prohibitively expensive in terms of the communication cost, and it can also compromise the privacy of each agent’s local behavior policy. In this paper, we consider a federated reinforcement learning framework where multiple agents collaboratively learn a global model, without sharing their individual data and policies. Each agent maintains a local copy of the model and updates it using locally sampled data. Although having N agents enables the sampling of N times more data, it is not clear if it leads to proportional convergence speedup. We propose federated versions of on-policy TD, off-policy TD and Q-learning, and analyze their convergence. For all these algorithms, to the best of our knowledge, we are the first to consider Markovian noise and multiple local updates, and prove a linear convergence speedup with respect to the number of agents. To obtain these results, we show that federated TD and Q-learning are special cases of a general framework for federated stochastic approximation with Markovian noise, and we leverage this framework to provide a unified convergence analysis that applies to all the algorithms.
Full Text Available
Correlated Combinatorial Bandits for Online Resource Allocation

https://doi.org/10.1145/3492866.3549727

Gupta, Samarth; Zuo, Jinhang; Joe-Wong, Carlee; Joshi, Gauri; Yagan, Osman (January 2022, ACM International Symposium on Mobile Ad Hoc Networking and Computing)

Full Text Available
Job Dispatching Policies for Queueing Systems with Unknown Service Rates

https://doi.org/10.1145/3466772.3467047

Choudhury, Tuhinangshu; Joshi, Gauri; Wang, Weina; Shakkottai, Sanjay (July 2021, ACM MobiHoc: International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks)
In multi-server queueing systems where there is no central queue holding all incoming jobs, job dispatching policies are used to assign incoming jobs to the queue at one of the servers. Classic job dispatching policies such as join-the-shortest-queue and shortest expected delay assume that the service rates and queue lengths of the servers are known to the dispatcher. In this work, we tackle the problem of job dispatching without the knowledge of service rates and queue lengths, where the dispatcher can only obtain noisy estimates of the service rates by observing job departures. This problem presents a novel exploration-exploitation trade-off between sending jobs to all the servers to estimate their service rates, and exploiting the currently known fastest servers to minimize the expected queueing delay. We propose a bandit-based exploration policy that learns the service rates from observed job departures. Unlike the standard multi-armed bandit problem where only one out of a finite set of actions is optimal, here the optimal policy requires identifying the optimal fraction of incoming jobs to be sent to each server. We present a regret analysis and simulations to demonstrate the effectiveness of the proposed bandit-based exploration policy.
Full Text Available
A Unified Approach to Translate Classical Bandit Algorithms to Structured Bandits

https://doi.org/10.1109/ICASSP39728.2021.9413628

Gupta, Samarth; Chaudhari, Shreyas; Mukherjee, Subhojyoti; Joshi, Gauri; Yagan, Osman (June 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
Full Text Available
Best-Arm Identification in Correlated Multi-Armed Bandits

https://doi.org/10.1109/JSAIT.2021.3082028

Gupta, Samarth; Joshi, Gauri; Yagan, Osman (June 2021, IEEE Journal on Selected Areas in Information Theory)
Full Text Available
