Title: Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism
We study exploration in stochastic multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls. We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull, but might have reduced throughput due to nonlinear scaling. For example, in simulation-based scientific studies, an expensive simulation can be sped up by running it on multiple cores. This speed-up, however, is partly offset by the communication among cores, which results in lower throughput than if fewer cores were allocated to run more trials in parallel. In this paper, we explore these trade-offs in two settings. First, in a fixed confidence setting, we need to find the best arm with a given target success probability as quickly as possible. We propose an algorithm which trades off between information accumulation and throughput and show that the time taken can be upper bounded by the solution of a dynamic program whose inputs are the gaps between the sub-optimal and optimal arms. We also prove a matching hardness result. Second, we present an algorithm for a fixed deadline setting, where we are given a time deadline and need to maximize the probability of finding the best arm. We corroborate our theoretical insights with simulation experiments that show that the algorithms consistently match or outperform baseline algorithms on a variety of problem instances.
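The latency versus throughput trade-off described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the power-law speed-up `m ** alpha` with `alpha < 1`, the function names, and the default values are all assumptions chosen only to model sublinear scaling.

```python
# Illustrative only: the power-law speed-up model and all names here are
# assumptions, not taken from the paper.

def pull_latency(m, alpha=0.7):
    """Time for one arm pull on m resource units, assuming speed-up m**alpha."""
    return 1.0 / (m ** alpha)


def throughput(total, m, alpha=0.7):
    """Pulls finished per unit time when `total` units are split into groups of m."""
    parallel_pulls = total // m  # how many pulls run side by side
    return parallel_pulls / pull_latency(m, alpha)


def tradeoff_table(total=64, alpha=0.7):
    """Rows of (units per pull, latency, throughput) for m = 1, 2, 4, ..., total."""
    rows = []
    m = 1
    while m <= total:
        rows.append((m, pull_latency(m, alpha), throughput(total, m, alpha)))
        m *= 2
    return rows


if __name__ == "__main__":
    for m, lat, thr in tradeoff_table():
        print(f"{m:>3} units/pull  latency={lat:.3f}  throughput={thr:.1f}")
```

Under this assumed model, giving more units to each pull shortens the wait for an individual result but lowers the total number of pulls completed per unit time, which is the tension the abstract describes.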
Award ID(s):
1730628
NSF-PAR ID:
10310422
Journal Name:
Proceedings of the 38th International Conference on Machine Learning
Volume:
139
Sponsoring Org:
National Science Foundation
More Like this
  1. Koyejo, S. ; Mohamed, S. ; Agarwal, A. ; Belgrave, D. ; Cho, K. ; Oh, A. (Ed.)
    In the stochastic contextual bandit setting, regret-minimizing algorithms have been extensively researched, but their instance-minimizing best-arm identification counterparts remain seldom studied. In this work, we focus on the stochastic bandit problem in the (ε, δ)-PAC setting: given a policy class Π, the goal of the learner is to return a policy π ∈ Π whose expected reward is within ε of the optimal policy with probability greater than 1 − δ. We characterize the first instance-dependent PAC sample complexity of contextual bandits through a quantity ρ_Π, and provide matching upper and lower bounds in terms of ρ_Π for the agnostic and linear contextual best-arm identification settings. We show that no algorithm can be simultaneously minimax-optimal for regret minimization and instance-dependent PAC for best-arm identification. Our main result is a new instance-optimal and computationally efficient algorithm that relies on a polynomial number of calls to an argmax oracle.
  2. We consider the problem of dividing limited resources to individuals arriving over T rounds. In each round, a random number of individuals arrive, and individuals can be characterized by their type (i.e., preferences over the different resources). A standard notion of fairness in this setting is that an allocation simultaneously satisfy envy-freeness and efficiency. The former is an individual guarantee, requiring that each agent prefers the agent’s own allocation over the allocation of any other; in contrast, efficiency is a global property, requiring that the allocations clear the available resources. For divisible resources, when the number of individuals of each type is known up front, the desiderata are simultaneously achievable for a large class of utility functions. However, in an online setting when the number of individuals of each type is only revealed round by round, no policy can guarantee these desiderata simultaneously, and hence, the best one can do is to try and allocate so as to approximately satisfy the two properties. We show that, in the online setting, the two desired properties (envy-freeness and efficiency) are in direct contention in that any algorithm achieving additive counterfactual envy-freeness up to a factor of L_T necessarily suffers an efficiency loss of at least [Formula: see text]. We complement this uncertainty principle with a simple algorithm, Guarded-Hope, which allocates resources based on an adaptive threshold policy and is able to achieve any fairness–efficiency point on this frontier. Our results provide guarantees for fair online resource allocation with high probability for multiple resource and multiple type settings. In simulation results, our algorithm provides allocations close to the optimal fair solution in hindsight, motivating its use in practical applications as the algorithm is able to adapt to any desired fairness efficiency trade-off.
Funding: This work was supported by the National Science Foundation [Grants ECCS-1847393, DMS-1839346, CCF-1948256, and CNS-1955997] and the Army Research Laboratory [Grant W911NF-17-1-0094]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.2397 . 
  3.
    This paper introduces and studies a graph-based variant of the path planning problem arising in hostile environments. We consider a setting where an agent (e.g. a robot) must reach a given destination while avoiding being intercepted by probabilistic entities which exist in the graph with a given probability and move according to a probabilistic motion pattern known a priori. Given a goal vertex and a deadline to reach it, the agent must compute the path to the goal that maximizes its chances of survival. We study the computational complexity of the problem, and present two algorithms for computing high quality solutions in the general case: an exact algorithm based on Mixed-Integer Nonlinear Programming, working well in instances of moderate size, and a pseudo-polynomial time heuristic algorithm allowing to solve large scale problems in reasonable time. We also consider the two limit cases where the agent can survive with probability 0 or 1, and provide specialized algorithms to detect these kinds of situations more efficiently. 
  4.
    Modern latency-sensitive and real-time systems often use multi-core platforms; thus, tasks on different cores share certain hardware resources, such as the memory bus and certain cache levels. This has two undesirable consequences: (1) tasks can interfere with each other, causing high latency for the system as a whole, and (2) it becomes difficult to meet deadlines, since the worst-case timing of a given task depends on the worst task it might have to compete with. Static partitioning isolates tasks from each other by allocating a certain fraction of the resources to each; however, many tasks execute in different phases (e.g., memory-intensive and CPU-intensive) that have different requirements. Thus, system designers are left with a choice between overprovisioning, based on the most demanding phase, or suboptimal performance. In this paper, we propose a pair of techniques, called DNA and DADNA, to address the above challenge. DNA increases throughput and decreases latency, by building an execution profile of each task to identify the phases, and then dynamically allocating resources based on which task can benefit the most; DADNA further adds support for soft real-time workloads by taking deadlines into account. We have built a prototype of both techniques in the Xen hypervisor; our experimental results show that, compared to a state-of-the-art solution, DNA and DADNA can substantially improve schedulability, reduce job deadline miss ratios, and cut latencies by more than a factor of two even in extremely overloaded situations. 
  5. Mikołaj Bojańczyk and Emanuela Merelli and David P. Woodruff (Ed.)
    The classical coding theorem in Kolmogorov complexity states that if an n-bit string x is sampled with probability δ by an algorithm with prefix-free domain then K(x) ≤ log(1/δ) + O(1). In a recent work, Lu and Oliveira [31] established an unconditional time-bounded version of this result, by showing that if x can be efficiently sampled with probability δ then rKt(x) = O(log(1/δ)) + O(log n), where rKt denotes the randomized analogue of Levin’s Kt complexity. Unfortunately, this result is often insufficient when transferring applications of the classical coding theorem to the time-bounded setting, as it achieves a O(log(1/δ)) bound instead of the information-theoretic optimal log(1/δ). Motivated by this discrepancy, we investigate optimal coding theorems in the time-bounded setting. Our main contributions can be summarised as follows.
    • Efficient coding theorem for rKt with a factor of 2. Addressing a question from [31], we show that if x can be efficiently sampled with probability at least δ then rKt(x) ≤ (2 + o(1)) · log(1/δ) + O(log n). As in previous work, our coding theorem is efficient in the sense that it provides a polynomial-time probabilistic algorithm that, when given x, the code of the sampler, and δ, it outputs, with probability ≥ 0.99, a probabilistic representation of x that certifies this rKt complexity bound.
    • Optimality under a cryptographic assumption. Under a hypothesis about the security of cryptographic pseudorandom generators, we show that no efficient coding theorem can achieve a bound of the form rKt(x) ≤ (2 − o(1)) · log(1/δ) + poly(log n). Under a weaker assumption, we exhibit a gap between efficient coding theorems and existential coding theorems with near-optimal parameters.
    • Optimal coding theorem for pKt and unconditional Antunes-Fortnow. We consider pKt complexity [17], a variant of rKt where the randomness is public and the time bound is fixed. We observe the existence of an optimal coding theorem for pKt, and employ this result to establish an unconditional version of a theorem of Antunes and Fortnow [5] which characterizes the worst-case running times of languages that are in average polynomial-time over all P-samplable distributions.