skip to main content


Title: Bayesian Optimization for Task Offloading and Resource Allocation in Mobile Edge Computing
Recent years have witnessed the emergence of mobile edge computing (MEC), on the premise of a costeffective enhancement in the computational ability of hardware-constrained wireless devices (WDs) comprising the Internet of Things (IoT). In a general multi-server multi-user MEC system, each WD has a computational task to execute and has to select binary (off)loading decisions, along with the analog-amplitude resource allocation variables in an online manner, with the goal of minimizing the overall energy-delay cost (EDC) with dynamic system states. While past works typically rely on the explicit expression of the EDC function, the present contribution considers a practical setting, where in lieu of system state information, the EDC function is not available in analytical form, and instead only the function values at queried points are revealed. Towards tackling such a challenging online combinatorial problem with only bandit information, novel Bayesian optimization (BO) based approach is put forth by leveraging the multi-armed bandit (MAB) framework. Per time slot, by exploiting temporal information, the discrete offloading decisions are first obtained via the MAB method, and the analog resource allocation variables are subsequently optimized using the BO selection rule. Numerical tests validate the effectiveness of the proposed BO approach.  more » « less
Award ID(s):
2212318 2220292 2126052 2128593 2103256 2102312 1901134
NSF-PAR ID:
10424929
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Asilomar Conference on Signals Systems and Computers
Page Range / eLocation ID:
1086 to 1090
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We propose and evaluate a learning-based framework to address multi-agent resource allocation in coupled wireless systems. In particular we consider, multiple agents (e.g., base stations, access points, etc.) that choose amongst a set of resource allocation options towards achieving their own performance objective /requirements, and where the performance observed at each agent is further coupled with the actions chosen by the other agents, e.g., through interference, channel leakage, etc. The challenge is to find the best collective action. To that end we propose a Multi-Armed Bandit (MAB) framework wherein the best actions (aka arms) are adaptively learned through online reward feedback. Our focus is on systems which are "weakly-coupled" wherein the best arm of each agent is invariant to others' arm selection the majority of the time - this majority structure enables one to develop light weight efficient algorithms. This structure is commonly found in many wireless settings such as channel selection and power control. We develop a bandit algorithm based on the Track-and-Stop strategy, which shows a logarithmic regret with respect to a genie. Finally through simulation, we exhibit the potential use of our model and algorithm in several wireless application scenarios. 
    more » « less
  2. null (Ed.)
    While traditional economics assumes that humans are fully rational agents who always maximize their expected utility, in practice, we constantly observe apparently irrational behavior. One explanation is that people have limited computational power, so that they are, quite rationally, making the best decisions they can, given their computational limitations. To test this hypothesis, we consider the multi-armed bandit (MAB) problem. We examine a simple strategy for playing an MAB that can be implemented easily by a probabilistic finite automaton (PFA). Roughly speaking, the PFA sets certain expectations, and plays an arm as long as it meets them. If the PFA has sufficiently many states, it performs near-optimally. Its performance degrades gracefully as the number of states decreases. Moreover, the PFA acts in a ``human-like'' way, exhibiting a number of standard human biases, like an optimism bias and a negativity bias. 
    more » « less
  3. While traditional economics assumes that humans are fully rational agents who always maximize their expected utility, in practice, we constantly observe apparently irrational behavior. One explanation is that people have limited computational power, so that they are, quite rationally, making the best decisions they can, given their computational limitations. To test this hypothesis, we consider the multi-armed bandit (MAB) problem. We examine a simple strategy for playing an MAB that can be implemented easily by a probabilistic finite automaton (PFA). Roughly speaking, the PFA sets certain expectations, and plays an arm as long as it meets them. If the PFA has sufficiently many states, it performs near-optimally. Its performance degrades gracefully as the number of states decreases. Moreover, the PFA acts in a “human-like” way, exhibiting a number of standard human biases, like an optimism bias and a negativity bias. 
    more » « less
  4. null (Ed.)
    This paper focuses on player modeling in multiplayer adaptive games. While player modeling has received a significant amount of attention, less is known about how to use player modeling in multiplayer games, especially when an experience management AI must make decisions on how to adapt the experience for the group as a whole. Specifically, we present a multi-armed bandit (MAB) approach for modeling groups of multiple players. Our main contributions are a new MAB frame- work for multiplayer modeling and techniques for addressing the new challenges introduced by the multiplayer context, extending previous work on MAB-based player modeling to account for new group-generated phenomena not present in single-user models. We evaluate our approach via simulation of virtual players in the context of multiplayer adaptive exergames. 
    more » « less
  5. We study adaptive video streaming for multiple users in wireless access edge networks with unreliable channels. The key challenge is to jointly optimize the video bitrate adaptation and resource allocation such that the users' cumulative quality of experience is maximized. This problem is a finite-horizon restless multi-armed multi-action bandit problem and is provably hard to solve. To overcome this challenge, we propose a computationally appealing index policy entitled Quality Index Policy, which is well-defined without the Whittle indexability condition and is provably asymptotically optimal without the global attractor condition. These two conditions are widely needed in the design of most existing index policies, which are difficult to establish in general. Since the wireless access edge network environment is highly dynamic with system parameters unknown and time-varying, we further develop an index-aware reinforcement learning (RL) algorithm dubbed QA-UCB. We show that QA-UCB achieves a sub-linear regret with a low-complexity since it fully exploits the structure of the Quality Index Policy for making decisions. Extensive simulations using real-world traces demonstrate significant gains of proposed policies over conventional approaches. We note that the proposed framework for designing index policy and index-aware RL algorithm is of independent interest and could be useful for other large-scale multi-user problems. 
    more » « less