Bregman-style Online Convex Optimization with Energy Harvesting Constraints
This paper considers online convex optimization (OCO) problems where decisions are constrained by available energy resources. A key scenario is optimal power control for an energy harvesting device with a finite capacity battery. The goal is to minimize a time-average loss function while keeping the energy used less than the energy available. In this setup, the distribution of the randomly arriving harvestable energy (which is assumed to be i.i.d.) is unknown, the current loss function is unknown, and the controller is informed only by the history of past observations. A prior algorithm is known to achieve $O(\sqrt{T})$ regret by using a battery with $O(\sqrt{T})$ capacity. This paper develops a new algorithm that maintains this asymptotic trade-off in the number of time steps $T$ while improving the dependency on the dimension $n$ of the decision vector from $O(\sqrt{n})$ to $O(\sqrt{\log(n)})$. The proposed algorithm separates the decision vector into amplitude and direction components. It uses two distinct types of Bregman divergence, together with energy queue information, to make decisions for each component.
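To make the amplitude/direction separation concrete, the following is a minimal Python sketch of one plausible instantiation, not the paper's exact algorithm. The decision is x = r*w, with a scalar amplitude r updated by a Euclidean (quadratic Bregman divergence) gradient step and a direction w on the probability simplex updated by exponentiated gradient (the KL Bregman divergence), which is the classic route to a sqrt(log n) dimension dependence. The step sizes eta_dir and eta_amp, the queue weight V, and the queue dynamics are illustrative assumptions.

```python
import numpy as np

def bregman_oco_step(w, r, grad, Q, eta_dir=0.1, eta_amp=0.1, V=10.0, r_max=1.0):
    """One update of the two-Bregman-divergence step (illustrative sketch).

    w    : direction on the probability simplex (length-n numpy vector)
    r    : scalar amplitude, so the played decision is x = r * w
    grad : observed (sub)gradient of the last loss at x
    Q    : energy-queue backlog (large Q means energy is scarce)
    """
    # Direction: exponentiated gradient = mirror descent with KL divergence;
    # the multiplicative update is what yields sqrt(log n) dependence.
    w_new = w * np.exp(-eta_dir * r * grad)
    w_new /= w_new.sum()

    # Amplitude: quadratic-Bregman (Euclidean) step; the energy queue Q
    # penalizes spending when the backlog is high.
    r_new = r - eta_amp * (float(grad @ w) + Q / V)
    return w_new, float(np.clip(r_new, 0.0, r_max))

def energy_queue_update(Q, used, harvested):
    """Virtual energy queue: grows with energy used, drains with harvest."""
    return max(Q + used - harvested, 0.0)
```

In a full loop one would play x = r*w each step, observe the loss gradient and the harvested energy, and then apply these two updates.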
- Award ID(s): 1718477
- PAR ID: 10252437
- Date Published:
- Journal Name: Proceedings of the ACM on Measurement and Analysis of Computing Systems
- Volume: 4
- Issue: 3
- ISSN: 2476-1249
- Page Range / eLocation ID: 1 to 25
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- This paper considers utility-optimal power control for energy harvesting wireless devices with a finite-capacity battery. The distribution of the underlying wireless environment and harvestable energy is unknown, and only outdated system state information is available at the device controller. This scenario resembles both Lyapunov opportunistic optimization and online learning but differs from each. By a novel combination of Zinkevich's online gradient learning technique and the drift-plus-penalty technique from Lyapunov opportunistic optimization, the paper proposes a learning-aided algorithm that achieves utility within O(epsilon) of optimal, for any desired epsilon > 0, using a battery with O(1/epsilon) capacity. The proposed algorithm has low complexity and makes power investment decisions based on system history, without requiring knowledge of the current system state or its probability distribution. A sketch of the drift-plus-penalty update appears after this list.
- This paper proves an impossibility result for stochastic network utility maximization in multi-user wireless systems, including multi-access and broadcast systems. Every time slot, an access point observes the current channel states and opportunistically selects a vector of transmission rates. Channel state vectors are assumed to be independent and identically distributed with an unknown probability distribution. The goal is to learn to make decisions over time that maximize a concave utility function of each user's running time-average transmission rate. It was recently shown that a stochastic Frank-Wolfe algorithm converges to utility optimality with an error of O(log(T)/T), where T is the time the algorithm has been running. An existing O(1/T) converse is known. The current paper improves the converse to O(log(T)/T), which matches the known achievability result. The proof uses a reduction from the opportunistic scheduling problem to a Bernoulli estimation problem and, along the way, refines a result on Bernoulli estimation. A sketch of the stochastic Frank-Wolfe step appears after this list.
- This paper considers online optimization of a renewal-reward system. A controller performs a sequence of tasks back-to-back. Each task has a random vector of parameters, called the task type vector, that affects the task processing options as well as the resulting reward and time duration of the task. The probability distribution for the task type vector is unknown, and the controller must learn to make efficient decisions so that the time-average reward converges to optimality. Prior work on such renewal optimization problems leaves open the question of optimal convergence time. This paper develops an algorithm with an optimality gap that decays like $O(1/\sqrt{k})$, where $k$ is the number of tasks processed. The same algorithm is shown to have faster $O(\log(k)/k)$ performance when the system satisfies a strong concavity property. The proposed algorithm uses an auxiliary variable updated according to a classic Robbins-Monro iteration; it makes online scheduling decisions at the start of each renewal frame based on this variable and the observed task type. A matching converse is obtained for the strongly concave case by constructing an example system for which all algorithms have performance at best $\Omega(\log(k)/k)$. A matching $\Omega(1/\sqrt{k})$ converse is also shown for the general case without strong concavity. A sketch of the Robbins-Monro update appears after this list.
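For the first related paper above, the following is a minimal Python sketch of the drift-plus-penalty idea it combines with Zinkevich-style gradient steps, not the paper's exact algorithm. It assumes an illustrative rate utility log(1 + g*p) for channel gain g, a tunable weight V (of order 1/epsilon), and a virtual energy queue that grows with power spent and drains with harvested energy, so a high queue value suppresses spending.

```python
def dpp_power(Q, g, V, p_max):
    """Pick power p in [0, p_max] maximizing V*log(1 + g*p) - Q*p.

    The closed form below is the stationary point of this concave
    objective; the queue backlog Q acts as a price on energy use.
    """
    p = V / max(Q, 1e-9) - 1.0 / g
    return min(max(p, 0.0), p_max)

def energy_queue_update(Q, p, harvested):
    """Virtual energy queue: grows with power spent, drains with harvest."""
    return max(Q + p - harvested, 0.0)
```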
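For the second related paper, this is a minimal sketch of the stochastic Frank-Wolfe scheduling step whose O(log(T)/T) achievability the converse matches, again with assumed details: an illustrative utility sum_i log(1 + x_i), and a finite menu rate_options of feasible rate vectors revealed by the current channel state. Each slot, the scheduler maximizes a linear functional (the utility gradient at the running average) over that menu.

```python
import numpy as np

def frank_wolfe_slot(x_bar, rate_options, t):
    """One slot of stochastic Frank-Wolfe for utility maximization.

    x_bar        : running time-average rate vector (one entry per user)
    rate_options : (m, n) array of feasible rate vectors this slot,
                   determined by the observed channel state
    t            : slot index (1-based)
    """
    grad = 1.0 / (1.0 + x_bar)                        # gradient of sum_i log(1 + x_i)
    r = rate_options[np.argmax(rate_options @ grad)]  # linear maximization step
    return x_bar + (r - x_bar) / t, r                 # running-average update
```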
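For the renewal-reward paper, this is a minimal sketch of the Robbins-Monro structure described in the abstract, with illustrative details: the auxiliary variable theta estimates the optimal time-average reward; each frame, the controller maximizes reward minus theta times duration over the options available for the observed task type, then nudges theta toward the root of E[reward - theta*duration] = 0 with an O(1/k) step.

```python
def renewal_frame(theta, options, k):
    """One renewal frame (illustrative sketch).

    theta   : auxiliary estimate of the optimal time-average reward
    options : (reward, duration) pairs available for the observed task type
    k       : frame index (1-based)
    """
    # Decision: maximize reward - theta * duration over this frame's options.
    reward, duration = max(options, key=lambda o: o[0] - theta * o[1])
    # Robbins-Monro step: drive E[reward - theta * duration] toward zero.
    theta += (reward - theta * duration) / k
    return theta, (reward, duration)
```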