This paper considers online optimization of a renewalreward system. A controller performs a sequence of tasks backtoback. Each task has a random vector of parameters, called the task type vector, that affects the task processing options and also affects the resulting reward and time duration of the task. The probability distribution for the task type vector is unknown and the controller must learn to make efficient decisions so that timeaverage reward converges to optimality. Prior work on such renewal optimization problems leaves open the question of optimal convergence time. This paper develops an algorithm with an optimality gap that decays like O(1/√k), where k is the number of tasks processed. The same algorithm is shown to have faster O(log(k)/k) performance when the system satisfies a strong concavity property. The proposed algorithm uses an auxiliary variable that is updated according to a classic RobbinsMonro iteration. It makes online scheduling decisions at the start of each renewal frame based on this variable and the observed task type. A matching converse is obtained for the strongly concave case by constructing an example system for which all algorithms have performance at best Ω(log(k)/k). A matching Ω(1/√k) converse is also shown for the general casemore »
Fast Learning for Renewal Optimization in Online Task Scheduling
This paper considers online optimization of a renewalreward system. A controller performs a sequence of tasks backtoback. Each task has a random vector of parameters, called the \emph{task type vector}, that affects the task processing options and also affects the resulting reward and time duration of the task.
The probability distribution for the task type vector is unknown and the controller must learn to make efficient decisions so that time average reward converges to optimality. Prior work on such renewal optimization problems leaves open the question of optimal convergence time. This paper develops
an algorithm with an optimality gap that decays like $O(1/\sqrt{k})$, where $k$ is the number of tasks processed. The same algorithm is shown to have faster $O(\log(k)/k)$ performance when the system satisfies a strong concavity property. The proposed algorithm uses an auxiliary variable that is updated according to a classic
RobbinsMonro iteration. It makes online scheduling decisions at the start of each renewal frame based on this variable and on the observed task type.
A matching converse is obtained for the strongly concave case by constructing
an example system for which all algorithms have performance at best
$\Omega(\log(k)/k)$. A matching $\Omega(1/\sqrt{k})$ converse is also shown for the more »
 Award ID(s):
 1718477
 Publication Date:
 NSFPAR ID:
 10252441
 Journal Name:
 ArXivorg
 ISSN:
 23318422
 Sponsoring Org:
 National Science Foundation
More Like this


This paper proves an impossibility result for stochastic network utility maximization for multiuser wireless systems, including multiaccess and broadcast systems. Every time slot an access point observes the current channel states and opportunistically selects a vector of transmission rates. Channel state vectors are assumed to be independent and identically distributed with an unknown probability distribution. The goal is to learn to make decisions over time that maximize a concave utility function of the running time average transmission rate of each user. Recently it was shown that a stochastic FrankWolfe algorithm converges to utilityoptimality with an error of O(log(T)/T ), where T is the time the algorithm has been running. An existing O(1/T ) converse is known. The current paper improves the converse to O(log(T)/T), which matches the known achievability result. The proof uses a reduction from the opportunistic scheduling problem to a Bernoulli estimation problem. Along the way, it refines a result on Bernoulli estimation.

This paper considers the fundamental convergence time for opportunistic scheduling over timevarying channels. The channel state probabilities are unknown and algorithms must perform some type of estimation and learning while they make decisions to optimize network utility. Existing schemes can achieve a utility within ε of optimality, for any desired ε > 0, with convergence and adaptation times of O(1/ε^2). This paper shows that if the utility function is concave and smooth, then O(log(1/ε)/ε) convergence time is possible via an existing stochastic variation on the FrankWolfe algorithm, called the RUN algorithm. Next, a converse result is proven to show it is impossible for any algorithm to have convergence time better than O(1/ε), provided the algorithm has no a priori knowledge of channel state probabilities. Hence, RUN is within a logarithmic factor of convergence time optimality. However, RUN has a vanishing stepsize and hence has an infinite adaptation time. Using stochastic FrankWolfe with a fixed step size yields improved O(1/ε^2) adaptation time, but convergence time increases to O(1/ε^2), similar to existing driftpluspenalty based algorithms. This raises important open questions regarding optimal adaptation.

This paper considers the fundamental convergence time for opportunistic scheduling over timevarying channels. The channel state probabilities are unknown and algorithms must perform some type of estimation and learning while they make decisions to optimize network utility. Existing schemes can achieve a utility within ε of optimality, for any desired ε > 0, with convergence and adaptation times of O(1/ε^2). This paper shows that if the utility function is concave and smooth, then O(log(1/ε)/ε) convergence time is possible via an existing stochastic variation on the FrankWolfe algorithm, called the RUN algorithm. Next, a converse result is proven to show it is impossible for any algorithm to have convergence time better than O(1/ε), provided the algorithm has no a priori knowledge of channel state probabilities. Hence, RUN is within a logarithmic factor of convergence time optimality. However, RUN has a vanishing stepsize and hence has an infinite adaptation time. Using stochastic FrankWolfe with a fixed step size yields improved O(1/ε^2) adaptation time, but convergence time increases to O(1/ε^2), similar to existing driftpluspenalty based algorithms. This raises important open questions regarding optimal adaptation.

We present a weighted approach to compute a maximum cardinality matching in an arbitrary bipartite graph. Our main result is a new algorithm that takes as input a weighted bipartite graph G(A cup B,E) with edge weights of 0 or 1. Let w <= n be an upper bound on the weight of any matching in G. Consider the subgraph induced by all the edges of G with a weight 0. Suppose every connected component in this subgraph has O(r) vertices and O(mr/n) edges. We present an algorithm to compute a maximum cardinality matching in G in O~(m(sqrt{w} + sqrt{r} + wr/n)) time. When all the edge weights are 1 (symmetrically when all weights are 0), our algorithm will be identical to the wellknown HopcroftKarp (HK) algorithm, which runs in O(m sqrt{n}) time. However, if we can carefully assign weights of 0 and 1 on its edges such that both w and r are sublinear in n and wr=O(n^{gamma}) for gamma < 3/2, then we can compute maximum cardinality matching in G in o(m sqrt{n}) time. Using our algorithm, we obtain a new O~(n^{4/3}/epsilon^4) time algorithm to compute an epsilonapproximate bottleneck matching of A,B subsetR^2 and an 1/(epsilon^{O(d)}}n^{1+(d1)/(2d1)}) poly logmore »