This paper considers online convex optimization (OCO) with stochastic constraints, which generalizes Zinkevich's OCO over a known simple fixed set by introducing multiple stochastic functional constraints that are i.i.d. generated at each round and are disclosed to the decision maker only after the decision is made. This formulation arises naturally when decisions are restricted by stochastic environments or by deterministic environments with noisy observations. It also includes many important problems as special cases, such as OCO with long-term constraints, stochastic constrained convex optimization, and deterministic constrained convex optimization. To solve this problem, this paper proposes a new algorithm that achieves O(√T) expected regret and constraint violations and O(√T log(T)) high-probability regret and constraint violations. Experiments on a real-world data center scheduling problem further verify the performance of the new algorithm.
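The abstract above describes a primal-dual pattern common to this line of work: take a gradient step on the loss weighted by "virtual queues" that track accumulated constraint violation. The sketch below illustrates that general pattern only; the function name `primal_dual_step`, the box constraint, and the step size are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def primal_dual_step(x, q, grad_f, g_vals, grad_g, alpha=0.1, lo=-1.0, hi=1.0):
    """One round of a generic primal-dual update for OCO with functional
    constraints g_k(x) <= 0 (a sketch, not the paper's exact algorithm).

    x      : current decision
    q      : virtual queues, one per constraint (nonnegative)
    grad_f : gradient of the round's loss at x
    g_vals : constraint values g_k(x), observed only after acting
    grad_g : Jacobian rows, gradient of each g_k at x
    """
    # Descend on the loss plus queue-weighted constraint gradients.
    direction = grad_f + grad_g.T @ q
    x_next = np.clip(x - alpha * direction, lo, hi)  # projection onto a simple box
    # Queues grow with violation and shrink when constraints are slack.
    q_next = np.maximum(q + g_vals, 0.0)
    return x_next, q_next
```

The queue plays the role of a time-varying dual multiplier: the larger the accumulated violation of constraint k, the more strongly the next step pushes back toward feasibility.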
A Low Complexity Algorithm with O(sqrt(T)) Regret and O(1) Constraint Violations for Online Convex Optimization with Long Term Constraints
This paper considers online convex optimization over a complicated constraint set, which
typically consists of multiple functional constraints and a set constraint. The conventional
online projection algorithm (Zinkevich, 2003) can be difficult to implement due to the
potentially high computation complexity of the projection operation. In this paper, we relax
the functional constraints by allowing them to be violated at each round but still requiring
them to be satisfied in the long term. This type of relaxed online convex optimization
(with long-term constraints) was first considered in Mahdavi et al. (2012). That prior
work proposes an algorithm that achieves O(sqrt(T)) regret and O(T^(3/4)) constraint violations for
general problems, and another algorithm that achieves an O(T^(2/3)) bound for both regret and
constraint violations when the constraint set can be described by a finite number of linear
constraints. A recent extension in Jenatton et al. (2016) can achieve O(T^(max(theta, 1-theta))) regret
and O(T^(1-theta/2)) constraint violations, where theta in (0,1). The current paper proposes a new
simple algorithm that yields improved performance in comparison to prior works. The new
algorithm achieves an O(sqrt(T)) regret bound with O(1) constraint violations.
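For context on why the relaxation helps: Zinkevich's baseline algorithm projects onto the constraint set at every round, which is cheap only when the set is simple. The sketch below shows online projected gradient descent onto an l2 ball, where the projection is a one-line rescaling; the abstract's point is that projecting onto a set cut out by many functional constraints has no such closed form. The function name `ogd` and the parameters are illustrative assumptions.

```python
import numpy as np

def ogd(grads, x0, radius=1.0, eta=0.1):
    """Zinkevich-style online projected gradient descent onto an l2 ball,
    a simple set where projection is cheap. Returns the iterate trajectory."""
    x = np.asarray(x0, dtype=float)
    trajectory = [x.copy()]
    for g in grads:
        x = x - eta * np.asarray(g, dtype=float)
        norm = np.linalg.norm(x)
        if norm > radius:
            # Euclidean projection onto the ball: rescale to the boundary.
            x = x * (radius / norm)
        trajectory.append(x.copy())
    return trajectory
```

For a general intersection of functional constraints, the projection step above would itself be a convex program solved every round, which is the per-round cost the paper's algorithm avoids.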
 Award ID(s):
 1718477
 NSF-PAR ID:
 10195772
 Date Published:
 Journal Name:
 Journal of machine learning research
 Volume:
 21
 Issue:
 1
 ISSN:
 1533-7928
 Page Range / eLocation ID:
 124
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this



We study the problem of safe online convex optimization, where the action at each time step must satisfy a set of linear safety constraints. The goal is to select a sequence of actions to minimize the regret without violating the safety constraints at any time step (with high probability). The parameters that specify the linear safety constraints are unknown to the algorithm. The algorithm has access only to noisy observations of the constraints for the chosen actions. We propose an algorithm, called the Safe Online Projected Gradient Descent (SOPGD) algorithm, to address this problem. We show that, under the assumption that a safe baseline action is available, the SOPGD algorithm achieves a regret of O(T^(2/3)). While many algorithms for online convex optimization (OCO) problems with safety constraints are available in the literature, they allow constraint violations during learning/optimization, and the focus has been on characterizing the cumulative constraint violations. To the best of our knowledge, ours is the first work that provides an algorithm with provable guarantees on the regret without violating the linear safety constraints (with high probability) at any time step.
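A common way to guarantee safety at every step, given a known safe baseline action, is to mix each candidate update with the baseline until the estimated constraints hold. The sketch below shows that general pattern under strong simplifying assumptions: it uses a point estimate `A_hat` of the constraint matrix, whereas the SOPGD algorithm described in the abstract must handle noisy observations (e.g. via confidence sets). The function name `safe_step` is hypothetical.

```python
import numpy as np

def safe_step(x, grad, x_safe, A_hat, b, eta=0.05):
    """Take a gradient step, then back off toward a known safe baseline
    x_safe until the estimated linear constraints A_hat @ x <= b hold.
    A sketch of the safe-update pattern, not the SOPGD algorithm itself."""
    cand = x - eta * grad
    lam = 1.0
    for _ in range(30):
        # Mix the candidate with the baseline; lam = 1 means full step.
        trial = lam * cand + (1 - lam) * x_safe
        if np.all(A_hat @ trial <= b):
            break
        lam *= 0.5  # halve the mixing weight until the trial point is safe
    else:
        trial = x_safe  # fall back to the baseline if nothing safe was found
    return trial
```

Because the baseline is safe by assumption, the mixing always has a feasible fallback, which is exactly why the abstract's high-probability no-violation guarantee needs a safe baseline action to exist.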

We consider the online linear optimization problem, where at every step the algorithm plays a point x_t in the unit ball and suffers loss <c_t, x_t> for some cost vector c_t that is then revealed to the algorithm. Recent work showed that if an algorithm receives a "hint" h_t that has nontrivial correlation with c_t before it plays x_t, then it can achieve a logarithmic regret guarantee, improving on the classical sqrt(T) bound. In this work, we study the question of whether an algorithm really requires a hint at every time step. Somewhat surprisingly, we show that an algorithm can obtain logarithmic regret with just O(sqrt(T)) hints under a natural query model. We give two applications of our result, to the well-studied setting of optimistic regret bounds and to the problem of online learning with abstention.
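To see why a correlated hint is valuable in this setting, note that if h_t points in the direction of c_t, playing the unit-ball point opposing the hint makes the round's loss <c_t, x_t> negative. The toy function below illustrates only that intuition; it is not the hinted algorithm from the abstract, and the name `play_with_hint` is hypothetical.

```python
import numpy as np

def play_with_hint(h):
    """Play the unit-ball point opposing the hint h_t. If the hint truly
    correlates with the cost c_t, the loss <c_t, x_t> is negative.
    (A toy illustration of why hints help, not the paper's algorithm.)"""
    h = np.asarray(h, dtype=float)
    n = np.linalg.norm(h)
    return -h / n if n > 0 else np.zeros_like(h)
```

The interesting question the abstract answers is how few such hints are needed: O(sqrt(T)) of them already suffice for logarithmic regret under its query model.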
In this paper, we study a class of online optimization problems with long-term budget constraints where the objective functions are not necessarily concave (nor convex), but instead satisfy the Diminishing Returns (DR) property. In this online setting, a sequence of monotone DR-submodular objective functions and linear budget functions arrive over time and, assuming a limited total budget, the goal is to take actions at each time, before observing the utility and budget function arriving at that round, so as to achieve a sublinear regret bound while keeping the total budget violation sublinear as well. Prior work has shown that achieving sublinear regret and sublinear total budget violation simultaneously is impossible if the utility and budget functions are chosen adversarially. Therefore, we modify the notion of regret by comparing the agent against the best fixed decision in hindsight that satisfies the budget constraint proportionally over any window of length W. We propose the Online Saddle Point Hybrid Gradient (OSPHG) algorithm to solve this class of online problems. For W = T, we recover the aforementioned impossibility result. However, if W is sublinear in T, we show that it is possible to obtain sublinear bounds for both the regret and the total budget violation.
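Saddle-point (primal-dual) methods of the kind named in the abstract alternate a primal ascent step on the utility with a dual update that tracks budget consumption. The sketch below shows that generic alternation only; the function name `saddle_point_step`, the unit-box decision set, and the step sizes are illustrative assumptions, not the OSPHG update.

```python
import numpy as np

def saddle_point_step(x, mu, grad_u, grad_c, budget_slack, eta=0.1, rho=0.1):
    """One generic online primal-dual (saddle-point) step: ascend the
    utility penalized by the dual-weighted budget gradient, while the
    dual variable mu grows whenever the budget is exceeded.
    A hedged sketch of the pattern, not the exact OSPHG algorithm.

    budget_slack : remaining budget this round (negative means violation)
    """
    # Primal ascent on utility minus the dual-weighted budget cost.
    x_next = np.clip(x + eta * (grad_u - mu * grad_c), 0.0, 1.0)
    # Dual ascent: mu increases when slack is negative, never goes below 0.
    mu_next = max(mu + rho * (-budget_slack), 0.0)
    return x_next, mu_next
```

The dual variable mu acts as an adaptive price on budget use, which is how such methods trade utility regret against total budget violation.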