
Title: Gradient Methods for Online DR-Submodular Maximization with Stochastic Long-Term Constraints
In this paper, we consider the problem of online monotone DR-submodular maximization subject to long-term stochastic constraints. Specifically, at each round $$t\in [T]$$, after committing to an action $$\mathbf{x}_t$$, a random reward $$f_t(\mathbf{x}_t)$$ and an unbiased estimate $$\widetilde{\nabla}f_t(\mathbf{x}_t)$$ of the gradient at that point (semi-bandit feedback) are revealed. Meanwhile, a stochastic linear budget $$g_t(\mathbf{x}_t)$$ is consumed from the total allotted budget $$B_T$$. We propose a gradient-ascent-based algorithm that achieves a $$\frac{1}{2}$$-regret of $$\mathcal{O}(\sqrt{T})$$ with $$\mathcal{O}(T^{3/4})$$ constraint violation with high probability. Moreover, when first-order full-information feedback is available, we propose an algorithm that achieves a $$(1-1/e)$$-regret of $$\mathcal{O}(\sqrt{T})$$ with $$\mathcal{O}(T^{3/4})$$ constraint violation. These algorithms significantly improve over the state of the art in terms of query complexity.
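As a rough illustration only, not the paper's algorithm: the sketch below shows a generic primal-dual gradient-ascent loop for this semi-bandit setting. It assumes the action set is the unit box, that the cost vector $$c_t$$ of the linear budget function is revealed each round, and $$\mathcal{O}(1/\sqrt{T})$$ step sizes; all of these are placeholders.

```python
import numpy as np

def project_box(x, lo=0.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^d (stand-in for a general convex domain)."""
    return np.clip(x, lo, hi)

def primal_dual_ascent(grad_oracle, cost_oracle, d, T, B_T, eta=None, mu=None):
    """Hypothetical primal-dual gradient ascent for online monotone DR-submodular
    maximization with a stochastic long-term budget B_T (a sketch, not the paper's method).

    grad_oracle(t, x) -> unbiased estimate of grad f_t(x)   (semi-bandit feedback)
    cost_oracle(t)    -> realized stochastic cost vector c_t, with g_t(x) = c_t @ x
    """
    eta = eta if eta is not None else 1.0 / np.sqrt(T)  # primal step size (assumption)
    mu = mu if mu is not None else 1.0 / np.sqrt(T)     # dual step size (assumption)
    x, lam, spent = np.zeros(d), 0.0, 0.0
    for t in range(T):
        g_hat = grad_oracle(t, x)        # unbiased gradient estimate at the played point
        c = cost_oracle(t)               # this round's stochastic linear cost vector
        spent += float(c @ x)            # budget consumed by the committed action
        # ascend the Lagrangian f_t(x) - lam * (g_t(x) - B_T / T)
        x = project_box(x + eta * (g_hat - lam * c))
        # dual ascent: raise the price lam when spending exceeds the per-round allowance
        lam = max(0.0, lam + mu * (float(c @ x) - B_T / T))
    return x, spent
```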
Award ID(s): 2149617
PAR ID: 10579448
Author(s) / Creator(s): ; ;
Publisher / Repository: The Thirty-Eighth Annual Conference on Neural Information Processing Systems
Date Published:
Volume: 37
Sponsoring Org: National Science Foundation
More Like this
  1. Cussens, James; Zhang, Kun (Ed.)
    We investigate the problem of combinatorial multi-armed bandits with stochastic submodular (in expectation) rewards and full-bandit feedback, where no extra information other than the reward of the selected action at each time step $$t$$ is observed. We propose a simple algorithm, Explore-Then-Commit Greedy (ETCG), and prove that it achieves a $$(1-1/e)$$-regret upper bound of $$\mathcal{O}(n^{1/3} k^{4/3} T^{2/3} \log(T)^{1/2})$$ for a horizon $$T$$, number of base elements $$n$$, and cardinality constraint $$k$$. We also show in experiments with synthetic and real-world data that ETCG empirically outperforms other full-bandit methods.
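A minimal sketch of the explore-then-commit greedy pattern this item describes, under stated assumptions: `reward_oracle(S)` both plays the set `S` and returns a noisy reward sample, and the per-element exploration budget `m` is a placeholder schedule rather than the paper's tuned value.

```python
import numpy as np

def etcg(reward_oracle, n, k, T):
    """Hypothetical explore-then-commit greedy loop for stochastic submodular
    rewards with full-bandit feedback (a sketch, not the paper's pseudocode)."""
    m = max(1, int(T ** (2 / 3) / (n * k)))   # exploration rounds per (phase, element); assumption
    S, remaining, t = [], list(range(n)), 0
    for _ in range(k):                        # k greedy phases, one element added per phase
        means = {}
        for e in remaining:                   # try every remaining base element on top of S
            means[e] = float(np.mean([reward_oracle(S + [e]) for _ in range(m)]))
            t += m
        best = max(means, key=means.get)      # greedy choice on empirical mean rewards
        S.append(best)
        remaining.remove(best)
    while t < T:                              # commit: play the greedy set for the rest of the horizon
        reward_oracle(S)
        t += 1
    return S
```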
  2. This paper considers online convex optimization over a complicated constraint set, which typically consists of multiple functional constraints and a set constraint. The conventional online projection algorithm (Zinkevich, 2003) can be difficult to implement due to the potentially high computational complexity of the projection operation. In this paper, we relax the functional constraints by allowing them to be violated at each round but still requiring them to be satisfied in the long term. This type of relaxed online convex optimization (with long-term constraints) was first considered in Mahdavi et al. (2012). That prior work proposes an algorithm to achieve $$\mathcal{O}(\sqrt{T})$$ regret and $$\mathcal{O}(T^{3/4})$$ constraint violations for general problems and another algorithm to achieve an $$\mathcal{O}(T^{2/3})$$ bound for both regret and constraint violations when the constraint set can be described by a finite number of linear constraints. A recent extension in Jenatton et al. (2016) can achieve $$\mathcal{O}(T^{\max(\theta, 1-\theta)})$$ regret and $$\mathcal{O}(T^{1-\theta/2})$$ constraint violations, where $$\theta \in (0,1)$$. The current paper proposes a new simple algorithm that yields improved performance in comparison to prior works. The new algorithm achieves an $$\mathcal{O}(\sqrt{T})$$ regret bound with $$\mathcal{O}(1)$$ constraint violations.
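For reference, the two quantities these bounds measure can be written out under the standard conventions (the abstract states them only in words):

$$\mathrm{Regret}_T=\sum_{t=1}^{T} f_t(x_t)-\min_{x\in\mathcal{X}}\sum_{t=1}^{T} f_t(x), \qquad \mathrm{Violation}_T=\sum_{t=1}^{T} g_i(x_t)\ \text{for each constraint}\ i,$$

so the new algorithm's guarantee reads $$\mathrm{Regret}_T=\mathcal{O}(\sqrt{T})$$ while every $$\mathrm{Violation}_T$$ stays $$\mathcal{O}(1)$$.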
  3. We investigate the problem of unconstrained combinatorial multi-armed bandits with full-bandit feedback and stochastic rewards for submodular maximization. Previous works investigate the same problem assuming a submodular and monotone reward function. In this work, we study a more general problem, i.e., when the reward function is not necessarily monotone, and the submodularity is assumed only in expectation. We propose the Randomized Greedy Learning (RGL) algorithm and theoretically prove that it achieves a $$\frac{1}{2}$$-regret upper bound of $$\widetilde{\mathcal{O}}(n T^{\frac{2}{3}})$$ for horizon $$T$$ and number of arms $$n$$. We also show in experiments that RGL empirically outperforms other full-bandit variants in submodular and non-submodular settings.
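The abstract does not spell out RGL's update rule, so the sketch below is only a guess at its shape, modeled on the classical randomized double greedy for non-monotone submodular maximization with empirical means substituted for exact values; `noisy_f` and `samples_per_arm` are assumptions.

```python
import numpy as np

def rgl_sketch(noisy_f, n, samples_per_arm, seed=0):
    """Hypothetical randomized-greedy learning pass for unconstrained, possibly
    non-monotone submodular maximization with noisy full-bandit value queries.
    noisy_f(S) returns a noisy sample of the expected reward of the set S."""
    rng = np.random.default_rng(seed)
    est = lambda S: float(np.mean([noisy_f(S) for _ in range(samples_per_arm)]))
    X, Y = set(), set(range(n))               # double-greedy state: X grows, Y shrinks
    for i in range(n):
        a = max(est(X | {i}) - est(X), 0.0)   # estimated gain of adding i to X
        b = max(est(Y - {i}) - est(Y), 0.0)   # estimated gain of removing i from Y
        p = 1.0 if a + b == 0.0 else a / (a + b)
        if rng.random() < p:
            X.add(i)                          # keep element i with probability ~ a / (a + b)
        else:
            Y.discard(i)
    return X                                  # X == Y after the final element is decided
```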
  4.
    We consider an online optimization problem in which the reward functions are DR-submodular, and in addition to maximizing the total reward, the sequence of decisions must satisfy some convex constraints on average. Specifically, at each round $$t$$, upon committing to an action $$x_t$$, a DR-submodular utility function $$f_t$$ and a convex constraint function $$g_t$$ are revealed, and the goal is to maximize the overall utility while ensuring the average of the constraint functions is non-positive (so constraints are satisfied on average). Such cumulative constraints arise naturally in applications where the average resource consumption is required to remain below a specified threshold. We study this problem under an adversarial model and a stochastic model for the convex constraints, where the functions $$g_t$$ can vary arbitrarily or according to an i.i.d. process over time. We propose a single algorithm which achieves sub-linear regret (with respect to the time horizon $$T$$) as well as sub-linear constraint violation bounds in both settings, without prior knowledge of the regime. Prior works have studied this problem in the special case of linear constraint functions. Our results not only improve upon the existing bounds under linear cumulative constraints, but also give the first sub-linear bounds for general convex long-term constraints.
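Concretely, "satisfied on average" is the long-term condition (written here under the usual convention, since the item states it only in words):

$$\frac{1}{T}\sum_{t=1}^{T} g_t(x_t) \le 0,$$

with violation typically measured through the positive part $$\big[\sum_{t=1}^{T} g_t(x_t)\big]_+$$ of the accumulated constraint values.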
  5. In this paper, we study a class of online optimization problems with long-term budget constraints where the objective functions are not necessarily concave (nor convex) but instead satisfy the Diminishing Returns (DR) property. In this online setting, a sequence of monotone DR-submodular objective functions and linear budget functions arrives over time and, assuming a limited total budget, the goal is to take an action at each time, before observing the utility and budget functions arriving at that round, so as to achieve a sub-linear regret bound while keeping the total budget violation sub-linear as well. Prior work has shown that achieving sub-linear regret and total budget violation simultaneously is impossible if the utility and budget functions are chosen adversarially. Therefore, we modify the notion of regret by comparing the agent against the best fixed decision in hindsight which satisfies the budget constraint proportionally over any window of length $$W$$. We propose the Online Saddle Point Hybrid Gradient (OSPHG) algorithm to solve this class of online problems. For $$W = T$$, we recover the aforementioned impossibility result. However, if $$W$$ is sub-linear in $$T$$, we show that it is possible to obtain sub-linear bounds for both the regret and the total budget violation.
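As a hedged illustration of what one saddle-point (primal-dual) step might look like in this budgeted setting; the function name, the Lagrangian, and the step rule below are assumptions, not the paper's pseudocode.

```python
import numpy as np

def osphg_style_step(x, lam, grad_f, grad_g, g_val, rho, eta, mu, project):
    """One hypothetical saddle-point step on L(x, lam) = f(x) - lam * (g(x) - rho),
    where rho is the per-round budget allowance (a sketch in the spirit of the
    item above, not the actual OSPHG update)."""
    x_new = project(x + eta * (grad_f - lam * grad_g))  # primal ascent on the Lagrangian
    lam_new = max(0.0, lam + mu * (g_val - rho))        # dual ascent: price overspending
    return x_new, lam_new

# Usage sketch: project onto [0, 1]^d and take one step on a toy round.
project = lambda z: np.clip(z, 0.0, 1.0)
x, lam = np.zeros(3), 0.0
x, lam = osphg_style_step(x, lam, grad_f=np.ones(3), grad_g=0.5 * np.ones(3),
                          g_val=0.2, rho=0.1, eta=0.1, mu=0.1, project=project)
```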